Crossover + Fine-tune Search Space

Related issues: #8 (Distillation, Quantization, and Crossover pipeline).

See also: crossover_strategies.md for strategy semantics.

1. Overview

This document specifies the search space used by scripts/run_crossover_search.py (backed by farm.core.decision.training.crossover_search) to find the Pareto-optimal crossover + fine-tune strategy for Q-network recombination.

The search is a Cartesian product of crossover recipes × fine-tune regimes. Each combination yields one child network that is evaluated with a fixed harness and scored on a primary metric.

2. Primary Metric

primary_metric = min(child_vs_parent_a_agreement, child_vs_parent_b_agreement)

Higher is better: a child that achieves high agreement with both parents simultaneously is a well-blended offspring. Taking the minimum penalises children that collapse to one parent.

Additional metrics (KL divergence, MSE, MAE, cosine similarity, oracle agreement) are stored in the leaderboard for secondary analysis.

A child is flagged degenerate when primary_metric < degenerate_threshold. The default threshold of 0.0 disables the flag, since agreement is never negative; set it to e.g. 0.3 to flag poor blends.
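The scoring rule above can be sketched as follows. This is a minimal illustration, assuming each agreement value is a top-1 action agreement rate over the shared state buffer (the document does not say which agreement variant feeds the primary metric). The helper names are hypothetical and are not taken from crossover_search.

```python
def top1_agreement(child_actions, parent_actions):
    """Fraction of states where child and parent pick the same greedy action."""
    if len(child_actions) != len(parent_actions):
        raise ValueError("action lists must have the same length")
    matches = sum(c == p for c, p in zip(child_actions, parent_actions))
    return matches / len(child_actions)


def primary_metric(agree_a, agree_b):
    """The weaker of the two agreements decides the score."""
    return min(agree_a, agree_b)


def is_degenerate(primary, degenerate_threshold=0.0):
    # Strict '<' means a threshold of 0.0 can never fire.
    return primary < degenerate_threshold


# A child that collapses onto parent A scores only as well as A agrees with B.
child = [0, 1, 2, 2]       # identical to parent A
parent_a = [0, 1, 2, 2]
parent_b = [0, 1, 0, 1]
agree_a = top1_agreement(child, parent_a)   # 4 of 4 states match
agree_b = top1_agreement(child, parent_b)   # 2 of 4 states match
score = primary_metric(agree_a, agree_b)
print(score, is_degenerate(score, degenerate_threshold=0.6))
```

The collapsed child gets perfect agreement with parent A, but the minimum pulls its score down to the inter-parent agreement, which is the penalty described above.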

3. Crossover Knobs

Knob	Values (default search)	Notes
`mode`	`random`, `layer`, `weighted`	All three strategies explored
`alpha` (random)	0.5	Probability of selecting from parent A per tensor
`seed` (random)	0, 1, 2	Controls random tensor selection; three seeds give a diversity estimate
`alpha` (weighted)	0.3, 0.5, 0.7	Linear blend weight; 0.5 = midpoint, 0.3/0.7 = parent-biased
`alpha` (layer)	—	Ignored; layer mode is fully deterministic

Strategy semantics (summary)

Strategy	Diversity	Coherence	Key property
`random`	High	Low	Per-tensor coin flip; high variance across seeds
`layer`	Low	High	Structural blocks preserved; only two possible children per pair
`weighted`	Medium	Medium	Smooth interpolation; predictable, no randomness

For full semantics see crossover_strategies.md §2.
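To make the `random` and `weighted` rows concrete, here is a rough sketch on toy state dicts (plain Python lists stand in for tensors). It assumes `alpha` is the weight on parent A in weighted mode, which the table does not state, and it omits `layer` mode because its block boundaries are defined in crossover_strategies.md rather than here. The function names are illustrative, not the API of crossover.py.

```python
import random


def random_crossover(parent_a, parent_b, alpha=0.5, seed=0):
    """Per-tensor coin flip: take each tensor from parent A with probability alpha."""
    rng = random.Random(seed)  # local generator, so the seed fully fixes the child
    child = {}
    for name in parent_a:
        source = parent_a if rng.random() < alpha else parent_b
        child[name] = list(source[name])
    return child


def weighted_crossover(parent_a, parent_b, alpha=0.5):
    """Element-wise linear blend: alpha * A + (1 - alpha) * B (assumed convention)."""
    return {
        name: [alpha * a + (1.0 - alpha) * b
               for a, b in zip(parent_a[name], parent_b[name])]
        for name in parent_a
    }


parent_a = {"fc1.weight": [0.0, 2.0], "fc2.weight": [4.0, 6.0]}
parent_b = {"fc1.weight": [2.0, 4.0], "fc2.weight": [8.0, 10.0]}

print(weighted_crossover(parent_a, parent_b, alpha=0.5))
print(random_crossover(parent_a, parent_b, alpha=0.5, seed=0))
```

In the random child every tensor is copied whole from one parent, so different seeds can give quite different children, while the weighted child is fully determined by `alpha`.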

4. Fine-tune Regimes

Each regime is a named hyperparameter set passed to FineTuner, which uses parent A as the reference teacher with a soft KL-divergence objective.

Regime name	Epochs	LR	Batch size	Notes
`short`	5	1e-3	32	Quick recovery; good baseline
`medium`	10	5e-4	32	Balanced; default for `minimal` grid
`long`	20	1e-4	32	Deeper adaptation; captures slower convergence
`lr_high`	5	5e-3	32	High-LR exploration; may overshoot on easy tasks
`short_qat`	5	1e-4	16	Same epoch budget as `short` but `quantization_applied="ptq_dynamic"` (fake-quant fine-tune)

Reference teacher: parent A (frozen, eval mode).

Loss function: KL divergence with temperature=3.0 (soft distillation, pure soft loss α=1.0).
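For a single state, the soft loss can be illustrated roughly as below. This is a sketch of temperature-softened KL divergence in general, not the FineTuner implementation: it assumes the direction KL(teacher || student), and it leaves out the T² scaling that some distillation setups apply, since the document does not say whether FineTuner uses it.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over temperature-scaled values."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_kl_loss(teacher_q, student_q, temperature=3.0):
    """KL(teacher || student) between softened action distributions."""
    p = softmax(teacher_q, temperature)
    q = softmax(student_q, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


teacher_q = [1.0, 2.0, 0.5]   # Q-values from frozen parent A (toy numbers)
student_q = [0.8, 1.5, 0.9]   # Q-values from the child being fine-tuned

print(soft_kl_loss(teacher_q, teacher_q))  # identical outputs give zero loss
print(soft_kl_loss(teacher_q, student_q))
```

A temperature above 1 flattens both distributions, so the child is also trained on the teacher's preferences among non-greedy actions rather than on the argmax alone.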

QAT fine-tuning: set quantization_applied to "ptq_dynamic", "ptq_static", or "qat_float" in a custom regime to enable quantization-aware fine-tuning (see crossover_strategies.md §8).

5. Pre-defined Search Spaces

5.1 `default` (14 children = 7 recipes × 2 regimes)

SearchConfig.default()

#	Crossover recipe	Fine-tune regime
1–2	random, α=0.5, seed=0	short, long
3–4	random, α=0.5, seed=1	short, long
5–6	random, α=0.5, seed=2	short, long
7–8	layer	short, long
9–10	weighted, α=0.3	short, long
11–12	weighted, α=0.5	short, long
13–14	weighted, α=0.7	short, long
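The table above is the Cartesian product of seven recipes and two regimes. The following sketch enumerates it with plain dictionaries; it does not use SearchConfig, and the `max_runs` truncation (keeping the first N combinations) is an assumption about how a run cap might behave.

```python
from itertools import product

# Seven crossover recipes: three random seeds, one layer, three weighted alphas.
recipes = (
    [{"mode": "random", "alpha": 0.5, "seed": s} for s in (0, 1, 2)]
    + [{"mode": "layer"}]
    + [{"mode": "weighted", "alpha": a} for a in (0.3, 0.5, 0.7)]
)
regimes = ["short", "long"]


def build_grid(recipes, regimes, max_runs=None):
    """Recipe-major enumeration of every (recipe, regime) pair."""
    grid = [{"crossover": r, "finetune": f} for r, f in product(recipes, regimes)]
    return grid if max_runs is None else grid[:max_runs]


grid = build_grid(recipes, regimes)
print(len(grid))
for index, child in enumerate(grid[:4], start=1):
    print(index, child)
```

Because `product` varies the last argument fastest, the regimes cycle within each recipe, which matches the row order of the table.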

5.2 `minimal` (9 children = 3 recipes × 3 regimes)

SearchConfig.minimal()

One recipe per crossover mode; three fine-tune regimes. Good for a fast first leaderboard.

#	Crossover recipe	Fine-tune regime
1–3	random, α=0.5, seed=0	short, medium, long
4–6	layer	short, medium, long
7–9	weighted, α=0.5	short, medium, long

5.3 `default-qat` (21 children) and `minimal-qat` (9 children)

Python:

SearchConfig.default_with_qat() — same seven crossover recipes as default, but three fine-tune columns: short (float), long (float), short_qat (quantization_applied="ptq_dynamic", weight-only fake quant during fine-tune per crossover_strategies.md §8).
SearchConfig.minimal_with_qat() — three recipes × (short, short_qat, long).

CLI: --search-space default-qat or minimal-qat.

Custom mode can add the short_qat preset via --finetune-regimes short_qat.

5.4 Parallel execution (`--workers N`)

When N > 1, run_crossover_search uses ProcessPoolExecutor, so each child runs in a separate process. Parent state dicts and the states array are written under <run-dir>/.crossover_parallel_cache/ and reloaded by each worker. Both parents must be BaseQNetwork instances with identical architecture. Quantized parent modules are not supported on this path; use --workers 1 for them. Workers run on CPU for broad compatibility.
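The write-once, reload-per-worker pattern described above might look roughly like this. It is only a sketch: JSON files and single-number "weights" stand in for the real state dicts and states array, the cache file names are invented, and `write_cache` and `run_child` are hypothetical helpers rather than functions from run_crossover_search.

```python
import json
import tempfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def write_cache(cache_dir, parent_a, parent_b):
    """Persist both parents once so that each worker can reload them from disk."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / "parent_a.json").write_text(json.dumps(parent_a))
    (cache_dir / "parent_b.json").write_text(json.dumps(parent_b))


def run_child(job):
    """Worker entry point: reload the parents from the cache and build one child."""
    cache_dir, alpha = job
    parent_a = json.loads((Path(cache_dir) / "parent_a.json").read_text())
    parent_b = json.loads((Path(cache_dir) / "parent_b.json").read_text())
    child = {k: alpha * parent_a[k] + (1 - alpha) * parent_b[k] for k in parent_a}
    return alpha, child


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as run_dir:
        cache = Path(run_dir) / ".crossover_parallel_cache"
        write_cache(cache, {"w": 0.0}, {"w": 2.0})
        jobs = [(str(cache), alpha) for alpha in (0.3, 0.5, 0.7)]
        with ProcessPoolExecutor(max_workers=2) as pool:
            for alpha, child in pool.map(run_child, jobs):
                print(alpha, child)
```

Passing only a cache path and a small job description to each worker keeps the pickled payload small, which is one plausible reason for caching the parents on disk instead of sending them through the pool.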

Convenience: make crossover-search-smoke runs a two-child minimal search with synthetic states.

5.5 Custom

python scripts/run_crossover_search.py \
    --search-space custom \
    --crossover-modes random weighted \
    --alpha-values 0.3 0.5 0.7 \
    --crossover-seeds 0 1 2 \
    --finetune-regimes short long \
    --max-runs 12

6. Evaluation Harness

Every child is evaluated with the same fixed harness:

Parameter	Value
Evaluator	`RecombinationEvaluator` (`recombination_eval.py`)
State buffer	Fixed NumPy array (`--states-file` or synthetic; shared across all children)
Metrics	top-1 action agreement, top-k (k=1,2,3), KL divergence, MSE, MAE, cosine similarity, oracle agreement
Latency warmup	3 forward passes
Latency repeats	20 timed passes (median)
Baseline	Parent A vs Parent B comparison (informational, not threshold-checked)
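The latency rows describe a warmup-then-median measurement. A minimal sketch of that procedure, assuming wall-clock timing with `time.perf_counter` (the timer used by RecombinationEvaluator is not stated here) and a hypothetical `forward` callable that runs one forward pass:

```python
import statistics
import time


def measure_latency_ms(forward, warmup=3, repeats=20):
    """Median wall-clock time of `forward()` in milliseconds, after warmup calls."""
    for _ in range(warmup):
        forward()  # untimed: lets caches and lazy initialisation settle
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        forward()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Stand-in workload; in practice this would be one child forward pass on the state buffer.
latency = measure_latency_ms(lambda: sum(range(10_000)))
print(f"median latency: {latency:.4f} ms")
```

The median is less sensitive than the mean to an occasional slow pass caused by scheduling noise.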

7. Reproducibility

Each child run writes a run_config.json capturing:

{
  "child_id": "000_random_a0p50_s0_short",
  "crossover": { "mode": "random", "alpha": 0.5, "seed": 0 },
  "finetune": {
    "regime": "short", "epochs": 5, "lr": 0.001,
    "batch_size": 32, "val_fraction": 0.1,
    "loss_fn": "kl", "seed": 42,
    "quantization_applied": "none"
  },
  "finetune_metrics": { "..." : "..." },
  "torch_version": "2.x.y"
}

Re-running with the same run_config.json parameters, the same state buffer, and the same parent checkpoints produces identical rankings, up to floating-point tolerance. This includes the random crossover mode, which is fully reproducible given its seed.
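One way to check that two runs used the same settings is to compare only the fields that determine how a child is built. The sketch below assumes those fields are the `crossover` and `finetune` blocks of run_config.json; `repro_signature` is a hypothetical helper, not part of the repository.

```python
import json

# Assumed set of fields that determine how a child is built.
REPRO_KEYS = ("crossover", "finetune")


def repro_signature(run_config):
    """Canonical JSON string of the build-determining fields."""
    return json.dumps({key: run_config[key] for key in REPRO_KEYS}, sort_keys=True)


first = {
    "child_id": "000_random_a0p50_s0_short",
    "crossover": {"mode": "random", "alpha": 0.5, "seed": 0},
    "finetune": {"regime": "short", "epochs": 5, "lr": 0.001, "batch_size": 32,
                 "val_fraction": 0.1, "loss_fn": "kl", "seed": 42,
                 "quantization_applied": "none"},
    "torch_version": "2.x.y",
}
rerun = json.loads(json.dumps(first))       # deep copy via a JSON round trip
reseeded = json.loads(json.dumps(first))
reseeded["crossover"]["seed"] = 1           # a different crossover seed

print(repro_signature(first) == repro_signature(rerun))
print(repro_signature(first) == repro_signature(reseeded))
```

Sorting keys before serialising makes the comparison independent of the order in which fields were written.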

8. Budget Guidance

Grid size	Children	Approx time (CPU, 1000 states)
Smoke (3 pairs)	3	< 2 min
Minimal (3×3)	9	5–15 min
Minimal-qat (3×3)	9	longer (includes QAT fine-tune)
Default (7×2)	14	10–25 min
Default-qat (7×3)	21	longer (includes QAT column)
Extended (7×4)	28	20–50 min

Times are indicative for synthetic states on CPU. Expect run time to grow roughly in proportion to the number of states in the buffer and the number of fine-tune epochs.

Use --max-runs N to cap the total number of children for CI or smoke tests.

9. Interpreting the Leaderboard

rank | child_id                | primary | agree_a | agree_b | degenerate
-----|-------------------------|---------|---------|---------|----------
   1 | 011_weighted_a0p50_...  | 0.8120  | 0.8500  | 0.8120  | False
   2 | 010_weighted_a0p30_...  | 0.7943  | 0.7943  | 0.8210  | False
   ...
 n+1 | parent_a (baseline)    | 0.6500  | 1.0000  | 0.6500  | —
 n+1 | parent_b (baseline)    | 0.6500  | 0.6500  | 1.0000  | —

primary_metric: min(agree_a, agree_b) — the leaderboard sort key.
Parent baselines: each baseline row reports the agreement between parent A and parent B, which gives context for inter-parent diversity. A child that exceeds the baseline on both sides is genuinely blending information from both parents.
Degenerate flag: set when primary_metric < degenerate_threshold. Degenerate children should be inspected (possible weight collapse or training instability).
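The reading rules above can be sketched in a few lines, using the two example rows and the 0.6500 inter-parent agreement from the sample leaderboard. The child_id values are kept truncated as shown there, and both helper names are illustrative.

```python
def rank_children(children):
    """Sort children by the leaderboard key, min(agree_a, agree_b), best first."""
    return sorted(children, key=lambda c: min(c["agree_a"], c["agree_b"]), reverse=True)


def beats_baseline(child, parent_agreement):
    """True when the child exceeds the inter-parent agreement on both sides."""
    return child["agree_a"] > parent_agreement and child["agree_b"] > parent_agreement


children = [
    {"child_id": "010_weighted_a0p30_...", "agree_a": 0.7943, "agree_b": 0.8210},
    {"child_id": "011_weighted_a0p50_...", "agree_a": 0.8500, "agree_b": 0.8120},
]
parent_agreement = 0.6500  # parent A vs parent B

for rank, child in enumerate(rank_children(children), start=1):
    primary = min(child["agree_a"], child["agree_b"])
    print(rank, child["child_id"], primary, beats_baseline(child, parent_agreement))
```

Both example children clear the baseline on both sides, so under the interpretation above neither has simply copied one parent.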

10. Conclusions and Recommended Defaults

Run the search and read recommendation.txt for a data-driven recommendation. Expected findings based on the strategy tradeoffs described in crossover_strategies.md §6:

Scenario	Recommended default
Maximum diversity (population search)	`random`, α=0.5, 3+ seeds + `long` fine-tune
Structural coherence (stable convergence)	`layer` + `long` fine-tune
Smooth interpolation (ensembling / warm-start)	`weighted`, α=0.5 + `short` fine-tune
Unknown parents (default recommendation)	Run `minimal` grid first; pick top-ranked strategy

The search output supersedes these expected findings; see runs/crossover_search/recommendation.txt after running the experiment.

11. References

Implementation: farm/core/decision/training/crossover_search.py
CLI runner: scripts/run_crossover_search.py
Crossover operators: farm/core/decision/training/crossover.py
Fine-tuning pipeline: farm/core/decision/training/finetune.py
Evaluation harness: farm/core/decision/training/recombination_eval.py
Strategy semantics: crossover_strategies.md
Parent epic: Dooders/AgentFarm#8