Crossover + Fine-tune Search Space
Related issues: #8 (Distillation, Quantization, and Crossover pipeline).
See also: crossover_strategies.md for strategy semantics.
1. Overview
This document specifies the search space used by scripts/run_crossover_search.py (backed by farm.core.decision.training.crossover_search) to find the Pareto-optimal crossover + fine-tune strategy for Q-network recombination.
The search is a Cartesian product of crossover recipes × fine-tune regimes. Each combination yields one child network that is evaluated with a fixed harness and scored on a primary metric.
2. Primary Metric
primary_metric = min(child_vs_parent_a_agreement, child_vs_parent_b_agreement)
Higher is better: a child that achieves high agreement with both parents simultaneously is a well-blended offspring. Taking the minimum penalises children that collapse to one parent.
Additional metrics (KL divergence, MSE, MAE, cosine similarity, oracle agreement) are stored in the leaderboard for secondary analysis.
A child is flagged degenerate when primary_metric < degenerate_threshold (default 0.0, disabled; set to e.g. 0.3 to flag poor blends).
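The metric and the degenerate flag can be sketched in a few lines. This is an illustrative sketch, not the code in crossover_search.py: it assumes "agreement" means top-1 (greedy) action agreement over the shared state buffer, and the helper names `top1_agreement`, `primary_metric`, and `is_degenerate` are hypothetical.

```python
from typing import Sequence


def top1_agreement(actions_x: Sequence[int], actions_y: Sequence[int]) -> float:
    """Fraction of states on which two networks pick the same greedy action."""
    if len(actions_x) != len(actions_y) or not actions_x:
        raise ValueError("action lists must be non-empty and equally long")
    matches = sum(x == y for x, y in zip(actions_x, actions_y))
    return matches / len(actions_x)


def primary_metric(agree_a: float, agree_b: float) -> float:
    """The minimum rewards children that stay close to both parents at once."""
    return min(agree_a, agree_b)


def is_degenerate(primary: float, degenerate_threshold: float = 0.0) -> bool:
    """With the default threshold of 0.0 the flag never fires (disabled)."""
    return primary < degenerate_threshold


# Toy greedy actions over 10 states (made-up numbers, for illustration only).
parent_a = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]
parent_b = [0, 1, 2, 3, 3, 2, 1, 0, 0, 1]
child = [0, 1, 2, 3, 0, 1, 2, 0, 0, 1]

agree_a = top1_agreement(child, parent_a)  # child matches A on 9 of 10 states
agree_b = top1_agreement(child, parent_b)  # child matches B on 7 of 10 states
primary = primary_metric(agree_a, agree_b)
flagged = is_degenerate(primary, degenerate_threshold=0.3)
```

Here the child leans toward parent A, so the minimum reports the weaker agreement with parent B rather than the flattering one.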
3. Crossover Knobs
| Knob | Values (default search) | Notes |
|---|---|---|
| mode | random, layer, weighted | All three strategies explored |
| alpha (random) | 0.5 | Probability of selecting from parent A per tensor |
| seed (random) | 0, 1, 2 | Controls random tensor selection; three seeds give diversity estimate |
| alpha (weighted) | 0.3, 0.5, 0.7 | Linear blend weight; 0.5 = midpoint, 0.3/0.7 = parent-biased |
| alpha (layer) | n/a | Ignored; layer mode is fully deterministic |
Strategy semantics (summary)
| Strategy | Diversity | Coherence | Key property |
|---|---|---|---|
| random | High | Low | Per-tensor coin flip; high variance across seeds |
| layer | Low | High | Structural blocks preserved; only two possible children per pair |
| weighted | Medium | Medium | Smooth interpolation; predictable, no randomness |
For full semantics see crossover_strategies.md §2.
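The random and weighted strategies summarised above can be illustrated with plain Python lists standing in for tensors. This is a sketch under stated assumptions, not the operators in crossover.py: the function names are hypothetical, and it assumes that in weighted mode alpha is the weight on parent A. The layer strategy is omitted because its block boundaries are defined in crossover_strategies.md.

```python
import random
from typing import Dict, List

# Stand-in for a PyTorch state dict: parameter name -> flat list of weights.
StateDict = Dict[str, List[float]]


def random_crossover(a: StateDict, b: StateDict, alpha: float, seed: int) -> StateDict:
    """Per-tensor coin flip: copy the whole tensor from A with probability alpha."""
    rng = random.Random(seed)  # local RNG so the seed fully determines the child
    return {
        name: list(a[name]) if rng.random() < alpha else list(b[name])
        for name in a
    }


def weighted_crossover(a: StateDict, b: StateDict, alpha: float) -> StateDict:
    """Linear blend per element (assumption: alpha weights parent A)."""
    return {
        name: [alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])]
        for name in a
    }


# Toy parents with identical architecture (made-up values).
parent_a = {"fc1.weight": [1.0, 1.0], "fc1.bias": [1.0], "fc2.weight": [1.0, 1.0]}
parent_b = {"fc1.weight": [3.0, 3.0], "fc1.bias": [3.0], "fc2.weight": [3.0, 3.0]}

child_random = random_crossover(parent_a, parent_b, alpha=0.5, seed=0)
child_blend = weighted_crossover(parent_a, parent_b, alpha=0.5)
```

In the random child every tensor comes intact from one parent, whereas the weighted child at alpha=0.5 sits at the elementwise midpoint, which is why the first varies across seeds and the second does not.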
4. Fine-tune Regimes
Each regime specifies a named hyperparameter set passed to FineTuner (using parent A as the reference teacher with a soft KL-divergence objective).
| Regime name | Epochs | LR | Batch size | Notes |
|---|---|---|---|---|
| short | 5 | 1e-3 | 32 | Quick recovery; good baseline |
| medium | 10 | 5e-4 | 32 | Balanced; default for minimal grid |
| long | 20 | 1e-4 | 32 | Deeper adaptation; captures slower convergence |
| lr_high | 5 | 5e-3 | 32 | High-LR exploration; may overshoot on easy tasks |
| short_qat | 5 | 1e-4 | 16 | Same epoch budget as short but quantization_applied="ptq_dynamic" (fake-quant fine-tune) |
Reference teacher: parent A (frozen, eval mode).
Loss function: KL divergence with temperature=3.0 (soft distillation, pure soft loss α=1.0).
QAT fine-tuning: set quantization_applied to "ptq_dynamic", "ptq_static", or "qat_float" in a custom regime to enable quantization-aware fine-tuning (see crossover_strategies.md §8).
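For intuition, the soft KL objective for a single state can be written out in pure Python. This is a sketch, not the loss in finetune.py: it assumes the divergence is taken as KL(teacher || child) over temperature-softened Q-values, and it applies the T² scaling that is conventional in distillation, which the fine-tuning pipeline may or may not use. The function names are hypothetical.

```python
import math
from typing import List


def softmax(q_values: List[float], temperature: float) -> List[float]:
    """Temperature-softened softmax, shifted by the max for numerical stability."""
    scaled = [q / temperature for q in q_values]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_kl_loss(teacher_q: List[float], child_q: List[float],
                 temperature: float = 3.0) -> float:
    """KL(teacher || child) on softened distributions for one state.

    The temperature**2 factor is the usual distillation convention that keeps
    gradient magnitudes comparable across temperatures (an assumption here).
    """
    p = softmax(teacher_q, temperature)
    q = softmax(child_q, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return kl * temperature ** 2


# Made-up Q-values for a four-action state.
teacher_q = [1.2, 0.3, -0.5, 2.0]
child_q = [0.9, 0.8, -0.1, 1.1]

loss_same = soft_kl_loss(teacher_q, teacher_q)
loss_diff = soft_kl_loss(teacher_q, child_q)
```

With α=1.0 there is no hard-label term, so a loss of this form is the whole objective: the child is pulled toward parent A's full action distribution, not just its greedy action.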
5. Pre-defined Search Spaces
5.1 default (14 children = 7 recipes × 2 regimes)
SearchConfig.default()
| # | Crossover recipe | Fine-tune regime |
|---|---|---|
| 1–2 | random, α=0.5, seed=0 | short, long |
| 3–4 | random, α=0.5, seed=1 | short, long |
| 5–6 | random, α=0.5, seed=2 | short, long |
| 7–8 | layer | short, long |
| 9–10 | weighted, α=0.3 | short, long |
| 11–12 | weighted, α=0.5 | short, long |
| 13–14 | weighted, α=0.7 | short, long |
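The table above is the Cartesian product of seven recipes and two regimes. The enumeration can be sketched as follows; the dictionary layout and the helper names `default_recipes` and `build_grid` are hypothetical and do not describe how SearchConfig stores its grid.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hyperparameters copied from the fine-tune regime table in section 4.
FINETUNE_REGIMES: Dict[str, Dict[str, float]] = {
    "short": {"epochs": 5, "lr": 1e-3, "batch_size": 32},
    "medium": {"epochs": 10, "lr": 5e-4, "batch_size": 32},
    "long": {"epochs": 20, "lr": 1e-4, "batch_size": 32},
}


def default_recipes() -> List[dict]:
    """Seven crossover recipes: 3 random seeds, 1 layer, 3 weighted alphas."""
    recipes: List[dict] = [
        {"mode": "random", "alpha": 0.5, "seed": seed} for seed in (0, 1, 2)
    ]
    recipes.append({"mode": "layer"})  # alpha is ignored in layer mode
    recipes += [{"mode": "weighted", "alpha": alpha} for alpha in (0.3, 0.5, 0.7)]
    return recipes


def build_grid(recipes: List[dict], regime_names: List[str]) -> List[Tuple[dict, str]]:
    """Recipe-major Cartesian product, matching the row order of the table."""
    return list(product(recipes, regime_names))


grid = build_grid(default_recipes(), ["short", "long"])
for recipe, regime in grid[:2]:
    print(recipe, regime, FINETUNE_REGIMES[regime])
```

Passing three regime names instead of two gives the 21-child grid with the same recipes.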
5.2 minimal (9 children = 3 recipes × 3 regimes)
SearchConfig.minimal()
One recipe per crossover mode; three fine-tune regimes. Good for a fast first leaderboard.
| # | Crossover recipe | Fine-tune regime |
|---|---|---|
| 1–3 | random, α=0.5, seed=0 | short, medium, long |
| 4–6 | layer | short, medium, long |
| 7–9 | weighted, α=0.5 | short, medium, long |
5.3 default-qat (21 children) and minimal-qat (9 children)
Python:
- SearchConfig.default_with_qat(): same seven crossover recipes as default, but three fine-tune columns: short (float), long (float), short_qat (quantization_applied="ptq_dynamic", weight-only fake quant during fine-tune per crossover_strategies.md §8).
- SearchConfig.minimal_with_qat(): three recipes × (short, short_qat, long).
CLI: --search-space default-qat or minimal-qat.
Custom mode can add the short_qat preset via --finetune-regimes short_qat.
5.4 Parallel execution (--workers N)
When N > 1, run_crossover_search uses ProcessPoolExecutor: each child runs in a separate process. Parent state dicts and the states array are written under <run-dir>/.crossover_parallel_cache/ and reloaded per worker. Requirements: both parents must be BaseQNetwork with identical architecture. Quantized parent modules are not supported on this path (use --workers 1). CPU is used inside workers for broad compatibility.
Convenience: make crossover-search-smoke runs a two-child minimal search with synthetic states.
5.5 Custom
python scripts/run_crossover_search.py \
--search-space custom \
--crossover-modes random weighted \
--alpha-values 0.3 0.5 0.7 \
--crossover-seeds 0 1 2 \
--finetune-regimes short long \
--max-runs 12
6. Evaluation Harness
Every child is evaluated with the same fixed harness:
| Parameter | Value |
|---|---|
| Evaluator | RecombinationEvaluator (recombination_eval.py) |
| State buffer | Fixed NumPy array (--states-file or synthetic; shared across all children) |
| Metrics | top-1 action agreement, top-k (k=1,2,3), KL divergence, MSE, MAE, cosine similarity, oracle agreement |
| Latency warmup | 3 forward passes |
| Latency repeats | 20 timed passes (median) |
| Baseline | Parent A vs Parent B comparison (informational, not threshold-checked) |
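The latency rows describe a warmup-then-median protocol. A minimal sketch of that protocol, using only the standard library, is shown below; `median_latency_ms` and `fake_forward` are hypothetical names, and the real evaluator may time its passes differently.

```python
import statistics
import time
from typing import Callable, Dict


def median_latency_ms(forward: Callable[[], object],
                      warmup: int = 3, repeats: int = 20) -> float:
    """Run untimed warmup passes, then return the median of the timed passes."""
    for _ in range(warmup):
        forward()  # warm caches and lazy initialisation; not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        forward()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in for a network forward pass; counts how often it is called.
calls: Dict[str, int] = {"n": 0}


def fake_forward() -> int:
    calls["n"] += 1
    return sum(range(1000))


latency = median_latency_ms(fake_forward)
```

The median is less sensitive than the mean to the occasional slow pass caused by scheduler noise, which matters when only 20 samples are taken.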
7. Reproducibility
Each child run writes a run_config.json capturing:
{
"child_id": "000_random_a0p50_s0_short",
"crossover": { "mode": "random", "alpha": 0.5, "seed": 0 },
"finetune": {
"regime": "short", "epochs": 5, "lr": 0.001,
"batch_size": 32, "val_fraction": 0.1,
"loss_fn": "kl", "seed": 42,
"quantization_applied": "none"
},
"finetune_metrics": { "..." : "..." },
"torch_version": "2.x.y"
}
Re-running with the same run_config.json parameters, the same state buffer, and the same parent checkpoints produces identical rankings, up to floating-point tolerance. The layer and weighted crossover modes are deterministic, and the random crossover mode is fully reproducible given its seed.
8. Budget Guidance
| Grid size | Children | Approx time (CPU, 1000 states) |
|---|---|---|
| Smoke (3 pairs) | 3 | < 2 min |
| Minimal (3×3) | 9 | 5–15 min |
| Minimal-qat (3×3) | 9 | longer (includes QAT fine-tune) |
| Default (7×2) | 14 | 10–25 min |
| Default-qat (7×3) | 21 | longer (includes QAT column) |
| Extended (7×4) | 28 | 20–50 min |
Times are indicative for synthetic states on CPU. Expect run time to grow roughly in proportion to the size of the state buffer (larger N) and the number of fine-tune epochs.
Use --max-runs N to cap the total number of children for CI or smoke tests.
9. Interpreting the Leaderboard
rank | child_id | primary | agree_a | agree_b | degenerate
-----|-------------------------|---------|---------|---------|----------
1 | 011_weighted_a0p50_... | 0.8120 | 0.8500 | 0.8120 | False
2 | 010_weighted_a0p30_... | 0.7943 | 0.7943 | 0.8210 | False
...
n+1 | parent_a (baseline) | 0.6500 | 1.0000 | 0.6500 | —
n+1 | parent_b (baseline) | 0.6500 | 0.6500 | 1.0000 | —
- primary_metric: min(agree_a, agree_b), the leaderboard sort key.
- Parent baselines (agreement = parent A vs parent B): provide context for the inter-parent diversity. A child that exceeds the baseline on both sides is genuinely blending information from both parents.
- Degenerate flag: set when primary_metric < degenerate_threshold. Degenerate children should be inspected (possible weight collapse or training instability).
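The three reading rules above can be sketched as a small ranking routine. This is illustrative only: the function `rank_children`, the row layout, and the child ids are hypothetical. The first two agreement pairs are taken from the example leaderboard, and the third child is invented to show the degenerate flag firing.

```python
from typing import Dict, List


def rank_children(children: List[Dict], baseline: float,
                  degenerate_threshold: float = 0.0) -> List[Dict]:
    """Sort by primary metric (descending) and annotate each row."""
    rows = []
    for child in children:
        primary = min(child["agree_a"], child["agree_b"])
        rows.append({
            **child,
            "primary": primary,
            "degenerate": primary < degenerate_threshold,
            # Exceeding the parent A vs parent B agreement on both sides.
            "beats_baseline": child["agree_a"] > baseline
            and child["agree_b"] > baseline,
        })
    rows.sort(key=lambda row: row["primary"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


children = [
    {"child_id": "child_collapsed", "agree_a": 0.9900, "agree_b": 0.4000},  # invented
    {"child_id": "child_blend_a", "agree_a": 0.7943, "agree_b": 0.8210},
    {"child_id": "child_blend_b", "agree_a": 0.8500, "agree_b": 0.8120},
]

leaderboard = rank_children(children, baseline=0.65, degenerate_threshold=0.5)
for row in leaderboard:
    print(row["rank"], row["child_id"], row["primary"],
          row["degenerate"], row["beats_baseline"])
```

The invented third child agrees almost perfectly with parent A yet ranks last, which is the collapse-to-one-parent case the minimum is designed to penalise.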
10. Conclusions and Recommended Defaults
Run the search and read recommendation.txt for a data-driven recommendation. Expected findings based on the strategy tradeoffs described in crossover_strategies.md §6:
| Scenario | Recommended default |
|---|---|
| Maximum diversity (population search) | random, α=0.5, 3+ seeds + long fine-tune |
| Structural coherence (stable convergence) | layer + long fine-tune |
| Smooth interpolation (ensembling / warm-start) | weighted, α=0.5 + short fine-tune |
| Unknown parents (default recommendation) | Run minimal grid first; pick top-ranked strategy |
These recommendations are expectations, not measured results, and the search output supersedes them; see runs/crossover_search/recommendation.txt after running the experiment.
11. References
- Implementation: farm/core/decision/training/crossover_search.py
- CLI runner: scripts/run_crossover_search.py
- Crossover operators: farm/core/decision/training/crossover.py
- Fine-tuning pipeline: farm/core/decision/training/finetune.py
- Evaluation harness: farm/core/decision/training/recombination_eval.py
- Strategy semantics: crossover_strategies.md
- Parent epic: Dooders/AgentFarm#8