Crossover Strategies: Design Note
Related issues: #611 (crossover operators), #612 (child initialization), #613 (this note).
1. Overview
The farm.core.decision.training.crossover module provides three strategies for combining two Q-network state dicts (float or quantized) into a child network. Crossover enables evolutionary or multi-parent search after PTQ / QAT rather than only mixing float weights during training.
All three strategies accept paired state dicts with identical keys and tensor shapes, dequantize any torch.qint8 tensors to float32 before operating, and return a float32 offspring state dict compatible with nn.Module.load_state_dict.
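The shared preprocessing step that all three strategies rely on can be sketched as follows. This is a minimal illustration, not the module's code. It assumes every state-dict entry is either a floating-point tensor or a per-tensor quantized tensor. The helper names `to_float32` and `align_parents` are hypothetical.

```python
from typing import Dict, Tuple

import torch

StateDict = Dict[str, torch.Tensor]


def to_float32(t: torch.Tensor) -> torch.Tensor:
    """Return a detached float32 version of t, dequantizing quantized tensors first."""
    if t.is_quantized:
        # qint8 tensors hold int8 codes plus scale/zero_point; dequantize() maps them back to float.
        t = t.dequantize()
    return t.detach().to(torch.float32)


def align_parents(sd_a: StateDict, sd_b: StateDict) -> Tuple[StateDict, StateDict]:
    """Check that both parents share keys and shapes, then convert every entry to float32."""
    if set(sd_a) != set(sd_b):
        mismatched = sorted(set(sd_a) ^ set(sd_b))
        raise KeyError(f"parents differ in keys: {mismatched}")
    float_a: StateDict = {}
    float_b: StateDict = {}
    for name in sd_a:  # keep parent A's key order
        a, b = to_float32(sd_a[name]), to_float32(sd_b[name])
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch for {name}: {tuple(a.shape)} vs {tuple(b.shape)}")
        float_a[name], float_b[name] = a, b
    return float_a, float_b
```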
2. Strategies
2.1 random — per-tensor selection
For each parameter tensor independently, flip a (possibly biased) coin and take the tensor from parent A (probability alpha) or parent B (probability 1 - alpha).
```python
from farm.core.decision.training.crossover import crossover_quantized_state_dict

child_sd = crossover_quantized_state_dict(
    state_dict_a, state_dict_b,
    mode="random",
    seed=42,    # for reproducibility
    alpha=0.5,  # default: 50/50
)
```
Determinism: fully reproducible given seed or a seeded np.random.Generator.
Edge cases:
- alpha=1.0 → exact copy of parent A.
- alpha=0.0 → exact copy of parent B.
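Per-tensor random selection can be sketched as follows. The sketch assumes one uniform draw per tensor, consumed in parent A's key order. The real function may order or consume draws differently, so its children will not match this sketch seed for seed. `random_crossover` is a hypothetical name.

```python
from typing import Dict, TypeVar, Union

import numpy as np

V = TypeVar("V")


def random_crossover(
    sd_a: Dict[str, V],
    sd_b: Dict[str, V],
    alpha: float = 0.5,
    seed: Union[int, np.random.Generator, None] = None,
) -> Dict[str, V]:
    """Pick each tensor from parent A with probability alpha, else from parent B."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # default_rng accepts an int seed, or returns an existing Generator unchanged.
    rng = np.random.default_rng(seed)
    child: Dict[str, V] = {}
    for name in sd_a:  # fixed iteration order keeps the draws reproducible
        # rng.random() lies in [0, 1): alpha=1.0 always picks A, alpha=0.0 always picks B.
        child[name] = sd_a[name] if rng.random() < alpha else sd_b[name]
    return child
```

Because `np.random.default_rng` returns a Generator unchanged, passing a freshly seeded Generator gives the same child as passing its integer seed.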
2.2 layer — layer-group alternation
Parameters are first grouped by the first two name segments (for example, network.0, network.1, network.4, network.5, …) and then merged into logical blocks before alternating parents. In the current network layout, network.0 + network.1 form block 0, network.4 + network.5 form block 1, network.8 forms block 2, and so on. Even-numbered blocks come from parent A and odd-numbered blocks from parent B. Because each Linear layer's weight and bias share a block with the LayerNorm parameters that follow it, all of them come from the same parent, which avoids inconsistent feature scaling.
```python
child_sd = crossover_quantized_state_dict(
    state_dict_a, state_dict_b,
    mode="layer",
)
```
Determinism: fully deterministic (no RNG).
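The grouping rule can be approximated as follows. The merge criterion is an assumption chosen to reproduce the layout described above: a name group joins the previous block when its module index directly follows the previous group's index under the same prefix. The module's actual rule may differ, for example for layouts that do not place a LayerNorm after every Linear. `assign_layer_blocks` and `layer_crossover` are hypothetical names.

```python
from typing import Dict, List, Optional, TypeVar

V = TypeVar("V")


def assign_layer_blocks(names: List[str]) -> Dict[str, int]:
    """Map parameter names to logical block indices (hypothetical merge rule)."""
    blocks: Dict[str, int] = {}
    block = -1
    current_group: Optional[str] = None
    prev_prefix: Optional[str] = None
    prev_index: Optional[int] = None
    for name in names:
        parts = name.split(".")
        group = ".".join(parts[:2])  # first two name segments, e.g. "network.0"
        if group != current_group:
            prefix = parts[0]
            index = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None
            # Assumed rule: merge with the previous block when this group is the very
            # next module index under the same prefix (a Linear followed by its LayerNorm).
            follows_previous = (
                index is not None
                and prev_index is not None
                and prefix == prev_prefix
                and index == prev_index + 1
            )
            if not follows_previous:
                block += 1
            current_group, prev_prefix, prev_index = group, prefix, index
        blocks[name] = block
    return blocks


def layer_crossover(sd_a: Dict[str, V], sd_b: Dict[str, V]) -> Dict[str, V]:
    """Even blocks from parent A, odd blocks from parent B; no randomness involved."""
    blocks = assign_layer_blocks(list(sd_a))
    return {name: (sd_a if blocks[name] % 2 == 0 else sd_b)[name] for name in sd_a}
```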
2.3 weighted — parameter-wise averaging
For every aligned tensor compute child = alpha * a + (1 - alpha) * b in float32.
```python
child_sd = crossover_quantized_state_dict(
    state_dict_a, state_dict_b,
    mode="weighted",
    alpha=0.5,  # midpoint blend
)
```
Determinism: fully deterministic (arithmetic only).
Edge cases:
- alpha=1.0 → exact copy of parent A.
- alpha=0.0 → exact copy of parent B.
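The weighted blend reduces to one expression per tensor. This sketch assumes both inputs are already float (or already dequantized). `weighted_crossover` is a hypothetical name, and rejecting alpha outside [0, 1] is this sketch's own choice.

```python
from typing import Dict

import torch


def weighted_crossover(
    sd_a: Dict[str, torch.Tensor],
    sd_b: Dict[str, torch.Tensor],
    alpha: float = 0.5,
) -> Dict[str, torch.Tensor]:
    """child = alpha * a + (1 - alpha) * b for every aligned tensor, in float32."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    child: Dict[str, torch.Tensor] = {}
    for name, a in sd_a.items():
        a32 = a.detach().to(torch.float32)
        b32 = sd_b[name].detach().to(torch.float32)
        # Parameter-wise linear interpolation; alpha is the weight on parent A.
        child[name] = alpha * a32 + (1.0 - alpha) * b32
    return child
```

With alpha=1.0 the term `(1.0 - alpha) * b` is zero, so for finite weights the child equals parent A exactly, which matches the edge case above.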
3. High-level API
initialize_child_from_crossover resolves parents (live nn.Module, .pt paths, or state dicts), infers architecture (or uses ChildArchitectureSpec), instantiates a fresh BaseQNetwork or StudentQNetwork, runs crossover_quantized_state_dict, loads float weights with strict=True, and returns the child in eval() mode.
PTQ checkpoints (dynamic only): If <path>.json looks like dynamic PTQ metadata from PostTrainingQuantizer.save_checkpoint (quantization.mode == "dynamic", dtype == "qint8"), the .pt is loaded with load_quantized_checkpoint and dequantized into a plain float state dict (packed dynamic-quant layers are not fed directly into crossover). That load uses full-model unpickling (weights_only=False); you must pass allow_unsafe_unpickle=True for trusted checkpoints only—otherwise resolution raises with guidance to opt in. Static PTQ sidecars are not treated as auto-loadable on this path.
Optional kwargs:
- auto_load_ptq_checkpoints (default True).
- architecture=ChildArchitectureSpec(...).
- network_class=StudentQNetwork when parents are dicts or paths.
- allow_unsafe_unpickle (default False): must be explicitly set to True for the dynamic PTQ sidecar path above. Also set it to True for trusted non-PTQ full-model pickles when loading with weights_only=True fails. Never pass True for untrusted checkpoints.
```python
from farm.core.decision.training.crossover import (
    ChildArchitectureSpec,
    initialize_child_from_crossover,
)

child = initialize_child_from_crossover(
    parent_a,  # nn.Module, path, or state dict
    parent_b,
    strategy="weighted",
    alpha=0.7,
)
out = child(state_batch)  # state_batch: float tensor of states, e.g. shape (batch, input_dim)
```
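Parent resolution with the weights-only-first policy can be sketched as follows. The sketch covers only three of the accepted input types: live module, path, and state dict. It does not model the PTQ sidecar (.json) detection or architecture inference described above. `resolve_parent_state_dict` is a hypothetical name, and the fallback order is an assumption based on the allow_unsafe_unpickle description.

```python
import pickle
from pathlib import Path
from typing import Dict, Union

import torch
import torch.nn as nn

StateDict = Dict[str, torch.Tensor]
ParentLike = Union[nn.Module, str, Path, StateDict]


def resolve_parent_state_dict(parent: ParentLike, allow_unsafe_unpickle: bool = False) -> StateDict:
    """Hypothetical resolver: module, checkpoint path, or state dict -> state dict."""
    if isinstance(parent, nn.Module):
        # Live module: copy its current parameters and buffers.
        return {k: v.detach().clone() for k, v in parent.state_dict().items()}
    if isinstance(parent, (str, Path)):
        try:
            # Safe path: only tensors and plain containers may be unpickled.
            obj = torch.load(parent, map_location="cpu", weights_only=True)
        except pickle.UnpicklingError:
            if not allow_unsafe_unpickle:
                raise RuntimeError(
                    f"{parent} is not a weights-only checkpoint; pass "
                    "allow_unsafe_unpickle=True only if you trust its source."
                ) from None
            # Full unpickling can execute arbitrary code: use for trusted files only.
            obj = torch.load(parent, map_location="cpu", weights_only=False)
        if isinstance(obj, nn.Module):
            obj = obj.state_dict()
        return dict(obj)
    # Already a state dict.
    return dict(parent)
```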
4. Experimental Setup
The numbers in Section 5 were produced by:
| Parameter | Value |
|---|---|
| input_dim | 8 |
| hidden_size | 64 |
| output_dim | 4 |
| seed_a | 0 |
| seed_b | 1 |
| state_seed | 42 |
| n_states | 256 |
| n_repeats | 20 |
| alpha | 0.5 |
| Hardware | CPU |
Quality reference: parent A’s Q-values (float32) on the fixed 256-state batch.
Metrics:
| Metric | Definition |
|---|---|
| mean_q_error | Mean absolute difference between child and reference Q-values, averaged across states and actions |
| max_q_error | Maximum absolute difference across all (state, action) pairs |
| action_agreement | Fraction of states where argmax of child Q-values matches argmax of reference Q-values (higher = more similar to parent A) |
| mean_time_ms | Mean wall-clock milliseconds for crossover_quantized_state_dict + load_state_dict, averaged over n_repeats |
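The four metrics can be computed as follows. The sketch assumes Q-values are (n_states, n_actions) tensors and that timing uses `time.perf_counter`. scripts/benchmark_crossover.py may measure differently. `q_metrics` and `mean_time_ms` are hypothetical names.

```python
import time
from typing import Callable, Dict

import torch


def q_metrics(q_child: torch.Tensor, q_ref: torch.Tensor) -> Dict[str, float]:
    """Quality metrics for Q-value batches of shape (n_states, n_actions)."""
    abs_err = (q_child - q_ref).abs()
    agree = (q_child.argmax(dim=1) == q_ref.argmax(dim=1)).float().mean()
    return {
        "mean_q_error": abs_err.mean().item(),  # averaged over states and actions
        "max_q_error": abs_err.max().item(),    # worst (state, action) pair
        "action_agreement": agree.item(),       # fraction of states with the same argmax
    }


def mean_time_ms(fn: Callable[[], object], n_repeats: int = 20) -> float:
    """Mean wall-clock milliseconds of fn() over n_repeats calls."""
    elapsed = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        fn()
        elapsed.append((time.perf_counter() - start) * 1000.0)
    return sum(elapsed) / len(elapsed)
```

In the benchmark setting, `q_ref` would be parent A's Q-values on the fixed 256-state batch. The timed callable would run crossover_quantized_state_dict followed by load_state_dict on the child.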
To regenerate:
```bash
# From the repository root
source venv/bin/activate
python scripts/benchmark_crossover.py --n-repeats 20 --output-csv reports/crossover_bench.csv

# Or via pytest (the benchmarks carry both the ml and slow markers)
pytest tests/decision/test_crossover_performance.py -m "ml and slow" -v -s
```
5. Results
Note: The numbers below are reference values; re-run scripts/benchmark_crossover.py on your machine.
Latest recorded run: 2026-04-08, python scripts/benchmark_crossover.py --n-repeats 20 from the repository root (Linux, development CPU).
| Strategy | Alpha | Time (ms) | Mean Q Err | Max Q Err | Act. Agree |
|---|---|---|---|---|---|
| random | N/A | 0.204 | 0.8350 | 3.2098 | 0.383 |
| layer | N/A | 0.152 | 0.8350 | 3.2098 | 0.383 |
| weighted | 0.5 | 0.324 | 0.6045 | 2.7700 | 0.461 |
Wall-clock is mean milliseconds for crossover_quantized_state_dict + load_state_dict over 20 repeats. Quality columns are vs parent A on 256 synthetic states (state_seed=42).
Interpreting random vs layer here: For the default benchmark seeds, random and layer produced identical child-vs-A metrics (mean/max Q error and action agreement), while their times differed. That can happen for this small untrained net with a fixed RNG, because the per-tensor draws may happen to coincide with the layer block assignment. It is not a general guarantee: change seeds or use trained parents (distinct parameters in every block, including LayerNorm) and the two strategies usually diverge. weighted at α=0.5 stayed a clear interpolation in this run, with lower Q error and higher agreement with A than random/layer.
Older artifact (historical): An earlier doc revision showed layer with zero Q-error vs A; that was tied to untrained LayerNorm defaults (weight=1, bias=0) being identical across parents so some block swaps barely changed outputs. Trained checkpoints avoid that pitfall; always re-run the benchmark after changing parents or seeds.
6. Tradeoff Interpretation
random — maximises offspring diversity: each parameter tensor is independently drawn from either parent, so children can explore a wide range of policy combinations. The downside is high variance: the child’s quality relative to both parents is unpredictable. Good for population-based search where diversity is the objective.
layer — preserves structural coherence: all parameters within a layer block (weight, bias, LayerNorm scale/shift) always come from the same parent. This avoids the representational inconsistency of mixing, say, a weight from parent A's distribution with a LayerNorm learned for parent B's activations. Tradeoff: because the assignment is deterministic, there are only two possible children per parent pair (one for each parent ordering), so it produces less diversity than random.
weighted — provides smooth interpolation: at alpha=0.5 the child sits at the arithmetic midpoint of the two parents in weight space. This can smooth sharp features and reduce maximum Q-error relative to both parents, but it may also blur distinctive policy structure from either parent. Good for model ensembling or as a warm start for further training. Because the blend is linear, the child's weight-space distance from parent A is (1 - alpha) times the parent-to-parent distance, so similarity to the nearer parent tends to fall off gradually, not abruptly, as alpha moves away from 0 or 1.
7. Test Coverage
| Test file | Markers | Content |
|---|---|---|
| tests/decision/test_crossover.py | (default) | Correctness / regression: all three modes on synthetic fixtures, edge cases (alpha=0/1), quantized inputs, round-trip forward pass, crossover_checkpoints, initialize_child_from_crossover |
| tests/decision/test_crossover_performance.py | ml (+ slow for benchmarks) | Fast smokes tagged ml (default run); wall-clock + quality benchmarks tagged ml and slow (pytest -m "ml and slow"); diversity + strategy summary |
8. Post-Crossover Fine-tuning and QAT
After crossover produces a float32 child, the child is often fine-tuned against a frozen reference model (one of the parents, or the distilled teacher) to recover performance. See farm/core/decision/training/finetune.py and scripts/finetune_child.py.
When to use float fine-tuning vs QAT fine-tuning
| Scenario | Recommended mode |
|---|---|
| Child will be deployed in float32 | quantization_applied="none" (default) |
| Child will be converted to int8 via quantize_dynamic | quantization_applied="ptq_dynamic" |
| Parents were statically quantized before crossover | quantization_applied="ptq_static" |
| Parents were QAT-trained float checkpoints (pre-convert) | quantization_applied="qat_float" |
Why this matters: Crossover produces float32 children even when the parents were int8. If the child is fine-tuned in full float32 and then converted to int8 for deployment, the optimiser only ever saw exact float weights, never the quantization error the deployed model will face. QAT-aware fine-tuning inserts WeightOnlyFakeQuantLinear (STE, weight-only, same scope as QATTrainer) so the loss is minimised under the same approximation error as int8 inference.
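The straight-through estimator (STE) idea can be shown with a minimal weight-only fake-quant layer. This is not the project's WeightOnlyFakeQuantLinear. The class name `FakeQuantLinearSketch` is hypothetical, and the symmetric per-tensor int8 scale is an assumption; the real layer may use per-channel scales or observers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FakeQuantLinearSketch(nn.Linear):
    """Hypothetical weight-only fake-quant Linear: symmetric per-tensor int8 with STE."""

    def fake_quant_weight(self) -> torch.Tensor:
        w = self.weight
        # Symmetric per-tensor scale so that max |w| maps to the int8 code 127.
        scale = w.detach().abs().max().clamp_min(1e-8) / 127.0
        w_q = torch.clamp(torch.round(w / scale), -128, 127) * scale
        # STE: the forward pass sees w_q; the backward pass treats rounding as identity.
        return w + (w_q - w).detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only the weight carries quantization noise; activations and bias stay float.
        return F.linear(x, self.fake_quant_weight(), self.bias)
```

Because the rounding error is detached, gradients reach the float master weights unchanged. This is what lets the optimiser adapt them to the int8 grid.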
Recipe for QAT-aware fine-tuning
```python
from farm.core.decision.training.crossover import crossover_quantized_state_dict
from farm.core.decision.training.finetune import FineTuner, FineTuningConfig

# Assumed to exist already: sd_a / sd_b (parent state dicts), parent_a (frozen
# reference model), child (a fresh network with the parents' architecture),
# and states (the state batch used for fine-tuning).

# 1. Crossover (parents may have been quantized; the child is float32)
child_sd = crossover_quantized_state_dict(sd_a, sd_b, mode="weighted")
child.load_state_dict(child_sd)

# 2. QAT fine-tune (fake-quant weights during training)
cfg = FineTuningConfig(
    quantization_applied="ptq_dynamic",  # or "ptq_static" / "qat_float"
    epochs=10,
    learning_rate=1e-4,
)
tuner = FineTuner(reference=parent_a, child=child, config=cfg)
metrics = tuner.finetune(states, checkpoint_path="child_qat.pt")

# 3. Convert and save as int8 (PTQ-compatible format)
q_model = tuner.convert()
tuner.save_quantized(q_model, "child_qat_int8.pt")
```
From the CLI:
```bash
python scripts/finetune_child.py \
  --parent-a-ckpt checkpoints/parent_a.pt \
  --parent-b-ckpt checkpoints/parent_b.pt \
  --quantization-applied ptq_dynamic \
  --epochs 10 --lr 1e-4
```

The script automatically calls convert() + save_quantized() and writes child_finetuned_qat_int8.pt (and a companion JSON) alongside the float QAT checkpoint.
Validating the int8 output
Use the existing compare_outputs helper or scripts/validate_quantized.py
to check fidelity between the float QAT child and the converted int8 model:
```python
from farm.core.decision.training.quantize_ptq import compare_outputs

result = compare_outputs(tuner._active_child, q_model, states)
```
See also farm/core/decision/training/quantize_ptq.py (QuantizedValidator) for
full JSON reports with fidelity / latency / size sections.
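For a quick check without the project helpers, PyTorch's own `torch.ao.quantization.quantize_dynamic` can produce an int8 copy to compare against. This sketch is not compare_outputs or QuantizedValidator. `int8_fidelity` is a hypothetical name, and it reports only action agreement and the maximum absolute Q difference. It assumes a CPU build with a quantized engine available, as standard x86 Linux wheels provide.

```python
from typing import Dict

import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic


def int8_fidelity(float_model: nn.Module, states: torch.Tensor) -> Dict[str, float]:
    """Compare a float Q-network against its dynamically quantized int8 copy."""
    float_model.eval()
    # Dynamic PTQ: Linear weights become int8 and activations are quantized on the fly.
    # inplace defaults to False, so float_model itself is left untouched.
    q_model = quantize_dynamic(float_model, {nn.Linear}, dtype=torch.qint8)
    with torch.no_grad():
        q_float = float_model(states)
        q_int8 = q_model(states)
    agree = (q_float.argmax(dim=1) == q_int8.argmax(dim=1)).float().mean()
    return {
        "action_agreement": agree.item(),
        "max_abs_diff": (q_float - q_int8).abs().max().item(),
    }
```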
Evaluating the child vs both parents (offline)
Use scripts/validate_recombination.py (library: RecombinationEvaluator, RecombinationReport) to score child vs parent A and child vs parent B on a shared state buffer—top-1 / top-k action agreement, KL, MSE, MAE, cosine on Q-logits—plus optional parent A vs parent B (--include-parent-baseline) and oracle agreement in the report summary. Float roles use BaseQNetwork state-dict checkpoints; for quantized full-model .pt files (same format as validate_quantized.py), pass --parent-a-quantized, --parent-b-quantized, and/or --child-quantized. Quantized roles are CPU-only; the JSON report includes model_formats (schema ≥ 1.1).
Related issues
- Parent epic: Dooders/AgentFarm#8 – Distillation, Quantization, and Crossover pipeline.
- Implementation: farm/core/decision/training/finetune.py (FineTuner, FineTuningConfig, QUANTIZATION_APPLIED_MODES).
- QAT building blocks: farm/core/decision/training/quantize_qat.py (WeightOnlyFakeQuantLinear, QATTrainer).
- Child vs parents evaluation: scripts/validate_recombination.py, farm/core/decision/training/recombination_eval.py.
9. Systematic search (crossover × fine-tune)
To sweep many crossover recipes and fine-tune regimes on the same state buffer (leaderboard + manifest + recommendation), use:
- Design / search space: docs/design/crossover_search_space.md (grid definitions, metrics, and the default / minimal / default-qat / minimal-qat presets).
- CLI: scripts/run_crossover_search.py, e.g. --search-space minimal,default-qat (adds a short_qat / ptq_dynamic column), --max-runs, and --workers N for process-parallel children (float BaseQNetwork parents only).
- Library: farm.core.decision.training.crossover_search.run_crossover_search (num_workers, SearchConfig.default_with_qat(), minimal_with_qat()).
- Make: make crossover-search-smoke (two children, synthetic states).
Strategy semantics for each mode remain as in §2–§4 above; the search layer only combines those modes with named fine-tune regimes.
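Combining crossover modes with fine-tune regimes amounts to taking a cross product. The grid builder below is hypothetical, as are its values: the alphas and the "none" regime are placeholders, not the presets defined in docs/design/crossover_search_space.md. The only structural assumption is that layer mode has no alpha to sweep.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Recipe:
    mode: str              # "random", "layer", or "weighted"
    alpha: Optional[float]
    regime: str            # named fine-tune regime (placeholder names here)


def build_grid(modes: Sequence[str], alphas: Sequence[float], regimes: Sequence[str]) -> List[Recipe]:
    """Cross every crossover recipe with every fine-tune regime."""
    recipes: List[Recipe] = []
    for mode, regime in product(modes, regimes):
        # layer alternation is deterministic and takes no alpha; the other modes sweep it.
        mode_alphas: List[Optional[float]] = [None] if mode == "layer" else list(alphas)
        for alpha in mode_alphas:
            recipes.append(Recipe(mode, alpha, regime))
    return recipes
```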
10. Full comparison matrix: child vs parents, students, and quantized counterparts
This section documents the evaluation methodology required by Dooders/AgentFarm#8 — how to compare a crossover child against all reference checkpoints using a single state buffer and aligned metrics.
Comparison matrix
| Row | Child role | Reference A / B | Purpose |
|---|---|---|---|
| A | float child | float parent A / float parent B | Baseline recombination quality vs originals |
| B | float child | float student A / float student B | Child vs distilled intermediates |
| C | float child | int8 student A / int8 student B | Effect of quantization on comparison |
| D (opt.) | int8 child | float parent A / float parent B | Deployment-aligned parity check |
Additionally, Row C includes per-pair quantized-fidelity reports
(float student vs int8 student) produced by QuantizedValidator, exposing the
latency/size trade-off alongside the agreement drop.
Driver script (recommended)
scripts/run_comparison_matrix.py orchestrates all rows with a single shared
state buffer and writes individual JSON reports plus a Markdown/CSV summary:
```bash
# Full matrix: rows A, B, C, D (--row-d opt-in)
python scripts/run_comparison_matrix.py \
  --parent-a-ckpt checkpoints/crossover/parent_A.pt \
  --parent-b-ckpt checkpoints/crossover/parent_B.pt \
  --student-a-ckpt checkpoints/distillation/student_A.pt \
  --student-b-ckpt checkpoints/distillation/student_B.pt \
  --student-a-int8 checkpoints/quantized/student_A_int8.pt \
  --student-b-int8 checkpoints/quantized/student_B_int8.pt \
  --child-ckpt checkpoints/crossover/child.pt \
  --child-int8 checkpoints/crossover/child_int8.pt \
  --row-d \
  --seed 42 --n-states 1000 \
  --include-parent-baseline \
  --report-dir reports/comparison_matrix
```
Architecture flags default to --input-dim 8 --output-dim 4 --hidden-size 64.
Use --states-file <path.npy> instead of --seed / --n-states to supply
real replay-buffer states.
Minimum viable run (row A only; only the parents and the child are required):

```bash
python scripts/run_comparison_matrix.py \
  --parent-a-ckpt checkpoints/crossover/parent_A.pt \
  --parent-b-ckpt checkpoints/crossover/parent_B.pt \
  --child-ckpt checkpoints/crossover/child.pt \
  --report-dir reports/comparison_matrix
```
Manual per-row commands
If you prefer to run each row individually with the existing specialist scripts,
use the same --seed / --n-states across calls so metrics are comparable:
Row A — child vs float parents
```bash
python scripts/validate_recombination.py \
  --parent-a-ckpt checkpoints/crossover/parent_A.pt \
  --parent-b-ckpt checkpoints/crossover/parent_B.pt \
  --child-ckpt checkpoints/crossover/child.pt \
  --include-parent-baseline \
  --seed 42 --n-states 1000 \
  --report-dir reports/comparison_matrix/row_A
```
Row B — child vs float students (pass students in reference slots)
```bash
python scripts/validate_recombination.py \
  --parent-a-ckpt checkpoints/distillation/student_A.pt \
  --parent-b-ckpt checkpoints/distillation/student_B.pt \
  --child-ckpt checkpoints/crossover/child.pt \
  --seed 42 --n-states 1000 \
  --report-dir reports/comparison_matrix/row_B
```
Note: The JSON schema uses the word “parent” internally; the prose in reports should clarify that reference A = student A, reference B = student B.
Row C — child vs int8 students
```bash
python scripts/validate_recombination.py \
  --parent-a-ckpt checkpoints/quantized/student_A_int8.pt \
  --parent-b-ckpt checkpoints/quantized/student_B_int8.pt \
  --parent-a-quantized \
  --parent-b-quantized \
  --child-ckpt checkpoints/crossover/child.pt \
  --seed 42 --n-states 1000 \
  --report-dir reports/comparison_matrix/row_C
```
Row C / quantized fidelity — float student vs int8 student (pair A):
```bash
python scripts/validate_quantized.py \
  --float-a-ckpt checkpoints/distillation/student_A.pt \
  --quant-a-ckpt checkpoints/quantized/student_A_int8.pt \
  --pair A \
  --seed 42 --n-states 1000 \
  --report-dir reports/comparison_matrix/row_C_fidelity
```
Row D — quantized child vs float parents (optional)
```bash
python scripts/validate_recombination.py \
  --parent-a-ckpt checkpoints/crossover/parent_A.pt \
  --parent-b-ckpt checkpoints/crossover/parent_B.pt \
  --child-ckpt checkpoints/crossover/child_int8.pt \
  --child-quantized \
  --seed 42 --n-states 1000 \
  --report-dir reports/comparison_matrix/row_D
```
Aggregating results
After generating the per-row JSON files, produce a combined Markdown/CSV summary:
```bash
python scripts/summarise_comparison_matrix.py \
  --report-dir reports/comparison_matrix
```
Or point at specific files:
```bash
python scripts/summarise_comparison_matrix.py \
  --files \
  reports/comparison_matrix/row_A_child_vs_parents.json \
  reports/comparison_matrix/row_B_child_vs_students.json \
  reports/comparison_matrix/row_C_child_vs_int8_students.json
```
Metrics aligned across rows
| Metric | Source | Interpretation |
|---|---|---|
| action_agreement (top-1) | RecombinationEvaluator / QuantizedValidator | Fraction of states where child / int8 matches reference |
| top_k_agreements | RecombinationEvaluator | Relaxed agreement (child's top-k includes reference argmax) |
| kl_divergence | Both | KL(ref ‖ child) on Q-logits; lower = more similar |
| mse / mae | RecombinationEvaluator | Q-value error; lower = more similar |
| mean_cosine_similarity | Both | Cosine on Q-logit vectors; higher = more similar |
| oracle_agreement | RecombinationEvaluator | Fraction where child matches ≥ 1 reference |
| latency_ratio | QuantizedValidator | Quant / float inference time; > 1 means slower |
| size_ratio | QuantizedValidator | Quant / float on-disk size |
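The less familiar metrics in the table can be sketched directly on Q-value tensors of shape (n_states, n_actions). The sketch assumes "Q-logits" means raw Q-values passed through a softmax over actions, and that argmax ties go to the lowest index. RecombinationEvaluator may normalise or break ties differently, so treat these hypothetical helpers as an illustration, not a reimplementation.

```python
import torch
import torch.nn.functional as F


def kl_ref_child(q_ref: torch.Tensor, q_child: torch.Tensor) -> float:
    """KL(softmax(ref) || softmax(child)) per state, averaged over states."""
    log_p = F.log_softmax(q_ref, dim=1)
    log_q = F.log_softmax(q_child, dim=1)
    return (log_p.exp() * (log_p - log_q)).sum(dim=1).mean().item()


def mean_cosine(q_ref: torch.Tensor, q_child: torch.Tensor) -> float:
    """Cosine similarity between per-state Q vectors, averaged over states."""
    return F.cosine_similarity(q_ref, q_child, dim=1).mean().item()


def top_k_agreement(q_ref: torch.Tensor, q_child: torch.Tensor, k: int) -> float:
    """Fraction of states whose reference argmax is among the child's top-k actions."""
    ref_best = q_ref.argmax(dim=1, keepdim=True)   # (n_states, 1)
    child_topk = q_child.topk(k, dim=1).indices    # (n_states, k)
    return (child_topk == ref_best).any(dim=1).float().mean().item()


def oracle_agreement(q_child: torch.Tensor, q_ref_a: torch.Tensor, q_ref_b: torch.Tensor) -> float:
    """Fraction of states where the child's argmax matches at least one reference."""
    best = q_child.argmax(dim=1)
    hit = (best == q_ref_a.argmax(dim=1)) | (best == q_ref_b.argmax(dim=1))
    return hit.float().mean().item()
```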
Interpreting results
- Row A ≈ Row B: child tracked parents and students equally well — crossover preserved the distilled policy.
- Row A > Row B (child closer to parents): fine-tuning against a parent as reference pulled the child away from the distilled policy.
- Row C ≈ Row B: quantization of the student reference barely changes the comparison, so quantization noise is small relative to crossover diversity.
- Row C ≪ Row B: quantization dominates the error budget; consider QAT-aware fine-tuning (FineTuner with quantization_applied="ptq_dynamic").
- Row D ≈ Row A: the child survives int8 conversion without significant policy shift vs the float parents.
Gaps
- No online rollout returns — the comparison uses offline (state-batch) metrics only. For return parity on real tasks, wire checkpoints into the same env + feature pipeline used in training.
- Architecture must match across all rows: verify that --input-dim, --output-dim, and --hidden-size are identical for all checkpoints.
11. References
- Implementation: farm/core/decision/training/crossover.py
- Training package exports: farm/core/decision/training/__init__.py
- Benchmark script: scripts/benchmark_crossover.py
- Comparison matrix driver: scripts/run_comparison_matrix.py
- Summary aggregator: scripts/summarise_comparison_matrix.py
- Related validation patterns: scripts/validate_quantized.py, farm/core/decision/training/quantize_ptq.py