Crossover Strategies: Design Note

Related issues: #611 (crossover operators), #612 (child initialization), #613 (this note).

1. Overview

The farm.core.decision.training.crossover module provides three strategies for combining two Q-network state dicts (float or quantized) into a child network. Crossover enables evolutionary or multi-parent search after PTQ / QAT rather than only mixing float weights during training.

All three strategies accept paired state dicts with identical keys and tensor shapes, dequantize any torch.qint8 tensors to float32 before operating, and return a float32 offspring state dict compatible with nn.Module.load_state_dict.
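The dequantization step is the standard per-tensor affine mapping used by torch.qint8: real = scale * (q - zero_point). Below is a minimal pure-Python sketch of that step. QInt8Tensor and the list-based "tensors" are hypothetical stand-ins for torch tensors, and per-channel quantization is deliberately not covered.

```python
from dataclasses import dataclass


@dataclass
class QInt8Tensor:
    """Hypothetical stand-in for a per-tensor affine qint8 tensor."""
    int_repr: list[int]  # raw integer codes in [-128, 127]
    scale: float
    zero_point: int


def dequantize(t: QInt8Tensor) -> list[float]:
    """Per-tensor affine dequantization: real = scale * (q - zero_point)."""
    return [t.scale * (q - t.zero_point) for q in t.int_repr]


def to_float_state_dict(sd: dict) -> dict[str, list[float]]:
    """Dequantize qint8 entries and pass float entries through as floats."""
    return {
        name: dequantize(v) if isinstance(v, QInt8Tensor) else [float(x) for x in v]
        for name, v in sd.items()
    }
```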

2. Strategies

2.1 `random` — per-tensor selection

For each parameter tensor independently, flip a (possibly biased) coin and take the tensor from parent A (probability alpha) or parent B (probability 1 - alpha).

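from farm.core.decision.training.crossover import crossover_quantized_state_dict

child_sd = crossover_quantized_state_dict(
    state_dict_a, state_dict_b,
    mode="random",
    seed=42,          # for reproducibility
    alpha=0.5,        # probability of taking each tensor from parent A (default: 50/50)
)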

Determinism: fully reproducible given seed or a seeded np.random.Generator.

Edge cases:

alpha=1.0 → exact copy of parent A.
alpha=0.0 → exact copy of parent B.
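Here is a minimal sketch of per-tensor random selection. It assumes state dicts are plain dicts of float lists and uses Python's random.Random in place of np.random.Generator, so the same seed will not reproduce the module's actual choices. Only the selection rule is meant to match: one independent draw per tensor, taking parent A with probability alpha.

```python
import random


def random_crossover(sd_a, sd_b, alpha=0.5, seed=None):
    """Per-tensor coin flip: take each tensor from parent A with probability alpha."""
    if sd_a.keys() != sd_b.keys():
        raise ValueError("parents must have identical keys")
    rng = random.Random(seed)  # stand-in for a seeded np.random.Generator
    child = {}
    for name in sd_a:  # one independent draw per parameter tensor
        source = sd_a if rng.random() < alpha else sd_b
        child[name] = list(source[name])
    return child
```

Because rng.random() returns values in [0, 1), alpha=1.0 always selects parent A and alpha=0.0 never does. This matches the edge cases above.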

2.2 `layer` — layer-group alternation

Parameters are first grouped by the first two name segments (for example, network.0, network.1, network.4, network.5, …) and then merged into logical blocks before alternating parents. In the current network layout, this means network.0 + network.1 form block 0, network.4 + network.5 form block 1, network.8 forms block 2, and so on. Even-numbered blocks come from parent A, odd-numbered blocks come from parent B. This keeps each Linear layer’s weight and bias aligned with the following LayerNorm parameters from the same parent, avoiding inconsistent feature scaling.

child_sd = crossover_quantized_state_dict(
    state_dict_a, state_dict_b,
    mode="layer",
)

Determinism: fully deterministic (no RNG).
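The grouping and alternation can be sketched as follows. The merge rule is a hypothetical heuristic: a name group joins the preceding one when its index is exactly one higher and that block holds only one group so far. It was chosen because it reproduces the layout listed above. The module may pair Linear and LayerNorm groups differently.

```python
def _group(name: str) -> str:
    """First two dot-separated segments, e.g. 'network.4.weight' -> 'network.4'."""
    return ".".join(name.split(".")[:2])


def layer_blocks(names) -> dict[str, int]:
    """Map each name group to a block id using a hypothetical merge heuristic."""
    groups = list(dict.fromkeys(_group(n) for n in names))  # keep first-seen order
    block_of, block, prev_idx, can_merge = {}, -1, None, False
    for g in groups:
        idx = int(g.split(".")[1])
        if can_merge and idx == prev_idx + 1:
            block_of[g] = block  # e.g. LayerNorm joins the preceding Linear
            can_merge = False
        else:
            block += 1
            block_of[g] = block
            can_merge = True
        prev_idx = idx
    return block_of


def layer_crossover(sd_a, sd_b):
    """Even-numbered blocks from parent A, odd-numbered blocks from parent B."""
    block_of = layer_blocks(sd_a)
    return {
        name: list((sd_a if block_of[_group(name)] % 2 == 0 else sd_b)[name])
        for name in sd_a
    }
```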

2.3 `weighted` — parameter-wise averaging

For every aligned tensor compute child = alpha * a + (1 - alpha) * b in float32.

child_sd = crossover_quantized_state_dict(
    state_dict_a, state_dict_b,
    mode="weighted",
    alpha=0.5,  # midpoint blend
)

Determinism: fully deterministic (arithmetic only).

Edge cases:

alpha=1.0 → exact copy of parent A.
alpha=0.0 → exact copy of parent B.
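Here is a minimal sketch of the weighted blend. It operates on list-based state dicts, which are hypothetical stand-ins for already-dequantized float32 tensors.

```python
def weighted_crossover(sd_a, sd_b, alpha=0.5):
    """child = alpha * a + (1 - alpha) * b, element-wise, for every aligned tensor."""
    child = {}
    for name, a in sd_a.items():
        b = sd_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        child[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(a, b)]
    return child
```

At alpha=1.0 the (1 - alpha) * b term is exactly zero, so finite weights from parent A come back unchanged. The same holds symmetrically for parent B at alpha=0.0.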

3. High-level API

initialize_child_from_crossover resolves parents (live nn.Module, .pt paths, or state dicts), infers architecture (or uses ChildArchitectureSpec), instantiates a fresh BaseQNetwork or StudentQNetwork, runs crossover_quantized_state_dict, loads float weights with strict=True, and returns the child in eval() mode.

PTQ checkpoints (dynamic only): If <path>.json looks like dynamic PTQ metadata from PostTrainingQuantizer.save_checkpoint (quantization.mode == "dynamic", dtype == "qint8"), the .pt is loaded with load_quantized_checkpoint and dequantized into a plain float state dict (packed dynamic-quant layers are not fed directly into crossover). That load uses full-model unpickling (weights_only=False); you must pass allow_unsafe_unpickle=True for trusted checkpoints only—otherwise resolution raises with guidance to opt in. Static PTQ sidecars are not treated as auto-loadable on this path.

Optional kwargs:

auto_load_ptq_checkpoints (default True): enables the dynamic PTQ sidecar path above.
architecture=ChildArchitectureSpec(...): overrides architecture inference.
network_class=StudentQNetwork: selects the child class when parents are state dicts or paths.
allow_unsafe_unpickle (default False): must be set explicitly to True for the dynamic PTQ sidecar path above. Also set it to True for trusted non-PTQ full-model pickles when weights_only=True loading fails. Never pass True for untrusted checkpoints.

from farm.core.decision.training.crossover import (
    ChildArchitectureSpec,
    initialize_child_from_crossover,
)

child = initialize_child_from_crossover(
    parent_a,          # nn.Module, path, or state dict
    parent_b,
    strategy="weighted",
    alpha=0.7,
)
out = child(state_batch)
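The dynamic-PTQ sidecar rule described above can be pictured as a small dispatch. This sketch is illustrative only, and several details are assumptions:

- the JSON layout, with mode and dtype nested under "quantization";
- the naming rule mapping <stem>.pt to <stem>.json;
- the PermissionError type;
- the choose_loader name.

The real resolver also honours auto_load_ptq_checkpoints, which is omitted here. In this sketch, anything that is not a dynamic qint8 sidecar (including static PTQ metadata) falls through to the ordinary state-dict path.

```python
import json
from pathlib import Path


def is_dynamic_ptq_sidecar(meta: dict) -> bool:
    # Assumed layout: {"quantization": {"mode": "dynamic", "dtype": "qint8", ...}}
    q = meta.get("quantization", {})
    return q.get("mode") == "dynamic" and q.get("dtype") == "qint8"


def choose_loader(pt_path: str, allow_unsafe_unpickle: bool = False) -> str:
    """Return which load path a parent checkpoint would take (illustrative only)."""
    sidecar = Path(pt_path).with_suffix(".json")  # assumed: <stem>.pt -> <stem>.json
    if sidecar.is_file():
        meta = json.loads(sidecar.read_text())
        if is_dynamic_ptq_sidecar(meta):
            if not allow_unsafe_unpickle:
                raise PermissionError(
                    f"{pt_path} is a dynamic PTQ checkpoint; loading it unpickles a "
                    "full model. Pass allow_unsafe_unpickle=True for trusted files only."
                )
            return "dynamic_ptq"  # full-model load, then dequantize to a float state dict
    return "state_dict"  # weights_only load of a plain float state dict
```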

4. Experimental Setup

The numbers in Section 5 were produced by:

Parameter	Value
`input_dim`	8
`hidden_size`	64
`output_dim`	4
`seed_a`	0
`seed_b`	1
`state_seed`	42
`n_states`	256
`n_repeats`	20
`alpha`	0.5
Hardware	CPU

Quality reference: parent A’s Q-values (float32) on the fixed 256-state batch.

Metrics:

Metric	Definition
`mean_q_error`	Mean absolute difference between child and reference Q-values, averaged across states and actions
`max_q_error`	Maximum absolute difference across all (state, action) pairs
`action_agreement`	Fraction of states where `argmax` of child Q-values matches `argmax` of reference Q-values (higher = more similar to parent A)
`mean_time_ms`	Mean wall-clock milliseconds for `crossover_quantized_state_dict` + `load_state_dict`, averaged over `n_repeats`
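The four metrics can be computed as shown below. This assumes Q-values are available as per-state lists of floats and that argmax ties resolve to the first action (the benchmark's tie-breaking is not specified here). In the real benchmark, the fn passed to mean_time_ms would wrap crossover_quantized_state_dict plus load_state_dict.

```python
import time
from statistics import mean


def argmax(row):
    return max(range(len(row)), key=row.__getitem__)  # first index wins on ties


def quality_metrics(child_q, ref_q):
    """child_q, ref_q: lists of per-state Q-value rows with the same shape."""
    diffs = [abs(c - r) for crow, rrow in zip(child_q, ref_q) for c, r in zip(crow, rrow)]
    agree = [argmax(c) == argmax(r) for c, r in zip(child_q, ref_q)]
    return {
        "mean_q_error": mean(diffs),  # averaged over states and actions
        "max_q_error": max(diffs),
        "action_agreement": sum(agree) / len(agree),
    }


def mean_time_ms(fn, n_repeats=20):
    """Mean wall-clock milliseconds of fn() over n_repeats calls."""
    times = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        fn()
        times.append((time.perf_counter() - start) * 1000.0)
    return mean(times)
```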

To regenerate:

# From the repository root
source venv/bin/activate
python scripts/benchmark_crossover.py --n-repeats 20 --output-csv reports/crossover_bench.csv

# Or via pytest (the benchmarks carry both the ml and slow markers)
pytest tests/decision/test_crossover_performance.py -m "ml and slow" -v -s

5. Results

Note: the figures below are for reference only; re-run scripts/benchmark_crossover.py on your own machine before comparing.

Latest recorded run: 2026-04-08, python scripts/benchmark_crossover.py --n-repeats 20 from the repository root (Linux, development CPU).

Strategy	Alpha	Time (ms)	Mean Q Err	Max Q Err	Act. Agree
`random`	N/A	0.204	0.8350	3.2098	0.383
`layer`	N/A	0.152	0.8350	3.2098	0.383
`weighted`	0.5	0.324	0.6045	2.7700	0.461

Wall-clock is the mean milliseconds for crossover_quantized_state_dict + load_state_dict over 20 repeats. Quality columns are measured against parent A on 256 synthetic states (state_seed=42).

Interpreting random vs layer here: for the default benchmark seeds, random and layer produced identical quality numbers (mean/max Q error and action agreement against parent A), while their timings differed. This can happen with a small untrained network and a fixed RNG, because the per-tensor random draws may happen to coincide with the layer block assignment. It is not a general guarantee: with different seeds, or with trained parents (distinct parameters in every block, including LayerNorm), the two strategies usually diverge. weighted at α=0.5 remained a clear interpolation in this run, with lower Q error and higher agreement with A than either random or layer.

Older artifact (historical): An earlier doc revision showed layer with zero Q-error vs A; that was tied to untrained LayerNorm defaults (weight=1, bias=0) being identical across parents so some block swaps barely changed outputs. Trained checkpoints avoid that pitfall; always re-run the benchmark after changing parents or seeds.

6. Tradeoff Interpretation

random — maximises offspring diversity: each parameter tensor is independently drawn from either parent, so children can explore a wide range of policy combinations. The downside is high variance: the child’s quality relative to both parents is unpredictable. Good for population-based search where diversity is the objective.

layer — preserves structural coherence: all parameters within a layer block (the Linear weight and bias plus the following LayerNorm scale/shift) always come from the same parent. This avoids the representational inconsistency of mixing, say, a weight from parent A's distribution with a LayerNorm learned for parent B's activations. Tradeoff: the block assignment is fixed (even blocks from the first parent argument, odd blocks from the second). Each parent pair therefore yields only two possible children, one per argument order, so layer produces far less diversity than random.
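To make the diversity gap concrete, the sketch below counts possible children for the five-group layout from Section 2.2 (ten tensors, a weight and a bias per group). Per-tensor random selection can produce 2^10 = 1024 distinct parent assignments, while layer produces exactly two. Python's random.Random stands in for the module's RNG, and the layout is the documented example rather than a general one.

```python
import random

# Layout from Section 2.2: five name groups, weight + bias each, merged into blocks.
NAMES = [f"network.{i}.{p}" for i in (0, 1, 4, 5, 8) for p in ("weight", "bias")]
BLOCK = {"network.0": 0, "network.1": 0, "network.4": 1, "network.5": 1, "network.8": 2}


def random_pattern(seed, alpha=0.5):
    """Which parent ('A'/'B') each tensor comes from under per-tensor random selection."""
    rng = random.Random(seed)
    return tuple("A" if rng.random() < alpha else "B" for _ in NAMES)


def layer_pattern(first="A", second="B"):
    """Even blocks from the first argument, odd blocks from the second."""
    return tuple(
        first if BLOCK[".".join(n.split(".")[:2])] % 2 == 0 else second for n in NAMES
    )


random_children = {random_pattern(seed) for seed in range(200)}
layer_children = {layer_pattern("A", "B"), layer_pattern("B", "A")}
print(f"possible random children: {2 ** len(NAMES)}")  # 2^10 = 1024
print(f"distinct random children over 200 seeds: {len(random_children)}")
print(f"distinct layer children: {len(layer_children)}")  # 2
```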

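weighted — provides smooth interpolation: at alpha=0.5 the child sits at the arithmetic midpoint of the two parents in weight space. Averaging can smooth sharp features and, as in the Section 5 run, give lower Q-error against a parent than random or layer. It may also blur distinctive policy structure from either parent. Good for model ensembling or as a warm-start for further training. As alpha moves from 1 (or 0) toward 0.5, the child moves continuously away from parent A (or B). Continuity does not guarantee quality, however. Independently trained parents are not necessarily joined by a low-loss straight line in weight space, so intermediate alphas can perform worse than either parent.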
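The interpolation claim follows directly from the blend formula in Section 2.3. Write θ_A and θ_B for the parents' flattened weights and θ_c(α) for the child:

```latex
\begin{aligned}
\theta_c(\alpha) &= \alpha\,\theta_A + (1-\alpha)\,\theta_B \\
\theta_c(\alpha) - \theta_A &= (1-\alpha)\,(\theta_B - \theta_A) \\
\theta_c(\alpha) - \theta_B &= \alpha\,(\theta_A - \theta_B) \\
\Rightarrow\quad \lVert \theta_c - \theta_A \rVert &= (1-\alpha)\,\lVert \theta_A - \theta_B \rVert,
\qquad \lVert \theta_c - \theta_B \rVert = \alpha\,\lVert \theta_A - \theta_B \rVert
\end{aligned}
```

The child's weight-space distance to each parent is therefore linear in alpha. The Q-values Q(s; θ_c(α)) change continuously with alpha but not linearly, because the network is nonlinear in its weights.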

7. Test Coverage

Test file	Markers	Content
`tests/decision/test_crossover.py`	(default)	Correctness / regression: all three modes on synthetic fixtures, edge cases (`alpha=0/1`), quantized inputs, round-trip forward pass, `crossover_checkpoints`, `initialize_child_from_crossover`
`tests/decision/test_crossover_performance.py`	`ml` (+ `slow` for benchmarks)	Fast smokes tagged `ml` (default run); wall-clock + quality benchmarks tagged `ml` and `slow` (`pytest -m "ml and slow"`); diversity + strategy summary

8. Post-Crossover Fine-tuning and QAT

After crossover produces a float32 child, the child is often fine-tuned against a frozen reference model (one of the parents, or the distilled teacher) to recover performance. See farm/core/decision/training/finetune.py and scripts/finetune_child.py.

When to use float fine-tuning vs QAT fine-tuning

Scenario	Recommended mode
Child will be deployed in float32	`quantization_applied="none"` (default)
Child will be converted to int8 via `quantize_dynamic`	`quantization_applied="ptq_dynamic"`
Parents were statically quantized before crossover	`quantization_applied="ptq_static"`
Parents were QAT-trained float checkpoints (pre-convert)	`quantization_applied="qat_float"`

Why this matters: crossover produces float32 children even when the parents were int8. If the child is fine-tuned in full float32 and then converted to int8 for deployment, the optimiser never sees the int8 rounding error, so the weights it settles on may not hold up after conversion. QAT-aware fine-tuning inserts WeightOnlyFakeQuantLinear (straight-through estimator, weight-only, same scope as QATTrainer) so that the loss is minimised under the same approximation error as int8 inference.
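For intuition about what "fake-quant weights" means, here is a forward-pass-only sketch of symmetric per-tensor int8 fake quantization. Whether WeightOnlyFakeQuantLinear uses per-tensor or per-channel scales, and symmetric or affine ranges, is an assumption here. The straight-through backward pass is described in the docstring rather than implemented, since this sketch has no autograd.

```python
def fake_quant_weights(w, num_bits=8):
    """Symmetric per-tensor fake quantization: quantize, clamp, then dequantize.

    Forward pass only. During training, a straight-through estimator (STE) treats
    this whole op as the identity in the backward pass, so gradients reach float w.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(x) for x in w)
    if max_abs == 0.0:
        return list(w), 1.0  # all-zero tensor: nothing to quantize
    scale = max_abs / qmax
    # round -> clamp to the int8 range -> scale back to float
    w_hat = [max(-qmax - 1, min(qmax, round(x / scale))) * scale for x in w]
    return w_hat, scale
```

The returned w_hat carries at most scale / 2 rounding error per weight. That is the error the fine-tuning loss sees, and the error int8 inference will later introduce.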

Recipe for QAT-aware fine-tuning

from farm.core.decision.training.finetune import FineTuningConfig, FineTuner
from farm.core.decision.training.crossover import crossover_quantized_state_dict

# Assumed to be defined already: sd_a / sd_b (parent state dicts, float or qint8),
# parent_a (frozen reference nn.Module), child (a fresh network with the parents'
# architecture), and states (the state batch used for fine-tuning).

# 1. Crossover (parents may have been quantized); the child state dict is float32
child_sd = crossover_quantized_state_dict(sd_a, sd_b, mode="weighted")
child.load_state_dict(child_sd)

# 2. QAT fine-tune (fake-quant weights during training)
cfg = FineTuningConfig(
    quantization_applied="ptq_dynamic",  # or "ptq_static" / "qat_float"
    epochs=10,
    learning_rate=1e-4,
)
tuner = FineTuner(reference=parent_a, child=child, config=cfg)
metrics = tuner.finetune(states, checkpoint_path="child_qat.pt")

# 3. Convert and save as int8 (PTQ-compatible format)
q_model = tuner.convert()
tuner.save_quantized(q_model, "child_qat_int8.pt")

From the CLI:

python scripts/finetune_child.py \
    --parent-a-ckpt checkpoints/parent_a.pt \
    --parent-b-ckpt checkpoints/parent_b.pt \
    --quantization-applied ptq_dynamic \
    --epochs 10 --lr 1e-4

The script automatically calls convert() + save_quantized() and writes child_finetuned_qat_int8.pt (and companion JSON) alongside the float QAT checkpoint.

Validating the int8 output

Use the existing compare_outputs helper or scripts/validate_quantized.py to check fidelity between the float QAT child and the converted int8 model:

from farm.core.decision.training.quantize_ptq import compare_outputs
result = compare_outputs(tuner._active_child, q_model, states)

See also farm/core/decision/training/quantize_ptq.py (QuantizedValidator) for full JSON reports with fidelity / latency / size sections.

Evaluating the child vs both parents (offline)

Use scripts/validate_recombination.py (library: RecombinationEvaluator, RecombinationReport) to score child vs parent A and child vs parent B on a shared state buffer—top-1 / top-k action agreement, KL, MSE, MAE, cosine on Q-logits—plus optional parent A vs parent B (--include-parent-baseline) and oracle agreement in the report summary. Float roles use BaseQNetwork state-dict checkpoints; for quantized full-model .pt files (same format as validate_quantized.py), pass --parent-a-quantized, --parent-b-quantized, and/or --child-quantized. Quantized roles are CPU-only; the JSON report includes model_formats (schema ≥ 1.1).

Parent epic: Dooders/AgentFarm#8 – Distillation, Quantization, and Crossover pipeline.
Implementation: farm/core/decision/training/finetune.py (FineTuner, FineTuningConfig, QUANTIZATION_APPLIED_MODES).
QAT building blocks: farm/core/decision/training/quantize_qat.py (WeightOnlyFakeQuantLinear, QATTrainer).
Child vs parents evaluation: scripts/validate_recombination.py, farm/core/decision/training/recombination_eval.py.

9. Systematic search (crossover × fine-tune)

To sweep many crossover recipes and fine-tune regimes on the same state buffer (leaderboard + manifest + recommendation), use:

Design / search space: docs/design/crossover_search_space.md — grid definitions, metrics, default / minimal / default-qat / minimal-qat presets.
CLI: scripts/run_crossover_search.py — e.g. --search-space minimal, default-qat (adds a short_qat / ptq_dynamic column), --max-runs, --workers N for process-parallel children (float BaseQNetwork parents only).
Library: farm.core.decision.training.crossover_search.run_crossover_search — num_workers, SearchConfig.default_with_qat(), minimal_with_qat().
Make: make crossover-search-smoke (two children, synthetic states).

Strategy semantics for each mode remain as described in §2 (with the high-level API in §3). The search layer only combines those modes with named fine-tune regimes.
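The search space is essentially a Cartesian product of crossover recipes and fine-tune regimes. The sketch below is hypothetical. The three modes come from Section 2, but several details are assumptions: the regime names other than short_qat, the flat dict records, and treating --max-runs as a simple truncation. None of these reflect the presets' real contents or ordering.

```python
from itertools import product

# Hypothetical axes: mode names from Section 2; regimes other than "short_qat"
# are placeholders, not the real preset contents.
CROSSOVER_MODES = [("random", 0.5), ("layer", None), ("weighted", 0.5)]
FINETUNE_REGIMES = ["none", "short_float", "short_qat"]


def enumerate_runs(modes, regimes, max_runs=None):
    """Cartesian product of crossover recipes and fine-tune regimes, optionally capped."""
    runs = [
        {"mode": mode, "alpha": alpha, "regime": regime}
        for (mode, alpha), regime in product(modes, regimes)
    ]
    return runs if max_runs is None else runs[:max_runs]


for run in enumerate_runs(CROSSOVER_MODES, FINETUNE_REGIMES, max_runs=4):
    print(run)
```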

10. Full comparison matrix: child vs parents, students, and quantized counterparts

This section documents the evaluation methodology required by Dooders/AgentFarm#8 — how to compare a crossover child against all reference checkpoints using a single state buffer and aligned metrics.

Comparison matrix

Row	Child role	Reference A / B	Purpose
A	float child	float parent A / float parent B	Baseline recombination quality vs originals
B	float child	float student A / float student B	Child vs distilled intermediates
C	float child	int8 student A / int8 student B	Effect of quantization on comparison
D (opt.)	int8 child	float parent A / float parent B	Deployment-aligned parity check

Additionally, Row C includes per-pair quantized-fidelity reports (float student vs int8 student) produced by QuantizedValidator, exposing the latency/size trade-off alongside the agreement drop.

Driver script (recommended)

scripts/run_comparison_matrix.py orchestrates all rows with a single shared state buffer and writes individual JSON reports plus a Markdown/CSV summary:

# Full matrix — rows A, B, C, D (--row-d opt-in)
python scripts/run_comparison_matrix.py \
    --parent-a-ckpt    checkpoints/crossover/parent_A.pt \
    --parent-b-ckpt    checkpoints/crossover/parent_B.pt \
    --student-a-ckpt   checkpoints/distillation/student_A.pt \
    --student-b-ckpt   checkpoints/distillation/student_B.pt \
    --student-a-int8   checkpoints/quantized/student_A_int8.pt \
    --student-b-int8   checkpoints/quantized/student_B_int8.pt \
    --child-ckpt       checkpoints/crossover/child.pt \
    --child-int8       checkpoints/crossover/child_int8.pt \
    --row-d \
    --seed 42 --n-states 1000 \
    --include-parent-baseline \
    --report-dir       reports/comparison_matrix

Architecture flags default to --input-dim 8 --output-dim 4 --hidden-size 64. Use --states-file <path.npy> instead of --seed / --n-states to supply real replay-buffer states.
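Rows are only comparable when every report scores the same states. The sketch below is a hypothetical illustration of why a fixed --seed / --n-states pair is enough for synthetic buffers. The Gaussian state distribution and the make_state_buffer name are assumptions; the scripts may sample states differently.

```python
import random


def make_state_buffer(seed: int, n_states: int, input_dim: int = 8):
    """Hypothetical synthetic buffer: i.i.d. standard-normal state vectors."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(n_states)]


buf_1 = make_state_buffer(seed=42, n_states=1000)
buf_2 = make_state_buffer(seed=42, n_states=1000)
assert buf_1 == buf_2  # same seed and size -> identical states for every row
```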

Minimum viable run (row A only; requires only the parents and the child):

python scripts/run_comparison_matrix.py \
    --parent-a-ckpt checkpoints/crossover/parent_A.pt \
    --parent-b-ckpt checkpoints/crossover/parent_B.pt \
    --child-ckpt    checkpoints/crossover/child.pt \
    --report-dir    reports/comparison_matrix

Manual per-row commands

If you prefer to run each row individually with the existing specialist scripts, use the same --seed / --n-states across calls so metrics are comparable:

Row A — child vs float parents

python scripts/validate_recombination.py \
    --parent-a-ckpt checkpoints/crossover/parent_A.pt \
    --parent-b-ckpt checkpoints/crossover/parent_B.pt \
    --child-ckpt    checkpoints/crossover/child.pt \
    --include-parent-baseline \
    --seed 42 --n-states 1000 \
    --report-dir    reports/comparison_matrix/row_A

Row B — child vs float students (pass students in reference slots)

python scripts/validate_recombination.py \
    --parent-a-ckpt checkpoints/distillation/student_A.pt \
    --parent-b-ckpt checkpoints/distillation/student_B.pt \
    --child-ckpt    checkpoints/crossover/child.pt \
    --seed 42 --n-states 1000 \
    --report-dir    reports/comparison_matrix/row_B

Note: The JSON schema uses the word “parent” internally; the prose in reports should clarify that reference A = student A, reference B = student B.

Row C — child vs int8 students

python scripts/validate_recombination.py \
    --parent-a-ckpt       checkpoints/quantized/student_A_int8.pt \
    --parent-b-ckpt       checkpoints/quantized/student_B_int8.pt \
    --parent-a-quantized \
    --parent-b-quantized \
    --child-ckpt          checkpoints/crossover/child.pt \
    --seed 42 --n-states 1000 \
    --report-dir          reports/comparison_matrix/row_C

Row C / quantized fidelity — float student vs int8 student (pair A):

python scripts/validate_quantized.py \
    --float-a-ckpt  checkpoints/distillation/student_A.pt \
    --quant-a-ckpt  checkpoints/quantized/student_A_int8.pt \
    --pair A \
    --seed 42 --n-states 1000 \
    --report-dir    reports/comparison_matrix/row_C_fidelity

Row D — quantized child vs float parents (optional)

python scripts/validate_recombination.py \
    --parent-a-ckpt  checkpoints/crossover/parent_A.pt \
    --parent-b-ckpt  checkpoints/crossover/parent_B.pt \
    --child-ckpt     checkpoints/crossover/child_int8.pt \
    --child-quantized \
    --seed 42 --n-states 1000 \
    --report-dir     reports/comparison_matrix/row_D

Aggregating results

After generating the per-row JSON files, produce a combined Markdown/CSV summary:

python scripts/summarise_comparison_matrix.py \
    --report-dir reports/comparison_matrix

Or point at specific files:

python scripts/summarise_comparison_matrix.py \
    --files \
        reports/comparison_matrix/row_A_child_vs_parents.json \
        reports/comparison_matrix/row_B_child_vs_students.json \
        reports/comparison_matrix/row_C_child_vs_int8_students.json

Metrics aligned across rows

Metric	Source	Interpretation
`action_agreement` (top-1)	`RecombinationEvaluator` / `QuantizedValidator`	Fraction of states where child / int8 matches reference
`top_k_agreements`	`RecombinationEvaluator`	Relaxed agreement (child’s top-k includes reference argmax)
`kl_divergence`	Both	KL(ref ‖ child) on Q-logits; lower = more similar
`mse` / `mae`	`RecombinationEvaluator`	Q-value error; lower = more similar
`mean_cosine_similarity`	Both	Cosine on Q-logit vectors; higher = more similar
`oracle_agreement`	`RecombinationEvaluator`	Fraction of states where the child's top-1 action matches the top-1 action of at least one reference
`latency_ratio`	`QuantizedValidator`	Quant / float inference time; > 1 means the quantized model is slower
`size_ratio`	`QuantizedValidator`	Quant / float on-disk size
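The agreement-style metrics in the table can be sketched as follows. The sketch assumes that:

- KL is taken between softmax distributions of the raw Q-logits (temperature 1);
- argmax ties resolve to the first action;
- cosine inputs are non-zero vectors.

RecombinationEvaluator's exact conventions may differ.

```python
import math


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(row):
    return max(range(len(row)), key=row.__getitem__)  # first index wins on ties


def kl_ref_child(ref_row, child_row):
    """KL(ref || child) between softmax distributions over Q-logits."""
    p, q = softmax(ref_row), softmax(child_row)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_k_agreement(child_q, ref_q, k):
    """Fraction of states where the reference argmax is among the child's top-k actions."""
    hits = 0
    for c, r in zip(child_q, ref_q):
        top_k = sorted(range(len(c)), key=lambda i: -c[i])[:k]
        hits += argmax(r) in top_k
    return hits / len(child_q)


def oracle_agreement(child_q, ref_a_q, ref_b_q):
    """Fraction of states where the child's argmax matches at least one reference."""
    hits = sum(
        argmax(c) in (argmax(a), argmax(b)) for c, a, b in zip(child_q, ref_a_q, ref_b_q)
    )
    return hits / len(child_q)
```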

Interpreting results

Row A ≈ Row B: child tracked parents and students equally well — crossover preserved the distilled policy.
Row A > Row B (child closer to parents): the child has drifted away from the distilled policy. If it was fine-tuned against a parent as reference, that fine-tuning is a likely cause.
Row C ≈ Row B: quantization of the student reference barely changes the comparison, so quantization noise is small relative to crossover diversity.
Row C ≪ Row B (agreement much lower against the int8 students): quantization dominates the error budget; consider QAT-aware fine-tuning (FineTuner with quantization_applied="ptq_dynamic").
Row D ≈ Row A: the child survives int8 conversion without significant policy shift vs the float parents.

Gaps

No online rollout returns — the comparison uses offline (state-batch) metrics only. For return parity on real tasks, wire checkpoints into the same env + feature pipeline used in training.
Architecture must match across all rows — verify --input-dim, --output-dim, --hidden-size are identical for all checkpoints.

11. References

Implementation: farm/core/decision/training/crossover.py
Training package exports: farm/core/decision/training/__init__.py
Benchmark script: scripts/benchmark_crossover.py
Comparison matrix driver: scripts/run_comparison_matrix.py
Summary aggregator: scripts/summarise_comparison_matrix.py
Related validation patterns: scripts/validate_quantized.py, farm/core/decision/training/quantize_ptq.py