Neural recombination (distillation, PTQ, QAT)
After distilling student Q-networks, you can apply 8-bit post-training quantization (PTQ) and, if the quantized students lose too much accuracy (action agreement or Q-error), quantization-aware training (QAT). Implementation details and PyTorch version notes live in farm/core/decision/training/quantize_ptq.py and farm/core/decision/training/quantize_qat.py.
For the full step-by-step pipeline, see the Neural Recombination Runbook.
Typical flow
- Distill float students (student_A.pt / student_B.pt):
    python scripts/run_distillation.py --help
- PTQ (default: dynamic weight-only qint8; static mode needs calibration states):
    python scripts/quantize_distilled.py \
      --checkpoint-dir checkpoints/distillation \
      --output-dir checkpoints/quantized
  For static PTQ, use --states-file or synthetic --n-states / --seed; calibration volume uses --calibration-batches and --calibration-batch-size (defaults 10 / 64). Match distillation architecture with --input-dim, --output-dim, --parent-hidden (defaults 8, 4, 64).
- Validate float students (optional):
    python scripts/validate_distillation.py --help
- Validate quantized vs float (CPU):
    python scripts/validate_quantized.py --help
  The validator loads quantized checkpoints as full-model pickles, so pass --allow-unsafe-unpickle only for trusted artifacts. The JSON report includes median/mean/p95 single-sample latency, optional throughput (--throughput-batch-size), memory RSS snapshots, float-quant MSE/KL/top-k agreement, and optional teacher metrics if parent_*.pt is found under --float-dir / --teacher-dir or via --teacher-*-ckpt.
- Evaluate a crossover child vs both parents (offline Q metrics, versioned JSON):
    python scripts/validate_recombination.py --help
  Baselines: child vs parent A, child vs parent B, optional parent A vs parent B (--include-parent-baseline), plus oracle agreement in the report summary. Use the same --states-file / --seed / --n-states pattern as other validation scripts. For quantized full-model checkpoints (PTQ or post-QAT torch.save exports), add --parent-a-quantized, --parent-b-quantized, and/or --child-quantized together with --allow-unsafe-unpickle; those roles are loaded with load_quantized_checkpoint and run on CPU.
- Search many crossover + fine-tune combinations (leaderboard + manifest):
    python scripts/run_crossover_search.py --help
  Presets include minimal / default, plus minimal-qat / default-qat (adds a short_qat / ptq_dynamic regime). Use --workers N for process-parallel children (float BaseQNetwork parents only). Quick check: make crossover-search-smoke. Design notes: crossover search space; strategy semantics: crossover strategies.
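To make the float-vs-quantized numbers in the validation report easier to interpret, here is a minimal pure-Python sketch of how MSE, KL, and top-k agreement could be computed from two sets of Q-values. It is not the code in scripts/validate_quantized.py. The function name compare_q_values, the softmax-based KL, and the definition of top-k agreement (the float model's greedy action appears among the quantized model's top k actions) are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def softmax(q: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of Q-values."""
    m = max(q)
    exps = [math.exp(v - m) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def compare_q_values(
    float_q: Sequence[Sequence[float]],
    quant_q: Sequence[Sequence[float]],
    k: int = 1,
    eps: float = 1e-12,
) -> Dict[str, float]:
    """Compare per-state Q-values from a float model and a quantized model.

    Hypothetical helper: the real validator may define these metrics differently.
    """
    n_states = len(float_q)
    sq_err, n_values, kl_sum, agree = 0.0, 0, 0.0, 0
    for fq, qq in zip(float_q, quant_q):
        # Mean squared error is accumulated over every Q-value.
        sq_err += sum((a - b) ** 2 for a, b in zip(fq, qq))
        n_values += len(fq)
        # KL(float || quant) between softmax policies induced by the Q-values.
        p, r = softmax(fq), softmax(qq)
        kl_sum += sum(pi * math.log((pi + eps) / (ri + eps)) for pi, ri in zip(p, r))
        # Assumed top-k agreement: float greedy action is in quant's top k.
        float_best = max(range(len(fq)), key=fq.__getitem__)
        quant_top_k = sorted(range(len(qq)), key=qq.__getitem__, reverse=True)[:k]
        agree += float_best in quant_top_k
    return {
        "mse": sq_err / n_values,
        "kl": kl_sum / n_states,
        "top_k_agreement": agree / n_states,
    }
```

A low MSE with low top-1 agreement usually means the Q-values are close but near-tied, so small quantization error flips the greedy action; that is the case where QAT is worth trying.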
Crossover from PTQ parent paths (Python)
initialize_child_from_crossover can auto-detect a dynamic PTQ sidecar next to a .pt file (same JSON shape as PostTrainingQuantizer.save_checkpoint) and load via load_quantized_checkpoint. That path uses full-model unpickling (weights_only=False); pass allow_unsafe_unpickle=True only for trusted checkpoints. Static PTQ sidecars are not auto-loaded here—use float state dicts or in-memory modules. Details: crossover strategies.
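The reason for the explicit opt-in is that full-model unpickling can call arbitrary callables named in the file. The sketch below illustrates the gating pattern using the standard-library pickle module as a stand-in for torch.load(..., weights_only=False). The names load_full_model, UnsafeUnpickleError, and DemoPayload are hypothetical and do not come from this repository; the real load_quantized_checkpoint may behave differently.

```python
import pickle


class UnsafeUnpickleError(RuntimeError):
    """Raised when full-object unpickling is requested without explicit opt-in."""


class DemoPayload:
    """Harmless stand-in for a malicious object embedded in a checkpoint."""

    def __reduce__(self):
        # Unpickling calls sorted([3, 1, 2]); a hostile file could name
        # any importable callable here instead of a harmless builtin.
        return (sorted, ([3, 1, 2],))


def load_full_model(blob: bytes, allow_unsafe_unpickle: bool = False):
    """Hypothetical loader: refuse full unpickling unless the caller opts in."""
    if not allow_unsafe_unpickle:
        raise UnsafeUnpickleError(
            "Full-model unpickling can execute code; "
            "pass allow_unsafe_unpickle=True only for trusted checkpoints."
        )
    return pickle.loads(blob)


if __name__ == "__main__":
    blob = pickle.dumps(DemoPayload())
    # The result is whatever the embedded callable returned, not a DemoPayload.
    print(load_full_model(blob, allow_unsafe_unpickle=True))
```

Defaulting the flag to False means an untrusted path fails loudly rather than silently running whatever the file contains.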
Optional QAT
If action agreement or Q-error is still unacceptable after PTQ, run QAT. It applies weight-only fake quantization to the linear layers and, after convert, exports the same int8 format as PTQ.
python scripts/qat_distilled.py \
--checkpoint-dir checkpoints/distillation \
--output-dir checkpoints/qat
Use --teacher-a-ckpt / --student-a-ckpt (and *-b-* for pair B) when paths are not under a single --checkpoint-dir; see python scripts/qat_distilled.py --help for epochs, learning rate, and --no-convert (float QAT checkpoint only). Quantized QAT checkpoints work with scripts/validate_quantized.py like PTQ outputs.
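For readers unfamiliar with the term, the weight-only fake quantization used during QAT can be sketched in plain Python as a quantize-then-dequantize round trip. This is a conceptual illustration, not the code in quantize_qat.py. It assumes symmetric per-tensor int8 scaling, and the actual scheme (per-channel vs per-tensor, observer choice) may differ. All function names are hypothetical.

```python
from typing import List, Sequence

QMIN, QMAX = -128, 127  # signed 8-bit integer range


def int8_scale(weights: Sequence[float]) -> float:
    """Symmetric per-tensor scale mapping max |w| to QMAX (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    return max_abs / QMAX if max_abs > 0 else 1.0


def quantize(weights: Sequence[float], scale: float) -> List[int]:
    """Round each weight to the nearest int8 step and clamp to the valid range."""
    return [min(QMAX, max(QMIN, round(w / scale))) for w in weights]


def fake_quantize(weights: Sequence[float]) -> List[float]:
    """Quantize then immediately dequantize.

    The result is still float, but it only takes values representable in
    int8, so the training forward pass sees the rounding error it will face
    after conversion.
    """
    scale = int8_scale(weights)
    return [q * scale for q in quantize(weights, scale)]


if __name__ == "__main__":
    w = [0.5, -1.27, 0.003, 1.0]
    print(quantize(w, int8_scale(w)))
    print(fake_quantize(w))
```

In a real QAT loop the rounding step is typically paired with a straight-through estimator so gradients still reach the underlying float weights; that part is omitted here.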
Tests
pytest tests/decision/test_ptq.py tests/decision/test_validate_quantized.py tests/decision/test_qat.py tests/decision/test_crossover_search.py