Baldwinian vs Lamarckian: policy warm-start across three resource regimes
Issue #849 asked for a matched A/B path to quantify when offspring policy warm-starting helps and when it destabilizes intrinsic-evolution runs. This post reports the full matrix: 36 simulations in 18 seed-matched pairs, one aggregate comparison, and a clear answer on whether Lamarckian inheritance is worth turning on under the stable-profile regimes we already use elsewhere.
What we tested
Two inheritance modes on the same intrinsic-evolution stack:
- Baldwinian (baseline): offspring inherit the hyperparameter chromosome and start with a fresh decision policy.
- Lamarckian (treatment): same chromosome inheritance, plus a compatible policy warm-start copied from the parent at reproduction.
Everything else is held fixed across arms — see the protocol doc for the full parameter list. Highlights:
| Knob | Value |
|---|---|
| Profiles | conservative, balanced, buffered |
| Seeds | 42, 7, 19, 101, 137, 256 |
| Logged steps | 1000 (200-step warmup) |
| Crossover | off (isolates inheritance mode) |
| Selection pressure | low |
| Speciation | GMM, max_k=4 |
Matrix size: 2 arms × 3 profiles × 6 seeds = 36 runs.
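As a rough illustration of how the matrix pairs up, the sketch below enumerates the runs from the table above. The constant names are hypothetical and are not taken from the runner script; only the arm, profile, and seed values come from this post.

```python
from itertools import product

# Values from the knob table; the variable names are illustrative only.
ARMS = ("baldwinian", "lamarckian")
PROFILES = ("conservative", "balanced", "buffered")
SEEDS = (42, 7, 19, 101, 137, 256)

# One run per (arm, profile, seed) combination.
runs = list(product(ARMS, PROFILES, SEEDS))

# A pair is the same (profile, seed) executed once under each arm.
pairs = {(profile, seed) for _arm, profile, seed in runs}

print(len(runs), "runs in", len(pairs), "matched pairs")
```

Each paired delta reported later is computed within one of these (profile, seed) pairs, so every profile contributes six deltas.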
Runner: scripts/run_inheritance_mode_ab.py.
Comparator: scripts/compare_inheritance_arms.py.
Outputs: experiments/inheritance_ab/ (manifest, per-arm sweeps, aggregate summary and plots).
Headline result
No robust effect in any profile. Under the protocol’s acceptance gate (paired 95% CI excludes zero and sign agreement ≥ 75%), Lamarckian warm-starting does not earn a regime-wide recommendation — not as a win, not as a stability loss, not as a speciation-collapse risk.
| Profile | Verdict |
|---|---|
| conservative | no robust effect |
| balanced | no robust effect |
| buffered | no robust effect |
That is not the same as “nothing happened.” Warm-start executed at scale and paired runs diverged on population counts. The treatment just did not clear a strict small-sample bar on the ecological readouts we score for recommendations.
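The acceptance gate described above can be written as a small predicate. This is a minimal sketch, assuming the gate is exactly the two conditions stated in this post; the function name and signature are hypothetical and may not match what compare_inheritance_arms.py does internally.

```python
def is_robust(ci_low: float, ci_high: float, sign_agreement: float,
              min_agreement: float = 0.75) -> bool:
    """Return True only if the paired 95% CI excludes zero AND enough seeds agree in sign."""
    ci_excludes_zero = ci_low > 0 or ci_high < 0
    return ci_excludes_zero and sign_agreement >= min_agreement


# Balanced, final population (values from the results tables in this post):
# sign agreement is high enough, but the CI still straddles zero.
balanced_final = is_robust(-2.1, 25.8, 5 / 6)

# Conservative, final population: fails both conditions.
conservative_final = is_robust(-21.5, 9.2, 4 / 6)

print(balanced_final, conservative_final)
```

Both conditions must hold at once, which is why a profile with five of six seeds agreeing can still land on "no robust effect".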
Mechanism coverage
Lamarckian warm-start was active throughout the Lamarckian arm:
| Profile | Mean success rate | 95% CI | Applied | Skipped |
|---|---|---|---|---|
| conservative | 0.852 | [0.838, 0.867] | 3118 | 538 |
| balanced | 0.849 | [0.836, 0.862] | 3837 | 681 |
| buffered | 0.849 | [0.835, 0.863] | 3773 | 671 |
Every skip was incompatible_state: parent and child policy shapes did not match at reproduction time, so those offspring fell back to a cold start. decide_action_failures were zero in both arms across all 36 runs.
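To make the applied/skipped accounting concrete, here is a hedged sketch of a shape-checked warm-start with a cold-start fallback. The policy representation (a dict of named weight lists), the function names, and the counter keys are all assumptions made for illustration; the real apply_lamarckian_policy_warmstart may look quite different. The pooled rates at the end use the Applied and Skipped counts from the table above.

```python
from collections import Counter


def warm_start(parent_policy: dict, child_policy: dict, metrics: Counter) -> bool:
    """Copy parent weights into the child if layouts match; otherwise leave the child cold."""
    compatible = parent_policy.keys() == child_policy.keys() and all(
        len(parent_policy[name]) == len(child_policy[name]) for name in parent_policy
    )
    if not compatible:
        metrics["skipped_incompatible_state"] += 1  # child keeps its fresh policy
        return False
    for name, weights in parent_policy.items():
        child_policy[name] = list(weights)  # copy so parent and child do not share state
    metrics["applied"] += 1
    return True


def success_rate(applied: int, skipped: int) -> float:
    """Share of reproduction events where the warm-start was applied."""
    total = applied + skipped
    return applied / total if total else 0.0


metrics = Counter()
parent = {"w": [0.1, 0.2, 0.3]}
child_ok = {"w": [0.0, 0.0, 0.0]}
child_bad = {"w": [0.0, 0.0]}  # different shape, so the warm-start is skipped
warm_start(parent, child_ok, metrics)
warm_start(parent, child_bad, metrics)

# Pooled rates from the (Applied, Skipped) columns of the coverage table.
counts = {"conservative": (3118, 538), "balanced": (3837, 681), "buffered": (3773, 671)}
pooled = {profile: success_rate(a, s) for profile, (a, s) in counts.items()}
```

The pooled rates come out close to the reported means but need not match them exactly, since pooling all events is not the same as averaging per-run rates.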
Wall-clock: ~5.5 h per arm (~19.7k s Baldwinian, ~20.2k s Lamarckian). All 36 runs completed without error.
Paired deltas (Lamarckian − Baldwinian)
Performance
Population is the primary performance readout. Effects are seed-noisy and profile-dependent:
Conservative — mildly negative on average, mixed by seed:
| Metric | Mean Δ | 95% CI | Sign agreement |
|---|---|---|---|
| population mean | −8.2 | [−19.5, 3.1] | 67% |
| population final | −6.2 | [−21.5, 9.2] | 67% |
Per-seed final population (Baldwinian → Lamarckian, seeds in ascending order): 74→76 (+2), 64→76 (+12), 63→31 (−32), 67→62 (−5), 75→67 (−8), 67→61 (−6). Seed 42 alone (−32) accounts for most of the negative mean.
Balanced — the strongest directional signal, still not robust:
| Metric | Mean Δ | 95% CI | Sign agreement |
|---|---|---|---|
| population mean | +6.2 | [−0.4, 12.8] | 83% |
| population final | +11.8 | [−2.1, 25.8] | 83% |
Five of six seeds gained population (+6 to +27 agents). Seed 7 lost nine agents, widening the CI enough to include zero.
Per-seed final population: 97→88 (−9), 72→88 (+16), 67→94 (+27), 60→67 (+7), 67→91 (+24), 74→80 (+6).
Buffered — flat:
| Metric | Mean Δ | 95% CI | Sign agreement |
|---|---|---|---|
| population mean | −3.8 | [−16.8, 9.2] | 50% |
| population final | 0.0 | [−9.0, 9.0] | 67% |
Per-seed final population: 99→98 (−1), 88→86 (−2), 105→104 (−1), 93→102 (+9), 74→83 (+9), 95→81 (−14). Swings go in both directions and cancel to a zero mean.
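The per-seed lists above are enough to recompute the table rows by hand. The sketch below assumes a paired Student-t interval and defines sign agreement as the larger share of same-signed deltas. Neither definition is spelled out in this post, but together they appear to reproduce the reported figures; the helper name is hypothetical.

```python
import math
import statistics

# Two-sided 95% Student-t critical value for 5 degrees of freedom (n = 6 pairs).
T_CRIT_DF5 = 2.5706


def paired_summary(baseline, treatment, t_crit=T_CRIT_DF5):
    """Mean paired delta, t-based 95% CI, and sign agreement for matched seeds."""
    deltas = [t - b for b, t in zip(baseline, treatment)]
    n = len(deltas)
    mean = statistics.mean(deltas)
    half_width = t_crit * statistics.stdev(deltas) / math.sqrt(n)
    positives = sum(d > 0 for d in deltas)
    negatives = sum(d < 0 for d in deltas)
    return {
        "mean": mean,
        "ci": (mean - half_width, mean + half_width),
        "sign_agreement": max(positives, negatives) / n,
    }


# Balanced profile, final population, from the per-seed list in this post.
balanced = paired_summary(
    baseline=[97, 72, 67, 60, 67, 74],
    treatment=[88, 88, 94, 67, 91, 80],
)
print(round(balanced["mean"], 1), [round(x, 1) for x in balanced["ci"]])
```

With only six pairs the t critical value is large (about 2.57), so a single dissenting seed is enough to pull the lower bound below zero.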
Stability
Startup death rate was 0.0 in every run for both arms, so the stability-loss path never fired. Oscillation amplitude deltas were small, with wide CIs, in all profiles (conservative +3.3, balanced −2.2, buffered −0.3; none robust).
Diversity
Speciation slope moved slightly positive under conservative (+0.007/100 steps, 83% sign agreement, CI barely excludes zero), but that metric alone does not trigger a recommendation: the classifier treats only a negative slope as collapse risk, so a positive slope cannot raise that flag. Balanced and buffered speciation deltas were essentially flat.
How to read this
Three layers stack on top of each other:
- The mechanism works. ~85% of reproduction events in the Lamarckian arm successfully copied parent policy weights. Arms are not equivalent at the action-selection layer.
- Ecological outcomes are a second-order perturbation. Both arms share the same chromosome inheritance path; decisions combine policy probabilities with chromosome action weights multiplicatively. Inherited weights nudge behavior, but population and speciation are emergent, high-variance summaries, especially with only six paired seeds per profile.
- The verdict gate is conservative. Balanced came closest to a Lamarckian performance win (83% sign agreement, mean +12 final population) but one dissenting seed kept the 95% CI straddling zero. Conservative and buffered never approached a clean call.
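The second point above is easier to see with a toy calculation. This is a minimal sketch, assuming the multiplicative combination is an element-wise product followed by renormalization; the function name and the example numbers are made up for illustration and are not taken from the codebase.

```python
def combine(policy_probs, chromosome_weights):
    """Element-wise product of policy probabilities and chromosome weights, renormalized."""
    if len(policy_probs) != len(chromosome_weights):
        raise ValueError("policy and chromosome must cover the same actions")
    raw = [p * w for p, w in zip(policy_probs, chromosome_weights)]
    total = sum(raw)
    if total <= 0:
        # Degenerate case: fall back to a uniform distribution.
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]


chromosome = [1.0, 2.0, 1.0, 0.5]   # shared by both arms (hypothetical values)
fresh_policy = [0.25, 0.25, 0.25, 0.25]   # Baldwinian offspring: uniform cold start
inherited_policy = [0.4, 0.3, 0.2, 0.1]   # Lamarckian offspring: copied from parent

cold = combine(fresh_policy, chromosome)
warm = combine(inherited_policy, chromosome)
print([round(p, 2) for p in cold])
print([round(p, 2) for p in warm])
```

In this toy case the inherited policy shifts the probabilities, yet the chromosome weights still leave the same action on top in both arms, which is one way an active mechanism can produce only a modest change in aggregate outcomes.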
Practical takeaway for now: keep Baldwinian as the default. Lamarckian warm-start adds ~2% wall-clock overhead and ~15% cold-start fallbacks without a demonstrated regime-wide payoff on the metrics we care about at this scale.
What shipped (harness)
The experiment path that produced these numbers:
- IntrinsicEvolutionPolicy.inheritance_mode: baldwinian or lamarckian
- Reproduction applies warm-start only in Lamarckian mode via apply_lamarckian_policy_warmstart
- Telemetry in run metadata: policy_inheritance_metrics.lamarckian_warmstart_applied/skipped (renamed to mode-neutral warmstart_applied/skipped on 2026-06-17 when the P2–P4 variants landed)
- scripts/run_inheritance_mode_ab.py: orchestrates both arms
- scripts/compare_inheritance_arms.py: paired-seed deltas and verdicts
- scripts/run_stable_profile_seed_sweep.py accepts --inheritance-mode
To reproduce:
PYTHONHASHSEED=0 python scripts/run_inheritance_mode_ab.py \
--output-dir experiments/inheritance_ab \
--disk-database \
--resume

python scripts/compare_inheritance_arms.py \
--baseline-dir experiments/inheritance_ab/baldwinian \
--baseline-label baldwinian \
--treatment-dir experiments/inheritance_ab/lamarckian \
--arm-labels lamarckian \
--output-dir experiments/inheritance_ab/aggregate
Open questions
- Balanced near-miss: with more seeds or a longer horizon, does the +12 mean final-population delta on balanced consolidate into a robust win?
- Conservative seed sensitivity: is the seed-42 collapse (−32 agents) a genuine Lamarckian failure mode or run noise?
- Mechanism-proximal metrics: offspring fitness in the first N steps after birth may show a Lamarckian advantage even when whole-population summaries do not.
Those are follow-ups, not revisions to this aggregate. At n=6 per profile, the honest read is: warm-start runs, sometimes helps individual seeds, and does not yet justify flipping the default inheritance mode.