Replay as Developmental Restructuring: Analysis and Proposals
Date: 2026-05-16
Status: Analysis document — not a registered claim. Proposals feed into experiment design and register maintenance.
Framing
Replay is not memory consolidation that treats the episodic buffer as source material. It is developmental restructuring: a process that selects from stored experience to reshape the space of futures the agent can access.
From this vantage, the critical questions are not:
- Did the agent remember the right things?
- Did loss decrease after sleep?
But rather:
- Did replay expand the option space that is available at the next waking cycle?
- Did replay differentiate the contexts that the agent can act in?
- Did replay integrate moral residue without either amplifying it into paralysis or erasing it into forgetting?
These are different questions, requiring different metrics, different scheduling policies, and different failure signatures at each developmental stage.
Part 1: Current Replay Architecture Map
1.1 Replay Pathways Currently Implemented
| Mechanism | Claim | Phase | Direction | Status | V3 evidence |
|---|---|---|---|---|---|
| Waking quiescent replay | MECH-092 | Waking rest | Hippocampus-internal | Implemented (V3 prereq) | EXQ-136 FAIL (pair ablation); see §1.3 |
| SWS-analog schema consolidation | SD-017 | Sleep SWS | Hip → Cortex | Methods validated (EXQ-265 PASS, EXQ-265a PASS); not yet functionally tested | EXQ-242 FAIL (ablation study); EXQ-265a PASS (phase 2 methods) |
| REM-analog causal attribution | SD-017 | Sleep REM | Cortex → Hip (fill) | Methods validated; functional test pending | Same as above |
| Balanced replay (harm + benefit) | MECH-203 | Any | Mixed | Implemented and validated | EXQ-256 PASS |
| Surprise-gated replay | MECH-205 | Any | RPE-biased | Implemented after iterations | EXQ-258b PASS (after EXQ-258/258a FAIL) |
| Reverse replay diversity | MECH-165 | Sleep NREM | Reverse + non-dominant | Conflicting results | EXQ-244 FAIL (3/3), EXQ-244a PASS (0/1) |
| Sleep aggregation cluster | MECH-272/273/275/285 | Sleep cycle | All | Designed; NOT yet implemented | — |
| Staleness-weighted replay priority | MECH-285 | Sleep | Staleness-proportional | Designed; NOT yet implemented | — |
1.2 What Replay Currently Does (and Does Not Do)
What the architecture guarantees (from hippocampal_systems.md):
- Replay samples alternative traversals over a FIXED residue field — it does not erase or flatten φ(z)
- Replay does not directly change policy
- Replay explores counterfactual paths, not counterfactual values
- Replay is exploratory, not corrective
- Residue integration and curvature updates occur SEPARATELY (offline sleep consolidation)
What the current implementation does NOT yet do:
- Stage-aware replay scheduling (no developmental stage awareness)
- Coverage-maximising replay during infancy (only RPE-priority or random)
- Staleness-weighted seed selection (MECH-285 not implemented)
- Cross-episode Bayesian aggregation (MECH-275 not implemented)
- Self-model correction during sleep (MECH-273 not implemented)
- State-gated routing between anchor and probe channels (MECH-272 not implemented)
1.3 Replay Experiment PASS/FAIL Pattern Summary
FAIL cluster (concerning):
| Experiment | Claim | Outcome | Notes |
|---|---|---|---|
| EXQ-127 | MECH-030 sleep consolidation pair | FAIL | Basic sleep consolidation pairing failed |
| EXQ-136 | MECH-092 quiescent replay pair | FAIL | Quiescent replay pair ablation failed |
| EXQ-242 | SD-017 sleep phase ablation | FAIL | Sleep phase ablation not discriminative |
| EXQ-385 | INV-049 offline consolidation necessity | FAIL | Critical: general law of offline necessity |
| EXQ-385a | INV-049 offline consolidation necessity | FAIL (3/3) | Replication failure — persistent concern |
| EXQ-430 | INV-010 offline integration necessity | FAIL | Second offline necessity invariant failed |
| EXQ-244 | MECH-165 reverse replay diversity | FAIL (3/3) | Diversity was not discriminative |
| EXQ-214 | ARC-039 entorhinal consolidation | FAIL | Entorhinal consolidation not validated |
| EXQ-240/240a | ARC-038 waking consolidation | FAIL, FAIL | Schema assimilation failing consistently |
PASS cluster (stable):
| Experiment | Claim | Outcome | Notes |
|---|---|---|---|
| EXQ-150 | Q-005 sleep anneal | PASS | Sleep annealing does work |
| EXQ-244a | MECH-165 replay diversity validation | PASS | Diversity validation passed (after FAIL iterations) |
| EXQ-256 | MECH-203 balanced replay | PASS | Harm + benefit balance in replay works |
| EXQ-258b | MECH-205 surprise-gated replay | PASS | After iterating through two FAILs |
| EXQ-265 | SD-017 methods validation (phase 1) | PASS | Phase infrastructure works |
| EXQ-265a | SD-017 methods validation (phase 2) | PASS | Phase 2 infrastructure confirmed |
| EXQ-432 | SD-014 replay gate prioritization | PASS | Replay gate prioritization works |
Pattern interpretation: The pair/ablation designs (EXQ-127, EXQ-136, EXQ-242, EXQ-385, EXQ-385a, EXQ-430) are failing consistently while the mechanism-specific probes (EXQ-256, EXQ-258b, EXQ-265) are passing. This suggests the underlying replay mechanisms exist and can be isolated, but the system-level benefit of offline integration is not yet robustly measurable. Most likely reason: the current substrate does not generate enough transition diversity during waking for sleep to show a discriminative advantage. Sleep can only reorganize what waking produced.
Part 2: Literature Evidence Mapping
2.1 Sources from developmental_metrics.md (existing lit-pulls 2026-05-16)
| Source | Finding | REE relevance | DEV-NEED IDs | Claim IDs | Confidence | Classification |
|---|---|---|---|---|---|---|
| Shin et al. (Nature Commun 2025) | Post-task replay biased toward RPE signals; RPE-biased replay predicts subsequent behaviour | Replay quality = RPE-bias selection, not recency or novelty; consolidation gate needs RPE-priority score | DEV-NEED-007 | INV-056, ARC-048 | High | refines existing claim (MECH-205, MECH-285) |
| Joo & Frank (Science 2024) | SWR incidence increases with novelty and harm salience; salient trials over-represented in offline replay | Harm salience is a natural SWR-gating factor; use harm-salience-weighted replay priority | DEV-NEED-007, DEV-NEED-025 | INV-056, MECH-159 | High | refines existing claim (MECH-285) |
2.2 Sources from infant_substrate_expansion.md lit-pulls (2026-05-16)
| Source | Finding | REE relevance | DEV-NEED IDs | Claim IDs | Confidence | Classification |
|---|---|---|---|---|---|---|
| Ventura et al. (2024) CA3 flexibility | Environmental enrichment → greater CA3 spatial tuning and contextual remapping; flat environments → poor representational differentiation | Environment structure during infant exploration determines downstream hippocampal representational flexibility and thus replay utility | DEV-NEED-003, DEV-NEED-007 | ARC-065, ARC-007 | 0.65 | suggests implementation metric (replay_coverage_per_zone) |
| Parker-Holder et al. (2020 NeurIPS DvD) | Apparent diversity (pairwise distance) != effective diversity (volume); population can be spread but degenerate | Current V3 replay diversity metrics may be measuring apparent diversity; need volumetric trajectory coverage | DEV-NEED-005, DEV-NEED-007 | Q-046, ARC-065 | 0.78 | refines existing claim (replay_diversity_index must use volumetric metric) |
| Doupe & Kuhl (1999 Annu Rev Neurosci) | BG-driven motor variability necessary for repertoire formation; pharmacological silencing = monostrategy | If waking exploration is monostrategy, replay has only one trajectory class to replay; replay cannot introduce diversity that was never experienced | DEV-NEED-001, DEV-NEED-005 | ARC-065, MECH-309 | 0.78 | supports existing claim (replay diversity is bounded by waking diversity) |
| Griffin et al. (2026) Brain Sci | Phase 1 (reward-free Hebbian) builds action-perception assemblies; Phase 2 (dopamine) consolidates subset; Phase 1 MUST precede Phase 2 | Two-phase infant design: reward-free exploration phase must precede replay-consolidated goal training | DEV-NEED-001, DEV-NEED-007 | ARC-065, INV-073 | 0.76 | suggests missing claim (Phase 0 reward-free = replay has nothing to collapse on yet) |
2.3 Sources from lit-pull agents (2026-05-16 parallel searches)
2.3a Hippocampal Replay Development, Sleep/Infant, Diversity, Trauma
| Source | Finding | REE relevance | DEV-NEED IDs | Claim IDs | Confidence | Classification |
|---|---|---|---|---|---|---|
| Buhl & Buzsaki (2005 Neuroscience 132:843, PMID 16039793) | SWRs with high-frequency ripple component do not emerge until end of postnatal week 2 (P12-P20 rat); early “proto-SWR” sharp waves lack the ripple component; emergence coincides with GABA developmental switch | Replay-capable consolidation requires a substrate that matures gradually. A cold-start AI system cannot be expected to show generalizing replay before its gating/inhibitory architecture is mature. Grounds DEV-NEED-029 | DEV-NEED-007, DEV-NEED-029 | SD-017, MECH-092, INV-055 | High | supports existing claim; suggests missing claim (replay hardware maturation gate) |
| Noguchi, Matsumoto & Ikegaya (2023 J Neurosci 43:6126, PMID 37400254) | Intracellular Vm dynamics during SWRs reach adult-like precision only by P30 in mice; before this, prolonged depolarizations without the pre/post-SWR inhibitory flanking that enables temporally precise sequence compression | Replay quality (sequence fidelity, compression) matures on a separate timeline from replay occurrence. Early replay may be noisy and poorly compressed — a distinct infant property, not a failure | DEV-NEED-007 | MECH-092, SD-017, INV-055 | High | suggests missing claim (replay sequence fidelity as developmental metric) |
| Seehagen, Konrad, Herbert & Schneider (2015 PNAS 112:1625) | Infants aged 6 and 12 months who napped within 4 hours of declarative learning retained at 4h and 24h delays; non-nappers failed. Single nap >= 30 min sufficient. First experimental causal demonstration in year-1 infants | Offline integration in infancy is causal and time-sensitive. Consolidation benefit degrades rapidly if offline phase delayed past ~4 hours. Constrains when replay should be triggered after experience | DEV-NEED-007 | INV-010, ARC-011, SD-017 | High | supports existing claim; suggests implementation constraint (replay trigger window) |
| Horváth, Plunkett & Csibra (2016 Current Biology 27:R745) | Neonates enter sleep via REM (not NREM); REM/NREM ratio inverted (50-80% REM in newborns vs ~20% adults). Sleep spindles — NREM marker of hippocampal-to-neocortical transfer — absent at birth and emerge in first months | Developmental arc of sleep architecture maps onto consolidation capability. Infant sleep is REM-dominant; the SWS-analog pass may be minimal in infancy and become load-bearing only as spindles emerge | DEV-NEED-007 | SD-017, MECH-092, INV-055 | Medium-High | suggests missing claim (stage-indexed SWS:REM ratio) |
| Stickgold & Walker (2013 Nature Neurosci 16:139, PMID 23354387) | Sleep performs “memory triage”: selectively preserving schema-consistent, emotionally salient, or future-relevant traces while discarding mundane detail. Outcome is compressed generalisation, not faithful storage | Replay should not aim for lossless replication. The selective/triage function is a feature: sleep builds compressed, generalised representations. Validates replay as filter, not recorder | DEV-NEED-007 | MECH-030, SD-017 | High | supports existing claim (sleep is restructuring, not mere consolidation) |
| Werne, Chadwick & Series (2026 PLOS Comput Biol DOI:10.1371/journal.pcbi.1013251) | Computational model: sleep replay inherently increases fear generalisation even when original memory is accurate; multi-context extinction suppresses renewal more than single-context. Sleep-dependent SHY prevents fear sensitization; disruption causes pathological accumulation | Consolidation via replay is not neutral — it systematically broadens associations. Replay must be constrained (hypothesis-tagged, per MECH-094) to prevent harm-valence overgeneralisation. Disrupted offline phase → fear consolidates unopposed | DEV-NEED-007, DEV-NEED-018 | MECH-094, MECH-124, MECH-018 | Medium | supports existing claim (MECH-094 tag necessity); suggests experiment (tag-loss replay vs tagged replay) |
| Schapiro et al. (2017 Phil Trans R Soc B 372:20160049, PMID 27872368) | Hippocampal replay interleaves episodic events to transfer structured statistical regularities to neocortex via CLS. Dense, diverse replay required; biased or sparse replay fails to extract cross-episode generalisation | Diverse replay (covering many past states, not just recent or high-reward) is necessary for learning transferable ethical principles rather than situation-specific rules. Directly grounds play+real interleaved replay proposal | DEV-NEED-007, DEV-NEED-011 | ARC-011, MECH-194, MECH-195, INV-060 | High | supports existing claim; suggests experiment (EXQ-IDEV-002 interleaved replay) |
| Roscow, Chua, Costa, Jones & Lepora (2021 Trends Neurosci 44:816, PMID 34481635) | Biological replay prioritizes novelty, reward-boundary events, and reverse sequences — NOT uniform random sampling. This prioritisation substantially improves generalisation over uniform replay | Replay scheduler should weight harm-boundary events and surprising transitions. Uniform replay is a baseline. Supports MECH-205 (surprise-gated) + MECH-285 (staleness-priority) design | DEV-NEED-007 | MECH-205, MECH-285, MECH-165 | High | refines existing claim (priority design is neurobiologically grounded) |
| Cai, Mednick et al. (2009 PNAS 106:10130, PMID 19506253) | REM sleep (not NREM or quiet wakefulness) specifically improves Remote Associates Test (creative/relational generalisation). REM primes associative networks enabling cross-domain transfer | Different sleep stages contribute qualitatively different types of generalisation. SD-017’s SWS/REM split should produce different generalisation effects — schema formation (SWS) vs relational binding (REM) | DEV-NEED-009, DEV-NEED-011 | SD-017, MECH-030 | High | refines existing claim (SWS/REM functional specialisation); suggests experiment (SWS-only vs REM-only on strategy transfer) |
| Pace-Schott, Seo & Bottary (2022 Neurobiology of Stress 17, PMID 36545012) | Post-extinction REM sleep predicts superior extinction recall. Post-trauma REM fragmentation in acute phase (days-weeks) = strongest PTSD predictor. Hyperarousal disrupts REM-dependent safety-memory consolidation; fear consolidates unopposed | Disrupted offline phases after aversive events fail to consolidate extinction and over-consolidate fear. MECH-094 tag loss = PTSD mechanism is neurobiologically anchored here | DEV-NEED-007, DEV-NEED-018 | MECH-094, MECH-124, ARC-011 | High | supports existing claim (MECH-094 tag loss clinical grounding) |
2.3b Offline RL Diversity, Replay Planning, Developmental Replay, Monostrategy
| Source | Finding | REE relevance | DEV-NEED IDs | Claim IDs | Confidence | Classification |
|---|---|---|---|---|---|---|
| Zha et al. (2024 arXiv:2410.20487) | Diversity-based replay (DBER) outperforms prioritised replay (PER) in sparse-reward environments; diverse trajectory sampling yields better policy coverage across MuJoCo, Atari, and vision navigation | In infant-phase agents with sparse reward signals, diversity-first replay outperforms error-prioritised replay at bootstrapping generalisable behaviour. Grounds coverage-priority infant policy (§6.2) | DEV-NEED-005, DEV-NEED-007 | ARC-065, MECH-285 | High | supports existing claim; grounds stage-indexed scheduler |
| Adaptive Replay Buffer (2024 arXiv:2512.10510) | Large buffers improve diversity but introduce staleness bias as policy drifts; on-policyness weighting bridges offline richness and online freshness | Developmental agent faces same staleness problem; on-policyness metric is a tractable proxy for “how relevant is this memory now” — relevant to MECH-285 staleness mechanism | DEV-NEED-007 | MECH-285 | Medium | refines existing claim (staleness metric design) |
| Shin, Tang & Jadhav (2019 Neuron 104(6), PMID 31677957) | Directional shift in replay: early learning dominated by reverse (retrospective) replay for credit assignment; late learning dominated by forward (prospective) replay for planning. PFC reads out prospective sequences to guide choices | Developmental AI should transition replay use from consolidation-dominant (retrospective) to planning-dominant (prospective) as competence increases. This is a staging signal, not a parameter | DEV-NEED-007, DEV-NEED-008 | ARC-007, MECH-030, SD-017 | High | suggests missing claim (prospective vs retrospective replay scheduling across stages) |
| Ólafsdóttir et al. (2018 PNAS 115(10), PMID 29483254) | Prospective replay is spatially focused near current position and behaviourally predictive; retrospective replay is more diffuse. Two functionally dissociable modes. Disruption of prospective events impairs upcoming choice accuracy | Two distinct replay modes: prospective (planning at decision points) and retrospective (consolidation). Stage differences: infancy = retrospective dominant; adult = prospective dominant | DEV-NEED-007, DEV-NEED-019 | ARC-007, MECH-030, ARC-018 | High | suggests missing claim (dual-mode replay scheduling; prospective mode for planning gate) |
| Foster & Wilson (2006 Nature 440:680); Karlsson & Frank (2009) | Awake forward replay during pauses at decision points represents upcoming trajectories; remote replay of never-directly-visited paths occurs — combinatorial simulation beyond simple memory playback | Replay is generative, not only reproductive: it can compose novel paths. This property is necessary for planning and supports the MECH-092 waking quiescent replay role in prospective simulation | DEV-NEED-007 | MECH-092, ARC-007, ARC-018 | High | supports existing claim (ARC-018 prospective rollout grounding) |
| Noguchi, Matsumoto & Ikegaya (2023 J Neurosci 43:6126, PMID 37400254) | SWR-associated membrane potential dynamics mature by P30 in rodents; before P30, ripples occur but sequences are poorly ordered (insufficient inhibitory flanking) | Stage where replay infrastructure exists but sequence fidelity is low = infant phase. Disordered replay early is expected, not pathological. INV-049 FAILs may partly reflect this: substrate not yet mature | DEV-NEED-007, DEV-NEED-029 | SD-017, MECH-092, INV-049 | High | refines existing claim; suggests why INV-049 FAILs are substrate, not theory failures |
| Peiffer et al. (2020 Sci Rep 10:9979, PMID 32561803) | Children (7-12y) show better sleep-dependent declarative consolidation than adults; 25-35% SWS vs 15-20% adults. Hippocampal-to-prefrontal transfer mechanism more efficient early in development | Peak replay bandwidth (SWA proportion) occurs in middle childhood, not infancy. More offline replay during childhood produces disproportionate consolidation gain — supports childhood’s higher strategy-abstraction role | DEV-NEED-009, DEV-NEED-011 | ARC-011, SD-017, INV-060 | High | supports existing claim; suggests childhood sleep:wake ratio > infancy for strategy abstraction |
| Lu et al. (2019 Nat Commun 10:1779) | Without MEC inputs, CA1 replay becomes rigid/stereotyped (less diverse, same sequences repeatedly). MEC lesion blocks reversal learning. Diversity loss = flexibility loss | Direct mechanistic evidence: replay diversity is upstream of behavioural flexibility. Homogenised replay buffer = monostrategy-locked agent. Grounds MECH-124 prevention as primary gate | DEV-NEED-005, DEV-NEED-007 | MECH-124, MECH-285, MECH-165 | High | supports existing claim; grounds monostrategy_prevention_score as mandatory metric |
| Ego-Stengel & Wilson (2010 Hippocampus 20:1, PMID 19816984) | Disrupting SWRs during rest impairs subsequent spatial learning; disruption destabilises acquired strategy (behavioural noise rather than stable strategy) | SWR-mediated replay necessary for both learning AND stabilising what has been learned. Insufficient replay = noise; biased replay = lock-in. Directly grounds SD-017 necessity | DEV-NEED-007 | MECH-092, SD-017, INV-049 | High | supports existing claim (SD-017 necessity) |
Part 3: Does Replay Currently Preserve, Increase, or Collapse Diversity?
3.1 Architectural Intent vs Implementation Reality
Intended: The architecture specifies replay as “exploratory, not corrective” and guarantees replay does not flatten φ(z). The constraint is correct.
Current implementation risk:
- Collapse toward salience: Without MECH-285 (staleness-weighted priority), replay defaults to RPE-priority or recency. Both select against low-salience, low-RPE trajectories — exactly the content that diversifies the option space.
- Cold-start zero-differential: DEV-NEED-029 / EXQ-573 confirmed that diversity mechanisms produce zero differential on cold-start substrate. During infancy, the ResidueField, EWMA, and E3 score variance are all near-zero — so every replay policy produces the same result.
- Content poverty from infant compression: The six compressions identified in infant_substrate_expansion.md (§2) mean the infant replay buffer contains only termination-adjacent and unremarkable mid-episode sequences. Sleep cannot reorganize content that was never created.
- Context undifferentiation (MECH-153 failure): Without SD-017 properly functional, context cosine_sim → 1.0. Sleep cannot improve context attribution if all contexts look the same.
- Forward-only replay bias: MECH-165 reverse replay is intended to extend the effective reach of replay into non-dominant trajectories. Its record (EXQ-244 FAIL 3/3, then EXQ-244a PASS) suggests it is fragile or substrate-dependent.
3.2 Assessment Per Function
| Function | Current state | Evidence | Assessment |
|---|---|---|---|
| Preserves diversity | Architecture guarantees (no φ-erasure) | Correct in design | Structurally preserved |
| Increases diversity | Not yet demonstrated | EXQ-385/385a FAIL; INV-049 twice failed | Not yet demonstrated; substrate gap |
| Collapses diversity | Risk when salience-priority dominates | MECH-124 failure mode; cold-start problem | Real risk, not yet a confirmed failure |
| Stabilises residue | Shallow; consolidation delta low | DEV-NEED-007 metric not yet measured | Partially working; full cluster not implemented |
| Improves developmental transitions | Not yet testable | No developmental ablation exists | Gap — no experiment tests this |
| Supports infant-to-child progression | Not yet testable | No warm-start gate exists | Gap — DEV-NEED-029 is PROPOSED |
| Supports play-to-real transfer | Predicted by Schapiro 2017 | No experiment; MECH-203 balanced replay passed | Promising architecture; untested |
Part 4: Replay Bottlenecks
4.1 Structural Bottlenecks
B1. Content poverty from infant substrate compression
The infant substrate (CausalGridWorldV2 with default parameters) generates a replay buffer dominated by: (a) termination-adjacent sequences (high salience, low diversity), and (b) unremarkable mid-episode sequences (low salience, high frequency). This is not a replay scheduler failure — the content was never created during waking.
- Root cause: binary harm, homogeneous geography, uniform action consequences
- Fix: graduated harm zones, microhabitat zones, multi-resource heterogeneity (infant_substrate_expansion.md §5)
B2. No developmental stage awareness in replay scheduler
The same replay policy (RPE-priority + recency) runs at every developmental stage. During infancy, when the agent needs COVERAGE, RPE-priority selects against exploration. During adulthood, when the agent needs INTEGRATION, coverage-priority may replay irrelevant early experiences.
- Root cause: no curriculum parameter governing replay scheduling per stage
- Fix: stage-indexed replay scheduler (see §6.1)
B3. Cold-start warm-start failure (DEV-NEED-029)
MECH-313 (noise floor), MECH-314a (novelty bias), and MECH-320 (tonic vigor) all require warm substrate. But the infant stage IS cold-start. The mechanisms designed to prevent monostrategy cannot activate until the substrate is warm, which requires passing through infancy first — a bootstrapping problem.
- Root cause: diversity mechanisms designed for post-infant agent
- Fix: v_t_floor as cold-start proxy for MECH-320; coverage-maximising replay as bootstrap for MECH-285
B4. Replay homogenisation risk (MECH-124)
Replay dominated by high-salience trajectories (high harm, high RPE) amplifies the residue in those regions while neglecting low-salience trajectories that represent viable alternatives. Over many sleep cycles, the option space contracts toward harm-avoidance strategies.
- Root cause: salience/RPE priority without staleness correction or coverage floor
- Fix: MECH-285 staleness-weighted priority; coverage floor constraint on replay scheduler
B5. Context undifferentiation blocking attribution
SD-017 is the minimal infrastructure that makes MECH-092 useful. But SD-017’s functional validation is incomplete (EXQ-242 FAIL, EXQ-265 method-PASS but not functionally tested). Without stable context attractors, replay cannot route experiences to the right context slots, and consolidation cannot produce the differentiated attribution maps that E3 needs.
- Root cause: SD-017 infrastructure implemented as methods but not functionally validated
- Fix: SD-017 functional validation experiments (context discrimination before/after sleep)
B6. No replay coverage floor during infancy
The residue_consolidation_delta metric (DEV-NEED-007) specifies that sleep should produce positive delta (increasing residue coverage). But without a lower bound on coverage fraction of replayed content, a replay scheduler can satisfy other criteria (RPE priority) while never expanding residue geography.
- Root cause: no explicit coverage floor constraint in replay scheduler
- Fix: a replay_coverage_floor (float) parameter specifying the minimum fraction of replayed episodes that must come from distinct zones not recently replayed
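A minimal sketch of how such a floor could be checked after a sleep cycle. It assumes each replayed episode carries a zone label and that "recently replayed" means "replayed in the previous cycle"; the function name and the `recent_zones` argument are hypothetical and not part of the current scheduler.

```python
from typing import Iterable, Set


def coverage_floor_satisfied(
    replayed_zones: Iterable[str],
    recent_zones: Set[str],
    replay_coverage_floor: float,
) -> bool:
    """Return True when enough replayed episodes come from non-recent zones.

    replayed_zones: zone label of each episode replayed this cycle (assumed field).
    recent_zones: zones replayed in the previous cycle (assumed meaning of "recent").
    """
    zones = list(replayed_zones)
    if not zones:
        return False  # nothing was replayed, so the floor cannot be met
    fresh = sum(1 for zone in zones if zone not in recent_zones)
    return fresh / len(zones) >= replay_coverage_floor
```

A scheduler could call this once per cycle and top up the selection from under-replayed zones whenever it returns False.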
4.2 Replay Overfitting
Signature: Replay selects repeatedly from the same high-salience episodes; the same trajectories are consolidated many times while the rest of the buffer is never touched.
Mechanism: RPE-biased replay without staleness correction → same high-RPE episodes are replayed each cycle → residue amplified in those regions → those regions appear even more salient → positive feedback loop.
Detection: replay_diversity_index < 0.15 AND replay_RPE_priority_score > 0.6 simultaneously.
Prevention: MECH-285 staleness decay + coverage floor.
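The feedback loop and its prevention can be illustrated with a toy simulation. Everything numeric here is an assumption made for illustration: the six-episode buffer, the +0.1 salience bump on replay, and the linear staleness bonus are not the MECH-285 specification.

```python
from typing import List


def simulate_buffer_utilisation(cycles: int, k: int, staleness_gain: float) -> float:
    """Toy model: fraction of a six-episode buffer ever replayed after `cycles` cycles."""
    salience: List[float] = [1.0, 0.9, 0.5, 0.4, 0.3, 0.2]  # illustrative RPE salience
    staleness: List[float] = [0.0] * len(salience)           # cycles since last replay
    touched = set()
    for _ in range(cycles):
        # Priority is salience plus an optional bonus for episodes left unreplayed.
        priority = [s + staleness_gain * age for s, age in zip(salience, staleness)]
        ranked = sorted(range(len(salience)), key=lambda i: priority[i], reverse=True)
        chosen = set(ranked[:k])
        for i in range(len(salience)):
            if i in chosen:
                salience[i] += 0.1   # assumed amplification: replay raises apparent salience
                staleness[i] = 0.0
                touched.add(i)
            else:
                staleness[i] += 1.0
    return len(touched) / len(salience)
```

With `staleness_gain=0.0` the same two episodes win every cycle and the rest of the buffer is never touched; with a positive gain the selection rotates through the whole buffer.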
4.3 Replay-Induced Monostrategy
Signature: After many sleep cycles, agent’s trajectory library collapses to one or two dominant strategy classes.
Mechanism: High-RPE replay reinforces harm-avoidance trajectories. Alternative strategies (with lower RPE, lower residue, but genuine option value) are never replayed and gradually decay from effective policy influence.
Detection: traj_volume_estimate declining across sleep cycles; action_class_coverage declining post-sleep.
This is MECH-124 (consolidation-mediated option-space contraction, Walker PTSD analog).
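A sketch of the detection rule, assuming both quantities are logged once per sleep cycle and that "declining" can be read as a negative least-squares slope. Requiring both series to decline is this sketch's choice; the function names are hypothetical.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Ordinary least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def monostrategy_warning(
    traj_volume_per_cycle: Sequence[float],
    action_class_coverage_per_cycle: Sequence[float],
) -> bool:
    """Flag when trajectory volume and action-class coverage both trend downward."""
    return (slope(traj_volume_per_cycle) < 0
            and slope(action_class_coverage_per_cycle) < 0)
```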
4.4 Replay Failure to Integrate Residue
Signature: post_sleep_z_goal_retention < 0.85 AND residue_consolidation_delta ≈ 0.
Mechanism: Replay runs but does not write to residue geometry (correct by design — residue updates are separate). But the separate residue integration step may not be triggered, or may integrate into already-saturated regions, or may fail to assign content to the correct context slot (B5).
Detection: residue_curvature_index flat before and after sleep.
4.5 Replay Failure to Expand Trajectory Diversity
Signature: traj_pairwise_cosine_mean and traj_volume_estimate do not increase across infant stage despite many sleep cycles.
Mechanism: Sleep can only recombine what waking produced (Doupe & Kuhl 1999). If waking generates only one trajectory class, sleep cannot introduce new ones.
Detection: traj_volume_estimate plateau for > 500 episodes at infant stage AND action_class_coverage ≤ 2.
Resolution: The fix is the infant substrate (§4.1/B1), not the replay scheduler.
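The detection rule above could be checked roughly as follows. This is a sketch that assumes traj_volume_estimate is logged once per episode and treats "plateau" as a range no wider than a small tolerance; the `tol` parameter and the function name are hypothetical.

```python
from typing import Sequence


def diversity_expansion_failed(
    volume_by_episode: Sequence[float],
    action_class_coverage: int,
    window: int = 500,
    tol: float = 1e-3,
) -> bool:
    """True when volume has been flat for more than `window` episodes and coverage <= 2."""
    if len(volume_by_episode) <= window:
        return False  # not enough history to call a plateau
    recent = volume_by_episode[-(window + 1):]
    plateau = max(recent) - min(recent) <= tol
    return plateau and action_class_coverage <= 2
```

A True result points at the waking substrate rather than the scheduler, in line with the resolution stated above.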
Part 5: Replay Metrics Proposals
5.1 Replay Diversity Metrics
| Metric | Formula | Stage | Threshold | Readiness |
|---|---|---|---|---|
| replay_diversity_index | Fraction of replayed episodes from distinct zones vs dominant zone | Infant | > 0.4 | TelemetryRequired |
| replay_volumetric_coverage | log-det of trajectory kernel matrix over replayed episodes (DvD-style; Parker-Holder 2020) | All | Increasing trend | TelemetryRequired |
| replay_zone_coverage_fraction | Fraction of defined zones represented in last sleep cycle’s replay | Infant | > 0.6 | TelemetryRequired |
| replay_context_class_count | Number of distinct context attractors sampled during last sleep cycle | Childhood+ | > 2 | TelemetryRequired |
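The log-det metric in this table could be computed along these lines. This is a sketch under stated assumptions: each replayed trajectory is already summarised as a fixed-length embedding, the kernel is an RBF kernel, and a small jitter keeps the matrix numerically positive definite. The embedding, `length_scale`, and `jitter` are illustrative choices, not values from the source.

```python
import math
from typing import List, Sequence


def rbf_kernel_matrix(embeddings: Sequence[Sequence[float]],
                      length_scale: float = 1.0) -> List[List[float]]:
    """Pairwise RBF similarities between trajectory embeddings."""
    def kernel(a: Sequence[float], b: Sequence[float]) -> float:
        sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-sq_dist / (2.0 * length_scale ** 2))
    return [[kernel(a, b) for b in embeddings] for a in embeddings]


def log_det_psd(matrix: List[List[float]], jitter: float = 1e-6) -> float:
    """log det(matrix + jitter * I), computed from a Cholesky factor."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(chol[i][m] * chol[j][m] for m in range(j))
            if i == j:
                chol[i][j] = math.sqrt(matrix[i][i] + jitter - partial)
            else:
                chol[i][j] = (matrix[i][j] - partial) / chol[j][j]
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def replay_volumetric_coverage(embeddings: Sequence[Sequence[float]],
                               length_scale: float = 1.0) -> float:
    """Higher values mean the replayed set spans more volume in embedding space."""
    return log_det_psd(rbf_kernel_matrix(embeddings, length_scale))
```

Near-duplicate trajectories drive the kernel matrix toward singularity and the log-det sharply downward, which is the degeneracy this metric is meant to expose.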
5.2 Replay Novelty Metrics
| Metric | Formula | Stage | Threshold | Readiness |
|---|---|---|---|---|
| replay_RPE_priority_score | Fraction of replayed episodes with RPE > mean_RPE | Adult | > 0.6 | TelemetryRequired |
| replay_staleness_score | Mean staleness weight of replayed seeds (from StalenessAccumulator) | All | Increasing over cycles | TelemetryRequired (MECH-285) |
| replay_low_salience_fraction | Fraction of replayed episodes with RPE < 25th percentile | Infant | > 0.3 | TelemetryRequired |
| replay_novel_trajectory_fraction | Fraction of replayed trajectories that differ from top-5 most-replayed | All | > 0.5 | TelemetryRequired |
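Two of these metrics could be computed as below. The sketch assumes the 25th percentile is taken over the RPE values of the whole buffer and that trajectories are identified by hashable ids; both are assumptions, since the table does not fix them.

```python
from collections import Counter
from statistics import quantiles
from typing import Hashable, Sequence


def replay_low_salience_fraction(replayed_rpes: Sequence[float],
                                 buffer_rpes: Sequence[float]) -> float:
    """Fraction of replayed episodes whose RPE is below the buffer's 25th percentile."""
    if not replayed_rpes:
        return 0.0
    q25 = quantiles(buffer_rpes, n=4)[0]  # first quartile cut point
    return sum(1 for r in replayed_rpes if r < q25) / len(replayed_rpes)


def replay_novel_trajectory_fraction(replayed_ids: Sequence[Hashable],
                                     replay_history: Sequence[Hashable],
                                     top_n: int = 5) -> float:
    """Fraction of this cycle's replays that fall outside the top_n most-replayed ids."""
    if not replayed_ids:
        return 0.0
    top = {tid for tid, _ in Counter(replay_history).most_common(top_n)}
    return sum(1 for t in replayed_ids if t not in top) / len(replayed_ids)
```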
5.3 Replay Coverage Metrics
| Metric | Formula | Stage | Threshold | Readiness |
|---|---|---|---|---|
| replay_residue_coverage_delta | Change in residue_coverage_pct after each sleep cycle | Infant | > 0 | TelemetryRequired |
| replay_latent_breadth | Fraction of z_world space sampled by replayed episodes | All | Increasing during infant | TelemetryRequired |
| replay_buffer_utilisation | Fraction of unique episodes in replay buffer sampled at least once per K cycles | All | > 0.4 | TelemetryRequired |
| replay_coverage_floor_adherence | Is the coverage_floor constraint satisfied? | Infant | Boolean | SubstrateReady (post-implementation) |
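A sketch of two coverage metrics from this table. It assumes episodes have integer ids and that z_world can be bounded by a box and discretised into a regular grid; the grid binning is one possible reading of "fraction of z_world space sampled", not a definition from the source.

```python
from typing import Iterable, Sequence, Set


def replay_buffer_utilisation(buffer_ids: Set[int],
                              replayed_per_cycle: Sequence[Iterable[int]],
                              k: int) -> float:
    """Fraction of buffer episodes replayed at least once in the last k cycles (k >= 1)."""
    if not buffer_ids:
        return 0.0
    seen: Set[int] = set()
    for cycle in replayed_per_cycle[-k:]:
        seen.update(cycle)
    return len(seen & buffer_ids) / len(buffer_ids)


def replay_latent_breadth(latents: Sequence[Sequence[float]],
                          low: Sequence[float],
                          high: Sequence[float],
                          bins_per_dim: int) -> float:
    """Fraction of grid cells in the box [low, high] hit by replayed latents."""
    dims = len(low)
    cells = set()
    for z in latents:
        cell = []
        for d in range(dims):
            frac = (z[d] - low[d]) / (high[d] - low[d])
            # Clamp so points on or outside the boundary land in an edge cell.
            cell.append(min(bins_per_dim - 1, max(0, int(frac * bins_per_dim))))
        cells.add(tuple(cell))
    return len(cells) / (bins_per_dim ** dims)
```

Grid binning scales poorly with latent dimension, so in practice this would likely be applied to a low-dimensional projection of z_world.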
5.4 Replay-Stage Metrics
| Metric | Formula | Stage | Threshold | Readiness |
|---|---|---|---|---|
| post_sleep_z_goal_retention | z_goal.norm() ratio before/after sleep integration | Infant | > 0.85 | TelemetryRequired |
| post_sleep_context_differentiation | Decrease in context cosine_sim after sleep cycle | Childhood | > 0.05 per cycle | TelemetryRequired |
| post_sleep_trajectory_diversity_delta | Change in traj_volume_estimate after sleep | Childhood | Positive | TelemetryRequired |
| post_sleep_residue_integration_efficiency | Change in residue_curvature_index / replay_steps | Adult | Positive | TelemetryRequired |
| sleep_wake_ratio | Steps in offline integration / steps in waking episode | Infant | > 0.10 | TelemetryRequired |
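The first two rows could be computed as in this sketch. It reads the retention ratio as post-sleep norm over pre-sleep norm (so that a value above 0.85 means the goal representation survived) and reads context cosine_sim as the mean pairwise cosine similarity across context vectors; both readings are assumptions.

```python
import math
from itertools import combinations
from typing import Sequence


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def post_sleep_z_goal_retention(z_goal_before: Sequence[float],
                                z_goal_after: Sequence[float]) -> float:
    """Norm of z_goal after sleep divided by its norm before sleep."""
    before = _norm(z_goal_before)
    return _norm(z_goal_after) / before if before > 0 else 0.0


def mean_pairwise_cosine(contexts: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity over all pairs of non-zero context vectors."""
    if len(contexts) < 2:
        raise ValueError("need at least two context vectors")
    sims = [sum(a * b for a, b in zip(u, v)) / (_norm(u) * _norm(v))
            for u, v in combinations(contexts, 2)]
    return sum(sims) / len(sims)


def post_sleep_context_differentiation(contexts_before, contexts_after) -> float:
    """Positive when contexts are less similar to each other after sleep."""
    return mean_pairwise_cosine(contexts_before) - mean_pairwise_cosine(contexts_after)
```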
5.5 Replay Restructuring Metrics
| Metric | Formula | Stage | Threshold | Readiness |
|---|---|---|---|---|
| pre_post_traj_volume_ratio | traj_volume_estimate post-sleep / pre-sleep | All | > 1.0 (sleep expands) | TelemetryRequired |
| monostrategy_prevention_score | Is MECH-124 failure mode absent? action_class_coverage post-sleep >= pre-sleep | All | >= 1.0 | TelemetryRequired |
| residue_integration_without_amplification | Does residue integrate toward equilibrium (not diverge)? | Post-harm | delta toward mean, not away | TelemetryRequired |
| option_space_contraction_rate | Rate of decline in viable trajectory classes across sleep cycles | All | ≤ 0 (not declining) | TelemetryRequired |
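Three of these metrics could be bundled into a per-cycle report, as in the sketch below. It assumes monostrategy_prevention_score is the post/pre ratio of action_class_coverage, defines the contraction rate as the mean per-cycle loss of viable trajectory classes, and treats a non-positive rate as passing (its reading of "not declining"). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RestructuringReport:
    pre_post_traj_volume_ratio: float
    monostrategy_prevention_score: float
    option_space_contraction_rate: float

    def passes(self) -> bool:
        """Apply the table's thresholds to the three metrics."""
        return (self.pre_post_traj_volume_ratio > 1.0
                and self.monostrategy_prevention_score >= 1.0
                and self.option_space_contraction_rate <= 0.0)


def build_report(volume_pre: float, volume_post: float,
                 coverage_pre: float, coverage_post: float,
                 viable_classes_per_cycle: Sequence[int]) -> RestructuringReport:
    """Assumes volume_pre and coverage_pre are non-zero."""
    n = len(viable_classes_per_cycle)
    # Mean per-cycle loss of viable classes; positive means the option space shrinks.
    rate = ((viable_classes_per_cycle[0] - viable_classes_per_cycle[-1]) / (n - 1)
            if n > 1 else 0.0)
    return RestructuringReport(volume_post / volume_pre,
                               coverage_post / coverage_pre,
                               rate)
```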
Part 6: Stage-Differentiated Replay Scheduling
6.1 Core Proposal: Stage-Indexed Replay Scheduler
The replay scheduler needs a dev_stage parameter that modifies selection policy:
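    from typing import List, Optional


    class StageAwareReplayScheduler:
        def select(self, buffer, dev_stage: str, staleness_map: Optional[dict] = None) -> "List[Episode]":
            # Dispatch on developmental stage; the three policies are described in §6.2 onward.
            if dev_stage == "infant":
                return self._coverage_priority(buffer)
            elif dev_stage == "childhood":
                return self._interleaved_priority(buffer)
            elif dev_stage == "adult":
                return self._rpe_and_staleness_priority(buffer, staleness_map)
            raise ValueError(f"unknown dev_stage: {dev_stage!r}")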
6.2 Infancy: Coverage and Valence-Map Formation
Goal: Maximise spatial/latent coverage to build valence geography for sleep to consolidate.
Replay policy: Coverage-maximising with harm-salience floor.
| Parameter | Value | Rationale |
|---|---|---|
| Coverage fraction minimum | 0.6 of zones | Sampling must span most zones, not just hazard-adjacent ones |
| RPE priority weight | 0.2 (low) | High RPE = termination trajectories; these dominate infant buffer and should not monopolise replay |
| Low-salience fraction minimum | 0.3 | Stickgold & Walker 2005: sleep finds hidden structure in weak associations |
| Prospective vs retrospective | Retrospective dominant | Infancy is about understanding what happened; planning comes later |
| Sleep:wake ratio | > 0.10 (frequent offline) | Biological: high infant sleep demand (roughly half of neonatal sleep is REM); rapid consolidation |
Failure signature: replay_zone_coverage_fraction < 0.3 = infant replay is dominated by hazard zones; valence geography is one-sided (only harm, no benefit).
Key insight from Gómez et al. (2006): A single nap was sufficient for 15-month-olds to generalise an abstract rule. This means infant replay efficiency is high when content is available — the failure mode is content poverty, not replay mechanism failure. The substrate fix (infant_substrate_expansion.md) is the primary intervention; replay scheduling is secondary.
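One possible shape for the infant coverage-priority policy is sketched below. It is a minimal illustration, not the scheduler's implementation: the `Episode` fields, the round-robin strategy, and the reading of the RPE weight as a per-pick probability are all assumptions. The low-salience floor is not enforced here.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Episode:
    """Hypothetical episode record; the field names are illustrative."""
    zone: int
    rpe: float


def coverage_priority(buffer: List[Episode], n: int,
                      rpe_weight: float = 0.2, seed: int = 0) -> List[Episode]:
    """Round-robin over zones so every zone is visited before any zone repeats."""
    rng = random.Random(seed)
    by_zone: Dict[int, List[Episode]] = defaultdict(list)
    for ep in buffer:
        by_zone[ep.zone].append(ep)
    selected: List[Episode] = []
    while len(selected) < n and by_zone:
        for zone in list(by_zone):
            pool = by_zone[zone]
            if rng.random() < rpe_weight:
                pick = max(pool, key=lambda e: abs(e.rpe))  # salience-driven pick
            else:
                pick = rng.choice(pool)                     # coverage-driven pick
            pool.remove(pick)
            selected.append(pick)
            if not pool:
                del by_zone[zone]  # zone exhausted
            if len(selected) == n:
                break
    return selected


def zone_coverage_fraction(selected: List[Episode], n_zones: int) -> float:
    """Fraction of zones represented in the replay batch."""
    return len({ep.zone for ep in selected}) / n_zones


# Toy buffer dominated by a single hazard zone (zone 0) with high RPE.
buffer = [Episode(0, 5.0) for _ in range(40)] + [Episode(z, 0.1) for z in range(1, 10)]
batch = coverage_priority(buffer, n=10)
coverage = zone_coverage_fraction(batch, n_zones=10)
```

Even though 40 of the 49 buffered episodes come from the hazard zone, the round-robin pass gives each zone one slot, so the batch covers all ten zones. A pure RPE-ranked selection over the same buffer would draw only from zone 0.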
6.3 Childhood: Strategy Abstraction and Play-to-Real Transfer
Goal: Bind the shared strategy structure between play and real episodes (Schapiro 2017). Prevent synthetic magnitude calibration from consolidating.
Replay policy: Interleaved play + real episodes; strategy-structure-biased selection.
| Parameter | Value | Rationale |
|---|---|---|
| Play:real interleaving ratio | 1:1 minimum | Schapiro 2017: sleep binds shared structure across episode types |
| RPE priority weight | 0.4 (moderate) | Some salience selection; not dominant |
| Coverage fraction minimum | 0.5 | Maintain coverage; childhood can afford more RPE selection than infancy |
| Post-sleep target | post_sleep_context_differentiation > 0.05 | Context differentiation is the childhood gain |
| Sleep:wake ratio | > 0.05 (moderate) | Less sleep than infancy; biological pattern |
Critical constraint (strategy/calibration dissociation, MECH-195): Replay must interleave play and real episodes. Replay of play-only episodes during sleep = risk of consolidating synthetic magnitude calibration. Replay of real-only episodes = failure to bind strategy structure across domains.
Failure signature: synthetic_magnitude_leak_ratio ≫ 1.0 AND post_sleep_context_differentiation flat = play calibration is consolidating, strategy structure is not.
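The interleaving constraint can be illustrated with a strict 1:1 alternation. This is a sketch under stated assumptions: episodes are represented by string ids, the pool entries are `(kind, id)` tuples, and surplus episodes of either type are simply dropped, none of which is specified by the document.

```python
from typing import List, Sequence, Tuple


def interleave_play_real(play: Sequence[str],
                         real: Sequence[str]) -> List[Tuple[str, str]]:
    """Alternate play and real episodes 1:1; surplus of either kind is dropped."""
    pool: List[Tuple[str, str]] = []
    for p, r in zip(play, real):  # zip stops at the shorter sequence
        pool.append(("play", p))
        pool.append(("real", r))
    return pool


def play_real_ratio(pool: Sequence[Tuple[str, str]]) -> float:
    """play:real ratio of a replay pool (inf if the pool has no real episodes)."""
    n_play = sum(1 for kind, _ in pool if kind == "play")
    n_real = sum(1 for kind, _ in pool if kind == "real")
    return n_play / n_real if n_real else float("inf")


pool = interleave_play_real(["p1", "p2", "p3"], ["r1", "r2"])
ratio = play_real_ratio(pool)
```

Here the third play episode is left out so that neither a play-only nor a real-only tail enters the sleep pool. A production scheduler might instead resample the scarcer type.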
6.4 Adolescence/Adult: Responsibility, Repair, and Reconciliation
Goal: Integrate moral residue from harm-causing events without amplifying them into paralysis; support repair and reconciliation; maintain option-space breadth.
Replay policy: RPE + staleness priority with MECH-124 prevention.
| Parameter | Value | Rationale |
|---|---|---|
| RPE priority weight | 0.6 (primary) | Shin et al. 2025: RPE-biased replay predicts subsequent behaviour |
| Staleness weight | 0.3 | MECH-285: corrects for overrepresentation of high-RPE content |
| Coverage floor | 0.25 | Prevents complete abandonment of low-salience content |
| Harm-integration target | residue_integration_without_amplification | REM function: fear extinction, not re-traumatisation |
| MECH-124 monitor | monostrategy_prevention_score >= 1.0 | Walker PTSD analog: option-space should not contract |
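A minimal sketch of how the adult weights might combine is given below. It assumes a linear priority score over RPE and staleness values pre-normalised to [0, 1], and it reads the coverage floor as a share of each batch reserved for the lowest-priority episodes. Both readings are assumptions; the table leaves 0.1 of the weight unassigned, and this sketch does not use it.

```python
import math
from typing import Dict, List


def adult_priority(rpe: float, staleness: float,
                   rpe_weight: float = 0.6, staleness_weight: float = 0.3) -> float:
    """Linear priority score; inputs are assumed pre-normalised to [0, 1]."""
    return rpe_weight * abs(rpe) + staleness_weight * staleness


def select_adult(episodes: Dict[str, Dict[str, float]], n: int,
                 coverage_floor: float = 0.25) -> List[str]:
    """Pick n episode ids: top-priority first, plus a reserved low-priority share."""
    ranked = sorted(episodes, key=lambda k: adult_priority(**episodes[k]), reverse=True)
    n_floor = math.ceil(coverage_floor * n)   # slots reserved for low-salience content
    n_top = n - n_floor
    top = ranked[:n_top]
    floor = ranked[n_top:][::-1][:n_floor]    # lowest-priority episodes not already chosen
    return top + floor


# Toy buffer keyed by hypothetical episode ids.
episodes = {
    "a": {"rpe": 0.9, "staleness": 0.1},
    "b": {"rpe": 0.2, "staleness": 0.9},
    "c": {"rpe": 0.8, "staleness": 0.8},
    "d": {"rpe": 0.05, "staleness": 0.0},
    "e": {"rpe": 0.5, "staleness": 0.5},
}
selected = select_adult(episodes, n=4)
```

In this toy buffer, episode `d` would never be replayed under pure priority ranking; the reserved floor slot brings it in, which is the behaviour the coverage floor is meant to guarantee.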
Clinical mapping: Adult replay failure modes map onto clinical presentations:
- High RPE priority + MECH-094 tag failure → PTSD intrusive replay (traumatic trace gets high priority but is mis-routed to waking consciousness)
- Posterior drift toward harm-attribution → depression rumination (self-domain posterior drifts negative; replay amplifies rather than corrects)
- Option-space contraction → anhedonia / restriction (viable alternatives excluded from policy)
Part 7: Candidate Experiments
7.1 Replay Before/After Infant Gate (Priority: High)
Title: Replay diversity and residue consolidation as predictors of infant gate passage
Scientific question: Does the quality of replay (coverage, context differentiation, residue delta) improve as the infant substrate matures toward gate passage, and does this predict successful childhood entry?
Design:
- Track replay metrics (§5) longitudinally across the infant phase
- Measure: replay_zone_coverage_fraction, replay_residue_coverage_delta, post_sleep_z_goal_retention
- Gate passage = DEV-NEED-008 blocking criteria passed
- Test: do replay quality metrics predict gate passage 100 episodes in advance?
Claim IDs: DEV-NEED-007, DEV-NEED-008, INV-055, ARC-011
Expected outcome: Replay quality improves monotonically during infant phase; convergence of replay metrics and gate criteria confirms replay is producing developmental progress, not just memory consolidation.
Failure signature (important): Replay metrics improve but gate criteria do NOT improve = replay is reorganising the same impoverished content, not creating new developmental capacity. Resolution: fix the substrate (infant_substrate_expansion.md), not the replay scheduler.
EXQ label: Candidate for EXQ-IDEV-001
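One simple way to operationalise the 100-episode predictive test is a lagged correlation between a replay metric and a gate-criterion score. The sketch below assumes both are logged once per episode as equal-length series; the function names are hypothetical, and a correlation alone would not establish prediction without held-out agents.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(metric: Sequence[float], gate_score: Sequence[float],
                       lag: int = 100) -> float:
    """Correlate metric[t] with gate_score[t + lag]."""
    if len(metric) != len(gate_score):
        raise ValueError("series must have equal length")
    if not 0 < lag < len(metric):
        raise ValueError("lag must be positive and shorter than the series")
    return pearson(metric[:-lag], gate_score[lag:])


# Synthetic series: the gate score is the replay metric delayed by 100 episodes.
metric = [t / 300 for t in range(300)]
gate_score = [0.0] * 100 + metric[:200]
r = lagged_correlation(metric, gate_score, lag=100)
```

The synthetic series is constructed so that the lagged correlation is perfect; real logs would be compared against a zero-lag baseline to check that the replay metric leads the gate criteria and does not merely track them.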
7.2 Replay Before/After Play Stages (Priority: High)
Title: Interleaved play+real replay vs play-only replay on strategy transfer
Scientific question: Does interleaving play and real episodes in the sleep replay buffer (MECH-203-style balance) produce better strategy-structure transfer (DEV-NEED-011) while preventing synthetic magnitude consolidation?
Design:
- Three conditions:
  - (A) Play-only replay during childhood sleep
  - (B) Real-only replay during childhood sleep
  - (C) Interleaved 1:1 play:real replay (MECH-203 extension)
- Measure: play_to_real_competence_SCC, synthetic_magnitude_leak_ratio, post_sleep_context_differentiation
- Hypothesis: condition (C) maximises SCC while keeping leak ratio in [0.7, 1.3]
Claim IDs: MECH-194, MECH-195, MECH-203, INV-060, DEV-NEED-011
Literature anchor: Schapiro et al. (2017 Nat Hum Behav) — sleep preferentially binds memories sharing structure; Gruber et al. (2020 Curr Biol) — childhood sleep promotes rule abstraction, not just episode retention.
EXQ label: Candidate for EXQ-IDEV-002
7.3 Replay Effect on Monostrategy (Priority: High)
Title: Coverage-priority vs RPE-priority replay on infant monostrategy prevention
Scientific question: Does coverage-maximising replay during infancy reduce monostrategy (measured by traj_volume_estimate and action_class_coverage) compared to RPE-priority replay?
Design:
- Requires warm-start (DEV-NEED-029 gate must be confirmed first)
- Two conditions on agents past warm-start threshold:
  - (A) Default RPE-priority replay (current)
  - (B) Coverage-maximising replay (coverage_floor=0.6, RPE_weight=0.2)
- Measure: traj_volume_estimate, action_class_coverage, monostrategy_prevention_score across 1000 episodes post-warm-start
- Hypothesis: condition (B) shows higher traj_volume_estimate at episodes 200, 500, 1000
Claim IDs: DEV-NEED-005, DEV-NEED-007, ARC-065, MECH-285, MECH-124
Prerequisite: EXQ-ISEF-001 (warm-start calibration; DEV-NEED-029 thresholds must be set)
EXQ label: Candidate for EXQ-IDEV-003
7.4 Replay Effect on Residue Stability (Priority: Medium)
Title: Sleep replay prevents residue saturation after harm events
Scientific question: Does a single sleep cycle after a high-harm episode prevent residue saturation that would otherwise occur over many waking episodes?
Design:
- Three conditions:
  - (A) No sleep after high-harm episode
  - (B) Sleep with current scheduler after high-harm episode
  - (C) Sleep with coverage-priority scheduler (low-salience content included) after high-harm episode
- Measure: residue_saturation_pct, residue_curvature_index, mode_stability_after_harm across 50 subsequent episodes
- Hypothesis: (B) and (C) > (A) on curvature and stability; (C) > (B) on option-space preservation
Claim IDs: ARC-013, MECH-018, MECH-124, DEV-NEED-018
EXQ label: Candidate for EXQ-IDEV-004
7.5 Replay Effect on z_goal Diversification (Priority: Medium)
Title: Does sleep expand or collapse goal variety during infant-childhood transition?
Scientific question: Does replay during the infant-to-childhood transition expand z_goal diversity (more distinct goal signatures, higher z_goal_identity_count) or collapse it toward the dominant goal encountered?
Design:
- Two conditions:
  - (A) No sleep at infant-childhood transition boundary
  - (B) Sleep (current scheduler) at transition boundary
- Measure: z_goal_identity_count, z_goal_norm, z_goal_persistence_across_novel_contexts at 0, 50, 100 episodes after transition
- Hypothesis: (B) shows higher z_goal_identity_count and better context-persistence
Claim IDs: MECH-189, INV-055, DEV-NEED-006, DEV-NEED-008, DEV-NEED-024
EXQ label: Candidate for EXQ-IDEV-005
7.6 SD-017 Functional Validation: Context Discrimination (Priority: Very High)
Title: SD-017 SWS-analog reduces context cosine_sim to < 0.95 after 3 sleep cycles
Scientific question: Does the SD-017 SWS-analog pass produce measurably better context differentiation (lower cosine_sim between distinct context representations) than no-sleep?
Design: This is the functional counterpart to EXQ-265/265a, which validated the methods only. It tests the actual claim (Law et al. 2016: ~3 interleaved sessions required).
- Two conditions: 3 waking sessions + 3 sleep cycles vs 3 waking sessions + no sleep
- Measure: context cosine_sim before and after each cycle; target < 0.95 by cycle 3
- Acceptance: sleep condition shows context_cosine_sim < 0.95; no-sleep stays > 0.95
Claim IDs: SD-017, MECH-166, INV-044
EXQ label: Candidate for EXQ-IDEV-006 (or continuation of EXQ-500/503 series)
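The acceptance rule for this experiment can be written as a small predicate. The sketch assumes one context cosine_sim reading per cycle for each arm, and it reads "stays > 0.95" as applying to every cycle of the no-sleep arm; the example values are placeholders, not results.

```python
from typing import Sequence


def sd017_acceptance(sleep_sims: Sequence[float],
                     no_sleep_sims: Sequence[float],
                     threshold: float = 0.95) -> bool:
    """Pass if the sleep arm ends below threshold and the no-sleep arm never drops to it."""
    sleep_ok = sleep_sims[-1] < threshold
    control_ok = all(s > threshold for s in no_sleep_sims)
    return sleep_ok and control_ok


# Placeholder readings after cycles 1, 2 and 3 (illustrative only).
passed = sd017_acceptance([0.99, 0.97, 0.94], [0.99, 0.99, 0.98])
failed = sd017_acceptance([0.99, 0.98, 0.96], [0.99, 0.99, 0.98])
```

Requiring the control arm to stay above the threshold guards against declaring a PASS when differentiation arises from waking experience alone.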
Part 8: Telemetry Proposals
The following channels should be added to MECH-042 telemetry for replay-related monitoring. These supplement the channels already proposed in developmental_metrics.md.
# Replay quality channels (new)
replay_zone_coverage_fraction : float # fraction of zones sampled in last sleep cycle
replay_volumetric_coverage : float # log-det of trajectory kernel over replayed episodes
replay_buffer_utilisation : float # fraction of unique buffer entries sampled per K cycles
replay_low_salience_fraction : float # fraction replayed with RPE < 25th percentile
replay_play_real_interleave_ratio : float # play:real ratio in last sleep replay pool
replay_context_class_count : int # distinct context attractors sampled in last sleep cycle
replay_novel_trajectory_fraction : float # fraction not in top-5 most-replayed
# Replay effect channels (new)
pre_post_traj_volume_ratio : float # traj_volume post-sleep / pre-sleep
monostrategy_prevention_score : float # action_class_coverage post-sleep / pre-sleep (>= 1.0 = good)
option_space_contraction_rate : float # rate of decline in viable traj classes across sleep cycles
post_sleep_context_differentiation : float # decrease in context cosine_sim per sleep cycle
residue_integration_without_amplification : bool # residue delta toward equilibrium (not away)
# Stage-aware replay scheduler
replay_scheduler_stage : str # "infant" | "childhood" | "adult"
replay_coverage_floor_active : bool # is the coverage floor constraint active?
replay_coverage_floor_adherence : bool # was the floor satisfied this cycle?
# Warm-start gate (ARC-065, DEV-NEED-029 extension)
replay_warm_start_gate_all_green : bool # all three warm-start criteria met
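A subset of these channels could be carried as a typed record, for example as a dataclass. This is a sketch only: the class name and the validation of the stage string are assumptions, and nothing here reflects the existing MECH-042 telemetry interface.

```python
from dataclasses import asdict, dataclass

VALID_STAGES = ("infant", "childhood", "adult")


@dataclass
class ReplayEffectTelemetry:
    """Hypothetical per-sleep-cycle record for the replay effect channels."""
    pre_post_traj_volume_ratio: float
    monostrategy_prevention_score: float
    option_space_contraction_rate: float
    post_sleep_context_differentiation: float
    residue_integration_without_amplification: bool
    replay_scheduler_stage: str

    def __post_init__(self) -> None:
        # Reject stage labels outside the three values listed for the channel.
        if self.replay_scheduler_stage not in VALID_STAGES:
            raise ValueError(f"unknown stage: {self.replay_scheduler_stage!r}")


# Placeholder values for one sleep cycle (illustrative only).
record = ReplayEffectTelemetry(
    pre_post_traj_volume_ratio=1.1,
    monostrategy_prevention_score=1.0,
    option_space_contraction_rate=-0.2,
    post_sleep_context_differentiation=0.06,
    residue_integration_without_amplification=True,
    replay_scheduler_stage="childhood",
)
payload = asdict(record)  # plain dict, ready for whatever logger is in use
```

Keeping the field names identical to the channel names means the serialised payload can be matched against the channel list without a translation table.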
Part 9: Suggested Register Updates
9.1 Updates to Existing DEV-NEED Rows
| DEV-NEED | Current state | Recommended update |
|---|---|---|
| DEV-NEED-007 | “offline passes improve map stability, goal seeds, repertoire quality” | Add: replay_diversity_index (> 0.4), replay_low_salience_fraction (> 0.3 infant), replay_zone_coverage_fraction (> 0.6 infant); add explicit infant vs childhood scheduling distinction |
| DEV-NEED-008 | 8-criterion gate from developmental_metrics.md | Add: replay_zone_coverage_fraction > 0.6 as advisory gate; confirm at least one sleep cycle with positive residue_consolidation_delta before gate passes |
| DEV-NEED-005 | “behavioral entropy below ceiling but broad” | Add: monostrategy_prevention_score >= 1.0 post-sleep (MECH-124 check); traj_volume_estimate non-declining across sleep cycles |
| DEV-NEED-011 | “strategy transfer without synthetic magnitude calibration” | Add: replay_play_real_interleave_ratio must be >= 0.5 during childhood sleep; Schapiro 2017 anchors play+real interleaving |
| DEV-NEED-018 | “repair after harm” | Add: residue_integration_without_amplification = True after each high-harm episode sleep cycle |
9.2 New DEV-NEED Candidates
DEV-NEED-030 (PROPOSED): Stage-Aware Replay Scheduling
| Field | Value |
|---|---|
| Developmental Need | Replay scheduler must adapt to developmental stage |
| Stage | Cross-stage |
| Claim IDs | ARC-011, SD-017, MECH-285; PROPOSED new claim |
| Required mechanism | dev_stage parameter on replay scheduler; stage-indexed policies (§6.1) |
| Gate criterion | replay_scheduler_stage matches current dev_stage; stage-specific metrics pass |
| Failure if absent | Infant stage uses adult RPE-priority = replay_low_salience_fraction < 0.1 = valence map never consolidated; adulthood uses infant coverage-priority = monostrategy never learned |
| Current status | PROPOSED; not registered |
| Priority | After infant_substrate_expansion.md substrate features |
DEV-NEED-031 (PROPOSED): MECH-124 Prevention Gate
| Field | Value |
|---|---|
| Developmental Need | Sleep replay must not contract the option space |
| Stage | All (especially adult and post-harm childhood) |
| Claim IDs | MECH-124, ARC-011; PROPOSED new claim |
| Required mechanism | monostrategy_prevention_score monitored per sleep cycle; MECH-285 staleness correction |
| Gate criterion | monostrategy_prevention_score >= 1.0 across rolling 5 sleep cycles; option_space_contraction_rate ≤ 0 |
| Failure if absent | Progressive option-space contraction = adult agent approaches PTSD/depression phenotype; EXQ-573 type null eventually becomes a FAIL on options, not measures |
| Current status | PROPOSED; not registered |
| Priority | With sleep aggregation cluster implementation |
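The rolling gate criterion might be monitored as follows. The sketch assumes "across rolling 5 sleep cycles" means every score in the window must be at least 1.0 (a rolling mean would be a laxer reading), and that the gate stays closed until the window is full. The class name is hypothetical.

```python
from collections import deque


class Mech124Gate:
    """Rolling check on monostrategy_prevention_score over recent sleep cycles."""

    def __init__(self, window: int = 5) -> None:
        self.scores = deque(maxlen=window)  # oldest score is evicted automatically

    def update(self, score: float, contraction_rate: float) -> bool:
        """Record one sleep cycle and return whether the gate criterion holds."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and min(self.scores) >= 1.0 and contraction_rate <= 0


gate = Mech124Gate()
# Placeholder scores for five consecutive sleep cycles (illustrative only).
states = [gate.update(s, contraction_rate=-0.1) for s in [1.0, 1.1, 1.0, 1.2, 1.0]]
after_dip = gate.update(0.9, contraction_rate=-0.1)
```

The gate opens only on the fifth cycle and closes again as soon as a single cycle shows post-sleep coverage below its pre-sleep level, which keeps one bad consolidation from being averaged away.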
9.3 Claims That May Require Revision
INV-049 (offline phases are a mathematical necessity): This invariant has FAILED twice (EXQ-385, EXQ-385a). The theory remains well grounded (Gómez 2006, Diekelmann 2010, and the SD-017 doc all support it strongly). The failures most likely reflect substrate failure (content poverty), not a flaw in the theory. Recommended: do not demote INV-049; instead queue:
- infant_substrate_expansion.md features to fix content poverty
- EXQ-IDEV-001 (longitudinal replay quality) as a better test
- EXQ-IDEV-006 (SD-017 functional validation) as a targeted test
Flag as pending_substrate_reconfirmation: true with note: “two FAILs (EXQ-385, EXQ-385a) most likely reflect infant substrate compression (content poverty) preventing sleep from demonstrating discriminative advantage; substrate fix is the prerequisite for a valid test.”
ARC-038 (waking consolidation): Multiple FAILs (EXQ-191, EXQ-240, EXQ-240a, EXQ-267). Pattern consistent across iterations. Waking schema assimilation is not working in current substrate. This is a genuine concern: if waking consolidation fails, sleep has an impoverished prior to work with. This is independent of infant content poverty — it affects the SWS-analog pass.
Part 10: Summary Assessment
What replay currently does well
- Architecture is correct: replay is exploratory, non-corrective, non-erasing of φ(z). This is the right design.
- Balanced replay (MECH-203) works: harm+benefit balance validated (EXQ-256 PASS).
- Surprise-gating (MECH-205) works after iteration: surprise-gated replay eventually validated (EXQ-258b PASS).
- Sleep phase infrastructure works: SD-017 method validation passed (EXQ-265, 265a).
What replay currently does not do
- No stage awareness: same policy at every developmental stage.
- No content coverage guarantee: replay can satisfy RPE metrics while ignoring most of the latent space.
- No warm-start gate: diversity mechanisms show zero differential on a cold substrate.
- No MECH-124 prevention: no monostrategy monitoring across sleep cycles.
- Full sleep aggregation not implemented: MECH-272/273/275/285 are designed but not yet built — the self-model, place attribution, and staleness correction are all absent.
Primary recommendation
The infant substrate is the priority, not the replay scheduler. Replay cannot restructure content that was never created. The graduated harm zones, microhabitat zones, and multi-resource heterogeneity from infant_substrate_expansion.md are prerequisites for any replay experiment that tests developmental progression.
Once the infant substrate is enriched:
- Implement stage-aware replay scheduler (§6.1)
- Add replay quality telemetry channels (§8)
- Run EXQ-IDEV-001 (longitudinal infant replay quality)
- Run EXQ-IDEV-006 (SD-017 functional validation)
- Then build the sleep aggregation cluster (MECH-272/273/275/285)
This ordering ensures that each validation experiment tests the mechanism in isolation rather than being confounded by content poverty or unimplemented upstream infrastructure.
Related Claims
- ARC-007 (hippocampal systems / path memory and replay)
- ARC-011 (offline integration necessity)
- ARC-013 (residue geometry)
- ARC-014 (default mode)
- ARC-038 (waking consolidation mode)
- ARC-065 (diversity mechanisms)
- INV-010 (offline integration necessity)
- INV-049 (offline phases mathematical necessity — see §9.3)
- MECH-018 (sleep residue integration)
- MECH-030 (sleep modes and ethical consolidation)
- MECH-092 (quiescent waking replay)
- MECH-124 (consolidation-mediated option-space contraction)
- MECH-165 (replay diversity — forward/reverse balance)
- MECH-203 (balanced harm/benefit replay)
- MECH-205 (surprise-gated replay)
- MECH-272/273/275/285 (sleep aggregation cluster)
- SD-017 (minimal sleep-phase architecture)
- DEV-NEED-007 (frequent offline integration during early development)
- DEV-NEED-029 (ARC-065 warm-start gate)