Sleep-Aggregation Cluster: MECH-272 / MECH-273 / MECH-275 / MECH-285

Claim cluster:

MECH-272 – state-gated routing of hippocampal replay (anchor channel waking, probe channel sleep)
MECH-273 – sleep half of the self-model: full-Bayesian aggregation of single-episode SD-003 outputs
MECH-275 – general sleep-phase Bayesian aggregation across attribution domains (parent of MECH-273)
MECH-285 – sleep-consolidation priority biased by MECH-284 V_s residual schema-staleness map

Status: candidate, v3_pending (all four) Phase status (substrate): none of the four implemented; this doc commits the build order Registered: 2026-04-25 Sibling docs (do NOT duplicate):

sd_017_sleep_phase_architecture.md – minimal V3 SWS/REM phase machinery (parent infrastructure)
v_s_invalidation_runtime.md – MECH-284 online arm + MECH-287 trigger (already implemented through Phase 3)
sleep/offline_phases.md – V4 full sub-phase ordering (MECH-120-123)
default_mode.md – MECH-092 quiescent waking replay (V3 prerequisite)

Lit-pull dependencies:

targeted_review_mech285_sleep_replay_seed/SYNTHESIS.md – 5 verdicts grounding seed-pool, priority shape, timing, salience separation, dual-trace
targeted_review_connectome_mech_273/ – 5 entries (lit_conf 0.77)
targeted_review_connectome_mech_275/ – 4 entries (lit_conf 0.85)
targeted_review_v_s_foundation/SYNTHESIS.md – per-region V_s, dual-trace anchor preservation

Depends on (substrate-ready as of 2026-04-25): SD-017 design only / no implementation; MECH-284 Phase 3 online arm; MECH-269 anchor selection + hysteresis; MECH-287 broadcast trigger; SD-003 self-attribution + ARC-033 E2_harm_s + SD-013 contrastive training; MECH-094 simulation-mode write gate; AnchorSet dual-trace preservation.

Implementation gaps to be closed (in order):

Sleep-loop scaffolding (SD-017 minimal – never implemented)
MECH-285 offline-arm replay sampler (deferred in v_s_invalidation_runtime.md)
MECH-272 routing gate (referenced by MECH-269/271/284 but no module exists)
MECH-275 general Bayesian aggregator
MECH-273 self-model specialisation

Problem

The waking-phase architecture as of 2026-04-25 produces episode-local attribution and schema-staleness signals, but does not aggregate them. SD-003 emits a per-episode causal signature; MECH-284 emits a per-region staleness scalar; MECH-269 marks anchors inactive. None of these are integrated across episodes. Without an offline aggregation pass:

Episode-local self-attribution remains noisy. SD-003’s causal_sig is computed on a single (z_t, a_actual, a_cf) tuple. Systematic biases (e.g. reward delay, observation noise on outcomes) propagate uncorrected into E2_harm_s. The agent has a signature but not a self-model in the durable sense (MECH-273).
Schema-revision is single-step. MECH-284 staleness biases the next anchor selection (online), but the inactive anchor’s content is never re-examined against accumulated evidence from other episodes. The dual-trace is preserved (MECH-269 mark_inactive) but never integrated.
Other attribution domains (object schemas, social attribution, place schemas) need the same aggregation pattern. MECH-275 generalises the pattern; MECH-273 specialises it to self.

The cluster’s job is to take the waking-phase residuals (SD-003 outputs, MECH-284 staleness map, AnchorSet dual-trace, MECH-276 counterfactual feedstock) and run a periodic offline pass that:

Selects which content to re-examine (MECH-285, staleness-weighted broad coverage).
Routes that content through a different consumer set than waking (MECH-272, probe channel dominant).
Aggregates evidence across replays into a posterior update (MECH-275).
Specialises the aggregator output to write back to E2_harm_s and SD-033a (MECH-273).

Architectural commitments

C1. Sleep-mode entry is deterministic in V3

Sleep-mode entry triggers at end of every K episodes (configurable; default K=8 to match SD-017 narrative). SD-037 broadcast-override-driven entry (sustained low-arousal, high drive) is a known refinement but is deferred to V4. Reason: SD-037 just landed and is under validation (V3-EXQ-483); coupling sleep entry to it now risks compounding uncertainty.

V3 sleep entry is a hard switch: external observation gating closes (MECH-122 placeholder), goal-seeded action selection pauses (drive_level frozen at entry value), and the heartbeat clock continues but routes ticks to the sleep loop instead of the action loop.

C2. Module location: new `ree_core/sleep/` package

A self-contained package, mirroring the architectural cluster. Rationale: the four MECHs share state (sleep phase, replay buffer, posterior store), share the dependency on the StalenessAccumulator snapshot, and share MECH-094 simulation-mode tagging. Distributing across hippocampal/, predictors/, pfc/ would triplicate the phase contract.

The package owns:

sleep/phase_manager.py – SleepPhase enum (WAKING, SWS_ANALOG, REM_ANALOG), entry/exit conditions, tick router. Wires into REEAgent.step().
sleep/replay_sampler.py – MECH-285. Reads StalenessAccumulator snapshot + AnchorSet dual-trace; emits ordered seed sequence per phase tick.
sleep/routing_gate.py – MECH-272. State-conditioned channel weights consumed by the existing MECH-271 hippocampal router (anchor vs probe channel destination weights).
sleep/bayesian_aggregator.py – MECH-275. Maintains posteriors per attribution domain; consumes probe-channel replay events; emits posterior-update messages.
sleep/self_model_aggregator.py – MECH-273. Subclass of the general aggregator specialised on SD-003 causal_sig posterior; writes corrected residuals into E2_harm_s via offline gradient pass.

C3. Seed pool is BROAD; priority is staleness-proportional softmax

Per targeted_review_mech285_sleep_replay_seed/SYNTHESIS.md verdicts 1, 2, and 5:

Seed pool = full AnchorSet (active + inactive anchors with dual-trace preserved). No active-only filter, no time-since-invalidation gate.
Priority = softmax(staleness[r] / temperature) over the broad pool. Continuous, not threshold-gated. Temperature is a tunable scalar (default 1.0); lower temperatures concentrate replay on the most-staled regions, higher temperatures spread it.
Salience-driven replay (MECH-074b dopamine tag) operates on a separate channel (retrieval-SWR arm) and is not arbitrated against staleness here. MECH-285 modifies only the consolidation-SWR arm.
MECH-284 must continue accumulating on inactive regions after mark_inactive. The online arm currently does this implicitly (the per-region accumulator is keyed on (scale, segment_id), independent of the AnchorSet active flag); the offline arm relies on this and the contract should be made explicit in the StalenessAccumulator docstring.

C4. MECH-272 routing is a destination-weight flip, not a proposer change

MECH-271 already specifies that anchored replay routes to subiculum -> entorhinal deep -> neocortex (E1 consolidation) and to SD-033a (rule/goal viability), and probe-channel replay routes to attribution-aggregation consumers. MECH-272 says these route weights are state-gated:

Phase	anchor_channel weight	probe_channel weight
WAKING	1.0	0.0
SWS_ANALOG	0.6	0.4
REM_ANALOG	0.2	0.8

Defaults are placeholders – the V3 validation experiment (EXP-0169 schema-revision arm) should sweep these. The proposer (MECH-269) keeps producing anchor-rooted trajectories in both phases; the routing changes which downstream consumer receives them. SWS-analog biases toward consolidation (anchor channel still meaningful); REM-analog biases toward attribution-revision (probe channel dominant).

C5. MECH-275 posterior is a per-domain Gaussian over residuals

Initial implementation maintains one Gaussian-distributed posterior per attribution domain. Domains in V3:

self (MECH-273): posterior over E2_harm_s prediction residuals indexed by causal_sig bucket (e.g., quartile of causal_sig magnitude per anchor region).
place (MECH-275 directly): posterior over per-region V_s residuals – a Bayesian upgrade of MECH-284’s leaky integrator.
object and other are V4 domains (object-schema revision; MECH-274 social attribution). Schema slot reserved; no implementation in V3.

Each posterior is updated per replayed event with a standard mean-and-variance update: prior = current posterior, evidence = SD-003 output for self / V_s residual for place, likelihood variance = config-tunable. No reliance on a probabilistic-programming framework; plain numpy/torch arithmetic.

The aggregator’s output is a posterior-update message: (domain, region, delta_mean, delta_variance, n_replays). Consumers decide what to do with it.

C6. MECH-273 writes back to E2_harm_s as offline gradient steps

The self-model aggregator’s posterior over SD-003 residuals is consumed during sleep by running a low-LR offline gradient pass on E2_harm_s, using the corrected residuals as training targets. MECH-094 simulation-mode tag is set throughout (so this offline pass does NOT propagate to E1 viability map updates – it is a self-model parameter update, not an experience write).

Default offline LR = 0.1 * waking_LR. Number of offline gradient steps per sleep cycle is bounded (default 100) to keep sleep-cycle wallclock predictable. SHY normalisation (MECH-120, V4) is the natural follow-on but is out of scope for V3.

The alternative would be to maintain a runtime bias term added to E2_harm_s outputs at inference time (no weight update). Rejected because: (a) the bias term grows unboundedly across many sleep cycles without a rule for decay; (b) the biological mechanism is synaptic update (NREM consolidation), and the engineering story is cleaner if we mirror that; (c) downstream consumers (SD-013 contrastive training) already expect E2_harm_s as a learnable module.

C7. MECH-094 tag enforcement throughout

All replay events generated by the sleep loop carry simulation_mode=True. This blocks:

Hippocampal viability map writes (MECH-094 already enforces this on simulated rollouts).
StalenessAccumulator integration (already enforced – integrate() is only called from REEAgent.sense(), which is the waking observation stream; sleep-loop ticks must not call sense()).
E1 experience-stream consolidation (the SWS-analog routing is a schema update, not an episodic write).

The MECH-273 offline gradient pass on E2_harm_s is the single explicit exception: it IS a parameter update during simulation-mode, but it is gated behind the self-model aggregator specifically and does not propagate elsewhere.

Phase ordering within a sleep cycle

SLEEP_ENTRY:
  - phase_manager: WAKING -> SWS_ANALOG
  - StalenessAccumulator.snapshot() -> staleness_map (frozen for this cycle)
  - replay_sampler.seed_pool = AnchorSet.all_with_dual_trace()
  - routing_gate.weights -> SWS_ANALOG row
  - external observation gating closes; drive_level frozen

SWS_ANALOG (N1 ticks, default 50):
  for tick in range(N1):
    seed = replay_sampler.draw(staleness_map)
    replay_event = hippocampal_proposer(seed, simulation_mode=True)
    routed = routing_gate.route(replay_event)
    if routed.anchor_channel:
      e1_consolidation_consumer(routed)        # context-template update (SD-017)
    if routed.probe_channel:
      bayesian_aggregator.update(routed)        # posterior accumulates

PHASE_SWITCH:
  - phase_manager: SWS_ANALOG -> REM_ANALOG
  - routing_gate.weights -> REM_ANALOG row
  - bayesian_aggregator.snapshot() -> posterior_at_phase_switch (debug)

REM_ANALOG (N2 ticks, default 50):
  for tick in range(N2):
    seed = replay_sampler.draw(staleness_map)        # same map, broader read in REM
    replay_event = hippocampal_proposer(seed, simulation_mode=True)
    routed = routing_gate.route(replay_event)
    bayesian_aggregator.update(routed)

WRITEBACK:
  - self_model_aggregator.offline_gradient_pass(e2_harm_s, n_steps=100)
  - StalenessAccumulator.partial_decay(replayed_regions, decay_factor=0.5)
    # regions that got replayed get an extra leak -- "this got addressed"
  - phase_manager: REM_ANALOG -> WAKING
  - external observation gating opens; drive_level resumes accumulating

The N1/N2 split is configurable; the SWS-bias-toward-anchor-channel and REM-bias-toward- probe-channel is the load-bearing biological commitment. Setting N1=0 reduces the cycle to “REM-only attribution revision”; setting N2=0 reduces it to “SWS-only schema consolidation.” Both edge cases are valid for ablation experiments.

State and contracts

Sleep-loop state

Owned by sleep/phase_manager.py. Per-cycle:

SleepCycleState:
  cycle_id: int                          # monotone counter
  entry_episode: int                     # which waking episode triggered entry
  entry_utc: str
  staleness_snapshot: Dict[RegionKey, float]  # frozen at entry
  current_phase: SleepPhase
  ticks_in_phase: int
  posterior_snapshots: List[...]         # one at SWS exit, one at REM exit
  writeback_summary: dict                # n_grad_steps, regions_decayed, etc.

Posterior store

Owned by sleep/bayesian_aggregator.py. Lives across cycles:

DomainPosterior:
  domain: Literal["self", "place"]
  per_region: Dict[RegionKey, GaussianPosterior]  # mean, variance, n_evidence
  cycle_history: List[(cycle_id, snapshot)]

Persisted via the existing checkpoint path. MECH-275’s per-region per-domain posterior is the durable self/place model that survives across episodes.

Routing-gate contract

routing_gate.route(event) -> RoutedEvent with RoutedEvent.anchor_channel: float and RoutedEvent.probe_channel: float. Both consumers downstream multiply their write strength by the channel weight. This is the single place MECH-272’s state-conditioned routing is enforced.

MECH-285 sampler contract

SleepReplaySampler.draw(staleness_snapshot) -> AnchorRef:

def draw(self, staleness_snapshot):
    seeds = self._anchor_set.all_with_dual_trace()
    weights = softmax([staleness_snapshot.get(s.region_key, 0.0) / self.temperature
                       for s in seeds])
    return numpy.random.choice(seeds, p=weights)

Stateless across draws within a cycle; the snapshot is frozen at entry.

StepHarness write-path audit (GAP-6)

Completed: 2026-05-15. Every write site reachable from the five sleep entry/exit/pass methods was walked and classified. The audit question per sleep_substrate_plan.md Phase 5: does any sleep-side write site need to be re-routed through StepHarness, or does each site have a documented architectural exception?

Note on StepHarness location. StepHarness lives in experiments/_harness.py (line 106), NOT in ree_core/. It enforces the canonical waking per-tick sequence (sense -> _e1_tick -> generate -> update_z_goal -> update_residue -> act). Sleep-period writes are architectural exceptions to the waking sequence by design; they cannot and should not call the harness.

Write sites and exception classifications

Site	Method	Exception class	Reason
`e1.context_memory.write(e1_input)`	`run_sws_schema_pass()`	Intentional offline schema installation	Offline gate temporarily lifted (`e1._offline_mode = False`) to write schema content, then restored. This IS the designed offline write; not a waking experience write. MECH-272 anchor_channel scaling applied before write: `e1_input = e1_input * anchor_weight`.
`e1.shy_normalise(decay)`	`enter_sws_mode()`	Weight-decay write; not an experience write	The canonical exception cited in the GAP-6 spec. Modifies `context_memory.memory.data` in-place (slot weight decay toward slot mean). No gradient flow; no replay content.
`e2_harm_s.parameters()` via `offline_gradient_pass()`	`SelfModelAggregator.offline_gradient_pass()` (called from `_run_cycle()` WRITEBACK phase)	Explicit MECH-094 exception	Only MECH-094-scoped exception in the entire sleep path. Adam optimizer constructed locally over `e2_harm_s.parameters()` ONLY; no other module’s parameters are touched. Bounded MSE steps targeting SWS-snapshot posterior means.
Serotonin state transitions	`enter_sws_mode()`, `enter_rem_mode()`, `exit_sleep_mode()`	Internal neuromodulator state	`serotonin.enter_sws()`, `serotonin.enter_rem(current_precision=...)`, `serotonin.exit_sleep()` update tonic_5ht / phase flags. Not experience writes; not residue writes.
`pcc.note_offline_entry()`, `pacc.note_offline_entry()`	`enter_offline_mode()` (called by `enter_sws_mode()` and `enter_rem_mode()`)	Internal counter resets	Resets `_steps_since_offline` counter and success EMA state. No content writes.
Bayesian aggregator `update()` and `snapshot()`	`_run_cycle()` SWS and REM loops	Internal aggregator posterior state	Updates `GaussianPosterior` (mean, variance, n_evidence) per-domain per-region. Not a `ResidueField` write, not a `ContextMemory` write. `snapshot()` deep-copies live posteriors at PHASE_SWITCH.
`staleness.partial_decay(replayed_regions, decay_factor=...)`	`_run_cycle()` WRITEBACK	MECH-284 internal staleness state	Multiplicative decay of region-keyed staleness scalars. Not an experience write; not residue.

Audit result: ALL sleep-side write sites are documented exceptions. Zero sites require re-routing through StepHarness. The GAP-6 acceptance criterion is satisfied.

Phase ordering note

hypothesis_tag=True is enforced throughout run_rem_attribution_pass() so no ResidueField writes occur during the REM pass. All _score_trajectory() calls are read-only terrain evaluations. The offline gate (e1._offline_mode = True) is set by enter_offline_mode() at the start of both SWS and REM modes, suppressing all waking observation writes from context_memory.write() on the normal sense path. The SWS schema pass temporarily lifts the gate only for its own intentional schema writes, then restores it.

Validation plan

Phase	What lands	Validation experiment	Acceptance criterion
A	Sleep loop scaffolding (`phase_manager`, default-no-op `replay_sampler`, `routing_gate`, `bayesian_aggregator` stubs); SD-017 SWS/REM phase contract	V3-EXQ-NNN smoke: agent runs N waking episodes, enters sleep, returns to waking; bit-identical waking trajectory with `use_sleep_loop=False`	bit-identical OFF; sleep cycle completes without exception ON
B	MECH-285 offline arm (`SleepReplaySampler` reads StalenessAccumulator snapshot, broad pool, staleness-softmax)	EXP-0168 (already drafted in proposals): high vs low waking trigger load over region R	sleep replay event count over R scales monotonically with `staleness[R]` at sleep entry, 2/2 seeds
C	MECH-272 routing gate: state-conditioned channel weights, wired into MECH-271 router	V3-EXQ-NNN: anchor-channel-only vs probe-channel-only sleep cycle	E1 ContextMemory updates only with anchor channel ON; aggregator posterior updates only with probe channel ON
D	MECH-275 general Bayesian aggregator: per-domain posteriors, update from probe-channel events, snapshot+decay contract	EXP-0169 (already drafted): seed waking with biased self-attribution; sleep aggregator should correct it	mean of `self`-domain posterior shifts toward true causal_sig by >= 0.5 SD across 3 sleep cycles
E	MECH-273 self-model offline gradient pass to E2_harm_s	V3-EXQ-NNN: with vs without offline writeback; measure E2_harm_s prediction residual on held-out tuples	residual decreases monotonically across 5 sleep cycles with writeback ON; flat or increasing OFF

EXP-0168 and EXP-0169 are already in experiment_proposals.v1.json (status=gated). They unblock at Phase B and Phase D respectively.

Falsifiability and clinical mappings

Three failure modes per cluster member are predicted and would falsify the claim:

Claim	Failure mode	Predicted phenotype
MECH-272	Routing weights do not flip across phase	sleep cycle produces no aggregator updates; or waking writes get probe-routed
MECH-273	Offline gradient pass diverges or overcorrects	E2_harm_s residuals oscillate; SD-003 causal_sig drifts cycle-over-cycle
MECH-275	Posterior over place V_s diverges from MECH-284 ground truth	sleep map never converges to schema-staleness; replay priority becomes random
MECH-285	Replay event count over high-staleness region indistinguishable from low-staleness	EXP-0168 FAIL: priority is salience-driven only; staleness signal is inert

Clinical mappings (Daniel’s interest; intentionally lightly stated):

PTSD intrusive replay: MECH-094 tag loss + intact MECH-285 staleness priority -> traumatic trace gets high replay priority but is mis-routed (waking intrusion). Treatment target: tag function, not staleness suppression.
Confabulation in dementia: MECH-273 offline writeback proceeds but on a corrupted SD-003 causal_sig (E2_harm_s mis-predicting); the self-model accumulates stable false attributions across cycles. Distinct from psychosis (MECH-094 acute tag failure).
Depression rumination: posterior in self domain drifts toward elevated harm-attribution baseline; replay priority biased toward most-negative regions; offline writeback amplifies the bias rather than correcting it. Plausible target: temperature increase in MECH-285 sampler, staleness leak rate increase in MECH-284.

Out of scope (deferred to V4)

SD-037-driven sleep entry (sustained low-arousal trigger). Currently V3 uses deterministic K-episode trigger.
Object-schema and social-attribution domains (MECH-274 + MECH-275 object specialisation).
SHY synaptic homeostasis (MECH-120) on the offline gradient pass.
Spindle-equivalent thalamic gating (MECH-122) on observation stream during sleep – V3 uses a hard switch in phase_manager.
Bidirectional ThetaBuffer (MECH-122 Phase 3 rewiring) – V3 routing-gate is destination-only.
Mode-conditioned replay-content selection beyond the staleness-vs-salience separation (Joo & Frank 2018 multi-mode SWR). V3 single-priority sampler is sufficient.

Implementation plan (build order)

#	Module / change	Claim(s) implemented	Estimated session count	Validation gate
1	`ree_core/sleep/__init__.py`, `phase_manager.py`, SleepPhase enum, REEAgent.step() integration, `use_sleep_loop` flag default False	– (scaffolding for SD-017)	1	smoke: bit-identical waking OFF/ON
2	`sleep/replay_sampler.py` SleepReplaySampler; AnchorSet.all_with_dual_trace() helper; explicit StalenessAccumulator inactive-region contract in docstring	MECH-285 (offline arm)	1	EXP-0168 PASS
3	`sleep/routing_gate.py` RoutingGate; extend MECH-271 router consumer (HippocampalRouter) with channel-weight multiplication	MECH-272	1	C-phase ablation experiment
4	`sleep/bayesian_aggregator.py` GeneralBayesianAggregator; per-domain GaussianPosterior; cycle snapshot + decay	MECH-275	1-2	EXP-0169 PASS
5	`sleep/self_model_aggregator.py` SelfModelAggregator (subclass); offline E2_harm_s gradient pass; MECH-094 tag enforcement audit	MECH-273	1	E-phase residual experiment

Each step lands behind its own master flag (use_sleep_loop, use_mech285_sampler, use_mech272_routing, use_mech275_aggregator, use_mech273_self_model). Defaults all False so the cluster lands incrementally without affecting any in-flight experiment. The contract tests (tests/contracts/) get one new file per step asserting bit-identical behaviour with the flag OFF and the new behaviour with the flag ON.

Total estimate: 5-6 implementation sessions, 5 validation experiments. The first two steps unblock EXP-0168 (already drafted, gated). All five steps unblock EXP-0169.