SD-034: Governance Closure Operator
Claim ID: SD-034 Subject: governance.closure_operator Status: IMPLEMENTED 2026-04-20 (validated 2026-04-21; design doc backfilled 2026-04-27) Registered: 2026-04-20 Depends on: SD-033 (governance cluster), SD-033a (lateral PFC analog rule_state), MECH-090 (BetaGate commitment latch), MECH-260 (dACC action-class No-Go), MECH-094 (hypothesis tag write gate), SD-032a (SalienceCoordinator), SD-032b (dACC analog) Blocks: EXP-0156 / V3-EXQ-460 (verified-but-not-released), EXP-0157 / V3-EXQ-461 (delayed-reward persistence), EXP-0162 / V3-EXQ-466 (satisficing / residue discharge), EXP-0164 / V3-EXQ-468 (commitment vs contradiction). All landing-diagnostic variants PASSed 2026-04-21.
Problem
REE-V3 has the substrate for Go (basal-ganglia-like gate, SD-032a operating-mode register), Hold (MECH-090 BetaGate bistable commitment latch), and No-Go (MECH-260 dACC action-class suppression). What is missing is the operator that turns evaluation off on successful completion. Without it, a committed rule_state that has been satisfied keeps being evaluated, keeps consuming mode budget, keeps generating residue, and the agent cannot disengage. The 2026-04-20 GAP MEMO and the OCD thought set identify this as the load-bearing missing piece in the governance layer — verified-but-not-released, delayed-reward-persistence, and satisficing failures all trace back to the absence of a closure operator.
Solution
Implemented as ree_core/governance/closure_operator.py (ClosureOperator, ClosureOperatorConfig, ClosureEvent). On detection of rule-completion, the operator emits a five-part coordinated “done” token:
- Beta release — calls
MECH-090 beta_gate.release()to drop the commitment latch. - No-Go injection — calls
MECH-260 dacc.inject_nogo(action_class, count)to bias the just-completed action class toward suppression on the next cycle. Uses the same FIFO mechanism as execution-recorded suppression but is semantically distinct (closure-driven rather than execution-recorded). - Residue discharge — calls
ResidueField.discharge_domain(z_world, factor, radius)for rule-domain multiplicative decay on RBF weights. A hard 1e-6 floor preserves the “residue cannot be erased” invariant; valence_vecs are not modified so 4-component valence is preserved (replay prioritisation remains faithful). - Salience signal — calls
SalienceCoordinator.update_signal("closure_event", value), which re-biases mode affinity towardinternal_planningvia registered affinity_weights (defaultinternal_planning=0.5). - dACC PE reset — calls
dacc.reset_episode_pe()and optionally installs adacc_pe_cap(MECH-268 saturation/reset coupling).
Completion detection
Two paths.
Tick path (default): the operator checks at every select_action() whether
||rule_state(t) - rule_state(t-1)|| < closure_rule_delta_thresholdforclosure_stable_ticksconsecutive ticks- AND
beta_gate_elevated == True - AND
current_mode in allowed_closure_modes - AND
sd_033a write_gate >= closure_min_sd033a_gate - AND
||rule_state|| > 0(guard against firing on an unset rule_state).
Explicit path: emit_closure(action_class, z_world, bypass_mode_conditioning=False). The experiment hook for controlled ablations.
Mode conditioning as falsifiability predicate
Mode conditioning generalises MECH-094’s hypothesis-tag write gate to the closure operation. Closure firing is blocked in internal_replay and offline_consolidation modes via allowed_closure_modes and via the sd_033a gate floor (write_gate("sd_033a") = 0.05 in internal_replay). The architectural commitment is testable: if MECH-090 + MECH-260 + MECH-094 tuning without closure produces the same behavioural signature as closure-with-MECH-094-restriction in follow-up behavioural variants, SD-034 is over-specification. The mode-conditioning predicate is the falsification handle.
ResidueField.discharge_domain API (added in same pass)
Multiplicative decay + sign-aware 1e-6 floor + radius-scoped in-domain selection via squared-distance comparison against (radius * bandwidth)^2. valence_vecs are NOT modified — the 4-component valence is preserved so replay prioritisation remains faithful.
DACCAdaptiveControl extensions
dacc_pe_capconfig field (absolute cap on precision-weighted PE after closure).inject_nogo(action_class, count)method.reset_episode_pe()method (distinct from fullreset()— preserves_action_historywhere the just-injected No-Go lives).
Agent wiring
REEAgent.__init__ instantiates closure_operator when use_closure_operator=True (requires use_lateral_pfc_analog=True and use_dacc=True; salience coordinator optional). select_action() calls closure_operator.tick() after action emission with current z_world, argmax(action_class), operating_mode, and sd_033a gate. reset() calls closure_operator.reset(). register_on_coordinator() wires closure_event into salience.config.affinity_weights at init.
Config surface
REEConfig.use_closure_operator(bool, default False) — master switchclosure_rule_delta_threshold(default 0.001)closure_stable_ticks(default 3)closure_require_beta_elevated(default True)closure_min_sd033a_gate(default 0.5)closure_nogo_injection_count(default 3)closure_residue_discharge_factor(default 0.5)closure_residue_discharge_radius(default 1.5)closure_signal_value(default 1.0)closure_reset_pe_ema(default True)closure_pe_cap_after(default None)closure_signal_affinity_internal_planning(default 0.5)
Backward compatibility
use_closure_operator=False by default → agent.closure_operator is None; every integration site is a no-op. Existing experiments unaffected. Bit-identical with closure_signal_affinity_internal_planning=0.0 and the master switch off.
Architecture Context
SD-034 is the first substrate landed in the SD-033 governance cluster (see evidence/planning/sd033_governance_plan.md). It is also the first consumer of the MECH-261 write-gate registry’s mode-conditioning predicate. Subsequent substrates in the cluster — MECH-266 (asymmetric mode hysteresis), MECH-267 (mode-conditioned hippocampal proposals), MECH-268 (dACC conflict saturation) — are independent extensions of existing modules; SD-034 is the only one that is a new module / new operator.
The five-part signal collocates several biologically-observed end-of-sequence signatures: OFC sequence-completion cells (Rich & Shapiro 2009; Schuck 2016), task-set disengagement (Collins & Frank 2014), and the post-completion No-Go refractory period (Mayr & Keele 2000). The collocation hypothesis is the architectural commitment: V3 treats them as one operator, EXP-0156 and EXP-0162 probe whether they should remain co-located or split.
Biological Grounding
- Rich & Shapiro 2009 — rat PFC strategy-switch neurons; transient activity at strategy completion (DOI 10.1523/JNEUROSCI.4732-08.2009).
- Schuck 2016 — human OFC encodes task-stage / state-space position; supports a “where in the sequence am I?” detector that closure can read from (DOI 10.1016/j.neuron.2016.08.019).
- Collins & Frank 2014 (OPAL) — D1/D2 striatal opponent dynamics produce task-set disengagement under DA modulation (DOI 10.1037/a0037015).
- Mayr & Keele 2000 — backward inhibition: post-completion refractory period on re-entry to the just-abandoned task set, six replicating experiments (DOI 10.1037/0096-3445.129.1.4).
- Smith & Graybiel 2013 — dual-operator action-bracketing in striatum + infralimbic cortex; flags closure substrate may be multi-region rather than OFC/ACC-only (DOI 10.1016/j.neuron.2013.05.038).
The 2026-04-27 SD-034 lit-pull (evidence/literature/targeted_review_sd_034/) recommends:
- Implement No-Go as graded score_bias decay rather than hard refractory gate. The V3 implementation matches: MECH-260’s FIFO action-class history produces a graded recency-bias suppression rather than a binary block.
- Closure-detection signal can fire from multiple substrates (not OFC/ACC only). The V3 implementation is consistent: the substrate commitment is the rule-state-delta-stability detector, not anatomical localisation. The architectural choice is substrate-agnostic.
- Post a transient negative bias to the just-completed rule via SD-033a’s per-candidate projection. The V3 implementation differs slightly: closure does not modify SD-033a’s bias weights directly; the negative bias is delivered through MECH-260 action-class suppression. The downstream effect is similar (next-cycle bias against the just-completed action class) but the substrate is different (action-class FIFO rather than rule-bias projection). For V4 reconsideration: routing the post-completion bias through SD-033a’s existing per-candidate bias projection would unify the bias channels.
What This SD Enables
Validation experiments queued + run + PASSed 2026-04-21:
- V3-EXQ-460 (EXP-0156 verified-but-not-released): substrate-landing diagnostic, 6 sub-tests covering backward compat, wiring, beta release, No-Go, pe reset, mode conditioning. PASS on smoke.
- V3-EXQ-466 (EXP-0162 satisficing / residue discharge): residue-discharge landing diagnostic, 5 sub-tests covering near attenuation, far spared, invariant preserved, closure→discharge end-to-end, distant-z spares. PASS on smoke.
- V3-EXQ-468 (EXP-0164 commitment vs contradiction): coupled SD-034 + MECH-268 PE-saturation diagnostic. PASS on smoke.
Behavioural variants with full E3 task loop + tolerance-band completion env are deferred. Both variants depend on phased rule_state training (not yet on the V3 roadmap) and on a CausalGridWorldV2 task-loop env extension that has not been authored.
Related Claims
- MECH-260 — dACC action-class suppression. Closure inject_nogo extends the existing FIFO mechanism with a closure-driven entry point.
- MECH-261 — write-gate registry. Closure consumes the mode-conditioning predicate to gate firing in internal_replay / offline_consolidation modes.
- MECH-262 — rule-bias projection from SD-033a. Closure reads SD-033a write_gate as a precondition.
- MECH-094 — hypothesis tag / categorical write gate. Closure mode-conditioning generalises this to the closure operation.
- MECH-268 — dACC PE saturation. Closure couples to MECH-268 via
closure_reset_pe_emaand the optionaldacc_pe_capinstall.
StepHarness write-path audit (commitment_closure:GAP-10)
Completed: 2026-05-17. All four governance write sites named in the GAP-10 spec were walked and classified. The audit question (per commitment_closure_plan.md Phase 8): does any commitment/closure write site need to be re-routed through StepHarness, or does each site have a documented architectural exception?
Note on StepHarness location. StepHarness lives in experiments/_harness.py, NOT in ree_core/. It enforces the canonical waking per-tick sequence (sense → record_transition → clock/E1/generate → update_z_goal → update_schema_wanting → select_action → env.step → update_residue). Governance writes that run inside select_action() are within-step-7 architectural exceptions by design — they fire after the canonical sense/ update_z_goal inputs are in place and before env.step consumes the action.
Write sites and exception classifications
| Site | Code location | Exception class | Notes |
|---|---|---|---|
SD-034 closure pulse (closure_operator.tick()) | agent.py:3525-3550, called at end of select_action(), after action tensor is formed | Within-select_action governance write | _current_latent (sense step 1) and mode/gate (update_z_goal step 4) are already in place. Closure fires after, not before, the canonical update steps. Hypothesis-tag guard (hypothesis_tag=False) enforced at call site — replay paths cannot trigger closure. |
MECH-260 dacc.record_action() | agent.py:3511-3518, end of select_action() | Within-select_action governance write | Execution-path recency recording — appends chosen action class to _action_history FIFO. Fires after action is chosen; consumed by next select_action() call. |
MECH-260 dacc.inject_nogo() | closure_operator._fire() → triggered by tick() at agent.py:3541 | Within-select_action governance write | Closure-driven No-Go injection (multiple record_action-equivalent pushes for the just-completed action class). Mechanistically identical to record_action() but semantically distinct; both are within step 7. |
MECH-268 dacc.reset_outcome_history() | closure_operator._fire() → agent.py:3541 | Within-select_action governance write | SD-034 closure hook: clears the MECH-268 saturation FIFO so the next rule-state starts fresh. Fires inside step 7 alongside the other closure sub-signals. |
MECH-268 dacc.reset_episode_pe() | closure_operator._fire() → agent.py:3541 | Within-select_action governance write | Rebaselines the precision-weighted PE EMA on closure. Distinct from reset() (which also clears action history — too destructive for a per-closure event). Within step 7. |
MECH-268 dacc.record_outcome() | EXQ-463 / EXQ-468 experiment scripts only; NOT in agent.py | Experiment-only unit test; canonical wiring deferred | Outcome class labeling after env.step() would naturally sit at StepHarness step 10 (after update_residue()). This call site is absent from the agent canonical path intentionally: env-level outcome tagging requires the GAP-3 env extension (counter-evidence/dual-cue hooks). EXQ-463 and EXQ-468 test the MECH-268 mechanism in isolation directly on a standalone DACCAdaptiveControl object, bypassing REEAgent. No routing gap; intentional deferral. |
SD-033a lateral_pfc.update() | agent.py:2835, inside select_action() after MECH-319 sim-mode gate check | Within-select_action governance write | z_delta/z_world from _current_latent (sense step 1); lpfc_gate from salience.write_gate("sd_033a") (update_z_goal step 4). Both prerequisites established by canonical harness steps before step 7. MECH-319 sim-mode gate (simulation_mode_rule_gate) guards against replay-path rule-state writes at this site. |
Audit result: ALL six write sub-sites are documented exceptions. Zero require re-routing through StepHarness. The one pending item (dacc.record_outcome() in canonical agent path) is a deferred wiring task gated on GAP-3 env extension, not a routing error. The GAP-10 acceptance criterion is satisfied.
Sequence alignment note
The four major sites all run inside select_action() (StepHarness step 7). By the time step 7 runs, the canonical prerequisites are in place:
_current_latent(z_world, z_self, z_delta) set bysense()at step 1.update_z_goal(benefit_exposure, drive_level)has run at step 4, setting salience gate state includingwrite_gate("sd_033a").- No subsequent harness step modifies these values until
env.step()at step 9.
There is therefore no ordering hazard between the governance writes and the canonical sequence.
Anchor Documents
- Anchor doc:
REE_assembly/evidence/planning/sd033_governance_plan.md - Source thought file:
docs/thoughts/2026-04-20_ocd4.md - Lit-pull review:
evidence/literature/targeted_review_sd_034/(6 entries, 2026-04-21 + 2026-04-27) - Implementation notes (full):
ree-v3/CLAUDE.mdlines 1476-1555