Three-Loop Learning Channel Separation

Source thought: docs/thoughts/2026-03-14_three_bg_systems_error_signals.md

Three BG-like cortico-striatal loops require distinct learning channels (ARC-021)

Claim Type: architectural_commitment Subject: basal_ganglia.three_loop_learning_channel_separation Status: candidate Claim ID: ARC-021

The REE architecture is organised around three functionally distinct cortico-striatal-like learning loops, each defined by a different error signal and a different level of representational abstraction. These are:

Sensorium loop — learns from sensory prediction error: the mismatch between predicted and observed sensory latent state. Corresponds to E1 in REE. Continuous; not gated by action. Maintains the “perceived present” — what the system expects to sense given current context.
Action-enacting loop — learns from motor-sensory error on the conceptual sensorium: the mismatch between predicted and actual effect of an action on the unified latent space z_gamma (where coherent sensory objects form). Corresponds to E2 in REE. Gated by action; operates at the level of objects, not raw sensory features.
Planning-gates loop — learns from harm and goal errors: realized harm vs. expected harm, goal achievement vs. goal deviation. Corresponds to E3 and HippocampalModule in REE. Outcome-level; gated by trajectory commitment; the residue field accumulates harm signal across trajectories.

These three loops are the structural substrate of the three-gate BG architecture (Q-019 model B). ARC-021 is the claim that this separation is required — not optional optimization — because the error signals are incommensurable (MECH-069).

Sensory prediction error, motor-sensory error, and harm/goal error are incommensurable (MECH-069)

Claim Type: mechanism_hypothesis Subject: latent_stack.three_loop_error_signal_incommensurability Status: candidate Claim ID: MECH-069

The three error signals that drive learning in the REE architecture are incommensurable: no one of them can be derived from or collapsed into another without loss of the information each carries.

Sensory prediction error (E1): “I failed to predict the sensory world correctly.” This error carries no information about which action caused the mismatch, or whether the mismatch was harmful. It indexes world-model fidelity.
Motor-sensory error (E2): “My action had a different effect on latent sensory objects than I expected.” This error requires knowing what action was taken — it is action-conditioned. It indexes motor-model fidelity. It does not carry information about whether the effect was harmful or goal-relevant.
Harm/goal error (E3 + residue): “This trajectory caused harm / failed to achieve the goal.” This error requires knowing the outcome of a committed trajectory. It indexes ethical and purposive fidelity. It does not carry information about sensory prediction quality or motor prediction quality.

Implication: collapsing these signals into a single scalar prediction error — as a naive world-model architecture would — misattributes credit across all three. A sensory mismatch would incorrectly update motor weights; a harmful outcome would incorrectly update sensory weights. The separation is not for efficiency; it is for correct credit assignment.

Relationship to MECH-058: timescale separation between E1 and E2 (MECH-058) is a consequence of incommensurability. Different error signals have different natural timescales — sensory prediction error is frequent and small; motor-sensory error is sparser (only at action steps); harm/goal error is slowest (only at trajectory completion). The timescale separation follows from the error signal separation, not the reverse.

Relationship to SD-003: the gap E2(z_t, a_actual) − E2(z_t, a_cf) is the agent’s causal signature on the object world. When this gap is near zero for all counterfactual actions, the latent transition was environment-caused; when the gap is large, the agent’s action was causally responsible. This E2-derived causal signature is the correct substrate for self-attribution precisely because E2’s error signal is action-conditioned and operates at the level of coherent objects — not raw sensory streams, not harm outcomes.

E2 is a conceptual-sensorium motor model with planning horizon exceeding E1 (MECH-070)

Claim Type: mechanism_hypothesis Subject: latent_stack.e2_conceptual_sensorium_motor_model Status: candidate Claim ID: MECH-070

E2 is NOT a raw sensory predictor. It is a motor model of the conceptual sensorium — the unified latent space z_gamma where all sensory modalities have been bound into coherent objects by the LatentStack.

E2 specifics:

Input: (z_gamma_t, action_t) — latent object-world state plus action
Output: z_gamma_{t+1} — predicted next object-world state
Error signal: MSE(E2(z_t, a_t), actual z_gamma_{t+1}) — motor-sensory mismatch at the object level
Training: supervised on (z_t, action, z_{t+1}) transition tuples recorded from actual execution

Why E2’s planning horizon must exceed E1’s:

E1 predicts the “perceived present” by projecting the sensory latent stream a short distance forward without action conditioning — it estimates what the system will sense if it continues in its current context. E2 predicts the causal chain of a motor act on the object world, which unfolds over a longer interval. For E2 to be useful for trajectory planning, it must be able to simulate further ahead than E1’s associative horizon:

E1 prediction_horizon: default 20 steps — how far ahead E1 simulates sensory context
E2 rollout_horizon: default 30 steps — how far ahead E2 simulates motor consequences

The planning horizon ordering E2 > E1 is architecturally required: if E2 only predicted as far as E1’s perceived present, trajectory planning would offer no planning advantage over simply following the sensory stream.

HippocampalModule chaining:

HippocampalModule chains E2 rollout kernels into even longer plans (hippocampal planning horizon » E2’s rollout horizon). E2 provides the motor-sensory kernels; the hippocampus chains them into multi-step trajectories navigated under residue-field terrain (ARC-018, MECH-033). The kernel metaphor in MECH-033 refers to E2’s single-transition outputs being composed into longer rollouts — not to E2’s individual planning horizon being short.

Connection to ARC-021: E2 is the “action-enacting loop” of the three-loop architecture. Its error signal (motor-sensory mismatch on z_gamma) is incommensurable with E1’s sensory prediction error and E3’s harm/goal error (MECH-069). This grounds why E2 must be trained separately from E1, with its own optimizer and its own transition buffer recording (z_t, action, z_{t+1}) tuples.

Developmental calibration and locking of the three loops (ARC-076 / MECH-335 / MECH-336)

ARC-021 / MECH-069 / ARC-023 say the three cortico-striatal loops are architecturally distinct (separate channels, incommensurable errors, distinct timescales). They do not say how the loops’ relative commitment gains are set. ARC-076 (registered 2026-05-17, from the targeted_review_play_commitment_loop_personality_window lit pull) supplies that: the loops are calibrated during a bounded juvenile/adolescent developmental window and then crystallized — an instance of INV-075’s LOCK arm — and the locked calibration is what REE should call personality.

Two-layer structure (forced by the McCrae 2000 counterweight, which shows basic-trait stability is largely endogenous, not environmentally written):

L1 — set-points / temperament. The basic drive-prior gains of the loops (above all the limbic-loop value prior: what is worth committing to) lock on a largely endogenous maturational timer. The environment does not write L1. INV-075 lock arm with an endogenous, schedule-driven trigger.
L2 — commitment policy / character. How those drive priors are gated into actual commitment under context is calibrated by the juvenile play/peer environment during the window and consolidated at closure. L2 is the environmentally-determined layer.

So personality is neither a pure environmental readout nor purely endogenous.

Child mechanisms:

MECH-335 — staggered ventral→dorsal windows. The loops do not share one critical period; the limbic/value loop (ventral striatum / OFC) opens earliest and is most environmentally sensitive, then associative, then sensorimotor, with consolidation that can lag window-close (Bijlsma 2023: mPFC change during the window, OFC change after). The ARC-076 calibration phase is therefore an ordered schedule of per-loop windows plus a post-window readout.
MECH-336 — maladaptive-lock failure mode. Adequate environmental input does not determine the locked value (Ham 2026: ~20% lock a maladaptive configuration despite abundant play). Two INV-075-shaped failure modes: lock-onto-bad-calibration / lock-too-early → rigid maladaptive personality; failure-to-lock → identity diffusion. This is REE’s hypothesised handle on personality pathology (a hypothesis, not a settled etiology) and the falsification surface for ARC-076.

Relation to V3-EXQ-543h / MECH-334. Same crystallization family, different target: 543h / MECH-334 locks the gated-policy diversity heads; ARC-076 locks the commitment-loop calibration. Both are LOCK-arm instances of INV-075; the 2026-05-17 handoff-failure lit pull established that handoff-alone is not the biological pattern for developmentally-established circuits, which is why ARC-076 is a lock claim. ARC-076 is a V3 design prescription — not yet implemented, no experiment queued at registration.