ARC-033: E2_harm_s Sensory-Discriminative Harm Forward Model

Claim ID: ARC-033 Subject: harm_stream.sensory_discriminative_forward_model Status: IMPLEMENTED Registered: 2026-03-24 Implemented: 2026-04-09 Depends on: SD-010, SD-011, ARC-027 Blocks: SD-003 counterfactual harm attribution pipeline

Problem

SD-003 self-attribution requires counterfactual harm queries: “what harm would have occurred if I had taken a different action?” The original SD-003 design used a HarmBridge projector (z_world -> z_harm) to compute counterfactuals. This is architecturally infeasible:

  • EXQ-093/094 confirmed bridge_r2=0: z_world perp z_harm by construction (SD-010)
  • SD-010 deliberately separates the harm stream from the world stream; HarmBridge has nothing to learn because the streams share no information by design
  • EXQ-094 confirmed training E3 on bridge noise produced 100x regression in harm_var

Additionally, the original single z_harm stream conflated two functionally distinct nociceptive signals. SD-011 (EXQ-178b PASS) confirmed dual-stream separation into:

  • z_harm_s: sensory-discriminative (Adelta-pathway analog) – immediate proximity, forward-predictable from action
  • z_harm_a: affective-motivational (C-fiber analog) – accumulated homeostatic deviation, NOT counterfactually modeled

The z_harm_s stream supports a learnable forward model because hazard proximity has predictable action-conditional structure: moving away from a hazard reduces proximity in a deterministic, action-dependent way.

Solution

E2HarmSForward: a dedicated learnable forward model f(z_harm_s_t, a_t) -> z_harm_s_{t+1}. Analogous to E2FastPredictor.world_forward() but operating on the sensory-discriminative harm stream. Implemented as ResidualHarmForward in ree_core/latent/stack.py (2026-04-09: also wrapped in dedicated module ree_core/predictors/e2_harm_s.py).

Key architectural decision: residual delta rather than direct next-state prediction. Identity collapse affects direct predictors on autocorrelated signals (r~0.9 for z_harm_s). Predicting the delta forces the network to learn change structure (approach increases proximity, retreat decreases it) rather than outputting the identity mapping.

EXQ-166e PASS confirmed the harm-delta architecture (delta_r2=0.641, causal_gap_approach=0.005). EXQ-195 confirmed harm_forward_r2=0.914 – the forward model component works. The attribution_gap failure in EXQ-195 was classified non-contributory (SD-003 pipeline wiring incomplete, not ARC-033 failure).

Redesigned SD-003 Pipeline (post-SD-011)

z_harm_s_cf = E2HarmSForward(z_harm_s_t, a_cf)
causal_sig  = E3.harm_eval_z_harm(z_harm_s_actual) - E3.harm_eval_z_harm(z_harm_s_cf)

This operates entirely within the harm stream. No cross-stream bridge required.

Architecture Context

E2HarmSForward is NOT part of the E2FastPredictor. It is a separate module that:

  • Takes z_harm_s (from HarmEncoder/SD-010) as input
  • Takes action_onehot as input (same encoding as E2.world_forward)
  • Predicts z_harm_s_{t+1} via residual delta

Training protocol (phased):

  • P0: HarmEncoder warmup (z_harm_s encoder trains on proximity supervision)
  • P1: E2HarmSForward trains on frozen z_harm_s targets (stop-gradient on z_harm_s inputs)
  • P2: Evaluation

Stop-gradient on z_harm_s during forward model training prevents the HarmEncoder from drifting to make predictions trivially easy (representation collapse). The harm stream is low-variance compared to z_world – small learning rate (5e-4) recommended.

What This Enables

SD-003 full counterfactual harm attribution pipeline:

  • z_harm_s_cf = E2HarmSForward(z_harm_s, a_cf) for any counterfactual action a_cf
  • causal_sig = E3(z_harm_s_actual) - E3(z_harm_s_cf)
  • This gives E3 a mechanism to distinguish agent-caused harm from environment-caused harm

Note: z_harm_a (affective-motivational, SD-011) does NOT need a forward model. It feeds E3 commit gating directly as motivational urgency (not used in counterfactuals).

Module Location

# Primary implementation (residual delta architecture):
from ree_core.latent.stack import ResidualHarmForward

# Dedicated wrapper with config (ARC-033):
from ree_core.predictors.e2_harm_s import E2HarmSForward, E2HarmSConfig

# Deprecated (identity collapse):
# from ree_core.latent.stack import HarmForwardModel  -- do not use

Configuration in config.py: E2HarmSConfig (standalone) or via LatentStackConfig.use_e2_harm_s_forward (disabled by default, backward compatible).

ML Engineering Notes

  • Stop-gradient on z_harm_s inputs to forward model during training (prevent encoder drift)
  • Small learning rate (5e-4, cf. E2 self forward model 3e-4)
  • Residual delta architecture (not direct next-state) – see ResidualHarmForward docstring
  • Phased training required: P0 encoder, P1 forward model, P2 eval
  • MECH-094: not applicable (waking forward model, not replay content)
  • Identity collapse risk: mitigated by residual delta + stop-gradient

Biological Grounding

Keltner et al. (2006, J Neurosci): predictability suppresses sensory-discriminative pain activity (S1/S2). The brain models expected nociceptive consequences of voluntary movement and cancels predicted input – the efference copy mechanism applied to the spinothalamic nociceptive pathway. This cancellation is NOT available for the affective stream (ACC/insula integrates homeostatic context beyond single actions).

The existence of predictive cancellation in S1/S2 but not ACC provides direct biological evidence that the sensory-discriminative stream (z_harm_s) supports a forward model while the affective stream (z_harm_a) does not. E2HarmSForward is the REE implementation of this predictive nociceptive mechanism.

Validation Experiment

V3-EXQ-264: ARC-033 E2_harm_s forward model validation (queued 2026-04-09)

  • Two conditions: E2_harm_s enabled vs disabled
  • Metrics: forward_r2, z_harm_s_pred_norm, harm_s_cf_gap
  • 3 seeds, phased training (P0 100ep + P1 50ep)
  • SD-003: self_attribution.counterfactual_e2_pipeline (unblocked by ARC-033)
  • SD-010: harm_stream.nociceptive_separation (prerequisite)
  • SD-011: harm_stream.dual_nociceptive_streams (prerequisite – z_harm_s stream)
  • ARC-027: harm_stream.parallel_thalamic_pathway (prerequisite)
  • MECH-112: approach_avoidance.go_channel (downstream – benefit-side counterpart)
  • ARC-030: approach_avoidance.symmetric_evaluation (downstream)

REE is developed by Daniel Golden (Latent Fields). Apache 2.0.