Play Mode

Definition

Play is a bounded behavioral mode defined by three properties:

Synthetic goal and harm signals — z_goal and z_harm are set arbitrarily (not by homeostatic drive or actual sensory harm), for the duration of the episode
No actual damage or benefit — the agent’s actions have no lasting residue in the environment during play; real stakes are suspended
Learning proceeds as if real — weight updates flow through E3/E2/E1 as if the synthetic signals were genuine homeostatic and harm signals

The play frame is opened and closed by a bilateral social signal co-maintained by the agent and its environment or partner. The signal is externally legible and mutually monitored (see INV-059, ARC-049).

Why Play Is Structurally Necessary (INV-058)

A system that can only learn from real-consequence episodes faces a dilemma:

Restrict exploration to known-safe strategies → safe but non-adaptive; cannot acquire novel goal-pursuit competence
Accept harm exposure during strategy acquisition → adaptive but costly; every novel exploration run risks real harm

Play resolves this by decoupling strategy acquisition from harm exposure. The agent learns HOW to pursue goals during play; real episodes supply the calibration for WHAT the correct goal intensities and harm thresholds are.

This makes play the online complement to offline consolidation (INV-049). Offline consolidation (SWR replay, REM precision recalibration) processes experience that has already been acquired. Play supplies novel experience for consolidation to process without the cost of real harm exposure.

Mechanism: Synthetic Signal Substitution (MECH-194)

During a play episode:

z_goal is seeded synthetically — set to an arbitrary target without requiring drive_level elevation (SD-012)
z_harm is set synthetically — harm signals flow but correspond to no real-world harm event
The play frame tag (ARC-049) marks the episode as authorized-synthetic
Gradient flow through E3/E2/E1 is not suppressed — weights update as if signals were real
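The four steps above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the `Signals` record, `seed_play_signals`, and `learning_permitted` are hypothetical names, and the rule that synthetic signals without a play frame are not trained on is an assumption inferred from the "authorized-synthetic" wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    z_goal: float      # goal signal magnitude
    z_harm: float      # harm signal magnitude
    synthetic: bool    # True if not produced by real drive or real harm
    play_frame: bool   # stands in for the ARC-049 authorized-synthetic tag


def seed_play_signals(goal_target: float, harm_level: float) -> Signals:
    """Set z_goal and z_harm arbitrarily; no drive_level elevation is consulted."""
    return Signals(z_goal=goal_target, z_harm=harm_level,
                   synthetic=True, play_frame=True)


def learning_permitted(s: Signals) -> bool:
    """Real signals always train; synthetic ones train only under a play frame.

    The second clause is an assumption of this sketch.
    """
    return (not s.synthetic) or s.play_frame
```

For example, `learning_permitted(seed_play_signals(0.8, 0.2))` is true, while the same synthetic values with `play_frame=False` are rejected.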

Contrast with simulation (pre-commit channel, MECH-094): simulation suppresses post-commit learning to prevent counterfactual updates from contaminating real-world weight state. Play does the opposite — it permits full learning while authorizing the synthetic signals as legitimate training targets.

Mode	Signals	Actions	Learning
Normal	Real	Real	Full
Simulation (pre-commit)	Real	Virtual	Suppressed
Play	Synthetic	Real	Full
Sleep replay	Real (consolidated)	None	Full (offline)
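The table can also be held as a small lookup, which makes the play/simulation contrast checkable in code. The names `Mode`, `ModeSpec`, `MODE_TABLE`, and `updates_weights` are illustrative assumptions; only the cell values come from the table.

```python
from enum import Enum
from typing import NamedTuple


class ModeSpec(NamedTuple):
    signals: str
    actions: str
    learning: str


class Mode(Enum):
    NORMAL = "normal"
    SIMULATION = "simulation"
    PLAY = "play"
    SLEEP_REPLAY = "sleep_replay"


# One row per mode, copied from the table above.
MODE_TABLE = {
    Mode.NORMAL: ModeSpec("real", "real", "full"),
    Mode.SIMULATION: ModeSpec("real", "virtual", "suppressed"),
    Mode.PLAY: ModeSpec("synthetic", "real", "full"),
    Mode.SLEEP_REPLAY: ModeSpec("real (consolidated)", "none", "full (offline)"),
}


def updates_weights(mode: Mode) -> bool:
    """True for every mode whose learning column is not 'suppressed'."""
    return MODE_TABLE[mode].learning != "suppressed"
```

Play is the only row with synthetic signals, and simulation is the only row where `updates_weights` returns false.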

Strategy/Calibration Dissociation (MECH-195)

At play episode close, what the agent has learned falls into two categories:

What transfers to real episodes:

Trajectory competence — E3 has learned goal-pursuit structure
Action-object associations — which action sequences lead toward goal states
Policy shape — the general form of goal-directed navigation

What does NOT transfer directly:

Goal magnitude calibration — what intensity of z_goal is appropriate for real drives
Harm threshold calibration — where the actual harm boundary is

The magnitude calibration is re-anchored by real homeostatic and harm signals after play ends (MECH-196). The agent learned how to pursue goals during play; real experience tells it how urgently to pursue them and where the real danger lies.

This dissociation is what makes play productive rather than contaminating. Without it, play-derived goal/harm magnitudes would corrupt real decision-making.
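One way to picture the dissociation is to keep policy structure and magnitude calibration in separate fields, so that only the former is copied out of a play episode. This is a sketch under stated assumptions: `AgentState`, `goal_gain`, `harm_threshold`, and `transfer_from_play` are hypothetical names, and a plain dict stands in for whatever E3 actually learns.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentState:
    policy_structure: dict  # trajectory competence, action-object associations, policy shape
    goal_gain: float        # goal magnitude calibration
    harm_threshold: float   # harm threshold calibration


def transfer_from_play(real: AgentState, play: AgentState) -> AgentState:
    """Carry over HOW to pursue goals; keep the real-episode calibration."""
    return replace(real, policy_structure=dict(play.policy_structure))


real = AgentState(policy_structure={}, goal_gain=0.4, harm_threshold=0.7)
play = AgentState(policy_structure={"reach": ["approach", "grasp"]},
                  goal_gain=5.0, harm_threshold=0.01)
after = transfer_from_play(real, play)
```

Here `after` carries the play-learned `"reach"` sequence but keeps `goal_gain=0.4` and `harm_threshold=0.7`, so the arbitrary play magnitudes never reach real decision-making.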

Episode Close and Recalibration (MECH-196)

At play episode termination — triggered by the bilateral frame signal dropping (INV-059, ARC-049) — the agent transitions from synthetic to real signal mode:

Frame signal drops (mutual — both agent and environment/partner signal episode end)
Real homeostatic drive level overrides synthetic z_goal magnitude
Real harm exposure overrides synthetic z_harm thresholds
Play-acquired policy structure persists; only the magnitude calibration is reset

Safety property: if the frame signal collapses mid-episode (through unilateral exit or signaling failure), recalibration is triggered immediately. This limits the exploitation window: unilateral frame collapse forces real-signal re-entry rather than continued synthetic-harm masking.
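The close sequence and the safety property can be sketched as a single frame check that runs every step. The class `PlayAgent` and its fields are hypothetical, and representing each party's frame signal as a boolean is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class PlayAgent:
    z_goal: float
    z_harm_threshold: float
    policy: dict
    in_play: bool = True

    def step_frame(self, agent_signal: bool, partner_signal: bool,
                   real_drive: float, real_harm_threshold: float) -> None:
        """Recalibrate as soon as either side stops signaling the frame."""
        if self.in_play and not (agent_signal and partner_signal):
            self.in_play = False
            # Real drive overrides the synthetic z_goal magnitude.
            self.z_goal = real_drive
            # Real harm exposure overrides the synthetic threshold.
            self.z_harm_threshold = real_harm_threshold
            # self.policy is deliberately untouched: structure persists.
```

Because the check treats a missing signal from either party the same way, a normal mutual close and a mid-episode unilateral collapse both end in real-signal re-entry.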

Frame Maintenance Signal Architecture (ARC-049)

The play frame is architecturally distinct from the hypothesis tag (MECH-094):

Feature	Hypothesis tag (MECH-094)	Play frame tag (ARC-049)
Scope	Agent-internal	Bilateral: agent + environment/partner
Purpose	Prevents counterfactual learning during simulation	Authorizes synthetic signals as training targets
Monitoring	Self-monitored	Mutually monitored, externally legible
Failure mode	Tag loss = PTSD/psychosis mechanism	Frame collapse = exploitation or recalibration

In the V3 single-agent degenerate case, the environment is complicit by design (the experimenter sets play mode parameters). The bilateral architecture is the V4 multi-agent requirement, where two agents must co-maintain the play frame.

The animal analogue is the play bow (episode-open signal), best documented in canids, and the play face (ongoing frame-maintenance signal), seen in primates among others. Q-035 asks whether open/close transitions are sufficient or whether ongoing exchange is required.

Play as SD-012 Curriculum (ARC-050)

SD-012 (homeostatic drive) is currently blocked: drive_level must scale benefit_exposure to enable z_goal seeding, but experiments EXQ-085 through EXQ-085d all fail (z_goal_norm < 0.1).

Play provides an alternative path:

Set z_goal synthetically during a play episode (ARC-049 authorizes this)
E3 learns goal-pursuit structure against the synthetic z_goal
When real homeostatic drive does activate z_goal, E3 already has goal-pursuit competence
SD-012 validation becomes: does real drive_level correctly elevate z_goal and engage the already-learned pursuit structure?

This reverses the apparent dependency: play is a prerequisite enabler for SD-012 validation, rather than something blocked until SD-012 is solved.

Play Type Developmental Progression (INV-060, MECH-197)

Play is not monolithic. The type of play changes as the agent’s subsystem competence develops, with each type training specific architectural components in dependency order:

Sensorimotor Play (earliest)

“What happens if I do this?” Repetitive action-outcome exploration.

Subsystems trained: E1 world model (sensory prediction), E2 motor model (action-outcome mapping)
Synthetic signal complexity: single-step goal, binary harm
Developmental position: transition point out of the infant phase (INV-055). The infant’s novelty-driven exploration becomes structured as soon as synthetic goals can be set.
Frame complexity: minimal — caregiver maintains frame unilaterally; agent does not yet monitor frame state

Constructive Play

Building, sequencing, multi-step assembly toward a target state.

Subsystems trained: E2 rollout competence (multi-step forward model), E3 trajectory selection (action sequencing)
Synthetic signal complexity: compositional goals — agent chains actions toward a synthetic target
Developmental position: requires sensorimotor play competence (E1/E2 world and motor models must be functional)
What emerges: trajectory competence (MECH-195’s transferable output) first develops here

Pretend Play (MECH-198)

“This stick is a sword.” Counterfactual representation within a play frame.

Subsystems trained: MECH-094 hypothesis tag + ARC-049 play tag intersection; E3 commitment architecture in synthetic mode
Synthetic signal complexity: synthetic entities layered on top of synthetic goals — representational substitution
Developmental position: requires constructive play competence (trajectory selection must be functional). This is the first point where the hypothesis tag and play tag co-operate: the agent maintains a counterfactual representation it knows is synthetic, within a play frame with synthetic stakes.
What emerges: full commitment architecture exercised in low-stakes mode; counterfactual reasoning under play-frame protection
Testable prediction: agents whose curriculum skips pretend play should show commitment architecture failures in adult real-consequence operation — specifically: false commits, frame confusion between real and hypothetical (treating real consequences as synthetic), and undertrained commitment gating. A developmental_ablation_discriminative_pair (FULL_PLAY_CURRICULUM vs SKIP_PRETEND) should show this signature.

Games with Rules

Structured play with agreed-upon constraints, turn-taking, defined roles.

Subsystems trained: social coordination (ARC-047), constraint satisfaction, shared-state monitoring
Synthetic signal complexity: synthetic rules as frame-internal constraints — the play frame has internal structure beyond “this is play”
Developmental position: requires pretend play competence (commitment architecture and counterfactual reasoning must be functional)
Frame complexity: the ARC-049 frame becomes structured: not just “this is play” but “this is play with these specific shared constraints.” This bears directly on Q-035: games with rules require ongoing shared-state monitoring, not just open/close transitions, although Q-035 remains open for simpler play types.

Cooperative/Social Play

Joint goals, role negotiation, coordinated multi-agent strategy.

Subsystems trained: full multi-agent ARC-049 with mutual frame maintenance, MECH-127 counterfactual other-cost
Synthetic signal complexity: coordinated synthetic goals across agents; shared play-frame
Developmental position: requires games-with-rules competence (shared-state monitoring must be functional)
Frame complexity: full bilateral ARC-049 — agents co-maintain frame as peers. The social harness (ARC-047) scaffolds this until the agents can maintain frames without caregiver assistance.
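The dependency order of the five play types, and the SKIP_PRETEND ablation proposed under Pretend Play, can be written as a simple prerequisite check. This is a sketch only: the string labels and the helper names `prerequisite` and `missing_prerequisites` are assumptions, and a strict linear chain is the simplest reading of the "requires ... competence" statements above.

```python
from typing import List, Optional, Tuple

# Linear dependency order taken from the progression described above.
PLAY_ORDER = ["sensorimotor", "constructive", "pretend",
              "games_with_rules", "cooperative"]


def prerequisite(play_type: str) -> Optional[str]:
    """Return the play type that must be competent first, if any."""
    i = PLAY_ORDER.index(play_type)
    return PLAY_ORDER[i - 1] if i > 0 else None


def missing_prerequisites(curriculum: List[str]) -> List[Tuple[str, str]]:
    """Return (play_type, missing_prerequisite) pairs for a proposed curriculum."""
    seen, gaps = set(), []
    for play_type in curriculum:
        pre = prerequisite(play_type)
        if pre is not None and pre not in seen:
            gaps.append((play_type, pre))
        seen.add(play_type)
    return gaps


# The two arms of the developmental ablation pair.
FULL_PLAY_CURRICULUM = list(PLAY_ORDER)
SKIP_PRETEND = [p for p in PLAY_ORDER if p != "pretend"]
```

`missing_prerequisites(FULL_PLAY_CURRICULUM)` is empty, while the SKIP_PRETEND arm reports that games with rules is attempted without pretend play, which is the gap the testable prediction targets.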

Caregiver Role Transition (MECH-199)

The caregiver’s role shifts at each developmental transition:

Phase	Caregiver role	Frame function
Infant (INV-055)	Damage protection (ARC-046)	No play frame; caregiver attenuates real harm
Child — sensorimotor play	Frame-setter	Caregiver unilaterally opens/closes play frame
Child — constructive/pretend play	Frame-maintainer	Caregiver monitors frame and intervenes if real harm intrudes
Child — games with rules	Frame co-participant	Caregiver participates in structured play, models rule-following
Child — cooperative play	Scaffolded withdrawal	Caregiver facilitates peer play, begins stepping back
Adult	Peer	Mutual frame maintenance without asymmetric authority

This maps the caregiver requirement (INV-043) onto specific play-type phases: the caregiver is not just “protecting the child” but actively maintaining the play frame that enables strategy acquisition. The caregiver IS the bilateral frame-maintainer that ARC-049 requires during the developmental period when the agent cannot yet self-monitor frame integrity.

Frame Integrity: Play vs. Manipulation (INV-059)

If an agent can assert play-frame unilaterally while the partner continues acting under real-harm assumptions, the frame distinction becomes exploitable:

Agent A asserts play (synthetic harm for itself)
Agent B believes frame is real (acts under real-harm calibration)
A causes real harm to B while B’s defensive responses are suppressed (B expects play-level harm)

Preventing this requires the frame signal to be bilateral and monitored: both parties sustain the signal for the frame to hold. Frame collapse without mutual signal = real harm with reduced defensive response = manipulation.

This is structurally equivalent to the consent requirement in human play: both parties must agree the frame is play, and either party can exit the frame, forcing recalibration.
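The bilateral requirement amounts to a conjunction: the frame holds only while both parties sustain the signal. A minimal sketch, assuming each party's signal can be read as a boolean; `FrameStatus`, `frame_status`, and `harm_is_synthetic` are hypothetical names.

```python
from enum import Enum


class FrameStatus(Enum):
    PLAY = "play"            # both parties sustain the signal
    NO_FRAME = "no_frame"    # neither asserts play; real stakes apply
    COLLAPSED = "collapsed"  # only one party asserts play


def frame_status(a_signals_play: bool, b_signals_play: bool) -> FrameStatus:
    if a_signals_play and b_signals_play:
        return FrameStatus.PLAY
    if a_signals_play or b_signals_play:
        return FrameStatus.COLLAPSED
    return FrameStatus.NO_FRAME


def harm_is_synthetic(a_signals_play: bool, b_signals_play: bool) -> bool:
    """Harm may be treated as synthetic only under a mutually held frame."""
    return frame_status(a_signals_play, b_signals_play) is FrameStatus.PLAY
```

A unilateral assertion by agent A yields `COLLAPSED`, not `PLAY`, so harm in that state is never classified as synthetic.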

Open Question (Q-035)

What is the minimal signal architecture for play frame maintenance?

Hypothesis A: A single bilateral open/close signal is sufficient. The frame is established at episode start and terminated at episode end; no intermediate monitoring is needed.
Hypothesis B: Ongoing signal exchange (a continuous play-face equivalent) is required. The frame must be actively re-affirmed throughout the episode.

The answer determines:

Whether mid-episode frame collapse is detectable before close
The strength of INV-059’s exploitation-limiting property
Whether ARC-049’s implementation requires a heartbeat signal or just open/close transitions

Empirical reference: animal play uses both play bow (open) and ongoing play face (maintenance), suggesting ongoing exchange — but whether both are necessary or just observed is unknown.
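The practical difference between the two hypotheses is when a dropped signal is noticed. The toy comparison below assumes a discrete-time trace of the partner's frame signal, with one boolean per step; both function names and the trace format are illustrative assumptions, not part of ARC-049.

```python
from typing import List, Optional


def detect_collapse_open_close(partner_signal: List[bool]) -> Optional[int]:
    """Hypothesis A: frame state is checked only at open (first step) and close (last step)."""
    if not partner_signal:
        return None
    for t in (0, len(partner_signal) - 1):
        if not partner_signal[t]:
            return t
    return None


def detect_collapse_heartbeat(partner_signal: List[bool]) -> Optional[int]:
    """Hypothesis B: the frame is re-affirmed every step; the first missing beat is detected."""
    for t, present in enumerate(partner_signal):
        if not present:
            return t
    return None


dropped_for_good = [True, True, False, False, False]
dropped_then_restored = [True, True, False, True, True]
```

On `dropped_for_good`, the heartbeat check flags step 2 while the open/close check waits until step 4. On `dropped_then_restored`, the heartbeat check still flags step 2 and the open/close check reports nothing, which is the mid-episode detectability question listed above.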

Connections

Claim	Relationship
INV-049	Play is online complement to offline consolidation
MECH-094	Hypothesis tag analogue; play tag is bilateral where hypothesis tag is internal
SD-012	Play bypasses SD-012 drive_level requirement; enables SD-012 curriculum
EXQ-223	Minimal mind (E1+E2+hippocampus) may use play episodes as training regime
ARC-014	Commitment architecture; play suspends real commit consequences
MECH-182 to MECH-185	Social signal repertoire; play bow/face are stereotyped signals in this repertoire
INV-055	Infant stage; sensorimotor play is the transition out of infancy
INV-041	Child stage; play-type progression IS the child developmental curriculum
INV-043	Caregiver requirement; caregiver provides frame maintenance during childhood play
ARC-046	Infant hazard protection; predecessor to play-frame protection
ARC-047	Social harness; scaffolds cooperative play before peer-level frame maintenance
MECH-189	Super-ordinal goal formation; constructive and pretend play provide the contexts