Part 1

Define the Context Contract

Context is not text around a model. It is the runtime boundary every turn must obey.

Context Is a Runtime Boundary

Reading Contract: Use this chapter to set the system boundary. Track what becomes durable, turn-local, injected, optional, and replaceable; by the end, prompt construction should feel like a projection over owned runtime state.

Figure: Codex context boundary stack, from durable ledger, turn-local envelope, injected fragments, optional planes, and replaceable checkpoints to a model-visible prompt.
The core move is to treat the prompt as a rebuildable view over named owners, rather than as the owner of history, policy, tool output, or compaction state.

The larger Codex architecture book treats the session runtime as the place where events, tools, policy, streaming, and durable state meet. This book starts one layer deeper: before a model can decide anything, Codex must decide what counts as context. Without that boundary, every later feature becomes a prompt concatenation trick. Tools leak observations forever. Policy changes get buried in old text. Compaction becomes lossy amnesia. Resume becomes guesswork.

Codex avoids that failure mode by making context a runtime-managed artifact. A turn does not send “the conversation” to the model. It sends a prompt projection of a thread ledger under a turn-specific envelope. The projection may include history, initial context, settings diffs, skill guidance, plugin guidance, hook context, memory summaries, tool outputs, images, and compaction summaries. Each piece has ownership and timing.

By the end of this chapter, you should see the system’s core move: context is treated like a mutable database view with audit logs, not like a text area.

This chapter is grounded in five areas of the code: the history manager (ContextManager), the turn context (TurnContext), context fragments (ContextualUserFragment), compaction entry points (InitialContextInjection), and the rollout reconstruction code.

The Boundary Codex Needs

An agent context system has to answer five questions every turn:

Question | Codex answer
What is durable? | Response items recorded in the thread history and rollout evidence.
What is turn-local? | The TurnContext envelope: model, cwd, policies, features, tools, and current runtime facts.
What is injected? | Typed context fragments rendered into model-visible messages.
What is optional? | Skills, plugins, memory, tool outputs, images, and other material with budgets or filtering.
What is replaceable? | Compacted history installed as a checkpoint with explicit replacement history.

That split matters because context changes at different speeds. The user prompt changes every turn. Environment metadata changes occasionally. Permission policy may change mid-thread. Skills may be explicitly mentioned for one turn. Tool outputs may be too large. Compaction may rewrite the old transcript entirely. One buffer cannot represent those lifetimes cleanly.

The important arrow is not the one into the model. It is the loop back into the rollout and history ledger. Codex keeps enough evidence to later rebuild why a prompt looked the way it did.

Naive Prompt vs Runtime-Managed Context

The simplest agent design treats the prompt as a mutable string. Each turn, the previous transcript is concatenated with the new user message and sent to the model. That design has predictable failure modes:

Failure mode | Why it happens with naive prompts | What Codex does instead
Permission drift | Stale policy text remains long after the policy changes. | Policy is computed per turn from the envelope and emitted as a diff fragment.
Tool flood | Large tool outputs consume the entire window. | Output truncation is applied at insertion time; the rollout still carries the full payload.
Modality crashes | An image-bearing message is sent to a text-only model. | History is normalized for the active model just before sampling.
Compaction amnesia | A summary loses paired call/output protocol shape. | Compaction installs replacement history that preserves protocol pairing.
Resume guesswork | Reloading a JSON dump of messages misses why the prompt was shaped that way. | Rollout reconstruction replays evidence to rebuild ledger and baseline together.

Each row corresponds to a Codex subsystem covered later. The point of the table is not the rows themselves but the underlying move: every failure mode of a naive prompt becomes a runtime concern with a named owner.
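
The tool flood row can be made concrete with a small sketch. It assumes a character budget and a plain-string ledger; `Rollout`, `Ledger`, `truncate_for_model`, and `record_tool_output` are hypothetical names for illustration, not the Codex implementation.

```rust
// Hypothetical sketch: none of these names come from the Codex source.

/// Append-only evidence log; keeps every tool output in full.
#[derive(Default)]
struct Rollout {
    tool_outputs: Vec<String>,
}

/// Model-visible history; stores a bounded copy of each output.
#[derive(Default)]
struct Ledger {
    items: Vec<String>,
}

/// Cut `text` to at most `max_chars` characters and mark the cut.
fn truncate_for_model(text: &str, max_chars: usize) -> String {
    let total = text.chars().count();
    if total <= max_chars {
        return text.to_string();
    }
    let kept: String = text.chars().take(max_chars).collect();
    format!("{}\n[... {} chars truncated ...]", kept, total - max_chars)
}

/// Record one tool output: the full payload goes to the rollout,
/// and a bounded copy goes to the model-visible ledger.
fn record_tool_output(
    rollout: &mut Rollout,
    ledger: &mut Ledger,
    output: &str,
    max_chars: usize,
) {
    rollout.tool_outputs.push(output.to_string());
    ledger.items.push(truncate_for_model(output, max_chars));
}
```

Because the cut happens at insertion, the window cost of an output is fixed when it enters the ledger, while the rollout can still answer what the tool actually returned.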

The Prompt Is a Projection

The prompt projection is assembled from state that is deliberately not all in the same object. ContextManager owns model-visible history. TurnContext owns the active turn envelope. The context module owns typed fragments. Compaction owns replacement history. Rollout reconstruction owns the logic for rebuilding the effective ledger after resume or fork.

This separation is more expensive than appending strings. It buys three properties Codex needs:

  • Reinterpretability. History can be normalized differently for a model that does not accept images or for a provider with a different truncation policy.
  • Diffability. Runtime facts can be compared with a reference baseline, so Codex can inject only meaningful settings changes.
  • Reconstructability. Durable rollout items can rebuild history after compaction, rollback, and resume.
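
Diffability is the easiest of the three to sketch. The example below assumes an envelope with three string fields and a line-per-change output; `Envelope` and `context_diffs` are hypothetical names, and the real envelope carries far more than this.

```rust
/// Hypothetical turn envelope; the field set is an assumption for illustration.
#[derive(Clone, Debug, PartialEq)]
struct Envelope {
    model: String,
    cwd: String,
    approval_policy: String,
}

/// Compare the current envelope with the reference baseline and return one
/// line per changed setting. An empty result means nothing needs injecting.
fn context_diffs(baseline: &Envelope, current: &Envelope) -> Vec<String> {
    let mut diffs = Vec::new();
    if baseline.model != current.model {
        diffs.push(format!("model: {} -> {}", baseline.model, current.model));
    }
    if baseline.cwd != current.cwd {
        diffs.push(format!("cwd: {} -> {}", baseline.cwd, current.cwd));
    }
    if baseline.approval_policy != current.approval_policy {
        diffs.push(format!(
            "approval_policy: {} -> {}",
            baseline.approval_policy, current.approval_policy
        ));
    }
    diffs
}
```

A turn whose envelope matches the baseline contributes no settings text at all, which is what keeps stale policy from piling up in the history.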

The following pseudocode is the pattern, not the implementation:

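// Pseudocode: illustrates the projection boundary, not the implementation.
// Resolve the turn-local envelope from configuration and runtime facts.
turnEnvelope = resolveTurnEnvelope(config, runtimeState)
// Load the durable, model-visible history for this thread.
ledger = loadThreadHistory(threadId)
// previousEnvelope is the reference baseline the diff is taken against.
ledger.record(contextDiffs(previousEnvelope, turnEnvelope))
ledger.record(userInput)
// Normalize a clone so that projection never mutates the ledger itself.
promptInput = ledger.clone().normalizeFor(modelCapabilities)
sendToModel(baseInstructions, promptInput, toolSpecs)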

The pattern matters because it makes prompt construction a repeatable runtime operation. If context is a projection, Codex can change how it projects without corrupting the underlying ledger.
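
A minimal Rust sketch of that property, assuming a two-variant item type and a single capability flag. `Item`, `ModelCapabilities`, `Ledger`, and `project_for` are hypothetical names, and replacing an image with a text placeholder is an illustrative choice rather than a description of what Codex does.

```rust
/// Hypothetical history item; real response items are richer than this.
#[derive(Clone, Debug, PartialEq)]
enum Item {
    Text(String),
    Image { description: String },
}

/// Hypothetical capability set for the active model.
struct ModelCapabilities {
    accepts_images: bool,
}

/// Hypothetical ledger of model-visible history.
struct Ledger {
    items: Vec<Item>,
}

impl Ledger {
    /// Build a model-ready view of the history. The method takes `&self`,
    /// so the compiler rejects any attempt to mutate the ledger here.
    fn project_for(&self, caps: &ModelCapabilities) -> Vec<Item> {
        self.items
            .iter()
            .map(|item| match item {
                // Text-only models get a placeholder instead of the image.
                Item::Image { description } if !caps.accepts_images => {
                    Item::Text(format!("[image omitted: {}]", description))
                }
                other => other.clone(),
            })
            .collect()
    }
}
```

Projecting the same ledger for two capability sets yields two different prompts and leaves the stored items identical, which is the sense in which the prompt is a view.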

A Map of the Owners

The five owners form a layered stack. Each owner is responsible for one concern; the projection step reads from all of them.

Read the opening figure top-down for prompt construction and bottom-up for replay. Both directions touch the same five owners; the difference is whether the ledger is being read or rebuilt.

Context Has Multiple Lifetimes

Codex context is easier to reason about if you classify every piece by lifetime.

Lifetime | Examples | Failure if mishandled
Session lifetime | Base instructions, thread id, persisted rollout, memory mode. | Resume cannot recover the same operating frame.
Turn lifetime | model, provider, cwd, permissions, tools, realtime flag. | A model request runs with stale policy or stale capabilities.
Prompt lifetime | Normalized history, selected skills, selected plugins, hook context. | Optional material crowds out core task state.
Checkpoint lifetime | Compaction summary and replacement history. | A long thread forgets the wrong details.
Client lifetime | Token usage, TUI display state, app-server replay. | UI reports a context state the runtime did not own.

The road not taken is a single “conversation messages” list. That is tempting because every model API eventually needs a list. Codex keeps the list as an output format, not as the core abstraction.

The lifetimes are not equal. A single session usually contains many turns; each turn produces a single prompt projection; checkpoints span ranges of turns; clients reattach over time. Reading the code with that lifetime model in mind makes it clear why a single mutable buffer cannot represent every owner.
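
One way to keep the lifetime model honest in code is to make the classification explicit. The sketch below encodes the table's examples as string labels; `Lifetime` and `lifetime_of` are hypothetical names, and the labels are illustrative keys rather than identifiers from the Codex source.

```rust
/// The five lifetimes from the table above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Lifetime {
    Session,
    Turn,
    Prompt,
    Checkpoint,
    Client,
}

/// Map a context source label to its lifetime.
/// Unknown sources return None, so they must be classified before use.
fn lifetime_of(source: &str) -> Option<Lifetime> {
    match source {
        "base_instructions" | "thread_id" | "persisted_rollout" | "memory_mode" => {
            Some(Lifetime::Session)
        }
        "model" | "provider" | "cwd" | "permissions" | "tools" | "realtime_flag" => {
            Some(Lifetime::Turn)
        }
        "normalized_history" | "selected_skills" | "selected_plugins" | "hook_context" => {
            Some(Lifetime::Prompt)
        }
        "compaction_summary" | "replacement_history" => Some(Lifetime::Checkpoint),
        "token_usage" | "tui_display_state" | "app_server_replay" => Some(Lifetime::Client),
        _ => None,
    }
}
```

Returning `None` for an unlisted source is the design point: a new kind of context has no default lifetime and cannot slip into the prompt unclassified.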

Source-Level Map

The source tree confirms the boundary:

  • context_manager/history.rs stores and prepares response items.
  • session/turn_context.rs gathers the turn envelope.
  • context/* renders typed runtime facts as fragments.
  • session/turn.rs decides when to record context, user input, skills, plugins, hooks, and pending input.
  • compact.rs and compact_remote.rs rewrite history under a checkpoint protocol.
  • session/rollout_reconstruction.rs reconstructs effective history from durable rollout facts.

This is why the system is worth a book. The knowledge is scattered across modules because the responsibilities are real; the narrative has to put them back together.
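
As a rough illustration of the last item in that list, reconstruction can be pictured as a replay over durable items in which a compaction checkpoint swaps the history built so far. This is a sketch under simplifying assumptions: `RolloutItem` and `reconstruct_history` are hypothetical names, history entries are plain strings, and rollback and fork are not modeled.

```rust
/// Hypothetical durable rollout record.
#[derive(Clone, Debug, PartialEq)]
enum RolloutItem {
    /// A response item appended to history.
    Response(String),
    /// A compaction checkpoint carrying the history that replaces
    /// everything recorded before it.
    Compacted { replacement_history: Vec<String> },
}

/// Replay rollout items in order to rebuild the effective history.
fn reconstruct_history(rollout: &[RolloutItem]) -> Vec<String> {
    let mut history = Vec::new();
    for item in rollout {
        match item {
            RolloutItem::Response(text) => history.push(text.clone()),
            RolloutItem::Compacted { replacement_history } => {
                // The checkpoint is authoritative: discard what was rebuilt
                // so far and continue from the replacement history.
                history = replacement_history.clone();
            }
        }
    }
    history
}
```

The replay never needs the old prompt text. It only needs the ordered evidence, which is why resume does not depend on how an earlier prompt happened to be rendered.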

Apply This

  1. Projection Boundary. Use durable state as the input to prompt construction. Keep raw events separate from model-ready messages, and reject projection code that starts mutating the ledger.
  2. Lifetime Labels. Classify every context source by how long it should survive. Name session, turn, prompt, and checkpoint lifetimes, and treat “temporary” data that becomes durable as a design fault.
  3. Context Ownership. Give each context plane one owner. Route updates through that owner, and treat client-side injection of model-visible state as an ownership break.
  4. Auditable Forgetting. Make compaction and truncation explicit events. Store replacement history or summaries as checkpoints, and avoid summaries that cannot explain what they replaced.
  5. Prompt as View. Treat the final model input as a view. Rebuild it on demand for each model capability set, and avoid assuming that every provider accepts the same prompt shape.
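
Item 4 can be sketched as a compaction step that returns a record of what it did. The example assumes string history entries and prefix-only compaction; `Checkpoint` and `compact_prefix` are hypothetical names, and a real checkpoint would carry more provenance than an item count.

```rust
/// Hypothetical checkpoint record: it states how much was replaced,
/// not only what was kept.
#[derive(Debug, PartialEq)]
struct Checkpoint {
    replaced_items: usize,
    replacement_history: Vec<String>,
}

/// Replace the first `count` items of `history` with `summary` and return
/// the checkpoint describing the swap. Returns None, leaving `history`
/// untouched, if `count` is zero or exceeds the history length.
fn compact_prefix(
    history: &mut Vec<String>,
    count: usize,
    summary: String,
) -> Option<Checkpoint> {
    if count == 0 || count > history.len() {
        return None;
    }
    // Keep everything after the compacted prefix.
    let tail = history.split_off(count);
    let mut replacement = vec![summary];
    replacement.extend(tail);
    *history = replacement.clone();
    Some(Checkpoint {
        replaced_items: count,
        replacement_history: replacement,
    })
}
```

Persisting the returned checkpoint alongside the rollout turns forgetting into an event that can be inspected later, rather than a silent edit to the transcript.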