中文

Part 2

Build the Runtime

The runtime is a scheduler for context, streaming, tools, cancellation, and replay.

Chapter 5: Threads, Sessions, and Durable State

Reading Contract: Use this chapter to understand durable work. Track thread identity, queues, history, rollout, and session ownership as separate answers to state, progress, and resumability.

Durable thread state map separating thread identity, session facade, queues, history, rollout, projections, resume, fork, and rollback
Thread identity, live session ownership, queues, history, and rollout evidence solve different persistence problems and should not collapse into one transcript.

Source boundary: named files, types, functions, tests, schemas, and request or event shapes are verified source only where this chapter links to the pinned Codex commit or its Source Map. Broader architecture terms such as “runtime”, “owner”, “projection”, and “contract” are surrounding contract inference from those visible anchors, not claims about OpenAI service internals.

Chapter 4 ended at the protocol boundary: clients can submit operations, receive events, and exchange model items without knowing how the runtime works. This chapter steps behind that boundary and shows the durable machinery that turns those messages into a live agent thread.

Problem: an agent must be resumable, forkable, interruptible, and observable while still feeling like one continuous conversation.

Thesis: Codex achieves that by separating the live runtime handle from the durable thread record.

Mental model: a thread is the long-lived work ledger; a session is the running process currently serving it.

The Runtime Stack

Codex is not organized around a single “chat object.” The runtime is a stack of handles, each narrowing the authority of the layer above it.

ThreadManager owns the set of live threads and the shared services needed to create new sessions. CodexThread is the stable external handle clients use after a thread is created or resumed. Codex is intentionally narrow: it exposes a submission path and an event path. Session is the large interior object that holds resolved configuration, persistent identity, active turn state, mailbox state, realtime state, goals, review state, and service handles.

This separation matters because clients should not be able to accidentally reach into the scheduler. They submit an operation. The session decides whether that operation starts a new task, steers an active one, records pending input, updates durable metadata, or shuts the thread down.

The First Event Is a Contract

The first client-visible event in a newly opened runtime is the session setup event. It carries the thread identity, session identity, working directory, model choice, provider identifier, approval policy, permission profile, initial messages, and other resolved settings.

That ordering is deliberate. Some startup work can continue after the session is visible: MCP servers may initialize, plugins may report delayed capability, and prewarm work may run in the background. The setup event gives every client a deterministic anchor before those later events arrive.

// Pseudocode - illustrative pattern.
function open_runtime_thread(request):
    resolved = resolve_configuration(request)
    live_store = open_thread_persistence(resolved.thread_identity)
    session = build_session(resolved, live_store)

    emit_event(SessionConfigured(resolved.public_metadata))

    record_initial_history_if_resuming(session)
    start_background_capability_initialization(session)
    start_submission_loop(session)

    return client_thread_handle(session)

The important property is not the exact implementation order. The important property is that clients can treat setup as the beginning of the event stream and can render a resumed thread without waiting for every optional capability to finish loading.

Three Histories

Most mistakes in agent architecture begin by treating “history” as one thing. Codex keeps three histories with different purposes.

HistoryPrimary ownerWhat it answers
Model-visible contextContextManagerWhat should the next model request see?
Rollout JSONLthread persistenceWhat happened, in replay order, so this thread can resume or fork?
SQLite projectionsstate layerWhat can be queried efficiently without replaying every line?

The model-visible context is normalized into model items. It is not a UI transcript and not a raw log. It contains the items that should participate in future inference, filtered by model modality and adjusted by compaction, context injection, tool observations, and pending input.

The rollout file is the replay spine. It records durable runtime facts in order: session metadata, turn context snapshots, model-visible items, event records, compaction records, rollback markers, and other replayable items. It is the source of truth for reconstructing a thread.

The SQLite state is a projection layer. It stores thread metadata, list/search indexes, logs, memory jobs, graph edges, and other query-friendly state. A projection can be repaired or rebuilt from the durable record when the design allows it. The rollout is the ledger; the database is the index and operational state.

Recording One Item May Touch All Three

When a turn records a new item, the runtime has to decide which surfaces should learn about it. A tool result might be model-visible, replayable, useful for client rendering, relevant to analytics, and relevant to trace diagnostics. Those are related, but they are not identical.

// Pseudocode - illustrative pattern.
function record_runtime_item(turn, item):
    if item.belongs_in_model_context:
        context_manager.append(item.as_model_item)

    if item.belongs_in_replay:
        rollout.append(item.as_rollout_item)

    if item.updates_queryable_state:
        sqlite_projection.apply(item.as_projection_delta)

    if item.should_be_seen_by_clients:
        event_stream.emit(item.as_event)

This design protects replay. A client can disappear and rejoin. A database row can be stale and refreshed. A model request can be rebuilt from normalized context instead of screen text. Durable state remains the common language among those surfaces.

Start, Resume, Fork, Roll Back

Thread operations are variations on the same principle: load or choose a replay prefix, build a session over it, then append future items in the same vocabulary.

Resume is not a string reload. It reconstructs model history and session metadata from durable items. Fork is not a copy of a UI transcript. It selects a coherent prefix and opens a new thread whose future diverges from the source. Rollback is not merely deleting visible messages. It rebuilds the active model context from the surviving durable record, taking compaction and incomplete turns into account.

This is why turn context snapshots are durable. They let a later session know which working directory, permissions, model settings, network policy, memory mode, and environment choices surrounded a turn. Without that context, replay would be text without execution semantics.

Session State Is Layered

The live session holds several kinds of state that should not be collapsed.

LayerLifetimeExamples
Configurationsession lifetime with controlled updatesmodel, provider, permissions, working directory, service tier
Mutable session statesession lifetimetoken usage, current metadata, connector selection, rate limits
Active turn stateone in-flight taskcancellation token, tool futures, pending approvals, streamed items
Mailbox and pending inputacross scheduling boundariesmessages that arrive while another turn is running
Durable store handlesthread lifetimerollout writer, thread metadata, state database handle

The layering lets Codex answer a subtle question: “Did this operation change the durable thread, the current live turn, the next turn, or only a client view?” Those are different commitments. A runtime that cannot separate them will eventually lose user input, duplicate tool results, or make resume incoherent.

Metadata Is Extracted, Not Invented

Thread lists and search views should not require full replay on every request. Codex therefore maintains metadata projections: title or preview fields, model provider, memory mode, archive state, working directory, git information, created and updated positions, and pagination anchors.

The projection is useful because it is queryable. It is safe because the durable rollout still exists behind it. When a listing row is missing or stale, the system can scan the corresponding rollout head or replay metadata-bearing items to repair the index. That is the durable-state pattern: optimize reads with projections, but keep the event record authoritative.

Apply This

  1. Keep the live-handle stack explicit, so clients submit operations rather than mutating scheduler internals.
  2. Separate thread identity from session execution, because one durable thread may have many runtime lifetimes.
  3. Emit a deterministic setup event before optional startup work, so every client has the same stream anchor.
  4. Maintain separate model, replay, and query histories instead of forcing one transcript to serve all purposes.
  5. Implement resume, fork, and rollback over the same replay vocabulary used by normal turns.

Closing

Chapter 5 turns the protocol from Chapter 4 into a living runtime. The thread is durable; the session is active; the turn is scheduled work inside that session. Chapter 6 follows one scheduled turn through the loop where Codex builds context, samples the model, executes tools, handles interruptions, and decides whether the agent is done.

Source Map

ConceptSource anchor
Thread manager boundarycodex-rs/core/src/thread_manager.rs
Client-facing thread handlecodex-rs/core/src/codex_thread.rs
Queue-pair runtime facadecodex-rs/core/src/session/mod.rs
Model-visible historycodex-rs/core/src/context_manager/history.rs
Accepted prompt recordingcodex-rs/core/src/session/mod.rs