
Implementation Reference

Reading Contract: Use this reference for dense source review. Treat each table row as an audit target: owner, data structure, runtime decision, and failure path.

Figure: implementation audit reference map tying subsystem owners, data structures, runtime decisions, failure paths, and pinned source anchors.
The implementation reference is dense by design: each row points to an owner, a data structure, a decision, a failure branch, and a source anchor.

This appendix is the book’s “read this instead of opening the source” page. It collects the implementation facts that are too detailed for a smooth chapter but too important to leave only in source links.

Use it as a verification surface, not as a substitute narrative. A row counts as verified source only when the linked type, function, test, or constant directly shows the behavior. A row is a surrounding-contract inference when it connects several visible call sites into an architectural boundary. Anything that would require hosted service internals, private model behavior, or unpublished backend state is outside this reference and should not be read into the tables.

Snapshot Rule

All facts on this page refer to Codex commit 569ff6a1c400bd514ff79f5f1050a684dc3afde3. If a later Codex version changes a type or code path, this page must be updated instead of silently relying on branch links.

Source Anchors by Subsystem

Use this table when you want the implementation names behind the conceptual inventory below.

Subsystem | Concrete anchors
Runtime contract | Submission / Op, Event / EventMsg
Threads and sessions | ThreadManager, CodexThread, Codex
Turn loop | session/turn.rs, run_user_prompt_submit_hooks, record_user_prompt_and_emit_turn_item
Model providers | ModelClient, ModelClientSession, Prompt
Tools | build_tool_registry_builder, ToolRouter, ToolRegistry, ToolOrchestrator
Shell and exec-server | ShellHandler, ExecCommandHandler, RpcClient, FileSystemHandler
Patch runtime | ApplyPatchHandler, ApplyPatchRuntime, parse_patch, TurnDiffTracker
Hooks and approval | HookEvent, Hooks, Guardian review, assess_patch_safety
Sandboxing and network | SandboxType, get_platform_sandbox, seatbelt.rs, linux-sandbox, network-proxy
App-server and SDKs | MessageProcessor, ThreadState, transport modes, Python Codex, TypeScript Codex
TUI | ChatWidget, BottomPane, AppServerSession
Extensions | McpConnectionManager, SkillsManager, PluginManifest, PluginsManager
Multi-agent, cloud, memory | agent graph types, cloud task API, agent identity, memory citations, memory write phase 1
Build and release | Cargo workspace, Bazel verification, Cargo release workflow, npm staging, governance checks

Runtime Spine

Stage | Concrete implementation fact | What a source reader would retain
Entry | The CLI parses a command, then routes interactive use, exec use, app-server use, login, MCP, debug, and completion commands to specialized crates. | The CLI is an entry router, not the owner of the agent loop.
Protocol | A queued Submission wraps an Op, an id, and trace context. An emitted Event wraps an id and EventMsg. | Correlation is explicit. Clients can match progress and outcomes to submitted work.
Session | Codex is a small facade over submission sender, event receiver, status receiver, session state, and shutdown future. | The public runtime contract is queue in, events out.
Submission loop | The background loop receives operations, dispatches them, starts or replaces active tasks, aborts current turns, queues follow-up input, and emits uniform completion. | Agent behavior is task-driven, not one synchronous function call.
Task model | Regular user turns, compact tasks, review tasks, and user shell command tasks are distinct runtime work units. | “A turn” is common, but not the only session task.
Turn loop | run_turn prepares context, injects extensions, runs hooks, samples the model, handles tool calls, checks continuation, compacts, and stops. | The model call is one phase inside a larger state machine.
Tool runtime | A router and registry convert model tool calls into handlers, orchestration, hooks, approvals, sandboxed execution, events, and model-visible output. | Tools are governed side effects, not arbitrary callbacks.
Surface | TUI and app-server consume or translate the same runtime event stream. | UI code should not own agent decisions.
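
The "queue in, events out" contract in the Protocol and Session rows can be illustrated with a minimal Python sketch. It is not the Codex implementation: the field names, the string stand-ins for Op and EventMsg, and the helper functions are assumptions made for illustration. The only point it demonstrates is that every emitted event carries the id of the submission that caused it.

```python
import queue
from dataclasses import dataclass
from typing import Optional


@dataclass
class Submission:
    id: str                      # correlation id attached when the work is queued
    op: str                      # stand-in for an Op variant such as "UserInput"
    trace: Optional[str] = None  # stand-in for trace context


@dataclass
class Event:
    id: str   # id of the Submission this event belongs to
    msg: str  # stand-in for an EventMsg variant


def run_submission_loop(submissions: queue.Queue, events: queue.Queue) -> None:
    """Drain queued submissions and emit events that carry the same id."""
    while not submissions.empty():
        sub = submissions.get_nowait()
        if sub.op == "Interrupt":
            events.put(Event(sub.id, "TurnAborted"))
        else:
            events.put(Event(sub.id, "TurnStarted"))
            events.put(Event(sub.id, "TurnComplete"))


def group_by_submission(events: queue.Queue) -> dict:
    """Group emitted event names by the submission id they reference."""
    grouped = {}
    while not events.empty():
        ev = events.get_nowait()
        grouped.setdefault(ev.id, []).append(ev.msg)
    return grouped
```

A client that keeps the id it submitted can reconstruct per-submission progress from the event stream alone, which is the property the table calls explicit correlation.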

Operation Inventory

Op is larger than “user input.” Group the variants by what they let clients do:

Group | Examples | Why it matters
Turn input | UserInput, UserInputWithTurnContext, UserTurn | User work can carry cwd, model, approval, sandbox, permissions, output schema, environments, and client metadata.
Interruption and cleanup | Interrupt, CleanBackgroundTerminals | Stopping a turn and cleaning long-lived terminals are separate operations.
Approval responses | exec approval, patch approval, Guardian-denied action approval | The runtime treats user or reviewer decisions as queued protocol input.
Dynamic tools | dynamic tool responses and cancellation paths | Some tools are executed by a client surface rather than only by the core runtime.
Thread control | compaction, rollback, shell command, memory mode, goal operations | A thread is persistent state, not just a transient chat.
MCP and extensions | refreshes, tool/resource flows, elicitation responses | External capability can change and can ask for user input.
Realtime | start, audio, text, close, list voices | The same protocol family covers streaming voice/text sessions.
Multi-agent/collab | agent spawning, messages, waiting, close/resume events | Subagents are first-class runtime interactions.

Event Taxonomy

EventMsg is the runtime’s public diary. A source reader groups it this way:

Category | Representative events | Reader takeaway
Lifecycle | TurnStarted, TurnComplete, TurnAborted, ShutdownComplete | Clients know when work begins, finishes, aborts, or the runtime exits.
Text and reasoning | AgentMessage, deltas, reasoning summaries, raw reasoning, section breaks | Streaming output is structured; not every model token is final user text.
Item lifecycle | ItemStarted, ItemCompleted, raw response items | Newer item events coexist with legacy task/tool events for compatibility.
Tool lifecycle | exec begin/output/end, MCP begin/end, web/image begin/end, patch begin/update/end | UI can render progress before the final result exists.
Approval and permission | exec approval, patch approval, permission request, user input request, Guardian assessment | Human or policy decisions are visible events, not hidden prompts.
Context and history | context compacted, thread rolled back, token counts, goal updates | Memory pressure and persistent thread edits are protocol-visible.
Errors and warnings | error, warning, Guardian warning, stream error, deprecation notice | Failures carry category and meaning; they are not only console text.
Realtime and collab | realtime events, collab agent spawn/wait/close/resume | The protocol covers interactive and multi-agent workflows too.

Config, Constraints, and Security Inputs

Configuration is layered. Source readers look for four questions: who provided the setting, which layer wins, what constraints apply, and whether the runtime is allowed to ignore it.

Area | Source-level behavior
Layering | Settings can come from defaults, user config, profiles, CLI overrides, managed/MDM-style inputs, environment, and app-server writes.
Origins | Runtime code often keeps provenance so clients can explain active values and distinguish user choice from managed policy.
Requirements | Cloud or workspace requirements can disable or constrain features rather than merely set values.
Feature precedence | Experimental and managed features may gate methods, transports, app/plugin availability, or UI controls.
Hot changes | Some settings are turn-scoped through UserInputWithTurnContext; others are persistent config writes or profile selections.
Auth storage | ChatGPT/API-key credentials, MCP OAuth credentials, and connector credentials are handled as security-sensitive state with storage fallback behavior.
Secret handling | Logs, events, and UI paths must avoid exposing tokens, bearer credentials, and connector secrets.

This matters because “config” can be part of safety. A model, sandbox, approval mode, permission profile, writable root, or network rule is not just display state; it can decide whether a tool call runs.
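
Layering with provenance can be sketched in a few lines of Python. This is an illustration only: the layer names and, above all, the precedence order in LAYER_ORDER are assumptions, since this page does not state which layer wins. The sketch shows the shape of the idea, which is that each resolved value keeps the name of the layer that supplied it.

```python
# Hypothetical precedence, lowest to highest. The real ordering must be read from source.
LAYER_ORDER = ["defaults", "user_config", "profile", "cli_override", "managed"]


def resolve_config(layers: dict) -> dict:
    """Return {key: (value, origin)}; a later layer in LAYER_ORDER overrides an earlier one."""
    resolved = {}
    for origin in LAYER_ORDER:
        for key, value in layers.get(origin, {}).items():
            resolved[key] = (value, origin)
    return resolved
```

Because the origin travels with the value, a client could explain that a sandbox setting comes from managed policy rather than from the user's own choice.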

State and Persistence

State | What persists | Why it matters
Thread store | thread identity, archive state, metadata, goals, memory settings, and resumable work | Threads survive a single session and can be resumed, forked, listed, or archived.
History | model-visible conversation items and normalized prompt history | The next model request is derived from recorded state rather than reconstructed from UI text.
Rollout files | raw or replayable event records used for resume/fork/reconstruction | A source reader knows that “transcript” and “runtime replay” are related but not identical.
SQLite/log state | app/server metadata, migrations, WAL choices, retention budgets, and operational records | Operational behavior is database-backed in places, not only in memory.
Context baseline | turn context items track what has already been injected | Prevents repeated or missing context after compaction, resume, or dependency injection.
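
The context baseline row can be made concrete with a small Python sketch. It assumes, purely for illustration, that turn context is a flat key/value mapping and that the baseline is a dictionary of what was last injected; the real turn context items are richer than this.

```python
def context_to_inject(current: dict, baseline: dict) -> dict:
    """Return only the context entries that differ from what was already injected.

    The baseline is updated in place so that the next call sees them as injected.
    """
    changed = {key: value for key, value in current.items() if baseline.get(key) != value}
    baseline.update(changed)
    return changed
```

Calling it twice with the same context injects nothing the second time, and changing one entry injects only that entry, which is how a baseline avoids both repeated and missing context.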

Turn State Machine

The simplified algorithm is:

  1. Reject empty work unless pending input or continuation requires a turn.
  2. Refresh or construct the model client session.
  3. Run pre-sampling compaction if the current context is already too large.
  4. Resolve turn-scoped context candidates without yet accepting the prompt.
  5. Run user-prompt hooks and dependency prompts. A blocking hook can still stop the turn before the prompt becomes durable conversation history.
  6. Record the accepted user input, turn context, baseline information, and approved injected context.
  7. Build a sampling request from recorded history and active context.
  8. Stream model events and normalize them into runtime items and deltas.
  9. Dispatch tool calls, dynamic tool requests, or approval prompts.
  10. Drain or requeue pending input according to the active turn state.
  11. Compute whether follow-up work is required.
  12. Compact mid-turn if token pressure requires it.
  13. Run stop and after-agent hooks.
  14. Emit completion, abort, or error events.
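
A heavily reduced Python sketch of the steps above may help fix the ordering. It covers only steps 1, 5, 6, 7 to 9, and 11, and every name in it (run_turn as a plain function, the hook return value "block", the reply dictionary shape) is an assumption for illustration, not the signature found in session/turn.rs.

```python
def run_turn(user_input, history, hooks, sample_model):
    """Reduced turn loop: reject, run hooks, record, then sample until no tool calls remain."""
    if not user_input:                            # step 1: reject empty work
        return "rejected"
    for hook in hooks:                            # step 5: user-prompt hooks run first
        if hook(user_input) == "block":
            return "blocked"                      # nothing has been recorded yet
    history.append(("user", user_input))          # step 6: the prompt becomes durable
    while True:
        reply = sample_model(history)             # steps 7-8: sample from recorded history
        history.append(("assistant", reply["text"]))
        tool_calls = reply.get("tool_calls", [])
        if not tool_calls:                        # step 11: no follow-up work required
            return "complete"
        for call in tool_calls:                   # step 9: dispatch tool calls
            history.append(("tool", f"ran {call}"))
```

The detail worth noticing is the position of the hook check: a blocking hook returns before the history append, so a blocked prompt never becomes conversation history.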

Important branches:

Branch | Meaning
Pending input exists | The loop may continue or requeue rather than lose input that arrived mid-turn.
Tool futures still running | Cancellation and completion must account for in-flight side effects.
Stream disconnect | Some transport errors are retryable with budgets and backoff; others become structured errors.
Context window exceeded | The runtime may compact before retrying or stop with a clear failure.
Hook blocks | A hook can stop execution, replace output, ask to continue, or report failure.
Client session reset | Model changes, compaction, or transport recovery can force a new client session.

Model Streaming

Streaming goes through a client boundary before the turn loop sees useful events.

Piece | Source-reader fact
ModelClientSession | Owns normalized communication with the model provider for a session.
WebSocket vs HTTPS | The client can prefer a streaming transport and fall back depending on provider or failure.
Auth recovery | 401-like failures can trigger credential refresh or clearer authentication errors.
Retry budget | Transport retries are bounded and use backoff; not every stream error is fatal immediately.
Event mapping | Provider ResponseEvent values are mapped into internal items, text deltas, reasoning deltas, tool calls, and errors.
Dropped consumer | If the receiver disappears, streaming and tool work must be cancelled rather than leaking tasks.
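
The retry budget row describes a familiar pattern: bounded retries with exponential backoff. The following Python sketch shows that pattern in isolation. The exception type, the default budget of three retries, and the base delay are assumptions chosen for the example, not values taken from the Codex source.

```python
import time


class RetryableStreamError(Exception):
    """Stand-in for a transport error that is worth retrying."""


def stream_with_retry(open_stream, max_retries=3, base_delay=0.2, sleep=time.sleep):
    """Call open_stream(); on a retryable error, back off and retry within a fixed budget."""
    attempt = 0
    while True:
        try:
            return open_stream()
        except RetryableStreamError:
            if attempt >= max_retries:
                raise                              # budget exhausted: surface the error
            sleep(base_delay * (2 ** attempt))     # delays grow as base, 2x base, 4x base
            attempt += 1
```

Passing sleep as a parameter keeps the sketch testable without real waiting. Errors that are not RetryableStreamError propagate immediately, which mirrors the distinction between retryable transport failures and structured errors.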

Tool Inventory and Dispatch

Layer | What it owns
Spec planning | Decides which hosted, local, MCP, dynamic, deferred, unavailable, and discoverable tools should appear in the model tool list.
Tool kind | Classifies tools so handlers and safety checks can match expected payload shape.
Router | Converts a model tool call into a typed invocation and sends it to the registry.
Registry | Stores handlers and dispatches by tool kind/name.
Handler | Parses arguments, declares safety metadata, runs or delegates work, and returns structured output.
Parallel runtime | Allows safe concurrent calls while serializing writes or nonparallel-safe behavior.
Orchestrator | Handles hooks, approval, sandbox attempt, retry/escalation, event emission, and model-visible results.

Tool inventory includes more than local shell and patch: hosted web search, image generation, MCP tools and resources, dynamic client-owned tools, plugin-discovery helpers such as tool search, code-mode or nested tools, and placeholders for unavailable tools that the model should not call.
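
The registry and handler layers can be pictured with a minimal Python sketch. It is a toy, not the ToolRegistry from the source: the class shape, the JSON string arguments, and the result dictionary are assumptions. It shows name-based dispatch and one behavior this page describes elsewhere, namely that a tool parse error becomes output the model can read rather than a crash.

```python
import json


class ToolRegistry:
    """Toy registry: stores handlers by tool name and dispatches model tool calls."""

    def __init__(self):
        self._handlers = {}

    def register(self, name, handler):
        self._handlers[name] = handler

    def dispatch(self, name, raw_arguments):
        handler = self._handlers.get(name)
        if handler is None:
            return {"ok": False, "output": f"unknown tool: {name}"}
        try:
            args = json.loads(raw_arguments)       # handlers receive parsed arguments
        except json.JSONDecodeError as err:
            # Parse failures are returned as output so the caller can correct them.
            return {"ok": False, "output": f"invalid arguments: {err.msg}"}
        return {"ok": True, "output": handler(args)}
```

For example, registering an "echo" handler and dispatching malformed JSON to it yields a result with ok set to False and an explanatory message, while well-formed arguments reach the handler.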

Shell and Exec Taxonomy

Name | Meaning
shell / local shell | A command execution path controlled by Codex with cwd, env, approval, sandbox, timeout, and output events.
unified exec_command | A richer command tool path that can manage PTY/process ids, stdin writes, timeouts, and apply-patch interception.
write_stdin | Sends input to an already-running exec session rather than starting new work.
user shell command | A user-requested shell action queued through session/thread protocol, not a model tool call.
remote/container backend | A runtime backend may execute through a remote or container-like environment instead of the local OS directly.

Important details: command output can be truncated for telemetry or UI, PTY sessions have lifecycle ids, zsh-style shell startup has fallback behavior, and apply_patch may be intercepted from shell-like command text so edits still go through patch semantics.

Patch Runtime

Patch handling has three layers:

Layer | Responsibility
Patch grammar | Parses add, delete, update, move, hunk, and EOF operations.
Runtime handler | Turns model arguments into a patch attempt, computes effective permissions, asks approval when needed, emits patch events, and returns model-visible output.
Diff tracker | Records committed file deltas for the current turn so clients can show a turn diff.

Source readers also know that patch arguments can stream, patch events can begin/update/end before final success, shell/unified-exec can delegate to the patch runtime, remote filesystems can alter where patching executes, and denied or failed patches still need clear model-visible results.
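
To make the grammar layer less abstract, here is a small Python sketch that lists the operations in a patch-like text. It is not parse_patch. The marker strings ("*** Add File: ", "*** Delete File: ", "*** Update File: ", "*** Move to: ", "@@", "*** End of File") are assumptions about the envelope format and should be verified against the pinned source before being relied on; the sketch also ignores hunk bodies entirely.

```python
# Assumed header markers; verify against the real grammar before relying on them.
PREFIXES = [
    ("*** Add File: ", "add"),
    ("*** Delete File: ", "delete"),
    ("*** Update File: ", "update"),
    ("*** Move to: ", "move"),
]


def list_operations(patch_text: str) -> list:
    """Return (kind, detail) pairs for each recognized operation line, in order."""
    ops = []
    for line in patch_text.splitlines():
        for prefix, kind in PREFIXES:
            if line.startswith(prefix):
                ops.append((kind, line[len(prefix):]))
                break
        else:
            # Not a file-level header: check for hunk and EOF markers.
            if line.startswith("@@"):
                ops.append(("hunk", line[2:].strip()))
            elif line == "*** End of File":
                ops.append(("eof", ""))
    return ops
```

Lines that match no marker, such as added or removed content, are skipped here; a real parser would attach them to the enclosing hunk.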

Approval, Permissions, Guardian, and Network

Layer | Detail
Approval policy | UnlessTrusted, OnFailure, OnRequest, Granular, and Never decide whether prompts are shown, auto-rejected, or escalated.
Permission profile | Filesystem and network permissions are richer than legacy sandbox modes; profiles can be built in, active by name, or extended by additional permissions.
Approval requirement | A command may be auto-approved, require user approval, be denied by policy, or require an exec-policy amendment.
Approval cache | Some approvals can be cached for the turn or session so repeated equivalent requests do not prompt endlessly.
Request permissions | The model can request additional permissions through a tool path that is itself governed by approval policy.
Hooks | Permission-request hooks may decide before Guardian or user approval, and post-tool hooks can replace model-visible output.
Guardian | Automatic review can cover shell, unified exec, patch, network, MCP, and permissions requests. It should fail closed on timeout or reviewer failure.
Network approval | Network access can be immediate or deferred, involve host approval caches, managed proxy registration, cancellation on denial, and policy amendments.

Honest security language matters: approval is a decision process, permission profiles describe intended access, and sandboxing is enforcement only when the chosen platform/backend actually enforces it.
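
The approval cache row is easy to picture with a short Python sketch. The class below is hypothetical: keying the cache on the command and working directory is an assumption about what "equivalent request" means, and the real runtime may compare requests differently and scope the cache to a turn or a session.

```python
class ApprovalCache:
    """Toy approval cache: ask once per equivalent request, remember only approvals."""

    def __init__(self, ask_user):
        self._ask_user = ask_user     # callback that returns True (approve) or False (deny)
        self._approved = set()

    def check(self, command, cwd):
        key = (tuple(command), cwd)   # assumed notion of an "equivalent" request
        if key in self._approved:
            return True               # cached approval: no new prompt
        approved = self._ask_user(command, cwd)
        if approved:
            self._approved.add(key)
        return approved
```

Denials are deliberately not cached in this sketch, so a denied request prompts again if it is repeated. Whether the real cache behaves that way is not stated on this page.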

Sandbox Platform Behavior

Platform/path | Source-reader fact
Preference | Tool requests can prefer sandboxing automatically, require it, or forbid it.
Override | Some decisions can bypass the first sandbox attempt, but only through explicit policy.
macOS | Seatbelt profiles enforce file and network limits and protect sensitive roots such as repository or Codex metadata where configured.
Linux | The runtime can choose Landlock or bubblewrap-like helpers; platform sandbox selection can fall back to SandboxType::None, while helper launch failures remain execution errors.
Windows | Elevated, unelevated, and restricted-token behavior differ; compatibility limits can force refusal rather than unsafe fallback.
External sandbox | The runtime may know it is already inside a sandbox and preserve only the network semantics it can reason about.
Remote exec | Remote execution can participate in the filesystem/sandbox story instead of using local process launch.

Sandbox denial is a policy signal, not just a process error. It can carry a network policy decision and may or may not be eligible for unsandboxed retry.
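
The idea that a denial is a policy signal with its own retry eligibility can be sketched in Python. The exception type, the retry_allowed flag, and the function below are illustrative assumptions; they show the control flow of "sandboxed attempt first, unsandboxed retry only if eligible and approved", not the orchestrator's actual types.

```python
class SandboxDenied(Exception):
    """Stand-in for a sandbox denial that records whether an unsandboxed retry is eligible."""

    def __init__(self, retry_allowed: bool):
        super().__init__("sandbox denied the command")
        self.retry_allowed = retry_allowed


def run_with_escalation(run, approve_unsandboxed):
    """Try sandboxed first; retry without the sandbox only if policy and approval allow it."""
    try:
        return run(True)                  # first attempt: sandboxed
    except SandboxDenied as denial:
        if not denial.retry_allowed:
            return "denied"               # policy forbids escalation outright
        if not approve_unsandboxed():
            return "denied"               # escalation was possible but not approved
        return run(False)                 # approved retry: unsandboxed
```

Two separate checks guard the retry: the denial itself must be eligible, and an approval step must agree. Either one failing ends in a denial rather than a silent fallback.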

MCP, Apps, Plugins, and Skills

Extension | What source readers know
MCP servers | Can be stdio or streamable HTTP, have env vars, bearer-token env vars, required/disabled state, startup/tool timeouts, auth status, OAuth edge cases, and per-tool approval config.
MCP tools | Can be allow/deny filtered, listed from startup snapshots or cache, marked with sandbox metadata, and governed by elicitation policy.
Apps/connectors | Expose connector metadata and app-owned tools, may depend on auth availability, and may cache available tool lists.
Plugins | Have manifests, bundled skills/hooks/apps/MCP servers, marketplace caches, availability policy, install/uninstall/share flows, and plugin skill roots.
Skills | Are instruction bundles loaded from explicit mentions, dependencies, disabled paths, bundled collections, or plugin roots.
Mentions | User input can include structured mentions that affect turn-scoped context and tool availability.
Hooks | Session start, user prompt submit, pre/post tool use, permission request, stop, and after-agent hooks are policy extension points.

The key design rule is that extensions enter through typed inventories, mentions, hooks, or injected context. They should not silently rewrite the central turn loop.

TUI and App-Server Surface

Surface fact | Meaning
TUI state | The terminal UI manages scrollback, live tool tails, bottom panes, approval popups, keymaps, editor handoff, notifications, markdown rendering, and resume replay buffering.
App-server transport | App-server can use stdio, websocket, Unix-domain, or off/in-process style transports depending on mode.
Initialization gating | Clients may need handshake, auth, experimental capability, or notification settings before full use.
Request scope | App-server requests have serialization scopes: some are global, some are per-thread, and some can be path/thread dependent.
Backpressure | JSON-RPC can return server errors for overload or backpressure rather than blocking forever.
Security | Origin checks, bearer/capability tokens, websocket auth modes, and secret sanitization are part of the app-server contract.
Client APIs | The surface includes filesystem APIs/watchers, command exec, process spawn experiments, fuzzy search, accounts/rate limits, model/provider info, config writes, feedback upload, remote control, realtime audio/text, and thread operations.
Notification mapping | Runtime events become app-server notifications such as turn started/completed, item updates, command approval requests, turn diff updates, token usage, and compaction.

App-server is therefore both a presentation bridge and a runtime entry surface. Modern TUI or exec paths can interact with app-server-style boundaries rather than always calling the old core facade directly.
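
The backpressure row can be illustrated with a bounded queue that answers with a JSON-RPC error instead of blocking. This Python sketch is an assumption-laden toy: the error code -32001 is a hypothetical choice inside the range JSON-RPC 2.0 reserves for implementation-defined server errors (-32000 to -32099), and the message text and request shape are made up for the example.

```python
import queue

# Hypothetical code within JSON-RPC 2.0's implementation-defined server error range.
SERVER_OVERLOADED = -32001


def enqueue_request(pending: queue.Queue, request: dict):
    """Queue a request, or return a JSON-RPC error response if the queue is full."""
    try:
        pending.put_nowait(request)       # never blocks the caller
    except queue.Full:
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": SERVER_OVERLOADED, "message": "server overloaded"},
        }
    return None                           # accepted: a response will follow later
```

With a queue created as queue.Queue(maxsize=1), the first request is accepted and the second immediately receives an error response that carries its own id, so the client can decide whether to retry.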

Error and Retry Taxonomy

Failure | Runtime behavior to remember
Retryable transport | Retries with bounded budget and backoff before final error.
Auth failure | May refresh credentials, request verification, or emit explicit auth error.
Context window | May trigger compaction and retry; not always a fatal user-visible error.
Quota/rate/overload | Distinct error categories help UI choose the message.
Cyber/policy denial | Should be reported as policy refusal, not generic command failure.
Tool parse error | Usually becomes model-visible tool output so the model can correct arguments.
Runtime fatal error | Stops the relevant task and emits an error event.
Cancellation | Drains or aborts stream/tool futures and reports an aborted or cancelled outcome.
App-server JSON-RPC error | Encodes method-level failure such as invalid request, auth, experimental gate, or backpressure.

Operational and Advanced Behavior

Area | Source-reader detail
Bounded queues | Runtime channels and app-server requests cannot be treated as infinite buffers.
Shutdown | Cancellation tokens and task joins prevent leaked background work.
Thread unload | App-server can unload or unsubscribe threads when clients disappear.
Observability | Trace context, W3C propagation, telemetry spans, and analytics decorate submissions and tool calls.
Generated contracts | App-server schemas, TypeScript/Python SDK contract tests, and generated types are part of compatibility.
Multi-agent | Spawned agents, agent job state, depth/limits, inter-agent messages, wait/close/resume, and collab events are first-class.
Code mode | Some tools are only available in specialized modes or nested execution contexts.
Goals and budgets | Thread goals can include state and accounting that outlive a single turn.
Review mode | Review requests enter and exit through protocol events and task flow.
Cloud task support | Cloud/remote task paths add requirements, auth, and operational constraints.

Source-Grounded Self-Check

You should be able to answer these without opening source:

  1. Why is Submission -> Session -> Event a better shape than one blocking run(prompt) call?
  2. Which settings are turn-scoped, and which imply persistent or managed configuration?
  3. What is the difference between a tool parse failure, sandbox denial, approval denial, stream disconnect, and runtime fatal error?
  4. Why are approval, permission profile, network policy, Guardian, and sandbox separate layers?
  5. Why is app-server more than a thin UI adapter?
  6. What state must survive resume/fork/archive operations?

If any answer feels vague, return to the matching table above before opening source.