The Architectural Bet: Agent as a Bounded Operating System

Reading Contract: This chapter names the core architectural bet behind Codex. Read it as a source-grounded argument: a user request is not a chat message after it enters the runtime; it becomes a bounded operation, moves through owned state, crosses authority gates, and leaves replayable evidence.

Figure: Bounded agent operating-system map separating client surfaces, typed protocol, session runtime, authority gates, sandboxing, and rollout evidence. Codex is easier to read as a bounded operating environment: clients are replaceable, the protocol is typed, the session runtime owns turns, authority is layered, and evidence outlives the screen.

Source boundary: direct source facts in this chapter are anchored to OpenAI Codex commit 569ff6a1c400bd514ff79f5f1050a684dc3afde3. Named files, types, functions, schemas, and request or event shapes are verified source where linked. Terms such as “bounded operating system”, “owner”, “projection”, and “runtime contract” are this chapter's interpretive vocabulary, inferred from those public anchors. They are not claims about private OpenAI service topology.

A normal Codex request sounds simple: change a file, run a check, explain what happened. If Codex were only a chat wrapper, the implementation could forward that sentence to a model and stream text back. The source shows a more demanding shape. The request must become a typed operation. Runtime code must decide which turn owns it. Context has to be selected. Tool requests must cross policy, approval, sandbox, and execution boundaries. Clients need events they can render. Persistence needs facts it can replay.

That is the architectural bet: Codex treats an agent as a bounded operating system. It is not an operating system for arbitrary machine processes. It is a runtime for one constrained workload: AI-assisted software engineering turns. Within that narrower world, it still takes on OS-like duties: accepting system calls, owning sessions, mediating authority, containing side effects, projecting state to clients, and preserving evidence.

This framing matters because it changes where a source reader should look for truth. The model is important, but it is not the architecture. The TUI is important, but it is not the architecture. The architecture is the set of boundaries that decide what can enter, who owns work, which side effects may run, and what evidence survives.

The Problem Pressure

Agent products accumulate surfaces quickly. Codex has a CLI, terminal UI, headless execution mode, app-server, SDKs, remote-control paths, MCP and plugin extension planes, cloud-task clients, and release-time schema checks. Each surface can be real without being the center.

The pressure is that a file tree makes all surfaces look equally architectural. If the reader starts with a visible UI, the runtime becomes “whatever the UI is doing.” If the reader starts with shell execution, the model seems to own side effects. If the reader starts with release workflows, governance looks like packaging. Codex resists that confusion by using typed carriers and runtime owners.

The bounded-OS analogy should stay disciplined. The table below is not claiming that Codex implements a general-purpose operating system. It names the source owners that protect OS-like responsibilities inside the agent runtime.

Runtime pressure	Codex owner to inspect	Protected invariant
A user asks for work.	Protocol submissions and operations.	Intent enters as typed data, not UI text.
Work needs scheduling.	Session and turn runtime.	One owner correlates context, cancellation, model streaming, and pending input.
Model output requests action.	Tool routing and approval code.	Generated output is not execution authority.
Side effects touch the workspace.	Sandbox, permission, hook, and executor code.	Commands and edits are deniable, retryable, and reviewable.
Clients display progress.	Event streams, app-server mapping, and TUI rendering.	Screens are projections over runtime facts.
Releases evolve.	Schemas, generated contracts, tests, and workflows.	Boundaries can drift only when checks permit it.

The rest of the book follows those owners in detail. This chapter establishes the first principle: do not read Codex as a model wrapped in product UI; read it as a bounded runtime wrapped by multiple clients.

The Runtime Contract

Operation-Event Queue Pair

The smallest public shape is a queue pair. A caller sends a submission; the runtime emits an event. At the pinned commit, Submission and Event state that boundary directly:

pub struct Submission {
    pub id: String,
    pub op: Op,
    pub trace: Option<W3cTraceContext>,
}

pub struct Event {
    pub id: String,
    pub msg: EventMsg,
}

That pair is the opposite of “send text, receive text.” The id lets the runtime correlate output facts with the request that caused them. The op field carries a typed operation. The msg field carries a typed event message. The trace field acknowledges asynchronous handoffs instead of pretending the system is a single blocking function call.
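To make the correlation role of id concrete, here is a minimal sketch with reduced stand-in types. The Op and EventMsg variants, events_for, and group_by_submission are hypothetical illustrations, not the Codex definitions. The sketch only shows that events stamped with a submission id can be regrouped after they arrive interleaved.

```rust
use std::collections::BTreeMap;

// Hypothetical, reduced stand-ins for the protocol types quoted above.
// The real `Op` and `EventMsg` enums carry many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    UserInput(String),
    Interrupt,
}

#[derive(Debug, Clone, PartialEq)]
enum EventMsg {
    TurnStarted,
    AgentMessage(String),
    TurnComplete,
    TurnAborted,
}

#[derive(Debug, Clone)]
struct Submission {
    id: String,
    op: Op,
}

#[derive(Debug, Clone)]
struct Event {
    id: String, // id of the submission that caused this event
    msg: EventMsg,
}

/// Illustrative runtime step: every event produced for a submission
/// is stamped with that submission's id.
fn events_for(sub: &Submission) -> Vec<Event> {
    let msgs = match &sub.op {
        Op::UserInput(text) => vec![
            EventMsg::TurnStarted,
            EventMsg::AgentMessage(format!("echo: {}", text)),
            EventMsg::TurnComplete,
        ],
        Op::Interrupt => vec![EventMsg::TurnAborted],
    };
    msgs.into_iter()
        .map(|msg| Event { id: sub.id.clone(), msg })
        .collect()
}

/// Groups events by submission id, preserving arrival order per group.
fn group_by_submission(events: &[Event]) -> BTreeMap<String, Vec<EventMsg>> {
    let mut grouped: BTreeMap<String, Vec<EventMsg>> = BTreeMap::new();
    for event in events {
        grouped
            .entry(event.id.clone())
            .or_default()
            .push(event.msg.clone());
    }
    grouped
}
```

A client that only had text in and text out could not do this regrouping; the id is what lets a renderer, logger, or test attribute each fact to its cause.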

Figure: Protocol boundary taxonomy showing submissions, operations, events, items, app-server messages, generated schemas, and compatibility checks. The protocol boundary is the first bounded-OS clue: submissions, operations, events, items, app-server messages, schemas, and compatibility checks are distinct carriers.

The Op enum makes the contract more explicit. It contains realtime messages, legacy input, richer turn input, approval responses, tool refreshes, memory updates, interruptions, and shutdown. The important part for this chapter is not the full list; it is the fact that user work enters through a named operation family. In the pinned source, UserInputWithTurnContext bundles input with turn-scoped constraints. Its key fields are:

UserInputWithTurnContext {
    items: Vec<UserInput>,
    // fields omitted here include environments, schema, client metadata,
    // reasoning preferences, service tier, and collaboration settings.
    cwd: Option<PathBuf>,
    approval_policy: Option<AskForApproval>,
    sandbox_policy: Option<SandboxPolicy>,
    permission_profile: Option<PermissionProfile>,
    model: Option<String>,
}

This is why “the user said something” is not a precise enough source claim. The user supplied input, but the runtime may also receive a working directory, approval policy, sandbox policy, permission profile, model choice, reasoning settings, and collaboration mode in the same queued operation. The request has already become a controlled runtime object before any tool can run.
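One way to picture optional turn-scoped fields is as overrides layered over session defaults. The sketch below assumes that reading: a None field keeps the default, a Some field replaces it for the turn. ApprovalMode, SandboxMode, TurnOverrides, ResolvedTurn, and resolve are hypothetical names introduced for illustration, and the resolution rule itself is an assumption rather than a quoted Codex behavior.

```rust
use std::path::PathBuf;

// Hypothetical reduced policy enums; not the Codex `AskForApproval`
// or `SandboxPolicy` definitions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApprovalMode {
    Never,
    OnRequest,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum SandboxMode {
    ReadOnly,
    WorkspaceWrite,
}

/// Optional constraints that travel with the input in one operation.
#[derive(Debug, Default)]
struct TurnOverrides {
    cwd: Option<PathBuf>,
    approval: Option<ApprovalMode>,
    sandbox: Option<SandboxMode>,
    model: Option<String>,
}

/// Fully resolved settings a turn would run under (illustrative).
#[derive(Debug, Clone, PartialEq)]
struct ResolvedTurn {
    cwd: PathBuf,
    approval: ApprovalMode,
    sandbox: SandboxMode,
    model: String,
}

/// Assumed rule: `None` keeps the session default, `Some` overrides it.
fn resolve(defaults: &ResolvedTurn, overrides: TurnOverrides) -> ResolvedTurn {
    ResolvedTurn {
        cwd: overrides.cwd.unwrap_or_else(|| defaults.cwd.clone()),
        approval: overrides.approval.unwrap_or(defaults.approval),
        sandbox: overrides.sandbox.unwrap_or(defaults.sandbox),
        model: overrides.model.unwrap_or_else(|| defaults.model.clone()),
    }
}
```

Under this assumption, the same user sentence can run under different authority depending on what the operation carried, which is why the operation, not the sentence, is the unit worth auditing.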

The Session Facade

The contract becomes operational in the high-level Codex interface. Its comment is unusually direct: Codex “operates as a queue pair where you send submissions and receive events.”

pub struct Codex {
    pub(crate) tx_sub: Sender<Submission>,
    pub(crate) rx_event: Receiver<Event>,
    pub(crate) agent_status: watch::Receiver<AgentStatus>,
    pub(crate) session: Arc<Session>,
    pub(crate) session_loop_termination: SessionLoopTermination,
}

The source backs the architectural claim. This facade is not a UI widget and not a provider client. It owns the submission sender, event receiver, status watcher, and session handle. That makes it the place where client surfaces meet runtime ownership.

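The submit side also shows why operation IDs are runtime facts rather than UI decorations. submit delegates to submit_with_trace, which generates a UUIDv7 submission ID and wraps the operation in a Submission. submit_with_id then fills in the current span's trace context when none was supplied and sends the submission into the queue: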

pub async fn submit(&self, op: Op) -> CodexResult<String> {
    self.submit_with_trace(op, /*trace*/ None).await
}

pub async fn submit_with_trace(
    &self,
    op: Op,
    trace: Option<W3cTraceContext>,
) -> CodexResult<String> {
    let id = Uuid::now_v7().to_string();
    let sub = Submission {
        id: id.clone(),
        op,
        trace,
    };
    self.submit_with_id(sub).await?;
    Ok(id)
}

pub async fn submit_with_id(&self, mut sub: Submission) -> CodexResult<()> {
    if sub.trace.is_none() {
        sub.trace = current_span_w3c_trace_context();
    }
    self.tx_sub
        .send(sub)
        .await
        .map_err(|_| CodexErr::InternalAgentDied)?;
    Ok(())
}

The receive side is just as narrow. next_event returns the next runtime event. Clients can render, transform, or persist that event, but they do not invent the runtime fact.

pub async fn next_event(&self) -> CodexResult<Event> {
    let event = self.rx_event.recv().await?;
    Ok(event)
}

This is the first bounded-OS mechanism: runtime work crosses a queue boundary. Clients submit requests and observe facts. The fact that a terminal UI feels interactive does not erase the queue-pair contract underneath it.
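The queue-pair shape can be reproduced in miniature with the standard library. The sketch below is not the Codex implementation: it swaps the async channels for blocking std::sync::mpsc channels and a worker thread, uses a counter instead of a UUID, and every name in it (Agent, run_loop, the reduced Op and EventMsg) is hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical reduced protocol types for this sketch.
#[derive(Debug)]
enum Op {
    UserInput(String),
    Shutdown,
}

#[derive(Debug, PartialEq)]
enum EventMsg {
    TurnStarted,
    AgentMessage(String),
    TurnComplete,
    ShutdownComplete,
}

struct Submission {
    id: String,
    op: Op,
}

#[derive(Debug, PartialEq)]
struct Event {
    id: String,
    msg: EventMsg,
}

/// Client-facing facade: it owns one end of each queue and nothing else.
struct Agent {
    tx_sub: Sender<Submission>,
    rx_event: Receiver<Event>,
    next_id: u64,
}

/// Runtime side: drains submissions and emits events stamped with their id.
fn run_loop(rx_sub: Receiver<Submission>, tx_event: Sender<Event>) {
    for Submission { id, op } in rx_sub {
        let msgs = match op {
            Op::UserInput(text) => vec![
                EventMsg::TurnStarted,
                EventMsg::AgentMessage(format!("received: {}", text)),
                EventMsg::TurnComplete,
            ],
            Op::Shutdown => vec![EventMsg::ShutdownComplete],
        };
        let shutting_down = msgs.last() == Some(&EventMsg::ShutdownComplete);
        for msg in msgs {
            if tx_event.send(Event { id: id.clone(), msg }).is_err() {
                return; // the client dropped its receiver
            }
        }
        if shutting_down {
            return; // dropping tx_event closes the event queue
        }
    }
}

impl Agent {
    fn spawn() -> Agent {
        let (tx_sub, rx_sub) = channel();
        let (tx_event, rx_event) = channel();
        thread::spawn(move || run_loop(rx_sub, tx_event));
        Agent { tx_sub, rx_event, next_id: 0 }
    }

    /// Wraps the operation in a submission and returns the generated id.
    fn submit(&mut self, op: Op) -> Result<String, String> {
        self.next_id += 1;
        let id = format!("sub-{}", self.next_id);
        self.tx_sub
            .send(Submission { id: id.clone(), op })
            .map_err(|_| "agent loop died".to_string())?;
        Ok(id)
    }

    /// Blocks until the runtime emits its next event.
    fn next_event(&self) -> Result<Event, String> {
        self.rx_event.recv().map_err(|_| "agent loop died".to_string())
    }
}
```

Calling Agent::spawn, submitting Op::UserInput, and reading three events back shows the property the chapter cares about: the caller never invokes runtime methods directly, it only enqueues requests and observes facts.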

The Bounded Runtime

Runtime Responsibilities

The OS analogy becomes useful only if each responsibility has a boundary. Codex does not collapse “prompting”, “state”, “tools”, “approval”, and “display” into one long script.

Responsibility	Source-backed boundary	Why it stays bounded
Intent intake	`Submission` and `Op`.	Callers send typed operations, not arbitrary runtime method calls.
Turn ownership	`run_turn` and `TurnContext`.	Context, model session, input, and cancellation meet under one owner.
State selection	`ContextManager`.	Model-visible history is curated rather than identical to every runtime fact.
Durable evidence	`RolloutTrace` and protocol events.	Replay and diagnostics use structured records.
Client projection	App-server event mapping and TUI rendering.	UI state is downstream from event facts.
Authority	`UserInputWithTurnContext`, approval policy, sandbox policy, permission profile, and tool orchestration.	The model can request work, but another layer decides whether it may happen.

The turn runtime shows how much ownership is concentrated after input becomes an operation. The signature of run_turn is a compact map:

pub(crate) async fn run_turn(
    sess: Arc<Session>,
    turn_context: Arc<TurnContext>,
    input: Vec<UserInput>,
    prewarmed_client_session: Option<ModelClientSession>,
    cancellation_token: CancellationToken,
) -> Option<String> {

The function receives the session, the turn context, user input, an optional prewarmed model client session, and a cancellation token. That is why a turn is a runtime unit, not a single model call. Later chapters will follow the loop in detail; for the architectural bet, the important fact is the ownership boundary. The turn owns the conditions under which the model is called and the conditions under which its output can ask for work.
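The following sketch illustrates only the ownership idea: one function holds the context, the input, and the cancellation signal together. It is a simplified stand-in, not the Codex loop. An Arc<AtomicBool> replaces the cancellation token, the "steps" are plain strings, and TurnContext, TurnOutcome, and this run_turn are hypothetical reductions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical reduced turn context.
struct TurnContext {
    cwd: String,
    max_steps: usize,
}

#[derive(Debug, PartialEq)]
enum TurnOutcome {
    Completed(String),
    Cancelled { steps_done: usize },
}

/// Processes input items one step at a time, checking for cancellation
/// before each step so the turn owner decides when work stops.
fn run_turn(ctx: &TurnContext, input: &[String], cancelled: Arc<AtomicBool>) -> TurnOutcome {
    let mut transcript = Vec::new();
    for (step, item) in input.iter().take(ctx.max_steps).enumerate() {
        if cancelled.load(Ordering::SeqCst) {
            return TurnOutcome::Cancelled { steps_done: step };
        }
        // Stand-in for "call the model, handle its output".
        transcript.push(format!("[{}] {}", ctx.cwd, item));
    }
    TurnOutcome::Completed(transcript.join("; "))
}
```

Because cancellation is checked inside the owner, a client can request an interrupt without knowing how many model calls or tool requests the turn contains.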

Three Histories, Not One Transcript

A chat wrapper can pretend there is one transcript. A bounded runtime cannot. Codex has to answer at least three different history questions:

History	Question it answers	Source owner
Model-visible context	What should the model see next?	`ContextManager`
Rollout record	What happened in replayable order?	rollout trace and protocol events
Queryable projection	What should clients list, filter, resume, or summarize quickly?	thread state and app-server projections

Figure: Thread durable state map showing thread identity, session facade, queues, history, projections, resume, fork, state ledger, and rollback. Thread state is not a single transcript. Runtime state, model-visible context, durable replay, projections, resume, fork, and rollback serve different readers.

The model-visible side appears in ContextManager. It stores response items, a version, token accounting, and a reference context item used for future diffing:

pub(crate) struct ContextManager {
    items: Vec<ResponseItem>,
    history_version: u64,
    token_info: Option<TokenUsageInfo>,
    reference_context_item: Option<TurnContextItem>,
}

That is not the same as a database list view. It is the context manager for model-visible history. It can compact, roll back, track token pressure, and decide what has to be reinjected into future turns.
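A miniature version makes the difference from a list view easier to see. The sketch below is hypothetical throughout: MiniContext stores plain strings instead of response items, the token estimate assumes roughly four bytes per token, and bumping history_version on every rewrite is an assumption about what such a counter is for, not a statement about the Codex field.

```rust
/// Hypothetical miniature of a model-visible history.
#[derive(Debug, Default)]
struct MiniContext {
    items: Vec<String>,
    history_version: u64,
}

impl MiniContext {
    /// Appends one item the model should see on the next call.
    fn record(&mut self, item: &str) {
        self.items.push(item.to_string());
    }

    /// Rough pressure estimate; assumes about 4 bytes per token.
    fn estimated_tokens(&self) -> usize {
        self.items.iter().map(|item| (item.len() + 3) / 4).sum()
    }

    /// Replaces everything except the last `keep_last` items with a summary.
    fn compact(&mut self, summary: &str, keep_last: usize) {
        let split = self.items.len().saturating_sub(keep_last);
        let tail = self.items.split_off(split);
        self.items = vec![format!("summary: {}", summary)];
        self.items.extend(tail);
        self.history_version += 1; // history was rewritten
    }

    /// Drops the last `n` items, for example after a rejected turn.
    fn rollback(&mut self, n: usize) {
        let new_len = self.items.len().saturating_sub(n);
        self.items.truncate(new_len);
        self.history_version += 1; // history was rewritten
    }
}
```

Neither compact nor rollback would be acceptable on a replay record, which is the point: this structure is allowed to forget because another one is responsible for remembering.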

The replay side has a different shape. RolloutTrace is a reduced diagnostic graph with turns, conversation items, inference calls, tool calls, terminals, compactions, interaction edges, and raw payload references:

pub struct RolloutTrace {
    pub schema_version: u32,
    pub trace_id: String,
    pub rollout_id: String,
    pub started_at_unix_ms: i64,
    pub ended_at_unix_ms: Option<i64>,
    pub status: RolloutStatus,
    pub root_thread_id: AgentThreadId,
    pub threads: BTreeMap<AgentThreadId, AgentThread>,
    pub codex_turns: BTreeMap<CodexTurnId, CodexTurn>,
    pub conversation_items: BTreeMap<ConversationItemId, ConversationItem>,
    pub inference_calls: BTreeMap<InferenceCallId, InferenceCall>,
    pub code_cells: BTreeMap<CodeCellId, CodeCell>,
    pub tool_calls: BTreeMap<ToolCallId, ToolCall>,
    pub terminal_sessions: BTreeMap<TerminalId, TerminalSession>,
    pub terminal_operations: BTreeMap<TerminalOperationId, TerminalOperation>,
    pub compactions: BTreeMap<CompactionId, Compaction>,
    pub compaction_requests: BTreeMap<CompactionRequestId, CompactionRequest>,
    pub interaction_edges: BTreeMap<EdgeId, InteractionEdge>,
    pub raw_payloads: BTreeMap<RawPayloadId, RawPayloadRef>,
}

The queryable side is different again. list_threads uses the database, filtering, ordering, pagination, and anchors to return a thread page:

pub async fn list_threads(
    &self,
    page_size: usize,
    filters: ThreadFilterOptions<'_>,
) -> anyhow::Result<crate::ThreadsPage> {

The three histories protect the same lesson: do not ask one representation to serve every purpose. The model needs selected context. Replay needs structured fidelity. Clients need efficient projections. Collapsing those into one pretty transcript would make the system easier to demo and harder to operate.
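As a rough illustration of that separation, the sketch below applies one stream of runtime facts to three differently shaped stores. Fact, Histories, and apply are hypothetical, and the choice of which facts are model-visible is an assumption made for the example.

```rust
/// Hypothetical runtime facts.
#[derive(Debug, Clone, PartialEq)]
enum Fact {
    UserMessage(String),
    AssistantMessage(String),
    ToolFinished { command: String, exit_code: i32 },
    TokenCount(u64),
}

/// Three stores fed by the same facts, each shaped for one reader.
#[derive(Debug, Default)]
struct Histories {
    model_context: Vec<String>, // what the model should see next
    rollout: Vec<(u64, Fact)>,  // every fact, in replayable order
    preview: Option<String>,    // list-view projection: first user message
}

impl Histories {
    fn apply(&mut self, fact: Fact) {
        let seq = self.rollout.len() as u64;
        match &fact {
            Fact::UserMessage(text) => {
                self.model_context.push(format!("user: {}", text));
                if self.preview.is_none() {
                    self.preview = Some(text.clone());
                }
            }
            Fact::AssistantMessage(text) => {
                self.model_context.push(format!("assistant: {}", text));
            }
            Fact::ToolFinished { command, exit_code } => {
                self.model_context
                    .push(format!("tool: {} -> {}", command, exit_code));
            }
            // Bookkeeping: replayable, but not shown to the model here.
            Fact::TokenCount(_) => {}
        }
        self.rollout.push((seq, fact));
    }
}
```

After four facts the rollout holds four records, the model context holds three, and the preview holds one string. Each store answers its own question without distorting the others.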

Authority and Projection

The Authority Stack

The bounded-OS model becomes most important when the model requests a side effect. A model item may propose a shell command, patch, MCP call, or other tool use. That proposal is not authority. Authority is assembled from turn context, approval policy, permission profile, hooks, sandbox policy, and executor selection.

Figure: Hooks and approval gates in Codex showing hook, policy, auto review, human approval, sandbox, feedback, and evidence paths. Tool execution is a gated path: hooks, policy, automated review, human approval, sandboxing, execution, feedback, and evidence are separate concerns.

The public protocol already carries authority inputs in UserInputWithTurnContext: cwd, approval_policy, sandbox_policy, and permission_profile. The execution path then evaluates a concrete tool request. In ToolOrchestrator, the file-system and network sandbox policies are read from turn context before the approval requirement is determined:

let file_system_sandbox_policy = turn_ctx.file_system_sandbox_policy();
let network_sandbox_policy = turn_ctx.network_sandbox_policy();
let requirement = tool.exec_approval_requirement(req).unwrap_or_else(|| {
    default_exec_approval_requirement(approval_policy, &file_system_sandbox_policy)
});

That small block explains why a prompt cannot be the whole safety story. The approval requirement depends on concrete request shape, approval policy, and the file-system sandbox policy.

The same orchestrator then turns policy into possible outcomes. One branch can reject a request immediately:

ExecApprovalRequirement::Forbidden { reason } => {
    return Err(ToolError::Rejected(reason));
}

Another branch requests approval and rejects the tool call unless the approval decision permits it:

ExecApprovalRequirement::NeedsApproval { reason, .. } => {
    let guardian_review_id = use_guardian.then(new_guardian_review_id);
    let approval_ctx = ApprovalCtx {
        session: &tool_ctx.session,
        turn: &tool_ctx.turn,
        call_id: &tool_ctx.call_id,
        guardian_review_id: guardian_review_id.clone(),
        retry_reason: reason,
        network_approval_context: None,
    };
    let decision = Self::request_approval(
        tool,
        req,
        tool_ctx.call_id.as_str(),
        approval_ctx,
        tool_ctx,
        /*evaluate_permission_request_hooks*/ !strict_auto_review,
        &otel,
    )
    .await?;

    Self::reject_if_not_approved(tool_ctx, guardian_review_id.as_deref(), decision)
        .await?;
    already_approved = true;
}

Only after that does sandbox selection happen. The first attempt is not a default “run it somewhere”; it is selected from file-system policy, network policy, the tool’s first-attempt override, tool preference, Windows mode, and managed network state:

let initial_sandbox = match tool.sandbox_mode_for_first_attempt(req) {
    SandboxOverride::BypassSandboxFirstAttempt => SandboxType::None,
    SandboxOverride::NoOverride => self.sandbox.select_initial(
        &file_system_sandbox_policy,
        network_sandbox_policy,
        tool.sandbox_preference(),
        turn_ctx.windows_sandbox_level,
        managed_network_active,
    ),
};

The model can ask. The runtime decides. The sandbox enforces. The resulting event stream records what happened. That is the heart of the bounded-operating-system bet.
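The ordering of those gates can be sketched in a few lines. Everything below is a simplified, hypothetical model: the two-variant ApprovalMode and FsPolicy, the writes_workspace flag, and the rule inside approval_requirement are assumptions chosen to keep the example small. Only the sequence (compute a requirement, reject or ask, then select a sandbox) mirrors the excerpts above.

```rust
// Hypothetical reduced policy types for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApprovalMode {
    Never,
    OnRequest,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum FsPolicy {
    ReadOnly,
    WorkspaceWrite,
}

#[derive(Debug, PartialEq)]
enum Requirement {
    Skip,
    NeedsApproval { reason: String },
    Forbidden { reason: String },
}

#[derive(Debug, PartialEq)]
enum SandboxChoice {
    None,
    Enforced(FsPolicy),
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Ran(SandboxChoice),
    Rejected(String),
}

struct ToolRequest {
    command: String,
    writes_workspace: bool,
    bypass_sandbox_first_attempt: bool,
}

/// Assumed rule: writes under a read-only policy need approval or are forbidden.
fn approval_requirement(req: &ToolRequest, approval: ApprovalMode, fs: FsPolicy) -> Requirement {
    if !req.writes_workspace || fs == FsPolicy::WorkspaceWrite {
        return Requirement::Skip;
    }
    let reason = format!("`{}` writes under a read-only policy", req.command);
    match approval {
        ApprovalMode::OnRequest => Requirement::NeedsApproval { reason },
        ApprovalMode::Never => Requirement::Forbidden { reason },
    }
}

/// Gate order: requirement, then approval, then sandbox selection.
fn orchestrate(
    req: &ToolRequest,
    approval: ApprovalMode,
    fs: FsPolicy,
    approve: impl Fn(&str) -> bool,
) -> Outcome {
    match approval_requirement(req, approval, fs) {
        Requirement::Skip => {}
        Requirement::Forbidden { reason } => return Outcome::Rejected(reason),
        Requirement::NeedsApproval { reason } => {
            if !approve(&reason) {
                return Outcome::Rejected("approval denied".to_string());
            }
        }
    }
    // Sandbox selection happens only after the approval decision.
    let sandbox = if req.bypass_sandbox_first_attempt {
        SandboxChoice::None
    } else {
        SandboxChoice::Enforced(fs)
    };
    Outcome::Ran(sandbox)
}
```

The approve callback stands in for whatever reviewer answers the request, human or automated. Nothing in the request text can change the branch taken; only policy values and the reviewer's decision can.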

Replaceable Clients

Once the runtime owns submissions, turns, events, histories, and authority, clients can differ without becoming separate architectures. The terminal UI can focus on interactive rendering and approvals. exec can focus on deterministic command-line behavior. App-server can focus on JSON-RPC, request serialization, thread state, rejoin semantics, SDK models, and browser or desktop integration.

The source keeps this downstream relationship visible. App-server event mapping does not claim to be the whole runtime. Its helper says it builds a notification that corresponds to a single core event and leaves surrounding state checks to callers. At the pinned commit, item_event_to_server_notification is explicitly a projection layer:

pub fn item_event_to_server_notification(
    msg: EventMsg,
    thread_id: &str,
    turn_id: &str,
) -> ServerNotification {

That separation lets client surfaces multiply without copying the agent loop. A UI can render events richly. An SDK can expose typed models. A daemon can manage transport and rejoin behavior. None of those surfaces should become the source of truth for whether a tool was allowed, what a turn contained, or which facts belong to replay.
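A projection layer of this kind can be sketched as a pure function from one core event to one client-facing notification. The types and method strings below are hypothetical placeholders (CoreEvent, Notification, and the example/... names are not the app-server protocol); the sketch only shows the stateless, one-to-one mapping shape.

```rust
/// Hypothetical reduced core events.
#[derive(Debug, Clone, PartialEq)]
enum CoreEvent {
    AgentMessageDelta(String),
    ExecCommandEnd { exit_code: i32 },
    TurnComplete,
}

/// Hypothetical client-facing notification.
#[derive(Debug, Clone, PartialEq)]
struct Notification {
    method: &'static str, // illustrative method names, not the real protocol
    thread_id: String,
    turn_id: String,
    payload: String,
}

/// Maps one core event to one notification. It keeps no state and makes
/// no decisions; callers remain responsible for surrounding checks.
fn to_notification(event: &CoreEvent, thread_id: &str, turn_id: &str) -> Notification {
    let (method, payload) = match event {
        CoreEvent::AgentMessageDelta(delta) => ("example/messageDelta", delta.clone()),
        CoreEvent::ExecCommandEnd { exit_code } => {
            ("example/commandFinished", exit_code.to_string())
        }
        CoreEvent::TurnComplete => ("example/turnCompleted", String::new()),
    };
    Notification {
        method,
        thread_id: thread_id.to_string(),
        turn_id: turn_id.to_string(),
        payload,
    }
}
```

Because the function is pure, two clients projecting the same event see the same fact, and neither can alter what the runtime recorded.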

Figure: Observability and rollout evidence map showing client events, rollout trace, metrics, logs, replay, and source anchors. Evidence is the runtime’s public memory: clients may render different views, but replay, diagnostics, rollout trace, metrics, and source anchors need stable facts.

This is also why generated contracts and release checks matter later in the book. If clients are downstream of typed runtime facts, then schema export, compatibility tests, and boundary checks are not delivery chores. They are how the bounded OS keeps new surfaces from smuggling private assumptions across the runtime boundary.
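A boundary check of this kind can be as simple as comparing a freshly generated contract against a checked-in snapshot. The sketch below assumes a contract can be reduced to a list of method names, which is a deliberate simplification; Drift, contract_drift, is_backward_compatible, and the sample method names are hypothetical and do not describe the Codex release workflow.

```rust
use std::collections::BTreeSet;

/// Difference between a checked-in snapshot and a fresh export.
#[derive(Debug, PartialEq)]
struct Drift {
    added: Vec<String>,
    removed: Vec<String>,
}

/// Computes which contract entries appeared or disappeared.
fn contract_drift(snapshot: &[&str], generated: &[&str]) -> Drift {
    let old: BTreeSet<&str> = snapshot.iter().copied().collect();
    let new: BTreeSet<&str> = generated.iter().copied().collect();
    Drift {
        added: new.difference(&old).map(|name| name.to_string()).collect(),
        removed: old.difference(&new).map(|name| name.to_string()).collect(),
    }
}

/// Assumed policy for this sketch: additions are fine, removals break clients.
fn is_backward_compatible(drift: &Drift) -> bool {
    drift.removed.is_empty()
}
```

A test that fails whenever is_backward_compatible returns false turns an accidental removal into a reviewed decision instead of a surprise for downstream clients.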

Apply This

The bounded-OS model is useful beyond Codex. It gives a checklist for any agent system that can act on a user’s environment.

Make intent typed before it becomes work. Define the operation shape that carries user input, cwd, policy, model choice, and any turn-scoped override.
Give the turn one runtime owner. Keep context, cancellation, provider state, pending input, and completion under a boundary that clients can call but do not own.
Separate histories by job. Do not force model-visible context, replay evidence, and queryable list views into one transcript.
Gate side effects after model output. Let generated output request work, then use policy, approvals, hooks, sandboxing, and executors to decide what actually happens.
Treat clients as projections. Build UI, CLI, SDK, and service surfaces over runtime facts instead of letting each surface grow its own agent loop.

The decision table below is the audit version of those rules.

Design choice	Runtime owner	Source anchor	Protected invariant	Failure if collapsed
Submit work as typed operations.	Protocol crate and session facade.	`Submission`, `Op`, `Codex::submit`.	Intent can be correlated and rejected.	UI text becomes implicit runtime authority.
Emit facts as typed events.	Protocol crate and event stream.	`Event`, `EventMsg`, `next_event`.	Clients consume facts instead of inventing them.	Screens become the only record.
Keep turn context with input.	Operation payload and turn runtime.	`UserInputWithTurnContext`, `run_turn`.	Policies, cwd, model, and sandbox settings move with the request.	The same message means different things in different callers.
Separate histories.	Context manager, rollout trace, thread database.	`ContextManager`, `RolloutTrace`, `list_threads`.	Model context, replay, and list views can optimize for different jobs.	One transcript becomes slow, lossy, or unsafe.
Gate side effects after model output.	Tool orchestrator and sandbox manager.	Approval requirement and sandbox selection.	The model requests work; runtime authority decides.	Tool execution becomes prompt-governed.
Treat clients as projections.	App-server, TUI, SDK, and CLI adapters.	Event mapping and generated schemas.	Surfaces can evolve without forking runtime truth.	Every client grows its own agent loop.

The practical rule is simple: before adding an agent feature, name the runtime nouns that will cross subsystem boundaries. Then name the owner that can say no. Only after those two names are stable should a UI or prompt shape be treated as implementation work.

Closing

The bounded-operating-system model explains why Codex is organized around contracts before interfaces. A request enters as an operation, runs under a session and turn owner, draws from selected context, asks for side effects through authority gates, emits events, and leaves evidence that clients can project. That is much more than a chat wrapper, and less than a general-purpose OS. The power comes from the boundary.

Chapter 2 follows the first concrete entry boundary: how an installed command reaches the Rust command router without letting distribution glue become the product architecture.

Source Map

Concept	Source anchor
Runtime vocabulary	`codex-rs/protocol/src/protocol.rs`
Operation enum	`codex-rs/protocol/src/protocol.rs`
Turn-scoped context operation	`codex-rs/protocol/src/protocol.rs`
Event stream	`codex-rs/protocol/src/protocol.rs`
Session facade	`codex-rs/core/src/session/mod.rs`
Submission and event methods	`submit`, `next_event`
Turn runtime	`codex-rs/core/src/session/turn.rs`
Model-visible history	`codex-rs/core/src/context_manager/history.rs`
Rollout replay graph	`codex-rs/rollout-trace/src/model/mod.rs`
Queryable thread projection	`codex-rs/state/src/runtime/threads.rs`
App-server event projection	`codex-rs/app-server-protocol/src/protocol/event_mapping.rs`
Tool approval gate	`codex-rs/core/src/tools/orchestrator.rs`
Sandbox selection	`codex-rs/core/src/tools/orchestrator.rs`