The Architectural Bet: Treating the Agent as a Bounded Operating System

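Reading contract: This chapter names the core architectural bet behind Codex. Read it as a source-backed argument: once a user request enters the runtime, it is no longer just a chat message. It becomes a bounded operation that passes through owned state, clears authority gates, and leaves replayable evidence.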

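Figure: Map of the bounded agent operating system, separating client surfaces, the typed protocol, the session runtime, authority gates, the sandbox, and rollout evidence. Codex is easier to read as a bounded runtime environment: clients are replaceable, the protocol is typed, the session runtime owns turns, authority is layered, and evidence outlives the screen.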

Source boundary: The direct source facts in this chapter are pinned to OpenAI Codex commit 569ff6a1c400bd514ff79f5f1050a684dc3afde3. The linked named files, types, functions, schemas, and request/event shapes are verified source. Terms such as "bounded operating system", "owner", "projection", and "runtime contract" are surrounding contract inferences drawn from those public anchors. They are not claims about OpenAI's private service topology.

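An ordinary Codex request sounds simple: edit a file, run the checks, explain what happened. If Codex were only a chat wrapper, the implementation could forward that sentence to the model and stream text back. The shape the source shows is more demanding. The request must become a typed operation. Runtime code must decide which turn owns it. Context must be selected. Tool requests must pass through policy, approval, sandbox, and execution boundaries. Clients need renderable events. Persistence needs replayable facts.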

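That is the architectural bet: Codex treats the agent as a bounded operating system. It is not an operating system for arbitrary machine processes; it is a runtime for one constrained workload: AI-assisted software engineering turns. Within that narrower world it still takes on OS-like duties: it accepts system calls, owns sessions, arbitrates authority, constrains side effects, projects state to clients, and retains evidence.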

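This framing matters because it changes where a source reader should look for truth. The model matters, but it is not the architecture. The TUI matters, but it is not the architecture either. The architecture is the set of boundaries that decide what may enter, who owns the work, which side effects may run, and what evidence remains.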

1. Problem Pressure

Agent products accumulate surfaces quickly. Codex has a CLI, a terminal UI, a headless execution mode, an app-server, SDKs, remote-control paths, MCP and plugin extension planes, cloud-task clients, and release-time schema checks. Every one of these surfaces may be real, but being real does not make it the center.

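The pressure is that a file tree makes every surface look equally architectural. A reader who starts from the visible UI sees the runtime as "whatever the UI is doing". A reader who starts from shell execution sees the model as the owner of side effects. A reader who starts from release workflows sees governance as packaging. Codex resists this confusion with typed carriers and runtime owners.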

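The bounded-OS analogy must stay restrained. The table below does not claim that Codex implements a general-purpose operating system. It names the source owners that protect OS-like responsibilities inside the agent runtime.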

Runtime pressure	Codex owner to inspect	Protected invariant
A user requests work.	Protocol submissions and operations.	Intent enters as typed data, not UI text.
Work needs scheduling.	Session and turn runtime.	One owner ties together context, cancellation, model streaming, and pending input.
Model output requests an action.	Tool routing and approval code.	Generated output is not execution authority.
Side effects touch the workspace.	Sandbox, permission, hook, and executor code.	Commands and edits can be denied, retried, and reviewed.
Clients show progress.	Event streams, app-server mapping, and TUI rendering.	Screens are projections of runtime facts.
Releases evolve.	Schemas, generated contracts, tests, and workflows.	Boundaries can drift only when checks allow it.

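Later chapters trace these owners in detail. This chapter establishes the first principle: do not read Codex as a product UI wrapped around a model; read it as a bounded runtime wrapped by multiple clients.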

2. Runtime Contract

2.1 Operation-Event Queue Pair

The smallest public shape is a pair of queues. The caller sends a submission; the runtime emits an event. At the pinned commit, Submission and Event spell out this boundary directly:

pub struct Submission {
    pub id: String,
    pub op: Op,
    pub trace: Option<W3cTraceContext>,
}

pub struct Event {
    pub id: String,
    pub msg: EventMsg,
}

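This pair is the opposite of "send text, receive text". The id lets the runtime correlate output facts with the request that caused them. The op field carries a typed operation. The msg field carries a typed event message. The trace field acknowledges asynchronous handoffs instead of pretending the system is a single blocking function call.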
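To make the correlation role of id concrete, here is a minimal sketch, not Codex code. The Op and EventMsg variants and the correlate helper are hypothetical stand-ins invented for illustration; only the idea that an event carries the id of the submission that caused it comes from the excerpt above.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified stand-ins for the protocol's Op and EventMsg.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    UserInput(String),
    Interrupt,
}

#[derive(Debug, Clone, PartialEq)]
enum EventMsg {
    TurnStarted,
    AgentMessage(String),
    TurnComplete,
}

// Same two-field shape as the excerpt, minus the trace field.
struct Submission {
    id: String,
    op: Op,
}

struct Event {
    id: String,
    msg: EventMsg,
}

/// Hypothetical helper: groups events by the id of the submission that
/// caused them, preserving arrival order within each group.
fn correlate(events: &[Event]) -> HashMap<&str, Vec<&EventMsg>> {
    let mut by_id: HashMap<&str, Vec<&EventMsg>> = HashMap::new();
    for event in events {
        by_id.entry(event.id.as_str()).or_default().push(&event.msg);
    }
    by_id
}
```

Because every event names its originating submission, a client can interleave several in-flight requests and still attribute each fact to the right one without parsing message text.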

Figure: Protocol boundary taxonomy showing submissions, operations, events, items, app-server messages, generated schemas, and compatibility checks. The protocol boundary is the first bounded-OS clue: submissions, operations, events, items, app-server messages, schemas, and compatibility checks are distinct carriers.

The Op enum makes the contract more explicit. It covers realtime messages, legacy input, richer turn input, approval responses, tool refreshes, memory updates, interruptions, and shutdown. The point here is not the full list but the fact that user work enters through one named operation family. In the pinned source, the key fields of UserInputWithTurnContext bind input to turn-scoped constraints:

UserInputWithTurnContext {
    items: Vec<UserInput>,
    // fields omitted here include environments, schema, client metadata,
    // reasoning preferences, service tier, and collaboration settings.
    cwd: Option<PathBuf>,
    approval_policy: Option<AskForApproval>,
    sandbox_policy: Option<SandboxPolicy>,
    permission_profile: Option<PermissionProfile>,
    model: Option<String>,
}

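This is why "the user said something" is not a precise enough source claim. The user supplies input, but the runtime may also receive, in the same queued operation, a working directory, approval policy, sandbox policy, permission profile, model choice, reasoning settings, and collaboration mode. Before any tool can run, the request has already become a controlled runtime object.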
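One way to read the Option-typed fields is as per-turn overrides layered over session defaults. The sketch below illustrates that reading only. The fallback rule itself is an assumption, since the excerpt shows the field types but not how they are resolved, and ApprovalMode, SessionDefaults, TurnOverrides, EffectiveTurn, and resolve are all hypothetical names.

```rust
use std::path::PathBuf;

// Hypothetical approval setting; not the real AskForApproval enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ApprovalMode {
    AlwaysAsk,
    NeverAsk,
}

// Hypothetical session-level defaults.
struct SessionDefaults {
    cwd: PathBuf,
    approval: ApprovalMode,
    model: String,
}

// Hypothetical per-turn overrides that travel with the input.
struct TurnOverrides {
    cwd: Option<PathBuf>,
    approval: Option<ApprovalMode>,
    model: Option<String>,
}

// The fully resolved settings a turn would run under.
#[derive(Debug, PartialEq)]
struct EffectiveTurn {
    cwd: PathBuf,
    approval: ApprovalMode,
    model: String,
}

/// Assumed rule: an override wins when present, otherwise the session
/// default applies.
fn resolve(defaults: &SessionDefaults, overrides: TurnOverrides) -> EffectiveTurn {
    EffectiveTurn {
        cwd: overrides.cwd.unwrap_or_else(|| defaults.cwd.clone()),
        approval: overrides.approval.unwrap_or(defaults.approval),
        model: overrides.model.unwrap_or_else(|| defaults.model.clone()),
    }
}
```

Whatever the real resolution rule is, the design point stands: the constraints arrive in the same typed value as the input, so no caller has to remember to set them through a side channel.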

2.2 Session Facade

The contract takes operable shape in the high-level Codex interface. Its comment is direct: Codex "operates as a queue pair where you send submissions and receive events."

pub struct Codex {
    pub(crate) tx_sub: Sender<Submission>,
    pub(crate) rx_event: Receiver<Event>,
    pub(crate) agent_status: watch::Receiver<AgentStatus>,
    pub(crate) session: Arc<Session>,
    pub(crate) session_loop_termination: SessionLoopTermination,
}

The source backs the architectural claim. This facade is neither a UI widget nor a provider client. It holds the submission sender, the event receiver, a status watcher, and a session handle. That makes it the place where client surfaces meet runtime ownership.

The submit side also shows that operation IDs are runtime facts, not UI decorations. Before an operation is sent into the queue, submit (through submit_with_trace) wraps it in a Submission with a generated ID:

pub async fn submit(&self, op: Op) -> CodexResult<String> {
    self.submit_with_trace(op, /*trace*/ None).await
}

pub async fn submit_with_trace(
    &self,
    op: Op,
    trace: Option<W3cTraceContext>,
) -> CodexResult<String> {
    let id = Uuid::now_v7().to_string();
    let sub = Submission {
        id: id.clone(),
        op,
        trace,
    };
    self.submit_with_id(sub).await?;
    Ok(id)
}

pub async fn submit_with_id(&self, mut sub: Submission) -> CodexResult<()> {
    if sub.trace.is_none() {
        sub.trace = current_span_w3c_trace_context();
    }
    self.tx_sub
        .send(sub)
        .await
        .map_err(|_| CodexErr::InternalAgentDied)?;
    Ok(())
}

The receive side is just as narrow. next_event returns the next runtime event. Clients may render, transform, or persist that event, but they do not invent runtime facts.

pub async fn next_event(&self) -> CodexResult<Event> {
    let event = self.rx_event.recv().await?;
    Ok(event)
}

This is the first bounded-OS mechanism: runtime work crosses a queue boundary. Clients submit requests and observe facts. The terminal UI feels interactive, but that does not erase the queue-pair contract beneath it.
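The queue-pair shape can be reproduced in a few lines with the standard library. This is a simplified sketch under stated assumptions, not the Codex implementation: it uses blocking std::sync::mpsc channels and a thread where the excerpt uses async channels, a counter where the excerpt uses a UUID, and QueuePair, Op::Echo, and EventMsg::Echoed are hypothetical.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical operations and events for the sketch.
#[derive(Debug, PartialEq)]
enum Op {
    Echo(String),
    Shutdown,
}

#[derive(Debug, PartialEq)]
enum EventMsg {
    Echoed(String),
    ShutdownComplete,
}

struct Submission {
    id: String,
    op: Op,
}

#[derive(Debug, PartialEq)]
struct Event {
    id: String,
    msg: EventMsg,
}

/// Hypothetical facade: callers only ever see a sender and a receiver.
struct QueuePair {
    tx_sub: Sender<Submission>,
    rx_event: Receiver<Event>,
    next_id: u64,
}

impl QueuePair {
    /// Starts the runtime loop on its own thread and returns the facade.
    fn spawn() -> Self {
        let (tx_sub, rx_sub) = channel::<Submission>();
        let (tx_event, rx_event) = channel::<Event>();
        thread::spawn(move || {
            for sub in rx_sub {
                let (msg, stop) = match sub.op {
                    Op::Echo(text) => (EventMsg::Echoed(text), false),
                    Op::Shutdown => (EventMsg::ShutdownComplete, true),
                };
                // Every event carries the id of the submission that caused it.
                if tx_event.send(Event { id: sub.id, msg }).is_err() || stop {
                    break;
                }
            }
        });
        QueuePair { tx_sub, rx_event, next_id: 0 }
    }

    /// Wraps the operation in a submission with a generated id and queues it.
    fn submit(&mut self, op: Op) -> Result<String, String> {
        self.next_id += 1;
        let id = format!("sub-{}", self.next_id);
        self.tx_sub
            .send(Submission { id: id.clone(), op })
            .map_err(|_| "runtime loop has stopped".to_string())?;
        Ok(id)
    }

    /// Blocks until the runtime emits its next event.
    fn next_event(&self) -> Result<Event, String> {
        self.rx_event
            .recv()
            .map_err(|_| "runtime loop has stopped".to_string())
    }
}
```

A caller would submit an Op, keep the returned id, and match it against the id on each Event from next_event. The caller never reaches into the loop's state, which is the property the facade is meant to protect.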

3. The Bounded Runtime

3.1 Runtime Responsibilities

The OS analogy is useful only if every responsibility has a boundary. Codex does not fold "prompting", "state", "tools", "approval", and "display" into one long script.

Responsibility	Source-backed boundary	Why it stays bounded
Intent intake	`Submission` and `Op`.	Callers send typed operations, not arbitrary runtime method calls.
Turn ownership	`run_turn` and `TurnContext`.	Context, model session, input, and cancellation converge under one owner.
State selection	`ContextManager`.	Model-visible history is curated and is not the same as every runtime fact.
Durable evidence	`RolloutTrace` and protocol events.	Replay and diagnostics use structured records.
Client projection	App-server event mapping and TUI rendering.	UI state sits downstream of event facts.
Authority	`UserInputWithTurnContext`, approval policy, sandbox policy, permission profile, and tool orchestration.	The model may request work, but a separate layer decides whether it can happen.

Once input becomes an operation, the turn runtime shows how much ownership is concentrated in one place. The signature of run_turn is a compact map:

pub(crate) async fn run_turn(
    sess: Arc<Session>,
    turn_context: Arc<TurnContext>,
    input: Vec<UserInput>,
    prewarmed_client_session: Option<ModelClientSession>,
    cancellation_token: CancellationToken,
) -> Option<String> {

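The function takes session state, a turn context, user input, an optional prewarmed model client session, and a cancellation token. A turn is therefore a runtime unit, not a single model call. Later chapters trace the loop in detail; for the architectural bet, the key fact is the ownership boundary. The turn owns the conditions under which the model is called, and the conditions under which model output may request work.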

3.2 Three Histories, Not One Transcript

A chat wrapper can pretend there is only one transcript. A bounded runtime cannot. Codex has to answer at least three different history questions:

History	Question it answers	Source owner
Model-visible context	What should the model see next?	`ContextManager`
Rollout record	What happened, in replayable order?	rollout trace and protocol events
Queryable projection	How should clients quickly list, filter, resume, or summarize?	thread state and app-server projections

Figure: Thread durable state map showing thread identity, session facade, queues, history, projections, resume, fork, state ledger, and rollback. Thread state is not one transcript. Runtime state, model-visible context, durable replay, projections, resume, fork, and rollback serve different readers.

The model-visible side appears in ContextManager. It stores response items, a history version, token accounting, and a reference context item used for future diffing:

pub(crate) struct ContextManager {
    items: Vec<ResponseItem>,
    history_version: u64,
    token_info: Option<TokenUsageInfo>,
    reference_context_item: Option<TurnContextItem>,
}

This is different from a database list view. It is a context manager for model-visible history. It can compact, roll back, track token pressure, and decide what must be reinjected into future turns.

The replay side has a different shape. RolloutTrace is a reduced diagnostic graph containing turns, conversation items, inference calls, tool calls, terminals, compactions, interaction edges, and raw payload references:

pub struct RolloutTrace {
    pub schema_version: u32,
    pub trace_id: String,
    pub rollout_id: String,
    pub started_at_unix_ms: i64,
    pub ended_at_unix_ms: Option<i64>,
    pub status: RolloutStatus,
    pub root_thread_id: AgentThreadId,
    pub threads: BTreeMap<AgentThreadId, AgentThread>,
    pub codex_turns: BTreeMap<CodexTurnId, CodexTurn>,
    pub conversation_items: BTreeMap<ConversationItemId, ConversationItem>,
    pub inference_calls: BTreeMap<InferenceCallId, InferenceCall>,
    pub code_cells: BTreeMap<CodeCellId, CodeCell>,
    pub tool_calls: BTreeMap<ToolCallId, ToolCall>,
    pub terminal_sessions: BTreeMap<TerminalId, TerminalSession>,
    pub terminal_operations: BTreeMap<TerminalOperationId, TerminalOperation>,
    pub compactions: BTreeMap<CompactionId, Compaction>,
    pub compaction_requests: BTreeMap<CompactionRequestId, CompactionRequest>,
    pub interaction_edges: BTreeMap<EdgeId, InteractionEdge>,
    pub raw_payloads: BTreeMap<RawPayloadId, RawPayloadRef>,
}

The queryable side is different again. list_threads uses the database, filtering, ordering, pagination, and anchors to return a page of threads:

pub async fn list_threads(
    &self,
    page_size: usize,
    filters: ThreadFilterOptions<'_>,
) -> anyhow::Result<crate::ThreadsPage> {

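The three histories protect one lesson: do not make a single representation serve every purpose. The model needs selected context. Replay needs structured fidelity. Clients need efficient projections. Collapsing these into one polished transcript makes the system easier to demo and harder to operate.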
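The split can be illustrated with a toy recorder that feeds three differently shaped stores from the same stream of facts. Everything here is hypothetical (Fact, Histories, record, and the truncation limit); it is a sketch of the separation, not of how ContextManager, RolloutTrace, or the thread database actually work.

```rust
// Hypothetical runtime fact; not the protocol's real item types.
#[derive(Debug, Clone, PartialEq)]
enum Fact {
    UserMessage(String),
    AgentMessage(String),
    ToolOutput { call_id: String, output: String },
}

// Arbitrary limit chosen for the sketch.
const MAX_TOOL_OUTPUT_CHARS: usize = 16;

#[derive(Default)]
struct Histories {
    /// Curated view: what the model would see next (tool output truncated).
    model_context: Vec<String>,
    /// Full-fidelity record in arrival order, keyed by sequence number.
    replay_log: Vec<(u64, Fact)>,
    /// Cheap summary fields a list view could query without replaying.
    title: Option<String>,
    item_count: usize,
}

impl Histories {
    /// Records one fact into all three representations.
    fn record(&mut self, fact: Fact) {
        let seq = self.replay_log.len() as u64;
        let visible = match &fact {
            Fact::UserMessage(text) => {
                // The first user message doubles as the thread title.
                if self.title.is_none() {
                    self.title = Some(text.clone());
                }
                format!("user: {}", text)
            }
            Fact::AgentMessage(text) => format!("agent: {}", text),
            Fact::ToolOutput { call_id, output } => {
                let kept: String = output.chars().take(MAX_TOOL_OUTPUT_CHARS).collect();
                format!("tool {}: {}", call_id, kept)
            }
        };
        self.model_context.push(visible);
        self.replay_log.push((seq, fact));
        self.item_count += 1;
    }
}
```

The model-facing view is lossy on purpose, the replay log is not, and the summary fields answer list queries without touching either. Merging the three would force one of them to give up its property.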

4. Authority and Projection

4.1 Authority Stack

The bounded-OS model matters most when the model requests a side effect. A model item may propose a shell command, a patch, an MCP call, or some other tool use. That proposal is not authority. Authority is composed of the turn context, approval policy, permission profile, hooks, sandbox policy, and executor selection.

Figure: Hooks and approval gates in Codex showing hook, policy, auto review, human approval, sandbox, feedback, and evidence paths. Tool execution is a gated path: hooks, policy, automated review, human approval, sandboxing, execution, feedback, and evidence are separate concerns.

The public protocol already carries authority inputs in UserInputWithTurnContext: cwd, approval_policy, sandbox_policy, and permission_profile. The execution path then evaluates the concrete tool request. In ToolOrchestrator, the file-system and network sandbox policies are read from the turn context first, and only then is the approval requirement decided:

let file_system_sandbox_policy = turn_ctx.file_system_sandbox_policy();
let network_sandbox_policy = turn_ctx.network_sandbox_policy();
let requirement = tool.exec_approval_requirement(req).unwrap_or_else(|| {
    default_exec_approval_requirement(approval_policy, &file_system_sandbox_policy)
});

This small block explains why the prompt cannot be the whole safety story. The approval requirement depends on the concrete request shape, the approval policy, and the file-system sandbox policy.

The same orchestrator then turns policy into possible outcomes. One branch can reject the request immediately:

ExecApprovalRequirement::Forbidden { reason } => {
    return Err(ToolError::Rejected(reason));
}

Another branch requests approval and rejects the tool call when the approval decision does not allow it:

ExecApprovalRequirement::NeedsApproval { reason, .. } => {
    let guardian_review_id = use_guardian.then(new_guardian_review_id);
    let approval_ctx = ApprovalCtx {
        session: &tool_ctx.session,
        turn: &tool_ctx.turn,
        call_id: &tool_ctx.call_id,
        guardian_review_id: guardian_review_id.clone(),
        retry_reason: reason,
        network_approval_context: None,
    };
    let decision = Self::request_approval(
        tool,
        req,
        tool_ctx.call_id.as_str(),
        approval_ctx,
        tool_ctx,
        /*evaluate_permission_request_hooks*/ !strict_auto_review,
        &otel,
    )
    .await?;

    Self::reject_if_not_approved(tool_ctx, guardian_review_id.as_deref(), decision)
        .await?;
    already_approved = true;
}

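Sandbox selection happens only after that. The first attempt does not default to "run anywhere". It is chosen jointly by the file-system policy, the network policy, the tool's first-attempt override, the tool's sandbox preference, the Windows sandbox level, and managed network state: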

let initial_sandbox = match tool.sandbox_mode_for_first_attempt(req) {
    SandboxOverride::BypassSandboxFirstAttempt => SandboxType::None,
    SandboxOverride::NoOverride => self.sandbox.select_initial(
        &file_system_sandbox_policy,
        network_sandbox_policy,
        tool.sandbox_preference(),
        turn_ctx.windows_sandbox_level,
        managed_network_active,
    ),
};

The model may request. The runtime decides. The sandbox enforces the constraint. The final event stream records what happened. That is the core of the bounded operating system bet.
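The ordering of the gates (reject, then ask, then pick a sandbox) can be condensed into one function. This is a sketch under stated assumptions rather than the orchestrator's code: Requirement, Decision, Sandbox, GateError, and gate are hypothetical, the approver is a plain closure, and sandbox selection is reduced to a single boolean standing in for the first-attempt override.

```rust
// Hypothetical requirement; Forbidden and NeedsApproval mirror the excerpts,
// NoApprovalNeeded is an assumed third case.
#[derive(Debug, Clone, PartialEq)]
enum Requirement {
    NoApprovalNeeded,
    NeedsApproval { reason: Option<String> },
    Forbidden { reason: String },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Decision {
    Approved,
    Denied,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Sandbox {
    None,
    Restricted,
}

#[derive(Debug, PartialEq)]
enum GateError {
    Rejected(String),
}

/// Runs the gates in order and returns the sandbox for the first attempt.
/// The approver is consulted only when the requirement asks for it.
fn gate(
    requirement: Requirement,
    ask: impl FnOnce(Option<&str>) -> Decision,
    bypass_first_attempt: bool,
) -> Result<Sandbox, GateError> {
    match requirement {
        // Policy can refuse before anyone is asked.
        Requirement::Forbidden { reason } => return Err(GateError::Rejected(reason)),
        Requirement::NeedsApproval { reason } => {
            if ask(reason.as_deref()) != Decision::Approved {
                return Err(GateError::Rejected("approval denied".to_string()));
            }
        }
        Requirement::NoApprovalNeeded => {}
    }
    // Sandbox selection happens only after the approval gates have passed.
    Ok(if bypass_first_attempt {
        Sandbox::None
    } else {
        Sandbox::Restricted
    })
}
```

Nothing in this function takes the model's text as input. The proposal only determines which Requirement is computed, and every outcome is decided by code the model cannot edit.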

4.2 Replaceable Clients

Once the runtime owns submissions, turns, events, histories, and authority, clients can differ without becoming different architectures. The terminal UI can focus on interactive rendering and approvals. exec can focus on deterministic command-line behavior. The app-server can focus on JSON-RPC, request serialization, thread state, rejoin semantics, SDK models, and browser or desktop integration.

The source preserves this downstream relationship. App-server event mapping does not claim to be the whole runtime. Its helper states that it builds the notification corresponding to a single core event and leaves surrounding state checks to the caller. At the pinned commit, item_event_to_server_notification is explicitly a projection layer:

pub fn item_event_to_server_notification(
    msg: EventMsg,
    thread_id: &str,
    turn_id: &str,
) -> ServerNotification {

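This separation lets client surfaces multiply without duplicating the agent loop. A UI can render events richly. An SDK can expose typed models. A daemon can manage transport and rejoin behavior. None of these surfaces should become the source of truth for "is this tool allowed", "what does the turn contain", or "which facts belong to replay".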
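A projection layer in this sense is a pure function from one runtime event to one client-facing notification. The sketch below mirrors only the shape of the signature shown above. The EventMsg and Notification variants and the project function are hypothetical, not the app-server's real types.

```rust
// Hypothetical core events for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum EventMsg {
    AgentMessage(String),
    ExecStarted { command: String },
}

// Hypothetical client-facing notifications.
#[derive(Debug, PartialEq)]
enum Notification {
    ItemStarted { thread_id: String, turn_id: String, label: String },
    ItemCompleted { thread_id: String, turn_id: String, text: String },
}

/// Maps a single core event to a single notification. It holds no state,
/// makes no policy decisions, and never alters the event's meaning.
fn project(msg: EventMsg, thread_id: &str, turn_id: &str) -> Notification {
    match msg {
        EventMsg::AgentMessage(text) => Notification::ItemCompleted {
            thread_id: thread_id.to_string(),
            turn_id: turn_id.to_string(),
            text,
        },
        EventMsg::ExecStarted { command } => Notification::ItemStarted {
            thread_id: thread_id.to_string(),
            turn_id: turn_id.to_string(),
            label: format!("exec: {}", command),
        },
    }
}
```

Because the function is stateless, a second client can define its own Notification type and its own mapping without either client affecting what the runtime recorded.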

Figure: Observability and rollout evidence map showing client events, rollout trace, metrics, logs, replay, and source anchors. Evidence is the runtime's public memory: clients can render different views, but replay, diagnostics, rollout trace, metrics, and source anchors need stable facts.

This is also why generated contracts and release checks matter in later chapters. If clients sit downstream of typed runtime facts, then schema export, compatibility tests, and boundary checks are not delivery chores. They are how the bounded OS stops new surfaces from smuggling private assumptions across the runtime boundary.

5. Applying It in Practice

The bounded-OS model is useful beyond Codex. It gives a checklist to any agent system that can act on a user's environment.

Type intent before it becomes work. Define an operation shape that carries user input, cwd, policy, model choice, and turn-scoped overrides.
Give the turn a runtime owner. Put context, cancellation, provider state, pending input, and completion behind a boundary that clients can call but do not own.
Split histories by job. Do not force model-visible context, replay evidence, and queryable list views into one transcript.
Gate side effects after model output. Let generated output request work, then let policy, approvals, hooks, sandboxing, and executors decide what actually happens.
Treat clients as projections. Build UI, CLI, SDK, and service surfaces on top of runtime facts rather than letting each surface grow its own agent loop.

The decision table below is the audit version of these rules.

Design choice	Runtime owner	Source anchor	Protected invariant	Failure if collapsed
Submit work as typed operations.	Protocol crate and session facade.	`Submission`, `Op`, `Codex::submit`.	Intent can be correlated and rejected.	UI text becomes implicit runtime authority.
Emit facts as typed events.	Protocol crate and event stream.	`Event`, `EventMsg`, `next_event`.	Clients consume facts rather than invent them.	Screens become the only record.
Keep turn context with input.	Operation payload and turn runtime.	`UserInputWithTurnContext`, `run_turn`.	Policies, cwd, model, and sandbox settings travel with the request.	The same message means different things for different callers.
Separate histories.	Context manager, rollout trace, thread database.	`ContextManager`, `RolloutTrace`, `list_threads`.	Model context, replay, and list views can each be optimized for a different job.	A single transcript becomes slow, lossy, or unsafe.
Gate side effects after model output.	Tool orchestrator and sandbox manager.	Approval requirement and sandbox selection.	The model requests work; runtime authority decides.	Tool execution is governed by the prompt.
Treat clients as projections.	App-server, TUI, SDK, and CLI adapters.	Event mapping and generated schemas.	Surfaces can evolve without forking runtime truth.	Every client grows its own agent loop.

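The practical rule is simple: before adding an agent feature, first name the runtime nouns that will cross a subsystem boundary. Then name the owner that can say no. Only once those two names are stable should the UI or prompt shape be treated as implementation work.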

6. Conclusion

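The bounded-operating-system model explains why Codex is organized around contracts first and interfaces second. A request enters as an operation, runs under a session and turn owner, draws on selected context, requests side effects through authority gates, emits events, and leaves evidence that clients can project. That is far more than a chat wrapper, yet it is not a general-purpose OS. The strength comes from the boundaries.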

Chapter 2 traces the first concrete entry boundary: how the installed command reaches the Rust command router without letting distribution glue become product architecture.

Source Map

Concept	Source anchor
Runtime vocabulary	`codex-rs/protocol/src/protocol.rs`
Operation enum	`codex-rs/protocol/src/protocol.rs`
Turn-scoped context operation	`codex-rs/protocol/src/protocol.rs`
Event stream	`codex-rs/protocol/src/protocol.rs`
Session facade	`codex-rs/core/src/session/mod.rs`
Submission and event methods	`submit`, `next_event`
Turn runtime	`codex-rs/core/src/session/turn.rs`
Model-visible history	`codex-rs/core/src/context_manager/history.rs`
Rollout replay graph	`codex-rs/rollout-trace/src/model/mod.rs`
Queryable thread projection	`codex-rs/state/src/runtime/threads.rs`
App-server event projection	`codex-rs/app-server-protocol/src/protocol/event_mapping.rs`
Tool approval gate	`codex-rs/core/src/tools/orchestrator.rs`
Sandbox selection	`codex-rs/core/src/tools/orchestrator.rs`