Chapter 20: Multi-Agent Coordination

Reading Contract: Use this chapter to answer one question: when Codex coordinates multiple agents, which facts belong to live thread topology, which belong to model-visible communication, and which belong to offline trace reconstruction? Track thread identity, graph edges, collaboration events, pending reducer edges, and result ownership. Afterward, you should be able to explain why a child agent is a thread with lifecycle state, not an anonymous background prompt.

Figure: Multi-agent coordination graph showing parent and child threads, spawn edges, mailboxes, wait states, outcomes, and trace reduction. Multi-agent work is a graph of explicit edges: spawn, mailbox, result, close, live updates, and trace evidence.

Source boundary: named files, structs, enums, handlers, event shapes, graph-store operations, and reducer behavior are source-verified only where this chapter links to the pinned Codex commit or to this chapter’s Source Map. The claim that live topology and trace topology are different “graphs” is an inference from those visible boundaries, not a source-verified fact. This chapter does not claim to know any hidden scheduler or provider-side coordination policy.

Chapter 19 closed Part V by showing how Codex imports external state without inheriting an external runtime. This chapter turns that discipline forward. Once Codex has native threads and native extension surfaces, multi-agent work should be represented as explicit relationships between threads, tools, messages, status, and trace evidence.

The key shift is that “multi-agent” is not a bag of background prompts. It is a lifecycle graph. A parent thread can spawn a child, send or assign work, wait for status, receive a result, resume a descendant, and close a relationship. Each action has three surfaces:

Surface	Owner	What it answers
live runtime	`AgentControl`, thread manager, graph store	what can be spawned, messaged, waited on, resumed, or closed now
protocol events	`Collab*Event` shapes	what clients can render without parsing prose
rollout trace	reducer interaction edges	what actually happened, reconstructed after raw events are captured

The invariant: coordination must not depend on terminal text as the only source of truth.

1. The Coordination Unit Is Still A Thread

A single Codex turn already has model input, streamed output, tool calls, approvals, hooks, persistence, cancellation, and replay. Multi-agent coordination does not create a second runtime. It creates more threads and records how information moves between them.

The source gives this thread-centered design a concrete control plane. AgentControl is shared across a root session tree. It can spawn threads, send input, interrupt, subscribe to status, close agents, resume agents from rollout, and list live agents. usage_hint_text() also shows that root and subagent sessions can receive different usage hints based on SessionSource.

The live graph store is intentionally narrow. ThreadSpawnEdgeStatus is only Open or Closed. LocalAgentGraphStore upserts a parent/child edge, sets child-edge status, lists children, and lists descendants with an optional status filter.

That narrowness is a feature. The graph store should not know how a model phrased the task, how a TUI renders the child, or how a trace viewer draws causality. It answers operational topology questions.
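
The narrow operation set described above can be sketched in a few lines. This is a minimal in-memory illustration, not the LocalAgentGraphStore implementation: the type `SketchGraphStore`, its method names, and the string thread IDs are hypothetical, and the rule that a status filter restricts traversal to edges with that status is an assumption made for the sketch.

```rust
use std::collections::BTreeMap;

/// Mirrors the two-state edge status described in the text.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EdgeStatus {
    Open,
    Closed,
}

/// Hypothetical in-memory store: parent thread ID -> (child thread ID -> status).
#[derive(Default)]
pub struct SketchGraphStore {
    edges: BTreeMap<String, BTreeMap<String, EdgeStatus>>,
}

impl SketchGraphStore {
    /// Insert the parent/child edge, or overwrite its status if it already exists.
    pub fn upsert_edge(&mut self, parent: &str, child: &str, status: EdgeStatus) {
        self.edges
            .entry(parent.to_string())
            .or_default()
            .insert(child.to_string(), status);
    }

    /// Change the status of an existing edge; returns false if the edge is unknown.
    pub fn set_status(&mut self, parent: &str, child: &str, status: EdgeStatus) -> bool {
        match self.edges.get_mut(parent).and_then(|c| c.get_mut(child)) {
            Some(slot) => {
                *slot = status;
                true
            }
            None => false,
        }
    }

    /// Direct children in ID order, optionally restricted to one status.
    pub fn children(&self, parent: &str, filter: Option<EdgeStatus>) -> Vec<String> {
        self.edges
            .get(parent)
            .into_iter()
            .flat_map(|children| children.iter())
            .filter(|(_, status)| filter.map_or(true, |f| **status == f))
            .map(|(child, _)| child.clone())
            .collect()
    }

    /// Depth-first descendants. With a filter, only matching edges are followed
    /// (an assumption of this sketch).
    pub fn descendants(&self, root: &str, filter: Option<EdgeStatus>) -> Vec<String> {
        let mut out = Vec::new();
        let mut stack = self.children(root, filter);
        stack.reverse();
        while let Some(id) = stack.pop() {
            if out.contains(&id) {
                continue;
            }
            let mut next = self.children(&id, filter);
            next.reverse();
            stack.extend(next);
            out.push(id);
        }
        out
    }
}
```

Nothing in the sketch stores prompts, rendering hints, or causal explanations. The only questions it can answer are which edges exist and which are open.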

Figure: Thread runtime record with thread identity, session facade, queues, history, projections, resume state, fork state, and rollback ledger. A child agent is still a thread record with queues, history, projections, resume state, and rollback evidence.

2. Spawn Creates A Thread, A Status Edge, And A Client Event

The spawn_agent handler makes the lifecycle visible. spawn.rs parses arguments, checks depth, emits CollabAgentSpawnBeginEvent, builds child config, starts a child through agent_control.spawn_agent_with_metadata(), emits CollabAgentSpawnEndEvent, records telemetry, and returns a model-visible result.

Shape-level tool input:

{
  "message": "Audit chapter 17 source links",
  "agent_type": "reviewer",
  "model": "optional override",
  "reasoning_effort": "optional override",
  "fork_context": false
}

Shape-level tool output:

{
  "agent_id": "child-thread-id",
  "nickname": "optional nickname"
}

The live topology is written later in AgentControl. spawn_agent_internal() prepares a SessionSource::SubAgent(ThreadSpawn), starts or forks the child thread, notifies clients that a thread was created, persists the spawn edge, and sends the initial input. persist_thread_spawn_edge_for_source() upserts the edge as Open.
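
The ordering in the handler (check depth, emit begin, start the child, emit end, return a result) can be sketched as follows. Every name here is hypothetical: `SpawnEvent`, `spawn_sketch`, the counter-based child ID, and the 40-character preview are illustration only, and rejecting when the current depth has reached the maximum is an assumed rule, not the source's exact check.

```rust
/// Hypothetical stand-ins for the spawn begin/end client events.
#[derive(Debug, PartialEq)]
pub enum SpawnEvent {
    Begin { prompt_preview: String },
    End { child: Option<String> },
}

/// Model-visible result of a successful spawn in this sketch.
#[derive(Debug, PartialEq)]
pub struct SpawnOutcome {
    pub agent_id: String,
}

pub fn spawn_sketch(
    current_depth: u32,
    max_depth: u32,
    message: &str,
    next_id: &mut u32,
    events: &mut Vec<SpawnEvent>,
) -> Result<SpawnOutcome, String> {
    // Depth is checked before any event is emitted, so a rejected spawn
    // leaves no begin event behind.
    if current_depth >= max_depth {
        return Err(format!("spawn depth limit {max_depth} reached"));
    }
    events.push(SpawnEvent::Begin {
        prompt_preview: message.chars().take(40).collect(),
    });
    // Stand-in for starting the child thread and persisting an open edge.
    *next_id += 1;
    let agent_id = format!("child-{}", *next_id);
    events.push(SpawnEvent::End {
        child: Some(agent_id.clone()),
    });
    Ok(SpawnOutcome { agent_id })
}
```

The point of the sketch is the pairing: every begin event that is emitted is followed by an end event, and the model-visible result carries the same child identity that the events carry.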

2.1 Forking Has A Recovery Boundary

When fork_context is set, the child is not created from an abstract prompt alone. spawn_forked_thread() flushes parent rollout, reads stored parent history, optionally truncates to the last N fork turns, removes parent usage hints so the child gets fresh hints, filters which rollout items are kept, and starts a forked thread with InitialHistory::Forked.

That protects three invariants:

Pressure	Simpler approach that fails	Source mechanism	Invariant protected
child needs parent context	copy live in-memory state	flush/read stored rollout before fork	fork has durable baseline
parent usage hints differ from child hints	replay parent developer hints into child	filter configured usage hints	child prompt matches child session source
tool/history noise can pollute fork	keep every rollout item	`keep_forked_rollout_item()` filters item kinds	child gets useful history, not parent runtime debris
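
The truncate-then-filter step can be sketched as below. The `Item` kinds, the rule that a user message starts a turn, and the helper `keep_item_sketch` are assumptions for illustration; the real filter is `keep_forked_rollout_item()`, and its actual item kinds are not reproduced here.

```rust
/// Hypothetical rollout item kinds; the real item set is not claimed here.
#[derive(Clone, Debug, PartialEq)]
pub enum Item {
    UserMessage(String),      // starts a new turn in this sketch
    AssistantMessage(String),
    UsageHint(String),        // parent-specific hint the child should not inherit
    ToolNoise(String),        // runtime debris the child does not need
}

/// Keep only conversational items; hints and tool noise are dropped.
fn keep_item_sketch(item: &Item) -> bool {
    matches!(item, Item::UserMessage(_) | Item::AssistantMessage(_))
}

/// Build a child's starting history from the parent's stored history.
/// `last_n_turns` of None keeps every turn.
pub fn fork_history(stored: &[Item], last_n_turns: Option<usize>) -> Vec<Item> {
    // Index of every item that begins a turn.
    let turn_starts: Vec<usize> = stored
        .iter()
        .enumerate()
        .filter(|(_, item)| matches!(item, Item::UserMessage(_)))
        .map(|(index, _)| index)
        .collect();
    // Where the kept suffix begins.
    let start = match last_n_turns {
        Some(0) => stored.len(),
        Some(n) if n < turn_starts.len() => turn_starts[turn_starts.len() - n],
        _ => 0,
    };
    stored[start..]
        .iter()
        .filter(|item| keep_item_sketch(item))
        .cloned()
        .collect()
}
```

The function reads from a stored slice rather than live session state, which is the sketch's analogue of flushing and re-reading the rollout before forking.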

3. Sending Work Is Mailbox Delivery, Not Shared Memory

Sending work to an agent names another thread as a participant. send_input.rs parses a target thread ID, turns message or items into user input, optionally interrupts the receiver, emits interaction begin/end events, calls agent_control.send_input(), and returns a submission ID.

Shape-level tool input:

{
  "target": "child-thread-id",
  "message": "Check whether this migration claim is source-backed.",
  "interrupt": false
}

Shape-level tool output:

{
  "submission_id": "operation-id"
}

The mailbox model is useful because sender and receiver observations can be separated in the raw event stream. A parent tool call proves the parent requested delivery. A receiver-side model-visible message proves where that delivery entered the child. Those are related facts, not the same fact.
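
A minimal sketch of that separation, using standard-library channels as the mailbox. The `Mailboxes` registry, the `Delivery` struct, and the counter-based submission ID are hypothetical; the real path goes through agent_control.send_input(), which this sketch does not reproduce.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// What the receiver observes when a delivery enters its mailbox.
#[derive(Debug)]
pub struct Delivery {
    pub submission_id: u64,
    pub from: String,
    pub message: String,
}

/// Hypothetical registry: target thread ID -> sending half of its mailbox.
#[derive(Default)]
pub struct Mailboxes {
    senders: HashMap<String, Sender<Delivery>>,
    next_submission: u64,
}

impl Mailboxes {
    /// Create a mailbox for a thread and hand back the receiving half.
    pub fn register(&mut self, thread_id: &str) -> Receiver<Delivery> {
        let (tx, rx) = channel();
        self.senders.insert(thread_id.to_string(), tx);
        rx
    }

    /// Sender-side fact: the parent requested delivery and got a submission ID.
    pub fn send_input(&mut self, from: &str, target: &str, message: &str) -> Result<u64, String> {
        let tx = self
            .senders
            .get(target)
            .ok_or_else(|| format!("unknown target thread: {target}"))?;
        self.next_submission += 1;
        let submission_id = self.next_submission;
        tx.send(Delivery {
            submission_id,
            from: from.to_string(),
            message: message.to_string(),
        })
        .map_err(|_| format!("mailbox for {target} is closed"))?;
        Ok(submission_id)
    }
}
```

The submission ID returned to the sender and the `Delivery` read by the receiver are two separate observations joined by an identifier. No memory is shared between the two threads.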

4. Waiting And Closing Are Lifecycle Operations

wait_agent and close_agent make the status/lifecycle boundary explicit.

wait_agent parses non-empty targets, clamps a positive timeout, emits waiting begin/end events, subscribes to each child status, and blocks until at least one final status arrives or the timeout expires. It returns a status map plus timed_out:

{
  "status": {
    "child-thread-id": "completed"
  },
  "timed_out": false
}
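
The wait loop can be sketched with a standard-library channel that carries status updates. The `Status` variants, the clamp bounds, and the function `wait_any_final` are assumptions for this sketch; the real handler subscribes to each child's status through AgentControl rather than reading one shared channel.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Running,
    Completed,
    Failed,
}

impl Status {
    /// Any status other than Running counts as final in this sketch.
    fn is_final(self) -> bool {
        !matches!(self, Status::Running)
    }
}

#[derive(Debug)]
pub struct WaitResult {
    pub status: BTreeMap<String, Status>,
    pub timed_out: bool,
}

// Assumed clamp bounds; the real limits are not claimed here.
const MIN_WAIT_MS: i64 = 10;
const MAX_WAIT_MS: i64 = 60_000;

/// Block until one final status arrives or the clamped timeout expires.
pub fn wait_any_final(updates: &Receiver<(String, Status)>, timeout_ms: i64) -> WaitResult {
    let clamped = timeout_ms.clamp(MIN_WAIT_MS, MAX_WAIT_MS) as u64;
    let deadline = Instant::now() + Duration::from_millis(clamped);
    let mut status = BTreeMap::new();
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match updates.recv_timeout(remaining) {
            // First final status ends the wait.
            Ok((id, s)) if s.is_final() => {
                status.insert(id, s);
                return WaitResult { status, timed_out: false };
            }
            // Non-final updates are ignored and the wait continues.
            Ok(_) => continue,
            Err(RecvTimeoutError::Timeout) => return WaitResult { status, timed_out: true },
            // Every status source is gone, so nothing more can arrive.
            Err(RecvTimeoutError::Disconnected) => {
                return WaitResult { status, timed_out: false }
            }
        }
    }
}
```

A timed-out wait returns an empty map with timed_out set. It does not guess a status for children that have not reported one.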

The close_agent handler emits close begin/end events, observes the previous status, then calls agent_control.close_agent(). That AgentControl method marks the persisted child spawn edge Closed before shutting down the agent tree.

Closing a relationship is not deleting history. It changes the operational interpretation of descendants. Graph-store tests show the status-filter behavior: open descendant traversal follows open edges, while closed traversal follows closed edges. A closed branch can remain auditable without being treated as active runtime work.

5. Collaboration Events Are Product Events

Codex emits collaboration events instead of asking clients to infer multi-agent state from tool prose. The protocol source defines:

Event family	Shape owner	What clients can render
`CollabAgentSpawnBegin/End`	`CollabAgentSpawn*`	sender, child thread, prompt preview, model, effort, status
`CollabAgentInteractionBegin/End`	`CollabAgentInteraction*`	sender, receiver, prompt preview, receiver metadata, status
`CollabWaitingBegin/End`	`CollabWaiting*`	target set, receiver refs, status map
`CollabCloseBegin/End`	`CollabClose*`	close target, receiver metadata, previous status
`CollabResumeBegin/End`	`CollabResume*`	resume target, receiver metadata, status

This is the same app-server discipline from Chapter 14 applied to collaboration. Product events are not decorative. They preserve identity and lifecycle so clients can render status, subscriptions, and history without parsing free-form assistant text.
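
To make the contrast with prose parsing concrete, here is a small sketch of a client rendering typed events. The enum `CollabEventSketch` and its fields are simplified, hypothetical shapes inspired by the table above, not the protocol definitions in protocol.rs.

```rust
/// Simplified, hypothetical event shapes; field sets are not the real protocol.
#[derive(Debug, Clone, PartialEq)]
pub enum CollabEventSketch {
    SpawnBegin { sender: String, prompt_preview: String },
    SpawnEnd { sender: String, child: Option<String>, status: String },
    WaitingBegin { sender: String, targets: Vec<String> },
    CloseEnd { sender: String, target: String, previous_status: String },
}

/// A client renders from fields, never by parsing assistant text.
pub fn render(event: &CollabEventSketch) -> String {
    match event {
        CollabEventSketch::SpawnBegin { sender, prompt_preview } => {
            format!("{sender} is spawning an agent: {prompt_preview}")
        }
        CollabEventSketch::SpawnEnd { sender, child: Some(child), status } => {
            format!("{sender} spawned {child} ({status})")
        }
        CollabEventSketch::SpawnEnd { sender, child: None, status } => {
            format!("{sender} failed to spawn an agent ({status})")
        }
        CollabEventSketch::WaitingBegin { sender, targets } => {
            format!("{sender} is waiting on {}", targets.join(", "))
        }
        CollabEventSketch::CloseEnd { sender, target, previous_status } => {
            format!("{sender} closed {target} (was {previous_status})")
        }
    }
}
```

Because the match is exhaustive, adding a new event variant forces every renderer to handle it at compile time. Prose parsing gives no such guarantee.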

6. Live Graph And Trace Graph Answer Different Questions

There are two useful graphs.

The live graph is compact. It stores parent/child spawn edges and open/closed status. It is optimized for runtime questions: which children exist, which descendants are open, which branches should be shut down or resumed, and which child edge should be marked closed.

The trace graph is semantic. It is built after raw protocol/runtime/tool/model events have been captured. It is optimized for explanation: which tool call created a child, which mailbox item received the task, which child output produced a parent notice, and which raw payloads support the edge.

Confusing them creates design bugs:

Confusion	Consequence
Treat live graph as full trace	runtime store grows UI/explanation policy
Treat trace graph as runtime state	active coordination depends on replay artifact availability
Treat transcript text as graph	compaction or formatting can erase coordination facts
Treat nickname as identity	renamed agents break routing and trace joins

Figure: Raw trace events and payload references entering a strict reducer that emits a rollout trace graph with pending queues and raw links. The trace graph is reconstructed from recorded facts, so pending queues and raw links must be explicit instead of inferred from prose.

7. Pending Queues Make Races Explicit

The trace reducer is strict about evidence and forgiving about legitimate ordering races. PendingAgentInteractionEdge stores an edge waiting for the recipient-side conversation item. It carries edge kind, source, target thread ID, message content, optional spawn fallback thread ID, timestamps, and raw payload IDs.

The reducer lifecycle is:

sender tool begin/end observed
  -> queue pending edge with target thread and message content
  -> receiver-side inter-agent message item is reduced
  -> resolve pending edge to exact conversation item
  -> if spawn target item never appears but child thread exists
     resolve spawn to child thread fallback

queue_or_resolve_agent_interaction_edge() resolves immediately if an unlinked matching message item already exists, merges duplicate pending observations only when endpoints agree, and rejects conflicting data. resolve_pending_agent_edges_for_item() resolves a pending edge when a matching inter-agent message item is reduced. resolve_pending_spawn_edge_fallbacks() materializes spawn edges to a thread target only when the child thread exists.

A minimal sequence shows why the queue exists:

P1: parent calls spawn_agent("Audit links")
P2: parent receives tool result with the child thread ID
C1: child thread starts
C2: child receives model-visible mailbox/task message

If P2 is reduced before C2, the edge waits.
If C2 appears, the target is the conversation item.
If C2 never appears but C1 exists, spawn falls back to the child thread.
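
The three outcomes above can be sketched as a tiny reducer. All names (`ReducerSketch`, `Pending`, `Target`, and the method names) are hypothetical, and matching on target thread plus message text is a simplification. The sketch also omits the case where the receiver item is reduced before the sender observation, which the real reducer resolves immediately.

```rust
/// What a resolved edge points at.
#[derive(Debug, Clone, PartialEq)]
pub enum Target {
    ConversationItem(String),
    Thread(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub source_call: String,
    pub target: Target,
}

/// An edge waiting for the receiver-side item.
struct Pending {
    source_call: String,
    target_thread: String,
    message: String,
    spawn_fallback: bool,
}

#[derive(Default)]
pub struct ReducerSketch {
    pending: Vec<Pending>,
    known_threads: Vec<String>,
    pub edges: Vec<Edge>,
}

impl ReducerSketch {
    /// Record that a thread exists in the reduced trace (C1).
    pub fn observe_thread(&mut self, thread: &str) {
        self.known_threads.push(thread.to_string());
    }

    /// Sender side observed (P1/P2): queue the edge instead of guessing a target.
    pub fn observe_sender(&mut self, call: &str, target_thread: &str, message: &str, is_spawn: bool) {
        self.pending.push(Pending {
            source_call: call.to_string(),
            target_thread: target_thread.to_string(),
            message: message.to_string(),
            spawn_fallback: is_spawn,
        });
    }

    /// Receiver item reduced (C2): resolve a matching pending edge to that item.
    pub fn observe_receiver_item(&mut self, thread: &str, item_id: &str, message: &str) -> bool {
        let hit = self
            .pending
            .iter()
            .position(|p| p.target_thread == thread && p.message == message);
        match hit {
            Some(index) => {
                let p = self.pending.remove(index);
                self.edges.push(Edge {
                    source_call: p.source_call,
                    target: Target::ConversationItem(item_id.to_string()),
                });
                true
            }
            None => false,
        }
    }

    /// End of reduction: spawn edges fall back to the child thread only if it exists.
    pub fn resolve_spawn_fallbacks(&mut self) {
        let pending = std::mem::take(&mut self.pending);
        for p in pending {
            if p.spawn_fallback && self.known_threads.contains(&p.target_thread) {
                self.edges.push(Edge {
                    source_call: p.source_call,
                    target: Target::Thread(p.target_thread),
                });
            } else {
                self.pending.push(p);
            }
        }
    }

    /// Edges that never found a justified target.
    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}
```

An edge whose target thread never appears stays pending in this sketch. It is not attached to an invented endpoint.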

The reducer also avoids false edges. upsert_close_agent_interaction() refuses to create a close edge to a thread absent from the reduced trace. queue_agent_result_interaction_edge() anchors result delivery to the latest assistant item when available, or to the child thread when failed/cancelled children notify the parent without a final assistant message.
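
Both rules in this paragraph reduce to small decisions. The sketch below uses hypothetical names (`ResultAnchor`, `anchor_result`, `close_edge_target`) and plain string IDs; it illustrates the decisions only, not the reducer's actual signatures.

```rust
/// Where a child's result delivery is anchored.
#[derive(Debug, PartialEq)]
pub enum ResultAnchor {
    AssistantItem(String),
    ChildThread(String),
}

/// Prefer the latest assistant item; fall back to the child thread when a
/// failed or cancelled child produced no final assistant message.
pub fn anchor_result(child_thread: &str, assistant_items: &[&str]) -> ResultAnchor {
    match assistant_items.last() {
        Some(item) => ResultAnchor::AssistantItem(item.to_string()),
        None => ResultAnchor::ChildThread(child_thread.to_string()),
    }
}

/// Return a close target only if that thread is present in the reduced trace.
pub fn close_edge_target(reduced_threads: &[&str], target: &str) -> Option<String> {
    if reduced_threads.contains(&target) {
        Some(target.to_string())
    } else {
        None
    }
}
```

Neither function invents an endpoint: the result anchor falls back to something that exists, and the close edge is simply not created when its target is unknown.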

8. Failure Modes: Identity Loss Corrupts Coordination

A multi-agent system fails in confusing ways when it loses identity. Thread ID, agent path, nickname, tool call ID, model-visible call ID, conversation item ID, and raw payload ID serve different purposes.

Identifier	Owner	Wrong use
thread ID	runtime thread manager and graph store	treating nickname as a substitute
agent path	user-facing agent tree reference	assuming it proves persisted history
tool call ID	parent model/tool lifecycle	treating it as receiver message identity
conversation item ID	model-visible transcript item	using it before receiver delivery exists
raw payload ID	trace evidence ledger	rendering it as user-facing state
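
One way to keep these identifiers from being swapped is to give each its own type. The sketch below is an illustration with hypothetical names (`ThreadId`, `Nickname`, `Router`), not the Codex type definitions: routing is keyed by thread ID, so renaming an agent cannot break it.

```rust
use std::collections::HashMap;

/// Stable routing identity.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct ThreadId(pub String);

/// Display label; deliberately not hashable, so it cannot be used as a map key.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Nickname(pub String);

/// Hypothetical router keyed by thread identity only.
#[derive(Default)]
pub struct Router {
    by_thread: HashMap<ThreadId, Nickname>,
}

impl Router {
    pub fn register(&mut self, id: ThreadId, nickname: Nickname) {
        self.by_thread.insert(id, nickname);
    }

    /// Renaming changes the label but never the routing key.
    pub fn rename(&mut self, id: &ThreadId, nickname: Nickname) -> bool {
        match self.by_thread.get_mut(id) {
            Some(slot) => {
                *slot = nickname;
                true
            }
            None => false,
        }
    }

    pub fn nickname(&self, id: &ThreadId) -> Option<&Nickname> {
        self.by_thread.get(id)
    }
}
```

With distinct types, passing a `Nickname` where a `ThreadId` is expected is a compile error rather than a routing bug discovered at runtime.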

The reducer’s design lesson is balanced: reject conflicting endpoints, duplicate model-visible tool relationships, and inconsistent tool-call pairs; tolerate pending delivery, spawn fallback, missing close targets, and child results without final assistant items. Robust coordination is neither “accept every edge” nor “fail on every missing detail.” It preserves evidence and materializes semantic edges only when justified.

Trace Ledger

Question	Chapter 20 answer
Where is the user request now?	It may be in a parent thread, a child thread, a mailbox delivery, a wait status observation, a close operation, or a trace edge that links those facts.
What carries it?	`AgentControl`, `SessionSource::SubAgent(ThreadSpawn)`, graph-store spawn edges, collaboration protocol events, tool outputs, inter-agent messages, and rollout trace interaction edges.
Who owns the next decision?	The model chooses collaboration tools; handlers validate and call `AgentControl`; the graph store records live topology; clients render protocol events; the reducer reconstructs explanatory edges later.
What must remain invariant?	Child agents are threads with identity and lifecycle; live graph state stays operational; trace graph state stays evidentiary; result delivery must not depend on transcript prose alone.
What can fail here?	Depth limits, invalid targets, missing threads, timed-out waits, closed branches, child failure before final assistant output, conflicting pending edges, or raw events that never produce a valid target.

Apply This

Model agents as threads. Use this whenever sub-work needs lifecycle and replay. Give each agent thread identity, status, and parentage. Pitfall: treating subagents as anonymous background prompts.
Separate live topology from trace explanation. Use the graph store for operational open/closed descendants, and use the reducer for causal edges. Pitfall: putting UI/trace policy into the live store.
Emit collaboration events. Use typed events for spawn, send, wait, close, and resume. Pitfall: asking clients to infer coordination from tool output prose.
Queue edges across races. Use pending edges when sender and receiver observations can arrive out of order. Pitfall: creating false endpoints just because the best target has not appeared yet.
Anchor failures honestly. Use thread fallback for real spawned children without message items; leave unresolvable payloads attached to raw tool evidence. Pitfall: inventing conversation items to make the graph look complete.

Closing

Multi-agent Codex is still Codex: threads, turns, tools, events, persistence, and replayable state. The new ingredient is explicit information flow across thread boundaries. Chapter 21 moves the same principle beyond local agent trees into cloud tasks, where work may run remotely but still has to return as typed task state, identity, and locally verifiable changes.

Source Map

Concept	Source anchor
Graph edge status	`codex-rs/agent-graph-store/src/types.rs`
Local graph store	`codex-rs/agent-graph-store/src/local.rs`
Agent control lifecycle	`codex-rs/core/src/agent/control.rs`
Session multi-agent integration	`codex-rs/core/src/session/multi_agents.rs`
Multi-agent tool handlers	`codex-rs/core/src/tools/handlers/multi_agents.rs`
Spawn/send/wait/close handlers	`spawn.rs`, `send_input.rs`, `wait.rs`, `close_agent.rs`
Collaboration event protocol	`codex-rs/protocol/src/protocol.rs`
Agent trace reducer	`codex-rs/rollout-trace/src/reducer/tool/agents.rs`