Chapter 15: SDKs, Daemons, and Remote Control

Reading Contract: Use this chapter to answer one practical question: when code outside the core runtime talks to Codex, which layer owns protocol semantics, process lifecycle, restart/reconnect behavior, and recovery? Track three owners - SDK stream routing, daemon process supervision, and remote-control cursor replay - and check afterward whether a “client” is a protocol client, a process wrapper, a lifecycle manager, or a transport bridge.

Figure: Codex external reach cover showing protocol client, process wrapper, daemon, remote bridge, and one shared app-server contract. SDKs, daemons, in-process callers, and remote-control bridges reach Codex through different surfaces, but the useful ones preserve the same turn contract instead of inventing new semantics.

Source boundary: source-level claims in this chapter count as verified only when they link to the pinned Codex commit 569ff6a1c400bd514ff79f5f1050a684dc3afde3 or to the Source Map. Design language such as “runtime owner”, “transport bridge”, “semantic boundary”, and “client reach” is inference about the surrounding contract, drawn from those visible files. This chapter does not infer OpenAI service internals.

Chapter 14 treated the app-server as the shared thread contract: multiple connections can start turns, observe notifications, answer server requests, and rejoin running work without directly owning the core session loop. Chapter 15 asks what happens one layer outward, where the same contract has to survive SDK ergonomics, process supervision, and network gaps. Real users do not usually handcraft every JSON-RPC message. They call an SDK, let a daemon find a local server, or cross a remote-control bridge.

The trap is to call all of those surfaces “clients” and stop there. That word hides the important engineering split:

A protocol client must preserve request, response, notification, and server-request semantics.
A process wrapper may only run a command and parse a JSON event stream.
A daemon is not an SDK; it owns process lifecycle, pid state, health probing, restart, and update loops.
A remote-control bridge is not a simple WebSocket; network loss makes cursor, buffering, replay, and connection identity part of correctness.

The useful mental model is therefore not “SDKs call the app-server.” It is “every external surface either preserves the app-server contract or deliberately exposes a narrower contract.”

1. Client Reach Is A Taxonomy, Not A Single Path

The codebase exposes several ways to reach Codex. Their boundaries are intentionally asymmetric.

Surface	Primary boundary	Verified anchor	Local owner / invariant
Rust app-server transport	App-server message transport	`AppServerTransport`	Normalize listener forms and carry `ConnectionOrigin` metadata without changing message meaning.
Python SDK	App-server v2 over stdio	`Codex`, `MessageRouter`	Keep one stdout reader, typed calls, per-request queues, turn streams, and result projection.
TypeScript SDK	`codex exec` process stream	`Codex`, `Thread.runStreamedInternal`	Expose a process-oriented thread API over line-delimited JSON events.
Daemon	Local app-server lifecycle	`Daemon::run`, `PidBackend`	Serialize lifecycle changes, probe socket health, publish pid state, and manage restart/update loops.
Remote control	Backend-mediated app-server stream	`start_remote_control`, `ClientTracker`	Preserve remote connection identity, chunking, ack cursors, reconnect, and replay.

The table matters because it separates contract shape from client ergonomics. First, the TypeScript SDK is not wrong because it is not the Python SDK. It wraps codex exec, writes input to a child process, reads line-delimited stdout, and parses ThreadEvent values. Second, the daemon is not a convenience flag around codex app-server; it is the boundary that makes a long-lived local server safe for short-lived tools.

1.1 Schema gives clients a shared language

The app-server protocol surface has generated types and transport normalization before SDKs make it ergonomic. The transport module accepts stdio://, unix://, ws://IP:PORT, or off through AppServerTransport::from_listen_url. The same module represents incoming work through TransportEvent, with an explicit ConnectionOrigin for Stdio, InProcess, WebSocket, and RemoteControl.

That enum is a small but important clue. Above the transport boundary, the runtime can know where a connection came from without changing what a protocol message means. Origin is connection metadata for handling and disconnection paths; the protocol message remains the semantic contract. It is not permission to silently reinterpret turn/start or server request semantics.
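The listener forms named above can be sketched as a small parser. This is an illustration only, not the Rust `AppServerTransport::from_listen_url` implementation: `ListenTarget`, `parse_listen_url`, and the exact handling of each form are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class ListenTarget:
    """Hypothetical normalized listener description."""
    kind: str                    # "stdio", "unix", "websocket", or "off"
    path: Optional[str] = None   # socket path for unix://
    host: Optional[str] = None   # bind address for ws://
    port: Optional[int] = None   # bind port for ws://


def parse_listen_url(value: str) -> ListenTarget:
    """Map one of the listener forms from the text onto a ListenTarget."""
    if value == "off":
        return ListenTarget("off")
    parts = urlsplit(value)
    if parts.scheme == "stdio":
        return ListenTarget("stdio")
    if parts.scheme == "unix":
        # Assumption: everything after "unix://" is the socket path.
        return ListenTarget("unix", path=value[len("unix://"):])
    if parts.scheme == "ws":
        if parts.hostname is None or parts.port is None:
            raise ValueError(f"ws:// listen URL needs IP:PORT: {value!r}")
        return ListenTarget("websocket", host=parts.hostname, port=parts.port)
    raise ValueError(f"unsupported listen URL: {value!r}")
```

The design point is that parsing produces transport metadata only. Nothing in the result changes how a protocol message such as turn/start is interpreted once it arrives.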

1.2 The public client surface should not leak every protocol detail

The Python SDK’s public class starts the app-server client and validates initialize metadata before exposing the generated method facade. The verified anchor is Codex.__init__, followed by generated methods such as thread_start. That is a user-facing surface: callers should see a thread API, not an obligation to write their own response router.

The TypeScript SDK makes a different tradeoff. Codex.startThread() and resumeThread() return a Thread, but the implementation runs the codex executable. In exec.ts, a threadId becomes resume <id>, stdin receives the prompt, stdout becomes a readline stream, and each line is yielded to the thread layer. The excerpt below is abbreviated; the omitted middle block handles images, environment construction, API key injection, stderr collection, and exit handling.

if (args.threadId) {
  commandArgs.push("resume", args.threadId);
}

// ... image arguments and environment setup omitted ...

const child = spawn(this.executablePath, commandArgs, {
  env,
  signal: args.signal,
});

child.stdin.write(args.input);
child.stdin.end();

const rl = readline.createInterface({
  input: child.stdout,
  crlfDelay: Infinity,
});

for await (const line of rl) {
  yield line as string;
}

// ... spawn errors and non-zero exit handling omitted ...

This is a valid SDK boundary, but it is narrower than “app-server client.” It is a command wrapper with structured events. Claims about server-request replay, daemon-managed sockets, and remote-control cursors should not be projected onto that SDK unless the pinned source shows those hooks.
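To make the “command wrapper with structured events” shape concrete, here is a minimal Python sketch of the same pattern: spawn a child, write the prompt to stdin, and yield one parsed JSON object per stdout line. It is not the TypeScript SDK's code. `run_streamed` and `DEMO_CHILD` are hypothetical, and the demo event shapes are invented stand-ins rather than real ThreadEvent values.

```python
import json
import subprocess
import sys
from typing import Iterator, Sequence

# Stand-in child process (an assumption for the demo): it echoes the prompt
# back as two JSON lines. A real wrapper would launch the actual executable.
DEMO_CHILD = (
    "import json, sys\n"
    "prompt = sys.stdin.read()\n"
    "print(json.dumps({'type': 'demo.started'}))\n"
    "print(json.dumps({'type': 'demo.echo', 'text': prompt}))\n"
)


def run_streamed(command: Sequence[str], prompt: str) -> Iterator[dict]:
    """Spawn `command`, send `prompt` on stdin, yield one JSON event per line."""
    child = subprocess.Popen(
        command,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    assert child.stdin is not None and child.stdout is not None
    child.stdin.write(prompt)
    child.stdin.close()  # signal end of input, like stdin.end()
    try:
        for line in child.stdout:
            line = line.strip()
            if line:
                yield json.loads(line)  # a parse failure surfaces as an error
    finally:
        child.stdout.close()
        exit_code = child.wait()
    if exit_code != 0:
        raise RuntimeError(f"child exited with code {exit_code}")
```

Usage: `list(run_streamed([sys.executable, "-c", DEMO_CHILD], "hello"))` collects the two demo events. The wrapper has no request ids, no server requests, and no reconnect state, which is exactly why it is a narrower contract than a protocol client.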

2. SDK Routing: One Ordered Stream, Many Local Owners

Figure: Python SDK MessageRouter diagram showing one process stdout reader routing messages to response waiters, turn queues, pending turn replay, global queue, and fail_all. The Python SDK keeps one reader on the ordered stdout stream, routes responses and notifications to local owners, and uses the same router to wake blocked operations when the reader fails.

The Python SDK’s most important internal invariant is visible in MessageRouter: “only the reader thread should consume stdout.” That sentence explains the whole class. Stdio is ordered, but it is not automatically owned. If two high-level SDK methods both read the process output stream, a response can disappear into the wrong caller.

The router therefore creates local ownership:

class MessageRouter:
    """Route reader-thread messages to the SDK operation waiting for them."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._response_waiters: dict[str, queue.Queue[ResponseQueueItem]] = {}
        self._turn_notifications: dict[str, queue.Queue[NotificationQueueItem]] = {}
        self._pending_turn_notifications: dict[str, deque[Notification]] = {}
        self._global_notifications: queue.Queue[NotificationQueueItem] = queue.Queue()

The four routing structures in that excerpt are not implementation decoration. They are the SDK’s local equivalent of the app-server contract:

_response_waiters maps request ids to one-shot JSON-RPC response queues.
_turn_notifications maps turn ids to queues for streams the caller is already consuming.
_pending_turn_notifications maps turn ids to deques for early events that arrive after turn/start but before the caller has registered the stream.
_global_notifications receives notifications that are not scoped to a turn.

The early-event case is the one that usually breaks naive SDKs. The source handles it in register_turn: it pops pending events for the turn and puts them into the newly registered queue. route_notification does the other half, buffering non-completed turn notifications when the stream queue is not yet present.

def register_turn(self, turn_id: str) -> None:
    turn_queue: queue.Queue[NotificationQueueItem] = queue.Queue()
    with self._lock:
        if turn_id in self._turn_notifications:
            return
        pending = self._pending_turn_notifications.pop(turn_id, deque())
        self._turn_notifications[turn_id] = turn_queue
    for notification in pending:
        turn_queue.put(notification)

This is the client-side version of a runtime recovery rule: “events may arrive before the consumer is ready, but they still belong to the same turn.” Without it, fast turns would become flaky only under timing pressure.
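The two halves of the early-event rule can be exercised together in a small standalone sketch. `TinyRouter` is a hypothetical reduction, not the SDK's `MessageRouter`: it buffers every notification for an unregistered turn (the real router treats completed-turn notifications specially) and drains pending events while holding the lock, which is a simplification chosen for the example.

```python
import queue
import threading
from collections import deque


class TinyRouter:
    """Hypothetical two-method router showing early-event buffering."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._turn_queues: dict[str, queue.Queue] = {}
        self._pending: dict[str, deque] = {}

    def route_notification(self, turn_id: str, notification: dict) -> None:
        """Called by the single reader thread for each turn-scoped event."""
        with self._lock:
            turn_queue = self._turn_queues.get(turn_id)
            if turn_queue is None:
                # No consumer yet: keep the event so it is not lost.
                self._pending.setdefault(turn_id, deque()).append(notification)
                return
        turn_queue.put(notification)

    def register_turn(self, turn_id: str) -> queue.Queue:
        """Called by the consumer; replays anything that arrived early."""
        with self._lock:
            existing = self._turn_queues.get(turn_id)
            if existing is not None:
                return existing
            turn_queue: queue.Queue = queue.Queue()
            for notification in self._pending.pop(turn_id, deque()):
                turn_queue.put(notification)
            self._turn_queues[turn_id] = turn_queue
        return turn_queue
```

Routing two events for a turn, registering it, and then routing a third yields all three in arrival order from the registered queue, regardless of which side won the race.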

2.1 Failure must wake every blocked owner

The router also owns failure propagation. fail_all snapshots registered turn queues, clears response waiters and the pending-turn buffer, then puts the same exception into every response waiter, registered turn queue, and the global-notification queue. The source comment states the invariant directly: no SDK call should block forever waiting for a response that cannot arrive.

That is why a stream router is more than a multiplexer. It is the local failure boundary. Once the reader thread exits, the router is the only component that knows which user-facing operations might be sleeping.
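A minimal sketch of that failure boundary follows. `FailableRouter` and `wait_for` are hypothetical names, and the sketch omits the pending-turn buffer and the global queue; it only shows the core move of delivering one exception to every queue a caller might be blocked on.

```python
import queue
import threading


class FailableRouter:
    """Hypothetical router that can wake every blocked owner at once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._response_waiters: dict[str, queue.Queue] = {}
        self._turn_queues: dict[str, queue.Queue] = {}

    def expect_response(self, request_id: str) -> queue.Queue:
        waiter: queue.Queue = queue.Queue(maxsize=1)
        with self._lock:
            self._response_waiters[request_id] = waiter
        return waiter

    def register_turn(self, turn_id: str) -> queue.Queue:
        with self._lock:
            return self._turn_queues.setdefault(turn_id, queue.Queue())

    def fail_all(self, error: BaseException) -> None:
        """Snapshot under the lock, then deliver the error outside it."""
        with self._lock:
            waiters = list(self._response_waiters.values())
            streams = list(self._turn_queues.values())
            self._response_waiters.clear()
        for target in waiters + streams:
            target.put(error)


def wait_for(waiter: queue.Queue):
    """Block for one item; re-raise if the router delivered an exception."""
    item = waiter.get()
    if isinstance(item, BaseException):
        raise item
    return item
```

Because the exception travels through the same queues as normal results, a caller blocked in `wait_for` needs no separate cancellation channel: the next item it receives is either its answer or the reason no answer can arrive.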

2.2 Result collection is a projection, not the whole stream

The high-level run() result is built from notifications. _collect_run_result collects completed items, token usage, and the turn/completed notification for the target turn. It raises if the turn completion event never arrives.

That means the public return value is a projection of the stream, not a replacement for the stream. A caller that only needs final text can use run(). A caller that needs progressive items, approvals, or custom UI behavior must treat the notification stream as the richer contract.
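The projection idea can be shown with plain dictionaries. The notification shapes below (`turn_id`, `kind`, `item`, `usage`) are invented for the example and do not mirror the generated SDK models; `collect_run_result` here is a hypothetical stand-in for the real `_collect_run_result`.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class RunResult:
    """Hypothetical projection of one turn's notification stream."""
    items: list = field(default_factory=list)
    usage: Optional[dict] = None
    completed: Optional[dict] = None


def collect_run_result(notifications: Iterable[dict], turn_id: str) -> RunResult:
    """Fold the stream for `turn_id` into a RunResult; fail if it never completes."""
    result = RunResult()
    for note in notifications:
        if note.get("turn_id") != turn_id:
            continue  # other turns are not part of this projection
        kind = note.get("kind")
        if kind == "item_completed":
            result.items.append(note["item"])
        elif kind == "usage":
            result.usage = note["usage"]
        elif kind == "turn_completed":
            result.completed = note
            return result
        # Any other kind (deltas, progress) is dropped by the projection.
    raise RuntimeError(f"turn {turn_id!r} ended without a completion event")
```

Everything the loop skips is still present in the stream itself, which is why a caller that needs progressive output or approvals has to consume notifications directly rather than rely on the folded result.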

2.3 Python and TypeScript are different on purpose

Question	Python SDK	TypeScript SDK
Primary input/output	App-server protocol over stdio.	`codex exec` process stdin/stdout.
Stream owner	`MessageRouter` owns one reader and per-operation queues.	`readline` yields process output lines.
Turn identity	Protocol methods and generated models expose thread/turn operations.	`thread.started` can set `_id`; `resume <id>` is passed to the executable.
Failure boundary	Router wakes waiters and streams.	Child process errors, non-zero exit, and JSON parse failures become SDK errors.
Best fit	Clients needing app-server semantics.	Scripts needing structured execution events.

This distinction keeps the chapter honest. If you need shared app-server thread ownership, server-request handling, and remote-control semantics, use a surface that actually exposes that contract. If you need a simple language wrapper around a command, the narrower event-stream surface can be the better tool.

3. Daemon Lifecycle: Reliability Is Not A PID File

Figure: Daemon lifecycle diagram showing command, lock, probe, pid, backend, ready, update, and stale pid cleanup loop. The daemon serializes lifecycle commands with an operation lock, probes actual socket health, reserves pid state under a separate lock, and cleans stale pid records before publishing readiness.

A local daemon sounds like a small convenience until two callers try to start, restart, update, or stop the same app-server at the same time. The source makes that risk explicit. The daemon defines state filenames such as app-server.pid, app-server-updater.pid, and daemon.lock, then routes lifecycle commands through Daemon::run.

The important pattern is that Start, Restart, and Stop acquire the operation lock before touching process state. Version does not, because it is a probe-style read.

async fn run(&self, command: LifecycleCommand) -> Result<LifecycleOutput> {
    match command {
        LifecycleCommand::Start => {
            let _operation_lock = self.acquire_operation_lock().await?;
            self.start().await
        }
        LifecycleCommand::Restart => {
            let _operation_lock = self.acquire_operation_lock().await?;
            self.restart().await
        }
        LifecycleCommand::Stop => {
            let _operation_lock = self.acquire_operation_lock().await?;
            self.stop().await
        }
        LifecycleCommand::Version => self.version().await,
    }
}

The lock itself is not an abstract mutex hidden in memory. acquire_operation_lock opens the daemon lock file and retries the lock until a timeout expires, sleeping between attempts. That file-backed design matters because separate CLI invocations and update loops can coordinate through the filesystem.
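The open, try-lock, sleep, retry loop can be sketched in Python with `fcntl.flock`. This is a POSIX-only illustration, not the daemon's Rust code: the function names mirror the prose, but the timeout and interval defaults and the use of `flock` are assumptions.

```python
import fcntl
import os
import time


def acquire_operation_lock(path: str, timeout: float = 5.0, interval: float = 0.05) -> int:
    """Open `path` and retry a non-blocking exclusive lock until `timeout`.

    Returns the locked file descriptor; the lock lives as long as the
    descriptor stays open and locked.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    deadline = time.monotonic() + timeout
    while True:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            # Someone else holds the lock; give up once the deadline passes.
            if time.monotonic() >= deadline:
                os.close(fd)
                raise TimeoutError(f"could not lock {path} within {timeout}s")
            time.sleep(interval)


def release_operation_lock(fd: int) -> None:
    """Unlock and close the descriptor returned by acquire_operation_lock."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Because the lock is attached to a file rather than to process memory, a second CLI invocation that opens the same path contends with the first one, and a crashed holder releases the lock automatically when the kernel closes its descriptor.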

3.1 Probe first, then trust process records

The daemon does not equate “pid file exists” with “server is usable.” start first loads settings and calls client::probe on the control socket. The probe connects, upgrades to a WebSocket, sends initialize, waits for the matching response, sends initialized, closes, and parses the app-server version from the response user agent.

That is a stronger signal than a pid file. A stale pid record can remain after a crash. A healthy socket that responds to initialize proves the app-server is actually accepting the protocol.

The daemon repeats this discipline in restart: if a server is running but not managed by the daemon, restart returns an error instead of killing an unknown process. wait_until_ready polls the same probe until the app-server becomes ready or the start timeout expires.
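The readiness loop reduces to “poll a probe until it answers or the deadline passes.” The sketch below injects the probe as a callable so it stays self-contained; `wait_until_ready` mirrors the name in the text, but its signature, the `None` convention for “not ready”, and the handling of `OSError` are assumptions.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    probe: Callable[[], Optional[str]],
    timeout: float,
    interval: float = 0.05,
) -> str:
    """Poll `probe` until it returns a version string or `timeout` expires.

    `probe` should return None (or raise OSError, for example on a refused
    connection) while the server is not yet accepting the protocol.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            version = probe()
        except OSError:
            version = None  # treat connection failures as "not ready yet"
        if version is not None:
            return version
        if time.monotonic() >= deadline:
            raise TimeoutError("app-server did not become ready in time")
        time.sleep(interval)
```

In a real caller the probe would perform the connect, initialize, initialized, close sequence described above. The loop itself never looks at a pid file, which keeps readiness tied to protocol acceptance.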

3.2 PID reservation protects the startup gap

The pid backend has its own lock because startup has a dangerous middle state: one process has decided to start the server, but the pid record is not yet fully published. PidBackend::start creates the pid directory, acquires a reservation lock, creates the pid file with create_new, removes stale records, spawns the detached process, reads its start time, writes a temp record, and renames it into place.

The read side treats missing, empty, starting, running, and stale states differently. read_pid_file_state returns Starting when the pid file is missing but the reservation lock is active. refresh_after_stale_record reacquires the reservation lock before removing a stale record.
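Two of the techniques named above, exclusive creation and temp-file-plus-rename publication, are easy to show in isolation. The sketch below is a simplified illustration: the JSON record format, the `.tmp` suffix, and the state names are assumptions, and it does not model the reservation lock or stale-record detection.

```python
import json
import os
from typing import Optional, Tuple


def reserve_pid_file(path: str) -> None:
    """Create the pid file only if it does not exist (create_new semantics)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)


def publish_pid_record(path: str, pid: int, start_time: float) -> None:
    """Write the record to a temp file, then rename it into place."""
    tmp_path = f"{path}.tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump({"pid": pid, "start_time": start_time}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # os.replace is atomic on the same filesystem, so readers see either
    # the empty reservation or the complete record, never a partial write.
    os.replace(tmp_path, path)


def read_pid_state(path: str) -> Tuple[str, Optional[dict]]:
    """Classify the file as "missing", "empty", or "recorded"."""
    try:
        with open(path, encoding="utf-8") as handle:
            raw = handle.read()
    except FileNotFoundError:
        return ("missing", None)
    if not raw.strip():
        return ("empty", None)
    return ("recorded", json.loads(raw))
```

Storing the start time next to the pid is what makes staleness checkable later: a reused pid with a different start time does not match the record.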

This is the daemon’s invariant ledger:

Pressure	Simpler approach that fails	Source mechanism	Invariant protected
Two commands start together.	Both spawn a server.	Operation lock in `Daemon::run`.	One lifecycle mutation at a time.
PID exists but server is dead.	Trust the pid file.	Socket `initialize` probe.	Readiness means protocol acceptance.
Process dies after reservation.	Treat empty pid as running.	Reservation lock and starting state.	Startup gap is observable.
Update loop meets a busy daemon.	Restart anyway.	`try_restart_if_running` returns `Busy` when lock acquisition fails.	Updates do not race user lifecycle commands.
Server belongs to another owner.	Kill whatever answers the socket.	Managed-backend checks before restart/stop.	Daemon only manages its own backend.

The daemon is therefore not just “where the server starts.” It is the component that makes local app-server reach stable enough for SDKs, UIs, and update flows to depend on.

4. Remote Control: Network Loss Becomes Runtime State

Figure: Remote-control cursor replay diagram showing remote client, websocket, app-server, client tracker, message chunks, outbound buffer, ack cursor, reconnect, and replay. Remote control turns network uncertainty into explicit state: client identity, stream identity, message chunks, outbound buffer entries, ack cursors, reconnect, and replay.

Remote control is the chapter’s sharpest boundary because it crosses a network. A local stdio client can often fail fast when the child process exits. A remote client can disconnect while the local runtime continues emitting server messages. If the bridge does not remember what the remote side has acknowledged, the client can silently miss notifications after reconnect.

The remote-control module starts with explicit enablement and status. RemoteControlStartConfig carries the remote-control URL and installation id. RemoteControlHandle can enable or disable the bridge and expose status updates. start_remote_control normalizes the target immediately only when remote control is initially enabled; otherwise the later connect path normalizes it. It also initializes status as Connecting or Disabled, creates the websocket runner, and returns the handle.

The protocol makes the recovery fields visible. ClientEnvelope includes client_id, optional stream_id, optional seq_id, and optional cursor. The source comment says seq_id is the backend-generated per-stream cursor for acknowledgements. ServerEvent can carry ServerMessage, ServerMessageChunk, Ack, or Pong.

pub(crate) struct ClientEnvelope {
    pub(crate) event: ClientEvent,
    pub(crate) client_id: ClientId,
    pub(crate) stream_id: Option<StreamId>,
    /// For `Ack`, this is the backend-generated per-stream cursor over
    /// `ServerEnvelope.seq_id`.
    pub(crate) seq_id: Option<u64>,
    pub(crate) cursor: Option<String>,
}

The URL normalizer is also deliberately narrow. normalize_remote_control_url accepts HTTPS URLs for allowed ChatGPT hosts and HTTP/HTTPS only for localhost, then derives enroll and websocket endpoints. This is a transport contract, not a general-purpose arbitrary tunnel.
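The acceptance rule can be sketched as a small validator. This is not `normalize_remote_control_url`: the host set below is a placeholder (the real allowed hosts are not listed in this chapter), only the literal name `localhost` is treated as local, and endpoint derivation is left out.

```python
from urllib.parse import urlsplit

# Placeholder allow-list for the example; NOT the real set of allowed hosts.
ALLOWED_HOSTS = frozenset({"chatgpt.example"})
# Assumption: only the literal name counts as local in this sketch.
LOCAL_HOSTS = frozenset({"localhost"})


def check_remote_control_url(url: str) -> str:
    """Return `url` if it passes the scheme/host rule, else raise ValueError."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host in LOCAL_HOSTS and parts.scheme in ("http", "https"):
        return url  # plain HTTP is tolerated only for local development
    if host in ALLOWED_HOSTS and parts.scheme == "https":
        return url
    raise ValueError(f"remote-control URL not allowed: {url!r}")
```

Rejecting by default is the point of the design: any host or scheme not explicitly listed fails before a connection is attempted, so the bridge cannot be pointed at an arbitrary tunnel.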

4.1 ClientTracker turns remote messages into app-server connections

ClientTracker::handle_message maps incoming remote envelopes to local app-server connection events. On an initialize-style message, it opens a new connection with ConnectionOrigin::RemoteControl. Later messages for the same (client_id, stream_id) become TransportEvent::IncomingMessage.

That mapping is the bridge’s semantic hinge. Remote control is not just “write JSON to a websocket.” It creates a normal app-server connection origin, then lets the existing message processor handle app-server semantics above the transport boundary.
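As a rough sketch of that hinge, the tracker below keys connections by (client_id, stream_id) and emits transport events. `TinyClientTracker` and this `TransportEvent` dataclass are hypothetical. Forwarding the initialize message after opening the connection and dropping messages for unknown streams are assumptions, not confirmed behavior of `ClientTracker::handle_message`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransportEvent:
    """Hypothetical stand-in for the Rust TransportEvent variants."""
    kind: str                      # "connection_opened" or "incoming_message"
    connection_id: int
    origin: str                    # stands in for ConnectionOrigin::RemoteControl
    message: Optional[dict] = None


class TinyClientTracker:
    def __init__(self) -> None:
        self._connections: dict[tuple[str, str], int] = {}
        self._next_id = 1

    def handle_message(self, client_id: str, stream_id: str, message: dict) -> list[TransportEvent]:
        key = (client_id, stream_id)
        events: list[TransportEvent] = []
        if key not in self._connections:
            if message.get("method") != "initialize":
                return []  # assumption: unknown streams must initialize first
            self._connections[key] = self._next_id
            self._next_id += 1
            events.append(
                TransportEvent("connection_opened", self._connections[key], "remote_control")
            )
        events.append(
            TransportEvent("incoming_message", self._connections[key], "remote_control", message)
        )
        return events
```

The message dictionary passes through untouched. Only the connection bookkeeping is remote-specific, which is what keeps app-server semantics above the transport boundary.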

4.2 OutboundBuffer is the replay ledger

The outbound side stores unacknowledged server envelopes by (client_id, stream_id). BoundedOutboundBuffer inserts every server envelope and removes only what the ack cursor covers.

fn ack(
    &mut self,
    client_id: &ClientId,
    stream_id: &StreamId,
    acked_seq_id: u64,
    acked_segment_id: Option<usize>,
) {
    let key = (client_id.clone(), stream_id.clone());
    let Some(buffer) = self.buffer_by_stream.get_mut(&key) else {
        return;
    };
    let acked_cursor = (acked_seq_id, acked_segment_id.unwrap_or(usize::MAX));
    buffer.retain(|server_envelope| {
        let envelope_cursor = (
            server_envelope.seq_id,
            server_envelope.event.segment_id().unwrap_or_default(),
        );
        let is_acked = envelope_cursor <= acked_cursor;
        !is_acked
    });
}

The full source also updates a usage watch channel and removes empty buffers. The point that matters for this chapter is the comparison key: sequence id plus optional segment id. That is what lets segment acknowledgements advance at wire-chunk granularity instead of pretending a large server message is atomic.

run_server_writer_inner replays existing outbound-buffer envelopes when a websocket writer starts, then assigns contiguous sequence ids per stream to new server events, splits large messages for transport, inserts them into the buffer, and sends their JSON payloads. run_websocket_reader_inner reads client envelopes, stores the subscribe cursor, and calls outbound_buffer.ack(...) when it sees an ack with seq_id and stream_id.

Shape-level sequence:

connect with subscribe cursor C
-> writer replays unacked server envelopes still in outbound buffer
-> new server event gets next per-stream sequence id
-> large message may split into chunks
-> remote client sends ack cursor
-> outbound buffer drops covered sequence/segment entries

No provider-internal behavior is needed for this explanation. The visible source contract already shows why cursor and replay exist: remote control must survive a websocket reconnect without losing app-server messages that the remote side has not acknowledged.
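The cursor comparison is easier to see with concrete numbers. The Python sketch below mirrors the tuple comparison from the `ack` excerpt: a missing acked segment id covers the whole sequence id, and a missing envelope segment id counts as 0. `Envelope` and `OutboundBuffer` are simplified, hypothetical stand-ins with no size bound and no usage channel.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    seq_id: int
    segment_id: Optional[int] = None  # None for an unsplit message
    payload: str = ""


class OutboundBuffer:
    """Unacknowledged envelopes, keyed by (client_id, stream_id)."""

    def __init__(self) -> None:
        self._by_stream: dict[tuple[str, str], list[Envelope]] = {}

    def insert(self, client_id: str, stream_id: str, envelope: Envelope) -> None:
        self._by_stream.setdefault((client_id, stream_id), []).append(envelope)

    def ack(self, client_id: str, stream_id: str,
            acked_seq_id: int, acked_segment_id: Optional[int] = None) -> None:
        key = (client_id, stream_id)
        buffer = self._by_stream.get(key)
        if buffer is None:
            return
        # No segment id means "everything in this sequence id is covered".
        acked_cursor = (
            acked_seq_id,
            sys.maxsize if acked_segment_id is None else acked_segment_id,
        )
        remaining = [e for e in buffer if (e.seq_id, e.segment_id or 0) > acked_cursor]
        if remaining:
            self._by_stream[key] = remaining
        else:
            del self._by_stream[key]  # drop empty per-stream buffers

    def replay(self, client_id: str, stream_id: str) -> list[Envelope]:
        """What a reconnecting writer would resend for this stream."""
        return list(self._by_stream.get((client_id, stream_id), []))
```

With one whole message at sequence 1 and a three-chunk message at sequence 2, acknowledging (2, segment 0) drops the first message and the first chunk but keeps chunks 1 and 2 for replay. Acknowledging sequence 2 with no segment id then clears the stream.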

4.3 Chunk reassembly is a safety boundary

Remote clients can send chunked messages too. The REMOTE_CONTROL_SEGMENT_* constants define target, max segment, max reassembled size, and max segment count limits, while ClientSegmentReassembler owns the in-progress assemblies. observe drops segmented envelopes without required seq_id or stream_id, rejects invalid counts and sizes, and resets state on stream changes. The AssemblyUpdate enum names the outcomes; the observe branch rejects old, mismatched, out-of-order, oversized, invalid-base64, and invalid-JSON chunks before forwarding a reassembled ClientMessage.

That is not just defensive parsing. It protects the app-server from receiving half-assembled or replay-confused protocol messages after a network interruption.
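A reduced reassembler shows how those rejection rules fit together. Everything concrete here is an assumption made for the sketch: the limit values, the `observe` parameters, and the wire format (each chunk carrying base64 of one byte slice of a JSON message). It tracks a single in-progress assembly and collapses the outcomes the real `AssemblyUpdate` enum distinguishes into “message or None”.

```python
import base64
import json
from typing import Optional

MAX_SEGMENTS = 8              # hypothetical limit for the example
MAX_REASSEMBLED_BYTES = 1024  # hypothetical limit for the example


class SegmentReassembler:
    """Accept in-order chunks for one message; reset on anything suspicious."""

    def __init__(self) -> None:
        self._reset()

    def _reset(self) -> None:
        self._seq_id: Optional[int] = None
        self._count = 0
        self._parts: list[bytes] = []

    def observe(self, seq_id: int, segment_id: int,
                segment_count: int, data_b64: str) -> Optional[dict]:
        """Return the decoded message once complete, else None."""
        if not 0 < segment_count <= MAX_SEGMENTS:
            self._reset()
            return None
        if segment_id == 0:
            # A first segment always starts a fresh assembly.
            self._seq_id, self._count, self._parts = seq_id, segment_count, []
        elif (seq_id != self._seq_id or segment_count != self._count
              or segment_id != len(self._parts)):
            self._reset()  # old, mismatched, or out-of-order chunk
            return None
        try:
            chunk = base64.b64decode(data_b64, validate=True)
        except ValueError:  # binascii.Error is a ValueError subclass
            self._reset()
            return None
        self._parts.append(chunk)
        if sum(len(part) for part in self._parts) > MAX_REASSEMBLED_BYTES:
            self._reset()
            return None
        if len(self._parts) < self._count:
            return None  # still waiting for more chunks
        raw = b"".join(self._parts)
        self._reset()
        try:
            return json.loads(raw)
        except ValueError:  # invalid UTF-8 or invalid JSON
            return None
```

The reassembler forwards nothing until every check has passed, so a partial or replay-confused assembly is discarded at the bridge instead of reaching the message processor.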

5. Compatibility Belongs At The Edge

SDKs, daemon supervision, and remote control all add compatibility costs, but the code tries to keep those costs at the boundary:

Boundary	Compatibility pressure	Edge response
Python SDK	Initialize metadata can arrive through `serverInfo` or user-agent shape.	`_validate_initialize` normalizes required metadata before exposing the SDK.
TypeScript SDK	Existing sessions are resumed through a command-line surface.	`exec.ts` passes `resume <threadId>` to the executable.
Daemon	Server may be already running but not daemon-managed.	`start`, `restart`, and `stop` separate healthy unmanaged servers from daemon-owned backends.
Remote control	Old clients may omit `stream_id` during initialization.	`ClientTracker` contains a documented legacy fallback around stream ids.
Remote reconnect	Remote side may have received only part of a stream.	`BoundedOutboundBuffer` retains unacked envelopes by client and stream.

The pattern is transferable: compatibility should be close to the surface that created it. A stream-id fallback belongs in the remote tracker, not in the core turn loop. User-agent normalization belongs in SDK initialize handling, not in every downstream call. Daemon ownership checks belong in process supervision, not in protocol message routing.

Common Misreadings

Misreading	Correction
“All SDKs are app-server clients.”	The Python SDK is an app-server protocol client; the TypeScript SDK wraps `codex exec` and parses JSON event lines.
“A daemon is just a pid file.”	The daemon uses operation locks, socket probes, pid reservation locks, stale-record cleanup, and readiness polling.
“Remote control is just a WebSocket.”	Remote control adds enrollment, client and stream identity, chunking, ack cursor, outbound buffering, reconnect, and replay.
“Transport differences change turn semantics.”	Transport origin is visible, but app-server protocol messages should keep their meaning above the transport boundary.
“Final SDK results are the whole stream.”	`run()` projections collect completed items, final response, usage, and failure status; the notification stream remains richer.

Apply This

Name the client boundary before judging it. Protocol client, process wrapper, daemon, and remote bridge have different obligations.
Use one reader per ordered stream. Route responses and notifications internally so concurrent user code cannot steal protocol bytes.
Probe health, not just existence. A pid file is evidence; an initialize response from the control socket is stronger evidence.
Make reconnect explicit. If a transport can disconnect while the runtime continues, cursor, buffer, and replay are part of correctness.
Keep compatibility near its owner. Normalize SDK metadata in the SDK, process ownership in the daemon, and stream-id fallbacks in remote tracking.

Closing

SDKs, daemons, and remote control are not accessory code around the “real” runtime. They are where the app-server contract becomes usable from programs, scripts, local supervisors, and remote clients. The source-level theme is consistent: preserve one semantic contract, but let each boundary own the mechanics it is uniquely responsible for. Chapter 16 moves to the most visible local consumer of that contract: the terminal UI.

Source Map

Concept	Source anchor
Transport modes and connection origin	`transport/mod.rs`
Python SDK public API and initialize normalization	`api.py`
Python SDK message routing	`_message_router.py`
Python SDK run-result projection	`_run.py`
TypeScript SDK public API	`codex.ts`
TypeScript event stream and `run()` collection	`thread.ts`
TypeScript process wrapper	`exec.ts`
Daemon lifecycle commands, probe, bootstrap, and operation lock	`app-server-daemon/src/lib.rs`
Daemon control-socket probe	`client.rs`
PID backend reservation, stale cleanup, and process start	`backend/pid.rs`
Remote-control start handle	`remote_control/mod.rs`
Remote-control envelope, chunks, ack cursor, and URL normalization	`protocol.rs`
Remote-control client tracking and connection origin	`client_tracker.rs`
Remote-control websocket buffering, reconnect writer, and ack handling	`websocket.rs`
Remote-control segment reassembly and drop rules	`segment.rs`