Provider Boundary: Transport Changes, Events Stay Stable

Reading Contract: Treat this chapter as a source map for the boundary between model providers and the Codex runtime. Track provider data, runtime provider behavior, turn-local transport state, and adjacent runtime paths. After the chapter, you should be able to explain why HTTP streaming, Responses-over-WebSocket, local providers, Bedrock signing, model catalogs, realtime sessions, and backend tasks do not collapse into one “model call.”

Figure: provider data feeding a runtime provider with auth, capabilities, model manager, typed API, and turn loop boundary. Provider configuration begins as data; the turn loop talks only to a runtime provider that can resolve auth, capabilities, model metadata, and typed API behavior.

Source boundary: direct source claims in this chapter are pinned to OpenAI Codex commit 569ff6a1c400bd514ff79f5f1050a684dc3afde3. ModelProviderInfo, ModelProvider, ModelClientSession, ResponseEvent, ModelsManager, Bedrock auth, Ollama readiness, realtime conversation startup, and CloudBackend are verified against source at the linked anchors. Claims about why the runtime keeps transport mechanics below one event vocabulary are inferences from those visible types and call sites, not claims about private provider internals.

Four local terms will carry the argument. Provider data is the serializable ModelProviderInfo record: URL, auth fields, retry policy, headers, wire API, and WebSocket support. Runtime provider is the ModelProvider trait object that resolves account state, auth, capabilities, API-provider shape, and model manager choice for the current process. Typed event vocabulary is the ResponseEvent enum that the turn loop sees after a transport-specific stream has been parsed. Adjacent runtime path means a path that may share credentials or base URLs with model calls but has its own lifecycle, such as realtime media/session setup or cloud backend tasks.

Chapter 6 followed a turn until sampling became necessary. This chapter opens that boundary. The practical pressure is simple: the turn loop wants a stable stream of runtime facts, while providers vary across auth, signing, transport, catalog, locality, and cloud workflow concerns. Codex keeps that variety below a small runtime contract before scheduler policy resumes.

Problem: provider differences touch URLs, credentials, request signing, stream framing, retry, model visibility, local readiness, and cloud tasks, but the turn loop cannot afford to learn every provider dialect.

Thesis: Codex converts provider-specific mechanics into runtime provider behavior, typed API requests, and normalized response events before the turn scheduler resumes policy decisions.

Mental model: provider data describes possible communication; runtime provider behavior chooses how this session communicates; the turn loop receives typed events and adjacent task results.

Guiding questions: What is configuration, what is behavior, which transport is active for this turn, and which paths are not model inference at all?

1. Provider Data Is Not Runtime Behavior

1.1 `ModelProviderInfo` Is a Serialized Contract

The data layer is explicit in codex-rs/model-provider-info/src/lib.rs. ModelProviderInfo is a serializable provider definition. It can hold a base URL, an environment variable for a key, a discouraged literal bearer token, command-backed auth, AWS SigV4 auth, the wire API, query parameters, static or environment-driven headers, retry settings, stream idle timeout, WebSocket connect timeout, OpenAI-auth requirement, and whether Responses-over-WebSocket is supported. This trimmed excerpt keeps only the fields needed for the provider-boundary argument:

pub struct ModelProviderInfo {
    pub name: String,
    pub base_url: Option<String>,
    pub env_key: Option<String>,
    // ...
    pub experimental_bearer_token: Option<String>,
    pub auth: Option<ModelProviderAuthInfo>,
    pub aws: Option<ModelProviderAwsAuthInfo>,
    pub wire_api: WireApi,
    pub query_params: Option<HashMap<String, String>>,
    pub http_headers: Option<HashMap<String, String>>,
    pub env_http_headers: Option<HashMap<String, String>>,
    pub request_max_retries: Option<u64>,
    pub stream_max_retries: Option<u64>,
    pub stream_idle_timeout_ms: Option<u64>,
    pub websocket_connect_timeout_ms: Option<u64>,
    pub requires_openai_auth: bool,
    pub supports_websockets: bool,
}

The shape is deliberately descriptive. It says what can be configured; it does not fetch account state, sign a request, check a local server, or choose a model catalog. The file also validates incompatible combinations. For example, the AWS branch rejects supports_websockets because SigV4 signing for WebSocket upgrade requests is not supported in this snapshot (lib.rs#L147-L170).

This is the first invariant: a provider config may describe a risky or special shape, but the runtime gates that shape before any turn depends on it.
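The gate can be pictured with a minimal sketch. It assumes a trimmed stand-in for the provider record: `ProviderSketch`, `validate_sketch`, the `aws` profile string, and the error text are hypothetical names for illustration, and the real check in ModelProviderInfo::validate may cover more combinations than this one.

```rust
/// Hypothetical, trimmed stand-in for the serialized provider record.
#[derive(Debug, Clone, Default)]
pub struct ProviderSketch {
    pub name: String,
    /// Stands in for the AWS SigV4 auth block (here just a profile name).
    pub aws: Option<String>,
    pub supports_websockets: bool,
}

/// Rejects the one combination the text describes: AWS auth plus WebSockets.
/// Runs on data alone, before any runtime provider or turn exists.
pub fn validate_sketch(p: &ProviderSketch) -> Result<(), String> {
    if p.aws.is_some() && p.supports_websockets {
        return Err(format!(
            "provider `{}`: AWS SigV4 auth cannot be combined with supports_websockets",
            p.name
        ));
    }
    Ok(())
}
```

The point of the sketch is placement: the check needs no network, no credentials, and no account state, so it can fail a bad configuration before a turn ever depends on it.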

1.2 `ModelProvider` Owns Behavior at Request Time

Runtime behavior lives in the ModelProvider trait. The trait still exposes metadata, but it also owns capability upper bounds, attestation support, auth-manager access, current auth, account state, API-provider conversion, runtime base URL, request auth provider, and model manager construction.

pub trait ModelProvider: fmt::Debug + Send + Sync {
    fn info(&self) -> &ModelProviderInfo;
    fn capabilities(&self) -> ProviderCapabilities;
    fn supports_attestation(&self) -> bool;
    fn auth_manager(&self) -> Option<Arc<AuthManager>>;
    async fn auth(&self) -> Option<CodexAuth>;
    fn account_state(&self) -> ProviderAccountResult;
    async fn api_provider(&self) -> Result<Provider>;
    async fn runtime_base_url(&self) -> Result<Option<String>>;
    async fn api_auth(&self) -> Result<SharedAuthProvider>;
    fn models_manager(&self, codex_home: PathBuf, catalog: Option<ModelsResponse>)
        -> SharedModelsManager;
}

The factory create_model_provider shows why this split matters: most provider data becomes a ConfiguredModelProvider, while Amazon Bedrock becomes a dedicated AmazonBedrockModelProvider. That distinction is not cosmetic. Bedrock needs AWS account state, a runtime base URL, body-aware signing, and a static model manager, so Codex should not treat it like a bearer-token OpenAI-compatible endpoint just because the request surface looks similar.
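As a rough sketch of that split, the factory pattern looks like the following. Everything here is hypothetical: `ProviderData`, `RuntimeProvider`, the two unit structs, and the `uses_aws_sigv4` dispatch flag are illustration names, and the real create_model_provider may decide on different fields.

```rust
/// Hypothetical provider data: only the field this sketch dispatches on.
pub struct ProviderData {
    pub uses_aws_sigv4: bool,
}

/// Hypothetical runtime behavior; the real trait has many more methods.
pub trait RuntimeProvider {
    fn kind(&self) -> &'static str;
    /// Whether auth must see the final request body before send.
    fn signs_request_body(&self) -> bool;
}

struct Configured;
struct Bedrock;

impl RuntimeProvider for Configured {
    fn kind(&self) -> &'static str {
        "configured"
    }
    fn signs_request_body(&self) -> bool {
        false
    }
}

impl RuntimeProvider for Bedrock {
    fn kind(&self) -> &'static str {
        "amazon-bedrock"
    }
    fn signs_request_body(&self) -> bool {
        true
    }
}

/// Data goes in; a behavior-carrying trait object comes out.
pub fn create_provider_sketch(data: &ProviderData) -> Box<dyn RuntimeProvider> {
    if data.uses_aws_sigv4 {
        Box::new(Bedrock)
    } else {
        Box::new(Configured)
    }
}
```

Callers hold only the trait object, so the Bedrock-specific behavior stays behind the same interface as every other provider.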

2. Transport Choice Ends Before the Turn Loop Sees Events

Figure: HTTP SSE and WebSocket streams entering a stream mapper that emits ResponseEvent items for the turn loop. HTTP SSE and Responses-over-WebSocket have different mechanics, but they converge on `ResponseEvent` before the turn loop makes policy decisions.

2.1 The Turn Loop Consumes `ResponseEvent`

The stable vocabulary is in codex-rs/codex-api/src/common.rs. A response stream can report creation, output items, server model metadata, verification requirements, whether past reasoning was included, completion, text deltas, tool-call argument deltas, reasoning deltas, rate limits, and a models ETag. The excerpt below is shortened, but it keeps the event categories this section relies on:

pub enum ResponseEvent {
    Created,
    OutputItemDone(ResponseItem),
    OutputItemAdded(ResponseItem),
    ServerModel(String),
    ModelVerifications(Vec<ModelVerification>),
    ServerReasoningIncluded(bool),
    Completed { response_id: String, token_usage: Option<TokenUsage>, end_turn: Option<bool> },
    OutputTextDelta(String),
    ToolCallInputDelta { item_id: String, call_id: Option<String>, delta: String },
    ReasoningSummaryDelta { delta: String, summary_index: i64 },
    ReasoningContentDelta { delta: String, content_index: i64 },
    ReasoningSummaryPartAdded { summary_index: i64 },
    RateLimits(RateLimitSnapshot),
    ModelsEtag(String),
}

This enum is the boundary that keeps provider transport from leaking upward as raw SSE lines, WebSocket frames, or provider-specific chunks. Once an event has crossed it, the turn loop can decide whether to render progress, record a completed item, update rate limits, dispatch a tool call, continue sampling, or settle the turn.

The HTTP path builds a Responses request, asks for text/event-stream, and spawns a response stream in endpoint/responses.rs. SSE parsing then maps event kinds through process_responses_event and process_sse.

The WebSocket path creates a ResponsesWebsocketConnection, connects with merged provider headers and auth (responses_websocket.rs#L340-L452), then runs a WebSocket response stream (responses_websocket.rs#L574-L685). That WebSocket loop also feeds response events into the same Responses event processing boundary.

The useful conclusion is not “WebSocket is better.” It is that Codex can switch transport paths only because both paths become the same typed stream before the rest of the agent sees them.
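The convergence can be illustrated with a toy mapper. This is a sketch under loud assumptions: `EventSketch` is a three-variant stand-in for ResponseEvent, payloads are treated as plain text rather than JSON, and the tab-separated WebSocket frame format is invented for the example. Only the shape matters: two framing parsers, one shared mapping step.

```rust
/// Simplified stand-in for the typed event vocabulary.
#[derive(Debug, Clone, PartialEq)]
pub enum EventSketch {
    Created,
    OutputTextDelta(String),
    Completed,
}

/// Shared mapping step: both transports call this once framing is removed.
fn map_event(kind: &str, payload: &str) -> Option<EventSketch> {
    match kind {
        "response.created" => Some(EventSketch::Created),
        "response.output_text.delta" => Some(EventSketch::OutputTextDelta(payload.to_string())),
        "response.completed" => Some(EventSketch::Completed),
        _ => None, // unknown kinds are dropped in this sketch
    }
}

/// SSE-style framing: an `event:` line and a `data:` line in one block.
pub fn from_sse_block(block: &str) -> Option<EventSketch> {
    let mut kind = None;
    let mut data = "";
    for line in block.lines() {
        if let Some(rest) = line.strip_prefix("event:") {
            kind = Some(rest.trim());
        } else if let Some(rest) = line.strip_prefix("data:") {
            data = rest.trim();
        }
    }
    map_event(kind?, data)
}

/// Toy message framing (assumption): `kind<TAB>payload` in one text frame.
pub fn from_ws_text(frame: &str) -> Option<EventSketch> {
    let (kind, payload) = frame.split_once('\t').unwrap_or((frame, ""));
    map_event(kind.trim(), payload)
}
```

A consumer written against `EventSketch` cannot tell which parser produced an event, which is the property the turn loop relies on.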

2.2 `ModelClientSession` Makes Transport State Turn-Local

ModelClient is session-scoped, while ModelClientSession is turn-scoped. The source comment on ModelClientSession explains the rule: it lazily establishes a Responses WebSocket, reuses it across multiple requests within the turn, remembers the last full request for incremental WebSocket payloads, and stores the x-codex-turn-state token. It must not be reused across turns.

Transport selection enforces the same ownership. The model client enables Responses-over-WebSocket only when the provider supports it, session fallback has not disabled it, and the SSE fixture is not active (client.rs#L767-L779).

pub fn responses_websocket_enabled(&self) -> bool {
    if !self.state.provider.info().supports_websockets
        || self.state.disable_websockets.load(Ordering::Relaxed)
        || (*CODEX_RS_SSE_FIXTURE).is_some()
    {
        return false;
    }
    true
}

The per-turn stream method prefers WebSocket for the Responses wire API. If the WebSocket path returns FallbackToHttp, try_switch_fallback_transport (client.rs#L1614-L1630) forces subsequent requests in the Codex session onto HTTP and resets the turn-local WebSocket state; the stream method then calls the HTTP Responses API path.

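This is the second invariant: a transport optimization must not become hidden global state that changes the meaning of later turns. The only state that outlives a turn is the explicit fallback flag, which every transport decision checks; the connection, the last full request, and the sticky routing token end with the turn.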
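The ownership rule can be sketched with two structs. `SessionSketch`, `TurnSketch`, and their methods are hypothetical names, not the real ModelClient API; the sketch assumes only what the text states, namely a session-level disable flag and turn-level connection and token state.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Session-scoped state: lives as long as the session.
pub struct SessionSketch {
    supports_websockets: bool,
    disable_websockets: AtomicBool,
}

/// Turn-scoped state: created per turn and dropped when the turn ends.
pub struct TurnSketch {
    session: Arc<SessionSketch>,
    connection_open: bool,
    turn_state_token: Option<String>,
}

impl SessionSketch {
    pub fn new(supports_websockets: bool) -> Arc<Self> {
        Arc::new(Self {
            supports_websockets,
            disable_websockets: AtomicBool::new(false),
        })
    }

    /// Mirrors the selection rule: provider support and no prior fallback.
    pub fn websocket_enabled(&self) -> bool {
        self.supports_websockets && !self.disable_websockets.load(Ordering::Relaxed)
    }

    /// Every turn starts with empty transport state.
    pub fn begin_turn(self: &Arc<Self>) -> TurnSketch {
        TurnSketch {
            session: Arc::clone(self),
            connection_open: false,
            turn_state_token: None,
        }
    }
}

impl TurnSketch {
    /// Records a lazy connect plus the sticky token for this turn only.
    pub fn connected(&mut self, token: &str) {
        self.connection_open = true;
        self.turn_state_token = Some(token.to_string());
    }

    /// Fallback: flip the explicit session flag and clear turn-local state.
    pub fn fall_back_to_http(&mut self) {
        self.session.disable_websockets.store(true, Ordering::Relaxed);
        self.connection_open = false;
        self.turn_state_token = None;
    }

    pub fn token(&self) -> Option<&str> {
        self.turn_state_token.as_deref()
    }

    pub fn is_connected(&self) -> bool {
        self.connection_open
    }
}
```

Because the token lives in `TurnSketch`, a new turn cannot inherit it by accident; the only cross-turn effect is the visible `disable_websockets` flag.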

3. Model Metadata Is Runtime Infrastructure

Figure: bundled model catalog, models cache, remote models endpoint, model manager, visible presets, and model info feeding the turn loop. Model metadata is cache-plus-overlay infrastructure: the turn loop needs model facts, not a hard-coded picker list.

3.1 A Model Manager Produces Runtime Facts

The model manager is not merely UI support. The ModelsManager trait lists available models, returns a raw catalog, exposes cached remote models, filters picker presets by auth mode and visibility, chooses a default model, resolves ModelInfo, and refreshes when an ETag changes.

async fn list_models(&self, refresh_strategy: RefreshStrategy) -> Vec<ModelPreset> {
    let catalog = self.raw_model_catalog(refresh_strategy).await;
    self.build_available_models(catalog.models)
}

fn build_available_models(&self, mut remote_models: Vec<ModelInfo>) -> Vec<ModelPreset> {
    remote_models.sort_by(|a, b| a.priority.cmp(&b.priority));
    let mut presets: Vec<ModelPreset> = remote_models.into_iter().map(Into::into).collect();
    let uses_codex_backend = self.auth_manager().is_some_and(
        AuthManager::current_auth_uses_codex_backend,
    );
    presets = ModelPreset::filter_by_auth(presets, uses_codex_backend);
    ModelPreset::mark_default_by_picker_visibility(&mut presets);
    presets
}

Those facts affect more than a dropdown. ModelInfo influences context limits, auto-compaction thresholds, reasoning controls, model visibility, and which defaults make sense for the current auth mode. Chapter 6 already showed that the turn loop reads model info before deciding compaction and sampling behavior; this chapter explains where that info comes from.

3.2 The Catalog Has a Baseline, a Cache, and an ETag Boundary

The OpenAI model manager can combine a bundled catalog, an optional configured catalog, a disk cache, and a remote /models endpoint. The refresh path lives around manager.rs#L225-L359. The cache helpers track path, TTL, freshness, and persistence in cache.rs. The provider endpoint fetches remote catalog data with a bounded timeout in models_endpoint.rs, and the API endpoint attaches client_version when calling models (endpoint/models.rs#L31-L73).

This is an operational compromise. A bundled catalog lets Codex start and work offline. A cache avoids paying network cost every turn. An ETag lets the stream notify the runtime that model metadata changed. Remote refresh gives the backend a way to update model presets without shipping a new binary. The cost is that provider identity, auth mode, visibility, cache freshness, and remote catalog state all matter when interpreting a “model list.”
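A minimal sketch of the baseline, cache, and ETag interplay follows. `CatalogSketch`, `CachedCatalog`, and the rule that a stale cache falls back to the bundled list are assumptions made for illustration; the real manager's merge and refresh rules are more involved.

```rust
use std::time::{Duration, Instant};

/// Hypothetical cache entry: model slugs plus the ETag and fetch time.
pub struct CachedCatalog {
    pub models: Vec<String>,
    pub etag: String,
    pub fetched_at: Instant,
}

pub struct CatalogSketch {
    bundled: Vec<String>,
    cache: Option<CachedCatalog>,
    ttl: Duration,
}

impl CatalogSketch {
    pub fn new(bundled: Vec<String>, ttl: Duration) -> Self {
        Self { bundled, cache: None, ttl }
    }

    /// A fresh cache wins; otherwise fall back to the bundled baseline.
    /// `now` is a parameter so callers (and tests) control the clock.
    pub fn models(&self, now: Instant) -> &[String] {
        match &self.cache {
            Some(c) if now.duration_since(c.fetched_at) < self.ttl => c.models.as_slice(),
            _ => self.bundled.as_slice(),
        }
    }

    /// True when a stream-reported ETag differs from the cached one,
    /// or when nothing has been cached yet.
    pub fn needs_refresh(&self, stream_etag: &str) -> bool {
        self.cache.as_ref().map_or(true, |c| c.etag != stream_etag)
    }

    /// Stores the result of a remote fetch.
    pub fn store(&mut self, cached: CachedCatalog) {
        self.cache = Some(cached);
    }
}
```

The sketch shows why a "model list" is never a single fact: the answer depends on the clock, the cache contents, and the last ETag the stream reported.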

4. Auth Is Applied After the Request Exists

Figure: provider auth modes, prepared request, body signing, signed request, provider API, and no WebSocket boundary. Some auth modes can attach headers early; Bedrock-style SigV4 must sign the prepared body, so the auth boundary sits after request construction and before transport send.

4.1 Simple Tokens and Command Auth Resolve Before Send

The generic auth path is routed through auth_manager_for_provider and resolve_provider_auth. That layer can create bearer-token auth providers or unauthenticated providers for local/test cases. The important design point is not which credential form is used; it is that the request sender receives a SharedAuthProvider without knowing whether the credential came from ChatGPT auth, an API key, a command, a local provider, or no auth at all.
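The indirection can be sketched as a small trait object. `AuthSketch`, `Bearer`, `NoAuth`, `SharedAuthSketch`, and `build_headers` are hypothetical names; the real SharedAuthProvider operates on whole requests, not just a header map.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical auth behavior: mutate outgoing headers, nothing else.
pub trait AuthSketch: Send + Sync {
    fn apply(&self, headers: &mut HashMap<String, String>);
}

/// Bearer-token credential, whatever its origin (API key, command, login).
pub struct Bearer(pub String);

/// Unauthenticated provider, for local or test endpoints.
pub struct NoAuth;

impl AuthSketch for Bearer {
    fn apply(&self, headers: &mut HashMap<String, String>) {
        headers.insert("authorization".to_string(), format!("Bearer {}", self.0));
    }
}

impl AuthSketch for NoAuth {
    fn apply(&self, _headers: &mut HashMap<String, String>) {}
}

pub type SharedAuthSketch = Arc<dyn AuthSketch>;

/// The sender sees only the trait object, never the credential source.
pub fn build_headers(auth: &SharedAuthSketch) -> HashMap<String, String> {
    let mut headers = HashMap::new();
    headers.insert("accept".to_string(), "text/event-stream".to_string());
    auth.apply(&mut headers);
    headers
}
```

Swapping `Bearer` for `NoAuth` changes nothing in `build_headers`, which is the property the request sender depends on.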

4.2 Bedrock Shows Why the Request Body Matters

Amazon Bedrock is the case that makes the boundary obvious. Its provider implementation declares provider account state; disables namespace tools, image generation, and web search in its capability upper bounds; computes a runtime base URL; resolves AWS auth; and uses a static model manager (amazon_bedrock/mod.rs#L51-L103).

The SigV4 auth provider then mutates a prepared request:

async fn apply_auth(&self, request: Request) -> Result<Request, AuthError> {
    let mut request = request;
    remove_headers_not_preserved_by_bedrock_mantle(&mut request.headers);
    let prepared = request.prepare_body_for_send().map_err(AuthError::Build)?;
    let signed = self.context.sign(AwsRequestToSign {
        method: request.method.clone(),
        url: request.url.clone(),
        headers: prepared.headers.clone(),
        body: prepared.body_bytes(),
    }).await?;

    request.url = signed.url;
    request.headers = signed.headers;
    request.body = prepared.body.map(RequestBody::Raw);
    request.compression = RequestCompression::None;
    Ok(request)
}

The full code is in amazon_bedrock/auth.rs#L88-L139. Two details matter. First, the signature covers method, URL, headers, and body, so auth cannot be finalized until the request has been built. Second, the code sets RequestCompression::None after preparing the body, because the signed bytes must be the bytes sent.

This is why provider validation rejects AWS plus supports_websockets in this snapshot. That is not a claim that Bedrock can never support WebSockets. It is a verified source boundary: this implementation does not yet support SigV4 signing for WebSocket upgrade requests.
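Why the body must be final before signing can be shown with a toy digest. This sketch is not SigV4 and is not cryptographic: `toy_digest` uses the standard library's `DefaultHasher` purely to show that any change to the signed bytes invalidates the signature. `RequestSketch`, `sign_sketch`, and `verify_sketch` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical prepared request with a slot for the signature.
pub struct RequestSketch {
    pub method: String,
    pub url: String,
    pub body: Vec<u8>,
    pub compress: bool,
    pub signature: Option<u64>,
}

/// Toy digest over method, URL, and body. NOT SigV4, not cryptographic.
fn toy_digest(method: &str, url: &str, body: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    method.hash(&mut h);
    url.hash(&mut h);
    body.hash(&mut h);
    h.finish()
}

/// Signs the request as built, then disables compression so the bytes
/// that were signed are the bytes that get sent.
pub fn sign_sketch(mut req: RequestSketch) -> RequestSketch {
    req.signature = Some(toy_digest(&req.method, &req.url, &req.body));
    req.compress = false;
    req
}

/// What a receiver would check: the signature matches the received bytes.
pub fn verify_sketch(req: &RequestSketch) -> bool {
    req.signature == Some(toy_digest(&req.method, &req.url, &req.body))
}
```

If a later layer compressed or otherwise rewrote `body` after `sign_sketch`, `verify_sketch` would fail, which is the reason the signing step has to come last.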

5. Local Providers Still Fit the Contract

5.1 Ollama Adds Readiness Work, Not a New Agent Loop

Local providers have a different failure shape. They may be missing a server, missing a model, or running an incompatible local API version. The Ollama path encodes that operational work in ensure_oss_ready and ensure_responses_supported. It checks that a local Ollama server is reachable, fetches local models, pulls the default model when needed, and rejects versions older than the Responses API minimum.

The invariant is the same: local readiness is provider work. Once the model call begins, the runtime wants typed response events, not “Ollama-specific agent policy.”
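A version gate of this kind can be sketched in a few lines. The minimum version below is a placeholder, not the value Codex uses, and `parse_version`, `MIN_SKETCH`, and `ensure_supported` are hypothetical names; treating an unparseable version as an error is also an assumption.

```rust
/// Parses "major.minor.patch" (optionally prefixed with `v`) into a tuple.
/// Anything else, including extra suffixes, is rejected.
pub fn parse_version(s: &str) -> Option<(u64, u64, u64)> {
    let mut parts = s.trim().trim_start_matches('v').split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Placeholder minimum for this sketch only.
pub const MIN_SKETCH: (u64, u64, u64) = (1, 2, 0);

/// Readiness check: runs before any model call, as provider work.
pub fn ensure_supported(reported: &str) -> Result<(), String> {
    match parse_version(reported) {
        // Tuples compare lexicographically: major, then minor, then patch.
        Some(v) if v >= MIN_SKETCH => Ok(()),
        Some(v) => Err(format!(
            "local server version {}.{}.{} is older than the required {}.{}.{}",
            v.0, v.1, v.2, MIN_SKETCH.0, MIN_SKETCH.1, MIN_SKETCH.2
        )),
        None => Err(format!("could not parse version `{reported}`")),
    }
}
```

The failure surfaces as a provider readiness error before sampling starts, so the turn loop never has to interpret it.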

5.2 LM Studio Is Another Provider Surface, Not a Separate Runtime

LM Studio has its own client checks: provider construction, server availability, model loading, model listing, and model download through lms live in codex-rs/lmstudio/src/client.rs. Those checks differ operationally from Ollama, but their architectural placement is the same. They prepare a provider surface for the model client; they do not rewrite how the turn loop handles response items, tools, history, or continuation.

6. Backend Tasks and Realtime Are Adjacent Runtime Paths

Figure: turn sampling response events entering the runtime boundary and splitting toward realtime sideband, backend tasks, task lifecycle, and apply diff paths. Realtime sessions and backend tasks may sit beside model sampling, but they carry different lifecycles and should not be treated as ordinary provider transports.

6.1 Cloud Backend Tasks Have Task Semantics

The cloud task client exposes a task lifecycle, not a model stream. Its CloudBackend trait can list tasks, fetch summaries, fetch diffs and messages, read the creating prompt plus assistant messages, list sibling attempts, dry-run an apply, apply a task diff, and create a task. This excerpt keeps the lifecycle methods that distinguish tasks from response streams:

pub trait CloudBackend: Send + Sync {
    async fn list_tasks(&self, env: Option<&str>, limit: Option<i64>, cursor: Option<&str>)
        -> Result<TaskListPage>;
    async fn get_task_summary(&self, id: TaskId) -> Result<TaskSummary>;
    async fn get_task_diff(&self, id: TaskId) -> Result<Option<String>>;
    async fn get_task_messages(&self, id: TaskId) -> Result<Vec<String>>;
    async fn get_task_text(&self, id: TaskId) -> Result<TaskText>;
    async fn list_sibling_attempts(&self, task: TaskId, turn_id: String)
        -> Result<Vec<TurnAttempt>>;
    async fn apply_task_preflight(&self, id: TaskId, diff_override: Option<String>)
        -> Result<ApplyOutcome>;
    async fn apply_task(&self, id: TaskId, diff_override: Option<String>)
        -> Result<ApplyOutcome>;
    async fn create_task(&self, env_id: &str, prompt: &str, git_ref: &str,
        qa_mode: bool, best_of_n: usize) -> Result<CreatedTask>;
}

The HTTP implementation builds task URLs and payloads separately in cloud-tasks-client/src/http.rs. Backend tasks may share authentication and base URL concerns with the backend client (backend-client/src/client.rs#L143-L225), but their lifecycle is task state, attempts, diffs, and apply outcomes. A task diff is not a ResponseEvent.

6.2 Realtime Has a Media Plane and a Sideband

Realtime is adjacent for a different reason. Its startup path in realtime_conversation.rs creates bounded audio, text, handoff, and event channels; builds a RealtimeWebsocketClient; and then chooses either a WebRTC call with a sideband input task or a direct realtime WebSocket connection. Startup context is assembled separately in realtime_context.rs, and WebRTC call creation lives in endpoint/realtime_call.rs.

That does not make realtime another version of the normal Responses stream. It has media input, session configuration, handoff output, sideband headers, and event fanout. The runtime can bridge it into the same product experience, but the source keeps the lifecycle separate.

Apply This

Store provider definitions as data, but keep account state, auth, capabilities, and model-manager choice in runtime provider behavior.
Let HTTP streaming and Responses-over-WebSocket differ below the parsing boundary, then converge on ResponseEvent.
Treat ModelClientSession as turn-local transport state; do not reuse sticky routing tokens across turns.
Treat model catalogs as runtime infrastructure with baseline, cache, remote overlay, visibility filtering, and ETag refresh.
Apply body-aware auth after request construction, and keep local readiness, realtime setup, and cloud task APIs adjacent to inference rather than inside the core model-stream abstraction.

Closing

The provider boundary is what lets Chapter 6’s turn scheduler stay focused on agent work. A turn can ask for a model stream without learning AWS signing, Ollama pulls, LM Studio downloads, /models cache refresh, WebSocket upgrade state, realtime sidebands, or cloud task apply semantics. Those details still exist; they are just owned by the layer that can verify and normalize them.

Chapter 8 moves from sampling to evidence: how rollout persistence, trace bundles, reducers, analytics, OTEL spans, and debug context make the runtime observable after provider differences have been normalized.

Source Map

Concept	Source anchor
Provider data shape	`codex-rs/model-provider-info/src/lib.rs`
AWS/WebSocket validation boundary	`ModelProviderInfo::validate`
Runtime provider trait	`codex-rs/model-provider/src/provider.rs`
Provider factory	`create_model_provider`
Response event vocabulary	`ResponseEvent`
HTTP Responses stream	`endpoint/responses.rs`
WebSocket Responses stream	`endpoint/responses_websocket.rs`
Turn-scoped model client session	`ModelClientSession`
Transport selection and fallback	`ModelClientSession::stream`
Model manager contract	`ModelsManager`
Model cache	`models-manager/src/cache.rs`
Bedrock runtime provider	`amazon_bedrock/mod.rs`
Bedrock body signing	`amazon_bedrock/auth.rs`
Ollama readiness and version gate	`codex-rs/ollama/src/lib.rs`
LM Studio local client checks	`codex-rs/lmstudio/src/client.rs`
Cloud task lifecycle	`CloudBackend`
Realtime startup	`realtime_conversation.rs`