Compaction as a Checkpoint Protocol
Reading Contract: Use this chapter to treat compaction as checkpoint installation. Track who triggers it, what history it replaces, and what evidence remains for future turns.
Chapter 5 covered optional context budgets. Budgets delay the inevitable; they do not eliminate it. Long threads eventually exceed the useful context window. Codex’s answer is compaction, but the important design choice is that compaction is a checkpoint protocol. It does not merely ask a model to summarize old text. It installs replacement history, updates the reference context baseline, emits events, runs hooks, resets provider session state when needed, and recomputes token usage.
This is where Codex most clearly treats forgetting as a governed operation.
By the end of this chapter, you should understand local and remote compaction as two implementations of the same semantic boundary: replace the live history with a smaller history that can still support future turns.
Two Triggers, One Boundary
Compaction can run manually, automatically before sampling, or mid-turn, after a sampling request reaches the auto-compact limit while the model still needs to follow up. Manual and pre-sampling compaction share one placement rule; mid-turn compaction uses another:
| Timing | Initial context placement | Reason |
|---|---|---|
| Manual or pre-turn | Do not inject into replacement history; clear reference baseline. | The next regular turn can fully reinject canonical context. |
| Mid-turn | Inject before the last real user message or summary. | The model expects the compaction item to stay last in history, and the continuation still needs current context. |
This distinction is the heart of the protocol. Compaction is not only “smaller history.” It is smaller history placed in a model-compatible order.
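The placement rule in the table can be sketched as a small function. This is a minimal sketch under stated assumptions, not Codex's implementation: `Timing`, `Item`, `last_index`, and `place_initial_context` are hypothetical names, and the sketch reads "the last real user message or summary" as "prefer the last real user message, fall back to the summary when no user message survives." Returning the injected context as the new baseline in the mid-turn case is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Timing(Enum):
    MANUAL = "manual"
    PRE_TURN = "pre_turn"
    MID_TURN = "mid_turn"


@dataclass(frozen=True)
class Item:
    kind: str  # hypothetical kinds: "user", "summary", "context"
    text: str


def last_index(items: List[Item], kind: str) -> Optional[int]:
    """Index of the last item of the given kind, or None if absent."""
    for i in range(len(items) - 1, -1, -1):
        if items[i].kind == kind:
            return i
    return None


def place_initial_context(
    replacement: List[Item], initial_context: Item, timing: Timing
) -> Tuple[List[Item], Optional[Item]]:
    """Return (history, reference_baseline) for a compaction at `timing`."""
    if timing is not Timing.MID_TURN:
        # Manual or pre-turn: inject nothing and clear the baseline, so the
        # next regular turn reinjects canonical context in full.
        return list(replacement), None
    # Mid-turn: insert before the last real user message, else before the
    # summary, else append (assumed fallback order).
    idx = last_index(replacement, "user")
    if idx is None:
        idx = last_index(replacement, "summary")
    if idx is None:
        idx = len(replacement)
    placed = replacement[:idx] + [initial_context] + replacement[idx:]
    # Assumption: the injected context becomes the reference baseline.
    return placed, initial_context
```

Making the timing an explicit argument keeps the two placement modes from drifting into a single "insert the summary somewhere" code path.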
Hooks wrap compaction because compaction is a side effect on the thread’s semantic state. External policy may need to block or observe it.
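A hook wrapper of this kind might look like the following. It is a sketch only: `compact_with_hooks`, `CompactionBlocked`, and the hook signatures are hypothetical, chosen to show that a pre-hook can veto the rewrite and a post-hook can observe it.

```python
from typing import Callable, List, Sequence


class CompactionBlocked(Exception):
    """Raised when a pre-compaction hook vetoes the rewrite (hypothetical)."""


def compact_with_hooks(
    history: List[str],
    compact: Callable[[List[str]], List[str]],
    pre_hooks: Sequence[Callable[[List[str]], bool]] = (),
    post_hooks: Sequence[Callable[[List[str], List[str]], None]] = (),
) -> List[str]:
    """Run `compact` only if every pre-hook allows it, then notify post-hooks."""
    for hook in pre_hooks:
        if not hook(history):
            # External policy blocked the rewrite; history stays untouched.
            raise CompactionBlocked(getattr(hook, "__name__", "hook"))
    replacement = compact(list(history))  # compact a copy, not the live list
    for hook in post_hooks:
        hook(history, replacement)  # observers see both old and new history
    return replacement
```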

Before and After
The clearest way to understand compaction is to compare history before and after. The key invariant is chronological grouping: user messages, tool calls, observations, summaries, and injected context keep their protocol identity even when old material is replaced.
The mid-turn variant re-injects the initial context (initial_ctx) because the continuation that follows still needs runtime facts. The pre-turn variant clears the baseline so that the next regular turn rebuilds the context bundle from scratch.
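One way to picture the comparison is with literal histories. The item kinds and texts below are invented for illustration, and the exact shape of a compacted history is an assumption; what the example shows is that surviving user messages keep their order and that only the mid-turn variant carries initial_ctx.

```python
# Hypothetical (kind, text) pairs; not real Codex item types.
before = [
    ("context", "initial_ctx"),
    ("user", "fix the failing test"),
    ("tool_call", "run tests"),
    ("observation", "1 failed"),
    ("assistant", "patched the parser"),
    ("user", "now update the docs"),
    ("tool_call", "read README"),
    ("observation", "README contents"),
]

# Pre-turn: no context injected; the next regular turn rebuilds it.
after_pre_turn = [
    ("user", "fix the failing test"),
    ("user", "now update the docs"),
    ("summary", "parser patched; docs update in progress"),
]

# Mid-turn: context re-injected before the last user message,
# and the summary stays last.
after_mid_turn = [
    ("user", "fix the failing test"),
    ("context", "initial_ctx"),
    ("user", "now update the docs"),
    ("summary", "parser patched; docs update in progress"),
]


def texts_of(history, kind):
    """Texts of all items of one kind, in history order."""
    return [text for item_kind, text in history if item_kind == kind]


# User messages survive in their original order in both variants.
assert texts_of(after_pre_turn, "user") == texts_of(before, "user")
assert texts_of(after_mid_turn, "user") == texts_of(before, "user")
# Only the mid-turn variant carries initial context.
assert texts_of(after_pre_turn, "context") == []
assert texts_of(after_mid_turn, "context") == ["initial_ctx"]
```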
Local Compaction
Local compaction appends a synthesized compaction request to a clone of history, then samples the model until completion. If the context window is exceeded while compacting, it removes the oldest history item and retries, preserving recent messages and prefix cache as much as possible. After completion, it extracts the latest assistant summary, collects user messages, builds a new compacted history, optionally inserts initial context, installs a CompactedItem with replacement history, resets websocket session state, and recomputes token usage.
// Pseudocode -- illustrates local checkpoint installation.
history = cloneLiveHistory()
history.record(compactionRequest)
while not history.fitsModelWindow():
    history.dropOldest()
summary = askModelForSummary(history.forPrompt(model))
replacement = buildHistory(
    recentUserMessages(history),
    summary,
)
if midTurn:
    replacement.insertBeforeLastUser(currentInitialContext)
installReplacementHistory(
    replacement,
    referenceContextForPlacement,
)
The important detail is replacement history. A later resume does not have to infer what compaction meant from a free-form summary alone; it can start from the installed replacement.
The retry-by-dropping-oldest loop is small but worth noticing. It keeps prefix cache hits as high as possible by trimming from the old end. A naive implementation would shrink the entire window proportionally, losing both warm-cache prefixes and the most recent messages.
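The retry loop can be made concrete with a stand-in model. This is a sketch under assumptions: `ContextWindowExceeded`, `summarize_with_retry`, and `fake_model` are hypothetical names, and the "window" here is a simple item count, not a real token limit.

```python
from typing import Callable, List, Tuple


class ContextWindowExceeded(Exception):
    """Hypothetical error raised when the prompt is too large for the model."""


def summarize_with_retry(
    history: List[str],
    compaction_request: str,
    ask_model: Callable[[List[str]], str],
) -> Tuple[str, int]:
    """Return (summary, number_of_dropped_items)."""
    working = list(history)  # clone, so live history is untouched
    dropped = 0
    while True:
        try:
            return ask_model(working + [compaction_request]), dropped
        except ContextWindowExceeded:
            if not working:
                raise  # nothing left to trim
            working.pop(0)  # drop the oldest item, keep recent ones
            dropped += 1


def fake_model(prompt: List[str]) -> str:
    # Stand-in model with a window of four items (assumption for the demo).
    if len(prompt) > 4:
        raise ContextWindowExceeded()
    return "summary of " + ", ".join(prompt[:-1])
```

With five history items and a four-item window, the loop drops the two oldest items and summarizes the three most recent ones.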
Remote Compaction
Remote compaction uses a provider compact endpoint when available. It trims function-call history to fit the compact endpoint, builds a prompt with current tools, calls the compact endpoint, filters the returned compacted history, optionally inserts initial context, records an installed checkpoint in rollout trace, replaces live history, and recomputes token usage.
Remote compaction is not just an optimization. It gives the provider a first-class conversation-history compaction path while Codex still owns the semantic install boundary. The endpoint may produce compacted history, but Codex decides which items survive and where canonical context goes.
The ownership line is the contract: the provider may produce compacted material, but Codex filters it, places it, records it, and installs it as runtime state.
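The ownership split can be sketched as a single install function. Everything named here is hypothetical: `install_remote_compaction`, the `ALLOWED_KINDS` filter policy, the record shape, and the `call_compact_endpoint` callable stand in for whatever a real provider and runtime expose.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]

# Assumed filter policy: which returned item kinds may enter live history.
ALLOWED_KINDS = {"user", "compaction"}


def install_remote_compaction(
    live_history: List[Item],
    call_compact_endpoint: Callable[[List[Item]], List[Item]],
    rollout_trace: List[dict],
) -> List[Item]:
    """Ask the provider to compact, then filter, record, and install locally."""
    returned = call_compact_endpoint(list(live_history))
    # The runtime, not the provider, decides which items survive.
    replacement = [item for item in returned if item.get("kind") in ALLOWED_KINDS]
    # Record the installed checkpoint so resume code can start from it.
    rollout_trace.append(
        {"type": "installed_checkpoint", "replacement_history": list(replacement)}
    )
    live_history[:] = replacement  # replace live history in place
    return replacement
```

The provider's output is treated as untrusted input: nothing reaches live history without passing the local filter and being recorded in the trace.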
Local vs Remote at a Glance
| Aspect | Local compaction | Remote compaction |
|---|---|---|
| Compaction work | Codex prompts the live model for a summary. | Provider compact endpoint produces compacted items. |
| Window protection | Drop-oldest retry loop. | Trim to compact window before request. |
| Filtering | Codex extracts summary; constructs replacement. | Codex filters returned compacted items. |
| Trace recording | Replacement history install event. | Installed-checkpoint payload in rollout trace. |
| Determinism | Depends on live model behavior. | Depends on provider compact contract. |
| Compatibility | Works against arbitrary models. | Requires provider compact endpoint support. |
Both strategies emit the same kind of event downstream: an installed checkpoint that resume code can recognize. Differences are confined to who produces the compacted material.
Why Summary Alone Is Not Enough
A summary is prose. Replacement history is protocol state. The difference decides what resume code can rely on. Replacement history can preserve user-message boundaries, compaction item placement, and current context insertion, so it gives rollout reconstruction a concrete base. Summary text alone would force resume code to reinterpret old events every time.
A small “summary-only” failure illustrates the trap: a summary can remember themes, but it cannot by itself preserve user-message boundaries, tool-call pairing, or insertion points for current context.
Codex still carries summary text, but the checkpoint is the real abstraction.
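The value of a concrete base shows up in resume code. The sketch below assumes a hypothetical rollout format in which each record is either an ordinary item or an installed checkpoint carrying `replacement_history`; `reconstruct_history` and the record fields are illustrative names, not Codex's rollout schema.

```python
from typing import Dict, List


def reconstruct_history(rollout: List[Dict]) -> List[str]:
    """Rebuild live history from a rollout without re-summarizing anything."""
    history: List[str] = []
    for record in rollout:
        if record["type"] == "installed_checkpoint":
            # A checkpoint replaces everything before it with stored state.
            history = list(record["replacement_history"])
        else:
            history.append(record["item"])
    return history


rollout = [
    {"type": "item", "item": "user: fix the test"},
    {"type": "item", "item": "assistant: patched"},
    {"type": "installed_checkpoint",
     "replacement_history": ["user: fix the test", "summary: test fixed"]},
    {"type": "item", "item": "user: update docs"},
]
```

Resume never has to interpret the summary text: it starts from the stored replacement and appends whatever followed.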
Apply This
- Compaction Checkpoint. Install compacted output as replacement history. Store the post-compaction prompt base, and avoid summary-only designs that cannot reconstruct state.
- Placement Mode. Make context placement explicit for pre-turn and mid-turn compaction. Name placement strategies, and avoid one-size-fits-all summary insertion.
- Hooked Forgetting. Run policy hooks around semantic history rewrites. Treat compaction like a state-changing operation, and reject invisible background forgetting.
- Provider-Owned Work, Runtime-Owned Install. Let a provider produce compacted history, but keep filtering and installation local. Use the same rule with external summarizers, and do not trust remote output as already safe.
- Token Recompute. Recompute usage after replacement. Invalidate stale counters, and keep UI or compaction thresholds from relying on pre-compaction totals.
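The token recompute rule is easy to get wrong when usage is cached. A minimal sketch, assuming a hypothetical `ThreadState` and a crude word-count estimator in place of a real tokenizer:

```python
from typing import List


def estimate_tokens(history: List[str]) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return sum(len(text.split()) for text in history)


class ThreadState:
    """Hypothetical thread state that keeps usage in step with history."""

    def __init__(self, history: List[str], auto_compact_limit: int) -> None:
        self.history = list(history)
        self.auto_compact_limit = auto_compact_limit
        self.token_usage = estimate_tokens(self.history)

    def install_replacement(self, replacement: List[str]) -> None:
        self.history = list(replacement)
        # Recompute immediately so no caller sees the pre-compaction total.
        self.token_usage = estimate_tokens(self.history)

    def needs_compaction(self) -> bool:
        return self.token_usage >= self.auto_compact_limit
```

Recomputing inside the install step, instead of leaving it to callers, keeps thresholds and UI counters from reading a stale total.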