When the Window Fills Up
Every serious agent eventually has to answer the same question: when the conversation is too big to resend, what survives? The repos in this set answer with very different machinery — micro-compaction, pruning, structured summaries, folding, visibility metadata, and emergency truncation.
Compaction is not the same thing as prompt caching
Prompt caching is about reusing a stable prefix so the provider discounts repeated input. Compaction is a different survival mechanism: it changes the conversation itself so the next request fits in the model window. One tries to preserve the prefix exactly. The other decides which parts of history can be folded, truncated, summarized, hidden, or replaced.
The architectural split
Caching rewards stability. Compaction accepts loss. The best agents do not blur those jobs: they keep the reusable prefix stable, then compact the rest with explicit logic around thresholds, summaries, and tail preservation.
The pattern taxonomy
Micro-compaction
Cheap first pass: shrink or truncate stale tool results before paying for a real summarization step.
Pruning before summary
Remove the least valuable blobs first, especially verbose tool output, then summarize only if that is still not enough.
Structured summary insertion
Replace older history with a summary that preserves explicit slots like goal, progress, decisions, files, or next steps.
Conversation folding
Convert old turns into one synthetic message while keeping a recent tail verbatim, often with special handling for pinned constraints or skills.
Visibility-only compaction
Keep the original messages for user display or audit, but hide them from the agent and replace them with an agent-only continuation summary.
Reactive truncation and circuit breakers
When token estimates fail or the provider rejects the request, better agents retry mechanically and stop hammering if compaction keeps failing.
Comparison matrix
| Repo | Main strategy | What it preserves | Why it stands out |
|---|---|---|---|
| Claude Code source snapshot | Auto-compaction + reactive fallback | Recent tail, command surface, runtime control state | Good example of compaction as part of the core loop |
| Goose | Full-message compaction plus tool-pair summarization | Latest user text and visibility metadata | Best “agent-visible vs user-visible” story |
| Pi Mono | Structured compaction entries | Goal, progress, decisions, file reads, modified files | Most explicit partially-lossless structure |
| Neovate Code | Truncation, pruning, then compaction | Protected tools and recent turns | Best layered budget-defense design |
| Open Claude Code | Micro-compaction before full compaction | Recent turns | Simple but honest staged approach |
| DeepSeek Reasonix | History folding plus emergency truncation | Pinned skills, pinned constraints, recent tail | Best token-plus-byte pressure example |
| DeerFlow | Summarization middleware | Configurable keep budget and rescued skill reads | Most framework-shaped answer |
| Hermes | Use carefully | Some structured preservation | First-pass compaction framing was too strong |
| OpenHands | Weak evidence | Context metadata, not a strong compaction pipeline | Important negative control |
| Oh My Pi | Weak evidence | Post-run summaries rather than a flagship compactor | Do not oversell it |
Claude Code and Open Claude Code: compaction as runtime control loop
The Claude Code family is useful because it shows two related but differently mature approaches. The Claude Code snapshot treats compaction as part of the main runtime: automatic thresholds, reactive handling for prompt-too-long errors, and guardrails to stop repeated failures from turning into a tight loop.
Open Claude Code is simpler but still interesting. It first tries a cheap micro-compaction pass that truncates stale tool results. Only after that fails to buy enough space does it perform a heavier full summary compaction. That design is much less sophisticated than Pi Mono or Goose, but it captures an important production lesson: not every overflow deserves an expensive summarization call.
if (micro_compact_is_enough) {
keep going
} else {
summarize older history
preserve recent turns
}
Simple can still be lossy
Open Claude Code is intentionally blunt: old tool results get truncated, then older messages get replaced with a short summary. It is a useful baseline precisely because it exposes what advanced repos are adding on top of that pattern.
Goose, Pi Mono, and Neovate: three different “serious” designs
Goose: agent-visible summary, user-visible history
Goose distinguishes between what the user can still inspect and what the agent sees going forward. That makes its compaction model unusually honest: the repo acknowledges that summarization changes the working context, but it does not have to erase the original messages from the transcript entirely.
Pi Mono: partially lossless structure
Pi Mono tracks read files, modified files, compaction boundaries, and branch summarization. It is still lossy — the original middle conversation is gone — but it keeps more explicit state than the repos that collapse everything into one prose paragraph.
Neovate: layer the defenses
Neovate is strongest as a budget-defense story. It has a truncation layer, a pruning layer, and a compaction layer. Protected tools and recent turns get special treatment, which is exactly what you want when some context is much more recoverable than others.
The shared lesson
These repos all reject the naive “just summarize everything” move. The more mature the implementation, the more it distinguishes recent turns, tool output, preserved files, and continuation state.
Reasonix and DeerFlow solve the long-context problem differently
Reasonix CheetahClaws and DeerFlow are two of the most useful counterpoints in this set because they do not look like classic terminal-only agents.
Reasonix: fold under token and byte pressure
Reasonix does real conversation folding, and it also has local preflight logic for emergency truncation when either estimated tokens or JSON body bytes exceed the safety threshold. It preserves pinned skills and pinned constraints explicitly, which makes it more than a simple summarizer.
DeerFlow: middleware and skill rescue
DeerFlow expresses compaction as middleware, not just a helper function. That lets it rescue recent skill-file reads before summarization and makes compaction feel like one policy inside a larger orchestration framework.
Lossy, partially lossless, and “good enough”
| Style | Repo examples | What survives | Main tradeoff |
|---|---|---|---|
| Naive lossy summary | Open Claude Code | A short prose recap plus recent messages | Easy to build, easier to lose nuance |
| Structured lossy summary | Pi Mono, Neovate, Claude Code snapshot | Named slots like goal, progress, files, decisions | Still lossy, but better recovery ergonomics |
| Partially lossless preservation | Pi Mono, Goose, Reasonix | Recent tail, file lists, visibility metadata, pinned constraints | More code, more bookkeeping, better continuation |
| Prune-first compaction | Neovate, Open Claude Code | Old tool chatter is reduced before summary | Good token savings without full semantic rewriting |
What not to overclaim
The correction list matters here
This page is stronger because of the second pass, not the first. Hermes should not be treated as the flagship compaction story. ADK-Rust is not one of the main exemplars for this topic. OpenHands still looks weak in the current snapshot. Oh My Pi is better discussed for editing/runtime ergonomics than for context folding.
That negative evidence is useful. It keeps the page honest and makes the real leaders stand out. A mature compaction implementation is not a checkbox. It usually needs thresholds, preserved tail logic, fallback behavior, and some explicit theory of which information is still recoverable after rewriting history.
Recommendations for new agent builders
- Prune before you summarize. Cheap wins on stale tool output reduce cost and preserve more recent detail.
- Keep a real recent tail. Most failures happen when compaction rewrites the exact turns the model still needs to act on.
- Use structured summaries if recovery matters. Pi Mono and Neovate show why named slots beat one free-form paragraph.
- Distinguish token limits from byte limits. Reasonix and Open Claude Code both show that transport size and model window are separate failure modes.
- Build retries and breakers into the design. Reactive truncation and circuit breakers are not cleanup. They are part of a production compaction loop.
- Keep compaction separate from prompt caching. Stable prefix design helps prompt caching; compaction rewrites the working transcript. They should cooperate, not blur into one policy blob.
Bottom line
Compaction is where agent products stop being demos. Once a task runs long enough, the harness has to decide what counts as durable context and what can be sacrificed. The best repos do not rely on a single summary call. They layer:
- cheap pruning,
- protected recent context,
- structured or carefully bounded folding, and
- retry logic for when estimates still miss.
If prompt caching is about keeping the prefix stable, context compaction is about surviving when stability is no longer enough. Those are separate design problems — and the best harnesses solve both.