Folding • Pruning • Summary insertion • Retry loops

When the Window Fills Up

Every serious agent eventually has to answer the same question: when the conversation is too big to resend, what survives? The repos in this set answer with very different machinery — micro-compaction, pruning, structured summaries, folding, visibility metadata, and emergency truncation.

(Alright, ad over. Back to the serious technical analysis.)

Compaction is not the same thing as prompt caching

Prompt caching is about reusing a stable prefix so the provider discounts repeated input. Compaction is a different survival mechanism: it changes the conversation itself so the next request fits in the model window. One tries to preserve the prefix exactly. The other decides which parts of history can be folded, truncated, summarized, hidden, or replaced.

🧭

The architectural split

Caching rewards stability. Compaction accepts loss. The best agents do not blur those jobs: they keep the reusable prefix stable, then compact the rest with explicit logic around thresholds, summaries, and tail preservation.

The pattern taxonomy

Micro-compaction

Cheap first pass: shrink or truncate stale tool results before paying for a real summarization step.

Pruning before summary

Remove the least valuable blobs first, especially verbose tool output, then summarize only if that is still not enough.

Structured summary insertion

Replace older history with a summary that preserves explicit slots like goal, progress, decisions, files, or next steps.

Conversation folding

Convert old turns into one synthetic message while keeping a recent tail verbatim, often with special handling for pinned constraints or skills.

Visibility-only compaction

Keep the original messages for user display or audit, but hide them from the agent and replace them with an agent-only continuation summary.

Reactive truncation and circuit breakers

When token estimates fail or the provider rejects the request, better agents retry mechanically and stop hammering if compaction keeps failing.

Comparison matrix

Repo	Main strategy	What it preserves	Why it stands out
Claude Code source snapshot	Auto-compaction + reactive fallback	Recent tail, command surface, runtime control state	Good example of compaction as part of the core loop
Goose	Full-message compaction plus tool-pair summarization	Latest user text and visibility metadata	Best “agent-visible vs user-visible” story
Pi Mono	Structured compaction entries	Goal, progress, decisions, file reads, modified files	Most explicit partially-lossless structure
Neovate Code	Truncation, pruning, then compaction	Protected tools and recent turns	Best layered budget-defense design
Open Claude Code	Micro-compaction before full compaction	Recent turns	Simple but honest staged approach
DeepSeek Reasonix	History folding plus emergency truncation	Pinned skills, pinned constraints, recent tail	Best token-plus-byte pressure example
DeerFlow	Summarization middleware	Configurable keep budget and rescued skill reads	Most framework-shaped answer
Hermes	Use carefully	Some structured preservation	First-pass compaction framing was too strong
OpenHands	Weak evidence	Context metadata, not a strong compaction pipeline	Important negative control
Oh My Pi	Weak evidence	Post-run summaries rather than a flagship compactor	Do not oversell it

Claude Code and Open Claude Code: compaction as runtime control loop

The Claude Code family is useful because it shows two related but differently mature approaches. The Claude Code snapshot treats compaction as part of the main runtime: automatic thresholds, reactive handling for prompt-too-long errors, and guardrails to stop repeated failures from turning into a tight loop.

Open Claude Code is simpler but still interesting. It first tries a cheap micro-compaction pass that truncates stale tool results. Only after that fails to buy enough space does it perform a heavier full summary compaction. That design is much less sophisticated than Pi Mono or Goose, but it captures an important production lesson: not every overflow deserves an expensive summarization call.

if (micro_compact_is_enough) {
  keep going
} else {
  summarize older history
  preserve recent turns
}

⚠️

Simple can still be lossy

Open Claude Code is intentionally blunt: old tool results get truncated, then older messages get replaced with a short summary. It is a useful baseline precisely because it exposes what advanced repos are adding on top of that pattern.

Goose, Pi Mono, and Neovate: three different “serious” designs

Goose: agent-visible summary, user-visible history

Goose distinguishes between what the user can still inspect and what the agent sees going forward. That makes its compaction model unusually honest: the repo acknowledges that summarization changes the working context, but it does not have to erase the original messages from the transcript entirely.

Pi Mono: partially lossless structure

Pi Mono tracks read files, modified files, compaction boundaries, and branch summarization. It is still lossy — the original middle conversation is gone — but it keeps more explicit state than the repos that collapse everything into one prose paragraph.

Neovate: layer the defenses

Neovate is strongest as a budget-defense story. It has a truncation layer, a pruning layer, and a compaction layer. Protected tools and recent turns get special treatment, which is exactly what you want when some context is much more recoverable than others.

The shared lesson

These repos all reject the naive “just summarize everything” move. The more mature the implementation, the more it distinguishes recent turns, tool output, preserved files, and continuation state.

Reasonix and DeerFlow solve the long-context problem differently

Reasonix CheetahClaws and DeerFlow are two of the most useful counterpoints in this set because they do not look like classic terminal-only agents.

Reasonix: fold under token and byte pressure

Reasonix does real conversation folding, and it also has local preflight logic for emergency truncation when either estimated tokens or JSON body bytes exceed the safety threshold. It preserves pinned skills and pinned constraints explicitly, which makes it more than a simple summarizer.

DeerFlow: middleware and skill rescue

DeerFlow expresses compaction as middleware, not just a helper function. That lets it rescue recent skill-file reads before summarization and makes compaction feel like one policy inside a larger orchestration framework.

Lossy, partially lossless, and “good enough”

Style	Repo examples	What survives	Main tradeoff
Naive lossy summary	Open Claude Code	A short prose recap plus recent messages	Easy to build, easier to lose nuance
Structured lossy summary	Pi Mono, Neovate, Claude Code snapshot	Named slots like goal, progress, files, decisions	Still lossy, but better recovery ergonomics
Partially lossless preservation	Pi Mono, Goose, Reasonix	Recent tail, file lists, visibility metadata, pinned constraints	More code, more bookkeeping, better continuation
Prune-first compaction	Neovate, Open Claude Code	Old tool chatter is reduced before summary	Good token savings without full semantic rewriting

What not to overclaim

⚠️

The correction list matters here

This page is stronger because of the second pass, not the first. Hermes should not be treated as the flagship compaction story. ADK-Rust is not one of the main exemplars for this topic. OpenHands still looks weak in the current snapshot. Oh My Pi is better discussed for editing/runtime ergonomics than for context folding.

That negative evidence is useful. It keeps the page honest and makes the real leaders stand out. A mature compaction implementation is not a checkbox. It usually needs thresholds, preserved tail logic, fallback behavior, and some explicit theory of which information is still recoverable after rewriting history.

Recommendations for new agent builders

Prune before you summarize. Cheap wins on stale tool output reduce cost and preserve more recent detail.
Keep a real recent tail. Most failures happen when compaction rewrites the exact turns the model still needs to act on.
Use structured summaries if recovery matters. Pi Mono and Neovate show why named slots beat one free-form paragraph.
Distinguish token limits from byte limits. Reasonix and Open Claude Code both show that transport size and model window are separate failure modes.
Build retries and breakers into the design. Reactive truncation and circuit breakers are not cleanup. They are part of a production compaction loop.
Keep compaction separate from prompt caching. Stable prefix design helps prompt caching; compaction rewrites the working transcript. They should cooperate, not blur into one policy blob.

Bottom line

Compaction is where agent products stop being demos. Once a task runs long enough, the harness has to decide what counts as durable context and what can be sacrificed. The best repos do not rely on a single summary call. They layer:

cheap pruning,
protected recent context,
structured or carefully bounded folding, and
retry logic for when estimates still miss.

If prompt caching is about keeping the prefix stable, context compaction is about surviving when stability is no longer enough. Those are separate design problems — and the best harnesses solve both.