AI Coding Guides Deep Dives
Folding • Pruning • Summary insertion • Retry loops

When the Window Fills Up

Every serious agent eventually has to answer the same question: when the conversation is too big to resend, what survives? The repos in this set answer with very different machinery — micro-compaction, pruning, structured summaries, folding, visibility metadata, and emergency truncation.

(Alright, ad over. Back to the serious technical analysis.)

Compaction is not the same thing as prompt caching

Prompt caching is about reusing a stable prefix so the provider discounts repeated input. Compaction is a different survival mechanism: it changes the conversation itself so the next request fits in the model window. One tries to preserve the prefix exactly. The other decides which parts of history can be folded, truncated, summarized, hidden, or replaced.

🧭

The architectural split

Caching rewards stability. Compaction accepts loss. The best agents do not blur those jobs: they keep the reusable prefix stable, then compact the rest with explicit logic around thresholds, summaries, and tail preservation.

The pattern taxonomy

Micro-compaction

Cheap first pass: shrink or truncate stale tool results before paying for a real summarization step.

Pruning before summary

Remove the least valuable blobs first, especially verbose tool output, then summarize only if that is still not enough.

Structured summary insertion

Replace older history with a summary that preserves explicit slots like goal, progress, decisions, files, or next steps.

Conversation folding

Convert old turns into one synthetic message while keeping a recent tail verbatim, often with special handling for pinned constraints or skills.

Visibility-only compaction

Keep the original messages for user display or audit, but hide them from the agent and replace them with an agent-only continuation summary.

Reactive truncation and circuit breakers

When token estimates fail or the provider rejects the request, better agents retry mechanically and stop hammering if compaction keeps failing.

Comparison matrix

Repo Main strategy What it preserves Why it stands out
Claude Code source snapshot Auto-compaction + reactive fallback Recent tail, command surface, runtime control state Good example of compaction as part of the core loop
Goose Full-message compaction plus tool-pair summarization Latest user text and visibility metadata Best “agent-visible vs user-visible” story
Pi Mono Structured compaction entries Goal, progress, decisions, file reads, modified files Most explicit partially-lossless structure
Neovate Code Truncation, pruning, then compaction Protected tools and recent turns Best layered budget-defense design
Open Claude Code Micro-compaction before full compaction Recent turns Simple but honest staged approach
DeepSeek Reasonix History folding plus emergency truncation Pinned skills, pinned constraints, recent tail Best token-plus-byte pressure example
DeerFlow Summarization middleware Configurable keep budget and rescued skill reads Most framework-shaped answer
Hermes Use carefully Some structured preservation First-pass compaction framing was too strong
OpenHands Weak evidence Context metadata, not a strong compaction pipeline Important negative control
Oh My Pi Weak evidence Post-run summaries rather than a flagship compactor Do not oversell it

Claude Code and Open Claude Code: compaction as runtime control loop

The Claude Code family is useful because it shows two related but differently mature approaches. The Claude Code snapshot treats compaction as part of the main runtime: automatic thresholds, reactive handling for prompt-too-long errors, and guardrails to stop repeated failures from turning into a tight loop.

Open Claude Code is simpler but still interesting. It first tries a cheap micro-compaction pass that truncates stale tool results. Only after that fails to buy enough space does it perform a heavier full summary compaction. That design is much less sophisticated than Pi Mono or Goose, but it captures an important production lesson: not every overflow deserves an expensive summarization call.

if (micro_compact_is_enough) {
  keep going
} else {
  summarize older history
  preserve recent turns
}
⚠️

Simple can still be lossy

Open Claude Code is intentionally blunt: old tool results get truncated, then older messages get replaced with a short summary. It is a useful baseline precisely because it exposes what advanced repos are adding on top of that pattern.

Goose, Pi Mono, and Neovate: three different “serious” designs

Goose: agent-visible summary, user-visible history

Goose distinguishes between what the user can still inspect and what the agent sees going forward. That makes its compaction model unusually honest: the repo acknowledges that summarization changes the working context, but it does not have to erase the original messages from the transcript entirely.

Pi Mono: partially lossless structure

Pi Mono tracks read files, modified files, compaction boundaries, and branch summarization. It is still lossy — the original middle conversation is gone — but it keeps more explicit state than the repos that collapse everything into one prose paragraph.

Neovate: layer the defenses

Neovate is strongest as a budget-defense story. It has a truncation layer, a pruning layer, and a compaction layer. Protected tools and recent turns get special treatment, which is exactly what you want when some context is much more recoverable than others.

The shared lesson

These repos all reject the naive “just summarize everything” move. The more mature the implementation, the more it distinguishes recent turns, tool output, preserved files, and continuation state.

Reasonix and DeerFlow solve the long-context problem differently

Reasonix CheetahClaws and DeerFlow are two of the most useful counterpoints in this set because they do not look like classic terminal-only agents.

Reasonix: fold under token and byte pressure

Reasonix does real conversation folding, and it also has local preflight logic for emergency truncation when either estimated tokens or JSON body bytes exceed the safety threshold. It preserves pinned skills and pinned constraints explicitly, which makes it more than a simple summarizer.

DeerFlow: middleware and skill rescue

DeerFlow expresses compaction as middleware, not just a helper function. That lets it rescue recent skill-file reads before summarization and makes compaction feel like one policy inside a larger orchestration framework.

Lossy, partially lossless, and “good enough”

Style Repo examples What survives Main tradeoff
Naive lossy summary Open Claude Code A short prose recap plus recent messages Easy to build, easier to lose nuance
Structured lossy summary Pi Mono, Neovate, Claude Code snapshot Named slots like goal, progress, files, decisions Still lossy, but better recovery ergonomics
Partially lossless preservation Pi Mono, Goose, Reasonix Recent tail, file lists, visibility metadata, pinned constraints More code, more bookkeeping, better continuation
Prune-first compaction Neovate, Open Claude Code Old tool chatter is reduced before summary Good token savings without full semantic rewriting

What not to overclaim

⚠️

The correction list matters here

This page is stronger because of the second pass, not the first. Hermes should not be treated as the flagship compaction story. ADK-Rust is not one of the main exemplars for this topic. OpenHands still looks weak in the current snapshot. Oh My Pi is better discussed for editing/runtime ergonomics than for context folding.

That negative evidence is useful. It keeps the page honest and makes the real leaders stand out. A mature compaction implementation is not a checkbox. It usually needs thresholds, preserved tail logic, fallback behavior, and some explicit theory of which information is still recoverable after rewriting history.

Recommendations for new agent builders

  1. Prune before you summarize. Cheap wins on stale tool output reduce cost and preserve more recent detail.
  2. Keep a real recent tail. Most failures happen when compaction rewrites the exact turns the model still needs to act on.
  3. Use structured summaries if recovery matters. Pi Mono and Neovate show why named slots beat one free-form paragraph.
  4. Distinguish token limits from byte limits. Reasonix and Open Claude Code both show that transport size and model window are separate failure modes.
  5. Build retries and breakers into the design. Reactive truncation and circuit breakers are not cleanup. They are part of a production compaction loop.
  6. Keep compaction separate from prompt caching. Stable prefix design helps prompt caching; compaction rewrites the working transcript. They should cooperate, not blur into one policy blob.

Bottom line

Compaction is where agent products stop being demos. Once a task runs long enough, the harness has to decide what counts as durable context and what can be sacrificed. The best repos do not rely on a single summary call. They layer:

  1. cheap pruning,
  2. protected recent context,
  3. structured or carefully bounded folding, and
  4. retry logic for when estimates still miss.

If prompt caching is about keeping the prefix stable, context compaction is about surviving when stability is no longer enough. Those are separate design problems — and the best harnesses solve both.