Reasonix: The Cache-First Coding Agent
Reasonix is a DeepSeek-native AI coding agent for the terminal, engineered around DeepSeek's automatic prefix-cache mechanic so token costs stay low across long sessions. Unlike multi-provider agents that treat the backend as a configurable detail, Reasonix couples every abstraction (context partitioning, tool-call repair, and cost control) to the byte-stable prefix-cache behavior of DeepSeek's API. It is opinionated by design: every layer is justified by a DeepSeek-specific behavior or economic property.
Why Reasonix is different
Most coding agents treat the LLM backend as a configurable detail: swap the provider, keep the agent. Reasonix does the opposite: it treats the DeepSeek backend as the design constraint around which everything else is organized. The product north star is simple: a coding agent that stays cheap enough to leave on. A tool that quietly burns $200/month on a background project is one nobody uses, so every subsystem is answerable to that goal.
The cache-first invariant
DeepSeek bills cached input at approximately 10% of the miss rate. Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn, yielding cache hit rates below 20% in practice. Reasonix's entire architecture is designed to keep that prefix stable. In one reported single-day workload, a real user reached a 99.82% cache hit rate on 435M input tokens and paid ~$12 instead of the ~$61 the same workload would have cost with no cache.
Reasonix organizes around three pillars: a cache-first loop that partitions context into immutable, append-only, and volatile regions; a tool-call repair pipeline that handles DeepSeek's specific failure modes; and aggressive cost control mechanisms that default to cheap models and only escalate when necessary.
Pillar 1: Cache-First Loop
Context partitioned into IMMUTABLE PREFIX (system + tools + few-shots), APPEND-ONLY LOG (grows monotonically), and VOLATILE SCRATCH (reset each turn). Prefix is computed once, hashed, and pinned. Log entries are serialized in append order with no rewrites.
Pillar 2: Tool-Call Repair
Four-pass pipeline (flatten, scavenge, truncation repair, and storm detection) that handles DeepSeek-specific failure modes like JSON emitted inside thinking tags, dropped arguments on complex schemas, and repeated identical calls.
Pillar 3: Cost Control
Flash-first tiered defaults, turn-end auto-compaction, single-turn /pro
arming, and failure-signal auto-escalation. Every cost increase is user-visible;
silent escalation was considered and rejected.
Pillar 1: The Cache-First Loop
The core of Reasonix is the CacheFirstLoop class in src/loop.ts
(~1,200 lines). It implements a reagent loop that keeps DeepSeek's prefix cache hot by
enforcing strict invariants on how context is constructed each turn.
The system prompt, tool specifications, and few-shot examples are assembled once
per session into an ImmutablePrefix object. A SHA-256 fingerprint is
computed and cached. Any change to the prefix (adding a tool, replacing the system
prompt) invalidates the fingerprint and forces a cache miss on the next turn, so
tool registration is minimized and system prompt changes are rare.
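The fingerprinting idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual class from src/memory/runtime.ts: the name ImmutablePrefixSketch, the ToolSpec shape, and the addTool method are hypothetical. Only the SHA-256 fingerprint and the invalidate-on-change behavior come from the description above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical tool spec shape, used only for this illustration.
interface ToolSpec {
  name: string;
  description: string;
}

// Sketch of a prefix whose fingerprint is cleared by every mutation.
class ImmutablePrefixSketch {
  private system: string;
  private tools: ToolSpec[];
  private cached: string | null = null;

  constructor(system: string, tools: ToolSpec[]) {
    this.system = system;
    this.tools = [...tools];
  }

  // Serialize in a fixed field order so identical inputs give identical bytes.
  serialize(): string {
    return JSON.stringify({ system: this.system, tools: this.tools });
  }

  // SHA-256 over the serialized prefix, computed lazily and then cached.
  fingerprint(): string {
    if (this.cached === null) {
      this.cached = createHash("sha256")
        .update(this.serialize(), "utf8")
        .digest("hex");
    }
    return this.cached;
  }

  // Mutation is explicit and always invalidates the cached fingerprint.
  addTool(tool: ToolSpec): void {
    this.tools = [...this.tools, tool];
    this.cached = null;
  }
}
```

Comparing the fingerprint before and after a turn is one cheap way to notice that the next request will miss the cache.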
Every assistant message, tool call, and tool result is appended to the log in strict order. No rewrites, no reorderings. The log preserves the exact byte prefix of prior turns, so DeepSeek's cache can match on subsequent requests. The log is persisted to a JSONL file per session for replay and diff.
Transient state (R1 thinking, temporary plan variables) lives in a
VolatileScratch that is reset each turn. Before information from
scratch is folded into the log, it is distilled to remove ephemeral noise. This
keeps junk out of the log, where it would bloat every future prompt, and leaves
the cached prefix untouched.
Each tool declares parallelSafe?: boolean (default false).
The loop dispatcher groups consecutive parallel-safe calls into chunks and races
them via Promise.allSettled. The first non-parallel-safe call ends
the chunk (serial barrier: read-after-write order is preserved). Read-only tools
(filesystem reads, web search, recall, semantic search) opt in automatically.
MCP tools default to unsafe unless the server explicitly declares safety.
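A rough sketch of that chunking rule, assuming a hypothetical PendingCall shape. Only the parallelSafe flag, the chunk cap, and the use of Promise.allSettled come from the text; the function names are illustrative.

```typescript
// Hypothetical call shape; only `parallelSafe` is taken from the text above.
interface PendingCall {
  name: string;
  parallelSafe?: boolean; // treated as false when omitted
  run: () => Promise<string>;
}

// Group consecutive parallel-safe calls into chunks of at most `max`.
// Any other call becomes its own one-element chunk (a serial barrier).
function chunkCalls(calls: PendingCall[], max = 3): PendingCall[][] {
  const chunks: PendingCall[][] = [];
  let current: PendingCall[] = [];
  const flush = (): void => {
    if (current.length > 0) chunks.push(current);
    current = [];
  };
  for (const call of calls) {
    if (call.parallelSafe === true) {
      current.push(call);
      if (current.length >= max) flush();
    } else {
      flush();
      chunks.push([call]);
    }
  }
  flush();
  return chunks;
}

// Chunks run strictly in order; calls inside one chunk run concurrently.
async function dispatchCalls(
  calls: PendingCall[],
  max = 3,
): Promise<PromiseSettledResult<string>[]> {
  const results: PromiseSettledResult<string>[] = [];
  for (const chunk of chunkCalls(calls, max)) {
    const settled = await Promise.allSettled(chunk.map((c) => c.run()));
    results.push(...settled);
  }
  return results;
}
```

Because Promise.allSettled never rejects, one failing read does not discard the results of its siblings in the same chunk.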
The cache hit metric, prompt_cache_hit_tokens / (hit + miss), is exposed
per-turn and aggregated per-session, visible in the TUI's top-bar cache cell. The
ImmutablePrefix class keeps that metric honest by refusing silent mutations: any
change to the system string or tool list clears the fingerprint, making cache drift
impossible to hide.
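As a worked illustration of the metric, the sketch below computes the per-turn and per-session ratios. The field names prompt_cache_hit_tokens and prompt_cache_miss_tokens follow the usage block of DeepSeek's API; the function names are assumptions and not taken from src/telemetry/stats.ts.

```typescript
// Subset of the usage block returned with each completion.
interface CacheUsage {
  prompt_cache_hit_tokens: number;
  prompt_cache_miss_tokens: number;
}

// Per-turn ratio: hit / (hit + miss); defined as 0 for an empty prompt.
function cacheHitRatio(usage: CacheUsage): number {
  const total = usage.prompt_cache_hit_tokens + usage.prompt_cache_miss_tokens;
  return total === 0 ? 0 : usage.prompt_cache_hit_tokens / total;
}

// Per-session ratio: sum the token counts first, then divide, so long
// turns weigh more than short ones (not an average of per-turn ratios).
function sessionHitRatio(turns: CacheUsage[]): number {
  const summed = turns.reduce<CacheUsage>(
    (acc, t) => ({
      prompt_cache_hit_tokens: acc.prompt_cache_hit_tokens + t.prompt_cache_hit_tokens,
      prompt_cache_miss_tokens: acc.prompt_cache_miss_tokens + t.prompt_cache_miss_tokens,
    }),
    { prompt_cache_hit_tokens: 0, prompt_cache_miss_tokens: 0 },
  );
  return cacheHitRatio(summed);
}
```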
| Env var | Default | Effect |
|---|---|---|
| REASONIX_PARALLEL_MAX | 3 (hard cap 16) | Maximum chunk size for parallel tool dispatch. |
| REASONIX_TOOL_DISPATCH=serial | unset | Forces serial dispatch; an escape hatch for debugging. |
Pillar 2: Tool-Call Repair
DeepSeek's thinking-mode models exhibit specific failure patterns that most generic
agent frameworks don't handle. The ToolCallRepair class in
src/repair/ runs a four-pass pipeline on every assistant response before
dispatching tools.
1. Flatten
Schemas with more than 10 leaf parameters or nesting depth greater than 2 are
auto-detected at ToolRegistry.register() time and presented to the
model in dot-notation form (e.g. file.path instead of nested
{ file: { path: ... } }). At dispatch, nestArguments()
re-nests the flattened args back into the original schema before calling the tool's
implementation function. This reduces argument drop errors on complex tool schemas.
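The flatten and re-nest round trip might look roughly like this. The helper names are hypothetical (the real nestArguments() is not shown in this document), and this sketch treats arrays as leaf values and drops empty objects, which a real implementation may handle differently.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

// Flatten nested objects into dot-notation keys; arrays stay as leaf values.
function flattenArgumentsSketch(value: JsonObject, prefix = ""): JsonObject {
  const out: JsonObject = {};
  for (const [key, v] of Object.entries(value)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (v !== null && typeof v === "object" && !Array.isArray(v)) {
      Object.assign(out, flattenArgumentsSketch(v, path));
    } else {
      out[path] = v;
    }
  }
  return out;
}

// Rebuild the nested shape from dot-notation keys before calling the tool.
function nestArgumentsSketch(flat: JsonObject): JsonObject {
  const root: JsonObject = {};
  for (const [path, v] of Object.entries(flat)) {
    const parts = path.split(".");
    const leaf = parts.pop() ?? path;
    let node = root;
    for (const part of parts) {
      const next = node[part];
      if (next === undefined || next === null || typeof next !== "object" || Array.isArray(next)) {
        node[part] = {};
      }
      node = node[part] as JsonObject;
    }
    node[leaf] = v;
  }
  return root;
}
```

For example, { "file.path": "a.ts" } nests back to { file: { path: "a.ts" } }, so the tool implementation never sees the flattened form.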
2. Scavenge
A regex + JSON parser sweeps the model's reasoning_content
(the thinking tags) for any tool call that the model generated but forgot to emit
in the final tool_calls array. This is a common DeepSeek failure mode:
the model constructs a valid tool call inside its reasoning, then omits it from the
structured output. Scavenging recovers these lost calls.
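One way to picture scavenging is shown below. This sketch swaps the regex described above for a small brace scanner, and it assumes that a call inside the reasoning text looks like a JSON object with name and arguments keys; both the format and the function names are assumptions for illustration.

```typescript
interface ScavengedCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Collect top-level {...} spans, ignoring braces inside JSON strings.
function extractJsonObjects(text: string): string[] {
  const found: string[] = [];
  let depth = 0;
  let start = -1;
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"' && depth > 0) {
      inString = true; // quotes in surrounding prose are ignored
    } else if (ch === "{") {
      if (depth === 0) start = i;
      depth++;
    } else if (ch === "}" && depth > 0) {
      depth--;
      if (depth === 0) found.push(text.slice(start, i + 1));
    }
  }
  return found;
}

// Keep only candidates that parse and name a registered tool.
function scavengeSketch(reasoning: string, knownTools: Set<string>): ScavengedCall[] {
  const calls: ScavengedCall[] = [];
  for (const candidate of extractJsonObjects(reasoning)) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(candidate);
    } catch {
      continue; // not valid JSON, so not a recoverable call
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const record = parsed as { name?: unknown; arguments?: unknown };
    const args = record.arguments;
    if (
      typeof record.name === "string" &&
      knownTools.has(record.name) &&
      typeof args === "object" && args !== null && !Array.isArray(args)
    ) {
      calls.push({ name: record.name, arguments: args as Record<string, unknown> });
    }
  }
  return calls;
}
```

Filtering against the registered tool names keeps stray JSON in the reasoning, such as a quoted config snippet, from being dispatched as a call.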
3. Truncation Repair
When max_tokens cuts off mid-JSON structure, the repair pipeline
detects unbalanced braces and attempts to close them. If the partial structure is
ambiguous, it requests a continuation completion rather than guessing. This prevents
silent tool-call corruption on long outputs.
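A conservative version of the brace-closing step could look like the sketch below. It is an illustration only: the function name is hypothetical, and the rule that a cut inside a string counts as ambiguous is an assumption about where to draw the line. A null return stands for "ask for a continuation instead of guessing".

```typescript
// Try to close a JSON document that was cut off by max_tokens.
// Returns the repaired text, or null when the cut is ambiguous.
function closeTruncatedJson(partial: string): string | null {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < partial.length; i++) {
    const ch = partial[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") {
      if (closers.pop() !== ch) return null; // mismatched structure
    }
  }
  // A cut inside a string may have lost part of the value: do not guess.
  if (inString) return null;
  const repaired = partial + closers.reverse().join("");
  try {
    JSON.parse(repaired);
    return repaired;
  } catch {
    return null; // e.g. cut right after a key or a comma
  }
}
```

Re-parsing the repaired text is what catches cases such as a dangling key, where adding closers alone would still produce invalid JSON.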
4. Storm Detection
A sliding window tracks (tool, args) signatures. When the same
tuple appears repeatedly, the StormBreaker suppresses the duplicate
call and injects a reflection turn, telling the model to reconsider its approach.
This prevents infinite loops where the model calls the same tool with the same
arguments over and over.
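The sliding-window check can be sketched as below. The window size and the repeat threshold are placeholder values, since the text does not state them, and StormBreakerSketch is a hypothetical stand-in for the real StormBreaker.

```typescript
// Tracks recent (tool, args) signatures and flags repeated identical calls.
class StormBreakerSketch {
  private recent: string[] = [];
  private readonly windowSize: number;
  private readonly threshold: number;

  // Both defaults are assumptions chosen for illustration.
  constructor(windowSize = 8, threshold = 3) {
    this.windowSize = windowSize;
    this.threshold = threshold;
  }

  // Records the call and returns true when it should be suppressed.
  observe(tool: string, args: unknown): boolean {
    // JSON.stringify is key-order sensitive, so {a,b} and {b,a} differ here.
    const signature = `${tool}:${JSON.stringify(args)}`;
    this.recent.push(signature);
    if (this.recent.length > this.windowSize) this.recent.shift();
    const count = this.recent.filter((s) => s === signature).length;
    return count >= this.threshold;
  }
}
```

When observe returns true, the caller would skip the dispatch and append the reflection message instead.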
Pass order matters
The repair pipeline runs in order: scavenge → truncation → storm. Schema flatten runs at loop construction time, not per-turn. This ordering ensures that scavenged calls are repaired before storm detection evaluates them, and truncation-fixed calls are complete before they're checked for duplication.
Pillar 3: Cost Control
Coding agents that default to the frontier model and accumulate full tool results in context cost $150 to $250/month for active users. Reasonix v0.6 introduces four complementary mechanisms, none of which require manual tuning in the common case.
| Preset | Model | Effort | Relative Cost |
|---|---|---|---|
| flash | deepseek-v4-flash | max | 1× |
| auto (default) | v4-flash → v4-pro on hard turns | max | 1–3× |
| pro | deepseek-v4-pro | max | ~12× |
All auxiliary calls (forced summaries after iteration limits, subagent spawns,
truncation repair retries) hard-code v4-flash + effort=high regardless
of the user's preset. There's no reason to pay pro rates for "paraphrase these tool
results into prose" or for an explore subagent's grep chain.
Turn-End Auto-Compaction
Every tool result exceeding TURN_END_RESULT_CAP_TOKENS (3,000 tokens)
is shrunk to that cap when a turn ends. The model had the full text for the turn
that read it; subsequent turns see a compact summary and can re-read if needed.
One extra read_file call is vastly cheaper than dragging 12KB through
every future prompt. A proactive 40% context-ratio threshold runs the same shrink
pre-emptively inside long multi-iter turns before the 80% emergency threshold fires.
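The cap and the two thresholds can be sketched as follows. The 3,000-token cap and the 40% / 80% ratios come from the text. The four-characters-per-token estimate, the truncation-with-note strategy (standing in for the compact summary), and all function names are assumptions made for this illustration.

```typescript
const TURN_END_RESULT_CAP_TOKENS = 3000;
const CHARS_PER_TOKEN = 4; // rough assumption, not a real tokenizer

interface ToolResult {
  tool: string;
  content: string;
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Shrink an oversized result to the cap and tell the model how to re-read.
function compactResult(result: ToolResult, capTokens = TURN_END_RESULT_CAP_TOKENS): ToolResult {
  if (estimateTokens(result.content) <= capTokens) return result;
  const note = `\n[shrunk at turn end; call ${result.tool} again for the full output]`;
  const keep = Math.max(0, capTokens * CHARS_PER_TOKEN - note.length);
  return { ...result, content: result.content.slice(0, keep) + note };
}

// Decide whether a mid-turn shrink is due, based on context usage.
function compactionTrigger(
  usedTokens: number,
  windowTokens: number,
): "none" | "proactive" | "emergency" {
  const ratio = usedTokens / windowTokens;
  if (ratio >= 0.8) return "emergency";
  if (ratio >= 0.4) return "proactive";
  return "none";
}
```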
/pro Single-Turn Arming
Users who predict a hard task type /pro; the next
turn runs on deepseek-v4-pro, then auto-disarms. No preset churn,
no forgotten revert. The armed state is visible as a yellow ⇧ pro armed
pill in the TUI header. This is a one-shot escalation: it doesn't persist beyond
the turn it was armed for.
Failure-Signal Auto-Escalation
The loop counts visible "flash is struggling" events per turn: SEARCH-not-found
errors on edit_file/write_file, and ToolCallRepair fires
(scavenge, truncation-fix, storm-break). Once the count hits 3, the
remainder of the current turn runs on v4-pro.
Announced via a yellow warning row, so there are no silent cost surprises. The counter and
escalation flag reset at every turn start.
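A minimal sketch of that per-turn counter is shown below. The threshold of 3, the signal kinds, and the reset at turn start follow the text; the class and method names are hypothetical.

```typescript
type FailureSignal = "search-not-found" | "scavenge" | "truncation-fix" | "storm-break";

const ESCALATION_THRESHOLD = 3;

class TurnEscalationSketch {
  private signals: FailureSignal[] = [];
  private escalated = false;

  // Counter and flag reset at every turn start.
  startTurn(): void {
    this.signals = [];
    this.escalated = false;
  }

  // Returns true exactly once per turn, at the moment the threshold is hit,
  // so the caller can show the warning row before switching models.
  record(signal: FailureSignal): boolean {
    this.signals.push(signal);
    if (!this.escalated && this.signals.length >= ESCALATION_THRESHOLD) {
      this.escalated = true;
      return true;
    }
    return false;
  }

  // Model to use for the remainder of the current turn.
  model(): string {
    return this.escalated ? "deepseek-v4-pro" : "deepseek-v4-flash";
  }
}
```

Returning the "just escalated" signal from record keeps the announcement and the model switch in one place, which is what makes a silent escalation hard to write by accident.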
Cost Transparency
Per-turn and session cost are colored in the StatsPanel:
turn $0.003 (green under $0.05, yellow from $0.05 to $0.20, red at $0.20+).
Session cost uses the same scale ×10. The pricing table is built-in with override
support via config. DeepSeek v4-flash: $0.14/M cached input, $0.28/M output.
v4-pro: $0.435/M cached input, $0.87/M output.
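The color scale above reduces to a small threshold function. The sketch assumes a hypothetical costColor helper; the thresholds and the ×10 session scale are the ones stated in the text.

```typescript
type CostColor = "green" | "yellow" | "red";

// scale = 1 for a single turn, 10 for the whole session.
function costColor(usd: number, scale = 1): CostColor {
  if (usd >= 0.2 * scale) return "red";
  if (usd >= 0.05 * scale) return "yellow";
  return "green";
}
```

With these thresholds a $0.003 turn renders green, while a session only turns red once it reaches $2.00.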
No silent escalation
Every pro-tier model call is surfaced to the user. Silent automatic escalation was
considered and explicitly rejected: the team decided that silent cost surprises
are worse than transparent, announced escalations. The red ⇧ pro escalated
pill in the header makes it impossible to miss when the loop has escalated.
The Tool Suite
Reasonix provides a comprehensive set of built-in tools, each registered via
ToolRegistry and dispatched through the cache-first loop.
| Tool category | Tools | Notable |
|---|---|---|
| Filesystem | read_file, write_file, edit_file, multi_edit, list_directory, directory_tree, search_files, search_content, glob, get_file_info | SEARCH/REPLACE blocks with byte-for-byte match enforcement. Empty SEARCH = create new file. Outline mode for files over 512 KiB. Binary file protection (NUL-byte sniff + extension blocklist). Strict sandboxing: paths outside rootDir are rejected. |
| Shell | run_command, run_background | Allowlist-based: read-only / test / lint / typecheck commands run immediately; mutating commands require user confirmation. Supports chain ops (\|, &&, \|\|, ;) and file redirects. No real shell: argv is tokenized natively for cross-platform parity. Background jobs tracked via JobRegistry. |
| Memory | remember, forget, recall_memory | User-private knowledge stored in ~/.reasonix/memory/. Project-scoped in code mode, global-only in chat mode. Types: user, feedback, project, reference. Loaded into the prefix at session start. |
| Skills | run_skill, list_skills | SKILL.md Markdown playbooks with frontmatter. inline mode inlines the body into context; subagent mode spawns an isolated child loop. Claude-format skills in .claude/skills/ also load. |
| Subagents | spawn_subagent | Isolated child loops inheriting the parent registry minus spawn_subagent + submit_plan. Defaults to v4-flash + high effort. No hooks in child loops. Up to 3 concurrent children. |
| Web | web_search, web_fetch | Multi-engine: Mojeek (default), SearXNG, Metaso, or Tavily. Switchable via /search-engine. Web fetch retrieves page content. |
| MCP | MCP bridge (stdio + SSE + streamable HTTP) | Configured in ~/.reasonix/config.json or via --mcp. All transports implement McpTransport. Hot bridge/unbridge supported. MCP tools default to non-parallel-safe. |
| Other | submit_plan (review gate), semantic_search, todo_write | Plan mode with /apply review gate. Semantic search backed by local Ollama or OpenAI-compatible embeddings. In-session task tracking. |
Session Persistence and Observability
Reasonix persists every session to JSONL files, enabling transcript replay, diff, and a web dashboard for cross-session statistics.
JSONL Session Persistence
Each session is written to a .jsonl file under the project's
.reasonix/sessions/ directory. Messages are appended in the same
order as the AppendOnlyLog, so the on-disk format matches the
in-memory representation exactly. Sessions can be loaded and resumed with
/session resume, including healing of tool-call pairings and
missing reasoning content stamps.
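The append-only property of the on-disk format can be illustrated with two small helpers. The LogMessage shape and the function names are assumptions for this sketch; file I/O and the healing step are left out.

```typescript
// Hypothetical minimal message shape for illustration.
interface LogMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

// Appending adds one JSON line and never touches earlier bytes.
function appendJsonl(existing: string, message: LogMessage): string {
  return existing + JSON.stringify(message) + "\n";
}

// Loading reads the lines back in the order they were written.
function parseJsonl(text: string): LogMessage[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogMessage);
}
```

Because every write is a pure append, the file after turn N is always a byte prefix of the file after turn N+1, the same property the in-memory log relies on for cache hits.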
Transcript Replay and Diff
The reasonix replay and reasonix diff CLI commands
read session JSONL files and render them back as a turn-by-turn transcript or
a unified diff of filesystem changes. The transcript log in src/transcript/
supports write, diff, and replay operations, which is useful for post-mortem analysis
of what the agent did and why.
Event Log Kernel
The core event system in src/core/ defines an Event union
type with pure reducers.ts projections and eventize.ts
normalization. Every loop iteration emits events that are reduced into a consistent
state model, separate from the message log. This enables the dashboard to query
structured session data without parsing raw messages.
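A pure-reducer projection of this kind might look like the sketch below. The event variants and the SessionStats fields are invented for illustration; the actual Event union in src/core/ is not shown in this document.

```typescript
// Hypothetical event variants; the real union is likely much larger.
type LoopEvent =
  | { type: "turn_started" }
  | { type: "tool_called"; tool: string }
  | { type: "usage_recorded"; hitTokens: number; missTokens: number };

interface SessionStats {
  turns: number;
  toolCalls: Record<string, number>;
  hitTokens: number;
  missTokens: number;
}

const initialStats: SessionStats = { turns: 0, toolCalls: {}, hitTokens: 0, missTokens: 0 };

// Pure reducer: returns a new state and never mutates its input.
function reduceStats(state: SessionStats, event: LoopEvent): SessionStats {
  switch (event.type) {
    case "turn_started":
      return { ...state, turns: state.turns + 1 };
    case "tool_called":
      return {
        ...state,
        toolCalls: {
          ...state.toolCalls,
          [event.tool]: (state.toolCalls[event.tool] ?? 0) + 1,
        },
      };
    case "usage_recorded":
      return {
        ...state,
        hitTokens: state.hitTokens + event.hitTokens,
        missTokens: state.missTokens + event.missTokens,
      };
  }
}
```

Folding a recorded event list through the reducer rebuilds the same statistics every time, which is what lets a dashboard query state without parsing raw messages.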
Web Dashboard
A compiled SPA in dashboard/ is backed by a REST API in
src/server/ that exposes cross-session statistics: total cost, cache hit rates,
model usage distribution, and tool-call frequency. Usage records roll up into
~/.reasonix/usage.jsonl for long-term tracking. The reasonix stats
CLI command surfaces the same data in the terminal.
Skills and Memory
Reasonix extends the agent's capabilities through Markdown-based skills and a multi-layered memory system, both designed to load into the prefix cache without breaking it.
SKILL.md Playbooks
Skills are Markdown files with YAML frontmatter (description:, optional
runAs: subagent, allowed-tools:). Created via
/skill new my-skill (project-scoped) or /skill new my-skill --global
(in ~/.reasonix/skills/). When invoked, the skill body is either inlined
into the context or executed by a spawned subagent. Skills can restrict which tools
the subagent has access to via frontmatter.
Claude-format skills in <project>/.claude/skills/<name>/SKILL.md
and ~/.claude/skills/ are read alongside Reasonix's native paths, so
tooling that emits Claude-format skills works out of the box.
Memory Stores
Three memory layers, all loaded into the prefix at session start:
- Project memory: REASONIX.md in the project root, containing conventions, architecture notes, and project-specific knowledge.
- User memory: ~/.reasonix/memory/ with types (user, feedback, project, reference). Persistent across all sessions.
- Session memory: distilled from VolatileScratch into the AppendOnlyLog, then optionally compacted at turn end.
Mid-session writes to user memory are durable (written to disk immediately) but do not update the current session's loaded prefix. The snapshot is refreshed on the next session start to avoid breaking the cache.
REASONIX.md as project convention
The REASONIX.md file in the project root is loaded as part of the
immutable prefix at session start. It's the recommended place to document project
conventions, architecture decisions, and coding patterns so the agent understands
the codebase context from turn one. Edits to REASONIX.md take effect
on the next /new command via system prompt rebuild.
How Reasonix compares
| Capability | Reasonix | Claude Code | Aider | Goose | Codex |
|---|---|---|---|---|---|
| Backend | DeepSeek only | Anthropic | Any (OpenRouter) | Multi-model | OpenAI |
| License | MIT | Closed | Apache 2 | Apache 2 | Closed |
| Cost profile | Low per task | Premium | Varies | Varies | Subscription |
| DeepSeek prefix-cache | Engineered | Not applicable | Incidental | Incidental | Not applicable |
| Tool-call repair | 4-pass pipeline | No | No | No | No |
| Auto cost escalation | Transparent, user-visible | No | No | No | No |
| Embedded web dashboard | Yes | No | No | No | No |
| Configurable web search | Mojeek / SearXNG / Metaso / Tavily | Yes | Yes | Yes | Bing |
| Persistent per-workspace sessions | Yes (JSONL, replay/diff) | Partial | No | Partial | Partial |
| Plan mode · MCP · hooks · skills | Yes | Yes | Partial | Yes | Yes |
| Parallel tool dispatch | Yes (safety-aware chunks) | Yes | No | Yes | Yes |
| Open community development | Yes | No | Yes | Yes | No |
Module layout at a glance
| Module | File(s) | Role |
|---|---|---|
| Cache-first loop | src/loop.ts (~1,200 lines) | Pillar 1 + 3: main reagent loop, context assembly, escalation |
| Tool-call repair | src/repair/ (flatten, scavenge, truncation, storm) | Pillar 2: four-pass repair pipeline |
| Memory runtime | src/memory/runtime.ts | ImmutablePrefix, AppendOnlyLog, VolatileScratch classes |
| Session management | src/memory/session.ts | JSONL load/rewrite/archive, message healing |
| Telemetry | src/telemetry/stats.ts | Cost accounting, cache-hit metrics, pricing tables |
| Filesystem tools | src/tools/filesystem.ts (~840 lines) | READ/WRITE/SEARCH/REPLACE with sandbox, outline mode, binary protection |
| Shell tools | src/tools/shell.ts | Gated command execution, background jobs, allowlist parsing |
| Subagent | src/tools/subagent.ts | Isolated child loop spawning, tool restriction, event sink |
| MCP bridge | src/mcp/ | Stdio + SSE + streamable HTTP transports, registry, spec |
| TUI | src/cli/ui/App.tsx + hooks/components | Ink 5 React app: StatsPanel, EventLog, PlanConfirm, slash commands |
| Dashboard | src/server/ + dashboard/ | REST API + compiled SPA for cross-session stats |
| Event kernel | src/core/ | Event union, pure reducers, eventize normalization |
Design evolution
Reasonix has evolved through several major versions, each adding a pillar or refining the architecture:
- Pillar 1 (cache-first loop) end-to-end, repair pipeline complete, Ink TUI scaffold.
- τ-bench numbers published, streaming polish, transcript replay.
- MCP client (stdio + SSE), session persistence.
- reasonix code with SEARCH/REPLACE edits, review/auto gate, background jobs, hooks.
- V4 model support, skills, memory, subagents, actionable error messages.
- Pillar 3 (cost control): flash-first defaults, auto-compaction, /pro one-shot arming, failure-triggered escalation, cost badges.
- UI refactor: App.tsx split into hooks/components, slash commands split into per-topic modules.
- branch + harvest features removed entirely: they rarely paid for themselves and bloated the slash surface.
Installation and usage
Requires Node ≥ 22. Works on macOS, Linux, and Windows (PowerShell, Git Bash, Windows Terminal).
npm install -g reasonix
reasonix code my-project # paste DeepSeek API key on first run; persists after
Or run once without installing:
cd my-project
npx reasonix code # always uses the latest package
| Command | When |
|---|---|
| reasonix / reasonix code [dir] | The coding agent. Start here. |
| reasonix chat | Plain chat: no filesystem or shell tools. |
| reasonix run "task" | One-shot, streams to stdout. Good for pipes. |
| reasonix doctor | Health check: Node, API key, MCP wiring. |
| reasonix update | Upgrade Reasonix itself. |
Code vs chat mode
code is the default and the only mode with filesystem/shell tools
and SEARCH/REPLACE review. chat is the lighter, tools-off shell;
reach for it when you want a thinking partner with MCP attached but no disk access.
Memory in code mode is project-scoped + global; in chat
mode it's global only.
Explicit non-goals
Reasonix is opinionated. Some things it deliberately doesn't do are listed here so you can pick the right tool for your work.
DeepSeek-only on purpose
Coupling to one backend is the feature, not a limitation. Every abstraction is
justified by a DeepSeek-specific behavior. An OpenAI-compatible shim would work
via --model override, but is not tested or supported.
Terminal-first, no IDE integration
The diff lives in git diff, the file tree in ls. The
dashboard is a companion, not a Cursor replacement. The desktop client (Tauri,
prerelease) bundles the same loop with a GUI, but the CLI remains canonical.
No RAG / vector retrieval as primary
Semantic search exists but is secondary to grep and explicit search. The vector
index is local (Ollama or OpenAI-compatible endpoint) and opt-in via
reasonix index.
No multi-agent orchestration
Subagents are a cost-reduction mechanism, not a coordination primitive. Max depth is 1 (no recursive delegation). Children cannot delegate, interact with the user, or write to shared memory.
The tradeoff: DeepSeek-only limitation
The most honest assessment: Reasonix's greatest strength is also its biggest limitation. By coupling entirely to DeepSeek, it can't serve users who need Anthropic, OpenAI, or local models. If your work requires a different provider or multi-provider flexibility, Aider or Claude Code are better choices. If your priority is keeping costs low on DeepSeek across long sessions, nothing else in this comparison comes close. The cache-first architecture (prefix partitioning, append-only logging, volatile scratch) would need to be rethought from scratch for any other backend.
Architecture summary
Reasonix demonstrates that tight coupling to a single LLM provider can produce a more efficient agent than generic frameworks. By treating DeepSeek's prefix-cache mechanic as a design constraint rather than an implementation detail, it achieves cache hit rates that make long coding sessions economically viable. The tool-call repair pipeline handles DeepSeek-specific failure modes that generic agents silently lose. And the cost control mechanisms ensure that users are never surprised by a bill.
The project is MIT-licensed, actively developed, and maintains a bilingual (English/Chinese) community on Discord. The codebase is organized into small modules (the largest UI component is ~2,000 lines, and every slash command handler is under 200 lines), making it accessible for contributors to add tools, repair passes, or TUI improvements without navigating a monolith.
Further reading
GitHub: esengine/DeepSeek-Reasonix
Website: esengine.github.io/DeepSeek-Reasonix
Architecture: docs/ARCHITECTURE.md
npm: npm install -g reasonix
Discord: discord.gg/XF78rEME2D