DeepSeek-native • Cache-first • Cost-controlled • CLI

Reasonix: The Cache-First Coding Agent

Reasonix is a DeepSeek-native AI coding agent for the terminal, engineered around DeepSeek's automatic prefix-cache mechanic so token costs stay low across long sessions. Unlike multi-provider agents that treat the backend as a configurable detail, Reasonix couples every abstraction — its context partitioning, tool-call repair, and cost control — to the byte-stable prefix-cache behavior of DeepSeek's API. It is opinionated by design: every layer is justified by a DeepSeek-specific behavior or economic property.

(Alright, ad over. Back to the serious technical analysis.)

Why Reasonix is different

Most coding agents treat the LLM backend as a configurable detail — swap the provider, keep the agent. Reasonix does the opposite: it treats the DeepSeek backend as the design constraint around which everything else is organized. The product north star is simple: a coding agent that stays cheap enough to leave on. A tool that quietly burns $200/month on a background project is one nobody uses, so every subsystem is answerable to that goal.

💰

The cache-first invariant

DeepSeek bills cached input at approximately 10% of the miss rate. Automatic prefix caching applies only when a request begins with exactly the same bytes as an earlier request. Most agent loops reorder, rewrite, or inject fresh timestamps each turn, which yields cache hit rates below 20% in practice. Reasonix's entire architecture is designed to keep that prefix stable. In one reported single-day workload, a real user reached a 99.82% cache hit rate on 435M input tokens and paid ~$12 instead of the ~$61 the same workload would have cost with no cache.
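To make the billing arithmetic concrete, here is a minimal sketch of input cost as a function of hit and miss tokens. The prices are assumptions for illustration only: the miss price of $0.14 per million tokens is inferred from the ~$61 no-cache figure above, and the hit price is taken as 10% of that. Output tokens are left out, so the result is not expected to match the ~$12 total.

```typescript
// Illustrative input-cost arithmetic. Prices are assumptions, not a published price table.
interface InputPricing {
  missPerMillion: number; // USD per 1M cache-miss input tokens
  hitPerMillion: number; // USD per 1M cache-hit input tokens
}

function inputCostUsd(hitTokens: number, missTokens: number, p: InputPricing): number {
  return (hitTokens * p.hitPerMillion + missTokens * p.missPerMillion) / 1_000_000;
}

const pricing: InputPricing = { missPerMillion: 0.14, hitPerMillion: 0.014 };
const totalTokens = 435_000_000;
const hitTokens = Math.round(totalTokens * 0.9982);
const missTokens = totalTokens - hitTokens;

// Same workload, billed with and without prefix-cache hits.
const withCache = inputCostUsd(hitTokens, missTokens, pricing);
const withoutCache = inputCostUsd(0, totalTokens, pricing);
console.log(withCache.toFixed(2), withoutCache.toFixed(2));
```

Under these assumed prices the input side alone comes to roughly $6.19 with the cache against $60.90 without it.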

Reasonix organizes around three pillars: a cache-first loop that partitions context into immutable, append-only, and volatile regions; a tool-call repair pipeline that handles DeepSeek's specific failure modes; and aggressive cost control mechanisms that default to cheap models and only escalate when necessary.

Pillar 1: Cache-First Loop

Context partitioned into IMMUTABLE PREFIX (system + tools + few-shots), APPEND-ONLY LOG (grows monotonically), and VOLATILE SCRATCH (reset each turn). Prefix is computed once, hashed, and pinned. Log entries are serialized in append order with no rewrites.

Pillar 2: Tool-Call Repair

Four-pass pipeline — flatten, scavenge, truncation repair, and storm detection — that handles DeepSeek-specific failure modes like JSON emitted inside thinking tags, dropped arguments on complex schemas, and repeated identical calls.

Pillar 3: Cost Control

Flash-first tiered defaults, turn-end auto-compaction, single-turn /pro arming, and failure-signal auto-escalation. Every cost increase is user-visible; silent escalation was considered and rejected.

Pillar 1: The Cache-First Loop

The core of Reasonix is the CacheFirstLoop class in src/loop.ts (~1,200 lines). It implements an agent loop that keeps DeepSeek's prefix cache hot by enforcing strict invariants on how context is constructed each turn.

Immutable Prefix

The system prompt, tool specifications, and few-shot examples are assembled once per session into an ImmutablePrefix object. A SHA-256 fingerprint is computed and cached. Any change to the prefix (adding a tool, replacing the system prompt) invalidates the fingerprint and forces a cache miss on the next turn — so tool registration is minimized and system prompt changes are rare.
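The fingerprinting idea can be sketched as follows. This is not the actual ImmutablePrefix class: the names `PrefixSketch` and `ToolSpec`, the JSON serialization, and the `addTool` method are assumptions made for illustration. Only the SHA-256 fingerprint and the invalidate-on-change behavior come from the description above.

```typescript
import { createHash } from "node:crypto";

// Hypothetical tool shape; the real specification format is not shown in the text.
interface ToolSpec {
  name: string;
  description: string;
}

class PrefixSketch {
  private cachedFingerprint: string | null = null;

  constructor(
    private systemPrompt: string,
    private tools: ToolSpec[],
    private fewShots: string[],
  ) {}

  // Serialize in a fixed field order so identical inputs give identical bytes.
  serialize(): string {
    return JSON.stringify({ system: this.systemPrompt, tools: this.tools, fewShots: this.fewShots });
  }

  // Compute the SHA-256 fingerprint lazily and reuse it until the prefix changes.
  fingerprint(): string {
    if (this.cachedFingerprint === null) {
      this.cachedFingerprint = createHash("sha256").update(this.serialize()).digest("hex");
    }
    return this.cachedFingerprint;
  }

  // Any mutation clears the cached fingerprint, so a changed prefix cannot go unnoticed.
  addTool(tool: ToolSpec): void {
    this.tools = [...this.tools, tool];
    this.cachedFingerprint = null;
  }
}
```

Comparing the fingerprint before and after a turn is then enough to tell whether the next request can still hit the cache.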

↓

Append-Only Log

Every assistant message, tool call, and tool result is appended to the log in strict order. No rewrites, no reorderings. The log preserves the exact byte prefix of prior turns, so DeepSeek's cache can match on subsequent requests. The log is persisted to a JSONL file per session for replay and diff.
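A small sketch shows why append-only serialization preserves the byte prefix. The entry shape and class name are hypothetical; the point is only that the serialized log after an append starts with the serialized log before it.

```typescript
// Hypothetical entry shape; field names are assumptions for illustration.
interface LogEntry {
  role: "assistant" | "tool";
  content: string;
}

class AppendOnlyLogSketch {
  private readonly entries: Readonly<LogEntry>[] = [];

  append(entry: LogEntry): void {
    // Freeze a copy so earlier entries cannot be rewritten after the fact.
    this.entries.push(Object.freeze({ ...entry }));
  }

  // One JSON object per line, in append order (JSONL).
  toJsonl(): string {
    return this.entries.map((e) => JSON.stringify(e)).join("\n");
  }
}

const sessionLog = new AppendOnlyLogSketch();
sessionLog.append({ role: "assistant", content: "reading file" });
const serializedBefore = sessionLog.toJsonl();
sessionLog.append({ role: "tool", content: "file contents" });
const serializedAfter = sessionLog.toJsonl();

// The earlier serialization is a strict prefix of the later one.
console.log(serializedAfter.startsWith(serializedBefore));
```

Any rewrite or reordering of an earlier entry would break that prefix relationship and, with it, the cache match.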

↓

Volatile Scratch

Transient state (R1 thinking, temporary plan variables) lives in a VolatileScratch that is reset each turn. Before information from scratch is folded into the log, it is distilled to remove ephemeral noise. This keeps the log from accumulating junk that would bloat every future prompt even though the cached prefix itself never changes.

↓

Parallel Tool Dispatch

Each tool declares parallelSafe?: boolean (default false). The loop dispatcher groups consecutive parallel-safe calls into chunks and runs each chunk concurrently via Promise.allSettled, which waits for every call in the chunk to settle. The first non-parallel-safe call ends the chunk (a serial barrier that preserves read-after-write order). Read-only tools (filesystem reads, web search, recall, semantic search) opt in automatically. MCP tools default to unsafe unless the server explicitly declares safety.

The cache hit metric — prompt_cache_hit_tokens / (hit + miss) — is exposed per-turn and aggregated per-session, visible in the TUI's top-bar cache cell. The ImmutablePrefix class enforces this by refusing silent mutations: any change to the system string or tool list clears the fingerprint, making cache drift impossible to hide.
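The metric itself is simple to compute. The sketch below assumes the usage object carries `prompt_cache_hit_tokens` and a matching `prompt_cache_miss_tokens` field; the `TurnUsage` type and function names are hypothetical, and whether Reasonix aggregates by token sum (as here) or by some other rule is not stated in the text.

```typescript
// Field names follow the metric above; the surrounding type is an assumption.
interface TurnUsage {
  prompt_cache_hit_tokens: number;
  prompt_cache_miss_tokens: number;
}

// Per-turn ratio: hit / (hit + miss), defined as 0 for an empty turn.
function cacheHitRatio(u: TurnUsage): number {
  const total = u.prompt_cache_hit_tokens + u.prompt_cache_miss_tokens;
  return total === 0 ? 0 : u.prompt_cache_hit_tokens / total;
}

// Session ratio sums tokens first, so large turns weigh more than small ones.
function sessionHitRatio(turns: TurnUsage[]): number {
  const sum = turns.reduce(
    (acc, t) => ({
      prompt_cache_hit_tokens: acc.prompt_cache_hit_tokens + t.prompt_cache_hit_tokens,
      prompt_cache_miss_tokens: acc.prompt_cache_miss_tokens + t.prompt_cache_miss_tokens,
    }),
    { prompt_cache_hit_tokens: 0, prompt_cache_miss_tokens: 0 },
  );
  return cacheHitRatio(sum);
}
```

Summing tokens before dividing means a cold first turn with a small prompt barely moves the session figure once later turns are large and mostly cached.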

Env var	Default	Effect
`REASONIX_PARALLEL_MAX`	3 (hard cap 16)	Maximum chunk size for parallel tool dispatch.
`REASONIX_TOOL_DISPATCH=serial`	unset	Forces serial dispatch — escape hatch for debugging.
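The chunking rule and the `REASONIX_PARALLEL_MAX` limit can be sketched together. The names `PendingCall`, `parallelMax`, `chunkCalls`, and `dispatch` are hypothetical, and treating each non-parallel-safe call as its own single-call chunk is one reading of the serial barrier described above, not a confirmed detail of the dispatcher.

```typescript
// Hypothetical call shape; only parallelSafe and the env var semantics come from the text.
interface PendingCall {
  tool: string;
  parallelSafe?: boolean; // treated as false when omitted
}

// Parse REASONIX_PARALLEL_MAX with the documented default (3) and hard cap (16).
function parallelMax(raw: string | undefined): number {
  const n = Number.parseInt(raw ?? "", 10);
  if (!Number.isFinite(n) || n < 1) return 3;
  return Math.min(n, 16);
}

// Group consecutive parallel-safe calls into chunks; an unsafe call acts as a serial barrier.
function chunkCalls(calls: PendingCall[], max: number): PendingCall[][] {
  const chunks: PendingCall[][] = [];
  let current: PendingCall[] = [];
  for (const call of calls) {
    if (call.parallelSafe === true) {
      current.push(call);
      if (current.length === max) {
        chunks.push(current);
        current = [];
      }
    } else {
      if (current.length > 0) chunks.push(current);
      chunks.push([call]); // runs alone, preserving read-after-write order
      current = [];
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Calls within a chunk run concurrently; chunks run one after another.
async function dispatch(
  calls: PendingCall[],
  run: (c: PendingCall) => Promise<string>,
  max: number,
): Promise<PromiseSettledResult<string>[]> {
  const results: PromiseSettledResult<string>[] = [];
  for (const chunk of chunkCalls(calls, max)) {
    results.push(...(await Promise.allSettled(chunk.map(run))));
  }
  return results;
}
```

Because `Promise.allSettled` never rejects, one failing read in a chunk does not discard the results of its siblings.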

Pillar 2: Tool-Call Repair

DeepSeek's thinking-mode models exhibit specific failure patterns that most generic agent frameworks don't handle. The ToolCallRepair class in src/repair/ runs a four-pass pipeline on every assistant response before dispatching tools.

1. Flatten

Schemas with more than 10 leaf parameters or nesting depth greater than 2 are auto-detected at ToolRegistry.register() time and presented to the model in dot-notation form (e.g. file.path instead of nested { file: { path: ... } }). At dispatch, nestArguments() re-nests the flattened args back into the original schema before calling the tool's implementation function. This reduces argument drop errors on complex tool schemas.
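The re-nesting step is easy to picture with a small example. The function below is a sketch of the idea only: `nestArgumentsSketch` and the `Json` type are illustrative names, and the real nestArguments() may treat arrays, key collisions, and other edge cases differently.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Re-nest dot-notation keys ("file.path") into nested objects ({ file: { path } }).
function nestArgumentsSketch(flat: { [key: string]: Json }): { [key: string]: Json } {
  const root: { [key: string]: Json } = {};
  for (const [dotted, value] of Object.entries(flat)) {
    const parts = dotted.split(".");
    const leaf = parts.pop() ?? dotted; // split() always yields at least one part
    let node = root;
    for (const part of parts) {
      const existing = node[part];
      if (typeof existing === "object" && existing !== null && !Array.isArray(existing)) {
        node = existing; // reuse the intermediate object created by an earlier key
      } else {
        const child: { [key: string]: Json } = {};
        node[part] = child;
        node = child;
      }
    }
    node[leaf] = value;
  }
  return root;
}

console.log(JSON.stringify(nestArgumentsSketch({ "file.path": "a.ts", "file.mode": "r", limit: 10 })));
// prints {"file":{"path":"a.ts","mode":"r"},"limit":10}
```

The model only ever sees the flat keys; the tool implementation only ever sees the nested shape.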

2. Scavenge

A regex + JSON parser sweeps the model's reasoning_content (the thinking tags) for any tool call that the model generated but forgot to emit in the final tool_calls array. This is a common DeepSeek failure mode: the model constructs a valid tool call inside its reasoning, then omits it from the structured output. Scavenging recovers these lost calls.
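One way to sketch a scavenging pass is a string-aware brace scan followed by JSON.parse. The text mentions a regex plus a JSON parser; this sketch swaps the regex for a plain scanner to keep nested objects intact. The `{ name, arguments }` call shape and all function names here are assumptions, since the patterns the real scavenger looks for are not shown.

```typescript
// Assumed shape of a recoverable call.
interface ScavengedCall {
  name: string;
  arguments: Record<string, unknown>;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Index of the brace closing the object that opens at `start`, or -1 if unbalanced.
function matchingBrace(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let j = start; j < text.length; j++) {
    const ch = text[j];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return j;
    }
  }
  return -1;
}

function parseCall(candidate: string): ScavengedCall | null {
  try {
    const parsed: unknown = JSON.parse(candidate);
    if (isRecord(parsed)) {
      const callName = parsed["name"];
      const callArgs = parsed["arguments"];
      if (typeof callName === "string" && isRecord(callArgs)) {
        return { name: callName, arguments: callArgs };
      }
    }
  } catch {
    // Not valid JSON; treat as ordinary reasoning text.
  }
  return null;
}

// Sweep reasoning text for JSON objects shaped like tool calls.
function scavenge(reasoning: string): ScavengedCall[] {
  const found: ScavengedCall[] = [];
  let i = 0;
  while (i < reasoning.length) {
    const end = reasoning[i] === "{" ? matchingBrace(reasoning, i) : -1;
    const call = end === -1 ? null : parseCall(reasoning.slice(i, end + 1));
    if (call !== null) {
      found.push(call);
      i = end + 1; // skip past the recovered call
    } else {
      i++; // keep scanning, including inside wrapper objects
    }
  }
  return found;
}
```

Recovered calls would still need to be checked against the registered tool names before dispatch.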

3. Truncation Repair

When max_tokens cuts off mid-JSON structure, the repair pipeline detects unbalanced braces and attempts to close them. If the partial structure is ambiguous, it requests a continuation completion rather than guessing. This prevents silent tool-call corruption on long outputs.
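A minimal sketch of the brace-closing step, under stated assumptions: the function name is hypothetical, and "ambiguous" is interpreted here as a cut inside a string, after a colon, or after a trailing comma. Returning null stands in for requesting a continuation.

```typescript
// Close a JSON fragment cut off mid-structure, or return null when guessing would be unsafe.
// Limitation of this sketch: a number cut short (12 instead of 123) still parses and is accepted.
function closeTruncatedJson(fragment: string): string | null {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < fragment.length; i++) {
    const ch = fragment[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      closers.push("}");
    } else if (ch === "[") {
      closers.push("]");
    } else if (ch === "}" || ch === "]") {
      closers.pop();
    }
  }
  if (inString) return null; // cut inside a string: the rest of the value is unknown

  const trimmed = fragment.replace(/\s+$/, "");
  if (/[,:]$/.test(trimmed)) return null; // a key or value was about to follow

  // Close open structures innermost-first, then verify the result really parses.
  const repaired = trimmed + closers.reverse().join("");
  try {
    JSON.parse(repaired);
    return repaired;
  } catch {
    return null;
  }
}

console.log(closeTruncatedJson('{"path":"a.ts","edits":[{"line":1}'));
// prints {"path":"a.ts","edits":[{"line":1}]}
```

The final JSON.parse check is what keeps the repair conservative: anything that does not round-trip as valid JSON is handed back as unrepairable.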

4. Storm Detection

A sliding window tracks (tool, args) signatures. When the same tuple appears repeatedly, the StormBreaker suppresses the duplicate call and injects a reflection turn, telling the model to reconsider its approach. This prevents infinite loops where the model calls the same tool with the same arguments over and over.
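The sliding-window check might look like the sketch below. The window size of 8 and the repeat threshold of 3 are assumptions, since the text does not give the real values, and `StormBreakerSketch` is an illustrative name rather than the actual class.

```typescript
// Window size and threshold defaults are assumed values for illustration.
class StormBreakerSketch {
  private readonly recent: string[] = [];

  constructor(
    private readonly windowSize = 8,
    private readonly threshold = 3,
  ) {}

  // Returns true when this call repeats often enough within the window to be suppressed.
  shouldSuppress(tool: string, args: unknown): boolean {
    // Note: JSON.stringify is key-order sensitive, so {a,b} and {b,a} count as different calls.
    const signature = `${tool}:${JSON.stringify(args)}`;
    const repeats = this.recent.filter((s) => s === signature).length;
    this.recent.push(signature);
    if (this.recent.length > this.windowSize) this.recent.shift();
    return repeats + 1 >= this.threshold;
  }
}
```

When `shouldSuppress` returns true, the caller would skip the tool and inject the reflection turn instead.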

🔧

Pass order matters

The repair pipeline runs in order: scavenge → truncation → storm. Schema flatten runs at loop construction time, not per-turn. This ordering ensures that scavenged calls are repaired before storm detection evaluates them, and truncation-fixed calls are complete before they're checked for duplication.

Pillar 3: Cost Control

Coding agents that default to the frontier model and accumulate full tool results in context cost $150–$250/month for active users. Reasonix v0.6 introduces four complementary mechanisms, none of which require manual tuning in the common case.

Preset	Model	Effort	Relative Cost
`flash`	`deepseek-v4-flash`	`max`	1×
`auto` (default)	`v4-flash` → `v4-pro` on hard turns	`max`	1–3×
`pro`	`deepseek-v4-pro`	`max`	~12×

All auxiliary calls (forced summaries after iteration limits, subagent spawns, truncation-repair retries) hard-code v4-flash + effort=high regardless of the user's preset. There's no reason to pay pro rates for "paraphrase these tool results into prose" or for an explore subagent's grep chain.

Turn-End Auto-Compaction

Every tool result exceeding TURN_END_RESULT_CAP_TOKENS (3,000 tokens) is shrunk to that cap when a turn ends. The model had the full text for the turn that read it; subsequent turns see a compact summary and can re-read if needed. One extra read_file call is vastly cheaper than dragging 12KB through every future prompt. A proactive threshold at 40% context usage runs the same shrink pre-emptively inside long multi-iteration turns, before the 80% emergency threshold fires.
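The turn-end shrink can be sketched as below. Two simplifications are assumed: tokens are estimated at roughly four characters each rather than with a real tokenizer, and plain head truncation stands in for whatever summarization Reasonix actually applies. The `ToolResult` shape and function names are hypothetical.

```typescript
const TURN_END_RESULT_CAP_TOKENS = 3000;
const CHARS_PER_TOKEN = 4; // rough estimate, not DeepSeek's tokenizer

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical result shape.
interface ToolResult {
  tool: string;
  content: string;
}

// Shrink oversized results to the cap and leave a marker so the model knows it can re-read.
function compactAtTurnEnd(results: ToolResult[], capTokens = TURN_END_RESULT_CAP_TOKENS): ToolResult[] {
  return results.map((r) => {
    if (estimateTokens(r.content) <= capTokens) return r; // small results pass through untouched
    const kept = r.content.slice(0, capTokens * CHARS_PER_TOKEN);
    return { ...r, content: `${kept}\n[truncated at turn end; re-run ${r.tool} for the full output]` };
  });
}
```

At four characters per token the 3,000-token cap corresponds to about 12,000 characters, which lines up with the 12KB figure above.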

/pro Single-Turn Arming

Users who predict a hard task type /pro; the next turn runs on deepseek-v4-pro, then auto-disarms. No preset churn, no forgotten revert. The armed state is visible as a yellow ⇧ pro armed pill in the TUI header. This is a one-shot escalation — it doesn't persist beyond the turn it was armed for.

Failure-Signal Auto-Escalation

The loop counts visible "flash is struggling" events per turn: SEARCH-not-found errors on edit_file/write_file, and ToolCallRepair fires (scavenge, truncation-fix, storm-break). Once the count hits 3, the remainder of the current turn runs on v4-pro. Announced via a yellow warning row — no silent cost surprises. The counter and escalation flag reset at every turn start.
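The counter logic can be sketched as a small per-turn state machine. The class and type names are hypothetical; the threshold of 3, the signal kinds, the per-turn reset, and the model names come from the text.

```typescript
type FailureSignal = "search-not-found" | "scavenge" | "truncation-fix" | "storm-break";

const ESCALATION_THRESHOLD = 3; // escalate once the count hits 3

class TurnEscalationSketch {
  private failures = 0;
  private escalated = false;

  // Called at every turn start: counter and flag both reset.
  startTurn(): void {
    this.failures = 0;
    this.escalated = false;
  }

  // Record a signal; returns a user-facing notice exactly once, when escalation begins.
  record(signal: FailureSignal): string | null {
    this.failures++;
    if (!this.escalated && this.failures >= ESCALATION_THRESHOLD) {
      this.escalated = true;
      return `escalating to v4-pro for the rest of this turn (last signal: ${signal})`;
    }
    return null;
  }

  // Model to use for the next request within the current turn.
  model(): string {
    return this.escalated ? "deepseek-v4-pro" : "deepseek-v4-flash";
  }
}
```

Returning the notice from `record` makes it hard to escalate without also having something to show the user.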

Cost Transparency

Per-turn and session costs are color-coded in the StatsPanel (for example, turn $0.003): green under $0.05, yellow from $0.05 to $0.20, red at $0.20 and above. Session cost uses the same scale ×10. The pricing table is built-in with override support via config. DeepSeek v4-flash: $0.14/M cached input, $0.28/M output. v4-pro: $0.435/M cached input, $0.87/M output.
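The color thresholds translate directly into code. The function names below are illustrative; the cut-offs are the ones stated above.

```typescript
type CostColor = "green" | "yellow" | "red";

// Turn cost: green under $0.05, yellow from $0.05 up to $0.20, red at $0.20 and above.
function turnCostColor(usd: number): CostColor {
  if (usd < 0.05) return "green";
  if (usd < 0.2) return "yellow";
  return "red";
}

// Session cost uses the same scale multiplied by 10.
function sessionCostColor(usd: number): CostColor {
  return turnCostColor(usd / 10);
}

console.log(turnCostColor(0.003), sessionCostColor(1.0));
// prints green yellow
```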

⚠️

No silent escalation

Every pro-tier model call is surfaced to the user. Silent escalation was considered and explicitly rejected: the team decided that silent cost surprises are worse than transparent, announced escalations. The red ⇧ pro escalated pill in the header makes it impossible to miss when the loop has escalated.

The Tool Suite

Reasonix provides a comprehensive set of built-in tools, each registered via ToolRegistry and dispatched through the cache-first loop.

Tool category	Tools	Notable
Filesystem	`read_file`, `write_file`, `edit_file`, `multi_edit`, `list_directory`, `directory_tree`, `search_files`, `search_content`, `glob`, `get_file_info`	SEARCH/REPLACE blocks with byte-for-byte match enforcement. Empty SEARCH = create new file. Outline mode for files over 512 KiB. Binary file protection (NUL-byte sniff + extension blocklist). Strict path sandboxing: access is confined to rootDir.
Shell	`run_command`, `run_background`	Allowlist-based: read-only / test / lint / typecheck commands run immediately; mutating commands require user confirmation. Supports chain ops (`|`, `&&`, `||`, `;`) and file redirects. No real shell: argv is tokenized natively for cross-platform parity. Background jobs tracked via `JobRegistry`.
Memory	`remember`, `forget`, `recall_memory`	User-private knowledge stored in `~/.reasonix/memory/`. Project-scoped in `code` mode, global-only in `chat` mode. Types: user, feedback, project, reference. Loaded into the prefix at session start.
Skills	`run_skill`, `list_skills`	SKILL.md Markdown playbooks with frontmatter. `inline` mode inlines body into context; `subagent` mode spawns isolated child loop. Claude-format skills in `.claude/skills/` also load.
Subagents	`spawn_subagent`	Isolated child loops inheriting parent registry minus `spawn_subagent` + `submit_plan`. Defaults to `v4-flash + high` effort. No hooks in child loops. Up to 3 concurrent children.
Web	`web_search`, `web_fetch`	Multi-engine: Mojeek (default), SearXNG, Metaso, or Tavily. Switchable via `/search-engine`. Web fetch retrieves page content.
MCP	MCP bridge (stdio + SSE + streamable HTTP)	Configured in `~/.reasonix/config.json` or via `--mcp`. All transports implement `McpTransport`. Hot bridge/unbridge supported. MCP tools default to non-parallel-safe.
Other	`submit_plan` (review gate), `semantic_search`, `todo_write`	Plan mode with /apply review gate. Semantic search backed by local Ollama or OpenAI-compatible embeddings. In-session task tracking.

Session Persistence and Observability

Reasonix persists every session to JSONL files, enabling transcript replay, diff, and a web dashboard for cross-session statistics.

JSONL Session Persistence

Each session is written to a .jsonl file under the project's .reasonix/sessions/ directory. Messages are appended in the same order as the AppendOnlyLog, so the on-disk format matches the in-memory representation exactly. Sessions can be loaded and resumed with /session resume, including healing of tool-call pairings and missing reasoning content stamps.

Transcript Replay and Diff

The reasonix replay and reasonix diff CLI commands read session JSONL files and render them back as a turn-by-turn transcript or a unified diff of filesystem changes. The transcript log in src/transcript/ supports write, diff, and replay operations — useful for post-mortem analysis of what the agent did and why.

Event Log Kernel

The core event system in src/core/ defines an Event union type with pure reducers.ts projections and eventize.ts normalization. Every loop iteration emits events that are reduced into a consistent state model, separate from the message log. This enables the dashboard to query structured session data without parsing raw messages.

Web Dashboard

A compiled SPA in dashboard/ exposes a REST API from src/server/ with cross-session statistics: total cost, cache hit rates, model usage distribution, and tool-call frequency. Usage records roll up into ~/.reasonix/usage.jsonl for long-term tracking. The reasonix stats CLI command surfaces the same data in the terminal.

Skills and Memory

Reasonix extends the agent's capabilities through Markdown-based skills and a multi-layered memory system, both designed to load into the prefix cache without breaking it.

SKILL.md Playbooks

Skills are Markdown files with YAML frontmatter (description:, optional runAs: subagent, allowed-tools:). Created via /skill new my-skill (project-scoped) or /skill new my-skill --global (in ~/.reasonix/skills/). When invoked, the skill body is either inlined into the context or executed by a spawned subagent. Skills can restrict which tools the subagent has access to via frontmatter.

Claude-format skills in <project>/.claude/skills/<name>/SKILL.md and ~/.claude/skills/ are read alongside Reasonix's native paths, so tooling that emits Claude-format skills works out of the box.

Memory Stores

Three memory layers, all loaded into the prefix at session start:

Project memory — REASONIX.md in the project root, containing conventions, architecture notes, and project-specific knowledge.
User memory — ~/.reasonix/memory/ with types (user, feedback, project, reference). Persistent across all sessions.
Session memory — distilled from VolatileScratch into the AppendOnlyLog, then optionally compacted at turn end.

Mid-session writes to user memory are durable (written to disk immediately) but do not update the current session's loaded prefix. The snapshot is refreshed on the next session start to avoid breaking the cache.

📋

REASONIX.md as project convention

The REASONIX.md file in the project root is loaded as part of the immutable prefix at session start. It's the recommended place to document project conventions, architecture decisions, and coding patterns so the agent understands the codebase context from turn one. Edits to REASONIX.md take effect on the next /new command via system prompt rebuild.

How Reasonix compares

Capability	Reasonix	Claude Code	Aider	Goose	Codex
Backend	DeepSeek only	Anthropic	Any (OpenRouter)	Multi-model	OpenAI
License	MIT	Closed	Apache 2	Apache 2	Closed
Cost profile	Low per task	Premium	Varies	Varies	Subscription
DeepSeek prefix-cache	Engineered	Not applicable	Incidental	Incidental	Not applicable
Tool-call repair	4-pass pipeline	No	No	No	No
Auto cost escalation	Transparent, user-visible	No	No	No	No
Embedded web dashboard	Yes	No	No	No	No
Configurable web search	Mojeek / SearXNG / Metaso / Tavily	Yes	Yes	Yes	Bing
Persistent per-workspace sessions	Yes — JSONL, replay/diff	Partial	No	Partial	Partial
Plan mode · MCP · hooks · skills	Yes	Yes	Partial	Yes	Yes
Parallel tool dispatch	Yes — safety-aware chunks	Yes	No	Yes	Yes
Open community development	Yes	No	Yes	Yes	No

Module layout at a glance

Module	File(s)	Role
Cache-first loop	`src/loop.ts` (~1,200 lines)	Pillar 1 + 3: main agent loop, context assembly, escalation
Tool-call repair	`src/repair/` (flatten, scavenge, truncation, storm)	Pillar 2: four-pass repair pipeline
Memory runtime	`src/memory/runtime.ts`	ImmutablePrefix, AppendOnlyLog, VolatileScratch classes
Session management	`src/memory/session.ts`	JSONL load/rewrite/archive, message healing
Telemetry	`src/telemetry/stats.ts`	Cost accounting, cache-hit metrics, pricing tables
Filesystem tools	`src/tools/filesystem.ts` (~840 lines)	READ/WRITE/SEARCH/REPLACE with sandbox, outline mode, binary protection
Shell tools	`src/tools/shell.ts`	Gated command execution, background jobs, allowlist parsing
Subagent	`src/tools/subagent.ts`	Isolated child loop spawning, tool restriction, event sink
MCP bridge	`src/mcp/`	Stdio + SSE + streamable HTTP transports, registry, spec
TUI	`src/cli/ui/App.tsx` + hooks/components	Ink 5 React app — StatsPanel, EventLog, PlanConfirm, slash commands
Dashboard	`src/server/` + `dashboard/`	REST API + compiled SPA for cross-session stats
Event kernel	`src/core/`	Event union, pure reducers, eventize normalization

Design evolution

Reasonix has evolved through several major versions, each adding a pillar or refining the architecture:

v0.0.x

Pillar 1 (cache-first loop) end-to-end, repair pipeline complete, Ink TUI scaffold.

↓

v0.1

τ-bench numbers published, streaming polish, transcript replay.

↓

v0.3

MCP client (stdio + SSE), session persistence.

↓

v0.4.x

reasonix code with SEARCH/REPLACE edits, review/auto gate, background jobs, hooks.

↓

v0.5.x

V4 model support, skills, memory, subagents, actionable error messages.

↓

v0.6

Pillar 3 (cost control): flash-first defaults, auto-compaction, /pro one-shot arming, failure-triggered escalation, cost badges. UI refactor: App.tsx split into hooks/components, slash commands split into per-topic modules.

↓

v0.31 (current)

branch + harvest features removed entirely — rarely paid for themselves and bloated the slash surface.

Installation and usage

Requires Node ≥ 22. Works on macOS, Linux, and Windows (PowerShell, Git Bash, Windows Terminal).

npm install -g reasonix
reasonix code my-project   # paste DeepSeek API key on first run; persists after

Or run once without installing:

cd my-project
npx reasonix code          # always uses the latest package

Command	When
`reasonix` / `reasonix code [dir]`	The coding agent. Start here.
`reasonix chat`	Plain chat — no filesystem or shell tools.
`reasonix run "task"`	One-shot, streams to stdout. Good for pipes.
`reasonix doctor`	Health check: Node, API key, MCP wiring.
`reasonix update`	Upgrade Reasonix itself.

💡

Code vs chat mode

code is the default and the only mode with filesystem/shell tools and SEARCH/REPLACE review. chat is the lighter, tools-off shell — reach for it when you want a thinking partner with MCP attached but no disk access. Memory in code mode is project-scoped + global; in chat mode it's global only.

Explicit non-goals

Reasonix is opinionated. Some things it deliberately doesn't do — listed here so you can pick the right tool for your work.

DeepSeek-only on purpose

Coupling to one backend is the feature, not a limitation. Every abstraction is justified by a DeepSeek-specific behavior. An OpenAI-compatible shim would work via --model override, but is not tested or supported.

Terminal-first, no IDE integration

The diff lives in git diff, the file tree in ls. The dashboard is a companion, not a Cursor replacement. The desktop client (Tauri, prerelease) bundles the same loop with a GUI, but the CLI remains canonical.

No RAG / vector retrieval as primary

Semantic search exists but is secondary to grep and explicit search. The vector index is local (Ollama or OpenAI-compatible endpoint) and opt-in via reasonix index.

No multi-agent orchestration

Subagents are a cost-reduction mechanism, not a coordination primitive. Max depth is 1 (no recursive delegation). Children cannot delegate, interact with the user, or write to shared memory.

⚠️

The tradeoff: DeepSeek-only limitation

The most honest assessment: Reasonix's greatest strength is also its biggest limitation. By coupling entirely to DeepSeek, it can't serve users who need Anthropic, OpenAI, or local models. If your work requires multi-provider flexibility, Aider or Claude Code are better choices. If your priority is keeping costs low on DeepSeek across long sessions, nothing else in this repo set comes close. The cache-first architecture — prefix partitioning, append-only logging, volatile scratch — would need to be rethought from scratch for any other backend.

Architecture summary

Reasonix demonstrates that tight coupling to a single LLM provider can produce a more efficient agent than generic frameworks. By treating DeepSeek's prefix-cache mechanic as a design constraint rather than an implementation detail, it achieves cache hit rates that make long coding sessions economically viable. The tool-call repair pipeline handles DeepSeek-specific failure modes that generic agents silently lose. And the cost control mechanisms ensure that users are never surprised by a bill.

The project is MIT-licensed, actively developed, and maintains a bilingual (English/Chinese) community on Discord. The codebase is organized into small modules — the largest UI component is ~2,000 lines, every slash command handler is under 200 lines — making it accessible for contributors to add tools, repair passes, or TUI improvements without navigating a monolith.
