CodeWhale: The Constitutional DeepSeek Harness
CodeWhale is one of the more opinionated agent harnesses in this repo set. The interesting part is not just that it targets DeepSeek V4. It is that the repo turns that choice into a full operating model: constitution-sized prompting, provider-aware routing, durable tasks, background sub-agents, runtime APIs, and a file-edit pipeline that tries to leave behind evidence instead of just optimism.
What CodeWhale actually is
The repo name and README pitch make CodeWhale look like a single terminal app for DeepSeek V4. That is true, but it undersells the architecture. The root workspace splits responsibilities across multiple crates for the TUI, core turn loop, model registry, app server, execution policy, state persistence, MCP, tools, secrets, and TUI infrastructure.
So the real shape is not "one Rust CLI binary." It is a Rust agent platform whose default face happens to be a TUI. The same repo also exposes a runtime server, ACP adapter, MCP server mode, durable task timelines, and health-reporting surfaces for external supervisors.
| Layer | Main files | Why it matters |
|---|---|---|
| Prompt and laws | crates/tui/src/prompts/base.md | Defines the Constitution, verification rules, and sub-agent strategy |
| Model routing | crates/agent/src/lib.rs, docs/PROVIDERS.md | Maps model ids to provider families and capability hints |
| Runtime loop | crates/tui/src/core/engine/ | Streams turns, manages tool execution, and handles retries |
| Durable work | crates/tui/src/task_manager.rs | Keeps background tasks and timelines alive across sessions |
| External integration | docs/RUNTIME_API.md | Documents HTTP, ACP, MCP, and machine-readable health surfaces |
The Constitution is the product thesis
A lot of coding agents say they care about truth, verification, and tool discipline. CodeWhale turns those ideas into explicit law. The base prompt is written as a Constitution with articles for identity, truth, user agency, direct action, verification, and coordination, followed by a nine-tier hierarchy of authority.
That matters because it changes the harness from "a pile of system prompt advice" into a repo with a visible operating doctrine. The repo is betting that long, stable prompt scaffolding becomes affordable when DeepSeek prefix caching is strong enough, so the prompt can afford to be formal instead of compressed into vague slogans.
Truth outranks style
The prompt explicitly forbids invented verification, fabricated tool output, and memory-shaped evidence.
User intent is tiered cleanly
The current user message sits above project instructions and memory, but below constitutional safety and truth constraints.
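The tiering described here can be sketched as a small precedence resolver. This is a minimal illustration, not the repo's implementation: it models only four of the nine tiers, the `Tier`, `Directive`, and `resolve` names are hypothetical, and the relative order of project instructions and memory is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    # Lower value = higher authority. Only four illustrative tiers are modeled;
    # the order of the last two relative to each other is an assumption.
    CONSTITUTION = 0
    USER_MESSAGE = 1
    PROJECT_INSTRUCTIONS = 2
    MEMORY = 3


@dataclass(frozen=True)
class Directive:
    tier: Tier
    topic: str
    instruction: str


def resolve(directives):
    """Keep, for each topic, the directive from the highest-authority tier."""
    winners = {}
    for d in directives:
        current = winners.get(d.topic)
        if current is None or d.tier < current.tier:
            winners[d.topic] = d
    return winners


directives = [
    Directive(Tier.MEMORY, "test_command", "run pytest -q"),
    Directive(Tier.PROJECT_INSTRUCTIONS, "test_command", "run make test"),
    Directive(Tier.USER_MESSAGE, "test_command", "skip the slow suite this time"),
    Directive(Tier.USER_MESSAGE, "verification", "just say the tests passed"),
    Directive(Tier.CONSTITUTION, "verification", "never report unrun checks as passing"),
]

resolved = resolve(directives)
```

In this toy run the user message wins the `test_command` topic over project instructions and memory, while the constitutional rule wins `verification` over the user message. That is the shape the prompt describes: the user outranks stored context, but not the truth constraints.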
Coordination is treated as a duty
The prompt frames durable handoff state as part of the agent's job, not as optional good citizenship.
This is more than branding
Many harnesses have a strong tone. Fewer have a strong legal structure. CodeWhale's prompt is detailed enough that you can map design decisions in the repo back to it: mandatory verification, aggressive tool use, bounded result retrieval, parallel-first behavior, and heavy use of sub-agents.
DeepSeek-first, but not trapped there
CodeWhale is genuinely DeepSeek-centered. The default provider uses the DeepSeek beta endpoint because that path enables strict tool mode, chat-prefix behavior, and fill-in-middle support. The README and docs also lean into DeepSeek V4 economics, thinking tokens, and prefix-cache reuse as first-order design concerns.
But the repo is not DeepSeek-only. The provider docs and model registry show a broader compatibility layer: NVIDIA NIM, OpenAI-compatible routes, OpenRouter, Moonshot/Kimi, and self-hosted routes like vLLM, SGLang, and Ollama. So CodeWhale is better described as DeepSeek-native with escape valves rather than a single-provider dead end.
Why the default is narrow
DeepSeek beta exposes the features the harness most wants: reasoning-aware routing, strict tool behavior, and the prompt-cache economics that justify a long constitutional scaffold.
Why the registry is broad
The rest of the stack still has to survive enterprise gateways, alternate hosts, and local models. The registry and provider docs are the repo's answer to that tension.
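One way to picture "DeepSeek-native with escape valves" is a prefix table that maps a model id to a provider family plus capability hints, with a conservative fallback. The sketch below is an assumption-heavy illustration: the `Route` fields, the prefixes, and the model ids are hypothetical and are not taken from the actual registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    family: str
    strict_tools: bool = False
    chat_prefix: bool = False
    fill_in_middle: bool = False


# Hypothetical prefix table; the real registry's ids and fields may differ.
ROUTES = [
    ("deepseek-", Route("deepseek", strict_tools=True, chat_prefix=True, fill_in_middle=True)),
    ("openrouter/", Route("openrouter")),
    ("ollama/", Route("self-hosted")),
]

# Conservative default: assume a plain OpenAI-compatible host with no extras.
FALLBACK = Route("openai-compatible")


def route_for(model_id: str) -> Route:
    """Return the first route whose prefix matches, else the conservative fallback."""
    lowered = model_id.lower()
    for prefix, route in ROUTES:
        if lowered.startswith(prefix):
            return route
    return FALLBACK
```

The design point is that unknown models degrade to the least capable route rather than inheriting DeepSeek-specific features they may not support.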
Orchestration: durable tasks, background children, and symbolic state
CodeWhale does not stop at "the model can call tools." The tool surface and docs show a full orchestration layer: durable tasks, checklist tools, verification gates, background shell jobs, persistent sub-agents, and RLM sessions that move oversized symbolic state into a Python REPL instead of bloating the parent transcript.
The combination is unusual. Many harnesses have sub-agents. Many have task lists. Very few combine both with a persistent large-context side channel like RLM plus bounded handle reads to keep the top-level conversation light.
| Subsystem | Surface | Architectural effect |
|---|---|---|
| Sub-agents | agent_open, agent_eval, agent_close | Parallel read-only, planning, implementation, and verification work |
| Durable tasks | task_create, task_read, gates, timelines | Long-lived work survives beyond one volatile turn |
| RLM | rlm_open, rlm_eval, handle_read | Keeps huge context and intermediate state off the parent transcript |
| Memory | /memory, composer prefix, remember tool | Optional persistent user facts inside the prompt prefix |
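
The bounded-read idea behind handle_read can be sketched as a store that hands back a short handle instead of the full payload, then serves capped slices on request. This is a hypothetical illustration: the `HandleStore` class, its field names, and the 2000-character cap are assumptions, not the tool's real interface.

```python
import itertools
from typing import Optional


class HandleStore:
    """Keeps large outputs out of the transcript; callers get a short handle."""

    def __init__(self, max_read_chars: int = 2000):
        self.max_read_chars = max_read_chars
        self._blobs = {}
        self._ids = itertools.count(1)

    def put(self, text: str) -> dict:
        handle = f"h{next(self._ids)}"
        self._blobs[handle] = text
        # Only the handle and size metadata go back to the parent conversation.
        return {"handle": handle, "total_chars": len(text)}

    def read(self, handle: str, offset: int = 0, length: Optional[int] = None) -> dict:
        text = self._blobs[handle]
        offset = max(0, offset)
        # Every read is capped, no matter how much the caller asks for.
        if length is None or length > self.max_read_chars:
            length = self.max_read_chars
        chunk = text[offset:offset + length]
        end = offset + len(chunk)
        return {"text": chunk, "next_offset": end if end < len(text) else None}
```

A caller pages through the payload by following `next_offset` until it is `None`, so the cost of any single read stays bounded even when the stored output is huge.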
One README claim needs a footnote
The README says sub-agents run concurrently "up to 20." The deeper docs clarify that the default cap is 10, and that 20 is reached only when the limit is configured higher. That is a small mismatch, but it is worth noting because the repo otherwise documents its runtime boundaries carefully.
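The difference between a default cap and a configured one is easy to show with a semaphore. The sketch below is illustrative only: the constant names and `run_children` are hypothetical, and treating 20 as a hard ceiling is an assumption drawn from the README wording rather than from the docs.

```python
import asyncio

DEFAULT_MAX_SUBAGENTS = 10  # default cap described in the deeper docs
HARD_CEILING = 20           # assumed ceiling, matching the README's "up to 20"


def effective_cap(configured=None):
    """Use the default unless a value is configured; clamp to 1..HARD_CEILING."""
    if configured is None:
        return DEFAULT_MAX_SUBAGENTS
    return max(1, min(configured, HARD_CEILING))


async def run_children(jobs, configured=None):
    """Run coroutine factories concurrently, never exceeding the effective cap."""
    sem = asyncio.Semaphore(effective_cap(configured))
    running = 0
    peak = 0

    async def guarded(job):
        nonlocal running, peak
        async with sem:
            running += 1
            peak = max(peak, running)
            try:
                return await job()
            finally:
                running -= 1

    results = await asyncio.gather(*(guarded(j) for j in jobs))
    return results, peak


async def fake_child():
    await asyncio.sleep(0.01)  # stand-in for real sub-agent work
    return "done"
```

Calling `asyncio.run(run_children([fake_child] * 25))` completes all 25 children while the recorded peak never exceeds the default of 10. Passing `configured=20` is what raises the ceiling.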
File editing is verification-heavy rather than ideology-heavy
CodeWhale is not as doctrinaire as Reasonix, and it is not as provider-native as ADK-Rust. The local edit stack has three main paths: write_file, edit_file, and apply_patch. Each path tries to leave behind compact, useful evidence: diffs, preflight summaries, failure messages, and LSP diagnostics when the LSP manager is active.
That is the right framing for the repo. The clever part is not one universal edit grammar. The clever part is that writes are staged and summarized, patch inputs get parsed before mutation, and post-edit diagnostics feed back into the same runtime loop.
Whole-file writes
write_file snapshots prior contents, creates parent directories when needed, writes directly, then returns a diff plus optional LSP feedback.
Scoped string edits
edit_file starts with exact matching, then falls back only to indentation-tolerant and punctuation-normalized matching, while still rejecting ambiguous fuzzy hits.
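That matching ladder can be sketched as follows. This is a simplified illustration under stated assumptions, not the repo's algorithm: `edit_text` and `EditError` are hypothetical names, "punctuation-normalized" is assumed to mean folding curly quotes and long dashes to ASCII, the fallbacks match whole lines only, and an ambiguous exact match is also treated as an error.

```python
# Assumed meaning of "punctuation-normalized": fold typographic quotes and dashes.
PUNCT = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'",
                       "\u2019": "'", "\u2013": "-", "\u2014": "-"})


class EditError(Exception):
    pass


def _windows(lines, needle, norm):
    """Indices where `needle` matches consecutive `lines` after normalization."""
    target = [norm(l) for l in needle]
    n = len(target)
    return [i for i in range(len(lines) - n + 1)
            if [norm(l) for l in lines[i:i + n]] == target]


def edit_text(text, old, new):
    """Replace `old` with `new`, returning (new_text, strategy_used)."""
    # Stage 1: exact substring match.
    hits = text.count(old)
    if hits == 1:
        return text.replace(old, new, 1), "exact"
    if hits > 1:
        raise EditError(f"exact match is ambiguous ({hits} hits)")

    # Stages 2 and 3: whole-line matching under progressively looser normalizers.
    lines, needle = text.split("\n"), old.split("\n")
    strategies = [
        ("indentation", lambda s: s.strip()),
        ("punctuation", lambda s: s.strip().translate(PUNCT)),
    ]
    for name, norm in strategies:
        found = _windows(lines, needle, norm)
        if len(found) > 1:
            raise EditError(f"{name} match is ambiguous ({len(found)} hits)")
        if len(found) == 1:
            i = found[0]
            lines[i:i + len(needle)] = new.split("\n")
            return "\n".join(lines), name
    raise EditError("no match found")
```

Returning the strategy name alongside the result is a deliberate choice in this sketch: it lets the caller report that a fuzzy fallback was used instead of silently pretending the match was exact.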
Transactional patching
apply_patch preflights touched files and hunks, builds pending writes first, and rolls back already-written files if a later write fails.
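The preflight, stage, and rollback sequence can be sketched in a few lines. This is a minimal illustration assuming each hunk is a single exact replacement per file. `apply_edits` and its injectable `write` parameter are hypothetical, and a real patch format carries far more structure.

```python
from pathlib import Path


def apply_edits(edits, write=None):
    """Apply {path: (old, new)} replacements as one all-or-nothing batch."""
    write = write or (lambda p, s: Path(p).write_text(s))

    # Preflight: read every file and compute its new contents before mutating anything.
    pending = []
    for path, (old, new) in edits.items():
        original = Path(path).read_text()
        if original.count(old) != 1:
            raise ValueError(f"{path}: expected exactly one match for hunk")
        pending.append((path, original, original.replace(old, new, 1)))

    # Commit: write in order, restoring snapshots if any later write fails.
    written = []
    try:
        for path, original, updated in pending:
            write(path, updated)
            written.append((path, original))
    except Exception:
        for path, original in reversed(written):
            Path(path).write_text(original)
        raise
    return [p for p, _ in written]
```

Because every hunk is validated before the first write, a malformed patch fails without touching disk. The rollback loop only has to cover the rarer case where a write itself fails midway through the batch.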
Security posture: real OS hooks, pragmatic limits
The sandbox docs are more candid than most README blurbs. CodeWhale is explicit about what each layer actually does: macOS Seatbelt, Linux Landlock, seccomp, early process hardening, optional bubblewrap, and a Windows Job Object path that still calls itself v1.
That honesty matters. The docs name what is not covered as clearly as what is covered. Network is not magically solved by every layer. The Windows path is containment-first, not full filesystem isolation yet. Bubblewrap is optional. Those are signs of a repo trying to describe reality instead of hiding behind a single "sandboxed" word.
Strongest Linux story
Landlock, seccomp, process hardening, and optional bubblewrap stack into a layered answer rather than one mechanism pretending to do everything.
Clean approval mapping
Modes and approval policies are documented separately, which keeps tool safety, sandboxing, and user confirmation from collapsing into one fuzzy setting.
Runtime API turns it into more than a TUI
The runtime API docs are the clearest proof that CodeWhale is not just a terminal interface. The repo exposes HTTP/SSE thread and task routes, a narrower ACP adapter for editor clients, MCP server mode, and a machine-readable health report through codewhale doctor --json.
This is where the architecture starts to look like an agent runtime service rather than merely a chat UI with shell access. Threads can be resumed and forked, automations can be scheduled, tasks can be listed and canceled, and clients can introspect skills and MCP tools without scraping the terminal.
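For a client, "without scraping the terminal" mostly means consuming a Server-Sent Events stream. The parser below follows the standard SSE line format. The event names and payload fields in the sample are invented for illustration and are not taken from docs/RUNTIME_API.md.

```python
import json


def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of Server-Sent Events lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the format allows one optional space after the colon
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


# Hypothetical stream; event names and payload fields are invented for illustration.
sample = [
    ": keep-alive\n",
    "event: turn.delta\n",
    'data: {"text": "Running tests"}\n',
    "\n",
    "event: task.completed\n",
    'data: {"task_id": "t-1",\n',
    'data:  "status": "ok"}\n',
    "\n",
]

events = [(name, json.loads(payload)) for name, payload in parse_sse(sample)]
```

Each parsed event arrives as a name plus structured data, which is what lets an external supervisor react to task state without interpreting terminal output.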
The repo is quietly aiming higher than the README headline
"Terminal coding agent for DeepSeek V4" is accurate, but incomplete. The runtime API, ACP mode, automations, and durable tasks show a repo that wants to be embeddable, supervisable, and scriptable outside the terminal too.
What I would steal from it
Prompt law with explicit precedence
The constitutional hierarchy is easier to audit than the usual blob of stacked prompt instructions.
Bounded retrieval for large outputs
RLM plus handle_read is a more serious answer to long-context work than just hoping the transcript never gets too large.
Post-edit diagnostics as normal output
Attaching LSP diagnostics directly to edit results makes the write path feel like part of one feedback loop instead of a blind file mutation.
Docs that admit their own edges
The sandbox and sub-agent docs both state where defaults stop and where configuration or future work begins.
Further reading
Source material
GitHub: Hmbown/CodeWhale
Install: npm install -g codewhale
Key docs: docs/ARCHITECTURE.md, docs/PROVIDERS.md, docs/SUBAGENTS.md, docs/TOOL_SURFACE.md, docs/RUNTIME_API.md
Key code: crates/tui/src/prompts/base.md, crates/tui/src/task_manager.rs, crates/tui/src/tools/file.rs, crates/tui/src/tools/apply_patch.rs