DeepSeek V4 first • Rust workspace • Constitutional prompt stack • Runtime API • Sub-agents + RLM • LSP post-edit feedback

CodeWhale: The Constitutional DeepSeek Harness

CodeWhale is one of the more opinionated agent harnesses in this repo set. The interesting part is not just that it targets DeepSeek V4. It is that the repo turns that choice into a full operating model: constitution-sized prompting, provider-aware routing, durable tasks, background sub-agents, runtime APIs, and a file-edit pipeline that tries to leave behind evidence instead of just optimism.

What CodeWhale actually is

The repo name and README pitch make CodeWhale look like a single terminal app for DeepSeek V4. That is true, but it undersells the architecture. The root workspace splits responsibilities across multiple crates for the TUI, core turn loop, model registry, app server, execution policy, state persistence, MCP, tools, secrets, and TUI infrastructure.

So the real shape is not "one Rust CLI binary." It is a Rust agent platform whose default face happens to be a TUI. The same repo also exposes a runtime server, ACP adapter, MCP server mode, durable task timelines, and health-reporting surfaces for external supervisors.

Layer	Main files	Why it matters
Prompt and laws	`crates/tui/src/prompts/base.md`	Defines the Constitution, verification rules, and sub-agent strategy
Model routing	`crates/agent/src/lib.rs`, `docs/PROVIDERS.md`	Maps model ids to provider families and capability hints
Runtime loop	`crates/tui/src/core/engine/`	Streams turns, manages tool execution, and handles retries
Durable work	`crates/tui/src/task_manager.rs`	Keeps background tasks and timelines alive across sessions
External integration	`docs/RUNTIME_API.md`	Documents HTTP, ACP, MCP, and machine-readable health surfaces

The Constitution is the product thesis

A lot of coding agents say they care about truth, verification, and tool discipline. CodeWhale turns those ideas into explicit law. The base prompt is written as a Constitution with articles for identity, truth, user agency, direct action, verification, and coordination, followed by a nine-tier hierarchy of authority.

That matters because it changes the harness from "a pile of system prompt advice" into a repo with a visible operating doctrine. The repo is betting that long, stable prompt scaffolding becomes affordable when DeepSeek prefix caching is strong enough, so the prompt can stay formal instead of being compressed into vague slogans.

Truth outranks style

The prompt explicitly forbids invented verification, fabricated tool output, and memory-shaped evidence.

User intent is tiered cleanly

The current user message sits above project instructions and memory, but below constitutional safety and truth constraints.
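The tiering can be pictured as a small ordering problem. The sketch below is illustrative only: it models four tiers rather than the prompt's nine, the names `Tier`, `Directive`, and `resolve` are hypothetical, and the relative order of project instructions and memory is an assumption, since the text only says both sit below the current user message.

```rust
/// Simplified authority tiers (hypothetical names). The source prompt
/// describes nine tiers; only four are modeled here.
/// Variants are declared lowest to highest so the derived `Ord` ranks
/// later variants above earlier ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    Memory,              // assumed lowest of the four
    ProjectInstructions, // assumed to outrank memory
    UserMessage,         // above project instructions and memory
    Constitution,        // safety and truth constraints outrank everything
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Directive {
    pub tier: Tier,
    pub text: String,
}

/// Returns the directive from the highest tier.
/// When two directives share a tier, the earlier one wins.
pub fn resolve(directives: &[Directive]) -> Option<&Directive> {
    let mut best: Option<&Directive> = None;
    for d in directives {
        match best {
            // Keep the current winner unless `d` strictly outranks it.
            Some(b) if b.tier >= d.tier => {}
            _ => best = Some(d),
        }
    }
    best
}
```

With a memory entry, a project instruction, and a user message in conflict, `resolve` picks the user message; add a `Constitution` directive and that one wins instead.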

Coordination is treated as a duty

The prompt frames durable handoff state as part of the agent's job, not as optional good citizenship.

📜

This is more than branding

Many harnesses have a strong tone. Fewer have a strong legal structure. CodeWhale's prompt is detailed enough that you can map design decisions in the repo back to it: mandatory verification, aggressive tool use, bounded result retrieval, parallel-first behavior, and heavy use of sub-agents.

DeepSeek-first, but not trapped there

CodeWhale is genuinely DeepSeek-centered. The default provider uses the DeepSeek beta endpoint because that path enables strict tool mode, chat-prefix behavior, and fill-in-middle support. The README and docs also lean into DeepSeek V4 economics, thinking tokens, and prefix-cache reuse as first-order design concerns.

But the repo is not DeepSeek-only. The provider docs and model registry show a broader compatibility layer: NVIDIA NIM, OpenAI-compatible routes, OpenRouter, Moonshot/Kimi, and self-hosted routes like vLLM, SGLang, and Ollama. So CodeWhale is better described as DeepSeek-native with escape valves rather than a single-provider dead end.

Why the default is narrow

DeepSeek beta exposes the features the harness most wants: reasoning-aware routing, strict tool behavior, and the prompt-cache economics that justify a long constitutional scaffold.

Why the registry is broad

The rest of the stack still has to survive enterprise gateways, alternate hosts, and local models. The registry and provider docs are the repo's answer to that tension.
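A registry of this kind usually reduces to mapping a model id onto a provider family plus a few capability hints. The following is a minimal sketch of that idea, not the repo's actual registry: the type names, the prefix rules, the example model ids, and the choice to grant strict tools and prefix caching only to the DeepSeek family are all assumptions made for illustration.

```rust
/// Hypothetical provider families, loosely following the list above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProviderFamily {
    DeepSeek,
    OpenRouter,
    Moonshot,
    /// Catch-all for gateways and self-hosted servers speaking an
    /// OpenAI-compatible protocol.
    OpenAiCompatible,
}

/// Capability hints a turn loop might consult before building a request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Capabilities {
    pub strict_tools: bool,
    pub prefix_cache: bool,
}

/// Maps a model id to a family and hints using simple prefix rules.
/// The prefixes are illustrative assumptions, not the repo's real table.
pub fn route(model_id: &str) -> (ProviderFamily, Capabilities) {
    let id = model_id.to_ascii_lowercase();
    let family = if id.starts_with("deepseek") {
        ProviderFamily::DeepSeek
    } else if id.starts_with("openrouter/") {
        ProviderFamily::OpenRouter
    } else if id.starts_with("kimi") || id.starts_with("moonshot") {
        ProviderFamily::Moonshot
    } else {
        ProviderFamily::OpenAiCompatible
    };
    // Assumption: only the native family gets the DeepSeek-specific hints.
    let native = family == ProviderFamily::DeepSeek;
    let caps = Capabilities { strict_tools: native, prefix_cache: native };
    (family, caps)
}
```

The fallback branch is the escape valve in miniature: anything unrecognized is treated as a generic compatible route with conservative capability hints.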

Orchestration: durable tasks, background children, and symbolic state

CodeWhale does not stop at "the model can call tools." The tool surface and docs show a full orchestration layer: durable tasks, checklist tools, verification gates, background shell jobs, persistent sub-agents, and RLM sessions that move oversized symbolic state into a Python REPL instead of bloating the parent transcript.

The combination is unusual. Many harnesses have sub-agents. Many have task lists. Very few combine both with a persistent large-context side channel like RLM plus bounded handle reads to keep the top-level conversation light.

Subsystem	Surface	Architectural effect
Sub-agents	`agent_open`, `agent_eval`, `agent_close`	Parallel read-only, planning, implementation, and verification work
Durable tasks	`task_create`, `task_read`, gates, timelines	Long-lived work survives beyond one volatile turn
RLM	`rlm_open`, `rlm_eval`, `handle_read`	Keeps huge context and intermediate state off the parent transcript
Memory	`/memory`, composer prefix, remember tool	Optional persistent user facts inside the prompt prefix
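The bounded-read idea behind `handle_read` can be sketched in a few lines: large outputs are parked behind an opaque handle, and the parent only ever pulls a capped window. This is a rough illustration under stated assumptions. `HandleStore`, `Handle`, and the character-based offsets are hypothetical, and the real RLM side channel lives in a Python REPL rather than an in-process map.

```rust
use std::collections::HashMap;

/// Opaque reference to a stored blob (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Handle(u64);

/// In-memory stand-in for the side channel that keeps large outputs
/// off the parent transcript.
#[derive(Default)]
pub struct HandleStore {
    next_id: u64,
    blobs: HashMap<Handle, String>,
}

impl HandleStore {
    /// Stores a large output and returns only a small handle.
    pub fn put(&mut self, content: String) -> Handle {
        let handle = Handle(self.next_id);
        self.next_id += 1;
        self.blobs.insert(handle, content);
        handle
    }

    /// Total size in characters, so a caller can plan its reads.
    pub fn len_chars(&self, handle: Handle) -> Option<usize> {
        self.blobs.get(&handle).map(|b| b.chars().count())
    }

    /// Returns at most `max_chars` characters starting at `offset`.
    /// Counting characters rather than bytes avoids splitting a
    /// multi-byte UTF-8 sequence.
    pub fn read(&self, handle: Handle, offset: usize, max_chars: usize) -> Option<String> {
        let blob = self.blobs.get(&handle)?;
        Some(blob.chars().skip(offset).take(max_chars).collect())
    }
}
```

The point of the design is that the transcript grows by the size of the window requested, never by the size of the blob.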

⚠️

One README claim needs a footnote

The README says sub-agents run concurrently "up to 20." The deeper docs clarify that the actual default cap is 10, and that it only reaches 20 when configured higher. That is a small mismatch, but it is worth noting because the repo otherwise documents its runtime boundaries carefully.
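The difference between the two numbers is easy to state in code. In this sketch the constant and function names are hypothetical, and treating 20 as a hard ceiling with a floor of 1 is an assumption; the docs as summarized here only establish a default of 10 and a configurable 20.

```rust
/// Default concurrency cap, per the deeper docs.
pub const DEFAULT_MAX_SUBAGENTS: usize = 10;
/// Assumed ceiling, matching the README's "up to 20".
pub const HARD_MAX_SUBAGENTS: usize = 20;

/// Resolves the effective cap from an optional configured value.
/// Unset falls back to the default; out-of-range values are clamped.
pub fn effective_cap(configured: Option<usize>) -> usize {
    configured
        .unwrap_or(DEFAULT_MAX_SUBAGENTS)
        .clamp(1, HARD_MAX_SUBAGENTS)
}
```

Under these assumptions an unconfigured install gets 10, and only an explicit setting of 20 or more reaches the README's figure.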

File editing is verification-heavy rather than ideology-heavy

CodeWhale is not as doctrinaire as Reasonix, and it is not as provider-native as ADK-Rust. The local edit stack has three main paths: `write_file`, `edit_file`, and `apply_patch`. Each path tries to leave behind compact, useful evidence: diffs, preflight summaries, failure messages, and LSP diagnostics when the manager is active.

That is the right framing for the repo. The clever part is not one universal edit grammar. The clever part is that writes are staged and summarized, patch inputs get parsed before mutation, and post-edit diagnostics feed back into the same runtime loop.

Whole-file writes

`write_file` snapshots prior contents, creates parent directories when needed, writes directly, then returns a diff plus optional LSP feedback.

Scoped string edits

`edit_file` starts exact, then falls back only to indentation-tolerant and punctuation-normalized matching, while still rejecting ambiguous fuzzy hits.
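The exact-then-tolerant strategy can be sketched as a two-stage matcher. This is a simplified illustration, not the repo's implementation: the `edit` function and `EditError` type are hypothetical, only the indentation-tolerant fallback is modeled (punctuation normalization is left out), and the replacement is inserted verbatim without re-indenting.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum EditError {
    /// No exact or indentation-tolerant match was found.
    NotFound,
    /// More than one candidate matched; the count is reported.
    Ambiguous(usize),
}

/// Replaces the single occurrence of `needle` in `haystack`.
/// Stage 1 is an exact substring match. Stage 2 compares whole lines
/// while ignoring leading whitespace. Either stage rejects multiple hits.
pub fn edit(haystack: &str, needle: &str, replacement: &str) -> Result<String, EditError> {
    if needle.is_empty() {
        return Err(EditError::NotFound);
    }
    // Stage 1: exact match, accepted only when unique.
    match haystack.matches(needle).count() {
        0 => {}
        1 => return Ok(haystack.replacen(needle, replacement, 1)),
        n => return Err(EditError::Ambiguous(n)),
    }
    // Stage 2: line-wise match with leading whitespace stripped.
    let hay: Vec<&str> = haystack.lines().collect();
    let want: Vec<&str> = needle.lines().map(str::trim_start).collect();
    if want.len() > hay.len() {
        return Err(EditError::NotFound);
    }
    let starts: Vec<usize> = (0..=hay.len() - want.len())
        .filter(|&i| {
            hay[i..i + want.len()]
                .iter()
                .map(|line| line.trim_start())
                .eq(want.iter().copied())
        })
        .collect();
    match starts.as_slice() {
        [] => Err(EditError::NotFound),
        [start] => {
            let mut out: Vec<&str> = Vec::with_capacity(hay.len());
            out.extend_from_slice(&hay[..*start]);
            out.push(replacement);
            out.extend_from_slice(&hay[*start + want.len()..]);
            let mut text = out.join("\n");
            // `lines()` drops a trailing newline, so restore it.
            if haystack.ends_with('\n') {
                text.push('\n');
            }
            Ok(text)
        }
        many => Err(EditError::Ambiguous(many.len())),
    }
}
```

Returning the match count in `Ambiguous` gives the model something concrete to act on, such as widening the needle with more surrounding context.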

Transactional patching

`apply_patch` preflights touched files and hunks, builds pending writes first, and rolls back already-written files if a later write fails.
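The write-then-roll-back behavior can be sketched against an in-memory stand-in for the filesystem. Everything here is hypothetical (`MemFs`, `apply_all`, the `read_only` failure switch), and the sketch starts from pending writes that are already built; parsing and preflighting hunks is out of scope.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// In-memory stand-in for a filesystem (hypothetical).
/// Writes to any path listed in `read_only` fail, which lets the
/// rollback path be exercised deterministically.
#[derive(Default)]
pub struct MemFs {
    pub files: BTreeMap<String, String>,
    pub read_only: BTreeSet<String>,
}

impl MemFs {
    pub fn write(&mut self, path: &str, content: &str) -> Result<(), String> {
        if self.read_only.contains(path) {
            return Err(format!("write failed: {}", path));
        }
        self.files.insert(path.to_string(), content.to_string());
        Ok(())
    }
}

/// Applies every pending `(path, content)` write, or none of them.
/// Prior contents are snapshotted before each write; on failure the
/// files already written are restored in reverse order.
pub fn apply_all(fs: &mut MemFs, pending: &[(String, String)]) -> Result<(), String> {
    // Each entry is (path, contents before the write, or None if new).
    let mut written: Vec<(String, Option<String>)> = Vec::new();
    for (path, content) in pending {
        let before = fs.files.get(path).cloned();
        if let Err(err) = fs.write(path, content) {
            for (done_path, old) in written.into_iter().rev() {
                match old {
                    // The file existed before: put the old text back.
                    Some(text) => {
                        fs.files.insert(done_path, text);
                    }
                    // The file was created by this patch: remove it.
                    None => {
                        fs.files.remove(&done_path);
                    }
                }
            }
            return Err(err);
        }
        written.push((path.clone(), before));
    }
    Ok(())
}
```

On a real filesystem the rollback itself can fail, so a production version would need to report partial restoration rather than assume it succeeds.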

Security posture: real OS hooks, pragmatic limits

The sandbox docs are better than most repo README blurbs. CodeWhale is explicit about what each layer actually does: macOS Seatbelt, Linux Landlock, seccomp, early process hardening, optional bubblewrap, and a Windows Job Object path that still calls itself v1.

That honesty matters. The docs name what is not covered as clearly as what is covered. Network is not magically solved by every layer. The Windows path is containment-first, not full filesystem isolation yet. Bubblewrap is optional. Those are signs of a repo trying to describe reality instead of hiding behind the single word "sandboxed."

Strongest Linux story

Landlock, seccomp, process hardening, and optional bubblewrap stack into a layered answer rather than one mechanism pretending to do everything.

Clean approval mapping

Modes and approval policies are documented separately, which keeps tool safety, sandboxing, and user confirmation from collapsing into one fuzzy setting.

Runtime API turns it into more than a TUI

The runtime API docs are the clearest proof that CodeWhale is not just a terminal interface. The repo exposes HTTP/SSE thread and task routes, a narrower ACP adapter for editor clients, MCP server mode, and a machine-readable health report through `codewhale doctor --json`.

This is where the architecture starts to look like an agent runtime service rather than merely a chat UI with shell access. Threads can be resumed and forked, automations can be scheduled, tasks can be listed and canceled, and clients can introspect skills and MCP tools without scraping the terminal.

🛰️

The repo is quietly aiming higher than the README headline

"Terminal coding agent for DeepSeek V4" is accurate, but incomplete. The runtime API, ACP mode, automations, and durable tasks show a repo that wants to be embeddable, supervisable, and scriptable outside the terminal too.

What I would steal from it

Prompt law with explicit precedence

The constitutional hierarchy is easier to audit than the usual blob of stacked prompt instructions.

Bounded retrieval for large outputs

RLM plus `handle_read` is a more serious answer to long-context work than just hoping the transcript never gets too large.

Post-edit diagnostics as normal output

Attaching LSP diagnostics directly to edit results makes the write path feel like part of one feedback loop instead of a blind file mutation.

Docs that admit their own edges

The sandbox and sub-agent docs both state where defaults stop and where configuration or future work begins.