Twenty repos, twenty distinct engineering choices
This is the close read: what each repo is actually built from, which files do the most interesting work, and where implementation choices reveal each product's real priorities.
Quick profile matrix
| Repo | Primary stack | Feels like | Best at | Main caveat |
|---|---|---|---|---|
| Pochi | TypeScript (Bun monorepo — Bun required for wa-sqlite WASM) | Six-vendor aggregation platform | Wraps 6 backends (Tabby, Pochi, Gemini CLI, Codex, GitHub Copilot, Qwen Code), apply-diff safety, CustomAgent schema, exponential-backoff retry, Hono+Zod API | Complexity grows because it bridges many ecosystems at once; not a standalone coding agent |
| Neovate Code | TypeScript (pnpm workspace) | Provider-rich CLI with strong security opinions | Broad provider support, multi-stage context compression, and the most opinionated bash safety in the set | More generic than distinct; identity comes from integrations, not one native runtime |
| Mux | TypeScript / Electron / web | Workspace-centric multi-runtime agent product | 7 runtime backends, 1Password secret refs, regex tool allow/deny, enforced agent_report protocol, SSH credential forwarding | Large surface area makes it harder to read quickly than tighter CLIs |
| Crush | Go | Productized terminal app with loop detection | Coherence, readability, SHA-256 loop detection, and binary-embedded prompt templates | Less ecosystem breadth than the most adapter-heavy TypeScript projects |
| Kimi CLI | Python + own kosong package | Protocol-aware terminal assistant with a multi-provider abstraction layer | ACP bridging, hooks system (pre/post tool use), and provider-aware message conversion | More platform-centered than provider-neutral |
| Qwen Code | TypeScript | Config-heavy multi-model CLI | Five-layer model config resolution, declarative tools, and MCP lifecycle management | Still inherits some shape from Gemini CLI ancestry |
| OpenHands | Python + TypeScript | Agent platform runtime | Sandbox and app/server architecture | The most important modern agent core is not fully in this repo snapshot |
| Claude Code | TypeScript / Bun / React Ink | Bespoke coding-agent runtime with deep security engineering | 10+8 state machine, token budget constants, undercover mode, Zsh attack mitigation, tree-sitter shell parsing, 40+ tools, teams/tasks subsystem | Much less provider-neutral than peers like Mux or Qwen |
| Open Claude Code | TypeScript / Node.js (ESM) | Clean-room implementation targeting Claude Code v2.1.91 | Async generator loop (13 events), 25 tools, 5 providers (Anthropic/OpenAI/Google/Bedrock/Vertex), 7 hook events, git worktree isolation, file checkpointing, 40 slash commands, automated nightly releases | Independent rebuild via ruDevolution decompilation; functional, but diverges from upstream with features of its own (multi-agent teams, session teleport) |
| DeerFlow | Python + LangGraph + FastAPI | Super-agent harness with 14-layer middleware and SSE streaming | 13-middleware stack, composable loop detection, skill self-evolution, IM channel integration, LangGraph thread management | Less of a single polished CLI identity than Claude Code or Crush |
| Hermes | Python (Nous Research) | Self-improving multi-platform agent | Learning loop (skills), multi-platform (Telegram/Discord/Slack), MoA synthesis, RL training, remote execution | Breadth over focus — not as sharp as a dedicated coding agent |
| Pi Mono | TypeScript (Node.js 20+, v0.73.0, MIT) | Minimalist extension-first kernel | 26 providers across 10 APIs, tree-structured JSONL v3 sessions, differential TUI, Pi Packages (npm/git bundles), parallel tool execution, precise multi-edit with file mutation queue, 4 run modes | No built-in permissions, no MCP, no sub-agents — all must be added via extensions or packages |
| Oh My Pi | TypeScript / Bun / Rust N-API (v14.7.8, MIT) | Power-tool fork platform | Default hashline editing, native engine, MCP and plugin discovery, task/subagent tooling, TTSR rules, browser/Python/LSP surfaces, and an edit-benchmark package | Much broader and less ideologically minimal than Pi Mono; the repo's breadth can hide the core agent story if you only skim the README |
| OpenCode | TypeScript / Bun / Effect / SolidJS | Provider-neutral client-server coding runtime | TUI + web + desktop clients, apply_patch and exact edit tools, wildcard permissions, worktrees, skills, MCP, ACP, and SDK access | Broader surface area than minimalist CLIs, and permissioning is not the same thing as a real sandbox |
| ADK-Rust | Rust (34-member Cargo workspace, Apache-2.0) | Publishable agent framework ecosystem | Feature-tier packaging, provider-native tool wrappers, graph workflows, A2A/AWP/payment extensions, and a strong crate boundary story | Less of a single end-user product; several advanced crates are explicitly experimental |
| Codex | Rust (Cargo workspace, 70+ crates, 3,805 files, Apache-2.0) | OpenAI's production coding agent — deeply integrated product runtime | Platform-specific sandboxing (Seatbelt/bubblewrap/Windows tokens), bidirectional MCP, multi-agent job execution, IDE extensions | OpenAI-centric (Responses API only), Rust-only contribution barrier, no built-in learning loop |
| Wintermolt | Zig 0.15 (3 MB native binary) | Everything-agent platform with 7 modes and dozens of features | 6 backends, 16 tools, cron scheduling, Tailscale, camera vision, browser automation, MCP bidirectional, 4 chat bridges, macOS menu bar | Enormous feature surface; bash safety is pattern-based (not AST); TypeScript sidecars for non-CLI modes |
| Zaica | Zig 0.15 (~9,100 lines) | Focused coding specialist with structured workflows | Chain mode (.chain.md), Wyhash loop detection (3-tier), reactive state (zefx), parallel sub-agent dispatch, Cyrillic REPL | Only 5 core tools; no MCP, no RAG, no web UI; terminal-only experience |
| Goose | Rust (Cargo workspace, Apache-2.0, v1.32.0) | Extension-first agent with LLM-based security | 15+ providers, 5-layer security inspector stack, AdversaryInspector (LLM-based review), GooseMode (Auto/Approve/SmartApprove/Chat), recipe framework, MOIM injection | Extension-first model means core needs extensions for full functionality |
| Dirac | TypeScript (fork of Cline) | Hash-anchored coding agent with AST precision | Hash-anchored parallel edits, AST-native precision, multi-file batching, a reported 64.8% cost reduction, 8-type hook system, git checkpoints, state mutex, 40+ providers | No MCP support; inherits Cline architecture |
Deep per-agent profiles
Pochi — A six-vendor aggregator, not a single agent
The most important thing to understand about Pochi: it is not a
single coding agent. It is a
multi-vendor aggregation platform that wraps six
distinct backends: vendor-tabby,
vendor-pochi, vendor-gemini-cli,
vendor-codex, vendor-github-copilot, and
vendor-qwen-code. The runtime requires
Bun (not Node.js) specifically because it needs
wa-sqlite WASM support for its local database layer.
Pochi's internal API layer uses Hono +
Zod validation and exposes
POST /api/chat/stream, POST /api/chat,
and GET /api/models. Models carry a
costType: "basic" | "premium" that maps to
user-facing labels "swift" and "super". Despite
aggregating six backends, the vendor-pochi package exposes only
2 tools: webFetch and
webSearch — everything else comes from the wrapped
backends.
The retry strategy is explicit: withRetry() caps at
3 attempts with 1000ms initial delay, 10000ms
maximum, and a multiplier of 2 (exponential backoff).
Authentication uses better-auth with JWT +
device-link plugins; a set-auth-token response header
silently renews credentials before they expire.
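The retry numbers above translate into a simple schedule. Here is a minimal sketch, assuming the delay doubles from 1000 ms on each retry and is capped at 10000 ms. RetryOptions, backoffDelay, DEFAULT_RETRY, and the injectable sleep parameter are hypothetical names for illustration, not Pochi's exports.

```typescript
// Hypothetical options shape mirroring the constants described for Pochi's withRetry().
interface RetryOptions {
  maxAttempts: number;
  initialDelayMs: number;
  maxDelayMs: number;
  multiplier: number;
}

const DEFAULT_RETRY: RetryOptions = {
  maxAttempts: 3,
  initialDelayMs: 1000,
  maxDelayMs: 10_000,
  multiplier: 2,
};

// Delay before retry number `retryIndex` (0-based): 1000, 2000, 4000, ... capped at maxDelayMs.
function backoffDelay(retryIndex: number, opts: RetryOptions = DEFAULT_RETRY): number {
  return Math.min(opts.initialDelayMs * opts.multiplier ** retryIndex, opts.maxDelayMs);
}

async function withRetry<T>(
  fn: () => Promise<T>,
  opts: RetryOptions = DEFAULT_RETRY,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Only wait if another attempt remains.
      if (attempt < opts.maxAttempts - 1) {
        await sleep(backoffDelay(attempt, opts));
      }
    }
  }
  throw lastError;
}
```

With three attempts, a persistently failing call waits 1000 ms and then 2000 ms before the last error surfaces. The 10000 ms cap only comes into play if maxAttempts is raised.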
The apply-diff tool includes an
expectedReplacements safety parameter — if you
declare two replacements but the pattern matches three times, the
edit fails. The CustomAgent shape in
new-task.ts lets you define inline sub-agents; the
planner agent uniquely retains
askFollowupQuestion while others lose it.
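The expectedReplacements guard is easy to express in code. Here is a minimal sketch, assuming a plain literal search string and a non-overlapping match count. applyDiff and its parameter names are hypothetical, not the signature in apply-diff.ts.

```typescript
// Count non-overlapping literal occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  return haystack.split(needle).length - 1;
}

// Replace every occurrence of `search`, but only if the match count equals the declared expectation.
function applyDiff(
  content: string,
  search: string,
  replace: string,
  expectedReplacements = 1,
): string {
  if (search.length === 0) {
    throw new Error("search text must not be empty");
  }
  const found = countOccurrences(content, search);
  if (found !== expectedReplacements) {
    // Refuse to guess: a mismatch usually means the model's view of the file is stale.
    throw new Error(`expected ${expectedReplacements} replacement(s), found ${found} match(es)`);
  }
  return content.split(search).join(replace);
}
```

Calling applyDiff("a a a", "a", "x", 2) throws, which matches the case described above: two replacements declared, three matches found.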
Key files
packages/vendor-pochi/src/vendor.ts (withRetry,
better-auth),
packages/vendor-pochi/src/pochi-api.ts (Hono +
Zod routes),
packages/tools/src/apply-diff.ts
(expectedReplacements safety),
packages/tools/src/new-task.ts (CustomAgent
schema), packages/vendor-codex/,
packages/vendor-gemini-cli/
Neovate Code — The most opinionated bash tool in the set
Neovate's bash tool (src/tools/bash.ts) contains the
most elaborate pre-execution security logic in this repository
set. The banned command list is concrete and non-trivial:
alias, aria2c, axel, bash, chrome, curl, curlie, eval,
firefox, fish, http-prompt, httpie, links, lynx, nc,
rm, safari, sh, source, telnet, w3m, wget, xh, zsh
Beyond static bans, it detects command substitution by actually
parsing the shell syntax: it tracks single-quote,
double-quote, and backslash states to correctly identify
$() and backtick substitutions that would survive a
naive regex check. The same character state machine is used to
split pipeline segments correctly before checking each segment
independently.
For high-risk detection it checks patterns like
rm -rf, sudo, dd if=,
mkfs, and curl | sh, as well as every
segment in a pipeline individually — if any segment is high-risk,
the whole command is considered high-risk.
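The quote-aware scan and the per-segment risk check can be sketched together. This is a simplified illustration, not Neovate's code. scanShell, isHighRisk, and the regexes are hypothetical. Only pipes (| and ||) are treated as segment boundaries, whereas a real checker would also split on ;, &&, and similar operators. The curl | sh test is a rough pipeline heuristic.

```typescript
interface ShellScan {
  hasSubstitution: boolean; // $( ... ) or `...` outside single quotes
  segments: string[]; // pipeline segments split on unquoted |
}

function scanShell(cmd: string): ShellScan {
  let inSingle = false;
  let inDouble = false;
  let escaped = false;
  let hasSubstitution = false;
  const segments: string[] = [];
  let current = "";

  for (let i = 0; i < cmd.length; i++) {
    const ch = cmd[i];
    if (escaped) {
      // The previous backslash neutralizes this character.
      escaped = false;
      current += ch;
      continue;
    }
    if (ch === "\\" && !inSingle) {
      escaped = true; // backslash is literal inside single quotes
    } else if (ch === "'" && !inDouble) {
      inSingle = !inSingle;
    } else if (ch === '"' && !inSingle) {
      inDouble = !inDouble;
    } else if (!inSingle && (ch === "`" || (ch === "$" && cmd[i + 1] === "("))) {
      hasSubstitution = true; // substitution still expands inside double quotes
    } else if (!inSingle && !inDouble && ch === "|") {
      segments.push(current.trim());
      current = "";
      continue; // drop the pipe character itself
    }
    current += ch;
  }
  segments.push(current.trim());
  return { hasSubstitution, segments: segments.filter((s) => s.length > 0) };
}

const HIGH_RISK = [/\brm\s+-rf\b/, /\bsudo\b/, /\bdd\s+if=/, /\bmkfs\b/];

function isHighRisk(cmd: string): boolean {
  const { segments } = scanShell(cmd);
  // Any single risky segment taints the whole command.
  if (segments.some((seg) => HIGH_RISK.some((re) => re.test(seg)))) return true;
  // Rough "curl | sh" heuristic: a download piped into a shell.
  const downloads = segments.some((seg) => /^(curl|wget)\b/.test(seg));
  const pipesToShell = segments.slice(1).some((seg) => /^(ba|z)?sh\b/.test(seg));
  return downloads && pipesToShell;
}
```

The scanner treats echo '$(whoami)' as safe because single quotes make the substitution literal, but it flags echo "$(whoami)". A naive regex cannot tell those two apart.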
The context compression system (src/compression.ts)
is also more configurable than most: it has separate pruning and
compaction phases, a triggerRatio that triggers
compaction when context usage exceeds a percentage of the model
limit, protectedTools that are never pruned, a
protectTurns count, and an
autoContinue mode that automatically resumes after
compaction.
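The knobs above reduce to two small decisions: when to compact, and what pruning may touch. Here is a minimal sketch, assuming a flat message list in which tool outputs carry a turn index. The CompressionConfig field names come from the text; HistoryItem, shouldCompact, and pruneCandidates are hypothetical.

```typescript
interface CompressionConfig {
  triggerRatio: number; // compact once usage exceeds this fraction of the model limit
  protectedTools: string[]; // tool outputs that are never pruned
  protectTurns: number; // most recent turns left untouched
  autoContinue: boolean; // resume the task automatically after compaction
}

interface HistoryItem {
  turn: number;
  kind: "user" | "assistant" | "tool_output";
  toolName?: string;
  tokens: number;
}

function shouldCompact(usedTokens: number, modelLimit: number, cfg: CompressionConfig): boolean {
  return usedTokens / modelLimit > cfg.triggerRatio;
}

// Pruning phase: only old, unprotected tool outputs are eligible.
function pruneCandidates(history: HistoryItem[], currentTurn: number, cfg: CompressionConfig): HistoryItem[] {
  const oldestProtectedTurn = currentTurn - cfg.protectTurns + 1;
  return history.filter(
    (item) =>
      item.kind === "tool_output" &&
      item.turn < oldestProtectedTurn &&
      !(item.toolName !== undefined && cfg.protectedTools.includes(item.toolName)),
  );
}
```

Summarization itself is out of scope here. In this sketch, autoContinue would only decide whether the loop re-enters after the compacted history is installed.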
Key files
src/tools/bash.ts (banned commands, command
substitution detection, pipeline segment analysis),
src/compression.ts (compaction + pruning config),
src/tools/task.ts (AI sub-task invocation)
Mux — Seven runtimes, 1Password secrets, and regex-gated tool lists
Mux ships seven distinct runtime backends in
src/node/runtime/: CoderSSHRuntime,
DevcontainerRuntime, DockerRuntime,
LocalRuntime, SSHRuntime,
WorktreeRuntime, and RemoteRuntime —
each with its own .test.ts. That list alone explains
why the repo is large: it targets nearly every environment where
code can run, not just the local machine.
The infrastructure around those runtimes is equally serious:
gitBundleSync.ts syncs git bundles to remote
environments, credentialForwarding.ts tunnels SSH
credentials into containers,
openSshPromptMediation.ts mediates
SSH_ASKPASS prompts, and
SSH2ConnectionPool.ts manages a connection pool.
Config is JSONC and uses secret://op/... references
for 1Password integration — secrets never hard-coded.
The 9 built-in agents are defined as Markdown files with YAML
frontmatter in src/node/builtinAgents/. Tool lists
use regex allow/deny patterns:
- .* adds every available tool;
- file_edit_.* removes all file-edit tools;
- mux_agents_.* blocks config tools from sub-agents.
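Applied in order, those patterns behave like set operations over tool names. Here is a minimal sketch, assuming each pattern is anchored to the whole tool name and rules apply top to bottom. The rule shape and the example tool names are hypothetical, not Mux's frontmatter format.

```typescript
interface ToolRule {
  action: "add" | "remove";
  pattern: string; // regex source, matched against the full tool name
}

function resolveTools(available: string[], rules: ToolRule[]): string[] {
  const selected = new Set<string>();
  for (const rule of rules) {
    const re = new RegExp(`^(?:${rule.pattern})$`);
    for (const tool of available) {
      if (!re.test(tool)) continue;
      if (rule.action === "add") {
        selected.add(tool);
      } else {
        selected.delete(tool);
      }
    }
  }
  // Preserve the registry's original ordering.
  return available.filter((tool) => selected.has(tool));
}
```

Rule order matters in this model: a later .* add would re-admit tools that an earlier removal had stripped.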
The exec.md agent has a hard protocol rule:
"Before your stream ends, you MUST call
agent_report exactly once."
The orchestrator.md agent has an explicit
prohibition:
"Do NOT create pull requests, push to remote branches, or run
any gh pr / git push commands."
Key files
src/node/runtime/ (7 runtime backends),
src/node/builtinAgents/exec.md (enforced
agent_report protocol),
src/node/builtinAgents/orchestrator.md (no-push
rule), src/node/gitBundleSync.ts,
src/node/SSH2ConnectionPool.ts,
src/node/config.ts (1Password secret references)
Crush — Go discipline, binary-embedded prompts, SHA-256 loop detection, and LSP integration
Crush uses Go's //go:embed directive to embed its
system prompt templates directly into the binary at compile time.
The three templates — coder.md.tpl,
task.md.tpl, initialize.md.tpl — are Go
text/template files that render with runtime data:
working directory, git repo status, date, platform, context files,
and available skill XML.
The coder prompt template runs to hundreds of lines and is
remarkably specific. It contains sections for
<critical_rules>,
<communication_style>,
<workflow>,
<decision_making>,
<editing_files>,
<whitespace_and_exact_matching>,
<task_completion>, and
<error_handling>. MCP server instructions are
injected as a separate <mcp-instructions> block
appended to the system prompt at runtime.
The most distinctive engineering decision in this repo is the
SHA-256 loop detection in
internal/agent/loop_detection.go. For each agent
step, a signature is computed by hashing the concatenated
tool_name + "\x00" + input + "\x00" + output for
every tool call in that step. If any signature appears more than 5
times in the last 10 steps, the agent is considered stuck. This is
far more robust than checking tool names alone — the same tool
called with different arguments or producing different output gets
a different hash.
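The signature scheme is compact enough to transliterate. Here is a TypeScript sketch of the Go logic described above, assuming each step is a list of tool calls with string-serialized input and output. stepSignature and isStuck are hypothetical names. The window and threshold constants are the ones quoted in the text.

```typescript
import { createHash } from "node:crypto";

interface ToolCallRecord {
  name: string;
  input: string;
  output: string;
}

const WINDOW_STEPS = 10;
const MAX_REPEATS = 5;

// One SHA-256 digest per step over name \x00 input \x00 output for each call in the step.
function stepSignature(calls: ToolCallRecord[]): string {
  const hash = createHash("sha256");
  for (const call of calls) {
    hash.update(`${call.name}\x00${call.input}\x00${call.output}`);
  }
  return hash.digest("hex");
}

// Stuck if any signature appears more than MAX_REPEATS times in the last WINDOW_STEPS steps.
function isStuck(steps: ToolCallRecord[][]): boolean {
  const counts = new Map<string, number>();
  for (const step of steps.slice(-WINDOW_STEPS)) {
    const sig = stepSignature(step);
    const n = (counts.get(sig) ?? 0) + 1;
    if (n > MAX_REPEATS) return true;
    counts.set(sig, n);
  }
  return false;
}
```

Six identical steps trip the detector. Six steps that call the same tool with the same input but receive different output do not, which is the robustness argument made above.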
Crush also includes a dedicated
internal/agent/agentic_fetch_tool.go
that uses a smaller model to browse the web on behalf of the main
agent — a mini-agent within the agent for cost-effective
information retrieval.
Two more tools stand out as highly unusual in a coding CLI:
- Sourcegraph search (sourcegraph.go): a native tool that queries Sourcegraph's code search API, with parameters query, count, context_window, and timeout. This is the only agent in the set with first-class Sourcegraph integration.
- Full LSP integration: diagnostics.go exposes an lsp_diagnostics tool that returns project-wide or file-level diagnostics from a live Language Server, and references.go exposes a symbol references tool that, given a symbol name, queries the LSP for all references across the project. Few agents in this set can call into an actual Language Server during a session; Oh My Pi, whose table entry lists LSP surfaces, is the other notable case.
Key files
internal/agent/loop_detection.go (SHA-256 tool
signature hashing),
internal/agent/templates/coder.md.tpl (full
system prompt),
internal/agent/prompts.go (go:embed usage),
internal/agent/tools/sourcegraph.go (Sourcegraph
search), internal/agent/tools/diagnostics.go (LSP
diagnostics),
internal/agent/tools/references.go (LSP symbol
references),
internal/agent/agentic_fetch_tool.go (mini-agent
for web browsing)
Crush — Recent: PreToolUse hooks, Hyper provider, Azure support
Major recent additions to Crush:
PreToolUse hook system (dc003bf7, ~1,196 lines
across 4 new files): Shell commands fire on tool events. Hooks
receive a structured JSON payload on stdin plus environment
variables (CRUSH_EVENT, CRUSH_TOOL_NAME, CRUSH_CWD, etc.) and
return decisions: allow, deny,
halt (stops the whole turn), and optional
context or updated_input JSON patches.
Hooks run in parallel via goroutines with configurable timeouts.
Output format supports both Crush format and Claude Code hook
format for compatibility. Exit code 2 blocks, exit code 49 halts
the turn.
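The exit-code contract and the parallel fan-out reduce to a small merge rule. Here is a minimal sketch, assuming JSON stdout with a decision field and assuming the most restrictive answer wins when hooks disagree. HookOutcome, the decision field name, and the severity ordering are assumptions, not Crush's Go types.

```typescript
type HookDecision = "allow" | "deny" | "halt";

interface HookOutcome {
  exitCode: number;
  stdout: string;
}

const SEVERITY: Record<HookDecision, number> = { allow: 0, deny: 1, halt: 2 };

function decisionFromHook({ exitCode, stdout }: HookOutcome): HookDecision {
  if (exitCode === 49) return "halt"; // stop the whole turn
  if (exitCode === 2) return "deny"; // block this tool call
  try {
    // Hypothetical JSON shape: { "decision": "allow" | "deny" | "halt" }
    const parsed: unknown = JSON.parse(stdout);
    const decision = (parsed as { decision?: unknown }).decision;
    if (decision === "deny" || decision === "halt") return decision;
  } catch {
    // Non-JSON (or null) output is treated as plain logging.
  }
  return "allow";
}

// Hooks run in parallel; merge by taking the most severe decision.
function mergeDecisions(outcomes: HookOutcome[]): HookDecision {
  return outcomes
    .map(decisionFromHook)
    .reduce<HookDecision>((acc, d) => (SEVERITY[d] > SEVERITY[acc] ? d : acc), "allow");
}
```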
Hyper provider: Crush ships with a new
hyper provider from charm.land
(internal/agent/hyper/provider.go,
provider.json). Enable via
HYPER_API_KEY or HYPER_URL env vars.
The provider.json file is embedded at compile time via
//go:embed. Models include GLM-5, GLM-5.1, GPT-OSS, Kimi K2.5, and
Kimi K2.6, each with per-model pricing, context windows, and
reasoning levels. A new quickstyle.go (980 lines) adds theme
support alongside a themes.go system.
Azure provider support was also added, DeepSeek V4 reasoning-content handling was fixed, and Bedrock adaptive thinking was improved. The UI received a significant style refactor with semantic color names and improved theme switching.
New key files
internal/hooks/hooks.go (Runner, hook execution),
internal/hooks/input.go (payload/env builder),
internal/hooks/runner.go (parallel execution,
exit code semantics),
internal/agent/hyper/ (Hyper provider +
provider.json),
internal/ui/styles/quickstyle.go (new 980-line
style system)
Kimi CLI — Provider-native message conversion and a hooks architecture
Kimi CLI is built around its own kosong abstraction
package which handles provider-specific message conversion at the
low level. The package contains dedicated converters for Anthropic
(with tool_use/tool_use_id), Google
GenAI (with function_call parts and
thought_signature for thinking tokens), and OpenAI
Responses API (with function_call/function_call_output
items).
One concrete example of the care here: the Google GenAI converter
handles the fact that Gemini rejects an id field in
function_call or function_response parts
— there are API snapshot tests specifically for this case
(test_google_genai_no_id_in_function_call_or_response).
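The three wire formats differ mostly in where the call ID lives. Here is a TypeScript sketch of the idea (kosong itself is Python), assuming one generic tool call and its result. Field names follow the formats named above, and the Gemini parts use the same snake_case naming as the text. The function names, the generic types, and wrapping the Gemini response in an { output } object are hypothetical.

```typescript
interface GenericCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface GenericResult {
  callId: string;
  name: string;
  output: string;
}

// Anthropic: tool_use carries the id, and tool_result points back via tool_use_id.
function toAnthropic(call: GenericCall, result: GenericResult) {
  return [
    { type: "tool_use", id: call.id, name: call.name, input: call.args },
    { type: "tool_result", tool_use_id: result.callId, content: result.output },
  ];
}

// OpenAI Responses: both items share call_id, and arguments travel as a JSON string.
function toOpenAIResponses(call: GenericCall, result: GenericResult) {
  return [
    { type: "function_call", call_id: call.id, name: call.name, arguments: JSON.stringify(call.args) },
    { type: "function_call_output", call_id: result.callId, output: result.output },
  ];
}

// Gemini: no `id` field at all, since (per the text) Gemini rejects it in these parts.
function toGemini(call: GenericCall, result: GenericResult) {
  return [
    { function_call: { name: call.name, args: call.args } },
    { function_response: { name: result.name, response: { output: result.output } } },
  ];
}
```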
The hooks system (src/kimi_cli/hooks/events.py) is
another distinctive feature: three events are defined for every
tool call —
pre_tool_use, post_tool_use, and
post_tool_use_failure. This lets external code
intercept tool calls before they run, observe results after they
run, and handle failures separately. The integration and E2E test
suites confirm this system is well-tested.
The ACP bridge layer converts internal tool results into
protocol-transportable content. The docs honestly list current
gaps — missing session/set_mode and
session/set_model
— rather than implying perfect coverage.
Key files
packages/kosong/src/kosong/contrib/chat_provider/anthropic.py,
packages/kosong/src/kosong/contrib/chat_provider/google_genai.py, src/kimi_cli/hooks/events.py,
packages/kosong/tests/api_snapshot_tests/
Qwen Code — Five-layer model config resolution
Qwen Code's model configuration system is the most rigorous in
this set. The ModelConfigResolver in
packages/core/src/models/modelConfigResolver.ts
defines five named source layers with explicit precedence:
- modelProvider: explicit selection from the ModelProviders config (highest priority)
- CLI arguments: --model, --openaiApiKey, etc.
- Environment variables: OPENAI_API_KEY, OPENAI_MODEL
- Settings: user/workspace settings file
- Defaults: built-in default values (lowest priority)
Each layer is a typed ConfigLayer object. The
resolver uses named source types (cliSource,
settingsSource, modelProvidersSource,
envLayer, defaultSource,
computedSource) so you can always trace which layer a
resolved value came from. Qwen Code also has a special
QWEN_OAUTH_ALLOWED_MODELS list that gates the OAuth
auth path to specific models. The system prompt can be overridden
by pointing QWEN_SYSTEM_MD at a file path (e.g.
.qwen/system.md); setting it to 0/false disables the override,
and setting it to 1/true uses the default path.
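The layering works like a first-match lookup that also records where each value came from. Here is a minimal sketch, assuming the layers are already sorted from highest to lowest priority. ConfigLayer here is a simplified stand-in for Qwen Code's typed layers. The QWEN_SYSTEM_MD handling is a hypothetical reading of the rules above, including the case-insensitive comparison and the caller-supplied default path.

```typescript
type LayerSource = "modelProvider" | "cli" | "env" | "settings" | "default";

interface ConfigLayer {
  source: LayerSource;
  values: Partial<Record<string, string>>;
}

interface Resolved {
  value: string;
  source: LayerSource; // provenance: which layer won
}

// Layers must be ordered from highest to lowest priority.
function resolveKey(layers: ConfigLayer[], key: string): Resolved | undefined {
  for (const layer of layers) {
    const value = layer.values[key];
    if (value !== undefined) return { value, source: layer.source };
  }
  return undefined;
}

// Returns the prompt file to load, or null to keep the built-in prompt.
function systemPromptOverride(raw: string | undefined, defaultPath: string): string | null {
  if (raw === undefined) return null;
  const v = raw.trim().toLowerCase();
  if (v === "" || v === "0" || v === "false") return null; // override disabled
  if (v === "1" || v === "true") return defaultPath;
  return raw.trim(); // anything else is treated as an explicit path
}
```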
The turn.ts event model defines
16 named event types in the
GeminiEventType enum, including Content,
ToolCallRequest, ToolCallResponse,
ToolCallConfirmation, UserCancelled,
Error, ChatCompressed,
Thought, MaxSessionTurns,
SessionTokenLimitExceeded, Finished,
LoopDetected, Citation,
Retry, and HookSystemMessage. The
LoopDetected and ChatCompressed events
are first-class system conditions, not error states.
The truncation recovery is equally detailed: two constants —
TRUNCATION_PARAM_GUIDANCE and
TRUNCATION_EDIT_REJECTION — handle the case where the
model's output is cut off mid-tool-call. The scheduler imports
diff and fast-levenshtein to verify
proposed file edits aren't corrupted by truncation.
Key files
packages/core/src/models/modelConfigResolver.ts
(5-layer resolution),
packages/core/src/core/turn.ts (16
GeminiEventType values),
packages/core/src/core/coreToolScheduler.ts
(truncation recovery),
packages/core/src/models/modelRegistry.ts,
packages/core/src/mcp/
OpenHands — Platform architecture with an ingenious retry strategy
OpenHands is the hardest repo to score fairly because the local snapshot is explicitly described as incomplete. The modern V1 agent core moved to a separate Software Agent SDK repository. But what remains is still architecturally interesting.
The retry logic (openhands/llm/retry_mixin.py) uses
the
tenacity library with a documented, intentional
quirk: on LLMNoResponseError
when temperature is 0, it automatically bumps temperature
to 1.0 on the next attempt. The rationale: a
deterministic model that returns nothing is stuck in a degenerate
fixed point. Adding randomness breaks the loop. This is one of the
more thoughtful retry strategies in the set.
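The temperature bump is a one-line twist on an ordinary retry loop. Here is a TypeScript sketch of the behavior described above (OpenHands does this in Python with tenacity). The synchronous call signature and the attempt limit are simplifications, and LLMNoResponseError is redeclared locally as a stand-in.

```typescript
class LLMNoResponseError extends Error {
  constructor(message = "model returned no response") {
    super(message);
    this.name = "LLMNoResponseError";
  }
}

// Retries on empty responses; a deterministic (temperature 0) model is bumped to 1.0 so the retry can differ.
function completeWithRetry<T>(
  call: (temperature: number) => T,
  temperature: number,
  maxAttempts = 3,
): T {
  let current = temperature;
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return call(current);
    } catch (err) {
      if (!(err instanceof LLMNoResponseError)) throw err; // other errors are not retried here
      lastError = err;
      if (current === 0) current = 1.0; // break the degenerate fixed point
    }
  }
  throw lastError;
}
```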
fn_call_converter.py — marked
LEGACY V0, removal April 1, 2026 — converts
between JSON function-calling and XML for models that don't
support native tool calls. The XML format uses
<function=name><parameter=key>value</parameter></function>
and uses </function as a stream-stop word. The
refine_prompt() function replaces
'bash' with 'powershell' automatically
on Windows.
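The XML fallback round-trips easily. Here is a minimal sketch of the format quoted above, assuming string-valued parameters and no nested tags inside values. toXmlCall and parseXmlCall are hypothetical helpers, not fn_call_converter.py's API. Parsing deliberately does not require the closing tag, because generation stops at the </function stop word.

```typescript
interface XmlCall {
  name: string;
  params: Record<string, string>;
}

function toXmlCall({ name, params }: XmlCall): string {
  const body = Object.entries(params)
    .map(([key, value]) => `<parameter=${key}>${value}</parameter>`)
    .join("\n");
  return `<function=${name}>\n${body}\n</function>`;
}

// Tolerates output truncated at the "</function" stop word.
function parseXmlCall(text: string): XmlCall | null {
  const head = /<function=([^>\s]+)>/.exec(text);
  if (!head) return null;
  const rest = text.slice(head.index + head[0].length);
  const params: Record<string, string> = {};
  const paramRe = /<parameter=([^>\s]+)>([\s\S]*?)<\/parameter>/g;
  let m: RegExpExecArray | null;
  while ((m = paramRe.exec(rest)) !== null) {
    params[m[1]] = m[2].trim();
  }
  return { name: head[1], params };
}
```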
Most striking: OpenHands defines a
CondensationRequestTool — the
agent itself can request history condensation,
not just the runtime. All tool calls carry a
security_risk attribute validated against a
RISK_LEVELS dict.
The system prompt uses
9 Jinja2 .j2 templates with named
XML sections: <ROLE>,
<EFFICIENCY>, <SECURITY>,
<SECURITY_RISK_ASSESSMENT> (a separate included
template), <PULL_REQUESTS>,
<PROBLEM_SOLVING_WORKFLOW>, and more. The
long-horizon variant adds <TASK_MANAGEMENT> and
<TASK_TRACKING_PERSISTENCE> sections.
Claude Code — Undercover mode, a 10+8 state machine, and 15-file shell engineering
Claude Code's BashTool is not one file but a directory of 15 specialized modules, the clearest signal that Claude Code treats coding-agent behavior as its own software domain:
- BashTool.tsx: main tool definition
- bashSecurity.ts: Zsh-specific attack detection
- bashPermissions.ts: permission gate logic
- commandSemantics.ts: semantic classification of commands
- destructiveCommandWarning.ts: explicit user warnings
- sedEditParser.ts + sedValidation.ts: sed-style inline edits
- modeValidation.ts: mode checks per command
- pathValidation.ts: path safety checks
- readOnlyValidation.ts: read-only mode enforcement
- shouldUseSandbox.ts: sandbox routing decision
- prompt.ts: tool prompt, including undercover mode
- bashCommandHelpers.ts, commentLabel.ts, utils.ts: shared helpers
The bashSecurity.ts file alone covers Zsh-specific
attack vectors that no other agent in this set defends against
explicitly. ZSH_DANGEROUS_COMMANDS is a
Set for O(1) lookup. Blocks include:
zmodload, emulate,
sysopen/sysread/syswrite, zpty,
zsocket, all zf_* filesystem primitives,
process substitution
<(/>(/=(,
heredoc-in-substitution $\(.*<<, and even
PowerShell comment syntax <#
as "defense in depth against future changes." Tree-sitter parses
the shell AST to detect these reliably.
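A set-plus-patterns check shows the shape of that catalog. This sketch is deliberately naive. Claude Code parses the shell with tree-sitter, while this version splits on whitespace and uses regexes, so it is only an approximation. The names zshBlockReason and DANGEROUS_PATTERNS, and the exact regexes, are hypothetical.

```typescript
// A subset of the builtins named above; Set membership is an O(1) check.
const ZSH_DANGEROUS_COMMANDS = new Set([
  "zmodload", "emulate", "sysopen", "sysread", "syswrite", "zpty", "zsocket",
]);

const DANGEROUS_PATTERNS: Array<[RegExp, string]> = [
  [/<\(|>\(/, "process substitution"],
  [/(^|\s)=\(/, "zsh =( ) process substitution"], // word-initial only, so arr=(a b) passes
  [/\$\(.*<</, "heredoc inside command substitution"],
  [/<#/, "PowerShell comment syntax"],
];

function zshBlockReason(command: string): string | null {
  for (const word of command.trim().split(/\s+/)) {
    if (ZSH_DANGEROUS_COMMANDS.has(word)) return `dangerous builtin: ${word}`;
    if (word.startsWith("zf_")) return `zf_* filesystem primitive: ${word}`;
  }
  for (const [re, reason] of DANGEROUS_PATTERNS) {
    if (re.test(command)) return reason;
  }
  return null;
}
```

A regex pass like this misfires on quoted text (echo "<(" would be flagged), and that gap is what AST parsing closes.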
The query engine runs as a named state machine
(src/query/transitions.ts) with
10 terminal exit reasons and
8 continue reasons. Terminal:
'completed', 'blocking_limit',
'image_error', 'model_error',
'aborted_streaming', 'aborted_tools',
'prompt_too_long',
'stop_hook_prevented', 'hook_stopped',
'max_turns'. Continue: 'tool_use',
'reactive_compact_retry',
'max_output_tokens_recovery',
'max_output_tokens_escalate',
'collapse_drain_retry',
'stop_hook_blocking',
'token_budget_continuation',
'queued_command'.
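Encoding those reasons as string-literal unions lets the compiler check every exit path. Here is a minimal sketch using the reason names listed above. The Transition type and the classify function are hypothetical, not the contents of transitions.ts.

```typescript
const TERMINAL_REASONS = [
  "completed", "blocking_limit", "image_error", "model_error", "aborted_streaming",
  "aborted_tools", "prompt_too_long", "stop_hook_prevented", "hook_stopped", "max_turns",
] as const;

const CONTINUE_REASONS = [
  "tool_use", "reactive_compact_retry", "max_output_tokens_recovery",
  "max_output_tokens_escalate", "collapse_drain_retry", "stop_hook_blocking",
  "token_budget_continuation", "queued_command",
] as const;

type TerminalReason = (typeof TERMINAL_REASONS)[number];
type ContinueReason = (typeof CONTINUE_REASONS)[number];

// A discriminated union: callers must handle both kinds explicitly.
type Transition =
  | { kind: "terminal"; reason: TerminalReason }
  | { kind: "continue"; reason: ContinueReason };

function classify(reason: string): Transition {
  if ((TERMINAL_REASONS as readonly string[]).includes(reason)) {
    return { kind: "terminal", reason: reason as TerminalReason };
  }
  if ((CONTINUE_REASONS as readonly string[]).includes(reason)) {
    return { kind: "continue", reason: reason as ContinueReason };
  }
  throw new Error(`unknown transition reason: ${reason}`);
}
```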
Claude Code contains an "undercover mode" (isUndercover()
in src/tools/BashTool/prompt.ts) that activates when
process.env.USER_TYPE === 'ant'. Purpose: prevent the
model from volunteering internal Anthropic codenames in commit
messages. The code comments note: "Defense-in-depth: undercover
instructions must survive even if the user has disabled git
instructions entirely." The gate is wired through Bun's
build-time feature flags (import { feature } from 'bun:bundle').
Token budget tracking (src/query/tokenBudget.ts) uses
COMPLETION_THRESHOLD = 0.9 to nudge at 90% budget and
DIMINISHING_THRESHOLD = 500 — if the per-check token
delta drops below 500 for three consecutive checks, the agent is
considered done. Sub-agents skip budget tracking entirely when
agentId is present.
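Read literally, those two constants define three outcomes per check. Here is a minimal sketch of that reading. BudgetTracker, the decision names, and the precedence (diminishing returns checked before the 90% nudge) are assumptions, and Claude Code's actual control flow may differ.

```typescript
const COMPLETION_THRESHOLD = 0.9;
const DIMINISHING_THRESHOLD = 500;
const DIMINISHING_CHECKS = 3; // "three consecutive checks" from the text

type BudgetDecision = "continue" | "nudge" | "done";

class BudgetTracker {
  private readonly budget: number;
  private lastTokens = 0;
  private lowDeltaStreak = 0;

  constructor(budget: number) {
    this.budget = budget;
  }

  check(usedTokens: number, agentId?: string): BudgetDecision {
    if (agentId !== undefined) return "continue"; // sub-agents skip budget tracking
    const delta = usedTokens - this.lastTokens;
    this.lastTokens = usedTokens;
    this.lowDeltaStreak = delta < DIMINISHING_THRESHOLD ? this.lowDeltaStreak + 1 : 0;
    if (this.lowDeltaStreak >= DIMINISHING_CHECKS) return "done"; // progress has stalled
    if (usedTokens >= COMPLETION_THRESHOLD * this.budget) return "nudge"; // 90% of budget reached
    return "continue";
  }
}
```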
Key files
src/tools/BashTool/bashSecurity.ts (Zsh attack
catalog),
src/tools/BashTool/prompt.ts (undercover mode),
src/query/transitions.ts (10+8 state machine),
src/query/tokenBudget.ts
(COMPLETION_THRESHOLD=0.9),
src/tools/TeamCreateTool/,
src/QueryEngine.ts
Open Claude Code — Async generator clone with nightly verification
Open Claude Code 2.0 is a clean-room rebuild of Claude Code
v2.1.91 via "ruDevolution" — AI-powered decompilation of the
published npm package. The
archive/ contains the decompiled 7.3MB CLI;
v2/ is the clean-room reimplementation: 61 files,
8,314 lines, 1,581 tests.
Its
async generator agent loop
(v2/src/core/agent-loop.mjs) yields 13 event types
and recursively calls itself after tool execution (yield* run(null, { continuation: true })). The loop handles streaming, token tracking, auto-compaction at
80% threshold, and 7 hook events: PreToolUse, PostToolUse,
PreToolUseFailure, PostToolUseFailure, Notification, Stop,
SessionStart. Exit code 2 blocks; exit code 49 halts the turn.
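The recursive async-generator pattern is compact in TypeScript. Here is a minimal sketch, assuming a scripted model and tool runner. The event names, the ModelTurn shape, and the maxDepth guard are hypothetical (the real loop yields 13 event types), but the yield* continuation mirrors the call quoted above.

```typescript
type LoopEvent =
  | { type: "assistant_text"; text: string }
  | { type: "tool_call"; name: string; input: string }
  | { type: "tool_result"; name: string; output: string }
  | { type: "done" };

interface ModelTurn {
  text: string;
  toolCalls: Array<{ name: string; input: string }>;
}

type CallModel = (continuation: boolean) => Promise<ModelTurn>;
type RunTool = (name: string, input: string) => Promise<string>;

async function* run(
  callModel: CallModel,
  runTool: RunTool,
  continuation = false,
  depth = 0,
  maxDepth = 20, // hypothetical guard against runaway recursion
): AsyncGenerator<LoopEvent> {
  const turn = await callModel(continuation);
  if (turn.text) yield { type: "assistant_text", text: turn.text };
  if (turn.toolCalls.length === 0 || depth >= maxDepth) {
    yield { type: "done" };
    return;
  }
  for (const call of turn.toolCalls) {
    yield { type: "tool_call", name: call.name, input: call.input };
    const output = await runTool(call.name, call.input);
    yield { type: "tool_result", name: call.name, output };
  }
  // After tools run, delegate to a fresh turn: the recursive continuation.
  yield* run(callModel, runTool, true, depth + 1, maxDepth);
}
```

Consumers simply iterate with for await, so streaming UIs, hooks, and tests all see the same ordered event stream.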
5 providers: Anthropic (primary), OpenAI, Google,
AWS Bedrock, Google Vertex. Request/response transforms normalize
across API shapes. 25 tools with
validateInput/call interface. File checkpointing via
checkpoints.mjs before dangerous ops. Git worktree
isolation via EnterWorktree/ExitWorktree supports parallel
agent tasks (Mux's WorktreeRuntime and OpenCode's worktrees cover
similar ground). Session export/import enables "teleport" between machines.
Nightly release pipeline: Automated CI/CD detects
new Claude Code npm releases (03:00 UTC), runs 903+ tests, npm
audit, and Claude Sonnet 4.6 AI-powered change analysis. Only
publishes if ALL gates pass. The rudevolution
submodule tracks 34,759+ function declarations with 95.7% naming
accuracy.
Key files
v2/src/core/agent-loop.mjs (462 lines, async
generator), v2/src/tools/bash.mjs (148 lines,
sandboxed shell), v2/src/tools/agent.mjs (127
lines, worktree support),
v2/src/core/providers.mjs (5-provider
multi-client), v2/src/hooks/engine.mjs (7 hook
types), v2/src/ui/commands.mjs (40 slash
commands),
archive/open_claude_code/cli.mjs (7.3MB
decompiled)
DeerFlow — LangGraph harness with 14-layer middleware and SSE streaming
DeerFlow is the only agent in this set built on LangGraph, and it
shows. The backend is a FastAPI application
(backend/app/gateway/app.py) that initializes a
LangGraph runtime on startup: checkpointer, store, StreamBridge,
and RunManager all come up as async components in the application
lifespan handler.
Every agent turn passes through a
14-layer middleware stack in
agents/middlewares/, including:
LoopDetectionMiddleware,
TokenUsageMiddleware, MemoryMiddleware,
TodoMiddleware, TitleMiddleware,
ClarificationMiddleware,
SubagentLimitMiddleware,
ViewImageMiddleware,
SandboxAuditMiddleware,
DeferredToolFilterMiddleware,
DanglingToolCallMiddleware,
ToolErrorHandlingMiddleware,
UploadsMiddleware. No other repo in this set has a
composable middleware architecture; most handle these concerns
inline or not at all.
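The value of a middleware stack is that each concern wraps the next without knowing what sits below it. This is a generic onion-style sketch, not DeerFlow's or LangGraph's actual middleware API. AgentState, Middleware, compose, and the two toy layers are hypothetical, loosely named after entries in the list above.

```typescript
interface AgentState {
  messages: string[];
  log: string[];
}

type Step = (state: AgentState) => AgentState;
type Middleware = (state: AgentState, next: Step) => AgentState;

// compose([a, b], core) runs a -> b -> core -> b -> a.
function compose(middlewares: Middleware[], core: Step): Step {
  return middlewares.reduceRight<Step>((next, mw) => (state) => mw(state, next), core);
}

// Toy layer: records work before and after everything beneath it.
const tokenUsage: Middleware = (state, next) => {
  const out = next({ ...state, log: [...state.log, "tokens:before"] });
  return { ...out, log: [...out.log, "tokens:after"] };
};

// Toy layer: drops dangling tool-call markers before the model sees the history.
const danglingToolCall: Middleware = (state, next) =>
  next({ ...state, messages: state.messages.filter((m) => !m.startsWith("dangling:")) });
```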
DeerFlow's loop detection
(agents/middlewares/loop_detection_middleware.py) has
a noteworthy special case: calls to read_file have
their line numbers bucketed into 200-line groups before
hashing, to avoid false positives from paginated reads. On warn (3
repeats), it injects a
HumanMessage("you are repeating yourself — wrap up").
On hard limit (5 repeats), it
strips tool_calls entirely
from the response, forcing a plain-text answer.
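The bucketing and the two-stage response fit in a few dozen lines. Here is a minimal sketch, assuming read_file takes numeric start_line/end_line arguments (hypothetical names) and that repeats are counted over the whole run rather than over a sliding window. LoopDetector and enforce are illustrative, not the middleware's actual code.

```typescript
import { createHash } from "node:crypto";

const LINE_BUCKET = 200;
const WARN_REPEATS = 3;
const HARD_LIMIT_REPEATS = 5;

interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface AIResponse {
  content: string;
  tool_calls: ToolCall[];
}

type Verdict = "ok" | "warn" | "hard_stop";

// read_file line numbers are coarsened to 200-line buckets before hashing.
function normalizeArgs(call: ToolCall): Record<string, unknown> {
  if (call.name !== "read_file") return call.args;
  const args = { ...call.args };
  for (const key of ["start_line", "end_line"]) {
    const value = args[key];
    if (typeof value === "number") args[key] = Math.floor(value / LINE_BUCKET);
  }
  return args;
}

function callHash(call: ToolCall): string {
  const args = normalizeArgs(call);
  const stable = Object.keys(args).sort().map((k) => [k, args[k]]); // key order must not matter
  return createHash("sha256").update(call.name + JSON.stringify(stable)).digest("hex");
}

class LoopDetector {
  private readonly counts = new Map<string, number>();

  check(call: ToolCall): Verdict {
    const key = callHash(call);
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    if (n >= HARD_LIMIT_REPEATS) return "hard_stop";
    if (n >= WARN_REPEATS) return "warn"; // caller injects the "you are repeating yourself" message
    return "ok";
  }
}

// On the hard limit, strip tool calls so the model must answer in plain text.
function enforce(response: AIResponse, verdict: Verdict): AIResponse {
  return verdict === "hard_stop" ? { ...response, tool_calls: [] } : response;
}
```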
Skill self-evolution is explicit in the system prompt: triggers
include "5+ tool calls used," "user corrected approach," and
"non-obvious errors encountered." A separate rule on sub-agent limits is just as blunt:
"HARD ERROR. The system WILL discard excess [sub-agent] calls
and you WILL lose work."
Skills cache pre-loads in a daemon background thread
(warm_enabled_skills_cache(timeout=5.0)) at startup.
Key files
backend/app/gateway/app.py (FastAPI + LangGraph
init),
agents/middlewares/loop_detection_middleware.py
(200-line bucket, tool_calls strip),
agents/middlewares/ (13-middleware stack),
backend/app/gateway/routers/runs.py (SSE
streaming),
backend/langgraph.json
Hermes — Self-improving agent with multi-platform reach and RL infrastructure
Hermes by Nous Research is unique in this study: it is the only agent with a closed learning loop, multi-platform messaging support, and RL training infrastructure in the same codebase.
The skill system stores reusable procedures as
SKILL.md files in ~/.hermes/skills/.
Skills are injected as user messages (not system prompt)
to preserve the prompt cache. The agent can create, edit, and
delete skills via skill_manager_tool.py, and every
agent-created skill is security-scanned before saving.
Two memory files persist knowledge across sessions:
MEMORY.md (agent notes) and
USER.md (user model). Both are scanned for prompt
injection before loading. The context compressor
(agent/context_compressor.py) uses a five-step
algorithm with structured summaries (Goal, Progress, Decisions,
Files, Next Steps) and iterative update on repeated compression.
The tools/mixture_of_agents_tool.py implements MoA:
four reference models (Claude Opus, Gemini Pro, GPT-5, DeepSeek)
run in parallel, and an aggregator model synthesizes the results.
This is an optional tool the agent can invoke for hard problems —
unique in this set.
The gateway covers 14+ messaging platforms: Telegram, Discord, Slack, WhatsApp, Signal, WeChat/WeCom, Matrix, Mattermost, Feishu/Lark, DingTalk, Email, SMS, HomeAssistant, and a generic Webhook adapter. Each platform module is a full adapter with auth, inbound message handling, allowlists, dedup, and ACK logic.
See the dedicated Hermes page for full coverage.
Which code looks best designed?
1. Claude Code
Best end-to-end product coherence. Tools like BashTool are whole directories rather than single files, and permissions, UX, security, and task management are all native product concerns.
2. Crush
Best structural cleanliness. Binary-embedded templates, SHA-256 loop detection, and the Go type system keep the design honest and maintainable.
3. Qwen Code
A very solid multi-provider CLI core. The five-layer config resolver and clean tool/MCP separation are the strongest generic engineering here.
4. Mux
Strongest large-surface product architecture among the provider-rich repos. Ambitious, broad, and still impressively organized.
5. Hermes
Most functionally unique. The learning loop, multi-platform reach, and RL infrastructure are unlike anything else here. The tradeoff is focus.
OpenCode — The 4,531-file client-server runtime
OpenCode (opencode, v1.14.40, MIT)
is one of the clearest "runtime-first" agent repos in this set.
Instead of treating the terminal UI as the product, it builds a
shared backend that powers a
SolidJS + opentui TUI, a
web console, a desktop shell, an
SDK, and optional server mode.
The tool layer mirrors that broader architecture. OpenCode ships
both a heavyweight apply_patch tool
and a smaller exact-match edit tool.
The patch path handles add/update/delete/move operations,
BOM-aware writes, formatting hooks, and permission requests with
diff metadata; the edit path keeps fast exact replacement for
smaller mutations.
Permissions are modeled explicitly: allow,
deny, and ask rules can target
wildcarded paths or commands, and replies can be
once, always, or reject.
That is a stronger approval surface than many TypeScript peers,
but still not an OS-level sandbox.
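A wildcard rule table plus the three reply types is enough to model the approval flow. Here is a minimal sketch, assuming last-match-wins precedence and an ask default. The rule shape and function names are hypothetical rather than taken from permission/index.ts. Note that nothing here restricts what an allowed command can actually do, which is exactly the sandbox caveat above.

```typescript
type PermissionAction = "allow" | "deny" | "ask";
type PermissionReply = "once" | "always" | "reject";

interface PermissionRule {
  pattern: string; // wildcard: * matches any run of characters, ? matches one
  action: PermissionAction;
}

function wildcardToRegExp(pattern: string): RegExp {
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*").replace(/\?/g, ".")}$`);
}

// Assumption: the last matching rule wins, and unmatched subjects fall back to "ask".
function decide(rules: PermissionRule[], subject: string): PermissionAction {
  let action: PermissionAction = "ask";
  for (const rule of rules) {
    if (wildcardToRegExp(rule.pattern).test(subject)) action = rule.action;
  }
  return action;
}

// "always" persists an allow rule for this subject; "once" allows only the current call.
function applyReply(
  rules: PermissionRule[],
  subject: string,
  reply: PermissionReply,
): { allowed: boolean; rules: PermissionRule[] } {
  switch (reply) {
    case "once":
      return { allowed: true, rules };
    case "always":
      return { allowed: true, rules: [...rules, { pattern: subject, action: "allow" }] };
    case "reject":
      return { allowed: false, rules };
  }
}
```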
OpenCode also treats collaboration as architecture, not marketing. The README exposes build, plan, and internal @general flows, while the ACP layer handles session initialization, permission requests, usage updates, prompt submission, and session forking.
Provider support is broad and intentionally vendor-neutral. The core package depends on Anthropic, OpenAI, Azure, Google, Vertex, Bedrock, Mistral, Cohere, Groq, Cerebras, OpenRouter, TogetherAI, Alibaba, and more through the AI SDK ecosystem. System prompts are chosen by model family rather than forcing one universal persona on every backend.
Key files
packages/opencode/src/tool/apply_patch.ts (patch
parser + writes),
packages/opencode/src/permission/index.ts
(allow/deny/ask rules),
packages/opencode/src/acp/agent.ts (session
lifecycle + permissions),
packages/opencode/src/mcp/index.ts
(stdio/SSE/Streamable HTTP MCP),
packages/opencode/src/session/system.ts
(provider-specific prompt selection)
Pi Mono — The refreshed extension-first kernel
Pi Mono (@mariozechner/pi-coding-agent,
v0.73.0, MIT, by Mario Zechner) is a TypeScript
coding agent that deliberately omits features other agents bake
in: no MCP, no sub-agents, no permissions, no plan mode, no todos.
Instead, the fresh 874-file repo snapshot ships 7 core tools,
26-provider support, tree-structured JSONL v3 sessions, and a
differential-rendering TUI.
The current monorepo layout is tighter than older versions of this blog suggested: packages/ now contains agent, ai, coding-agent, tui, and web-ui. Older references to mom and pods no longer match the fresh repo snapshot.
The 7 tools (read, bash, edit, write, grep, find, ls) use TypeBox JSON schemas with AJV validation. The edit tool supports multiple disjoint edits per call with fuzzy matching (normalizing Unicode quotes, dashes, and spaces), uniqueness validation, reverse-order application, and line-ending preservation. A file mutation queue serializes concurrent writes to the same file, closing a subtle race condition that plagues many other agents.
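Here is a rough sketch of the fuzzy-matching idea. It normalizes typographic quotes, dashes, and spaces on both sides, requires exactly one match, and then splices into the original text. The character table and the function names are assumptions, not Pi Mono's edit-diff.ts.

```typescript
// Map typographic variants to ASCII. Every replacement swaps exactly one
// UTF-16 code unit for one character, so string offsets are preserved.
function normalizeForMatch(text: string): string {
  return text
    .replace(/[\u2018\u2019\u201A\u201B]/g, "'") // curly single quotes
    .replace(/[\u201C\u201D\u201E\u201F]/g, '"') // curly double quotes
    .replace(/[\u2010-\u2015\u2212]/g, "-") // hyphen/dash variants, minus sign
    .replace(/[\u00A0\u2000-\u200A\u202F\u3000]/g, " "); // non-breaking and other spaces
}

// Replace oldText with newText, matching on normalized text and requiring
// the match to be unique.
function applyFuzzyEdit(content: string, oldText: string, newText: string): string {
  const haystack = normalizeForMatch(content);
  const needle = normalizeForMatch(oldText);
  if (needle.length === 0) throw new Error("oldText is empty");
  const first = haystack.indexOf(needle);
  if (first === -1) throw new Error("oldText not found");
  if (haystack.indexOf(needle, first + 1) !== -1) {
    throw new Error("oldText is not unique; include more surrounding context");
  }
  // Offsets in the normalized text are valid in the original text.
  return content.slice(0, first) + newText + content.slice(first + needle.length);
}
```

Because the normalization is length-preserving, the file's own characters outside the edited span survive untouched, even when the model typed straight quotes for curly ones.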
The bash tool uses a pluggable BashOperations interface with streaming output, process tree killing, 10 MB stdout/stderr caps, and a temp file fallback for overflow. Extensions can intercept it via BashSpawnHook.
Sessions are tree-structured JSONL (v3) with id/parentId fields enabling in-place branching. Entry types include message, thinking_level_change, model_change, compaction (with firstKeptEntryId), branch_summary, and label (bookmarks). The /tree command navigates the tree; /fork creates a new session file from a branch point.
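The id/parentId design makes branching cheap. The file stays append-only, and any branch can be recovered by walking parent links from a leaf back to the root. Only `id`, `parentId`, and the entry types named above come from the text. The `text` field and the helper names in this sketch are hypothetical.

```typescript
// Hypothetical entry shape: only id/parentId/type follow the description above.
interface SessionEntry {
  id: string;
  parentId: string | null;
  type: string;
  text?: string;
}

function parseJsonl(jsonl: string): SessionEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SessionEntry);
}

// Walk parentId links from a leaf back to the root to recover one branch.
function branchPath(entries: SessionEntry[], leafId: string): SessionEntry[] {
  const byId = new Map(entries.map((e) => [e.id, e] as const));
  const seen = new Set<string>();
  const path: SessionEntry[] = [];
  let current = byId.get(leafId);
  while (current) {
    if (seen.has(current.id)) throw new Error(`cycle at ${current.id}`);
    seen.add(current.id);
    path.push(current);
    current = current.parentId === null ? undefined : byId.get(current.parentId);
  }
  return path.reverse(); // root first
}
```

Retrying from an earlier point just appends a sibling whose parentId points at that node. Nothing already written has to move.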
The agent runtime supports parallel tool execution (the default, configurable to sequential), steering messages (delivered after the current turn, so the user can redirect a run that is still in progress), followup messages (delivered after the agent stops), and a PendingMessageQueue with two delivery modes. The event system exposes 20+ lifecycle events, including beforeToolCall/afterToolCall hooks.
26 providers across 10 API implementations now cover both subscription and API-key access: Anthropic, OpenAI, Azure OpenAI, DeepSeek, Google Gemini, Google Vertex, Amazon Bedrock, Mistral, Groq, Cerebras, Cloudflare AI Gateway, Cloudflare Workers AI, xAI, OpenRouter, Vercel AI Gateway, ZAI, OpenCode Zen, OpenCode Go, Hugging Face, Fireworks, Kimi For Coding, MiniMax, and Xiaomi MiMo variants. The ModelRegistry supports glob patterns, thinking-level suffixes (model:high), preference for aliases over dated versions, and rejection of ambiguous matches.
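Here is a hedged sketch of how such a resolver can behave. Several details are assumptions rather than Pi Mono's ModelRegistry API: the thinking-level names, the dated-suffix rule (`-YYYYMMDD`), substring matching for bare queries, and `resolveModel` itself.

```typescript
// Illustrative resolver; level names and matching rules are assumptions.
const THINKING_LEVELS = ["off", "minimal", "low", "medium", "high"] as const;
type ThinkingLevel = (typeof THINKING_LEVELS)[number];

interface Resolved {
  id: string;
  thinking?: ThinkingLevel;
}

// Translate a glob (* and ?) into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Dated snapshots are assumed to end in -YYYYMMDD.
const DATED = /-\d{8}$/;

function resolveModel(query: string, catalog: string[]): Resolved {
  // Split off a trailing ":<level>" thinking suffix, if present.
  let pattern = query;
  let thinking: ThinkingLevel | undefined;
  const colon = query.lastIndexOf(":");
  const suffix = colon === -1 ? "" : query.slice(colon + 1);
  if ((THINKING_LEVELS as readonly string[]).includes(suffix)) {
    pattern = query.slice(0, colon);
    thinking = suffix as ThinkingLevel;
  }
  // An exact id always wins.
  if (catalog.includes(pattern)) return { id: pattern, thinking };
  // Bare names match as substrings; explicit globs match as written.
  const re = globToRegExp(/[*?]/.test(pattern) ? pattern : `*${pattern}*`);
  let matches = catalog.filter((id) => re.test(id));
  // Prefer undated aliases over dated snapshots when both match.
  const aliases = matches.filter((id) => !DATED.test(id));
  if (aliases.length > 0) matches = aliases;
  if (matches.length === 0) throw new Error(`no model matches "${query}"`);
  if (matches.length > 1) throw new Error(`"${query}" is ambiguous: ${matches.join(", ")}`);
  return { id: matches[0], thinking };
}
```

Under these rules, `sonnet:high` against a catalog holding both an undated alias and its dated snapshot resolves to the alias, with thinking set to `high`. A glob like `claude-*` that still leaves two aliases is rejected as ambiguous rather than guessed.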
The extension system uses jiti (TypeScript runtime executor) with virtual module support for compiled Bun binaries. Extensions can register tools, commands, shortcuts, flags, message renderers, custom providers, and subscribe to 20+ events. Pi Packages (installable via npm or git) auto-discover extensions, skills, prompts, and themes from their directory structure.
The TUI uses differential rendering: only changed terminal cells are redrawn, throttled to 60 fps (16 ms minimum between frames). It supports the Kitty graphics protocol, hardware cursor via APC escape sequences, and an overlay stack with focus management.
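Differential rendering reduces to comparing the previous frame with the next one and emitting escape sequences only for what changed. This sketch works per line rather than per cell. The coalescing `Renderer` is a hypothetical stand-in for Pi Mono's 16 ms throttle. The escape codes (CUP `ESC[row;colH` and EL `ESC[2K`) are standard ANSI.

```typescript
// Line-granularity sketch; the TUI described above diffs at the cell level.
function diffFrame(prev: string[], next: string[]): string {
  let out = "";
  const rows = Math.max(prev.length, next.length);
  for (let i = 0; i < rows; i++) {
    if (prev[i] === next[i]) continue;
    // CUP (ESC[row;colH, 1-based) moves the cursor; EL (ESC[2K) clears the line.
    out += `\x1b[${i + 1};1H\x1b[2K${next[i] ?? ""}`;
  }
  return out;
}

// Coalesce render requests so at most one frame is written per interval.
class Renderer {
  private last: string[] = [];
  private pending: string[] | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly write: (s: string) => void;
  private readonly minIntervalMs: number;

  constructor(write: (s: string) => void, minIntervalMs = 16) {
    this.write = write;
    this.minIntervalMs = minIntervalMs;
  }

  request(frame: string[]): void {
    this.pending = frame; // later requests overwrite earlier ones
    if (this.timer !== null) return;
    this.timer = setTimeout(() => this.flush(), this.minIntervalMs);
  }

  flush(): void {
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = null;
    if (this.pending === null) return;
    this.write(diffFrame(this.last, this.pending));
    this.last = this.pending;
    this.pending = null;
  }
}
```

When many requests arrive within one interval, only the newest frame is diffed and written. That keeps a fast-streaming model response from flooding the terminal.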
The web UI (Lit components) includes a full artifact system (HtmlArtifact, ImageArtifact, MarkdownArtifact, etc.), sandboxed iframes with runtime providers, a JavaScript REPL, and document extraction (PDF, DOCX, XLSX).
Key files
packages/coding-agent/src/core/tools/*.ts (TypeBox tool definitions), packages/coding-agent/src/core/tools/edit-diff.ts (multi-edit, fuzzy matching), packages/coding-agent/src/core/tools/file-mutation-queue.ts (per-file locking), packages/agent/src/agent-loop.ts (parallel execution, steering/followup queues), packages/ai/src/providers/models.generated.ts (auto-generated model catalog, 26 providers), packages/tui/src/tui.ts (differential rendering), packages/coding-agent/src/core/extensions/ (jiti-based extension system), packages/web-ui/src/tools/artifacts/ (artifact system, JS REPL)
ADK-Rust — The 34-member framework workspace
ADK-Rust (adk-rust, v0.8.0, Apache-2.0) is not best understood as one coding CLI. It is a publishable Rust framework ecosystem built as a 34-member workspace with a small default tier and progressively broader feature sets for tools, auth, graph workflows, browser automation, payments, AWP, and sandboxing.
The architectural center of gravity is the crate boundary. Core contracts live in adk-core and adk-agent, provider breadth in adk-model, execution in adk-runner, tools in adk-tool, and web or protocol surfaces in crates such as adk-server, adk-awp, and adk-payments.
The file-editing story is revealing. Anthropic-native editor tools are executed locally with strict exact-match semantics and 1-based lines, while OpenAI's apply_patch is exposed as a built-in declaration that expects the Responses API item protocol to handle the actual patch semantics. ADK-Rust does not spend its complexity budget on a giant in-house editing engine.
Delegation is equally framework-shaped. Instead of branded child personas, it provides SequentialAgent, ParallelAgent, LoopAgent, and the checkpointed GraphAgent, with SharedState as the coordination primitive.
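ADK-Rust's workflow agents are Rust types, but the composition pattern is easy to see in a TypeScript analogue. This sketch models `SharedState` as a `Map` and the three combinators as plain functions. None of these names match adk-rust's actual API, and the checkpointed GraphAgent is omitted.

```typescript
// TypeScript analogue of the pattern; not adk-rust's API.
type SharedState = Map<string, unknown>;

interface Agent {
  name: string;
  run(state: SharedState): Promise<void>;
}

// Run children one after another against the same state.
function sequential(name: string, children: Agent[]): Agent {
  return {
    name,
    async run(state) {
      for (const child of children) await child.run(state);
    },
  };
}

// Run children concurrently; they coordinate only through shared state.
function parallel(name: string, children: Agent[]): Agent {
  return {
    name,
    async run(state) {
      await Promise.all(children.map((child) => child.run(state)));
    },
  };
}

// Re-run the child until `done` reports true or the iteration budget runs out.
function loop(name: string, child: Agent, done: (s: SharedState) => boolean, maxIterations: number): Agent {
  return {
    name,
    async run(state) {
      for (let i = 0; i < maxIterations && !done(state); i++) await child.run(state);
    },
  };
}
```

Because every combinator returns another `Agent`, pipelines nest freely. That is the framework-shaped alternative to branded child personas.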
Key files
adk-rust/Cargo.toml (feature tiers), adk-core/src/model.rs (minimal LLM trait), adk-model/src/lib.rs (provider facade), adk-tool/src/builtin/anthropic.rs (native editor executor), adk-tool/src/builtin/openai.rs (apply_patch declaration), STABILITY.md (stable/beta/experimental boundaries)
Oh My Pi — The hashline-heavy power-tool fork
Oh My Pi (@oh-my-pi/pi-coding-agent, v14.7.8, MIT) is the clearest example in this set of a fork changing category. It starts from Pi Mono's terminal agent lineage, but the current repo is a Bun monorepo plus a Rust workspace with 9 top-level packages, a native engine, MCP and plugin plumbing, swarm-style delegation, and an edit benchmark package.
The most important architectural shift is the edit surface. In packages/coding-agent/src/utils/edit-mode.ts, the default mode is hashline. The central edit tool fans out into replace, patch, hashline, vim, and apply_patch modes, while line-hash.ts and hashline.ts define the compact line-plus-hash anchor system and its execution rules.
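Here is a minimal sketch of a line-plus-hash anchor. The `LINE#HH` format, FNV-1a as the hash, and the two-character width are assumptions; Oh My Pi's real format lives in line-hash.ts. The point is that an anchor carries both a position and a checksum of the content the model saw. An edit made against a stale read therefore fails loudly instead of landing on the wrong line.

```typescript
// 32-bit FNV-1a; a stand-in for whatever hash line-hash.ts actually uses.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Hypothetical anchor format "LINE#HH": 1-based line plus a 2-hex-char hash.
function anchorFor(lines: string[], lineNo: number): string {
  const hash = (fnv1a(lines[lineNo - 1]) & 0xff).toString(16).padStart(2, "0");
  return `${lineNo}#${hash}`;
}

// Replace one line, but only if the anchor still matches the current content.
function replaceAtAnchor(lines: string[], anchor: string, text: string): string[] {
  const m = /^(\d+)#([0-9a-f]{2})$/.exec(anchor);
  if (!m) throw new Error(`malformed anchor ${anchor}`);
  const lineNo = Number(m[1]);
  if (lineNo < 1 || lineNo > lines.length || anchorFor(lines, lineNo) !== anchor) {
    throw new Error(`stale anchor ${anchor}; re-read the file`);
  }
  const out = lines.slice();
  out[lineNo - 1] = text;
  return out;
}
```

A short hash keeps anchors cheap in tokens, at the cost of occasional collisions. With two hex characters that is about 1 in 256, so the check catches most stale reads rather than all of them.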
The surrounding platform is what makes that believable instead of gimmicky. Prompt helpers generate real anchor examples, the README advertises TTSR rules, Python, LSP, browser tooling, model roles, and universal config discovery, and the typescript-edit-benchmark package exists solely to measure edit variants instead of hand-waving about them.
Delegation is now built in as well. The README's task tool ships bundled agents and isolation backends, and the swarm extension's executor runs child agents as real OMP subprocesses with full tool access against the swarm workspace.
Key files
package.json (workspace/package catalog), Cargo.toml (Rust workspace versioning), packages/coding-agent/src/edit/index.ts (mode router), packages/coding-agent/src/edit/modes/hashline.ts (anchor edit executor), packages/coding-agent/src/edit/line-hash.ts (bigram anchor format), packages/swarm-extension/src/swarm/executor.ts (subprocess delegation), docs/mcp-config.md (MCP config model)
Codex — OpenAI's production coding agent in Rust
Codex is OpenAI's open-source coding agent (Apache-2.0), implemented in Rust as a Cargo workspace with 70+ crates across 3,805 files. It is the most production-grade agent codebase OpenAI has released: the runtime behind OpenAI's own Codex product, now available for inspection and contribution.
The sandboxing system is the most platform-specific in this set: macOS uses Seatbelt (sandbox-exec), Linux uses bubblewrap (bwrap), and Windows uses process tokens and job objects. Each platform gets a native sandbox implementation rather than a cross-platform abstraction. The sandbox profiles define fine-grained filesystem, network, and process isolation policies.
MCP is bidirectional — Codex can act as an MCP server (exposing its tools to external clients) and as an MCP client (loading external MCP servers). Multi-agent job execution is built-in: the agent can spawn sub-jobs with their own tool scopes and collect results asynchronously. IDE extensions are first-class: VS Code and JetBrains integrations ship alongside the CLI.
The core business logic lives in codex-rs/core, the TUI uses Ratatui in codex-rs/tui, tool schemas are defined in codex-rs/tools, platform sandboxes are in codex-rs/sandboxing, and the execution policy rule engine is in codex-rs/execpolicy.
Key files
codex-rs/core/ (business logic), codex-rs/tui/ (Ratatui TUI), codex-rs/tools/ (tool schemas), codex-rs/sandboxing/ (Seatbelt/bubblewrap/Windows tokens), codex-rs/execpolicy/ (rule engine)
Wintermolt — The 3 MB everything-agent in Zig
Wintermolt compiles to a single 3 MB native binary with no language runtime to install (no Node.js, no Python, no Electron). It links only libcurl and sqlite3, both pre-installed on macOS and most Linux distributions. It can be cross-compiled to any Zig target, including ARM boards like Jetson and Raspberry Pi, with one command: zig build -Dtarget=aarch64-linux-gnu.
The codebase is ~18,400 lines across 51 Zig source files, plus two prebuilt static libraries from sibling projects. The agentic loop (src/agent/loop.zig, 646 lines) runs up to 25 tool iterations per turn with automatic fallback (primary → Ollama → OpenAI) if the backend fails.
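The fallback chain can be sketched as an ordered list of backends, tried until one succeeds. Wintermolt implements this in Zig. The `Backend` interface and `completeWithFallback` here are hypothetical TypeScript stand-ins.

```typescript
// Hypothetical backend interface for illustration.
interface Backend {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Try each backend in order (e.g. primary, then Ollama, then OpenAI) and
// return the first successful reply, collecting errors for diagnostics.
async function completeWithFallback(
  chain: Backend[],
  prompt: string,
): Promise<{ backend: string; reply: string }> {
  const errors: string[] = [];
  for (const backend of chain) {
    try {
      return { backend: backend.name, reply: await backend.complete(prompt) };
    } catch (err) {
      errors.push(`${backend.name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all backends failed: ${errors.join("; ")}`);
}
```

Reporting which backend answered matters in practice, because a silent downgrade to a local model can change behavior mid-session.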
6 AI backends share a unified wire protocol (src/api/protocol.zig, 281 lines). Three hand-written streaming parsers handle the different formats: SSE for Anthropic (461 lines), SSE for OpenAI (287 lines), and NDJSON for Ollama (250 lines). The DeepSeekClient is effectively a universal OpenAI-compatible client reused for OpenAI, DeepSeek, Qwen, Gemini, and custom endpoints.
16+ built-in tools span bash, file I/O, grep, glob, HTTP, web search, camera capture, image processing, Chrome DevTools Protocol browser automation (~895 lines), Pinecone RAG memory search, cron scheduling, Tailscale mesh queries, A2UI canvas rendering, and CLI-Anything harness generation. Tool dispatch uses a 3-layer system: built-in → runtime skills → MCP remote tools.
MCP is bidirectional. The server (~214 lines) exposes 20+ tools via JSON-RPC 2.0 over stdio; the client (~421 lines) loads external MCP servers from config, spawns them as child processes, runs the 3-step handshake, and prefixes discovered tools with "servername__".
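The client side of the handshake and the tool prefixing can be sketched as plain JSON-RPC message builders. The three steps follow the MCP specification: an initialize request, the server's response, and `notifications/initialized`. The protocol version string is one published revision. The helper names and client info are hypothetical, and the stdio transport is omitted.

```typescript
// Message shapes follow the MCP JSON-RPC handshake; transport omitted.
interface JsonRpcMessage {
  jsonrpc: "2.0";
  id?: number;
  method: string;
  params?: Record<string, unknown>;
}

// Step 1: the client sends an `initialize` request.
function initializeRequest(id: number): JsonRpcMessage {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.1.0" },
    },
  };
}

// Step 2 is the server's response with its capabilities (not modeled here).
// Step 3: the client confirms with a notification, which has no id and gets no reply.
function initializedNotification(): JsonRpcMessage {
  return { jsonrpc: "2.0", method: "notifications/initialized" };
}

interface ToolInfo {
  name: string;
  description?: string;
}

// Namespace discovered tools so two servers can both expose, say, "search".
function prefixTools(server: string, tools: ToolInfo[]): ToolInfo[] {
  return tools.map((t) => ({ ...t, name: `${server}__${t.name}` }));
}

// Route a prefixed call back to its server and original tool name.
function splitPrefixed(name: string): { server: string; tool: string } | null {
  const i = name.indexOf("__");
  return i <= 0 ? null : { server: name.slice(0, i), tool: name.slice(i + 2) };
}
```

The split assumes server names never contain `__` themselves. A stricter client would keep a lookup table instead of parsing names back apart.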
The cron scheduler (src/agent/scheduler.zig, 756 lines) is SQLite-persisted with three schedule types (every/at/cron) and auto-disables jobs after max_retries failures. No other agent in this set has a built-in cron scheduler.
All sidecars (chat, web, menubar, gateway) follow the same JSON-lines-over-stdio IPC pattern: Zig spawns a child process (TypeScript or Swift) and communicates with it over plain stdin/stdout pipes. The chat bridge uses a 7-tier priority binding system (src/agent/router.zig, 585 lines).
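JSON-lines framing is simple, but stdout chunks can split or merge messages, so the reader must buffer until a newline. Below is a sketch of the decoding side. The class and function names are hypothetical, and Wintermolt's own implementation is in Zig.

```typescript
// Incremental JSON-lines decoder: buffer partial chunks, emit complete lines.
class JsonLinesDecoder {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const messages: unknown[] = [];
    let newline: number;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      const line = this.buffer.slice(0, newline).trim();
      this.buffer = this.buffer.slice(newline + 1);
      if (line !== "") messages.push(JSON.parse(line));
    }
    return messages; // any trailing partial line stays buffered
  }
}

function encodeJsonLine(message: unknown): string {
  return JSON.stringify(message) + "\n";
}
```

The framing is safe because `JSON.stringify` escapes newlines inside strings, so a raw newline can only mean end of message. A real Node reader would also call `setEncoding("utf8")` on the stream so multi-byte characters are not split across chunks.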
Key files
src/agent/loop.zig (agentic loop, 646 lines), src/agent/tools.zig (tool dispatch, 717 lines), src/agent/scheduler.zig (cron, 756 lines), src/tools/browser.zig (CDP, 895 lines), src/web/bridge.zig (WebSocket, 1,062 lines), src/mcp/client.zig (MCP client, 421 lines), src/agent/rag.zig (Pinecone RAG, 577 lines)
Zaica — The focused Zig coding specialist
Zaica is a ~9,100-line Zig 0.15 coding assistant with zero runtime dependencies beyond the standard library. It is distributed via Homebrew for macOS (aarch64 + x86_64) and Linux (aarch64 + x86_64). It uses std.http.Client directly, so no HTTP library is needed.
The central abstraction is the Node in src/node.zig (731 lines), the generic agentic loop used by both the REPL and sub-agents. In terminal mode, tools run in parallel using std.Thread.spawn; in silent mode (sub-agents, chain mode), they run sequentially to avoid threads-inside-threads.
Zaica's most distinctive feature is its Wyhash-based loop detection: a ring buffer of tool call signatures detects repeating patterns of length 1, 2, or 3 within a 10-call window, with 3-tier escalation (warning → stronger warning → forced break). The same tool with different arguments gets a different hash, so legitimate iteration isn't flagged. This is the most sophisticated loop detection in this set.
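Here is a sketch of the detection logic described above. FNV-1a stands in for Wyhash, which JavaScript does not ship. Two details are assumptions about things the text leaves open: a pattern must repeat three times before it counts, and escalation is strike-based.

```typescript
type Verdict = "ok" | "warn" | "warn-strong" | "break";

// Stand-in for Wyhash: 32-bit FNV-1a over tool name + serialized arguments,
// so the same tool with different arguments yields a different signature.
function signature(tool: string, args: unknown): number {
  const s = tool + "\u0000" + JSON.stringify(args);
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

class LoopDetector {
  private recent: number[] = []; // bounded window of recent signatures
  private strikes = 0;
  private readonly size: number;
  private readonly repeats: number;

  constructor(size = 10, repeats = 3) {
    this.size = size;
    this.repeats = repeats;
  }

  record(tool: string, args: unknown): Verdict {
    this.recent.push(signature(tool, args));
    if (this.recent.length > this.size) this.recent.shift();
    if (!this.repeating()) return "ok";
    this.strikes++;
    return this.strikes === 1 ? "warn" : this.strikes === 2 ? "warn-strong" : "break";
  }

  // True if the tail of the window is one pattern of length 1-3 repeated.
  private repeating(): boolean {
    for (let len = 1; len <= 3; len++) {
      const need = len * this.repeats;
      if (need > this.recent.length) break;
      const tail = this.recent.slice(-need);
      if (tail.every((sig, i) => sig === tail[i % len])) return true;
    }
    return false;
  }
}
```

Reading four different files never trips the detector. Alternating `grep` and `read` with identical arguments is caught on the sixth call as a length-2 pattern.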
Chain mode (src/chain.zig, 528 lines) implements structured workflows via .chain.md files with per-step tool filtering, variable substitution ({task}, {previous}), and max iterations. No other agent in this set offers chain mode.
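Chain mode's substitution step is small enough to sketch directly. The `ChainStep` shape and `runChain` are hypothetical. Zaica parses steps from `.chain.md` files, and the step runner here is an injected function rather than a real agent loop.

```typescript
// Hypothetical step shape; zaica parses the real thing from .chain.md files.
interface ChainStep {
  prompt: string;
  tools: string[];
  maxIterations: number;
}

type StepRunner = (prompt: string, tools: string[], maxIterations: number) => Promise<string>;

// Replace {task} and {previous}; leave unknown placeholders untouched.
function substitute(template: string, vars: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (whole, key: string) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? vars[key] : whole,
  );
}

// Each step sees only its own tool allowlist and iteration budget, and its
// output becomes {previous} for the next step.
async function runChain(task: string, steps: ChainStep[], run: StepRunner): Promise<string> {
  let previous = "";
  for (const step of steps) {
    const prompt = substitute(step.prompt, { task, previous });
    previous = await run(prompt, step.tools, step.maxIterations);
  }
  return previous;
}
```

Substitution is single-pass, so if `{previous}` happens to contain the literal text `{task}`, it is not expanded again.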
Instead of imperative state management, zaica uses a custom reactive state graph via zefx (Effector-inspired). Events trigger Store reducers, which trigger watchers in a two-phase flush. The status bar is a watcher on derived stores, a unique architecture for a coding agent.
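Here is a compact sketch of the event → reducer → watcher flow with a two-phase flush. The names are illustrative and do not reflect zefx's API. Phase 1 runs every reducer and recomputes derived stores. Phase 2 notifies watchers once per changed store, so a status bar derived from two stores renders once per event, not twice.

```typescript
// Minimal Effector-style graph; names are illustrative, not zefx's API.
class Store<S> {
  value: S;
  private changed = false;
  private watchers: Array<(value: S) => void> = [];

  constructor(initial: S) {
    this.value = initial;
  }

  set(next: S): void {
    if (!Object.is(next, this.value)) {
      this.value = next;
      this.changed = true;
    }
  }

  watch(fn: (value: S) => void): void {
    this.watchers.push(fn);
  }

  // Phase 2: fire watchers once, and only if the value actually changed.
  notify(): void {
    if (!this.changed) return;
    this.changed = false;
    for (const fn of this.watchers) fn(this.value);
  }
}

interface GraphEvent<P> {
  on<S>(store: Store<S>, reducer: (state: S, payload: P) => S): void;
  fire(payload: P): void;
}

class Graph {
  private stores: Array<{ notify(): void }> = [];
  private derivations: Array<() => void> = [];

  store<S>(initial: S): Store<S> {
    const s = new Store(initial);
    this.stores.push(s);
    return s;
  }

  // Derived stores recompute from their sources after reducers run.
  derive<S>(compute: () => S): Store<S> {
    const s = this.store(compute());
    this.derivations.push(() => s.set(compute()));
    return s;
  }

  event<P>(): GraphEvent<P> {
    const reducers: Array<(payload: P) => void> = [];
    return {
      on(store, reducer) {
        reducers.push((p) => store.set(reducer(store.value, p)));
      },
      fire: (payload: P) => {
        // Phase 1: apply every reducer, then recompute derived stores.
        for (const r of reducers) r(payload);
        for (const d of this.derivations) d();
        // Phase 2: watchers see only settled values.
        for (const s of this.stores) s.notify();
      },
    };
  }
}
```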
The 3-tier permission model is elegantly simple: [y]es all / [s]afe only / [n]o, asked once per session. The bash wrapper redirects stdin from /dev/null, kills process trees on timeout, and caps output at 1 MB.
The config system uses a 6-layer JSON priority chain (comptime defaults → provider presets → user config → project config → env vars → CLI flags) with deep object merging. The REPL (src/repl.zig, 2,153 lines) implements manual line editing with full UTF-8 and Cyrillic support.
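Deep merging across the six layers can be sketched as a left-to-right fold, lowest priority first. One rule here is an assumption: arrays and scalars replace, while objects merge key by key. Zaica's merge semantics may differ in details.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Later layers win: nested objects merge key by key, arrays and scalars replace.
function deepMerge(base: JsonObject, override: JsonObject): JsonObject {
  const out: JsonObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const prev = out[key];
    out[key] = isObject(prev) && isObject(value) ? deepMerge(prev, value) : value;
  }
  return out;
}

// Layers ordered lowest to highest priority: comptime defaults, provider
// presets, user config, project config, env vars, CLI flags.
function resolveConfig(layers: JsonObject[]): JsonObject {
  return layers.reduce<JsonObject>((acc, layer) => deepMerge(acc, layer), {});
}
```

Deep merging is what lets a project file override just `ui.theme` without wiping out `ui.statusBar` from the user config.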
Key files
src/node.zig (agentic loop, 731 lines), src/repl.zig (full REPL, 2,153 lines), src/chain.zig (structured workflows, 528 lines), src/state.zig (reactive state, 600 lines), src/tools.zig (7 tools + permissions, 631 lines), src/client/sse.zig (SSE parser, 408 lines), lib/zefx/ (reactive engine)
Dirac — Hash-anchored edits, AST precision, and 64.8% cost reduction
Dirac is a TypeScript coding agent, a fork of Cline, that takes a fundamentally different approach to file editing. Where most agents use line numbers (which drift when the file changes), Dirac uses stable line hashes to anchor edits. When the model reads a file, each line gets a deterministic hash. Edits are then specified as anchor + end_anchor + replacement text rather than start_line + end_line + text. This means edits survive file shifts, and multiple disjoint edits can be applied in a single batch with no coordinate conflicts.
The EditExecutor (src/core/task/tools/handlers/edit-file/EditExecutor.ts) resolves anchors by checking three things: the anchor name starts with a capital letter, it exists in the file's line hash list, and the provided content matches the actual file content at that hash. If any check fails, the edit fails with a diagnostic. The BatchProcessor applies multiple edits in reverse line order (highest line index first), so that earlier edits don't shift the coordinates of later ones.
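Combining anchor validation and reverse-order application gives roughly the following. Several pieces are assumptions: the `AnchoredEdit` shape, the parallel `anchors` array of line names, and the overlap check. Dirac's EditExecutor and BatchProcessor are the real implementations.

```typescript
// Hypothetical shapes: anchors[i] is the name issued for line i when the model
// read the file; names are capitalized, matching the validation described above.
interface AnchoredEdit {
  anchor: string;
  endAnchor: string;
  anchorText: string; // what the model believes the anchor line contains
  replacement: string[];
}

function resolve(anchors: string[], lines: string[], edit: AnchoredEdit): [number, number] {
  for (const name of [edit.anchor, edit.endAnchor]) {
    if (!/^[A-Z]/.test(name)) throw new Error(`invalid anchor name ${name}`);
  }
  const start = anchors.indexOf(edit.anchor);
  const end = anchors.indexOf(edit.endAnchor);
  if (start === -1 || end === -1 || end < start) {
    throw new Error(`unknown anchor range ${edit.anchor}..${edit.endAnchor}`);
  }
  if (lines[start] !== edit.anchorText) throw new Error(`content mismatch at ${edit.anchor}`);
  return [start, end];
}

// Resolve every edit against the unmodified file, then apply from the highest
// start index down so earlier splices never shift later ones.
function applyBatch(anchors: string[], lines: string[], edits: AnchoredEdit[]): string[] {
  const resolved = edits
    .map((e) => ({ range: resolve(anchors, lines, e), replacement: e.replacement }))
    .sort((a, b) => b.range[0] - a.range[0]);
  for (let i = 1; i < resolved.length; i++) {
    if (resolved[i].range[1] >= resolved[i - 1].range[0]) throw new Error("overlapping edits");
  }
  const out = lines.slice();
  for (const { range: [start, end], replacement } of resolved) {
    out.splice(start, end - start + 1, ...replacement);
  }
  return out;
}
```

Note that the first edit below grows its region from one line to two. Because it is applied last, that growth never disturbs the second edit's coordinates.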
Beyond hash-anchored text edits, Dirac has AST-aware tools that target specific symbols (classes, functions, interfaces) directly rather than by text position. Because these edits land on whole syntactic units, they are far less likely to leave a file syntactically broken, and JSDoc comments, decorators, and type annotations are preserved automatically. This is the structural counterpart of the hash-anchored system: both remove the friction of coordinate-based editing.
Dirac's token efficiency is a deliberate engineering target backed by multiple mechanisms: hash-anchored multi-file batching (multiple files edited in a single LLM roundtrip), get_file_skeleton for mapping project structure without reading every line, ContextManager truncation with half/quarter strategies, concurrent tool calling, and a minimal PRIME DIRECTIVES system prompt. On 8 real-world refactoring tasks, Dirac achieved 8/8 correct at an average cost of $0.18, versus $0.38–$0.73 for competitors. That is a 64.8% cost reduction, making Dirac roughly 2.8x cheaper.
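The 64.8% headline and the 2.8x figure are the same ratio expressed two ways. Let r be the fractional cost reduction against a baseline cost. The implied baseline of about $0.51 is an inference from the stated figures, not a number the benchmark reports.

```latex
\begin{aligned}
r &= 1 - \frac{c_{\text{Dirac}}}{c_{\text{base}}},
\qquad \frac{c_{\text{base}}}{c_{\text{Dirac}}} = \frac{1}{1 - r} \\
r = 0.648 &\;\Rightarrow\; \frac{1}{1 - 0.648} = \frac{1}{0.352} \approx 2.84 \\
c_{\text{Dirac}} = \$0.18 &\;\Rightarrow\; c_{\text{base}} = \frac{0.18}{0.352} \approx \$0.51
\end{aligned}
```

Measured against the ends of the stated competitor range, the reduction would be about 52.6% ($0.38) and 75.3% ($0.73), so 64.8% sits between them.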
The state mutex pattern (via p-mutex) is used to serialize all state modifications in the main Task loop. This prevents race conditions between the concurrent tool executor and the main task loop without sacrificing the performance benefit of parallel tool calls. Every state modification (tool results, message updates, task history writes) goes through withStateLock.
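The withStateLock idea can be sketched with a promise chain. Dirac uses the p-mutex package, whose API is not reproduced here. `StateMutex` is a hypothetical hand-rolled equivalent.

```typescript
// Hand-rolled promise mutex (hypothetical); not p-mutex's API.
class StateMutex {
  private tail: Promise<void> = Promise.resolve();

  // Run fn only after every previously queued critical section has settled.
  withStateLock<T>(fn: () => Promise<T> | T): Promise<T> {
    const result = this.tail.then(() => fn());
    // Keep the chain alive even if fn throws, so one failure doesn't wedge the lock.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}
```

Tool calls can still run concurrently. Only the read-modify-write of shared state is serialized, which is exactly the tradeoff described above.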
Dirac has an 8-type hook system: onTaskStart, onTaskComplete, onTaskCancel, onTaskResume, preToolUse, postToolUse, preCompact, and preRequest. Hooks are auto-discovered from AGENTS.md, .claude/, or .agents/ directories. The hook executor runs scripts with structured JSON via stdin and supports cancellation via AbortController. Hooks can return a cancel decision, a contextModification to alter behavior, or an errorMessage.
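Here is a sketch of a hook runner in the shape described above. The event goes to the script as JSON on stdin, the reply comes back as JSON on stdout, and an AbortSignal kills the child. It uses Node's `child_process.spawn` with its `signal` option. The `HookResult` shape and `runHook` name are assumptions, not Dirac's hook-executor.ts.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result shape mirroring the three fields named above.
interface HookResult {
  cancel?: boolean;
  contextModification?: string;
  errorMessage?: string;
}

// Run one hook script: send the event as JSON on stdin, parse JSON from stdout.
function runHook(command: string, args: string[], event: object, signal: AbortSignal): Promise<HookResult> {
  return new Promise<HookResult>((resolve, reject) => {
    const child = spawn(command, args, { signal });
    let stdout = "";
    let stderr = "";
    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => {
      stdout += chunk;
    });
    child.stderr.on("data", (chunk: string) => {
      stderr += chunk;
    });
    // Emits an AbortError (after killing the child) when the signal fires.
    child.on("error", reject);
    // A hook that exits without reading stdin would otherwise raise EPIPE.
    child.stdin.on("error", () => {});
    child.on("close", (code) => {
      if (code !== 0) {
        reject(new Error(`hook exited with code ${code}: ${stderr.trim()}`));
        return;
      }
      try {
        resolve(stdout.trim() === "" ? {} : (JSON.parse(stdout) as HookResult));
      } catch (err) {
        reject(err);
      }
    });
    child.stdin.end(JSON.stringify(event));
  });
}
```

Running hooks as separate processes keeps them language-agnostic. Any script that reads JSON and prints JSON can veto a tool call.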
The git checkpoint system creates commits before risky operations, enabling revert. Plan/Act mode separation is first-class: Plan mode gathers information and presents a strategy before asking for user approval to switch to Act mode. YOLO mode (dirac -y) runs fully autonomously with auto-approval.
Shell command validation uses DIRAC_COMMAND_PERMISSIONS, a JSON object with allow/deny glob patterns that is more flexible than a ban list. The subagent system (SubagentToolHandler) spawns isolated children with their own configuration. Skills are auto-discovered from AGENTS.md, .claude/, and .agents/. Provider support covers 40+ APIs, including Anthropic, OpenAI, Google, AWS Bedrock, Azure, and many OpenAI-compatible gateways.
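Here is a sketch of allow/deny glob checking. Three things are assumptions: the `{ allow, deny }` shape for DIRAC_COMMAND_PERMISSIONS, deny-wins precedence, and splitting on `&&`, `||`, `;`, and `|`. The splitting ignores shell quoting, which a production validator would have to handle.

```typescript
// Assumed shape of the DIRAC_COMMAND_PERMISSIONS value; the real schema may differ.
interface CommandPermissions {
  allow?: string[];
  deny?: string[];
}

// Anchored glob match where only "*" is special.
function globMatch(glob: string, text: string): boolean {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(text);
}

// Naive split on shell operators so chained commands are checked one by one.
function segments(command: string): string[] {
  return command
    .split(/&&|\|\||[;|]/)
    .map((s) => s.trim())
    .filter((s) => s !== "");
}

// Deny wins; with no allow list, every non-denied segment passes (assumption).
function isCommandAllowed(command: string, perms: CommandPermissions): boolean {
  const segs = segments(command);
  return (
    segs.length > 0 &&
    segs.every((seg) => {
      if ((perms.deny ?? []).some((g) => globMatch(g, seg))) return false;
      return perms.allow === undefined || perms.allow.some((g) => globMatch(g, seg));
    })
  );
}
```

Checking each segment separately is what keeps an allow rule like `git *` from approving `git status && rm -rf build`.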
The most notable gap: Dirac does not implement MCP support. This is an explicit design decision, not an oversight. For users who need MCP integration, this is a significant gap. For users who want a tightly integrated, self-contained agent with the best structural edit accuracy in the TypeScript agent space, this is a feature.
Key files
src/core/task/index.ts (1,868 lines: agent loop, state mutex, tool orchestration), src/core/task/tools/handlers/edit-file/EditExecutor.ts (hash-anchor resolution), src/core/task/tools/handlers/edit-file/BatchProcessor.ts (reverse-order batch processor), src/core/context/context-management/ContextManager.ts (half/quarter truncation), src/core/hooks/hook-executor.ts (8-type hook system with streaming and cancellation), src/core/prompts/system-prompt/template.ts (PRIME DIRECTIVES), src/core/api/retry.ts (exponential backoff 2s, 4s, 8s), cli/man/dirac.1.md (full CLI reference)
Important caveat
"Best designed" here means best aligned between code and product intent, not "best for every user." Kimi is more protocol-focused, OpenHands is more platform-shaped, DeerFlow is more compositional, and Neovate is more security-conscious about shell execution. Different goals produce different tradeoffs.