The Agent Harness Field Guide
This blog audits the code that is actually present in
coding-agents\: how each agent wires models, exposes
tools, runs shell commands, speaks MCP or ACP, and where Claude Code
and Hermes feel fundamentally different from the rest of the field —
plus Oh My Pi's hashline-heavy Bun + Rust fork of Pi Mono, OpenCode's
client-server runtime, Reasonix's exact-match edit gate, ADK-Rust's
feature-gated Rust framework stack, CheetahClaws's Python-native
daemon-plus-kernel architecture, and OpenAI's own Codex CLI, a
Rust-native agent with platform-specific sandboxes and bidirectional
MCP support.
What this deep dive covers
The interesting part of coding agents is not the marketing layer. It is the runtime beneath it: whether the tool layer is generic or bespoke, whether shell access is tightly guarded or casually wrapped, whether model support is truly abstracted or just superficially multiplexed, and whether the repo reads like a productized operating environment or a fast-moving integration shell.
I read the local repositories for Pochi, Neovate Code, Mux, Crush, Kimi CLI, Qwen Code, OpenHands, Claude Code, DeerFlow, Hermes Agent (by Nous Research), Pi Mono, Oh My Pi, OpenCode, Codex CLI (by OpenAI), Wintermolt, Zaica, Goose, Dirac, Reasonix, CodeWhale, CheetahClaws, and Open Claude Code 2.0 (a clean-room implementation via AI decompilation), then mapped them against the same questions on every page of this site.
Methodology note
This is a local-only repo study. I did not use outside documentation beyond what is already checked into the worktree. That matters most for OpenHands, where the repo itself says the newer V1 agent core now lives elsewhere.
Repos audited
- Pochi - TypeScript monorepo with vendor-specific model adapters and built-in agents.
- Neovate Code - TypeScript CLI with AI SDK providers, MCP, and hardened bash tooling.
- Mux - large TypeScript desktop/browser agent platform with workspaces and provider routing.
- Crush - Go-based terminal product with custom tools, permissions, and provider metadata plumbing.
- Kimi CLI - Python terminal agent focused on Kimi platform flows plus ACP and MCP bridges.
- Qwen Code - Gemini CLI descendant with strong config resolution, declarative tools, and MCP lifecycle management.
- OpenHands - platform/runtime repo with sandbox and legacy CodeAct agent pieces still present locally.
- Claude Code - deeply integrated Bun + React/Ink runtime centered on Anthropic models and a huge internal tool surface.
- DeerFlow - LangGraph-based super-agent harness with middleware, subagents, SSE streaming, and config-driven model factories.
- Hermes Agent (Nous Research) - Python agent with persistent skill learning, 14+ messaging platform gateways (Telegram, Discord, Slack, WhatsApp, Signal, WeChat, Matrix, Mattermost, Feishu, DingTalk, Email, SMS, HomeAssistant, Webhook), MoA synthesis, 6 execution backends, and RL training infrastructure.
- Codex CLI (OpenAI) - Rust-native coding agent (3,805 files, 70+ crates) with macOS Seatbelt, Linux bubblewrap/Landlock, Windows sandbox, MCP client and server, multi-agent spawning, OpenAI Responses API plumbing, configurable TOML config, IDE extensions, and a Ratatui TUI.
- Pi Mono - TypeScript minimalist harness (874 files) with tree-structured JSONL v3 sessions, differential TUI rendering, Pi Packages (shareable bundles via npm/git), 26 providers across 10 APIs, parallel tool execution, file mutation queue, 4 run modes, MIT license, author Mario Zechner.
- Oh My Pi - Bun + TypeScript + Rust fork of Pi Mono (v14.7.8, MIT) with default hashline editing, a Rust native engine, MCP and plugin discovery, LSP, Python, browser tooling, task/subagent execution, and a dedicated edit-benchmark package.
-
OpenCode - TypeScript/Bun client-server coding
runtime (4,531 files) with a SolidJS opentui terminal UI, web and
desktop clients,
apply_patchplus exact-string editing, wildcard permissions, worktrees, MCP, ACP, skills, and an SDK. - ADK-Rust - Rust framework workspace (v0.8.0, Apache-2.0, 34 workspace members) with feature-tiered crates, a minimal LLM trait, provider-native Anthropic/OpenAI tool wrappers, graph workflows, MCP integration, A2A APIs, AWP deployment support, and optional browser/realtime/sandbox modules.
- Wintermolt - Zig 0.15 native binary (3 MB, zero runtime) with 6 AI backends, 16 tools, cron scheduling, Tailscale mesh, camera vision, browser automation, MCP bidirectional, chat bridges to 4 platforms, and a macOS menu bar app.
- Zaica - Zig 0.15 focused coding agent (~9,100 lines, zero runtime) with multi-provider LLM support, chain-mode structured workflows, parallel sub-agent dispatch, reactive state management (zefx), and Wyhash-based loop detection.
- Goose (AAIF/Linux Foundation) - Rust-native AI agent (v1.32.0, Apache-2.0) with 15+ providers, 5-layer security inspector stack, LLM-based AdversaryInspector, 4 GooseModes (Auto/Approve/SmartApprove/Chat), extension system, recipe framework, and MOIM injection.
- Open Claude Code 2.0 - Clean-room implementation of Claude Code via AI-powered decompilation (1,581 tests, async generator architecture, multi-agent teams, git worktree isolation, 7-type hook system).
- Dirac - TypeScript fork of Cline with hash-anchored parallel edits, AST-native precision, multi-file batching, 64.8% cost reduction, no MCP, hook system, git checkpoints, state mutex, and 40+ provider support. 8/8 on TerminalBench 2.0 evals.
The landscape in one screen
Bespoke runtime products
Claude Code and Crush feel like full terminal operating environments, not thin wrappers. Their tool, permission, and UX layers are part of the product, not just adapters around a chat loop.
Most opinionatedProvider multiplexers
Mux, Neovate, and Qwen Code all build serious provider catalogs and shared abstractions. They want broad model reach more than a single model-native identity.
Most configurableProtocol and adapter layers
Pochi and Kimi CLI stand out for their ecosystem bridges. Pochi ships vendor-specific packages for Codex, Qwen, Copilot, and others. Kimi invests heavily in ACP and in translating internal tool output into protocol-friendly shapes.
Most bridge-heavyClient-server coding runtimes
OpenCode feels less like a single CLI and more like a shared runtime: one backend powering TUI, web, desktop, SDK, MCP, ACP, skills, worktrees, and a permission bus. It is the clearest open-source example here of a terminal agent growing into a platform.
Most runtime-shapedFrameworks and execution harnesses
DeerFlow, OpenHands, and ADK-Rust are less about one CLI persona and more about reusable orchestration environments. DeerFlow and OpenHands lean into services and sandboxes. ADK-Rust pushes furthest toward a publishable crate ecosystem with feature tiers, graph workflows, provider-native tools, A2A, AWP, and payment adapters.
Most framework-shapedThe Pi lineage: kernel to power-tool fork
Pi Mono is still the sharp minimalist kernel: tree-structured JSONL sessions, differential TUI rendering, Pi Packages, and a tight extension-first philosophy. Oh My Pi takes that lineage in the opposite direction with Bun workspaces, Rust native modules, default hashline edits, MCP, plugins, browser tooling, a task tool, and an edit benchmark lab.
Most visible fork splitSelf-improving multi-platform agents
Hermes Agent (Nous Research) is in a category of its own: a persistent skill-learning loop, six remote execution backends, 14+ messaging platform gateways (Telegram, Discord, Slack, WhatsApp, Signal, WeChat, Matrix, Mattermost, Feishu, DingTalk, Email, SMS, HomeAssistant, Webhook), MoA synthesis, and RL training infrastructure.
Most functionally uniqueZero-runtime native binaries
Wintermolt and Zaica are both written entirely in Zig 0.15 — no Node.js, no Python, no garbage collector. They compile to single native binaries (Wintermolt is 3 MB) with cross-compilation to any Zig target including ARM boards. Wintermolt goes wide (7 modes, cron, Tailscale, camera, browser, MCP, chat bridges). Zaica goes deep (chain-mode workflows, reactive state, Wyhash loop detection).
Most portableOpenAI's own coding agent
Codex CLI is OpenAI's production coding agent — a Rust workspace of 70+ crates with platform-specific sandboxes (Seatbelt, bubblewrap/Landlock, Windows restricted tokens), bidirectional MCP (client and server), multi-agent job execution, and a configurable provider system that supports Ollama and LM Studio.
Most sandboxedExtension-first security advocates
Goose (AAIF at Linux Foundation) takes a unique
approach to security: it uses an LLM-based
AdversaryInspector that fires a second LLM call to
review tool calls against user-defined rules from
~/.config/goose/adversary.md. This is
defense-in-depth for multi-agent setups where parent agents
delegate to sub-agents. With 15+ providers, 4 GooseModes
(Auto/Approve/SmartApprove/Chat), and a recipe framework, Goose is
Rust-native with extensive feature gates (local-inference,
aws-providers, otel).
Fast takeaways
| Question | Best answer from this snapshot | Why |
|---|---|---|
| Which repo feels most different? | Claude Code | It is the least generic and the most integrated: Anthropic-first, huge tool catalog, plan/worktree/team flows, React terminal UI, permission system, and a massive central query runtime. |
| Which repos are most model-agnostic? | Mux, Neovate, Qwen Code, Goose | All invest in provider registries, routing layers, and shared config resolution instead of pinning themselves to one native model family. Goose ships 35+ provider modules across direct APIs, ACP bridges, and declarative JSON configs. |
| Which repo adapts multiple ecosystems most explicitly? | Pochi | It does not stop at a generic provider interface; it ships vendor-specific packages for Codex, Qwen Code, GitHub Copilot, Gemini CLI, and more. |
| Which repo is most reusable as a framework? | ADK-Rust | It is organized as a 34-member Rust workspace with a minimal default tier, publishable crates for agents/models/tools/server, and explicit add-on layers for graph workflows, payments, AWP, browser automation, realtime, and sandboxing. |
| Which repo treats edit reliability like an engineering lab? | Oh My Pi |
Default hashline mode, prompt helpers for real anchors,
separator tuning, compact previews, a dedicated
typescript-edit-benchmark package, and benchmark
scripts for the hashline variant all point to a repo that is
explicitly iterating on mechanical edit failure modes.
|
| Which shell tooling is most safety-conscious? | Neovate Code, Claude Code, and Goose | Neovate hard-codes command bans and high-risk detection (22-item banned list, quote-aware pipeline parser), while Claude layers permissions, tree-sitter AST analysis, and Zsh-specific attack detection over a richer command surface. Hermes uses supply-chain verification (cosign provenance) for its execution environment. Goose uniquely uses an LLM-based AdversaryInspector that fires a second LLM call to review tool calls against user-defined rules. |
| Which repo has the most unique capabilities? | Hermes Agent | Self-improving skill loop, 6 remote backends, multi-platform IM gateways, MoA synthesis across 4 frontier models, and RL training infrastructure — none of which appear anywhere else in this set. |
| Which repo is hardest to judge from local code alone? | OpenHands | The local repo still contains useful architecture, but its own docs say the newer V1 agent core moved to a separate Software Agent SDK repository. |
| Which code feels most polished? | Claude Code, Crush, Mux, Qwen Code | These four snapshots show the clearest internal consistency between product goals, tool design, configuration, and error handling. Crush is notable for being the only agent with native LSP diagnostics and Sourcegraph code search as first-class tools. |
| Which repo is the most extension-friendly? | Pi Mono | A tight minimalist harness that deliberately ships without MCP, permissions, or sub-agents — expecting you to compose them via extensions. Pi Packages let you bundle and share configurations across projects via npm or git. |
| Which repo feels most like a platform runtime? | OpenCode | One backend drives the TUI, browser console, desktop shell, MCP, ACP, skills, worktrees, and SDK. It reads more like a small agent platform than a one-window CLI. |
| Which repo has the most platform-specific sandboxing? | Codex CLI | Three separate sandbox implementations — macOS Seatbelt, Linux bubblewrap/Landlock, and Windows restricted tokens — each with split-filesystem awareness and carveout support. Also the only agent in this set that doubles as an MCP server for other agents. |
| Which repo handles web-grounded research best? | DeerFlow overall; Crush if you want no extra search API bill | DeerFlow combines free default search/fetch with the most explicit deep-research methods in the current snapshot. Crush is the strongest product-style coding agent that pairs free DuckDuckGo search with a delegated multi-step fetch workflow. See Web Research for the full comparison, including Wintermolt, Claude Code, Codex, and Reasonix. |
Approximate codebase size by file count
File count is not the same thing as quality, but it does reveal where the implementation surface is broadest.
ADK-Rust is intentionally omitted here: its 34-member framework workspace and extracted companion repos make raw file-count comparison less useful than with single-product CLIs.
OpenCode
4531 filesCodex CLI
3805 filesOpenHands
2774 filesMux
2226 filesClaude Code
2137 filesQwen Code
2038 filesPochi
1315 filesKimi CLI
899 filesDeerFlow
810 filesCrush
799 filesNeovate
582 filesHermes
~450 filesPi Mono
874 filesWintermolt
51 Zig files (~18,400 lines)Zaica
~13 files (~9,100 lines)Open Claude Code 2.0
61 files (~8,300 lines)Goose
Rust Cargo workspace (~6+ crates)Dirac
TypeScript monorepo (fork of Cline)My high-level verdict
Best designed, if you value a coherent product runtime
Claude Code is the standout. It is not the most provider-flexible repo, but it is the clearest example of an agent built as its own operating model: tool schemas, permissioning, commands, tasking, worktrees, UI, feature flags, and retry logic all sit inside one deliberate runtime.
Best designed, if you value clean systems engineering
Crush is the nicest surprise. The Go codebase feels disciplined, modular, and product-minded without being bloated. Its provider plumbing, permissions, and TUI organization are easier to reason about than many faster-moving TypeScript peers.
Best multi-model architecture
Mux and Qwen Code lead here. Mux has a broad provider routing layer with desktop app ambitions, while Qwen Code has a particularly strong configuration and runtime model-resolution story.
Most extensible framework shape
DeerFlow wins on composability. It feels more like a harness for building agent systems than a single agent persona, which makes it powerful but also less opinionated than Claude Code or Crush.
Most publishable framework ecosystem
ADK-Rust. The repo is built as a crate ecosystem, not just a runnable app: minimal-by-default packaging, typed tools, workflow agents, graph orchestration, A2A/AWP surfaces, and an honest stability file that separates mature crates from frontier modules.
Most functionally unique
Hermes Agent by Nous Research. The self-improving skill loop, 14+ messaging platform gateways, MoA tool (4 frontier models in parallel), and RL training infrastructure are not features in any other repo here. It is the only agent that explicitly tries to get better at your tasks over time.
Most portable — zero runtime, one binary
Wintermolt and Zaica are the only agents here that compile to a single native binary with zero runtime dependency. Wintermolt (3 MB, 18,400 lines) is the most ambitious agent in any language. Zaica (~9,100 lines) is the most focused coding specialist with chain-mode workflows and best-in-class loop detection.
Most extension-friendly kernel
Pi Mono by Mario Zechner. A tight 874-file TypeScript kernel with tree-structured JSONL v3 sessions, differential TUI rendering, 26 providers across 10 APIs, Pi Packages (shareable bundles via npm/git), parallel tool execution, a file mutation queue, and 4 run modes. MIT licensed and deliberately minimal so you can build MCP, permissions, or sub-agents yourself.
Most serious edit-lab fork
Oh My Pi. It starts from Pi Mono's terminal-agent kernel and then turns editing into a research surface: default hashline mode, prompt/runtime anchor helpers, a native engine, MCP and plugin plumbing, built-in task execution, and benchmark infrastructure dedicated to edit variants.
Most complete open-source runtime
OpenCode. The repo combines a terminal UI, browser console, desktop shell, SDK, MCP, ACP, worktrees, skills, and a permission bus behind one backend runtime. It is the clearest open-source "agent platform" in this snapshot.
Most security-conscious sandboxing
Codex CLI by OpenAI. Three platform-specific
sandbox implementations (macOS Seatbelt, Linux
bubblewrap/Landlock, Windows restricted tokens), split-filesystem
awareness, an execution policy engine with a rule DSL,
bidirectional MCP (client and server), and a strict clippy lint
policy that bans unwrap_used and
expect_used across 70+ crates.
How to read the rest of this site
Compare tool schemas, shell execution, MCP support, and recovery patterns.
Read per-repo profiles, strengths, weaknesses, and fit.
See who is genuinely provider-neutral and who writes model-specific logic.
The dedicated page on why Claude Code feels like a category of its own.
The LangGraph-based super agent harness with 14-layer middleware, skill evolution, and sub-agent orchestration.
The platform-shaped agent with Docker sandboxing and ingenious temperature-bumping retry logic.
Deep dive into shell injection defense, prompt injection scanning, permissions, sandboxing, and loop detection.
MCP and ACP implementation compared — transports, OAuth, lifecycle, and deferred tool loading.
How agents delegate work, isolate children, enforce concurrency limits, and collect results.
The completely separate deep dive on the most unusual agent in the set — self-improving, multi-platform, and RL-augmented.
The minimalist kernel — 874 files, tree-structured JSONL v3 sessions, differential TUI, Pi Packages, 26 providers across 10 APIs, parallel tool execution, file mutation queue, 4 run modes, MIT licensed.
The client-server runtime — SolidJS terminal UI, web and desktop
clients,
apply_patch, wildcard permissions, ACP, MCP,
worktrees, and skills.
The Rust framework workspace: 34 member crates, feature tiers, provider-native tool wrappers, graph workflows, A2A, AWP, and optional sandbox/browser/realtime modules.
OpenAI's production agent: 3,805 files, 70+ Rust crates, three platform-specific sandboxes, bidirectional MCP, multi-agent jobs, and IDE extensions.
The 3 MB everything-agent: 6 backends, 16 tools, cron, Tailscale, camera, browser, MCP, chat bridges, and a macOS menu bar app.
The focused specialist: chain-mode workflows, reactive state management, Wyhash loop detection, and a hand-crafted terminal REPL.
The extension-first Rust agent: LLM-based AdversaryInspector, 4 GooseModes, 15+ providers, recipe framework, and MOIM injection.
Head-to-head comparison: two agents, one language, opposite philosophies — platform vs. specialist, 18,400 lines vs. ~9,100.
Clean-room rebuild of Claude Code v2.1.91 via ruDevolution decompilation: async generator loop, 25 tools, 5 providers, nightly releases.
Hash-anchored parallel edits, AST-native precision, 64.8% cost reduction vs competitors, no MCP, 8-type hook system, git checkpoints.
DeepSeek-native coding agent with byte-exact SEARCH/REPLACE, edit-gate review, repair stages, and strict sandbox enforcement.
DeepSeek-first Rust agent with a constitution prompt, durable task manager, persistent subagents, runtime APIs, and a transactional edit stack.
Python-native multi-provider agent with a plugin registry, daemon server, capability-gated kernel, MCP plumbing, and a mixed direct-write plus diff editing stack.