Repository Study • 22 agents • Local snapshot

The Agent Harness Field Guide

This blog audits the code that is actually present in coding-agents\: how each agent wires models, exposes tools, runs shell commands, and speaks MCP or ACP, and where Claude Code and Hermes feel fundamentally different from the rest of the field. It also covers Oh My Pi's hashline-heavy Bun + Rust fork of Pi Mono, OpenCode's client-server runtime, Reasonix's exact-match edit gate, ADK-Rust's feature-gated Rust framework stack, CheetahClaws's Python-native daemon-plus-kernel architecture, and OpenAI's own Codex CLI, a Rust-native agent with platform-specific sandboxes and bidirectional MCP support.

Figure: hand-drawn diagram grouping coding agents into bespoke runtimes, provider multiplexers, protocol bridges, orchestration platforms, and special cases around a local repo snapshot.

The repo set breaks into a handful of recurring runtime shapes. Architecture goes deep on the families, while Protocols adds the MCP versus ACP map.


What this deep dive covers

The interesting part of coding agents is not the marketing layer. It is the runtime beneath it: whether the tool layer is generic or bespoke, whether shell access is tightly guarded or casually wrapped, whether model support is truly abstracted or just superficially multiplexed, and whether the repo reads like a productized operating environment or a fast-moving integration shell.

I read the local repositories for Pochi, Neovate Code, Mux, Crush, Kimi CLI, Qwen Code, OpenHands, Claude Code, DeerFlow, Hermes Agent (by Nous Research), Pi Mono, Oh My Pi, OpenCode, Codex CLI (by OpenAI), Wintermolt, Zaica, Goose, Dirac, Reasonix, CodeWhale, CheetahClaws, and Open Claude Code 2.0 (a clean-room implementation via AI decompilation), then mapped them against the same questions on every page of this site.

🔍

Methodology note

This is a local-only repo study. I did not use outside documentation beyond what is already checked into the worktree. That matters most for OpenHands, where the repo itself says the newer V1 agent core now lives elsewhere.

Repos audited

Pochi - TypeScript monorepo with vendor-specific model adapters and built-in agents.
Neovate Code - TypeScript CLI with AI SDK providers, MCP, and hardened bash tooling.
Mux - large TypeScript desktop/browser agent platform with workspaces and provider routing.
Crush - Go-based terminal product with custom tools, permissions, and provider metadata plumbing.
Kimi CLI - Python terminal agent focused on Kimi platform flows plus ACP and MCP bridges.
Qwen Code - Gemini CLI descendant with strong config resolution, declarative tools, and MCP lifecycle management.
OpenHands - platform/runtime repo with sandbox and legacy CodeAct agent pieces still present locally.
Claude Code - deeply integrated Bun + React/Ink runtime centered on Anthropic models and a huge internal tool surface.
DeerFlow - LangGraph-based super-agent harness with middleware, subagents, SSE streaming, and config-driven model factories.
Hermes Agent (Nous Research) - Python agent with persistent skill learning, 14+ messaging platform gateways (Telegram, Discord, Slack, WhatsApp, Signal, WeChat, Matrix, Mattermost, Feishu, DingTalk, Email, SMS, HomeAssistant, Webhook), MoA synthesis, 6 execution backends, and RL training infrastructure.
Codex CLI (OpenAI) - Rust-native coding agent (3,805 files, 70+ crates) with macOS Seatbelt, Linux bubblewrap/Landlock, Windows sandbox, MCP client and server, multi-agent spawning, OpenAI Responses API plumbing, configurable TOML config, IDE extensions, and a Ratatui TUI.
Pi Mono - TypeScript minimalist harness (874 files) with tree-structured JSONL v3 sessions, differential TUI rendering, Pi Packages (shareable bundles via npm/git), 26 providers across 10 APIs, parallel tool execution, file mutation queue, 4 run modes, MIT license, author Mario Zechner.
Oh My Pi - Bun + TypeScript + Rust fork of Pi Mono (v14.7.8, MIT) with default hashline editing, a Rust native engine, MCP and plugin discovery, LSP, Python, browser tooling, task/subagent execution, and a dedicated edit-benchmark package.
OpenCode - TypeScript/Bun client-server coding runtime (4,531 files) with a SolidJS opentui terminal UI, web and desktop clients, apply_patch plus exact-string editing, wildcard permissions, worktrees, MCP, ACP, skills, and an SDK.
ADK-Rust - Rust framework workspace (v0.8.0, Apache-2.0, 34 workspace members) with feature-tiered crates, a minimal LLM trait, provider-native Anthropic/OpenAI tool wrappers, graph workflows, MCP integration, A2A APIs, AWP deployment support, and optional browser/realtime/sandbox modules.
Wintermolt - Zig 0.15 native binary (3 MB, zero runtime) with 6 AI backends, 16 tools, cron scheduling, Tailscale mesh, camera vision, browser automation, MCP bidirectional, chat bridges to 4 platforms, and a macOS menu bar app.
Zaica - Zig 0.15 focused coding agent (~9,100 lines, zero runtime) with multi-provider LLM support, chain-mode structured workflows, parallel sub-agent dispatch, reactive state management (zefx), and Wyhash-based loop detection.
Goose (AAIF/Linux Foundation) - Rust-native AI agent (v1.32.0, Apache-2.0) with 15+ providers, 5-layer security inspector stack, LLM-based AdversaryInspector, 4 GooseModes (Auto/Approve/SmartApprove/Chat), extension system, recipe framework, and MOIM injection.
Open Claude Code 2.0 - Clean-room implementation of Claude Code via AI-powered decompilation (1,581 tests, async generator architecture, multi-agent teams, git worktree isolation, 7-type hook system).
Dirac - TypeScript fork of Cline with hash-anchored parallel edits, AST-native precision, multi-file batching, a 64.8% cost reduction versus competitors, no MCP, a hook system, git checkpoints, a state mutex, and 40+ provider support. Scores 8/8 on TerminalBench 2.0 evals.

The landscape in one screen

🧠

Bespoke runtime products

Claude Code and Crush feel like full terminal operating environments, not thin wrappers. Their tool, permission, and UX layers are part of the product, not just adapters around a chat loop.

Most opinionated

🔌

Provider multiplexers

Mux, Neovate, and Qwen Code all build serious provider catalogs and shared abstractions. They want broad model reach more than a single model-native identity.

Most configurable
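To make "shared abstraction" concrete, here is a minimal Python sketch of the general registry shape: a `provider/model` string resolved through a table of client factories. Every name in it (`ProviderRegistry`, `parse_model_ref`, the `echo` provider, the `default` fallback) is hypothetical and is not taken from Mux, Neovate, or Qwen Code, whose real implementations are TypeScript and considerably richer.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A "client" here is just a function from prompt to reply (an assumption
# made to keep the sketch self-contained).
Client = Callable[[str], str]
ClientFactory = Callable[[str], Client]


@dataclass(frozen=True)
class ModelRef:
    provider: str
    model: str


def parse_model_ref(spec: str, default_provider: str = "default") -> ModelRef:
    """Split "provider/model" on the first slash; bare names use the default."""
    provider, sep, model = spec.partition("/")
    if not sep:
        return ModelRef(default_provider, spec)
    return ModelRef(provider, model)


class ProviderRegistry:
    """Maps provider names to factories that build a client for one model."""

    def __init__(self) -> None:
        self._factories: Dict[str, ClientFactory] = {}

    def register(self, provider: str, factory: ClientFactory) -> None:
        self._factories[provider] = factory

    def resolve(self, spec: str) -> Client:
        ref = parse_model_ref(spec)
        if ref.provider not in self._factories:
            raise KeyError(f"unknown provider: {ref.provider}")
        return self._factories[ref.provider](ref.model)


registry = ProviderRegistry()
# Hypothetical provider that echoes the prompt back, standing in for a real API.
registry.register("echo", lambda model: (lambda prompt: f"{model}:{prompt}"))
client = registry.resolve("echo/tiny")
```

The point of the shape is that the agent loop only ever sees `Client`; anything model-specific lives behind the factory.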

🧩

Protocol and adapter layers

Pochi and Kimi CLI stand out for their ecosystem bridges. Pochi ships vendor-specific packages for Codex, Qwen, Copilot, and others. Kimi invests heavily in ACP and in translating internal tool output into protocol-friendly shapes.

Most bridge-heavy

🏗️

Client-server coding runtimes

OpenCode feels less like a single CLI and more like a shared runtime: one backend powering TUI, web, desktop, SDK, MCP, ACP, skills, worktrees, and a permission bus. It is the clearest open-source example here of a terminal agent growing into a platform.

Most runtime-shaped

🧱

Frameworks and execution harnesses

DeerFlow, OpenHands, and ADK-Rust are less about one CLI persona and more about reusable orchestration environments. DeerFlow and OpenHands lean into services and sandboxes. ADK-Rust pushes furthest toward a publishable crate ecosystem with feature tiers, graph workflows, provider-native tools, A2A, AWP, and payment adapters.

Most framework-shaped

The Pi lineage: kernel to power-tool fork

Pi Mono is still the sharp minimalist kernel: tree-structured JSONL sessions, differential TUI rendering, Pi Packages, and a tight extension-first philosophy. Oh My Pi takes that lineage in the opposite direction with Bun workspaces, Rust native modules, default hashline edits, MCP, plugins, browser tooling, a task tool, and an edit benchmark lab.

Most visible fork split
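A tree-structured JSONL session can be pictured as an append-only log where each entry points at its parent, so a branch is recovered by walking parent links from a leaf. The sketch below is an illustration under that assumption only: the field names `id`, `parent`, and `text` are made up for this example and are not Pi Mono's actual v3 schema.

```python
import json
from typing import Dict, List, Optional


def load_entries(jsonl_text: str) -> Dict[str, dict]:
    """Parse one JSON object per line and index the entries by id."""
    entries: Dict[str, dict] = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            entry = json.loads(line)
            entries[entry["id"]] = entry
    return entries


def branch_to(entries: Dict[str, dict], leaf_id: str) -> List[dict]:
    """Return the root-to-leaf path by following parent pointers."""
    path: List[dict] = []
    current: Optional[str] = leaf_id
    while current is not None:
        entry = entries[current]
        path.append(entry)
        current = entry.get("parent")
    path.reverse()
    return path


# Two attempts fork from the same user message; "d" continues the second one.
SESSION = "\n".join(
    json.dumps(entry)
    for entry in [
        {"id": "a", "parent": None, "text": "fix the bug"},
        {"id": "b", "parent": "a", "text": "attempt 1"},
        {"id": "c", "parent": "a", "text": "attempt 2"},
        {"id": "d", "parent": "c", "text": "follow-up"},
    ]
)

branch_ids = [entry["id"] for entry in branch_to(load_entries(SESSION), "d")]
```

Because entries are only ever appended, abandoning a branch costs nothing: the old entries stay in the file and simply stop being on the active path.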

Self-improving multi-platform agents

Hermes Agent (Nous Research) is in a category of its own: a persistent skill-learning loop, six remote execution backends, 14+ messaging platform gateways (Telegram, Discord, Slack, WhatsApp, Signal, WeChat, Matrix, Mattermost, Feishu, DingTalk, Email, SMS, HomeAssistant, Webhook), MoA synthesis, and RL training infrastructure.

Most functionally unique

Zero-runtime native binaries

Wintermolt and Zaica are both written entirely in Zig 0.15 — no Node.js, no Python, no garbage collector. They compile to single native binaries (Wintermolt is 3 MB) with cross-compilation to any Zig target including ARM boards. Wintermolt goes wide (7 modes, cron, Tailscale, camera, browser, MCP, chat bridges). Zaica goes deep (chain-mode workflows, reactive state, Wyhash loop detection).

Most portable
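Hash-based loop detection is easy to sketch: fingerprint each tool call and flag a loop when the same fingerprint recurs too often within a recent window. This is my own illustration, not Zaica's code. Python's standard library has no Wyhash, so `hashlib.blake2b` stands in for it, and the window size, threshold, and class name are all assumptions.

```python
import hashlib
import json
from collections import deque
from typing import Deque


class LoopDetector:
    """Flags a loop when one tool call repeats `threshold` times in a window."""

    def __init__(self, window: int = 8, threshold: int = 3) -> None:
        self._recent: Deque[str] = deque(maxlen=window)
        self._threshold = threshold

    @staticmethod
    def _fingerprint(tool: str, args: dict) -> str:
        # Sorted keys make the fingerprint independent of argument order.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
        # blake2b is a stand-in here; the source describes Wyhash.
        return hashlib.blake2b(payload.encode("utf-8"), digest_size=8).hexdigest()

    def record(self, tool: str, args: dict) -> bool:
        """Record a call and return True if it now looks like a loop."""
        fingerprint = self._fingerprint(tool, args)
        self._recent.append(fingerprint)
        return self._recent.count(fingerprint) >= self._threshold
```

A harness would call `record` before dispatching each tool call and, on `True`, interrupt the model with a message instead of executing the call again.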

⚡

OpenAI's own coding agent

Codex CLI is OpenAI's production coding agent — a Rust workspace of 70+ crates with platform-specific sandboxes (Seatbelt, bubblewrap/Landlock, Windows restricted tokens), bidirectional MCP (client and server), multi-agent job execution, and a configurable provider system that supports Ollama and LM Studio.

Most sandboxed

Extension-first security advocates

Goose (AAIF at Linux Foundation) takes a unique approach to security: it uses an LLM-based AdversaryInspector that fires a second LLM call to review tool calls against user-defined rules from ~/.config/goose/adversary.md. This is defense-in-depth for multi-agent setups where parent agents delegate to sub-agents. Goose is Rust-native, with 15+ providers, 4 GooseModes (Auto/Approve/SmartApprove/Chat), a recipe framework, and extensive feature gates (local-inference, aws-providers, otel).

Most LLM-secured
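The second-LLM review pattern can be sketched as a gate that builds a prompt from the rules plus the proposed tool call, asks a reviewer model, and fails closed on anything other than an explicit allow. Goose itself is Rust; this Python sketch is hypothetical throughout: the `ALLOW`/`BLOCK` reply format, the `review_tool_call` function, and the stub reviewer are assumptions, not Goose's AdversaryInspector API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    reason: str


# The reviewer is any function from prompt text to the model's raw reply.
Reviewer = Callable[[str], str]


def review_tool_call(call: ToolCall, rules: str, reviewer: Reviewer) -> Verdict:
    """Ask a second model to judge a tool call against user-written rules."""
    prompt = (
        f"Rules:\n{rules}\n\n"
        f"Proposed tool call:\n{call.name}({call.arguments})\n\n"
        "Reply with ALLOW or BLOCK, then a short reason."
    )
    reply = reviewer(prompt).strip()
    decision, _, reason = reply.partition(" ")
    if decision.upper() == "ALLOW":
        return Verdict(True, reason)
    # Fail closed: malformed or unexpected replies are treated as a block.
    return Verdict(False, reason or "reviewer did not return ALLOW")


def stub_reviewer(prompt: str) -> str:
    """Deterministic stand-in for a real LLM call, for demonstration only."""
    if ".ssh" in prompt:
        return "BLOCK touches ssh keys"
    return "ALLOW looks routine"
```

The important design choice is the fail-closed default: a reviewer that times out, rambles, or returns an unexpected format should never be read as approval.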

Fast takeaways

Question	Best answer from this snapshot	Why
Which repo feels most different?	Claude Code	It is the least generic and the most integrated: Anthropic-first, huge tool catalog, plan/worktree/team flows, React terminal UI, permission system, and a massive central query runtime.
Which repos are most model-agnostic?	Mux, Neovate, Qwen Code, Goose	All invest in provider registries, routing layers, and shared config resolution instead of pinning themselves to one native model family. Goose ships 35+ provider modules across direct APIs, ACP bridges, and declarative JSON configs.
Which repo adapts multiple ecosystems most explicitly?	Pochi	It does not stop at a generic provider interface; it ships vendor-specific packages for Codex, Qwen Code, GitHub Copilot, Gemini CLI, and more.
Which repo is most reusable as a framework?	ADK-Rust	It is organized as a 34-member Rust workspace with a minimal default tier, publishable crates for agents/models/tools/server, and explicit add-on layers for graph workflows, payments, AWP, browser automation, realtime, and sandboxing.
Which repo treats edit reliability like an engineering lab?	Oh My Pi	Default hashline mode, prompt helpers for real anchors, separator tuning, compact previews, a dedicated `typescript-edit-benchmark` package, and benchmark scripts for the hashline variant all point to a repo that is explicitly iterating on mechanical edit failure modes.
Which shell tooling is most safety-conscious?	Neovate Code, Claude Code, and Goose	Neovate hard-codes command bans and high-risk detection (22-item banned list, quote-aware pipeline parser), while Claude layers permissions, tree-sitter AST analysis, and Zsh-specific attack detection over a richer command surface. Hermes uses supply-chain verification (cosign provenance) for its execution environment. Goose uniquely uses an LLM-based AdversaryInspector that fires a second LLM call to review tool calls against user-defined rules.
Which repo has the most unique capabilities?	Hermes Agent	Self-improving skill loop, 6 remote backends, multi-platform IM gateways, MoA synthesis across 4 frontier models, and RL training infrastructure — none of which appear anywhere else in this set.
Which repo is hardest to judge from local code alone?	OpenHands	The local repo still contains useful architecture, but its own docs say the newer V1 agent core moved to a separate Software Agent SDK repository.
Which code feels most polished?	Claude Code, Crush, Mux, Qwen Code	These four snapshots show the clearest internal consistency between product goals, tool design, configuration, and error handling. Crush is notable for being the only agent with native LSP diagnostics and Sourcegraph code search as first-class tools.
Which repo is the most extension-friendly?	Pi Mono	A tight minimalist harness that deliberately ships without MCP, permissions, or sub-agents — expecting you to compose them via extensions. Pi Packages let you bundle and share configurations across projects via npm or git.
Which repo feels most like a platform runtime?	OpenCode	One backend drives the TUI, browser console, desktop shell, MCP, ACP, skills, worktrees, and SDK. It reads more like a small agent platform than a one-window CLI.
Which repo has the most platform-specific sandboxing?	Codex CLI	Three separate sandbox implementations (macOS Seatbelt, Linux bubblewrap/Landlock, and Windows restricted tokens), each with split-filesystem awareness and carveout support. It also doubles as an MCP server for other agents.
Which repo handles web-grounded research best?	DeerFlow overall; Crush if you want no extra search API bill	DeerFlow combines free default search/fetch with the most explicit deep-research methods in the current snapshot. Crush is the strongest product-style coding agent that pairs free DuckDuckGo search with a delegated multi-step fetch workflow. See Web Research for the full comparison, including Wintermolt, Claude Code, Codex, and Reasonix.
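The shell-safety row mentions a banned list combined with a quote-aware pipeline parser. A minimal sketch of that idea follows: split the command on `|`, `;`, and `&` only when they appear outside quotes, then check the first word of each segment against a deny set. The `BANNED` set is illustrative and is not Neovate's actual 22-item list, and the sketch deliberately ignores command substitution, backticks, and leading `VAR=value` assignments, which a real guard would have to handle.

```python
import os
import shlex
from typing import List

# Illustrative deny set only; not the list any audited repo ships.
BANNED = {"curl", "wget", "nc", "sudo"}


def split_pipeline(command: str) -> List[str]:
    """Split on |, ; and & that appear outside single or double quotes."""
    segments: List[str] = []
    buf: List[str] = []
    quote = None
    i = 0
    while i < len(command):
        ch = command[i]
        if quote:
            buf.append(ch)
            # Backslash escapes only apply inside double quotes.
            if ch == "\\" and quote == '"' and i + 1 < len(command):
                buf.append(command[i + 1])
                i += 1
            elif ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            buf.append(ch)
        elif ch == "\\" and i + 1 < len(command):
            buf.extend([ch, command[i + 1]])
            i += 1
        elif ch in "|;&":
            segments.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    segments.append("".join(buf))
    return [s.strip() for s in segments if s.strip()]


def banned_commands(command: str) -> List[str]:
    """Return the banned program names found at the head of any segment."""
    hits: List[str] = []
    for segment in split_pipeline(command):
        try:
            words = shlex.split(segment)
        except ValueError:
            # Unbalanced quotes: treat as suspicious rather than guessing.
            hits.append("<unparseable>")
            continue
        if words and os.path.basename(words[0]) in BANNED:
            hits.append(os.path.basename(words[0]))
    return hits
```

Quote awareness matters because a naive `command.split("|")` would flag `echo "a | curl b"` as a pipeline into `curl`, while a plain substring search would miss `/usr/bin/curl` hidden behind a path.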

Approximate codebase size by file count

File count is not the same thing as quality, but it does reveal where the implementation surface is broadest.

ADK-Rust is intentionally omitted here: its 34-member framework workspace and extracted companion repos make raw file-count comparison less useful than with single-product CLIs.

Repo	Approximate size
OpenCode	4,531 files
Codex CLI	3,805 files
OpenHands	2,774 files
Mux	2,226 files
Claude Code	2,137 files
Qwen Code	2,038 files
Pochi	1,315 files
Kimi CLI	899 files
Pi Mono	874 files
DeerFlow	810 files
Crush	799 files
Neovate	582 files
Hermes	~450 files
Wintermolt	51 Zig files (~18,400 lines)
Zaica	~13 files (~9,100 lines)
Open Claude Code 2.0	61 files (~8,300 lines)
Goose	Rust Cargo workspace (~6+ crates)
Dirac	TypeScript monorepo (fork of Cline)

My high-level verdict

Best designed, if you value a coherent product runtime

Claude Code is the standout. It is not the most provider-flexible repo, but it is the clearest example of an agent built as its own operating model: tool schemas, permissioning, commands, tasking, worktrees, UI, feature flags, and retry logic all sit inside one deliberate runtime.

Best designed, if you value clean systems engineering

Crush is the nicest surprise. The Go codebase feels disciplined, modular, and product-minded without being bloated. Its provider plumbing, permissions, and TUI organization are easier to reason about than many faster-moving TypeScript peers.

Best multi-model architecture

Mux and Qwen Code lead here. Mux has a broad provider routing layer with desktop app ambitions, while Qwen Code has a particularly strong configuration and runtime model-resolution story.

Most extensible framework shape

DeerFlow wins on composability. It feels more like a harness for building agent systems than a single agent persona, which makes it powerful but also less opinionated than Claude Code or Crush.

Most publishable framework ecosystem

ADK-Rust. The repo is built as a crate ecosystem, not just a runnable app: minimal-by-default packaging, typed tools, workflow agents, graph orchestration, A2A/AWP surfaces, and an honest stability file that separates mature crates from frontier modules.

Most functionally unique

Hermes Agent by Nous Research. The self-improving skill loop, 14+ messaging platform gateways, MoA tool (4 frontier models in parallel), and RL training infrastructure are not features in any other repo here. It is the only agent that explicitly tries to get better at your tasks over time.

Most portable — zero runtime, one binary

Wintermolt and Zaica are the only agents here that compile to a single native binary with zero runtime dependency. Wintermolt (3 MB, 18,400 lines) has the widest feature scope in this set, regardless of implementation language. Zaica (~9,100 lines) is the most focused coding specialist, with chain-mode workflows and the strongest loop detection I saw in this snapshot.

Most extension-friendly kernel

Pi Mono by Mario Zechner. A tight 874-file TypeScript kernel with tree-structured JSONL v3 sessions, differential TUI rendering, 26 providers across 10 APIs, Pi Packages (shareable bundles via npm/git), parallel tool execution, a file mutation queue, and 4 run modes. MIT licensed and deliberately minimal so you can build MCP, permissions, or sub-agents yourself.
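Parallel tool execution and a file mutation queue fit together naturally: tools run concurrently, but writes to the same path are serialized. The sketch below shows one way to get that behavior with a lock per path. It is an assumption about the general technique, written in Python with `asyncio`; the `FileMutationQueue` class and its `mutate` method are hypothetical and do not reflect Pi Mono's TypeScript implementation.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List, TypeVar

T = TypeVar("T")


class FileMutationQueue:
    """Serializes mutations per path while leaving other paths concurrent."""

    def __init__(self) -> None:
        # One lock per path, created lazily on first use.
        self._locks: Dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def mutate(self, path: str, action: Callable[[], Awaitable[T]]) -> T:
        async with self._locks[path]:
            return await action()


async def demo() -> List[str]:
    queue = FileMutationQueue()
    log: List[str] = []

    async def edit(path: str, label: str, delay: float) -> None:
        async def action() -> None:
            log.append(f"start {label}")
            await asyncio.sleep(delay)  # stands in for the actual file write
            log.append(f"end {label}")

        await queue.mutate(path, action)

    # Two edits target a.txt and must not interleave; b.txt is independent.
    await asyncio.gather(
        edit("a.txt", "a1", 0.02),
        edit("a.txt", "a2", 0.0),
        edit("b.txt", "b1", 0.0),
    )
    return log


log = asyncio.run(demo())
```

In the resulting log, `a2` never starts before `a1` ends, while `b1` is free to run in the gap.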

Most serious edit-lab fork

Oh My Pi. It starts from Pi Mono's terminal-agent kernel and then turns editing into a research surface: default hashline mode, prompt/runtime anchor helpers, a native engine, MCP and plugin plumbing, built-in task execution, and benchmark infrastructure dedicated to edit variants.
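As I understand the hashline idea, each line is shown to the model with a short content hash, and an edit must cite that hash so the harness can reject edits aimed at text that has since changed. The sketch below illustrates that mechanism only. The `number:hash|text` rendering, the 8-character SHA-1 prefix, and the function names are my assumptions for illustration, not Oh My Pi's actual anchor format.

```python
import hashlib
from typing import List


class StaleAnchorError(ValueError):
    """Raised when an edit cites a line whose content no longer matches."""


def line_hash(text: str) -> str:
    # Short prefix of SHA-1; the real hash function and length are unknown here.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:8]


def render_with_anchors(lines: List[str]) -> List[str]:
    """Produce the view a model would read: 'number:hash|original text'."""
    return [
        f"{number}:{line_hash(line)}|{line}"
        for number, line in enumerate(lines, start=1)
    ]


def replace_line(lines: List[str], anchor: str, new_text: str) -> List[str]:
    """Replace one line, but only if the cited anchor still matches."""
    number_str, _, expected = anchor.partition(":")
    index = int(number_str) - 1
    if not 0 <= index < len(lines):
        raise StaleAnchorError(f"line {number_str} is out of range")
    if line_hash(lines[index]) != expected:
        raise StaleAnchorError(f"line {number_str} changed since it was read")
    updated = list(lines)
    updated[index] = new_text
    return updated
```

Compared with exact-string search and replace, the model only has to reproduce a short anchor rather than the full original text, which removes a whole class of whitespace and quoting mismatches.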

Most complete open-source runtime

OpenCode. The repo combines a terminal UI, browser console, desktop shell, SDK, MCP, ACP, worktrees, skills, and a permission bus behind one backend runtime. It is the clearest open-source "agent platform" in this snapshot.
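The wildcard permissions listed in the OpenCode entry can be illustrated with glob-style rules evaluated in order, where the last matching rule wins. This is a sketch of the general pattern in Python using `fnmatch`, not OpenCode's rule engine: the rule tuples, the `allow`/`ask`/`deny` actions, and the last-match-wins policy are all assumptions.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

# (pattern, action) pairs; later rules override earlier ones.
Rule = Tuple[str, str]


def decide(rules: List[Rule], command: str, default: str = "ask") -> str:
    """Return the action of the last rule whose pattern matches the command."""
    decision = default
    for pattern, action in rules:
        if fnmatchcase(command, pattern):
            decision = action
    return decision


# Hypothetical policy: ask by default, allow read-only git, never push.
RULES: List[Rule] = [
    ("*", "ask"),
    ("git status*", "allow"),
    ("git diff*", "allow"),
    ("git push*", "deny"),
]
```

Ordering the broad `*` rule first and the specific rules after it keeps the policy readable: the file reads top to bottom from general to specific.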

Most security-conscious sandboxing

Codex CLI by OpenAI. Three platform-specific sandbox implementations (macOS Seatbelt, Linux bubblewrap/Landlock, Windows restricted tokens), split-filesystem awareness, an execution policy engine with a rule DSL, bidirectional MCP (client and server), and a strict clippy lint policy that bans unwrap_used and expect_used across 70+ crates.

How to read the rest of this site

Architecture - Compare tool schemas, shell execution, MCP support, and recovery patterns.
Agents - Read per-repo profiles, strengths, weaknesses, and fit.
Models - See who is genuinely provider-neutral and who writes model-specific logic.
Claude Code - The dedicated page on why Claude Code feels like a category of its own.
DeerFlow - The LangGraph-based super agent harness with 14-layer middleware, skill evolution, and sub-agent orchestration.
OpenHands - The platform-shaped agent with Docker sandboxing and ingenious temperature-bumping retry logic.
Security - Deep dive into shell injection defense, prompt injection scanning, permissions, sandboxing, and loop detection.
Protocols - MCP and ACP implementation compared: transports, OAuth, lifecycle, and deferred tool loading.
Subagents - How agents delegate work, isolate children, enforce concurrency limits, and collect results.
Hermes Agent - The completely separate deep dive on the most unusual agent in the set: self-improving, multi-platform, and RL-augmented.
Pi Mono - The minimalist kernel: 874 files, tree-structured JSONL v3 sessions, differential TUI, Pi Packages, 26 providers across 10 APIs, parallel tool execution, file mutation queue, 4 run modes, MIT licensed.
OpenCode - The client-server runtime: SolidJS terminal UI, web and desktop clients, apply_patch, wildcard permissions, ACP, MCP, worktrees, and skills.
ADK-Rust - The Rust framework workspace: 34 member crates, feature tiers, provider-native tool wrappers, graph workflows, A2A, AWP, and optional sandbox/browser/realtime modules.
Codex CLI - OpenAI's production agent: 3,805 files, 70+ Rust crates, three platform-specific sandboxes, bidirectional MCP, multi-agent jobs, and IDE extensions.
Wintermolt - The 3 MB everything-agent: 6 backends, 16 tools, cron, Tailscale, camera, browser, MCP, chat bridges, and a macOS menu bar app.
Zaica - The focused specialist: chain-mode workflows, reactive state management, Wyhash loop detection, and a hand-crafted terminal REPL.
Goose - The extension-first Rust agent: LLM-based AdversaryInspector, 4 GooseModes, 15+ providers, recipe framework, and MOIM injection.
Zig Agents - Head-to-head comparison: two agents, one language, opposite philosophies (platform vs. specialist, 18,400 lines vs. ~9,100).
Open Claude Code - Clean-room rebuild of Claude Code v2.1.91 via ruDevolution decompilation: async generator loop, 25 tools, 5 providers, nightly releases.
Dirac - Hash-anchored parallel edits, AST-native precision, 64.8% cost reduction vs competitors, no MCP, 8-type hook system, git checkpoints.
Reasonix - DeepSeek-native coding agent with byte-exact SEARCH/REPLACE, edit-gate review, repair stages, and strict sandbox enforcement.
CodeWhale - DeepSeek-first Rust agent with a constitution prompt, durable task manager, persistent subagents, runtime APIs, and a transactional edit stack.
CheetahClaws - Python-native multi-provider agent with a plugin registry, daemon server, capability-gated kernel, MCP plumbing, and a mixed direct-write plus diff editing stack.