Providers • Model config • Tuning depth

Generic adapters vs model-native engineering

Almost every repo says it supports many models. The real question is how deep that support goes: shared config only, or real provider-specific behavior, auth, and prompt or tool adaptation?

(Alright, ad over. Back to the serious technical analysis.)

The key split

Provider-agnostic systems

Mux, Neovate, Qwen Code, Pi Mono, Oh My Pi, and OpenCode put significant effort into treating providers as interchangeable enough to route, catalog, and configure them through shared abstractions.

Model-native systems

Claude Code, Kimi CLI, and parts of DeerFlow or Pochi are more willing to write code that is obviously shaped around a specific model family or platform.

Model plumbing matrix

Repo	Provider strategy	Model-specific tuning depth	Key evidence from the repo
Claude Code	Anthropic-centered	Very high	Anthropic SDK dependency, model-first runtime assumptions, and deep integration with Claude-style capabilities
Pochi	Shared config plus vendor-specific packages	High	Separate packages for Codex, Qwen Code, GitHub Copilot, Gemini CLI, and others
DeerFlow	Config-driven class loading	High	Factory chooses provider classes; vLLM provider preserves Qwen reasoning fields; Codex and Claude CLI-backed models appear in examples
Kimi CLI	Platform-centered with configurable providers	Medium-high	Moonshot and Kimi platform auth flows, managed model refresh, ACP handling
Mux	Provider catalog and routing factory	Medium	Known model catalog, provider constants, fetch wrapping, gateway and local-provider logic
Qwen Code	Unified resolver with special auth paths	Medium	ModelConfigResolver, models registry, runtime snapshots, Qwen OAuth path alongside generic APIs
Neovate Code	Large explicit provider matrix	Medium	Many provider files and shared provider types under one AI SDK-style layer
OpenCode	AI SDK router plus broad provider package set	Medium-high	`packages/opencode/package.json` pulls in Anthropic, OpenAI, Google, Azure, Bedrock, Mistral, Cohere, Groq, Cerebras, OpenRouter, TogetherAI, Alibaba, and more; `session/system.ts` switches prompt files by model family.
ADK-Rust	Minimal core trait plus feature-gated provider adapters	Medium-high	`adk-core/src/model.rs` keeps `Llm` intentionally small, while `adk-model/src/lib.rs` fans out Gemini, OpenAI, Anthropic, Bedrock, Ollama, and a broad OpenAI-compatible family with provider metadata carried on responses when needed.
Pi Mono	Subscription + API-key provider matrix via `@mariozechner/pi-ai`	High breadth	The current README lists 26 providers across 10 APIs, with built-in model registries, thinking-level suffixes, and custom providers via `models.json` or extensions.
Oh My Pi	Role-based routing plus broad provider discovery	High breadth	The README advertises model roles for `default`, `smol`, `slow`, `plan`, and `commit`, along with multi-credential routing, many provider integrations, and discovery of config from multiple coding-tool ecosystems.
Crush	Generic provider abstraction with productized metadata handling	Medium	Provider metadata fetching and caching in Go; broad support without a single dominant native model identity
OpenHands	Historically broad via LiteLLM-style compatibility	Hard to judge	The local repo shows enough to infer flexibility, but not enough to fully score the current V1 agent core
Hermes	Model-neutral via OpenRouter with 200+ model support	High breadth	No native model bias; uses OpenRouter as the abstraction layer. MoA tool runs claude-opus, gemini-pro, gpt-5, deepseek simultaneously.

The Qwen Code five-layer config resolver

Qwen Code's ModelConfigResolver in packages/core/src/models/modelConfigResolver.ts is the most rigorous model configuration system in this TypeScript set. It defines five typed source layers with explicit precedence (highest to lowest):

modelProvider

Explicit ModelProviders config selection — highest authority. Maps to modelProvidersSource type.

↓

CLI arguments

Flags like --model, --openaiApiKey. Maps to cliSource.

↓

Environment variables

OPENAI_API_KEY, OPENAI_MODEL, etc. Maps to envLayer.

↓

Settings file

User or workspace settings. Maps to settingsSource.

↓

Defaults

Built-in fallback values. Maps to defaultSource / computedSource.

Every resolved configuration field carries its source type, so you can always trace where a value came from. The OAuth flow for Qwen models is gated by a QWEN_OAUTH_ALLOWED_MODELS list — Qwen-specific auth is not exposed for non-Qwen models even if someone builds a config that tries to use it.

Kimi CLI's `kosong` package — provider-native message conversion

Kimi CLI is the only agent in this set that ships its own separate abstraction package (kosong) for multi-provider message conversion. Rather than a single generic converter, it has a dedicated file per provider family:

File	Handles	Key non-obvious detail
`anthropic.py`	Anthropic Messages API	Tracks `tool_use_id` for result correlation; handles Anthropic's `tool_use` block format specifically
`google_genai.py`	Google GenAI / Gemini	Strips `id` field from `function_call` and `function_response` parts — Gemini rejects it. `thought_signature` is preserved for thinking tokens.
`openai_responses.py`	OpenAI Responses API	Handles `function_call`/`function_call_output` item types; tracks conversation state across multi-turn tool use

The Google adapter is worth noting specifically: most agents that "support Google" discover the id-field rejection in production and add a hotfix. Kimi CLI has API snapshot tests for this specific case, meaning it was caught and tested before shipping.

Prompt caching is where provider abstractions stop being generic

Prompt caching is a good stress test for every "multi-provider" claim on this page. The feature looks simple at the UI layer — cached tokens, lower bills, faster repeated turns — but the implementation differs sharply by provider. Anthropic wants explicit cache_control markers, Gemini uses cached-content resources, DashScope exposes vendor headers, and some OpenAI-style stacks mostly report cache hits after the fact.

Support tier	Repos	What that really means
Direct request shaping	Claude Code snapshot, Dirac, DeerFlow, ADK-Rust, OpenCode	The harness emits real provider-side cache markers or cached-content references and manages breakpoint/TTL behavior.
Cross-provider transform layer	Qwen Code, Dirac, OpenCode	Caching is implemented as serializer or adapter logic, not as a generic config toggle.
Telemetry only	Codex, Zaica	The repo reads cached-token usage and exposes it, but does not meaningfully shape the request prefix.
Cache-aware edge cases	Reasonix, OpenHands, Kimi CLI	Useful cache-adjacent behavior exists, but the current snapshot is weaker than the flagship request-shaping implementations.

🧩

New deep dive

For the full repo-by-repo breakdown — including Anthropic marker placement, Gemini cached-content lifecycle, telemetry-only support, and cache-busting avoidance — see Prompt Caching That Actually Reaches the API.

Kimi CLI's compression prompt — structured XML output format

Kimi CLI's context compression is driven by a dedicated prompts/compact.md prompt that instructs the model to produce structured XML output — not free-form prose. The required output uses named tags with explicit retention rules:

Tag	Content	Retention rule
`<current_focus>`	Active task state	Always keep — this is what the agent is doing right now
`<active_issues>`	Errors, stack traces, working solutions	MUST KEEP errors and working solutions verbatim
`<code_state>`	Nested `<file>` blocks (final versions only)	Keep full if <20 lines; else signature + key logic only. REMOVE failed attempts.
`<completed_tasks>`	What has been done	MERGE similar discussions; CONDENSE to outcomes
`<environment>`	System context	Keep stable facts; drop ephemeral state
`<important_context>`	Design decisions and TODO items	Keep design rationale; REMOVE redundant explanations

Kimi also ships a prompts/init.md that instructs the model to explore the project codebase and produce an AGENTS.md file: project overview, build/test commands, code style, testing instructions, and security considerations — using the project's native language. This is a structured project-onboarding ritual that no other agent formalizes as a named prompt.

OpenHands — Nine Jinja2 prompt templates with XML sections

OpenHands has the most modular prompt system in the set: 9 Jinja2 .j2 templates in openhands/agenthub/codeact_agent/prompts/, assembled at runtime. The main template uses named XML sections: <ROLE>, <EFFICIENCY>, <FILE_SYSTEM_GUIDELINES>, <CODE_QUALITY>, <VERSION_CONTROL>, <PULL_REQUESTS>, <PROBLEM_SOLVING_WORKFLOW>, <SECURITY>, <EXTERNAL_SERVICES>, and <ENVIRONMENT_SETUP>. The security section includes {% include 'security_risk_assessment.j2' %} — a composable sub-template, not inline text.

The long-horizon variant (system_prompt_long_horizon.j2) extends the base to add <TASK_MANAGEMENT> and <TASK_TRACKING_PERSISTENCE> for the task_tracker tool. Additional templates: in_context_learning_example.j2, microagent_info.j2, additional_info.j2, system_prompt_interactive.j2, system_prompt_tech_philosophy.j2.

Codex — OpenAI-native with provider extensibility

Codex is built by OpenAI, and its default wire protocol is the Responses API (wire_api = "responses"). The legacy Chat Completions API (wire_api = "chat") is explicitly no longer supported — Codex produces a migration error at config parse time if you try to use it. However, Codex is not hard-coded to OpenAI's cloud:

Feature	Detail
User-defined providers	`[model_providers]` table in `config.toml` — supports Ollama, LM Studio, any OpenAI-compatible endpoint
Dedicated integration crates	`codex-ollama` and `codex-lmstudio` — full client implementations with discovery, connection, and error handling, not thin config wrappers
Retry defaults	300,000 ms stream idle timeout, 5 stream max retries, 4 request max retries (hard cap: 100)
Plan mode reasoning	`plan_mode_reasoning_effort` config for model-specific reasoning presets (defaults to `medium`, supports `none`)
Tracing	OpenTelemetry via `codex-otel` crate
Provider registry	`codex-model-provider-info` — built-in defaults for OpenAI, user overrides from config at runtime

The WireApi enum currently has a single variant (Responses), which means Codex is more OpenAI-centric than agents like Mux or Qwen Code that abstract across many API shapes. But the dedicated Ollama and LM Studio crates show OpenAI is investing in local model support beyond just cloud API access.

Hermes — Maximum provider breadth via OpenRouter

Hermes takes the opposite approach from Claude Code. Rather than being shaped around one model family, it is model-neutral by design: OpenRouter provides access to 200+ models through a single API endpoint. There is no native model assumption anywhere in the codebase.

This unlocks a unique feature: the Mixture of Agents (MoA) tool. Because Hermes has no model loyalty, it can run four different frontier models (Claude Opus, Gemini Pro, GPT-5, DeepSeek v3) in parallel via ThreadPoolExecutor, collect all four responses, and pass them to a fifth aggregator model that synthesizes the best answer. This is the only built-in MoA implementation in this set.

See the dedicated Hermes page for detailed coverage of the MoA tool and aggregation algorithm.

DeerFlow — LangGraph model factory

DeerFlow's model loading uses LangGraph's native model factory pattern. The vLLM provider specializes to preserve Qwen's reasoning fields, and the factory supports CLI-backed models (models where the actual inference is done by a subprocess CLI command rather than an API call). This extensibility is a result of the LangGraph architecture — any LangChain-compatible model can be plugged in via config.

The langgraph.json file defines graph entrypoints and can specify different models per node — meaning the planner, writer, and tools can theoretically use different models without any code changes, just configuration.

Claude Code

The purest example of model-native engineering. Instead of abstracting away Claude's shape, the repo leans into it and builds a full product around those assumptions.

Pochi

The most explicit multi-ecosystem adapter. It proves that "supports many models" can still mean dedicated code for each ecosystem, not just endpoint compatibility.

DeerFlow

The most flexible in a framework sense. It allows provider classes to specialize around thinking tokens, vLLM quirks, and CLI-backed model wrappers.

Codex

OpenAI-native but provider-extensible. Responses API by default, with dedicated Ollama and LM Studio crates, configurable retry logic, and plan-mode reasoning presets. The single-variant WireApi enum makes it less generic than Mux, but the local model support is more substantial than thin config adapters.

Auth and configuration maturity

Mux, Qwen Code, and Neovate do the best job of making model choice feel like a governed system instead of a loose settings file. Qwen is especially strong here: it resolves model configuration from multiple sources, tracks origin, and snapshots runtime state.

Claude Code and Kimi CLI are far more willing to admit that provider behavior is not generic. Claude's repo is structurally centered on Anthropic. Kimi includes explicit OAuth and managed platform logic for Moonshot and Kimi Code experiences.

DeerFlow approaches the problem from a harness angle. Instead of one canonical provider matrix, it loads configured model classes and lets middleware or role configuration shape how they are used. That is more extensible, but less immediately uniform.

My verdict on model handling

If you want the best provider-neutral design, look at Mux and Qwen Code. If you want the most honest model-native design, Claude Code is the clearest answer. If you want the most explicit multi-ecosystem adapter, Pochi wins. If you want the most extensible model factory, DeerFlow is the most interesting. If you want the most maximum model breadth (200+ models, no native preference), Hermes via OpenRouter wins outright. And if you want OpenAI-native with real local model support, Codex offers dedicated Ollama and LM Studio crates, not just config-level provider overrides.

The lesson is that "supports many models" is not one thing. Some repos are really doing transport compatibility. Some are doing routing. Some are doing full provider-specific product behavior. And some — like Hermes with MoA — are doing multi-model synthesis that requires genuine model-neutrality to work. The difference matters a lot when tool calling, reasoning settings, auth flows, or MCP behavior start to diverge.

Generic adapters vs model-native engineering

The key split

Provider-agnostic systems

Model-native systems

Model plumbing matrix

The Qwen Code five-layer config resolver

Kimi CLI's kosong package — provider-native message conversion

Prompt caching is where provider abstractions stop being generic

New deep dive

Kimi CLI's compression prompt — structured XML output format

OpenHands — Nine Jinja2 prompt templates with XML sections

Codex — OpenAI-native with provider extensibility

Hermes — Maximum provider breadth via OpenRouter

DeerFlow — LangGraph model factory

Claude Code

Pochi

DeerFlow

Codex

Auth and configuration maturity

My verdict on model handling

Kimi CLI's `kosong` package — provider-native message conversion