Generic adapters vs model-native engineering
Almost every repo says it supports many models. The real question is how deep that support goes: shared config only, or real provider-specific behavior, auth, and prompt or tool adaptation?
The key split
Provider-agnostic systems
Mux, Neovate, Qwen Code, Pi Mono, Oh My Pi, and OpenCode put significant effort into treating providers as interchangeable enough to route, catalog, and configure them through shared abstractions.
Model-native systems
Claude Code, Kimi CLI, and parts of DeerFlow or Pochi are more willing to write code that is obviously shaped around a specific model family or platform.
Model plumbing matrix
| Repo | Provider strategy | Model-specific tuning depth | Key evidence from the repo |
|---|---|---|---|
| Claude Code | Anthropic-centered | Very high | Anthropic SDK dependency, model-first runtime assumptions, and deep integration with Claude-style capabilities |
| Pochi | Shared config plus vendor-specific packages | High | Separate packages for Codex, Qwen Code, GitHub Copilot, Gemini CLI, and others |
| DeerFlow | Config-driven class loading | High | Factory chooses provider classes; vLLM provider preserves Qwen reasoning fields; Codex and Claude CLI-backed models appear in examples |
| Kimi CLI | Platform-centered with configurable providers | Medium-high | Moonshot and Kimi platform auth flows, managed model refresh, ACP handling |
| Mux | Provider catalog and routing factory | Medium | Known model catalog, provider constants, fetch wrapping, gateway and local-provider logic |
| Qwen Code | Unified resolver with special auth paths | Medium | ModelConfigResolver, models registry, runtime snapshots, Qwen OAuth path alongside generic APIs |
| Neovate Code | Large explicit provider matrix | Medium | Many provider files and shared provider types under one AI SDK-style layer |
| OpenCode | AI SDK router plus broad provider package set | Medium-high | packages/opencode/package.json pulls in Anthropic, OpenAI, Google, Azure, Bedrock, Mistral, Cohere, Groq, Cerebras, OpenRouter, TogetherAI, Alibaba, and more; session/system.ts switches prompt files by model family. |
| ADK-Rust | Minimal core trait plus feature-gated provider adapters | Medium-high | adk-core/src/model.rs keeps Llm intentionally small, while adk-model/src/lib.rs fans out Gemini, OpenAI, Anthropic, Bedrock, Ollama, and a broad OpenAI-compatible family with provider metadata carried on responses when needed. |
| Pi Mono | Subscription + API-key provider matrix via @mariozechner/pi-ai | High breadth | The current README lists 26 providers across 10 APIs, with built-in model registries, thinking-level suffixes, and custom providers via models.json or extensions. |
| Oh My Pi | Role-based routing plus broad provider discovery | High breadth | The README advertises model roles for default, smol, slow, plan, and commit, along with multi-credential routing, many provider integrations, and discovery of config from multiple coding-tool ecosystems. |
| Crush | Generic provider abstraction with productized metadata handling | Medium | Provider metadata fetching and caching in Go; broad support without a single dominant native model identity |
| OpenHands | Historically broad via LiteLLM-style compatibility | Hard to judge | The local repo shows enough to infer flexibility, but not enough to fully score the current V1 agent core |
| Hermes | Model-neutral via OpenRouter with 200+ model support | High breadth | No native model bias; uses OpenRouter as the abstraction layer. MoA tool runs claude-opus, gemini-pro, gpt-5, deepseek simultaneously. |
The Qwen Code five-layer config resolver
Qwen Code's ModelConfigResolver in
packages/core/src/models/modelConfigResolver.ts is the
most rigorous model configuration system in this TypeScript set. It
defines five typed source layers with explicit precedence (highest to
lowest):
1. Explicit ModelProviders config selection (highest authority). Maps to modelProvidersSource type.
2. CLI flags like --model, --openaiApiKey. Maps to cliSource.
3. Environment variables: OPENAI_API_KEY, OPENAI_MODEL, etc. Maps to envLayer.
4. User or workspace settings. Maps to settingsSource.
5. Built-in fallback values. Maps to defaultSource / computedSource.
Every resolved configuration field carries its source type, so you can
always trace where a value came from. The OAuth flow for Qwen models
is gated by a
QWEN_OAUTH_ALLOWED_MODELS list — Qwen-specific auth is
not exposed for non-Qwen models even if someone builds a config that
tries to use it.
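The precedence walk and source tracking described above can be sketched in a few lines. This is a minimal illustration, not Qwen Code's actual TypeScript implementation: the layer keys, the `Resolved` type, `resolve_field`, and the contents of the allow-list are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical layer keys, ordered from highest to lowest precedence,
# mirroring the five source layers listed above.
LAYER_ORDER = ["modelProviders", "cli", "env", "settings", "default"]


@dataclass(frozen=True)
class Resolved:
    value: Any
    source: str  # name of the layer that supplied the value


def resolve_field(name: str, layers: dict[str, dict[str, Any]]) -> Optional[Resolved]:
    """Return the first non-None value for `name`, walking layers by precedence."""
    for layer in LAYER_ORDER:
        value = layers.get(layer, {}).get(name)
        if value is not None:
            return Resolved(value=value, source=layer)
    return None


# Hypothetical stand-in for the QWEN_OAUTH_ALLOWED_MODELS list.
OAUTH_ALLOWED_MODELS = {"example-qwen-model"}


def oauth_permitted(model: str) -> bool:
    """OAuth is only offered for models on the allow-list."""
    return model in OAUTH_ALLOWED_MODELS


layers = {
    "cli": {"model": "example-qwen-model"},
    "env": {"model": "env-model", "api_key": "sk-from-env"},
    "default": {"model": "fallback-model", "base_url": "https://example.invalid/v1"},
}

model = resolve_field("model", layers)
api_key = resolve_field("api_key", layers)
print(model.value, "from", model.source)
print("api_key from", api_key.source)
print("oauth allowed:", oauth_permitted(model.value))
```

Because every resolved field carries its `source`, a debugging command can report both the effective value and the layer that won, which is the traceability property the section describes.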
Kimi CLI's kosong package: provider-native message conversion
Kimi CLI is the only agent in this set that ships its
own separate abstraction package (kosong) for
multi-provider message conversion. Rather than a single generic
converter, it has a dedicated file per provider family:
| File | Handles | Key non-obvious detail |
|---|---|---|
| anthropic.py | Anthropic Messages API | Tracks tool_use_id for result correlation; handles Anthropic's tool_use block format specifically |
| google_genai.py | Google GenAI / Gemini | Strips id field from function_call and function_response parts; Gemini rejects it. thought_signature is preserved for thinking tokens. |
| openai_responses.py | OpenAI Responses API | Handles function_call/function_call_output item types; tracks conversation state across multi-turn tool use |
The Google adapter is worth noting specifically: most agents that
"support Google" discover the id-field rejection in
production and add a hotfix. Kimi CLI has API snapshot tests for this
specific case, meaning it was caught and tested before shipping.
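To make the Google-specific rule concrete, here is a small sketch of the kind of transformation involved. It is an assumption-laden illustration rather than kosong's code: parts are modeled as plain dictionaries (not the Google SDK types), and `strip_call_ids` is a hypothetical helper name.

```python
import copy
from typing import Any


def strip_call_ids(parts: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `parts` with `id` removed from function_call and
    function_response payloads. Other keys, such as thought_signature,
    are left untouched."""
    cleaned = copy.deepcopy(parts)
    for part in cleaned:
        for key in ("function_call", "function_response"):
            payload = part.get(key)
            if isinstance(payload, dict):
                payload.pop("id", None)
    return cleaned


# Simplified, hypothetical message parts for illustration only.
parts = [
    {
        "function_call": {"id": "call_1", "name": "read_file", "args": {"path": "a.py"}},
        "thought_signature": "opaque-signature",
    },
    {"function_response": {"id": "call_1", "name": "read_file", "response": {"ok": True}}},
    {"text": "done"},
]

cleaned = strip_call_ids(parts)
print(cleaned)
```

A snapshot test for this case would serialize `cleaned` and compare it against a stored expected payload, so a regression that reintroduces the `id` field fails before it reaches the API.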
Prompt caching is where provider abstractions stop being generic
Prompt caching is a good stress test for every "multi-provider" claim
on this page. The feature looks simple at the UI layer — cached tokens,
lower bills, faster repeated turns — but the implementation differs
sharply by provider. Anthropic wants explicit
cache_control markers, Gemini uses cached-content
resources, DashScope exposes vendor headers, and some OpenAI-style
stacks mostly report cache hits after the fact.
| Support tier | Repos | What that really means |
|---|---|---|
| Direct request shaping | Claude Code snapshot, Dirac, DeerFlow, ADK-Rust, OpenCode | The harness emits real provider-side cache markers or cached-content references and manages breakpoint/TTL behavior. |
| Cross-provider transform layer | Qwen Code, Dirac, OpenCode | Caching is implemented as serializer or adapter logic, not as a generic config toggle. |
| Telemetry only | Codex, Zaica | The repo reads cached-token usage and exposes it, but does not meaningfully shape the request prefix. |
| Cache-aware edge cases | Reasonix, OpenHands, Kimi CLI | Useful cache-adjacent behavior exists, but the current snapshot is weaker than the flagship request-shaping implementations. |
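The gap between the "direct request shaping" and "telemetry only" tiers can be shown with two small functions. This is a sketch under stated assumptions: `with_cache_breakpoint` and `cached_tokens` are hypothetical helper names, the payloads are plain dictionaries, and only the Anthropic-style `cache_control` marker and the two usage field shapes are taken from the providers' public request and response formats.

```python
from typing import Any


def with_cache_breakpoint(system_blocks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Request shaping: mark the last system block as an Anthropic-style
    cache breakpoint so the prefix up to that block can be cached."""
    if not system_blocks:
        return []
    shaped = [dict(block) for block in system_blocks]
    shaped[-1]["cache_control"] = {"type": "ephemeral"}
    return shaped


def cached_tokens(usage: dict[str, Any]) -> int:
    """Telemetry only: read the cached-token count from a usage payload
    after the response arrives, without changing the request."""
    if "cache_read_input_tokens" in usage:  # Anthropic-style usage field
        return int(usage["cache_read_input_tokens"])
    details = usage.get("prompt_tokens_details") or {}  # OpenAI-style usage field
    return int(details.get("cached_tokens", 0))


system = [
    {"type": "text", "text": "You are a coding agent."},
    {"type": "text", "text": "<long, stable tool and project instructions>"},
]
shaped = with_cache_breakpoint(system)
print(shaped[-1]["cache_control"])
print(cached_tokens({"cache_read_input_tokens": 1200}))
print(cached_tokens({"prompt_tokens_details": {"cached_tokens": 800}}))
```

A harness that only implements the second function can display savings but cannot influence them; one that implements the first has to decide where breakpoints go and keep the prefix stable across turns.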
New deep dive
For the full repo-by-repo breakdown — including Anthropic marker placement, Gemini cached-content lifecycle, telemetry-only support, and cache-busting avoidance — see Prompt Caching That Actually Reaches the API.
Kimi CLI's compression prompt — structured XML output format
Kimi CLI's context compression is driven by a dedicated
prompts/compact.md prompt that instructs the model to
produce structured XML output — not free-form prose. The required
output uses named tags with explicit retention rules:
| Tag | Content | Retention rule |
|---|---|---|
| <current_focus> | Active task state | Always keep: this is what the agent is doing right now |
| <active_issues> | Errors, stack traces, working solutions | MUST KEEP errors and working solutions verbatim |
| <code_state> | Nested <file> blocks (final versions only) | Keep full if <20 lines; else signature + key logic only. REMOVE failed attempts. |
| <completed_tasks> | What has been done | MERGE similar discussions; CONDENSE to outcomes |
| <environment> | System context | Keep stable facts; drop ephemeral state |
| <important_context> | Design decisions and TODO items | Keep design rationale; REMOVE redundant explanations |
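One practical benefit of a tagged format is that a consumer can check a summary mechanically before trusting it. The sketch below assumes only the six tag names from the table; `extract_sections` and `missing_tags` are hypothetical helpers, and whether Kimi CLI itself validates the output this way is not stated in the source.

```python
import re

# Tag names taken from the table above.
REQUIRED_TAGS = [
    "current_focus",
    "active_issues",
    "code_state",
    "completed_tasks",
    "environment",
    "important_context",
]


def extract_sections(summary: str) -> dict[str, str]:
    """Pull the body of each known tag out of a compaction summary."""
    sections: dict[str, str] = {}
    for tag in REQUIRED_TAGS:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", summary, re.DOTALL)
        if match:
            sections[tag] = match.group(1).strip()
    return sections


def missing_tags(summary: str) -> list[str]:
    """List required tags that the summary does not contain."""
    found = extract_sections(summary)
    return [tag for tag in REQUIRED_TAGS if tag not in found]


# Hypothetical model output, truncated to two sections for illustration.
summary = """
<current_focus>Fix failing retry test</current_focus>
<active_issues>TimeoutError in test_stream_retry</active_issues>
"""

print(extract_sections(summary)["current_focus"])
print(missing_tags(summary))
```

Free-form prose summaries offer no equivalent check, which is part of why a structured output format is easier to build on.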
Kimi also ships a prompts/init.md that instructs the
model to explore the project codebase and produce an
AGENTS.md file: project overview, build/test commands,
code style, testing instructions, and security considerations — using
the project's native language. This is a structured project-onboarding
ritual that no other agent formalizes as a named prompt.
OpenHands — Nine Jinja2 prompt templates with XML sections
OpenHands has the most modular prompt system in the set: 9 Jinja2
.j2 templates in
openhands/agenthub/codeact_agent/prompts/, assembled at
runtime. The main template uses named XML sections:
<ROLE>, <EFFICIENCY>,
<FILE_SYSTEM_GUIDELINES>,
<CODE_QUALITY>,
<VERSION_CONTROL>,
<PULL_REQUESTS>,
<PROBLEM_SOLVING_WORKFLOW>,
<SECURITY>, <EXTERNAL_SERVICES>,
and <ENVIRONMENT_SETUP>. The security section
includes {% include 'security_risk_assessment.j2' %} — a
composable sub-template, not inline text.
The long-horizon variant (system_prompt_long_horizon.j2)
extends the base to add <TASK_MANAGEMENT> and
<TASK_TRACKING_PERSISTENCE> for the
task_tracker tool. Additional templates:
in_context_learning_example.j2,
microagent_info.j2, additional_info.j2,
system_prompt_interactive.j2,
system_prompt_tech_philosophy.j2.
Codex — OpenAI-native with provider extensibility
Codex is built by OpenAI, and its default wire protocol is the
Responses API (wire_api = "responses").
The legacy Chat Completions API (wire_api = "chat") is
explicitly no longer supported — Codex produces a migration error at
config parse time if you try to use it. However, Codex is not
hard-coded to OpenAI's cloud:
| Feature | Detail |
|---|---|
| User-defined providers | [model_providers] table in config.toml; supports Ollama, LM Studio, any OpenAI-compatible endpoint |
| Dedicated integration crates | codex-ollama and codex-lmstudio: full client implementations with discovery, connection, and error handling, not thin config wrappers |
| Retry defaults | 300,000 ms stream idle timeout, 5 stream max retries, 4 request max retries (hard cap: 100) |
| Plan mode reasoning | plan_mode_reasoning_effort config for model-specific reasoning presets (defaults to medium, supports none) |
| Tracing | OpenTelemetry via codex-otel crate |
| Provider registry | codex-model-provider-info: built-in defaults for OpenAI, user overrides from config at runtime |
The WireApi enum currently has a single variant
(Responses), which means Codex is more OpenAI-centric
than agents like Mux or Qwen Code that abstract across many API
shapes. But the dedicated Ollama and LM Studio crates show OpenAI is
investing in local model support beyond just cloud API access.
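The retry defaults and hard cap from the table can be modeled as a small settings object with clamped overrides. This is an illustrative sketch in Python, not Codex's Rust code: `RetrySettings` and `apply_overrides` are hypothetical names, and it assumes the cap of 100 applies to both retry counts, which the table does not spell out.

```python
from dataclasses import dataclass, replace

MAX_RETRY_CAP = 100  # hard cap quoted in the table above


@dataclass(frozen=True)
class RetrySettings:
    # Defaults quoted in the table above.
    stream_idle_timeout_ms: int = 300_000
    stream_max_retries: int = 5
    request_max_retries: int = 4


def apply_overrides(base: RetrySettings, **overrides: int) -> RetrySettings:
    """Apply user overrides, clamping retry counts to [0, MAX_RETRY_CAP].
    Assumption: the cap covers both stream and request retries."""
    clamped: dict[str, int] = {}
    for key, value in overrides.items():
        if key.endswith("_max_retries"):
            value = max(0, min(value, MAX_RETRY_CAP))
        clamped[key] = value
    return replace(base, **clamped)


settings = apply_overrides(RetrySettings(), request_max_retries=500)
print(settings)
```

Clamping at load time rather than at the call site means a misconfigured provider entry cannot push the client into an effectively unbounded retry loop.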
Hermes — Maximum provider breadth via OpenRouter
Hermes takes the opposite approach from Claude Code. Rather than being shaped around one model family, it is model-neutral by design: OpenRouter provides access to 200+ models through a single API endpoint. There is no native model assumption anywhere in the codebase.
This unlocks a unique feature: the
Mixture of Agents (MoA) tool. Because Hermes has no
model loyalty, it can run four different frontier models (Claude Opus,
Gemini Pro, GPT-5, DeepSeek v3) in parallel via
ThreadPoolExecutor, collect all four responses, and pass
them to a fifth aggregator model that synthesizes the best answer.
This is the only built-in MoA implementation in this set.
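The fan-out and aggregation flow can be sketched with stub model functions in place of real OpenRouter calls. Everything here is an assumption made for illustration: `mixture_of_agents`, the stub proposers, and the aggregation prompt wording are hypothetical and do not reproduce Hermes's actual tool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

ModelFn = Callable[[str], str]  # prompt in, completion out


def mixture_of_agents(prompt: str, proposers: dict[str, ModelFn], aggregator: ModelFn) -> str:
    """Run every proposer in parallel, then hand all drafts to an aggregator."""
    if not proposers:
        raise ValueError("at least one proposer model is required")
    with ThreadPoolExecutor(max_workers=len(proposers)) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in proposers.items()}
        drafts = {name: future.result() for name, future in futures.items()}
    combined = "\n\n".join(f"[{name}]\n{text}" for name, text in drafts.items())
    return aggregator(
        f"Question: {prompt}\n\nCandidate answers:\n{combined}\n\nSynthesize the best answer."
    )


def make_stub(name: str) -> ModelFn:
    """Stand-in for a real model call; returns a canned reply."""
    return lambda prompt: f"{name} answer to: {prompt}"


proposers = {name: make_stub(name) for name in ("model-a", "model-b", "model-c", "model-d")}


def stub_aggregator(text: str) -> str:
    """Stand-in aggregator that just reports how many drafts it received."""
    return f"synthesized from {text.count('[')} drafts"


result = mixture_of_agents("why is the sky blue", proposers, stub_aggregator)
print(result)
```

Threads suit this workload because each proposer call is dominated by network wait, so total latency approaches that of the slowest model rather than the sum of all four.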
See the dedicated Hermes page for detailed coverage of the MoA tool and aggregation algorithm.
DeerFlow — LangGraph model factory
DeerFlow's model loading uses LangGraph's native model factory
pattern. The vLLM provider is specialized to preserve Qwen's
reasoning fields, and the factory supports CLI-backed models (models
whose inference is performed by a subprocess CLI command rather
than an API call). This extensibility follows from the LangGraph
architecture: any LangChain-compatible model can be plugged in via
config.
The langgraph.json file defines graph entrypoints and can
specify different models per node — meaning the planner, writer, and
tools can theoretically use different models without any code changes,
just configuration.
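Config-driven class loading of this kind usually reduces to resolving a dotted path at runtime. The sketch below is a generic illustration, not DeerFlow's factory: the `use` key, the `module:Class` path format, and the model names are assumptions, and `types.SimpleNamespace` stands in for a real LangChain-compatible chat model class so the example runs with the standard library alone.

```python
import importlib
from typing import Any


def load_class(path: str) -> type:
    """Resolve a 'module:ClassName' string to the class object."""
    module_name, _, class_name = path.partition(":")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def create_model(entry: dict[str, Any]) -> Any:
    """Instantiate the configured class, passing remaining keys as kwargs."""
    kwargs = {k: v for k, v in entry.items() if k not in ("name", "use")}
    return load_class(entry["use"])(**kwargs)


# Hypothetical config: each role points at a class path plus its parameters.
MODELS = [
    {"name": "planner", "use": "types:SimpleNamespace",
     "model": "example-reasoning-model", "temperature": 0.2},
    {"name": "writer", "use": "types:SimpleNamespace",
     "model": "example-fast-model", "temperature": 0.7},
]

by_name = {entry["name"]: create_model(entry) for entry in MODELS}
print(by_name["planner"].model, by_name["writer"].model)
```

Swapping the planner to a different provider then means editing one `use` path in config, which is the "no code changes" property described above.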
Claude Code
The purest example of model-native engineering. Instead of abstracting away Claude's shape, the repo leans into it and builds a full product around those assumptions.
Pochi
The most explicit multi-ecosystem adapter. It proves that "supports many models" can still mean dedicated code for each ecosystem, not just endpoint compatibility.
DeerFlow
The most flexible in a framework sense. It allows provider classes to specialize around thinking tokens, vLLM quirks, and CLI-backed model wrappers.
Codex
OpenAI-native but provider-extensible. Responses API by default,
with dedicated Ollama and LM Studio crates, configurable retry
logic, and plan-mode reasoning presets. The single-variant WireApi
enum makes it less generic than Mux, but the local model support
is more substantial than thin config adapters.
Auth and configuration maturity
Mux, Qwen Code, and Neovate do the best job of making model choice feel like a governed system instead of a loose settings file. Qwen is especially strong here: it resolves model configuration from multiple sources, tracks origin, and snapshots runtime state.
Claude Code and Kimi CLI are far more willing to admit that provider behavior is not generic. Claude's repo is structurally centered on Anthropic. Kimi includes explicit OAuth and managed platform logic for Moonshot and Kimi Code experiences.
DeerFlow approaches the problem from a harness angle. Instead of one canonical provider matrix, it loads configured model classes and lets middleware or role configuration shape how they are used. That is more extensible, but less immediately uniform.
My verdict on model handling
If you want the best provider-neutral design, look at Mux and Qwen Code. If you want the most honest model-native design, Claude Code is the clearest answer. If you want the most explicit multi-ecosystem adapter, Pochi wins. If you want the most extensible model factory, DeerFlow is the most interesting. If you want maximum model breadth (200+ models, no native preference), Hermes via OpenRouter wins outright. And if you want OpenAI-native with real local model support, Codex offers dedicated Ollama and LM Studio crates, not just config-level provider overrides.
The lesson is that "supports many models" is not one thing. Some repos are really doing transport compatibility. Some are doing routing. Some are doing full provider-specific product behavior. And some — like Hermes with MoA — are doing multi-model synthesis that requires genuine model-neutrality to work. The difference matters a lot when tool calling, reasoning settings, auth flows, or MCP behavior start to diverge.