Sandboxing and Execution Isolation
When an AI agent runs shell commands on your machine, sandboxing determines whether a malicious or buggy command can escape and damage your system. This page examines every sandbox mechanism across 18 coding agents, from Docker containers to OS-level Seatbelt profiles.
Sandboxing spectrum overview
These agents span an enormous range of sandboxing investment. At one end, some agents implement full containerized isolation with Docker per session. At the other, agents execute directly on the host with no OS-level sandboxing at all, relying entirely on permission prompts and user vigilance.
| Agent | Sandbox Type | Platform Support | Isolation Level |
|---|---|---|---|
| Codex | OS-native (Seatbelt/bwrap/restricted tokens) | macOS + Linux + Windows | Strong |
| Claude Code | sandbox-runtime (Seatbelt/bwrap) + optional | macOS + Linux | Moderate |
| OpenHands | Docker per-session | Cross-platform (Docker) | Strong |
| DeerFlow | AioSandboxProvider (Docker/Apple Container/K8s) | macOS + Linux + K8s | Strong |
| Hermes | 6 backends (Local/Docker/SSH/Modal/Daytona/Singularity) | Cross-platform | Strong |
| ADK-Rust | adk-sandbox (Seatbelt/bwrap/AppContainer) | macOS + Linux + Windows | Moderate |
| Qwen Code | Docker/Podman container | Cross-platform (Docker) | Strong |
| Goose | ACP protocol, Docker optional | Docker optional | Moderate |
| Wintermolt | Optional Docker | Docker optional | Moderate |
| Pi Mono | None (container recommended in README) | None | None |
| Neovate | Host execution + banned commands | None | None |
| Crush | Host execution + TUI permission prompts | None | None |
| OpenCode | Host execution + permission model | None | None |
| Dirac | Git checkpoints (revert-based) | None | Minimal |
| Pochi | Host execution + apply-diff safety | None | None |
| Kimi CLI | Host execution + protocol-aware | None | None |
| Zaica | Multi-backend (inherits from Hermes lineage) | Docker optional | Moderate |
| Oh My Pi | Permission layers, no OS sandbox | None | None |
Codex: The gold standard for cross-platform sandboxing
Codex implements the most comprehensive cross-platform sandboxing of any agent in this set. It detects the host OS at startup and selects an appropriate sandboxing mechanism (Seatbelt on macOS, bubblewrap on Linux, and restricted tokens on Windows), all governed by a unified policy layer.
SandboxMode enum: three isolation levels
Codex exposes a SandboxMode enum with three tiers:
- ReadOnly: All filesystem paths are read-only. No writes are permitted anywhere. Useful for inspection-only tasks.
- WorkspaceWrite: The project workspace is writable, but the .git, .agents, and .codex metadata directories remain read-only. This is the default mode.
- DangerFullAccess: No sandbox restrictions. Full host access. Only for trusted environments.
The CLI exposes --sandbox with these levels, plus codex sandbox {macos,linux,windows} for testing the sandbox on a given platform.
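The three tiers can be pictured as a single write-permission check. The following is a minimal Python sketch, not Codex's implementation: the enum member names, the can_write helper, and the purely lexical path handling are all assumptions made for illustration, and a real sandbox enforces this in the kernel rather than in application code.

```python
import posixpath
from enum import Enum, auto


class SandboxMode(Enum):
    """Illustrative stand-in for the three tiers described above."""
    READ_ONLY = auto()
    WORKSPACE_WRITE = auto()
    DANGER_FULL_ACCESS = auto()


# Metadata directories that stay read-only under WorkspaceWrite.
PROTECTED_DIRS = frozenset({".git", ".agents", ".codex"})


def can_write(mode, workspace, target):
    """Return True if `target` may be written under `mode` (hypothetical helper).

    Paths are compared lexically after normalization; symlinks are not resolved.
    """
    if mode is SandboxMode.DANGER_FULL_ACCESS:
        return True
    if mode is SandboxMode.READ_ONLY:
        return False
    root = posixpath.normpath(workspace)
    path = posixpath.normpath(target)  # collapses ".." so it cannot escape the root
    if path != root and not path.startswith(root.rstrip("/") + "/"):
        return False  # outside the workspace
    relative_parts = [part for part in path[len(root):].split("/") if part]
    return not any(part in PROTECTED_DIRS for part in relative_parts)
```

Under these assumptions, a write to /ws/src/main.py is allowed in WORKSPACE_WRITE mode, while /ws/.git/config and /ws/../etc/passwd are both rejected.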
macOS: Seatbelt sandbox-exec
On macOS, Codex uses /usr/bin/sandbox-exec with .sbpl policy files inspired by Chrome's sandbox. The base policy follows a deny-default model:
(deny default)
(allow process-exec)
(allow process-fork)
;; Specific sysctl allowlists
;; PTY support for interactive shells
;; OpenMP shared memory access
The network policy is proxy-aware with loopback detection, Unix domain socket rules, and explicit deny rules for outbound connections when network is restricted. Protected metadata directories (.git, .agents, .codex) stay read-only even under writable roots, a critical defense-in-depth measure.
Linux: bubblewrap + Landlock fallback
On Linux, Codex wraps commands with bwrap (bubblewrap):
- --die-with-parent: child process dies when the wrapper exits
- --unshare-pid: isolated PID namespace
- --unshare-net: network namespace isolation when network is disabled
- --ro-bind: read-only path mounts
- --bind: read-write path mounts
Codex detects WSL1 and emits a warning, because WSL1 lacks proper namespace isolation. It probes user namespace access with a 500ms timeout. When bwrap is unavailable, Codex falls back to Landlock, the Linux kernel's filesystem restriction mechanism. It also uses seccomp for syscall filtering as an additional layer.
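To make the flag list concrete, here is a small Python sketch that assembles a bwrap command line from a policy. LinuxSandboxSpec and build_bwrap_argv are hypothetical names, not Codex types, and a working invocation typically needs further mounts (for example --proc and --dev) that are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class LinuxSandboxSpec:
    """Hypothetical policy container used only for this sketch."""
    read_only: list = field(default_factory=list)
    read_write: list = field(default_factory=list)
    allow_network: bool = False


def build_bwrap_argv(spec, command):
    """Translate a spec into a bwrap argument vector (nothing is executed)."""
    argv = ["bwrap", "--die-with-parent", "--unshare-pid"]
    if not spec.allow_network:
        argv.append("--unshare-net")  # fresh network namespace, no connectivity
    for path in spec.read_only:
        argv += ["--ro-bind", path, path]
    for path in spec.read_write:
        argv += ["--bind", path, path]
    return argv + ["--", *command]
```

Building the vector separately from running it keeps the policy-to-flags translation easy to unit test without needing bwrap installed.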
Windows: Restricted tokens + AppContainer
On Windows, Codex uses the CreateRestrictedToken API to create a process token with reduced privileges. It integrates AppContainer for additional isolation and uses split-filesystem policies to separate read-only and writable paths. This is the hardest platform to sandbox correctly, and Codex is the only agent with a full Windows implementation.
FileSystemSandboxPolicy and NetworkSandboxPolicy
Codex implements a structured policy layer:
- FileSystemSandboxPolicy: entries with path, access mode (read/write/none), and special paths (project-root, tmp, home). Resolution uses most-specific-path-wins, then access-mode tie-breaker.
- NetworkSandboxPolicy: three modes: Unrestricted, Restricted (loopback only), or Managed (proxy-enforced with domain allowlists).
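The filesystem resolution rule can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Codex's code: the Access enum and resolve_access function are invented names, and the direction of the tie-breaker (the more restrictive mode wins) is an assumption, since the text above says only that access mode breaks ties.

```python
import posixpath
from enum import IntEnum


class Access(IntEnum):
    """Ordered so that a lower value is more restrictive."""
    NONE = 0
    READ = 1
    WRITE = 2


def resolve_access(entries, target):
    """Resolve the access mode for `target` from (path, Access) entries.

    The deepest matching path wins; equal depth falls back to the more
    restrictive mode (assumed). Unmatched paths get Access.NONE.
    """
    target = posixpath.normpath(target)
    best = None  # (depth, access)
    for path, access in entries:
        path = posixpath.normpath(path)
        if target != path and not target.startswith(path.rstrip("/") + "/"):
            continue  # entry does not cover the target
        depth = len([part for part in path.split("/") if part])
        if best is None or depth > best[0] or (depth == best[0] and access < best[1]):
            best = (depth, access)
    return best[1] if best else Access.NONE
```

With entries for / (read), /ws (write), and /ws/.git (read), a file under /ws/src resolves to write while anything under /ws/.git resolves to read, which is how protected metadata directories can sit inside a writable root.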
Codex is the only agent with all three platforms
Every other agent either skips Windows entirely (Claude Code), doesn't implement OS-level sandboxing (most agents), or has only a stub for one platform (ADK-Rust's WindowsEnforcer). Codex's 3-platform coverage is a significant engineering investment.
ADK-Rust: Declarative sandbox policies with platform enforcers
ADK-Rust takes a declarative approach to sandboxing. Policies are defined as data structures and enforced by platform-specific backends. This separation of policy from enforcement enables fail-closed semantics: if the backend cannot enforce a requested control, execution is rejected rather than silently downgraded.
SandboxPolicy and SandboxPolicyBuilder
The SandboxPolicy struct carries:
- allowed_paths: mapped to AccessMode::ReadOnly or AccessMode::ReadWrite
- allow_network: boolean network access toggle
- network_rules: per-domain allow/deny rules
- environment: environment variable restrictions
A fluent SandboxPolicyBuilder API supports chained construction:
SandboxPolicyBuilder::new()
.allow_read("/usr")
.allow_read_write("/tmp/work")
.allow_network_domain("api.example.com")
.build()
MacOsEnforcer: Seatbelt DSL wrapping
On macOS, MacOsEnforcer generates .sbpl profiles dynamically and invokes sandbox-exec -p <profile>. The generated Seatbelt DSL follows the deny-default pattern:
(version 1)
(deny default)
(allow process-exec)
(allow file-read* (subpath "/usr/lib"))
(allow file-read* file-write* (subpath "/tmp/work"))
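Generating a profile like the one above from policy data is mostly string templating. The Python sketch below shows the idea; it is not ADK-Rust's MacOsEnforcer, and the function names (sbpl_quote, render_profile, sandbox_exec_argv) are hypothetical. Only rule forms that already appear in the profile above are emitted.

```python
def sbpl_quote(path):
    """Quote a path as an SBPL string literal, escaping backslashes and quotes."""
    return '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_profile(read_only, read_write):
    """Render a deny-default Seatbelt profile from two path lists."""
    lines = ["(version 1)", "(deny default)", "(allow process-exec)"]
    for path in read_only:
        lines.append(f"(allow file-read* (subpath {sbpl_quote(path)}))")
    for path in read_write:
        lines.append(f"(allow file-read* file-write* (subpath {sbpl_quote(path)}))")
    return "\n".join(lines)


def sandbox_exec_argv(profile, command):
    """Build the sandbox-exec invocation; -p passes the profile inline."""
    return ["/usr/bin/sandbox-exec", "-p", profile, *command]
```

Escaping the path before interpolation matters: an unescaped quote in a directory name would otherwise let a path terminate the string and inject extra rules into the profile.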
LinuxEnforcer: bubblewrap wrapping
On Linux, LinuxEnforcer wraps commands with bwrap:
- --die-with-parent: ensures child process cleanup
- --unshare-pid: PID namespace isolation
- --unshare-net: network namespace when network is disabled
- --ro-bind / --bind: filesystem access mapping
- --new-session: process session isolation
WindowsEnforcer: AppContainer with ACLs
The WindowsEnforcer takes a fundamentally different approach from macOS/Linux. Instead of wrapping the command in a sandbox process, it:
- Creates an AppContainer profile with a unique SID
- Sets ACLs on allowed filesystem paths
- Configures the process token via configure_command()
This is the hardest platform to sandbox. ADK-Rust's implementation is marked as a stub: it creates the AppContainer profile but covers only part of the full Windows security model.
ProcessBackend vs WasmBackend
ADK-Rust supports two execution backends:
- ProcessBackend: runs a subprocess with optional enforcer wrapping
- WasmBackend: runs code in-process via WebAssembly with built-in isolation
Fail-closed semantics
The critical design decision in ADK-Rust is validate_policy(): before executing any code, the system verifies that the backend can actually enforce the requested policy. If it cannot, execution is rejected. This is the opposite of silent downgrade: most other agents fall back to weaker isolation without telling the user.
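The fail-closed pattern is easy to express independently of any platform. Here is a minimal Python sketch of the idea; the Policy, Backend, and PolicyNotEnforceable types and the capability names are assumptions for illustration and do not mirror ADK-Rust's actual API.

```python
from dataclasses import dataclass


class PolicyNotEnforceable(Exception):
    """Raised when a backend cannot enforce every requested control."""


@dataclass(frozen=True)
class Policy:
    restrict_filesystem: bool = True
    restrict_network: bool = False


@dataclass(frozen=True)
class Backend:
    name: str
    capabilities: frozenset  # e.g. {"filesystem", "network"} (hypothetical labels)


def validate_policy(backend, policy):
    """Reject the policy unless the backend covers every requested control."""
    required = set()
    if policy.restrict_filesystem:
        required.add("filesystem")
    if policy.restrict_network:
        required.add("network")
    missing = required - backend.capabilities
    if missing:
        raise PolicyNotEnforceable(f"{backend.name} cannot enforce: {sorted(missing)}")


def run_sandboxed(backend, policy, command):
    """Validate first; nothing runs if validation fails."""
    validate_policy(backend, policy)
    return f"{backend.name}: would run {command!r}"
```

The key property is ordering: validation happens before any execution path is reached, so an unenforceable policy surfaces as an error rather than as a quieter, weaker sandbox.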
Feature flags
Sandboxing is controlled by Cargo feature flags: sandbox-macos, sandbox-linux, sandbox-windows, and sandbox-native. This allows platforms to opt into only the sandboxing they can support.
Windows is a stub implementation
ADK-Rust's WindowsEnforcer creates AppContainer profiles but does not match Codex's full Windows sandbox coverage. The Windows security model (restricted tokens, job objects, integrity levels, AppLocker) is complex and requires significant investment to implement correctly.
Claude Code: Permission-driven sandbox routing
Claude Code uses the @anthropic-ai/sandbox-runtime package to implement platform-specific sandboxing. Unlike Codex's always-on approach, Claude Code makes sandboxing decisions per-command and integrates sandbox state deeply into the permission system.
Platform-specific enforcement
Claude Code implements sandboxing on two platforms:
- macOS: Seatbelt via sandbox-exec
- Linux/WSL: bubblewrap (bwrap)
- Windows: Not supported
shouldUseSandbox.ts: per-command decision
The shouldUseSandbox.ts module decides per-command whether to sandbox. This is a key differentiator, since most agents apply sandbox settings globally. Claude Code evaluates each command individually:
- Read-only commands like git status may bypass sandboxing
- Commands that write outside the workspace are always sandboxed
- Commands matching excluded patterns can bypass sandboxing
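A per-command decision of this kind might look like the following Python sketch. It is not a port of shouldUseSandbox.ts: the read-only command list, the glob-style exclusion patterns, and the rule that any shell metacharacter forces sandboxing are all assumptions chosen to keep the example conservative.

```python
import fnmatch
import shlex

# Hypothetical allowlist of command prefixes treated as read-only.
READ_ONLY_PREFIXES = (("git", "status"), ("ls",), ("cat",))
SHELL_METACHARACTERS = ";|&><`$"


def should_use_sandbox(command, sandbox_enabled, excluded_patterns=()):
    """Decide whether one shell command should run inside the sandbox."""
    if not sandbox_enabled:
        return False
    if any(fnmatch.fnmatchcase(command, pattern) for pattern in excluded_patterns):
        return False  # user explicitly excluded this command shape
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return True  # chaining or redirection could hide a write
    try:
        tokens = shlex.split(command)
    except ValueError:
        return True  # unparseable input is treated as unsafe
    for prefix in READ_ONLY_PREFIXES:
        if tuple(tokens[:len(prefix)]) == prefix:
            return False
    return True  # default: sandbox anything not known to be read-only
```

The default branch is the important design choice here: a command is sandboxed unless it is positively recognized as safe, so a gap in the allowlist costs performance rather than isolation.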
SandboxManager class
The SandboxManager class coordinates sandbox state:
- isSandboxingEnabled(): checks if sandbox is active
- isAutoAllowBashIfSandboxedEnabled(): auto-approval for sandboxed commands
- areUnsandboxedCommandsAllowed(): whether non-sandboxed commands are permitted
- checkDependencies(): verifies bwrap/Seatbelt availability
/sandbox command
Users can toggle sandboxing at runtime via the /sandbox command:
- Toggle sandbox on/off
- Exclude specific command patterns from sandboxing
- View violation count
UI feedback: SandboxViolationEvent and SandboxPromptFooterHint
Claude Code provides visible feedback when sandboxing blocks commands:
- SandboxViolationEvent and SandboxViolationExpandedView render UI feedback for blocked commands
- SandboxPromptFooterHint shows "Sandbox blocked N attempts" in the prompt footer
Environment variables and settings
- CLAUDE_CODE_FORCE_SANDBOX: enforces sandbox in bridge/SDK mode
- sandbox.failIfUnavailable: setting that refuses to start without a working sandbox
Git worktree isolation for subagents
Claude Code isolates subagent execution in git worktrees. Each subagent operates on a separate worktree, preventing filesystem conflicts and providing a form of version-control-level sandboxing. Network permission requests are forwarded to the SDK host via the can_use_tool protocol.
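Worktree-per-subagent isolation rests on the standard git worktree add command. The Python sketch below only builds the command line and is not taken from Claude Code; the .worktrees directory, the subagent/ branch prefix, and the identifier rules are hypothetical choices for illustration.

```python
import posixpath
import re


def worktree_add_argv(repo_root, subagent_id, base_ref="HEAD"):
    """Build a `git worktree add` command that gives one subagent its own checkout."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", subagent_id):
        raise ValueError(f"unsafe subagent id: {subagent_id!r}")
    branch = f"subagent/{subagent_id}"  # hypothetical branch naming
    path = posixpath.join(repo_root, ".worktrees", subagent_id)  # hypothetical location
    return ["git", "-C", repo_root, "worktree", "add", "-b", branch, path, base_ref]
```

Validating the identifier before it becomes part of a path keeps a crafted ID from placing the worktree outside the intended directory.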
Per-command sandboxing is underrated
Most agents sandbox everything or nothing. Claude Code's per-command decision means ls and git status run unsandboxed (faster, no overhead) while rm -rf runs in a sandbox. This is the right tradeoff: sandbox only when it matters.
OpenHands: Docker-first isolation
OpenHands takes the strongest isolation approach: every session runs inside a Docker container. The agent never executes directly on the host machine and operates entirely within a containerized environment.
Docker container per session
Each OpenHands session launches a container from ghcr.io/openhands/agent-server. Session IDs are base62-generated and used for container naming, ensuring unique isolation per session.
Container lifecycle management
OpenHands manages the full container lifecycle:
- Create: container spawned with the agent-server image
- Pause: container paused to preserve state
- Resume: container resumed for continued work
- Delete: container torn down on session exit
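The four lifecycle operations form a small state machine. The Python sketch below models it without touching Docker; the class and state names, the 16-character base62 ID length, and the sandbox- name prefix are assumptions, not OpenHands internals.

```python
import secrets
import string
from enum import Enum

BASE62 = string.digits + string.ascii_letters


class State(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    DELETED = "deleted"


# (action, current state) -> next state; anything else is rejected.
TRANSITIONS = {
    ("pause", State.RUNNING): State.PAUSED,
    ("resume", State.PAUSED): State.RUNNING,
    ("delete", State.RUNNING): State.DELETED,
    ("delete", State.PAUSED): State.DELETED,
}


class SessionContainer:
    """Tracks lifecycle state for one session's container (model only)."""

    def __init__(self):
        self.session_id = "".join(secrets.choice(BASE62) for _ in range(16))
        self.name = f"sandbox-{self.session_id}"  # hypothetical naming scheme
        self.state = State.RUNNING  # "create" leaves the container running

    def apply(self, action):
        key = (action, self.state)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} a {self.state.value} container")
        self.state = TRANSITIONS[key]
        return self.state
```

Encoding the legal transitions as a table makes invalid sequences, such as resuming a deleted container, fail loudly instead of reaching the container runtime.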
Network configuration
OpenHands uses bridge mode networking by default with dynamic port mapping. Host networking is available as an optional configuration for scenarios requiring direct host network access.
Authentication: X-Session-API-Key
Communication with running containers is authenticated via X-Session-API-Key headers, validated against running containers to prevent unauthorized access.
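A header check of this kind could be sketched as follows in Python. Only the X-Session-API-Key header name comes from the text above; the SessionKeyRegistry class, its methods, and the key format are hypothetical.

```python
import hmac
import secrets


class SessionKeyRegistry:
    """Maps running container IDs to their session API keys (illustrative)."""

    def __init__(self):
        self._keys = {}

    def register(self, container_id):
        """Issue a fresh random key when a container starts."""
        key = secrets.token_urlsafe(32)
        self._keys[container_id] = key
        return key

    def unregister(self, container_id):
        """Forget the key when the container is torn down."""
        self._keys.pop(container_id, None)

    def is_authorized(self, container_id, headers):
        """Accept only a matching key for a container that is still registered."""
        expected = self._keys.get(container_id)
        presented = headers.get("X-Session-API-Key")
        if expected is None or presented is None:
            return False
        # Constant-time comparison avoids leaking key prefixes through timing.
        return hmac.compare_digest(expected.encode(), presented.encode())
```

Tying the key to the set of running containers means a key stops working as soon as its container is removed, even if the key itself leaks later.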
File synchronization
Host-container file synchronization handles workspace access: files written inside the container are synced back to the host, and workspace files are mounted into the container for agent access.
Process-based sandbox alternative
For local development, OpenHands offers a process-based sandbox as an alternative to Docker. This bypasses container overhead but sacrifices isolation guarantees, so it is useful for development but not for production.
Remote sandbox for cloud deployments
OpenHands supports remote sandbox execution for cloud deployments, allowing sessions to run on remote Docker hosts rather than the local machine.
DeerFlow: Pluggable sandbox backends
DeerFlow implements a pluggable sandbox architecture with multiple backend options, allowing deployment flexibility from local development to cloud-scale execution.
LocalSandboxProvider
The LocalSandboxProvider runs commands on the host directly with path mapping but no process isolation. It is the simplest backend: useful for development, but it provides no actual sandbox guarantees.
AioSandboxProvider with pluggable backends
The AioSandboxProvider abstracts over multiple backends:
- LocalContainerBackend: Docker or Apple Container (macOS-specific lightweight containerization)
- RemoteSandboxBackend: Kubernetes via provisioner service for cloud-scale deployment
Virtual path system
DeerFlow uses a virtual path system mapping:
- /mnt/user-data/workspace: project workspace
- /mnt/user-data/uploads: uploaded files
- /mnt/user-data/output: execution output
This virtual path layer abstracts away the differences between local, container, and Kubernetes backends, so the agent always sees the same paths regardless of execution environment.
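The mapping from virtual paths to backend paths can be illustrated with a short Python sketch. The three virtual directories come from the list above; the VirtualPathMapper class and the example host paths are hypothetical, and DeerFlow's own translation logic may differ.

```python
import posixpath

VIRTUAL_ROOT = "/mnt/user-data"


class VirtualPathMapper:
    """Translate agent-visible virtual paths into backend-specific paths."""

    def __init__(self, mounts):
        # mounts maps a virtual directory name to a real backend path,
        # e.g. {"workspace": "/home/u/proj"} (example values only).
        self._mounts = dict(mounts)

    def to_backend(self, virtual_path):
        normalized = posixpath.normpath(virtual_path)  # collapses ".." segments
        for name, real in self._mounts.items():
            prefix = f"{VIRTUAL_ROOT}/{name}"
            if normalized == prefix or normalized.startswith(prefix + "/"):
                return posixpath.normpath(real + normalized[len(prefix):])
        raise ValueError(f"path is outside the virtual mounts: {virtual_path!r}")
```

Because normalization happens before the prefix check, a path that climbs out of the workspace with ".." no longer matches any mount and is rejected instead of being translated.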
Apple Container
On macOS, DeerFlow can use Apple's lightweight container runtime as an alternative to Docker. This provides OS-level isolation without requiring Docker installation.
Hermes: 6-runtime-backend architecture
Hermes implements the most diverse set of sandbox backends through a BaseEnvironment ABC that serves as a common interface across all runtimes.
The six backends
Local
Zero isolation, direct host execution. Useful only for development or fully trusted environments. No process or filesystem restrictions.
Docker
Hardened Linux containers with capability dropping, tmpfs mounts, and PID limits. The strongest local backend Hermes offers.
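The three hardening measures named above correspond to standard docker run options. The Python sketch below builds such a command line; it is not Hermes's configuration, and the function name, the default PID limit of 256, and the /tmp tmpfs mount are illustrative assumptions.

```python
def docker_run_argv(image, command, pids_limit=256, tmpfs=("/tmp",)):
    """Build a `docker run` command with dropped capabilities, tmpfs, and a PID cap."""
    argv = [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",               # drop every Linux capability
        "--pids-limit", str(pids_limit),   # bound the number of processes
    ]
    for mount in tmpfs:
        argv += ["--tmpfs", mount]         # in-memory scratch space
    return argv + [image, *command]
```

Dropping all capabilities and capping PIDs limits what a compromised process can do inside the container, while tmpfs mounts keep scratch writes off the host disk.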
SSH
Remote machine execution via ControlMaster connections with file synchronization. Isolation comes from the physical separation of the remote host.
Modal
Cloud-native sandboxing with filesystem snapshots. Code executes in Modal's cloud infrastructure with isolated function containers.
Daytona
Serverless cloud sandboxes with filesystem persistence. Each session gets an isolated development environment in the cloud.
Singularity
HPC-focused container isolation. Designed for high-performance computing environments where Docker's daemon model is unavailable or undesirable.
BaseEnvironment ABC
All backends implement the BaseEnvironment abstract base class, providing a common interface for execution, file access, and lifecycle management. This abstraction allows Hermes to swap backends without changing agent code.
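The shape of such an abstraction can be shown with Python's abc module. Only the BaseEnvironment name comes from the text above; the execute and cleanup method names, the LocalEnvironment class, and the run_task helper are assumptions and may not match Hermes's real interface.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class BaseEnvironment(ABC):
    """Common interface every runtime backend implements (method names assumed)."""

    @abstractmethod
    def execute(self, argv):
        """Run a command and return (exit_code, stdout)."""

    @abstractmethod
    def cleanup(self):
        """Release any resources held by the backend."""


class LocalEnvironment(BaseEnvironment):
    """Zero-isolation backend: runs directly on the host."""

    def execute(self, argv):
        result = subprocess.run(argv, capture_output=True, text=True, check=False)
        return result.returncode, result.stdout

    def cleanup(self):
        pass  # nothing to tear down for host execution


def run_task(env, argv):
    """Agent-side code depends only on the interface, not on a concrete backend."""
    try:
        return env.execute(argv)
    finally:
        env.cleanup()
```

A Docker or SSH backend would subclass the same base and could be passed to run_task unchanged, which is the property that lets backends be swapped without touching agent code.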
Platform-specific deep comparison
Each platform offers fundamentally different sandboxing primitives. Understanding these differences is critical when building a cross-platform agent.
| Feature | macOS | Linux | Windows |
|---|---|---|---|
| Primary mechanism | Seatbelt (sandbox-exec) | bwrap + Landlock + seccomp | AppContainer / Restricted tokens |
| Invocation | sandbox-exec -p <profile> <cmd> | bwrap <args> -- <cmd> | CreateRestrictedToken + ACLs |
| Filesystem | Scheme DSL (.sbpl), subpath rules, regex globs | Bind mounts (--ro-bind, --bind), Landlock access FS | ACLs on allowed paths, SID restrictions |
| Network | deny default + allow rules, proxy-aware | --unshare-net, network proxy | Network capability restrictions |
| Process | process-exec/process-fork allowlists | --unshare-pid, --die-with-parent, --new-session | Process token restrictions |
| Protected paths | .git, .agents, .codex read-only | .git, .agents, .codex read-only | Protected directory ACLs |
| Fallback | None (Seatbelt always available on macOS) | Landlock when bwrap unavailable, WSL1 detection | AppContainer stub (partial) |
| Codex support | Full | Full (bwrap + Landlock) | Full (restricted tokens) |
| ADK-Rust support | Yes (MacOsEnforcer) | Yes (LinuxEnforcer) | Partial (WindowsEnforcer stub) |
| Claude Code support | Yes (Seatbelt) | Yes (bwrap) | Not supported |
What most agents are missing for sandboxing
Several critical sandboxing patterns appear in only one or two agents despite being broadly applicable to any agent that runs code on behalf of users.
Windows sandboxing is severely under-served
Only Codex has a real Windows sandbox. ADK-Rust has a stub. Claude Code doesn't support Windows sandboxing at all. This leaves the majority of Windows users running agent commands with no OS-level isolation whatsoever.
Landlock adoption is minimal
Only Codex uses Landlock as a bwrap fallback. It's the future of Linux filesystem sandboxing (kernel 5.13+, no daemon required, usable wherever the Landlock security module is enabled), but few agents use it.
- Network isolation is rare. Most agents that sandbox focus on filesystem restrictions only, ignoring network access entirely. Codex and ADK-Rust are exceptions: they implement network policies with proxy enforcement and domain allowlists.
- Sandbox policy validation is nearly absent. Only ADK-Rust validates that the backend can actually enforce the requested policy before running code (fail-closed). Other agents silently fall back to weaker isolation without notifying the user or the agent.
- Per-command sandbox decisions are unique to Claude Code. Only Claude Code's shouldUseSandbox.ts decides per-command whether to sandbox. Most agents apply sandbox settings globally, so either everything runs sandboxed or nothing does. This is suboptimal because many commands (like git status or ls) are inherently safe to run unsandboxed.
What you should build into a new agent
If you're building a new coding agent, here's the sandboxing investment that matters most, informed by what works across all agents examined above.
Use Codex's 3-tier model
Start with ReadOnly, WorkspaceWrite, and DangerFullAccess isolation levels. This gives users a clear mental model: no writes, workspace-only, or full trust. Most agents lack this graduated approach entirely.
macOS: Seatbelt with deny-default
Use sandbox-exec with deny-default policies. Seatbelt is kernel-enforced and always available on macOS, with no installation required. Generate .sbpl profiles dynamically from policy data structures.
Linux: bwrap + Landlock fallback
Use bwrap for namespace isolation (--die-with-parent, --unshare-pid, --ro-bind). Check for user namespace access with a probe timeout. When bwrap is unavailable, fall back to Landlock for filesystem restrictions. Never silently skip sandboxing.
Windows: AppContainer with ACLs
This is the hardest platform. Use AppContainer with ACL-based path restrictions. Create unique SIDs per session. Set filesystem ACLs on allowed paths. Don't take shortcuts: Windows security requires full investment.
Fail-closed validation
Implement ADK-Rust-style validate_policy(): if the sandbox backend cannot enforce a policy, reject execution rather than silently downgrading. The user and agent must know when isolation is weaker than requested.
Per-command sandbox decisions
Implement Claude Code-style per-command sandboxing: some commands are safe unsandboxed (git status, ls, cat), others always need sandboxing (anything that writes outside the workspace, runs unknown code, or modifies system state).
Protect agent metadata directories
Even under writable workspace roots, always protect .git and agent metadata directories (.agents, .codex) as read-only. Codex does this to prevent the agent from corrupting its own version control and configuration state during execution.
Test on all three platforms
Most sandbox bugs are platform-specific. Seatbelt policies that work on macOS may not translate to bwrap on Linux. AppContainer on Windows requires entirely different primitives. Test your sandbox on every platform you claim to support, and don't assume cross-platform portability.
Bottom line
Sandboxing is the single most unevenly implemented feature across coding agents. Codex sets the bar with full cross-platform coverage (Seatbelt, bwrap + Landlock, restricted tokens). OpenHands takes the strongest isolation path with Docker per session. Claude Code innovates with per-command sandbox decisions. ADK-Rust pioneers fail-closed policy validation.
But seven of the 18 agents (Pi Mono, Neovate, Crush, OpenCode, Pochi, Kimi CLI, Oh My Pi) run code directly on the host with no OS-level sandbox whatsoever. They rely entirely on permission prompts, banned command lists, and user vigilance. This is acceptable for local development on trusted machines but represents a significant security gap for any agent deployed on production infrastructure.
For new agent builders, the path is clear: implement platform-native sandboxing from day one, validate policies fail-closed, and never silently downgrade isolation. The gap between agents that sandbox and agents that don't is only going to widen as the security implications of AI code execution become more widely understood.