macOS Seatbelt • Linux bwrap/Landlock • Windows AppContainer • Docker Containers

Sandboxing and Execution Isolation

When an AI agent runs shell commands on your machine, sandboxing determines whether a malicious or buggy command can escape and damage your system. This page examines every sandbox mechanism across 18 coding agents — from Docker containers to OS-level Seatbelt profiles.

Sandboxing spectrum overview

These agents span an enormous range of sandboxing investment. At one end, some agents implement full containerized isolation with Docker per session. At the other, agents execute directly on the host with no OS-level sandboxing at all — relying entirely on permission prompts and user vigilance.

Agent	Sandbox Type	Platform Support	Isolation Level
Codex	OS-native (Seatbelt/bwrap/restricted tokens)	macOS + Linux + Windows	Strong
Claude Code	sandbox-runtime (Seatbelt/bwrap) + optional	macOS + Linux	Moderate
OpenHands	Docker per-session	Cross-platform (Docker)	Strong
DeerFlow	AioSandboxProvider (Docker/Apple Container/K8s)	macOS + Linux + K8s	Strong
Hermes	6 backends (Local/Docker/SSH/Modal/Daytona/Singularity)	Cross-platform	Strong
ADK-Rust	adk-sandbox (Seatbelt/bwrap/AppContainer)	macOS + Linux + Windows	Moderate
Qwen Code	Docker/Podman container	Cross-platform (Docker)	Strong
Goose	ACP protocol, Docker optional	Docker optional	Moderate
Wintermolt	Optional Docker	Docker optional	Moderate
Pi Mono	None (container recommended in README)	None	None
Neovate	Host execution + banned commands	None	None
Crush	Host execution + TUI permission prompts	None	None
OpenCode	Host execution + permission model	None	None
Dirac	Git checkpoints (revert-based)	None	Minimal
Pochi	Host execution + apply-diff safety	None	None
Kimi CLI	Host execution + protocol-aware	None	None
Zaica	Multi-backend (inherits from Hermes lineage)	Docker optional	Moderate
Oh My Pi	Permission layers, no OS sandbox	None	None

Codex: The gold standard for cross-platform sandboxing

Codex implements the most comprehensive cross-platform sandboxing of any agent in this set. It detects the host OS at startup and selects an appropriate sandboxing mechanism — Seatbelt on macOS, bubblewrap on Linux, and restricted tokens on Windows — all governed by a unified policy layer.

SandboxMode enum: three isolation levels

Codex exposes a SandboxMode enum with three tiers:

ReadOnly — All filesystem paths are read-only. No writes permitted anywhere. Useful for inspection-only tasks.
WorkspaceWrite — The project workspace is writable, but .git, .agents, and .codex metadata directories remain read-only. This is the default mode.
DangerFullAccess — No sandbox restrictions. Full host access. Only for trusted environments.

The CLI exposes --sandbox with these levels, plus codex sandbox {macos,linux,windows} for testing the sandbox on a given platform.
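To make the three tiers concrete, here is a minimal Python sketch of how a write request could be checked against them. The enum values, PROTECTED_DIRS, and is_write_allowed are illustrative names, not Codex's actual code. The sketch assumes absolute POSIX paths and treats a protected directory name at any depth as read-only, which is an assumption about scope.

```python
from enum import Enum
from pathlib import PurePosixPath


class SandboxMode(Enum):
    # String values are illustrative labels, not confirmed CLI spellings.
    READ_ONLY = "read-only"
    WORKSPACE_WRITE = "workspace-write"
    DANGER_FULL_ACCESS = "danger-full-access"


# Metadata directories the article says stay read-only under writable roots.
PROTECTED_DIRS = {".git", ".agents", ".codex"}


def is_write_allowed(mode: SandboxMode, workspace: str, target: str) -> bool:
    """Return True if `target` may be written under `mode`."""
    if mode is SandboxMode.DANGER_FULL_ACCESS:
        return True
    if mode is SandboxMode.READ_ONLY:
        return False
    try:
        relative = PurePosixPath(target).relative_to(PurePosixPath(workspace))
    except ValueError:
        return False  # target is outside the workspace
    if ".." in relative.parts:
        return False  # refuse unnormalized paths that could escape the root
    # Assumption: a protected name anywhere in the path makes it read-only.
    return not any(part in PROTECTED_DIRS for part in relative.parts)
```

A check like this only expresses the policy. The enforcement itself still has to come from the OS-level mechanisms described in the following sections.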

macOS: Seatbelt sandbox-exec

On macOS, Codex uses /usr/bin/sandbox-exec with .sbpl policy files inspired by Chrome's sandbox. The base policy follows a deny-default model:

(deny default)
(allow process-exec)
(allow process-fork)
;; Specific sysctl allowlists
;; PTY support for interactive shells
;; OpenMP shared memory access

The network policy is proxy-aware with loopback detection, Unix domain socket rules, and explicit deny rules for outbound connections when network is restricted. Protected metadata directories (.git, .agents, .codex) stay read-only even under writable roots — a critical defense-in-depth measure.

Linux: bubblewrap + Landlock fallback

On Linux, Codex wraps commands with bwrap (bubblewrap):

--die-with-parent — child process dies when the wrapper exits
--unshare-pid — isolated PID namespace
--unshare-net — network namespace isolation when network is disabled
--ro-bind for read-only path mounts
--bind for read-write path mounts

Codex detects WSL1 and emits a warning — WSL1 lacks proper namespace isolation. It probes user namespace access with a 500ms timeout. When bwrap is unavailable, Codex falls back to Landlock, the Linux kernel's filesystem restriction mechanism. It also uses seccomp for syscall filtering as an additional layer.
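The flags listed above can be assembled into a bwrap command line roughly as follows. This is a sketch under stated assumptions, not Codex's implementation: build_bwrap_argv and its parameters are hypothetical names, and a real wrapper would also need mounts such as /proc and /dev plus the Landlock and seccomp layers.

```python
from typing import List, Sequence


def build_bwrap_argv(
    command: Sequence[str],
    ro_paths: Sequence[str],
    rw_paths: Sequence[str],
    allow_network: bool,
) -> List[str]:
    """Build a bwrap argv that wraps `command` (hypothetical helper)."""
    if not command:
        raise ValueError("command must not be empty")
    argv = ["bwrap", "--die-with-parent", "--unshare-pid"]
    if not allow_network:
        # A fresh network namespace has no route to the outside world.
        argv.append("--unshare-net")
    # Read-only binds come first so that later read-write binds of
    # subdirectories are layered on top of them.
    for path in ro_paths:
        argv += ["--ro-bind", path, path]
    for path in rw_paths:
        argv += ["--bind", path, path]
    argv += ["--", *command]
    return argv
```

Returning a list instead of a shell string lets the caller pass it straight to a process-spawning API without any quoting step.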

Windows: Restricted tokens + AppContainer

On Windows, Codex uses the CreateRestrictedToken API to create a process token with reduced privileges. It integrates AppContainer for additional isolation and uses split-filesystem policies to separate read-only and writable paths. This is the hardest platform to sandbox correctly, and Codex is the only agent with a full Windows implementation.

FileSystemSandboxPolicy and NetworkSandboxPolicy

Codex implements a structured policy layer:

FileSystemSandboxPolicy — entries with path, access mode (read/write/none), and special paths (project-root, tmp, home). Resolution uses most-specific-path-wins, then access-mode tie-breaker.
NetworkSandboxPolicy — three modes: Unrestricted, Restricted (loopback only), or Managed (proxy-enforced with domain allowlists).
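The resolution rule for FileSystemSandboxPolicy can be illustrated with a short sketch. Access, Entry, and resolve_access are hypothetical names. The source does not say which way the access-mode tie-breaker goes, so this version assumes the more restrictive mode wins when two entries name the same path.

```python
from dataclasses import dataclass
from enum import IntEnum
from pathlib import PurePosixPath
from typing import Sequence


class Access(IntEnum):
    NONE = 0
    READ = 1
    WRITE = 2


@dataclass(frozen=True)
class Entry:
    path: str
    access: Access


def resolve_access(
    entries: Sequence[Entry], target: str, default: Access = Access.NONE
) -> Access:
    """Most-specific-path-wins, then the more restrictive access mode."""
    target_path = PurePosixPath(target)
    matches = [
        entry
        for entry in entries
        if PurePosixPath(entry.path) == target_path
        or PurePosixPath(entry.path) in target_path.parents
    ]
    if not matches:
        return default
    # Longest matching path first; on a tie, the lower (stricter) access.
    # The tie direction is an assumption, not taken from the source.
    best = min(
        matches,
        key=lambda entry: (-len(PurePosixPath(entry.path).parts), entry.access),
    )
    return best.access
```

With entries for / (read), /repo (write), and /repo/.git (read), a write to /repo/.git/config resolves to read-only because the .git entry is the most specific match.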

ℹ️

Codex is the only agent with all three platforms

Every other agent either skips Windows entirely (Claude Code), doesn't implement OS-level sandboxing (most agents), or has only a stub for one platform (ADK-Rust's WindowsEnforcer). Codex's 3-platform coverage is a significant engineering investment.

ADK-Rust: Declarative sandbox policies with platform enforcers

ADK-Rust takes a declarative approach to sandboxing. Policies are defined as data structures and enforced by platform-specific backends. This separation of policy from enforcement enables fail-closed semantics — if the backend cannot enforce a requested control, execution is rejected rather than silently downgraded.

SandboxPolicy and SandboxPolicyBuilder

The SandboxPolicy struct carries:

allowed_paths — mapped to AccessMode::ReadOnly or AccessMode::ReadWrite
allow_network — boolean network access toggle
network_rules — per-domain allow/deny rules
environment — environment variable restrictions

A fluent SandboxPolicyBuilder API enables chain construction:

SandboxPolicyBuilder::new()
    .allow_read("/usr")
    .allow_read_write("/tmp/work")
    .allow_network_domain("api.example.com")
    .build()

MacOsEnforcer: Seatbelt DSL wrapping

On macOS, MacOsEnforcer generates .sbpl profiles dynamically and invokes sandbox-exec -p <profile>. The generated Seatbelt DSL follows the deny-default pattern:

(version 1)
(deny default)
(allow process-exec)
(allow file-read* (subpath "/usr/lib"))
(allow file-read* file-write* (subpath "/tmp/work"))

LinuxEnforcer: bubblewrap wrapping

On Linux, LinuxEnforcer wraps commands with bwrap:

--die-with-parent — ensures child process cleanup
--unshare-pid — PID namespace isolation
--unshare-net — network namespace when network is disabled
--ro-bind / --bind — filesystem access mapping
--new-session — process session isolation

WindowsEnforcer: AppContainer with ACLs

The WindowsEnforcer takes a fundamentally different approach from macOS/Linux. Instead of wrapping the command in a sandbox process, it:

Creates an AppContainer profile with a unique SID
Sets ACLs on allowed filesystem paths
Configures the process token via configure_command()

This is the hardest platform to sandbox. ADK-Rust's implementation is marked as a stub — it creates the AppContainer profile but has partial coverage of the full Windows security model.

ProcessBackend vs WasmBackend

ADK-Rust supports two execution backends:

ProcessBackend — runs a subprocess with optional enforcer wrapping
WasmBackend — runs code in-process via WebAssembly with built-in isolation

Fail-closed semantics

The critical design decision in ADK-Rust is validate_policy(): before executing any code, the system verifies that the backend can actually enforce the requested policy. If it cannot, execution is rejected. This is the opposite of silent downgrade — most other agents fall back to weaker isolation without telling the user.
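The fail-closed idea is language-independent, so here is a minimal Python sketch of it. Policy, Backend, the capability strings, and run are all hypothetical. Only the name validate_policy comes from the text above, and its real signature in ADK-Rust may differ.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


class PolicyNotEnforceable(RuntimeError):
    """Raised when a backend cannot enforce every requested control."""


@dataclass(frozen=True)
class Policy:
    restrict_filesystem: bool = True
    restrict_network: bool = False


@dataclass(frozen=True)
class Backend:
    name: str
    capabilities: FrozenSet[str]


def required_controls(policy: Policy) -> FrozenSet[str]:
    controls = set()
    if policy.restrict_filesystem:
        controls.add("filesystem")
    if policy.restrict_network:
        controls.add("network")
    return frozenset(controls)


def validate_policy(backend: Backend, policy: Policy) -> None:
    """Reject the policy instead of silently running with less isolation."""
    missing = required_controls(policy) - backend.capabilities
    if missing:
        raise PolicyNotEnforceable(
            f"{backend.name} cannot enforce: {', '.join(sorted(missing))}"
        )


def run(backend: Backend, policy: Policy, argv: Sequence[str]) -> str:
    validate_policy(backend, policy)  # must pass before anything executes
    return f"{backend.name}: {' '.join(argv)}"  # stand-in for real execution
```

The key property is ordering: validation happens before any code runs, and a failure surfaces as an error the caller has to handle.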

Feature flags

Sandboxing is controlled by Cargo feature flags: sandbox-macos, sandbox-linux, sandbox-windows, and sandbox-native. This lets each build opt into only the enforcers its target platform can support.

⚠️

Windows is a stub implementation

ADK-Rust's WindowsEnforcer creates AppContainer profiles but does not match Codex's full Windows sandbox coverage. The Windows security model (restricted tokens, job objects, integrity levels, AppLocker) is complex and requires significant investment to implement correctly.

Claude Code: Permission-driven sandbox routing

Claude Code uses the @anthropic-ai/sandbox-runtime package to implement platform-specific sandboxing. Unlike Codex's always-on approach, Claude Code makes sandboxing decisions per-command and integrates sandbox state deeply into the permission system.

Platform-specific enforcement

Claude Code implements sandboxing on two platforms:

macOS — Seatbelt via sandbox-exec
Linux/WSL — bubblewrap (bwrap)
Windows — Not supported

shouldUseSandbox.ts: per-command decision

The shouldUseSandbox.ts module decides per-command whether to sandbox. This is a key differentiator — most agents apply sandbox settings globally. Claude Code evaluates each command individually:

Read-only commands like git status may bypass sandboxing
Commands that write outside the workspace are always sandboxed
Commands matching excluded patterns can bypass sandboxing
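A per-command decision along these lines might look like the sketch below. It is written in Python for illustration and is not the logic in shouldUseSandbox.ts: the read-only allowlist, the metacharacter check, and the function signature are all assumptions. The sketch deliberately errs toward sandboxing whenever a command cannot be parsed or contains shell operators.

```python
import shlex
from fnmatch import fnmatchcase
from typing import Sequence

# Hypothetical allowlist of argv prefixes treated as read-only.
READ_ONLY_PREFIXES = (("git", "status"), ("ls",), ("cat",))
# Any of these means the string may chain or redirect commands.
SHELL_METACHARACTERS = set(";&|<>$`\n")


def should_use_sandbox(
    command: str,
    excluded_patterns: Sequence[str] = (),
    sandbox_enabled: bool = True,
) -> bool:
    """Decide whether a single command should run inside the sandbox."""
    if not sandbox_enabled:
        return False
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return True  # compound commands are never treated as safe
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # unparseable input: stay sandboxed
    if not argv:
        return True
    if any(fnmatchcase(command, pattern) for pattern in excluded_patterns):
        return False  # the user explicitly excluded this pattern
    is_read_only = any(
        tuple(argv[: len(prefix)]) == prefix for prefix in READ_ONLY_PREFIXES
    )
    return not is_read_only
```

Checking for shell operators before the allowlist matters: otherwise `git status; rm -rf /` would match the `git status` prefix and skip the sandbox.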

SandboxManager class

The SandboxManager class coordinates sandbox state:

isSandboxingEnabled() — checks if sandbox is active
isAutoAllowBashIfSandboxedEnabled() — auto-approval for sandboxed commands
areUnsandboxedCommandsAllowed() — whether non-sandboxed commands are permitted
checkDependencies() — verifies bwrap/Seatbelt availability

/sandbox command

Users can toggle sandboxing at runtime via the /sandbox command:

Toggle sandbox on/off
Exclude specific command patterns from sandboxing
View violation count

UI feedback: SandboxViolationEvent and SandboxPromptFooterHint

Claude Code provides visible feedback when sandboxing blocks commands:

SandboxViolationEvent and SandboxViolationExpandedView render UI feedback for blocked commands
SandboxPromptFooterHint shows "Sandbox blocked N attempts" in the prompt footer

Environment variables and settings

CLAUDE_CODE_FORCE_SANDBOX — enforces sandbox in bridge/SDK mode
sandbox.failIfUnavailable — setting that refuses to start without a working sandbox

Git worktree isolation for subagents

Claude Code isolates subagent execution in git worktrees. Each subagent operates on a separate worktree, preventing filesystem conflicts and providing a form of version-control-level sandboxing. Network permission requests are forwarded to the SDK host via the can_use_tool protocol.

💡

Per-command sandboxing is underrated

Most agents sandbox everything or nothing. Claude Code's per-command decision means ls and git status run unsandboxed (faster, no overhead) while rm -rf runs in a sandbox. This is the right tradeoff — sandbox only when it matters.

OpenHands: Docker-first isolation

OpenHands takes the strongest isolation approach: by default, every session runs inside a Docker container. In that mode the agent never executes directly on the host machine; it operates entirely within a containerized environment.

Docker container per session

Each OpenHands session launches a container from ghcr.io/openhands/agent-server. Session IDs are base62-generated and used for container naming, ensuring unique isolation per session.
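As a rough illustration of base62 session IDs, the sketch below draws characters from a 62-symbol alphabet using a cryptographically secure random source. The ID length, the container name prefix, and both function names are assumptions made for the example. The source does not describe how OpenHands generates or formats these values.

```python
import secrets
import string

# 0-9, a-z, A-Z: 62 symbols, safe to embed in container names.
BASE62 = string.digits + string.ascii_letters


def new_session_id(length: int = 22) -> str:
    """Return a random base62 ID (length is an assumed default)."""
    return "".join(secrets.choice(BASE62) for _ in range(length))


def container_name(session_id: str, prefix: str = "agent-session-") -> str:
    """Derive a container name from a session ID (hypothetical prefix)."""
    if not session_id or any(ch not in BASE62 for ch in session_id):
        raise ValueError("session_id must be non-empty base62")
    return prefix + session_id
```

Validating the ID before building the name keeps unexpected characters out of anything later passed to a container runtime.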

Container lifecycle management

OpenHands manages the full container lifecycle:

Create — container spawned with the agent-server image
Pause — container paused to preserve state
Resume — container resumed for continued work
Delete — container torn down on session exit
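The four lifecycle operations imply a small state machine, sketched here. ContainerState, SessionContainer, and the transition table are hypothetical, and the sketch tracks state only. A real implementation would call the container runtime at each transition.

```python
from enum import Enum


class ContainerState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    DELETED = "deleted"


# Allowed (state, action) pairs; anything else is rejected.
_TRANSITIONS = {
    (ContainerState.RUNNING, "pause"): ContainerState.PAUSED,
    (ContainerState.PAUSED, "resume"): ContainerState.RUNNING,
    (ContainerState.RUNNING, "delete"): ContainerState.DELETED,
    (ContainerState.PAUSED, "delete"): ContainerState.DELETED,
}


class SessionContainer:
    """Tracks the lifecycle of one session's container."""

    def __init__(self, name: str) -> None:
        self.name = name
        # Constructing the object models the "create" step.
        self.state = ContainerState.RUNNING

    def apply(self, action: str) -> ContainerState:
        key = (self.state, action)
        if key not in _TRANSITIONS:
            raise ValueError(f"cannot {action} a {self.state.value} container")
        self.state = _TRANSITIONS[key]
        return self.state
```

Making invalid transitions raise, for example resuming a deleted container, keeps session bookkeeping from drifting away from what the runtime actually holds.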

Network configuration

OpenHands uses bridge mode networking by default with dynamic port mapping. Host networking is available as an optional configuration for scenarios requiring direct host network access.

Authentication: X-Session-API-Key

Communication with running containers is authenticated via X-Session-API-Key headers, validated against running containers to prevent unauthorized access.
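A header check of this kind is usually done with a constant-time comparison so that response timing does not leak how much of the key matched. The sketch below assumes an in-memory mapping from session ID to key. is_authorized and that mapping are hypothetical, and only the header name comes from the text above.

```python
import hmac
from typing import Mapping

HEADER_NAME = "x-session-api-key"  # compared case-insensitively


def is_authorized(
    headers: Mapping[str, str],
    session_keys: Mapping[str, str],
    session_id: str,
) -> bool:
    """Check the presented key against the key of a running session."""
    expected = session_keys.get(session_id)
    normalized = {name.lower(): value for name, value in headers.items()}
    presented = normalized.get(HEADER_NAME)
    if expected is None or presented is None:
        return False  # unknown session or missing header
    # compare_digest avoids early exit on the first differing byte.
    return hmac.compare_digest(presented.encode(), expected.encode())
```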

File synchronization

Host-container file synchronization handles workspace access — files written inside the container are synced back to the host, and workspace files are mounted into the container for agent access.

Process-based sandbox alternative

For local development, OpenHands offers a process-based sandbox as an alternative to Docker. This bypasses container overhead but sacrifices isolation guarantees — useful for development but not for production.

Remote sandbox for cloud deployments

OpenHands supports remote sandbox execution for cloud deployments, allowing sessions to run on remote Docker hosts rather than the local machine.

DeerFlow: Pluggable sandbox backends

DeerFlow implements a pluggable sandbox architecture with multiple backend options, allowing deployment flexibility from local development to cloud-scale execution.

LocalSandboxProvider

The LocalSandboxProvider runs commands on the host directly with path mapping but no process isolation. It is the simplest backend — useful for development but providing no actual sandbox guarantees.

AioSandboxProvider with pluggable backends

The AioSandboxProvider abstracts over multiple backends:

LocalContainerBackend — Docker or Apple Container (macOS-specific lightweight containerization)
RemoteSandboxBackend — Kubernetes via provisioner service for cloud-scale deployment

Virtual path system

DeerFlow uses a virtual path system mapping:

/mnt/user-data/workspace — project workspace
/mnt/user-data/uploads — uploaded files
/mnt/user-data/output — execution output

This virtual path layer abstracts away the differences between local, container, and Kubernetes backends — the agent always sees the same paths regardless of execution environment.
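One way to picture the virtual path layer is as a lookup from virtual roots to backend-specific locations. In the sketch below, resolve_virtual_path and the example host paths are hypothetical, and only the three /mnt/user-data roots come from the text. Rejecting ".." components is an added safety assumption, not a documented DeerFlow behavior.

```python
from pathlib import PurePosixPath
from typing import Mapping


def resolve_virtual_path(virtual_path: str, mounts: Mapping[str, str]) -> str:
    """Translate an agent-visible path into a backend path."""
    path = PurePosixPath(virtual_path)
    if not path.is_absolute() or ".." in path.parts:
        raise ValueError(f"refusing suspicious path: {virtual_path}")
    for virtual_root, real_root in mounts.items():
        root = PurePosixPath(virtual_root)
        if path == root or root in path.parents:
            return str(PurePosixPath(real_root) / path.relative_to(root))
    raise ValueError(f"path is outside every virtual root: {virtual_path}")


# Example mapping for a local backend; the host paths are made up.
LOCAL_MOUNTS = {
    "/mnt/user-data/workspace": "/home/dev/project",
    "/mnt/user-data/uploads": "/home/dev/.agent/uploads",
    "/mnt/user-data/output": "/home/dev/.agent/output",
}
```

A container backend would use the same function with a different mounts table, which is what lets the agent see identical paths everywhere.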

Apple Container

On macOS, DeerFlow can use Apple's lightweight container runtime as an alternative to Docker. This provides OS-level isolation without requiring Docker installation.

Hermes: 6-runtime-backend architecture

Hermes implements the most diverse set of sandbox backends through a BaseEnvironment ABC that serves as a common interface across all runtimes.

The six backends

Local

Zero isolation, direct host execution. Useful only for development or fully trusted environments. No process or filesystem restrictions.

Docker

Hardened Linux containers with capability dropping, tmpfs mounts, and PID limits. The strongest local backend Hermes offers.

SSH

Remote machine execution via ControlMaster connections with file synchronization. Isolation comes from running commands on a separate host rather than from any local sandbox.

Modal

Cloud-native sandboxing with filesystem snapshots. Code executes in Modal's cloud infrastructure with isolated function containers.

Daytona

Serverless cloud sandboxes with filesystem persistence. Each session gets an isolated development environment in the cloud.

Singularity

HPC-focused container isolation. Designed for high-performance computing environments where Docker's daemon model is unavailable or undesirable.

BaseEnvironment ABC

All backends implement the BaseEnvironment abstract base class, providing a common interface for execution, file access, and lifecycle management. This abstraction allows Hermes to swap backends without changing agent code.
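The shape of such an abstraction can be sketched with Python's abc module. The method names execute and close, their signatures, and LocalEnvironment are assumptions for illustration. Only the BaseEnvironment name and its role as a shared interface come from the text above.

```python
import subprocess
from abc import ABC, abstractmethod
from typing import Sequence, Tuple


class BaseEnvironment(ABC):
    """Common interface every backend implements (hypothetical methods)."""

    @abstractmethod
    def execute(self, argv: Sequence[str], timeout: float = 30.0) -> Tuple[int, str]:
        """Run a command and return (exit_code, stdout)."""

    @abstractmethod
    def close(self) -> None:
        """Release whatever the backend holds (container, connection, ...)."""


class LocalEnvironment(BaseEnvironment):
    """Zero-isolation backend: runs directly on the host."""

    def execute(self, argv: Sequence[str], timeout: float = 30.0) -> Tuple[int, str]:
        completed = subprocess.run(
            list(argv), capture_output=True, text=True, timeout=timeout
        )
        return completed.returncode, completed.stdout

    def close(self) -> None:
        pass  # nothing to tear down for host execution


def run_task(env: BaseEnvironment, argv: Sequence[str]) -> str:
    """Agent-side code depends only on the abstract interface."""
    try:
        _, output = env.execute(argv)
        return output
    finally:
        env.close()
```

Because run_task only sees BaseEnvironment, swapping the local backend for a Docker or SSH one would not change the calling code.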

Platform-specific deep comparison

Each platform offers fundamentally different sandboxing primitives. Understanding these differences is critical when building a cross-platform agent.

Feature	macOS	Linux	Windows
Primary mechanism	Seatbelt (`sandbox-exec`)	bwrap + Landlock + seccomp	AppContainer / Restricted tokens
Invocation	`sandbox-exec -p <profile> <cmd>`	`bwrap <args> -- <cmd>`	`CreateRestrictedToken` + ACLs
Filesystem	Scheme DSL (.sbpl), subpath rules, regex globs	Bind mounts (--ro-bind, --bind), Landlock filesystem access rules	ACLs on allowed paths, SID restrictions
Network	deny default + allow rules, proxy-aware	`--unshare-net`, network proxy	Network capability restrictions
Process	process-exec/process-fork allowlists	`--unshare-pid`, `--die-with-parent`, `--new-session`	Process token restrictions
Protected paths	.git, .agents, .codex read-only	.git, .agents, .codex read-only	Protected directory ACLs
Fallback	None (Seatbelt always available on macOS)	Landlock when bwrap unavailable, WSL1 detection	AppContainer stub (partial)
Codex support	✅ Full	✅ Full (bwrap + Landlock)	✅ Full (restricted tokens)
ADK-Rust support	✅ MacOsEnforcer	✅ LinuxEnforcer	⚠️ WindowsEnforcer (stub)
Claude Code support	✅ Seatbelt	✅ bwrap	❌ Not supported

What most agents are missing for sandboxing

Several critical sandboxing patterns appear in only one or two agents despite being broadly applicable to any agent that runs code on behalf of users.

⚠️

Windows sandboxing is severely under-served

Only Codex has a real Windows sandbox. ADK-Rust has a stub. Claude Code doesn't support Windows sandboxing at all. This leaves the majority of Windows users running agent commands with no OS-level isolation whatsoever.

ℹ️

Landlock adoption is minimal

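Only Codex uses Landlock as a bwrap fallback. Landlock is a strong candidate for the future of Linux filesystem sandboxing: it has been in the mainline kernel since 5.13, needs no daemon, and lets an unprivileged process restrict itself. It still has to be enabled in the running kernel, so probe for it rather than assuming it is present. Most agents do not use it.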

Network isolation is rare. Most agents that sandbox focus on filesystem restrictions only, ignoring network access entirely. Codex and ADK-Rust are exceptions — they implement network policies with proxy enforcement and domain allowlists.
Sandbox policy validation is nearly absent. Only ADK-Rust validates that the backend can actually enforce the requested policy before running code (fail-closed). Other agents silently fall back to weaker isolation without notifying the user or the agent.
Per-command sandbox decisions are unique to Claude Code. Its shouldUseSandbox.ts decides per command whether to sandbox, while most agents apply sandbox settings globally, so either everything runs sandboxed or nothing does. A global setting is suboptimal because many commands (like git status or ls) are low-risk enough to run unsandboxed.

What you should build into a new agent

If you're building a new coding agent, here's the sandboxing investment that matters most, informed by what works across all agents examined above.

Use Codex's 3-tier model

Start with ReadOnly, WorkspaceWrite, and DangerFullAccess isolation levels. This gives users a clear mental model: no writes, workspace-only, or full trust. Most agents lack this graduated approach entirely.

macOS: Seatbelt with deny-default

Use sandbox-exec with deny-default policies. Seatbelt is kernel-enforced and always available on macOS — no installation required. Generate .sbpl profiles dynamically from policy data structures.

Linux: bwrap + Landlock fallback

Use bwrap for namespace isolation (--die-with-parent, --unshare-pid, --ro-bind). Check for user namespace access with a probe timeout. When bwrap is unavailable, fall back to Landlock for filesystem restrictions. Never silently skip sandboxing.

Windows: AppContainer with ACLs

This is the hardest platform. Use AppContainer with ACL-based path restrictions. Create unique SIDs per session. Set filesystem ACLs on allowed paths. Don't take shortcuts — Windows security requires full investment.

Fail-closed validation

Implement ADK-Rust-style validate_policy(): if the sandbox backend cannot enforce a policy, reject execution rather than silently downgrading. The user and agent must know when isolation is weaker than requested.

Per-command sandbox decisions

Implement Claude Code-style per-command sandboxing: some commands are safe unsandboxed (git status, ls, cat), others always need sandboxing (anything that writes outside the workspace, runs unknown code, or modifies system state).

💡

Protect agent metadata directories

Even under writable workspace roots, always protect .git and agent metadata directories (.agents, .codex) as read-only. Codex does this — it prevents the agent from corrupting its own version control and configuration state during execution.

ℹ️

Test on all three platforms

Most sandbox bugs are platform-specific. Seatbelt policies that work on macOS may not translate to bwrap on Linux. AppContainer on Windows requires entirely different primitives. Test your sandbox on every platform you claim to support — don't assume cross-platform portability.

Bottom line

Sandboxing is the single most unevenly implemented feature across coding agents. Codex sets the bar with full cross-platform coverage (Seatbelt, bwrap + Landlock, restricted tokens). OpenHands takes the strongest isolation path with Docker per session. Claude Code innovates with per-command sandbox decisions. ADK-Rust pioneers fail-closed policy validation.

But seven of the 18 agents (Pi Mono, Neovate, Crush, OpenCode, Pochi, Kimi CLI, Oh My Pi) run code directly on the host with no OS-level sandbox whatsoever, and Dirac offers only revert-based Git checkpoints. These agents rely entirely on permission prompts, banned command lists, and user vigilance. This is acceptable for local development on trusted machines but represents a significant security gap for any agent deployed on production infrastructure.

For new agent builders, the path is clear: implement platform-native sandboxing from day one, validate policies fail-closed, and never silently downgrade isolation. The gap between agents that sandbox and agents that don't is only going to widen as the security implications of AI code execution become more widely understood.