macOS Seatbelt • Linux bwrap/Landlock • Windows AppContainer • Docker Containers

Sandboxing and Execution Isolation

When an AI agent runs shell commands on your machine, sandboxing determines whether a malicious or buggy command can escape and damage your system. This page examines every sandbox mechanism across 18 coding agents — from Docker containers to OS-level Seatbelt profiles.


Sandboxing spectrum overview

These agents span an enormous range of sandboxing investment. At one end, some agents implement full containerized isolation with Docker per session. At the other, agents execute directly on the host with no OS-level sandboxing at all — relying entirely on permission prompts and user vigilance.

| Agent | Sandbox Type | Platform Support | Isolation Level |
|---|---|---|---|
| Codex | OS-native (Seatbelt/bwrap/restricted tokens) | macOS + Linux + Windows | Strong |
| Claude Code | sandbox-runtime (Seatbelt/bwrap) + optional | macOS + Linux | Moderate |
| OpenHands | Docker per-session | Cross-platform (Docker) | Strong |
| DeerFlow | AioSandboxProvider (Docker/Apple Container/K8s) | macOS + Linux + K8s | Strong |
| Hermes | 6 backends (Docker/SSH/Modal/Daytona/Singularity) | Cross-platform | Strong |
| ADK-Rust | adk-sandbox (Seatbelt/bwrap/AppContainer) | macOS + Linux + Windows | Moderate |
| Qwen Code | Docker/Podman container | Cross-platform (Docker) | Strong |
| Goose | ACP protocol, Docker optional | Docker optional | Moderate |
| Wintermolt | Optional Docker | Docker optional | Moderate |
| Pi Mono | None (container recommended in README) | None | None |
| Neovate | Host execution + banned commands | None | None |
| Crush | Host execution + TUI permission prompts | None | None |
| OpenCode | Host execution + permission model | None | None |
| Dirac | Git checkpoints (revert-based) | None | Minimal |
| Pochi | Host execution + apply-diff safety | None | None |
| Kimi CLI | Host execution + protocol-aware | None | None |
| Zaica | Multi-backend (inherits from Hermes lineage) | Docker optional | Moderate |
| Oh My Pi | Permission layers, no OS sandbox | None | None |

Codex: The gold standard for cross-platform sandboxing

Codex implements the most comprehensive cross-platform sandboxing of any agent in this set. It detects the host OS at startup and selects an appropriate sandboxing mechanism — Seatbelt on macOS, bubblewrap on Linux, and restricted tokens on Windows — all governed by a unified policy layer.

SandboxMode enum: three isolation levels

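Codex exposes a SandboxMode enum with three tiers: ReadOnly (no writes), WorkspaceWrite (writes confined to the workspace), and DangerFullAccess (full trust).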

The CLI exposes --sandbox with these levels, plus codex sandbox {macos,linux,windows} for testing the sandbox on a given platform.
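To make the three tiers concrete, here is a minimal Python sketch. It is not Codex's code (Codex is not written in Python): the tier names come from this page, while the helper may_write and its arguments are hypothetical and exist only to show what each level would permit.

```python
from enum import Enum, auto
from pathlib import PurePosixPath


class SandboxMode(Enum):
    READ_ONLY = auto()           # no writes anywhere
    WORKSPACE_WRITE = auto()     # writes confined to the workspace
    DANGER_FULL_ACCESS = auto()  # no restrictions (full trust)


def may_write(mode: SandboxMode, target: str, workspace: str) -> bool:
    """Hypothetical helper: would this tier permit a write to `target`?

    Assumes both paths are absolute and already resolved (no "..").
    """
    if mode is SandboxMode.READ_ONLY:
        return False
    if mode is SandboxMode.DANGER_FULL_ACCESS:
        return True
    # WORKSPACE_WRITE: allow only paths at or below the workspace root.
    path, root = PurePosixPath(target), PurePosixPath(workspace)
    return path == root or root in path.parents
```

A real implementation enforces the tier in the OS sandbox rather than in application code; the check above only illustrates the mental model.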

macOS: Seatbelt sandbox-exec

On macOS, Codex uses /usr/bin/sandbox-exec with .sbpl policy files inspired by Chrome's sandbox. The base policy follows a deny-default model:

(deny default)
(allow process-exec)
(allow process-fork)
;; Specific sysctl allowlists
;; PTY support for interactive shells
;; OpenMP shared memory access

The network policy is proxy-aware with loopback detection, Unix domain socket rules, and explicit deny rules for outbound connections when network is restricted. Protected metadata directories (.git, .agents, .codex) stay read-only even under writable roots — a critical defense-in-depth measure.
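The protected-directory rule can be illustrated with a small Python sketch. The directory names are the ones listed above; the function is_write_allowed and its signature are assumptions made for illustration, and Codex enforces this rule in the sandbox policy itself rather than in a check like this.

```python
from pathlib import PurePosixPath

# Directory names taken from the text above.
PROTECTED_NAMES = {".git", ".agents", ".codex"}


def is_write_allowed(target: str, writable_roots: list[str]) -> bool:
    """Allow a write only under a writable root and outside protected dirs.

    Assumes absolute, already-resolved POSIX paths.
    """
    path = PurePosixPath(target)
    for root in map(PurePosixPath, writable_roots):
        if path != root and root not in path.parents:
            continue  # target is not under this root
        # Inspect only the components below the writable root.
        relative = path.relative_to(root)
        if PROTECTED_NAMES.isdisjoint(relative.parts):
            return True
    return False
```

The point of the design is that a writable root does not imply that everything beneath it is writable.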

Linux: bubblewrap + Landlock fallback

On Linux, Codex wraps commands with bwrap (bubblewrap). It detects WSL1 and emits a warning, because WSL1 lacks proper namespace isolation, and it probes user namespace access with a 500ms timeout. When bwrap is unavailable, Codex falls back to Landlock, the Linux kernel's filesystem restriction mechanism. It also uses seccomp for syscall filtering as an additional layer.
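The probe-then-fall-back logic might look roughly like the Python sketch below. This is an illustration, not Codex's implementation: the function names are hypothetical, the probe command is one plausible way to test user namespaces, and refusing to run when neither backend is available is an assumption rather than documented Codex behavior.

```python
import shutil
import subprocess


def probe_user_namespaces(timeout_s: float = 0.5) -> bool:
    """Run a trivial bwrap command; a timeout or failure means 'unavailable'."""
    bwrap = shutil.which("bwrap")
    if bwrap is None:
        return False
    try:
        result = subprocess.run(
            [bwrap, "--unshare-user", "--ro-bind", "/", "/", "true"],
            capture_output=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def choose_linux_backend(bwrap_ok: bool, landlock_ok: bool) -> str:
    """Prefer bwrap, fall back to Landlock, and never silently run unsandboxed."""
    if bwrap_ok:
        return "bwrap"
    if landlock_ok:
        return "landlock"
    raise RuntimeError("no Linux sandbox backend available; refusing to run")
```

Landlock detection is left as a boolean input here because it requires a syscall probe that has no standard-library wrapper in Python.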

Windows: Restricted tokens + AppContainer

On Windows, Codex uses the CreateRestrictedToken API to create a process token with reduced privileges. It integrates AppContainer for additional isolation and uses split-filesystem policies to separate read-only and writable paths. This is the hardest platform to sandbox correctly, and Codex is the only agent with a full Windows implementation.

FileSystemSandboxPolicy and NetworkSandboxPolicy

Codex implements a structured policy layer:

ℹ️

Codex is the only agent with all three platforms

Every other agent either skips Windows entirely (Claude Code), doesn't implement OS-level sandboxing (most agents), or has only a stub for one platform (ADK-Rust's WindowsEnforcer). Codex's 3-platform coverage is a significant engineering investment.

ADK-Rust: Declarative sandbox policies with platform enforcers

ADK-Rust takes a declarative approach to sandboxing. Policies are defined as data structures and enforced by platform-specific backends. This separation of policy from enforcement enables fail-closed semantics — if the backend cannot enforce a requested control, execution is rejected rather than silently downgraded.

SandboxPolicy and SandboxPolicyBuilder

The SandboxPolicy struct carries:

A fluent SandboxPolicyBuilder API enables chain construction:

SandboxPolicyBuilder::new()
    .allow_read("/usr")
    .allow_read_write("/tmp/work")
    .allow_network_domain("api.example.com")
    .build()

MacOsEnforcer: Seatbelt DSL wrapping

On macOS, MacOsEnforcer generates .sbpl profiles dynamically and invokes sandbox-exec -p <profile>. The generated Seatbelt DSL follows the deny-default pattern:

(version 1)
(deny default)
(allow process-exec)
(allow file-read* (subpath "/usr/lib"))
(allow file-read* file-write* (subpath "/tmp/work"))
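Generating such a profile from policy data is mostly string templating. The Python sketch below reproduces the five-line profile shown above; render_sbpl is a hypothetical name, and ADK-Rust's real generator is written in Rust and presumably emits more rules than this.

```python
def render_sbpl(read_paths: list[str], read_write_paths: list[str]) -> str:
    """Render a deny-default Seatbelt profile from two path lists."""

    def quote(path: str) -> str:
        # Escape backslashes and double quotes for the profile's string syntax.
        return '"' + path.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["(version 1)", "(deny default)", "(allow process-exec)"]
    lines += [f"(allow file-read* (subpath {quote(p)}))" for p in read_paths]
    lines += [
        f"(allow file-read* file-write* (subpath {quote(p)}))"
        for p in read_write_paths
    ]
    return "\n".join(lines)
```

On macOS the result could then be passed as the profile string, for example subprocess.run(["sandbox-exec", "-p", profile, "ls"]).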

LinuxEnforcer: bubblewrap wrapping

On Linux, LinuxEnforcer wraps commands with bwrap:
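As an illustration, a policy could be translated into a bwrap command line along these lines. This is a sketch in Python, not the LinuxEnforcer source: build_bwrap_argv and its parameters are hypothetical, and the flags are standard bubblewrap options of the kind listed in the platform comparison further down.

```python
def build_bwrap_argv(
    command: list[str],
    read_only: list[str],
    read_write: list[str],
    allow_network: bool = False,
) -> list[str]:
    """Translate a simple path policy into a bwrap invocation."""
    argv = ["bwrap", "--die-with-parent", "--new-session", "--unshare-pid"]
    if not allow_network:
        argv.append("--unshare-net")  # no network namespace access
    for path in read_only:
        argv += ["--ro-bind", path, path]  # visible but not writable
    for path in read_write:
        argv += ["--bind", path, path]  # visible and writable
    argv += ["--proc", "/proc", "--dev", "/dev"]
    return argv + ["--", *command]
```

Anything not bound into the sandbox is simply absent from the child's filesystem view, which is what gives bwrap its deny-by-default character.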

WindowsEnforcer: AppContainer with ACLs

The WindowsEnforcer takes a fundamentally different approach from macOS/Linux. Instead of wrapping the command in a sandbox process, it creates an AppContainer profile and relies on ACLs to restrict filesystem access.

This is the hardest platform to sandbox. ADK-Rust's implementation is marked as a stub: it creates the AppContainer profile but covers only part of the full Windows security model.

ProcessBackend vs WasmBackend

ADK-Rust supports two execution backends: ProcessBackend and WasmBackend.

Fail-closed semantics

The critical design decision in ADK-Rust is validate_policy(): before executing any code, the system verifies that the backend can actually enforce the requested policy. If it cannot, execution is rejected. This is the opposite of silent downgrade — most other agents fall back to weaker isolation without telling the user.
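The fail-closed idea is small enough to sketch in a few lines of Python. Only the name validate_policy comes from the text above; the Policy and Backend types, the control names, and run_sandboxed are hypothetical stand-ins, and the real ADK-Rust API is written in Rust and will differ.

```python
from dataclasses import dataclass


class PolicyNotEnforceable(Exception):
    """Raised when a backend cannot enforce every requested control."""


@dataclass(frozen=True)
class Policy:
    required_controls: frozenset  # e.g. {"fs_read_only", "network_off"}


@dataclass(frozen=True)
class Backend:
    name: str
    supported_controls: frozenset


def validate_policy(policy: Policy, backend: Backend) -> None:
    """Reject the policy unless the backend supports every control in it."""
    missing = policy.required_controls - backend.supported_controls
    if missing:
        raise PolicyNotEnforceable(
            f"{backend.name} cannot enforce: {', '.join(sorted(missing))}"
        )


def run_sandboxed(policy: Policy, backend: Backend, command: list[str]) -> str:
    validate_policy(policy, backend)  # fail closed before anything executes
    return f"{backend.name}: {' '.join(command)}"  # placeholder for real execution
```

The key property is ordering: validation happens before execution, and a missing control raises instead of being dropped.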

Feature flags

Sandboxing is controlled by Cargo feature flags: sandbox-macos, sandbox-linux, sandbox-windows, and sandbox-native. This lets a build opt into only the enforcers its target platform can support.

⚠️

Windows is a stub implementation

ADK-Rust's WindowsEnforcer creates AppContainer profiles but does not match Codex's full Windows sandbox coverage. The Windows security model (restricted tokens, job objects, integrity levels, AppLocker) is complex and requires significant investment to implement correctly.

Claude Code: Permission-driven sandbox routing

Claude Code uses the @anthropic-ai/sandbox-runtime package to implement platform-specific sandboxing. Unlike Codex's always-on approach, Claude Code makes sandboxing decisions per-command and integrates sandbox state deeply into the permission system.

Platform-specific enforcement

Claude Code implements sandboxing on two platforms: macOS (Seatbelt) and Linux (bwrap).

shouldUseSandbox.ts: per-command decision

The shouldUseSandbox.ts module decides per-command whether to sandbox. This is a key differentiator — most agents apply sandbox settings globally. Claude Code evaluates each command individually:
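The contents of shouldUseSandbox.ts are not reproduced on this page, so the following is only a guess at the shape of such a decision, written in Python for consistency with the other sketches. The allowlist, the metacharacter check, and the function signature are all assumptions.

```python
import shlex

# Illustrative allowlist of read-only command prefixes (an assumption).
SAFE_COMMANDS = {("ls",), ("cat",), ("git", "status")}
SHELL_METACHARACTERS = set(";|&$`><\n")


def should_use_sandbox(command_line: str, sandbox_enabled: bool = True) -> bool:
    """Decide per command whether it must run inside the sandbox."""
    if not sandbox_enabled:
        return False
    # Compound commands, redirection, and substitution are never treated as safe.
    if SHELL_METACHARACTERS & set(command_line):
        return True
    try:
        tokens = shlex.split(command_line)
    except ValueError:  # e.g. unbalanced quotes
        return True
    if not tokens:
        return True
    # Unsandboxed only if the command starts with an allowlisted prefix.
    return not any(
        tuple(tokens[: len(prefix)]) == prefix for prefix in SAFE_COMMANDS
    )
```

Defaulting to "sandbox" whenever parsing is ambiguous keeps the fast path narrow: a command skips the sandbox only when it is positively recognized as safe.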

SandboxManager class

The SandboxManager class coordinates sandbox state:

/sandbox command

Users can toggle sandboxing at runtime via the /sandbox command:

UI feedback: SandboxViolationEvent and SandboxPromptFooterHint

Claude Code provides visible feedback when sandboxing blocks commands:

Environment variables and settings

Git worktree isolation for subagents

Claude Code isolates subagent execution in git worktrees. Each subagent operates on a separate worktree, preventing filesystem conflicts and providing a form of version-control-level sandboxing. Network permission requests are forwarded to the SDK host via the can_use_tool protocol.
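Creating a worktree per subagent comes down to one git command. The Python sketch below only builds that command; the .worktrees directory, the subagent/<id> branch naming, and the function name are hypothetical choices, not Claude Code's actual layout.

```python
from pathlib import Path


def worktree_add_argv(repo: str, subagent_id: str, base_ref: str = "HEAD") -> list[str]:
    """Build a `git worktree add` command giving one subagent its own checkout."""
    worktree = Path(repo) / ".worktrees" / subagent_id  # hypothetical location
    branch = f"subagent/{subagent_id}"  # hypothetical branch naming
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", branch,      # create a new branch for this subagent
        str(worktree),     # directory for the new working tree
        base_ref,          # commit to start from
    ]
```

Passing the result to subprocess.run(argv, check=True) would create the checkout. Note that a worktree isolates files and branches only; it does not restrict what a process running inside it can touch.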

💡

Per-command sandboxing is underrated

Most agents sandbox everything or nothing. Claude Code's per-command decision means ls and git status run unsandboxed (faster, no overhead) while rm -rf runs in a sandbox. This is the right tradeoff — sandbox only when it matters.

OpenHands: Docker-first isolation

OpenHands takes the strongest isolation approach: every session runs inside a Docker container. The agent never executes directly on the host machine — it operates entirely within a containerized environment.

Docker container per session

Each OpenHands session launches a container from ghcr.io/openhands/agent-server. Session IDs are base62-generated and used for container naming, ensuring unique isolation per session.
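A base62 session ID generator can be sketched in Python as follows. The alphabet order, the 16-byte ID length, and the container name prefix are assumptions for illustration; OpenHands' actual choices may differ.

```python
import secrets
import string

# Alphabet order is an assumption: digits, then uppercase, then lowercase.
BASE62 = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(n: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


def new_session_id(num_bytes: int = 16) -> str:
    """Random, unguessable ID built from a cryptographically secure source."""
    return base62_encode(int.from_bytes(secrets.token_bytes(num_bytes), "big"))


def container_name(session_id: str, prefix: str = "openhands-session-") -> str:
    """Hypothetical naming scheme: a fixed prefix plus the session ID."""
    return prefix + session_id
```

Base62 output is purely alphanumeric, so it is safe to embed in container names and URLs without escaping.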

Container lifecycle management

OpenHands manages the full container lifecycle:

Network configuration

OpenHands uses bridge mode networking by default with dynamic port mapping. Host networking is available as an optional configuration for scenarios requiring direct host network access.

Authentication: X-Session-API-Key

Communication with running containers is authenticated via X-Session-API-Key headers, validated against running containers to prevent unauthorized access.
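A header check of this kind is usually a constant-time comparison against the key issued for the session. The Python sketch below assumes a simple in-memory mapping from session ID to key; the function name and data shapes are hypothetical, and only the header name comes from the text above.

```python
import hmac


def is_authorized(headers: dict, session_keys: dict, session_id: str) -> bool:
    """Check the X-Session-API-Key header against the key for a running session.

    Assumes `headers` already uses this exact header casing.
    """
    expected = session_keys.get(session_id)  # None if the session is not running
    presented = headers.get("X-Session-API-Key")
    if expected is None or presented is None:
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

Looking the key up by session also means a valid key for one container grants nothing on another.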

File synchronization

Host-container file synchronization handles workspace access — files written inside the container are synced back to the host, and workspace files are mounted into the container for agent access.

Process-based sandbox alternative

For local development, OpenHands offers a process-based sandbox as an alternative to Docker. This bypasses container overhead but sacrifices isolation guarantees — useful for development but not for production.

Remote sandbox for cloud deployments

OpenHands supports remote sandbox execution for cloud deployments, allowing sessions to run on remote Docker hosts rather than the local machine.

DeerFlow: Pluggable sandbox backends

DeerFlow implements a pluggable sandbox architecture with multiple backend options, allowing deployment flexibility from local development to cloud-scale execution.

LocalSandboxProvider

The LocalSandboxProvider runs commands on the host directly with path mapping but no process isolation. It is the simplest backend — useful for development but providing no actual sandbox guarantees.

AioSandboxProvider with pluggable backends

The AioSandboxProvider abstracts over multiple backends: Docker, Apple Container, and Kubernetes.

Virtual path system

DeerFlow uses a virtual path system that maps the paths the agent sees onto backend-specific locations. This layer abstracts away the differences between local, container, and Kubernetes backends, so the agent always sees the same paths regardless of execution environment.
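A virtual path layer can be as simple as a prefix table. In the Python sketch below, the virtual prefix /mnt/workspace, the host path, and the function to_real_path are all made up for illustration; DeerFlow's real mapping is not shown on this page.

```python
from pathlib import PurePosixPath


def to_real_path(virtual_path: str, mounts: dict[str, str]) -> str:
    """Translate an agent-visible path into a backend-specific one."""
    path = PurePosixPath(virtual_path)
    if ".." in path.parts:
        raise ValueError("path traversal is not allowed")
    # Try the longest virtual prefix first so nested mounts win.
    for virtual_root in sorted(mounts, key=len, reverse=True):
        root = PurePosixPath(virtual_root)
        if path == root or root in path.parents:
            return str(PurePosixPath(mounts[virtual_root]) / path.relative_to(root))
    raise ValueError(f"{virtual_path} is outside every mounted virtual root")
```

Each backend supplies its own mounts table, for example {"/mnt/workspace": "/home/alice/project"} locally, while the agent keeps using /mnt/workspace everywhere.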

Apple Container

On macOS, DeerFlow can use Apple's lightweight container runtime as an alternative to Docker. This provides OS-level isolation without requiring Docker installation.

Hermes: 6-runtime-backend architecture

Hermes implements the most diverse set of sandbox backends through a BaseEnvironment ABC that serves as a common interface across all runtimes.

The six backends

Local

Zero isolation, direct host execution. Useful only for development or fully trusted environments. No process or filesystem restrictions.

Docker

Hardened Linux containers with capability dropping, tmpfs mounts, and PID limits. The strongest local backend Hermes offers.
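The three hardening measures named above map onto standard docker run flags. The Python sketch below builds such a command line; the function name, the default PID limit, and the tmpfs options are illustrative assumptions, not Hermes' actual configuration.

```python
def hardened_docker_argv(
    image: str, command: list[str], pids_limit: int = 256
) -> list[str]:
    """Build a `docker run` command with the hardening described above."""
    return [
        "docker", "run", "--rm",
        "--cap-drop", "ALL",                # capability dropping
        "--pids-limit", str(pids_limit),    # PID limit (caps fork bombs)
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=64m",  # tmpfs mount for scratch space
        image,
        *command,
    ]
```

For example, hardened_docker_argv("python:3.12-slim", ["python", "-c", "print(1)"]) yields a command that can be handed to subprocess.run.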

SSH

Remote machine execution via ControlMaster connections with file synchronization. Isolation comes from the physical separation of the remote host.

Modal

Cloud-native sandboxing with filesystem snapshots. Code executes in Modal's cloud infrastructure with isolated function containers.

Daytona

Serverless cloud sandboxes with filesystem persistence. Each session gets an isolated development environment in the cloud.

Singularity

HPC-focused container isolation. Designed for high-performance computing environments where Docker's daemon model is unavailable or undesirable.

BaseEnvironment ABC

All backends implement the BaseEnvironment abstract base class, providing a common interface for execution, file access, and lifecycle management. This abstraction allows Hermes to swap backends without changing agent code.
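In Python terms, such an interface might look like the sketch below. Only the class name BaseEnvironment comes from the text; the method names, signatures, and the LocalEnvironment implementation are assumptions meant to show the pattern, not Hermes' real API.

```python
import subprocess
from abc import ABC, abstractmethod


class BaseEnvironment(ABC):
    """Common interface: execution, file access, and lifecycle."""

    @abstractmethod
    def execute(self, command: list[str], timeout: float = 30.0) -> tuple[int, str]:
        """Run a command and return (exit code, stdout)."""

    @abstractmethod
    def read_file(self, path: str) -> str:
        """Return the contents of a file inside the environment."""

    @abstractmethod
    def close(self) -> None:
        """Release any resources held by the environment."""


class LocalEnvironment(BaseEnvironment):
    """Zero-isolation backend: runs directly on the host."""

    def execute(self, command, timeout=30.0):
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
        return result.returncode, result.stdout

    def read_file(self, path):
        with open(path, encoding="utf-8") as handle:
            return handle.read()

    def close(self):
        pass  # nothing to clean up on the host
```

Agent code written against BaseEnvironment never needs to know whether it is talking to the host, a container, or a remote machine.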

Platform-specific deep comparison

Each platform offers fundamentally different sandboxing primitives. Understanding these differences is critical when building a cross-platform agent.

| Feature | macOS | Linux | Windows |
|---|---|---|---|
| Primary mechanism | Seatbelt (sandbox-exec) | bwrap + Landlock + seccomp | AppContainer / Restricted tokens |
| Invocation | sandbox-exec -p <profile> <cmd> | bwrap <args> -- <cmd> | CreateRestrictedToken + ACLs |
| Filesystem | Scheme DSL (.sbpl), subpath rules, regex globs | Bind mounts (--ro-bind, --bind), Landlock access FS | ACLs on allowed paths, SID restrictions |
| Network | deny default + allow rules, proxy-aware | --unshare-net, network proxy | Network capability restrictions |
| Process | process-exec/process-fork allowlists | --unshare-pid, --die-with-parent, --new-session | Process token restrictions |
| Protected paths | .git, .agents, .codex read-only | .git, .agents, .codex read-only | Protected directory ACLs |
| Fallback | None (Seatbelt always available on macOS) | Landlock when bwrap unavailable, WSL1 detection | AppContainer stub (partial) |
| Codex support | ✅ Full | ✅ Full (bwrap + Landlock) | ✅ Full (restricted tokens) |
| ADK-Rust support | ✅ MacOsEnforcer | ✅ LinuxEnforcer | ⚠️ WindowsEnforcer (stub) |
| Claude Code support | ✅ Seatbelt | ✅ bwrap | ❌ Not supported |

What most agents are missing for sandboxing

Several critical sandboxing patterns appear in only one or two agents despite being broadly applicable to any agent that runs code on behalf of users.

⚠️

Windows sandboxing is severely under-served

Only Codex has a real Windows sandbox. ADK-Rust has a stub. Claude Code doesn't support Windows sandboxing at all. Unless they use a Docker-based agent, Windows users are left running agent commands with no OS-level isolation whatsoever.

ℹ️

Landlock adoption is minimal

Only Codex uses Landlock as a bwrap fallback. Landlock is arguably the future of Linux filesystem sandboxing: it is built into the kernel from 5.13 onward and needs no daemon or setuid helper, although the kernel must be built and booted with it enabled. Most agents ignore it.

What you should build into a new agent

If you're building a new coding agent, here's the sandboxing investment that matters most, informed by what works across all agents examined above.

Use Codex's 3-tier model

Start with ReadOnly, WorkspaceWrite, and DangerFullAccess isolation levels. This gives users a clear mental model: no writes, workspace-only, or full trust. Most agents lack this graduated approach entirely.

macOS: Seatbelt with deny-default

Use sandbox-exec with deny-default policies. Seatbelt is kernel-enforced and always available on macOS — no installation required. Generate .sbpl profiles dynamically from policy data structures.

Linux: bwrap + Landlock fallback

Use bwrap for namespace isolation (--die-with-parent, --unshare-pid, --ro-bind). Check for user namespace access with a probe timeout. When bwrap is unavailable, fall back to Landlock for filesystem restrictions. Never silently skip sandboxing.

Windows: AppContainer with ACLs

This is the hardest platform. Use AppContainer with ACL-based path restrictions. Create unique SIDs per session. Set filesystem ACLs on allowed paths. Don't take shortcuts — Windows security requires full investment.

Fail-closed validation

Implement ADK-Rust-style validate_policy(): if the sandbox backend cannot enforce a policy, reject execution rather than silently downgrading. The user and agent must know when isolation is weaker than requested.

Per-command sandbox decisions

Implement Claude Code-style per-command sandboxing: some commands are safe unsandboxed (git status, ls, cat), others always need sandboxing (anything that writes outside the workspace, runs unknown code, or modifies system state).

💡

Protect agent metadata directories

Even under writable workspace roots, always protect .git and agent metadata directories (.agents, .codex) as read-only. Codex does this — it prevents the agent from corrupting its own version control and configuration state during execution.

ℹ️

Test on all three platforms

Most sandbox bugs are platform-specific. Seatbelt policies that work on macOS may not translate to bwrap on Linux. AppContainer on Windows requires entirely different primitives. Test your sandbox on every platform you claim to support — don't assume cross-platform portability.

Bottom line

Sandboxing is the single most unevenly implemented feature across coding agents. Codex sets the bar with full cross-platform coverage (Seatbelt, bwrap + Landlock, restricted tokens). OpenHands takes the strongest isolation path with Docker per session. Claude Code innovates with per-command sandbox decisions. ADK-Rust pioneers fail-closed policy validation.

But seven of the 18 agents (Pi Mono, Neovate, Crush, OpenCode, Pochi, Kimi CLI, Oh My Pi) run code directly on the host with no OS-level sandbox whatsoever. They rely entirely on permission prompts, banned command lists, and user vigilance. This is acceptable for local development on trusted machines but represents a significant security gap for any agent deployed on production infrastructure.

For new agent builders, the path is clear: implement platform-native sandboxing from day one, validate policies fail-closed, and never silently downgrade isolation. The gap between agents that sandbox and agents that don't is only going to widen as the security implications of AI code execution become more widely understood.