Running Claude Code in a sandbox with m9m

How to use m9m's CLI node to run Claude Code (or Codex, Aider) inside a Linux namespace sandbox with CPU, memory, and network limits.

Neul Labs

Running a coding agent in production is the moment the “let’s just give it shell access” plan catches up with you. This guide shows how to run Claude Code inside m9m’s CLI node with namespace isolation, CPU/memory caps, a network allow-list, and an allow-listed command set — the minimum viable production posture.

The threat model

An LLM with shell access can, in the worst case:

  1. Exfiltrate files it can read.
  2. Make outbound network requests to arbitrary destinations.
  3. Install packages that phone home.
  4. Consume CPU / memory / disk far beyond what you intended.
  5. Persist state on the host across runs.

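The sandbox is designed to limit each of these: a restricted, mostly read-only filesystem view and a disposable working directory address (1) and (5), the network policy controls outbound traffic for (2) and (3), and cgroup limits cap CPU and memory for (4).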

The sandbox, layer by layer

m9m’s cli.claude-code node starts the agent in a fresh Linux namespace with:

  • PID namespace — the agent sees only its own processes.
  • Mount namespace + chroot-ish view — filesystem is read-only except for an explicit working directory.
  • Network namespace + policy — network is off by default; if you allow it, it’s against an allow-list of hosts.
  • cgroup CPU + memory limits — hard caps, not advisory.
  • User namespace — the agent runs as an unprivileged UID inside its namespace.
  • allowed_commands — an allow-list of shell commands the agent may invoke. Anything else fails.

Call it defence in depth: each layer alone could be bypassed; together they narrow the blast radius to something tractable.

Minimal working node config

{
  "id": "claude-review",
  "type": "cli.claude-code",
  "params": {
    "sandbox": true,
    "cpu": 2,
    "memory": "2Gi",
    "network": "deny",
    "workdir": "/tmp/work",
    "prompt": "Summarise the diff below in three bullet points:\n{{ $node.fetch.diff }}",
    "timeout": "2m",
    "allowed_commands": []
  }
}

This is the safest starting posture: no network, no shell commands, a two-minute timeout, and hard CPU and memory caps. The agent can read and write files in the working directory and write its output to stdout; it cannot reach the network or run shell commands.

When the agent needs the network

If the agent needs to, say, fetch docs from a vendor or hit an internal API, expand the network policy — carefully.

"network": {
  "mode": "allow",
  "hosts": [
    "api.anthropic.com",
    "docs.neullabs.com",
    "api.internal.example.com"
  ],
  "ports": [443]
}

Rules of thumb:

  • Allow api.anthropic.com (and whatever inference endpoint the agent uses) — obviously required.
  • Don’t allow * or 0.0.0.0/0. Ever.
  • Prefer egress through a proxy you control, so you can audit exactly what the agent reached for.
  • Log every outbound connection from the sandbox for a week before trusting the allow-list.
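The first two rules can be checked mechanically before a workflow is deployed. The following is a minimal sketch, assuming the network param is either the string "deny" or an object shaped like the example above; check_network_policy is a hypothetical helper written for illustration, not part of m9m, and m9m may well perform its own validation.

```python
def check_network_policy(policy):
    """Return a list of problems found in a `network` param value.

    Accepts the string "deny" or a dict shaped like
    {"mode": "allow", "hosts": [...], "ports": [...]}.
    Hypothetical lint helper, not an m9m API.
    """
    if policy == "deny":
        return []
    if not isinstance(policy, dict) or policy.get("mode") != "allow":
        return ['network must be "deny" or an object with mode "allow"']

    problems = []
    hosts = policy.get("hosts", [])
    if not hosts:
        problems.append("allow mode with no hosts listed")
    for host in hosts:
        # Wildcards and CIDR ranges defeat the point of an allow-list.
        if "*" in host or "/" in host:
            problems.append(f"wildcard or CIDR entry not permitted: {host}")
    return problems
```

Running a check like this in CI against every workflow file keeps a broad entry such as * from slipping in during a late-night fix.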

When the agent needs shell commands

Some tasks genuinely need git, rg, cat, or a compiler. Add them to allowed_commands:

"allowed_commands": ["rg", "cat", "git", "go"]

Guidance:

  • Allow the smallest set that makes the task succeed.
  • Never allow sh, bash, zsh, sudo, su, curl (use http.request nodes upstream instead), wget, pip, npm, or package managers in general.
  • If the agent argues it needs bash, treat that as a sign the task is wrong-shaped, not that the allow-list is too strict.
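The "never allow" list is easy to enforce with a small pre-deployment check. This is a sketch under the assumption that allowed_commands is a list of command names or paths; forbidden_commands and FORBIDDEN are hypothetical names for illustration, and the set contains only the commands named in the guidance above.

```python
import os

# Commands the guidance above says never to allow.
FORBIDDEN = {"sh", "bash", "zsh", "sudo", "su", "curl", "wget", "pip", "npm"}


def forbidden_commands(allowed_commands):
    """Return the forbidden command names found in allowed_commands, sorted.

    Entries are reduced to their basename first, so "/bin/bash" is
    caught as "bash".
    """
    names = {os.path.basename(cmd) for cmd in allowed_commands}
    return sorted(names & FORBIDDEN)
```

For the example list above, forbidden_commands(["rg", "cat", "git", "go"]) returns an empty list, while a list containing "/bin/bash" would be flagged.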

Working directory hygiene

The working directory is the only writable path. When the workflow starts, m9m mounts a fresh directory into the sandbox. When the node finishes, that directory is tar’d as a run artifact and then deleted. Nothing persists across runs unless you explicitly copy a file back out via the node’s output.

This is deliberate — it means “the agent wrote something weird last night” is always debuggable from the artifact, and “the agent installed something that broke tomorrow’s run” is impossible.
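The lifecycle described here (fresh directory, run, archive, delete) can be illustrated outside m9m with the Python standard library. This is only a sketch of the pattern, not m9m's implementation: run_with_fresh_workdir and its arguments are hypothetical, and archiving even when the task fails is a design choice made for this example.

```python
import shutil
import tarfile
import tempfile
from pathlib import Path


def run_with_fresh_workdir(task, artifact_dir):
    """Run task(workdir) in a new directory, archive it, then delete it.

    Returns the path of the .tar.gz artifact written into artifact_dir.
    """
    workdir = Path(tempfile.mkdtemp(prefix="work-"))
    artifact = Path(artifact_dir) / f"{workdir.name}.tar.gz"
    try:
        task(workdir)
    finally:
        # Archive even when the task raises, so the run stays debuggable.
        with tarfile.open(artifact, "w:gz") as tar:
            tar.add(workdir, arcname="work")
        # Delete the directory so nothing carries over to the next run.
        shutil.rmtree(workdir)
    return artifact
```

Because the directory is created per call and removed in the finally block, two consecutive runs never see each other's files; the only thing that survives is the archive.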

Timeouts

Two timeouts matter:

  • Node timeout — how long the whole Execute may take. Set conservatively. 2–5 minutes for reviews, 10–15 for generation tasks.
  • Agent internal timeout — Claude Code / Codex each have their own idle timeouts. Let the node timeout be the outer bound.

If a timeout fires, m9m kills the namespace and returns a timeout error. The workflow can catch that error and branch (retry, escalate to human, degrade gracefully).
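The outer-bound idea can be sketched with Python's subprocess module: the caller's timeout wins regardless of what the child does internally, and the caller branches on the outcome. This illustrates the pattern only; run_with_outer_timeout and handle are hypothetical helpers, and the mapping of outcomes to "escalate", "retry", and "continue" is an arbitrary example policy, not m9m behaviour.

```python
import subprocess
import sys


def run_with_outer_timeout(argv, timeout_seconds):
    """Run argv; return ("ok", exit_code) or ("timeout", None)."""
    try:
        done = subprocess.run(argv, timeout=timeout_seconds, capture_output=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return ("timeout", None)
    return ("ok", done.returncode)


def handle(result):
    """Example branching policy: escalate timeouts, retry failures."""
    status, code = result
    if status == "timeout":
        return "escalate"
    return "continue" if code == 0 else "retry"


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(handle(run_with_outer_timeout(slow, 0.5)))
```

One difference worth keeping in mind: subprocess.run kills only the direct child, whereas killing the namespace, as described above, takes every process inside it down together.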

Observability

Every sandboxed run emits structured events: start, exit code, resource usage peaks, network connections attempted, commands executed, stdout, stderr. These land in the audit log and surface as Prometheus metrics. For incident review, turn on the sandbox.verbose flag; for production, the default is adequate.
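Once these events are exported, a short script can answer the question that matters most for the network policy: did the agent try to reach anything outside the allow-list? The sketch below assumes each event is a dict with a "type" field and, for connection attempts, a "host" field. That shape and the "network_connect" type name are assumptions made for illustration; the actual event schema should be taken from m9m's audit log.

```python
def hosts_outside_allow_list(events, allowed_hosts):
    """Return hosts the sandbox tried to reach that are not allow-listed.

    `events` is an iterable of dicts; the "network_connect" type and the
    "host" key are hypothetical field names, not a documented schema.
    """
    allowed = set(allowed_hosts)
    attempted = {
        event["host"]
        for event in events
        if event.get("type") == "network_connect" and "host" in event
    }
    return sorted(attempted - allowed)
```

An empty result over a representative period is reasonable evidence that the allow-list matches what the agent actually needs; a non-empty one is either a missing entry or something worth investigating.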

A realistic end-to-end workflow

{
  "trigger": { "type": "webhook", "path": "/github/pr" },
  "nodes": [
    { "id": "verify", "type": "github.webhook.verify", "params": { "secret": "{{ $cred.github_webhook }}" } },
    { "id": "fetch",  "type": "github.pr.get", "params": { "pr": "{{ $json.pr_url }}" } },
    { "id": "review", "type": "cli.claude-code", "params": {
        "sandbox": true, "cpu": 2, "memory": "2Gi",
        "network": { "mode": "allow", "hosts": ["api.anthropic.com"], "ports": [443] },
        "workdir": "/tmp/work",
        "allowed_commands": ["rg", "cat"],
        "prompt": "Review this diff for security issues. Be specific; cite file:line.\n\n{{ $node.fetch.diff }}",
        "timeout": "5m"
    }},
    { "id": "approve","type": "human.review", "params": { "notify": "#eng-reviews" } },
    { "id": "post",   "type": "github.pr.comment", "params": { "body": "{{ $node.review.output }}" } }
  ]
}

PR webhook → signature verify → diff fetch → sandboxed Claude Code with just the network it needs → human approval → posted comment.
