Our DNA is written in Swift

Introducing SwiftBash

Every coding agent I use — Claude Code, Codex, even PI — leans on the same tool: /bin/bash. PI in particular runs almost exclusively through bash, no sandbox in sight. There’s a good reason for that. Bash is one of the most heavily represented languages in any pre-training corpus on the planet, and LLMs write it fluently. If you give a model a file to manipulate, a folder to inspect, or a one-shot pipeline to assemble, the answer that falls out is almost always a few lines of shell.

The downside is the friction. Unless you live in YOLO mode, you spend half your day clicking Allow on find, grep, sed, and cat prompts. Codex in the cloud sidesteps this by spinning up a fresh container per task. On my Mac, both Codex and Claude Code happily edit my actual files — and even with git worktrees, I’ve ended up with stray uncommitted changes on main more than once.

So I started wondering: bash isn’t really that complicated a language. What if I just had Opus write me a bash interpreter — in Swift?

A weekend with the 1M context window

Over the last day or so I had Opus on Extra High fill up the 1M context window a couple of times over. I gave it Vercel’s just-bash for inspiration and bashlex as a reference for how a real bash parser is structured, and let it cook.

The constraints I cared about:

  • Pure modern Swift. No Process, no fork, no exec. Has to drop into a Mac, iOS, or Linux app without dragging libc shell-out behavior into a sandboxed binary.
  • Everything an LLM would actually write. ls, cat, grep, sed, find, awk, jq, tar, curl, bc, xargs, mktemp, the lot.
  • Real sandboxing. Either a cordoned-off temp folder that looks like a real filesystem to the script, or a pure in-memory tree that never touches the disk at all.

That last one was the whole point. Codex’s cloud sandboxes are nice precisely because they’re disposable. I wanted the same property locally — and on iOS, where you can’t fork anything anyway.

What it looks like

The library is split into three products plus a CLI. The smallest useful program is this:

import BashInterpreter
import BashCommandKit

let shell = Shell()                    // sandbox-by-default identity
shell.registerStandardCommands()       // ls, cat, grep, sed, find, …

try await shell.run("""
    for f in *.txt; do
      echo "$(basename "$f" .txt): $(wc -l < "$f") lines"
    done | sort -k2 -n
    """)

Every command is a registered Swift type. Pipelines are AsyncStream<Data> channels. The filesystem is a FileSystem protocol — and there are three implementations to pick from:

  • RealFileSystem — the host’s FileManager, for trusted scripts.
  • SandboxedOverlayFileSystem — confines the script to one host directory plus an in-memory /tmp. Symlink escapes are blocked, every path passes through realpath(3), and error messages reference virtual paths only — host paths never leak.
  • InMemoryFileSystem — pure in-memory tree. Nothing ever hits the disk.
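
"Pipelines are AsyncStream<Data> channels" means each stage is just a function from one stream to another. Here's a minimal sketch of that shape using only the standard library — the names are illustrative, not the actual SwiftBash API:

```swift
import Foundation

// One pipeline stage: consume an AsyncStream<Data>, produce another.
// A command like `tr a-z A-Z` could reduce to exactly this shape.
func uppercase(_ input: AsyncStream<Data>) -> AsyncStream<Data> {
    AsyncStream { continuation in
        Task {
            for await chunk in input {
                let text = String(decoding: chunk, as: UTF8.self)
                continuation.yield(Data(text.uppercased().utf8))
            }
            continuation.finish()
        }
    }
}
```

Chaining stages like this gives you `cat | tr | sort` semantics with backpressure handled by the stream, and no file descriptors or subprocesses anywhere.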

A freshly constructed Shell() already leaks nothing about the host:

$ echo 'whoami; hostname; ls /Users; cat /etc/passwd' \
    | swift-bash exec --sandbox /tmp/work /dev/stdin
user
sandbox
ls: /Users: No such file or directory
cat: /etc/passwd: No such file or directory

The four virtualisation axes — filesystem, network, processes, identity — are all independent. You opt into each one. Want the script to be able to call your API but nothing else?

shell.networkConfig = NetworkConfig(
    allowedURLPrefixes: ["https://api.example.com/v1/"],
    allowedMethods: ["GET", "POST"],
    denyPrivateIPs: true   // block 127.0.0.1, 10/8, 192.168/16, …
)

That’s it. curl reads from Shell.networkConfig and refuses everything else with exit status 7.
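
The gate itself is simple enough to sketch. This is a sketch of the kind of check a sandboxed curl could apply before any request — the config fields mirror the snippet above, but the type and logic here are illustrative, not the shipped implementation:

```swift
import Foundation

// Sketch of an allow-list gate for outbound requests.
struct NetworkGate {
    var allowedURLPrefixes: [String]
    var allowedMethods: [String]
    var denyPrivateIPs: Bool

    func permits(_ url: URL, method: String) -> Bool {
        guard allowedMethods.contains(method) else { return false }
        guard allowedURLPrefixes.contains(where: url.absoluteString.hasPrefix)
        else { return false }
        if denyPrivateIPs, let host = url.host {
            // Crude literal-address check; a real implementation resolves the
            // host first so DNS rebinding can't smuggle a private IP through.
            let privatePrefixes = ["127.", "10.", "192.168.", "169.254."]
            if privatePrefixes.contains(where: host.hasPrefix) { return false }
        }
        return true   // anything that fails here maps to exit status 7
    }
}
```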

Bash 4, not bash 3.2

One small surprise from this project: macOS still ships /bin/bash 3.2 from 2007, because bash 4 moved to GPLv3 and Apple froze at the last GPLv2 release. Modern Linux, Homebrew, and basically everyone else are on bash 4 or 5. So when LLMs generate bash, they generate bash 4 — associative arrays, ${var^^} case conversion, ${arr[-1]} negative indexing, mapfile, coproc. SwiftBash targets bash 4.x semantics for everything it implements, which means scripts that an LLM writes generally just work — no “bad substitution” surprises.

declare -A counts
for word in $(cat words.txt); do
  counts[$word]=$(( ${counts[$word]:-0} + 1 ))
done
for k in "${!counts[@]}"; do
  echo "$k: ${counts[$k]}"
done | sort -k2 -rn

That runs in SwiftBash. It does not run in /bin/bash on a stock Mac.

The hard ones, properly done

The thing I’m most pleased about — and honestly a bit surprised by — is how complete the implementations of the staple commands ended up being. These aren’t shims that handle the three flags an LLM happens to use most often. They’re proper implementations of what are, in many cases, full programming languages in their own right.

The biggest ones, ranked by lines of Swift it took to implement them:

Command  Swift LOC  What it actually is
jq       ~4,500     JSON query language: lexer, parser, evaluator, ~80 builtins
awk      ~3,000     Pattern-action language: lexer, parser, expression tree, builtins
sed      ~1,600     Stream-editor mini-language: address ranges, s/// with backrefs, b/t branches, hold space
find     ~900       Expression tree with -and/-or/-not, -exec … {} +, time/size predicates
curl     ~600       HTTP client with the allow-list and SSRF defenses bolted in
bc       ~400       Expression calculator with -l math library (Double-precision)

jq, awk, and sed in particular each needed their own parser and evaluator — they’re real languages. The fact that all three came out coherent, with associative arrays and user-defined functions in awk, with hold-space and labels in sed, with path expressions and reduce/foreach in jq, is the part I keep being a little amazed by. These are the commands that make bash actually useful for data manipulation, and they’re the ones I’d most miss if they were stubbed out.

Beyond that tier there’s solid coverage on grep, rg (ripgrep), sort, tar, gzip/gunzip, diff/patch, yq, tr, cut, paste, join, comm, xargs, and the rest of the textbook unix toolkit.

Cover the majority, fail honestly on the rest

The design rule I kept coming back to: handle the majority of real-world usage, and when you hit a limitation, fail in a way the model can read and route around.

LLMs are remarkably good at recovery if you give them an honest error. They’re terrible if you silently produce wrong output. So every command emits the same kind of error a real GNU/BSD tool would — prefixed with the command name, written to stderr, with a non-zero exit status:

$ swift-bash exec script.sh
column: unknown option: --table-columns
awk: function `gensub' not implemented
ps: -L not supported in sandbox

When an agent sees awk: function 'gensub' not implemented, it does the obvious thing: it rewrites the line as a sed substitution or an awk gsub, and moves on. That recovery loop is the whole reason this works as an LLM tool. A silent failure or a wrong answer would poison the rest of the session; a loud, specific error is just another data point the model handles in stride.

The corollary: I’d much rather ship a command with 80% coverage and crisp error messages on the missing 20% than a command with 95% coverage and undefined behavior on the edges. If the post-mortem on a failed agent run is “it tried comm -12 --check-order and SwiftBash quietly ignored the flag,” I’ve made the wrong tradeoff.
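
The convention is mechanical enough to sketch. Something like this — a hypothetical command shape, not the real SwiftBash API:

```swift
// Hypothetical shape of a command that fails loudly rather than guessing:
// name-prefixed message on stderr, non-zero exit status, no silent fallback.
struct CommCommand {
    static let name = "comm"

    func run(args: [String], stderr: (String) -> Void) -> Int32 {
        if args.contains("--check-order") {
            stderr("\(Self.name): unknown option: --check-order\n")
            return 2   // loud and specific; the agent rewrites and retries
        }
        // … the supported 80% of comm lives here …
        return 0
    }
}
```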

Math, because of course you need math

LLM-generated bash loves bc for arithmetic. SwiftBash ships a bc that’s “good enough” — Double-precision rather than arbitrary precision, but for the kinds of expressions an agent actually writes it’s indistinguishable from the real thing:

$ echo "scale=6; 22/7" | bc
3.142857

$ echo "s(1.5707963)" | bc -l        # sine, with the math library
.999999999999

$ echo "sqrt(2) * 100" | bc -l
141.42135623730950488

# sum a column of numbers
$ awk '{print $2}' sales.tsv | paste -sd+ - | bc
18420.50

Combined with awk, paste, and the usual $(( … )) arithmetic expansion, that covers basically every “do a quick calculation” thing an agent reaches for.

A few real scripts

Just to give you a sense of what runs unmodified — these are the kind of one-liners and small pipelines that LLMs produce constantly, and they all go through the in-process interpreter without spawning a single subprocess.

# Find the 10 largest source files in a tree.
find . -name '*.swift' -type f -print0 \
  | xargs -0 wc -l \
  | sort -rn \
  | head -11 \
  | tail -10      # head -11 | tail -10 drops wc's "total" line after the sort

# Count TODO/FIXME comments by author, using grep + awk.
grep -rn -E 'TODO|FIXME' Sources/ \
  | awk -F: '{ print $1 }' \
  | xargs -I{} git log -1 --format='%an' -- {} \
  | sort | uniq -c | sort -rn

# Rewrite a config file in place: bump every "version: x.y.z" by one patch.
awk -F. -v OFS=. '/^version: / { $3 += 1 } { print }' config.yaml \
  > config.yaml.tmp && mv config.yaml.tmp config.yaml

# Tally HTTP status codes from an access log.
awk '{ print $9 }' access.log \
  | sort | uniq -c | sort -rn \
  | head

None of these need /bin/bash, none need Process. They run inside the same Swift process that hosts your app.

The CLI

There’s a swift-bash binary that mirrors the embedded interpreter — same parser, same commands, same sandbox flags. You can use it as a safer bash for scripts you don’t fully trust:

# AI-generated script, no host access at all.
echo "$llm_output" | swift-bash exec --sandbox /tmp/work /dev/stdin

# Sandboxed run with read-only access to one specific API.
swift-bash exec --sandbox ~/Documents/scratch \
                --allow-url https://api.github.com/repos/example/ \
                analyze.sh

It also has a parse subcommand that prints the AST, which is useful when you’re trying to understand why some weird quoting edge case isn’t doing what you expected.

What it’s actually for

The vision is an iPad coding-agent app that embeds this thing as its bash tool. OpenAI gives you code_interpreter over the wire, and it’s great — but if I have a perfectly serviceable interpreter that runs in-process on the device, why pay a round-trip to run wc -l? Light agentic exploration, summarising a folder of CSVs the user dropped into the sandbox, basic data wrangling — all of it stays local, and all of it stays inside the sandbox the host app handed the script.

To be clear: SwiftBash only manipulates files inside the sandbox you give it. It doesn’t reach into the user’s Photos library or read arbitrary files from the Files app. But the sandbox is a normal Swift FileSystem, which means an embedding app can plug in whatever extra commands it wants. I can imagine pulling in a few of my SwiftText routines — Markdown-to-HTML, HTML-to-PDF, that sort of thing — and registering them as bash commands. Then you can have an LLM produce a report in Markdown inside the sandbox and get a polished HTML or PDF out of the same script.
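
A sketch of what that embedding hook could look like — the registration closure and renderMarkdown are assumptions for illustration, not the shipped interface:

```swift
// Illustrative only: expose an app-side Swift routine as a bash command,
// so a sandboxed script can run `cat report.md | md2html > report.html`.
shell.register("md2html") { args, stdin, stdout, stderr in
    let markdown = String(decoding: stdin, as: UTF8.self)
    guard let html = renderMarkdown(markdown) else {  // hypothetical SwiftText call
        stderr("md2html: could not parse input\n")    // same loud-error convention
        return 1
    }
    stdout(Data(html.utf8))
    return 0
}
```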

It also turns out to be a useful CLI in its own right. I now reach for swift-bash exec --sandbox whenever an LLM hands me a script and I haven’t yet read the whole thing.

And one more thing

I asked Opus to summarise the lessons we learned building the bash interpreter — what the abstractions ended up being, where the parser and the executor split, how AsyncStream pipelines actually want to be wired. Then I handed that summary to another Opus and asked it to start a Swift interpreter on the same architecture.

It’s already further along than I expected. Most arithmetic, control flow, and function definitions work. I’ll probably wire it into SwiftBash itself as a stand-in for swiftc so that #!/usr/bin/env swift scripts can run inside the same sandbox as everything else.

Same trick, different language — and the same reason it works. The training data is already there. We just have to give it somewhere safe to run.

Why open source?

Honestly? Because I don’t know how complete or correct this is yet. Bash is a sprawling, decades-old language with all sorts of corners (job control, brace expansion edge cases, the seventeen different ways [[ … ]] differs from [ … ]), and I’ve covered the parts that LLM-generated scripts actually exercise — but “actually exercise” is a moving target. Every model I throw at it finds another quoting wrinkle.

So I’m putting it on GitHub. If you read this and think “fun idea, but you forgot about X,” please tell me. If you have a use case I haven’t thought of — embedding it in a Shortcuts action, wiring it up to a local model, using it as a teaching sandbox for a bash class — I’d love to hear that too. The repo is the conversation; I’ll meet you there.


Categories: Administrative
