My Outer Loop

May 09, 2026

There is a thing that happens, after a couple of weeks of working with coding agents at a steady pace, where you stop thinking of yourself as the person typing and start thinking of yourself as the person seeing. The Latin word behind vision is visio, “the act of seeing”, from videre, “to see”; the Italian visione and English vision both keep that. It’s a much older idea than the modern “mission statement on a slide” usage. It means: I have, in my head, a picture of where this should go.

That picture is the thing I am responsible for, now. The typing has been outsourced.

This blog post is an attempt to write down, honestly, what my day-to-day looks like a couple of weeks into running this experiment at full pace. It is also, as a worked example, the story of how SwiftBash, SwiftScript, SwiftPorts, and SwiftJS (four projects you’ve read about here, each with its own announcement post) finally clicked together into a single thing this weekend. It is not a post about any of those four projects in particular. They have their own posts. It’s a post about the loop that built them.

The role of the experienced engineer

The thing I don’t have to do anymore is type. Not the code, not the tests, not the issue text, not the PR description, not the commit messages, not the review responses. All of that is agent work now. What I bring is upstream of all of it — the picture in my head: where these pieces want to fit, what shape the seams between them should be, which abstraction belongs in which package, what doesn’t yet exist but is going to need to.

It turns out a coding agent is brilliant at the mechanics — including the writing-things-down mechanics — and almost completely without opinion about which thing should be built. You have to bring the opinion. You have to have, very clearly, a picture of what good looks like, because the agent will happily produce code (and issues, and PRs) that look plausible all the way to the test failures, and your only edge is that you know what the answer should approximately resemble before the agent starts.

That is the experienced-software-engineer part. The vision. The visio.

What follows is, structurally, my outer loop. It runs all day. There are usually two or three of these going in parallel across different repos.

The loop, step by step

1. An idea gets fleshed out into a GitHub issue.

The idea is mine. Everything else about the issue isn’t. I describe what I’m seeing — sometimes a sentence, sometimes a paragraph, often just a half-formed nudge — and an agent does the research, reads the surrounding code, sketches alternatives, asks me clarifying questions, and ultimately writes the full issue text: motivation, current state, proposed shape, acceptance criteria, out-of-scope. I edit. We go a couple of rounds. The acceptance bullets matter the most — they’re the answer-key for everything that comes later. Without them, the agent that picks the issue up has nothing to hill-climb against. With them, almost everything else is mechanical.

The issues in the SwiftBash repo, the SwiftPorts one, and the new ShellKit repo all look like this. They’re long. They’re long on purpose. And essentially none of the prose in them was typed by me.

2. An agent picks up the issue and works on it locally.

Local development, local tests, a branch, eventually a pull request. The agent runs the test suite before opening the PR. Most of the time it’s green when I see it.

This is the step where I have to be present the most, even though it looks like it should be the most autonomous. The agent will hit a junction — usually a “should this live in package A or package B?”, or “I see two ways to model this; the type system doesn’t decide between them” — and stop and ask. Eighty percent of those questions are answered with “do it” or “make it so” (Picard, on the bridge of the Enterprise, has been a remarkably useful role model for this kind of work). But the other twenty percent are taste questions: I can see a simpler path the agent didn’t consider, and if I don’t tell it, Opus will earnestly produce the more elaborate one. Without the human at this junction, the agent overbuilds. Quietly, plausibly, but it overbuilds.

The frustrating shape this takes is that the agent will be working away while I’m at lunch or asleep, and at some point it’ll hit one of these questions, stop, and wait. I come back to the keyboard and realise nothing has happened in the last hour because the agent was blocked on a trivial clarification that anyone could have answered. There is, right now, no good answer to “agent needs human input but human is not at the keyboard”. This is the part of the loop I’m not yet sure how to tighten.

3. Codex reviews the PR.

This is the step I am most surprised I came to depend on. I have Codex configured to review every PR I open — and the comments are actually useful often enough that I read every single one. Maybe it’s a missing edge case. Maybe a path that wasn’t shell-quoted. Maybe an HTTP method that wasn’t in the allow-list. Maybe a Shell.current that should have been read but wasn’t.

The agent that wrote the PR addresses each comment: 👍 for the good catches (the majority — these reviews are a real second pair of eyes), 👎 for the false positives (rare, but they happen), and the conversation gets marked resolved. I read along.

4. CI runs on every commit. Five platforms.

Every push, GitHub Actions runs the full build-and-test matrix on macOS, iOS, Linux, Android, and Windows. The ambition — for SwiftBash and the surrounding projects — is that all five stay green forever. The CI configuration that gets us there is its own story, told in Four Green Checkmarks. It’s now five.

GitHub Actions is, to be honest, not fast. A Windows job can take half an hour; the full matrix takes longer. It’s also not free at scale: the only reason this loop is economically viable for me is that all of these projects are open source, and GitHub does not charge for Actions minutes on its standard hosted runners in public repositories. That’s the deal that makes the whole thing work, and it’s the reason I’m unlikely to start a closed-source experiment any time soon. The big payoff: I do not have to maintain a Windows box, an Android emulator, or a Linux VM on my desk. I write Swift on a Mac and watch four other platforms tell me whether I broke them.

5. The agent watches CI and reacts.

Some platforms are well-behaved. Some — Windows in particular — have opinions. A test that quotes a path with forward slashes passes everywhere except where backslashes survive bash quoting differently. A POSIX exec-bit check is meaningless on a filesystem that doesn’t have one. getpid() is deprecated under WinSDK. BOOL is bridged differently. getaddrinfo lives behind a different import.

Each of these is small. Most of them are five-line fixes once you understand the platform quirk. The art is fixing them in a way that doesn’t un-fix the other four platforms.
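One way to keep such a fix from un-fixing the other platforms is to hide the quirk behind a single function and let conditional compilation choose the implementation. The sketch below does that for the process-ID case. It is an illustration under assumptions, not code from the SwiftBash sources: `currentProcessID` is a hypothetical helper name, and the list of C library modules is my guess at what the five targets expose.

```swift
// Import whichever C library this platform provides. Inactive branches are
// skipped at compile time, so a Windows-only change here cannot alter what
// the other platforms build.
#if canImport(Darwin)
import Darwin          // macOS, iOS
#elseif canImport(Glibc)
import Glibc           // Linux
#elseif canImport(Musl)
import Musl            // statically linked Linux
#elseif canImport(Android)
import Android         // Android (assumes a Swift 6 toolchain)
#elseif canImport(WinSDK)
import WinSDK          // Windows
#endif

/// Hypothetical helper: the one place that knows how each platform
/// spells "my process ID".
func currentProcessID() -> Int {
    #if os(Windows)
    // The POSIX-style getpid() is deprecated on Windows, so ask Win32 instead.
    return Int(GetCurrentProcessId())
    #else
    return Int(getpid())
    #endif
}

print("running as pid \(currentProcessID())")
```

Callers only ever see `currentProcessID()`, so the next Windows-shaped fix touches one function body rather than every call site.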

6. Hill climbing.

This is the part of the day where the agent and I are at our most useful to each other. The success criterion is binary — all five checkmarks green — and there’s a finite stack of small, mechanical, platform-shaped failures to grind through. The agent reads the CI log, identifies the platform quirk, makes the fix, pushes, and waits. A Windows CI run can take up to thirty minutes. Sometimes a single iteration takes one fix; sometimes a fix surfaces a new failure underneath. You climb. You watch the altimeter.

This is where Opus is at its most quietly impressive. It is not glamorous work. It is patient, specific, mechanical work — exactly the kind of work that humans get bored of and start cutting corners on after the third Windows-only branch. The agent does not get bored. As long as I keep the success criterion sharp (“five green, no continue-on-error shortcuts”), it keeps climbing. The recent push to lift Windows from “advisory” to “committed” — a couple of dozen platform-specific fixes, ending with a five-line workflow change to delete the continue-on-error: true gates — happened in one focused stretch on the evening of May 8, somewhere between dinner and bedtime. About two and a half hours, end to end, against build steps that take half an hour each.
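The sharp success criterion can be written down as a tiny predicate. The following is a sketch of the idea only: `Platform`, `JobResult`, and `isMergeable` are hypothetical names invented for illustration, and nothing here reads real GitHub Actions output. It shows why an advisory job (the effect of `continue-on-error: true`) should not count as green, even when it happens to pass.

```swift
/// The five CI targets named in this post.
enum Platform: String, CaseIterable {
    case macOS, iOS, linux, android, windows
}

/// Hypothetical model of one CI job's outcome.
struct JobResult {
    let passed: Bool
    /// True when the job may fail without failing the run,
    /// which is what `continue-on-error: true` does.
    let advisory: Bool
}

/// Strict gate: every platform must report, pass, and be a committed
/// (non-advisory) job.
func isMergeable(_ results: [Platform: JobResult]) -> Bool {
    Platform.allCases.allSatisfy { platform in
        // A missing result means the altimeter is not reading: not green.
        guard let job = results[platform] else { return false }
        return job.passed && !job.advisory
    }
}

var results: [Platform: JobResult] = [:]
for platform in Platform.allCases {
    results[platform] = JobResult(passed: true, advisory: false)
}
print(isMergeable(results))   // true

// Windows still marked advisory: the run looks green, but the gate says no.
results[.windows] = JobResult(passed: true, advisory: true)
print(isMergeable(results))   // false
```

Treating a missing result as a failure matches the truncated-log experience: no signal is not the same as a pass.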

7. Five green checkmarks. Merge.

Repeat.

A cousin loop: external PRs

Increasingly the PRs I’m looking at are not from agents I started. They’re from outside contributors. And those contributors, increasingly, are using coding agents themselves.

This is a strange and slightly recursive new pattern. I have one of my coding agents review the incoming PR — sometimes with a few questions of my own sprinkled in — and the conversation that emerges is, in effect, two or three coding agents and two humans collaborating on the shape of a change. Sometimes the PR has missed a design consideration I had in my head; I’ll ask for changes. Sometimes one PR has lumped together three separable improvements; I’ll ask for it to be split. Sometimes the PR is just good — five green checkmarks, the design fits, Codex is happy, and I merge.

That is also part of the outer loop. The vision-holding extends across the boundaries of the repo.

The example: how SwiftBash and friends clicked together

Now the worked example. I’ll keep it deliberately high-level — each of these projects has its own announcement post for the gory detail.

Two weeks ago there was SwiftBash: a sandboxed bash interpreter, in pure Swift, with no Process and no fork. Then SwiftScript: a tree-walking Swift interpreter that needs no toolchain. Then SwiftPorts: pure-Swift reimplementations of gh, glab, git, jq, and the compression family. Then SwiftJS: a Node-shaped runtime on JavaScriptCore.

Four good pieces. Four separate good pieces. Each had its own private notion of “where does stdout go, what’s the working directory, what am I allowed to read, who am I.” The seams between them did not yet line up.

I could see, in my head, how they should fit. There needed to be a fifth package — a tiny one — that owned the runtime context: stdio, environment, sandbox, network policy, identity. The four runtimes would each adopt it. The bash interpreter would still own bash semantics; the Swift interpreter would still own Swift semantics; the JS runtime would still own JavaScriptCore. But the shell context — the substrate they all shared — would be one package. I called it ShellKit.
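To make the idea concrete, here is a minimal sketch of what "one shared context, several runtimes" can look like. Every name in it (`SketchContext`, `sketchCat`, `sketchFetch`, the field names) is hypothetical and chosen for illustration; none of it is taken from the real ShellKit API, and the sandbox check is deliberately naive.

```swift
/// Hypothetical shared runtime context. Field names are illustrative only
/// and are NOT taken from the real ShellKit package.
struct SketchContext {
    var environment: [String: String]
    var workingDirectory: String
    var sandboxRoot: String            // only paths under this root are readable
    var allowedHosts: Set<String>      // network allow-list
    var write: (String) -> Void        // where "stdout" goes

    /// Naive prefix check; a real sandbox would resolve ".." and symlinks first.
    func mayRead(_ path: String) -> Bool {
        path == sandboxRoot || path.hasPrefix(sandboxRoot + "/")
    }

    func mayConnect(to host: String) -> Bool {
        allowedHosts.contains(host)
    }
}

/// Stand-in for a bash builtin: it keeps its own semantics but asks the
/// shared context for policy.
func sketchCat(_ path: String, in context: SketchContext) {
    if context.mayRead(path) {
        context.write("cat: would read \(path)")
    } else {
        context.write("cat: \(path): denied by sandbox")
    }
}

/// Stand-in for a JS-side fetch: different runtime, same context, same policy.
func sketchFetch(_ host: String, in context: SketchContext) {
    if context.mayConnect(to: host) {
        context.write("fetch: would contact \(host)")
    } else {
        context.write("fetch: \(host): not in allow-list")
    }
}

var lines: [String] = []
let context = SketchContext(
    environment: ["HOME": "/sandbox/home"],
    workingDirectory: "/sandbox",
    sandboxRoot: "/sandbox",
    allowedHosts: ["api.github.com"],
    write: { lines.append($0) }
)

sketchCat("/sandbox/notes.txt", in: context)
sketchCat("/etc/passwd", in: context)
sketchFetch("api.github.com", in: context)
sketchFetch("example.com", in: context)
lines.forEach { print($0) }
```

The point of the shape is that neither stand-in carries its own idea of the sandbox or of stdout; both are handed the same value and defer to it.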

That was the vision. Turning it into a stack of issues — one per repo, each with its own motivation and acceptance criteria — was an evening’s worth of back-and-forth with an agent that did the actual writing. The agent then implemented it across three repos over a single weekend: ShellKit got published, SwiftBash adopted it, SwiftPorts adopted it, the JavaScript runtime got every host-touching surface gated on the new shared Shell type, the SwiftScript shebang dispatch dropped to a five-line bridge. From the first ShellKit-adoption commit to the last green-Windows CI run was about seventeen hours of wall-clock time, almost all of it the agent toiling away on PRs while I pointed and reviewed.

What that produced is something I am genuinely happy about: a single, composable Shell that you can build up from bash commands, Swift port CLIs, a Swift interpreter, and a JS interpreter, then give a sandbox and a network allow-list, and finally run a polyglot pipeline through. None of the seams creak. The bash shell pipes into jq, which pipes into a Swift script, which calls fetch from JavaScript, and every step honors the same sandbox.

Again, none of that paragraph is the point of this post. The point is: I had a picture, the picture was correct, and the loop took it from picture to working substrate over a weekend.

Some unexpected revelations

A handful of things have surprised me about working this way for the last few weeks.

The role inversion is real, and quieter than I expected. I thought I would feel less useful. I feel more useful. The decisions I make at the issue stage (what to include, what to scope out, what counts as done) propagate through the loop with extraordinary leverage, and they’re the only decisions that are still mine to make. A clear idea produces a clear issue produces a clean PR. A muddy idea produces a muddy issue produces a muddy PR. The payoff from the thinking I do up front, before the agent ever drafts a word, dwarfs the time I’d save by skipping it.

Codex genuinely catches things. This is the one that surprised me most. I expected agent-on-agent review to be a kind of theatre. It is not. The Codex comments find real bugs and real missed edge cases at a hit-rate that would be respectable for a careful human reviewer. I’d estimate ninety percent of the comments are worth a 👍 and an actual fix. Two coding agents conversing about a PR is a meaningfully different review than one agent acting alone.

Hill-climbing is exactly as good as the altimeter. Opus 4.7 is very good at the patient, repetitive, “fix one platform-quirk at a time” work — but only if the success criterion is unambiguous. The most frustrating moments in the Windows climb were the ones where the CI log was being truncated and the signal of progress was missing. The hill is climbable; the altimeter has to be reading.

Five platforms changes how you write code. When every commit gets immediately tested on macOS, iOS, Linux, Android, and Windows, your default reflexes change. You stop reaching for Foundation.Process. You stop assuming POSIX. You design for the smallest common surface and add platform-specific niceties on top, rather than the other way around. This is not a discipline I would have adopted on my own; the matrix imposed it, and I’m grateful it did.

The day of “weeks of work” is over for an entire class of task. The four-projects-clicking-together work in this post would have been, optimistically, two or three weeks of human effort. It was a weekend. I keep mentally re-calibrating what counts as a reasonable size for “this afternoon’s project”. The answer keeps growing.

How would you tighten this loop?

I’m going to close on the question I most want to ask other people, because I suspect the readers of this blog are exactly the people who’d have an answer.

There are three soft spots in my loop that I have not yet figured out how to fix. They are less technical bottlenecks than coordination bottlenecks, and I think they’re the next interesting frontier.

The “agent stuck on a question while I’m asleep” problem. I described this above. The agent hits a taste question, stops, waits. The clock keeps running and nothing happens. I’d love a setup where the agent can post the question into a channel I’m watching from anywhere — Discord, Slack, an iOS notification, whatever — and I can answer with one tap, and it picks up where it left off. The pieces all exist. Nobody, as far as I can tell, has wired them together yet. If you have, please tell me.

The “manually pinging the reviewer” problem. Right now, when an external PR comes in, I still have to go to the right topic thread on my Discord and tell my reviewer-bot (“OpenClaw”) to Review PR #N. The review is great when it lands. The pinging is silly. Ideally the moment GitHub sends the new-PR notification, OpenClaw spins up, reviews the diff, and presents me with a one-screen summary plus two buttons: Good to merge and Request these changes. I tap one. Done. The review-trigger step shouldn’t need a human.

Local CI runners for hill-climbing. Half-hour Windows builds on GitHub Actions are fine for the merge gate, but they’re a lousy iteration surface. I keep meaning to look at running a Windows VM and an Android emulator on something local — even just for the noisy hill-climb phase, before the change goes back up to the cloud matrix as the canonical check. If you’ve automated this in a way that doesn’t double your maintenance burden, I’d genuinely like to read about it.

There are surely more soft spots; the three I’ve listed (agent-asleep questions, manual review pings, slow CI hill-climbing) are the ones I bump into every day. I’m sure the reader bumps into different ones. I’d like to know which.

If you’ve found a way to tighten any part of this loop on an OSS project of your own (your own outer-loop diagram, your own Discord-and-bot incantation, a self-hosted runner setup that earns its keep, an “agent asks, human one-taps” flow that already works), I’d love to hear it. The repos are at Cocoanetics/SwiftBash, Cocoanetics/SwiftScript, Cocoanetics/SwiftPorts, and the new Cocoanetics/shellkit. Open an issue, write to me, or, better yet, post how you’ve solved a piece of this and link me the diagram. Right now everyone seems to be inventing their outer loop in private. I’d like for that to stop being the case.

Categories: Recipes
