The Creature That Empties Your Backlog While You Sleep: goal-driven loops and workflows in Claude Code

Loop engineering without the smoke: the anatomy of an autonomous loop, how I run mine with a GitHub board and subagents, and the Claude Code parts that make it possible.

For two years the craft was writing the perfect prompt. The exact phrasing, the right example, the correct tone. It worked. But there was an awkward detail: you were the loop. You read the answer, you judged whether it was any good, you decided the next step and typed again. You were the nervous system firing every signal by hand, neuron by neuron, for hours.

In 2026 the craft got a new name. Addy Osmani called it loop engineering and Peter Steinberger put it without anaesthetic: you should stop prompting your agents and start designing the loops that prompt them. The prompt is no longer the unit of work. The unit is the loop.

Designing a loop is, essentially, building a reflex arc that fires itself. And yes, there is something faintly unsettling in the image: a calm creature that metabolizes your backlog while you sleep, ticket by ticket, never asking permission, never getting tired. Let us open it up and see how it is built inside.

1. The anatomy of a loop

Strip off the fashionable wrapping and an agent always does the same thing, in three beats: it gathers context, acts with a tool, and verifies the result. Then it repeats. Claude Code’s own docs describe that heartbeat as the “agentic loop”: Claude evaluates, calls a tool, receives the result and evaluates again, turn after turn, until it no longer needs any tools.
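Here is a toy version of that heartbeat in Python. `fake_model`, `Reply` and the `read_file` tool are invented stand-ins, not Claude Code's internals or any SDK. On each turn the model either asks for a tool or answers. The loop runs the requested tool, adds the result to the history, and stops on the first turn with no tool request, or when it hits the turn cap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reply:
    """One turn of the (hypothetical) model: some text, and maybe a tool request."""
    text: str
    tool: Optional[str] = None   # name of the tool it wants, or None when it is finished
    arg: str = ""


def fake_model(history: list[str]) -> Reply:
    """Stand-in for a real model call: reads one file, then answers."""
    if not any(line.startswith("tool:") for line in history):
        return Reply("I need to read the file first.", tool="read_file", arg="notes.txt")
    return Reply("Done: the file says hello.")


TOOLS = {"read_file": lambda path: f"contents of {path}: hello"}


def agent_loop(task: str, max_turns: int = 10) -> tuple[str, list[str]]:
    history = [f"user: {task}"]                   # gather: the task is the first context
    for _ in range(max_turns):
        reply = fake_model(history)               # evaluate the context so far
        history.append(f"assistant: {reply.text}")
        if reply.tool is None:                    # no tool needed: the loop ends here
            return reply.text, history
        result = TOOLS[reply.tool](reply.arg)     # act with a tool
        history.append(f"tool: {result}")         # feed the result back for the next check
    return "stopped: turn cap reached", history
```

`agent_loop("summarize notes.txt")` takes two turns. The first is a tool call and the second is a final answer with no tool.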

If that sounds like biology, it is because it is. Your body has spent your whole life holding 37 °C with exactly the same circuit: a sensor measures, a controller compares against a set point, an effector corrects, and the correction gets measured again. That is homeostasis, and it is the oldest, most reliable loop in existence. An agent loop invents nothing new: it just bolts a GPU onto a trick nature patented millions of years ago.

The heartbeat of any agent: gather, act, verify, start again. The same thing your body does to stay at 37 °C.

Inside that heartbeat there are two roles worth keeping straight: a doer, which produces, and a checker, which decides whether what was produced is any good. The trap is to assume the doer is the hard part. It is not: the doer (the model) is already good. The one that decides whether the whole contraption works is the checker, and that is where almost all the craft lives.

If you can’t say what “done” looks like, you don’t have a loop. You have a wish.

That line —going around a lot lately among people who build loops— is the raw version of something that in my workflow has a formal name: the Definition of Done. Without a clear check, the agent doesn’t know when to stop, just like a thermostat with no thermometer: it heats forever.
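The thermostat comparison can be taken literally. In this toy, heating-only sketch (all names and numbers are invented for illustration), the heater is the doer and the thermometer is the checker. The set point plays the role of the Definition of Done. Take the thermometer away and only the turn cap can stop the loop.

```python
from typing import Callable, Optional

Checker = Callable[[float], bool]


def heater(temp: float) -> float:
    """The doer: always willing to act, never knows when to stop."""
    return temp + 1.0


def thermometer(set_point: float) -> Checker:
    """The checker: encodes what 'done' looks like."""
    return lambda temp: temp >= set_point


def run_loop(temp: float, check: Optional[Checker], max_turns: int = 50) -> tuple[float, int, str]:
    for turn in range(max_turns):
        if check is not None and check(temp):   # verify before acting again
            return temp, turn, "done"
        temp = heater(temp)                      # act
    return temp, max_turns, "cap reached"        # no check passed: only the brake stopped it
```

With a thermometer, `run_loop(30.0, thermometer(37.0))` stops after seven steps at 37 °C. Without one, `run_loop(30.0, None)` runs until the cap and ends at 80 °C.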

2. My loop, today

Let me show you the guts of the one I run daily, because a loop is easier to understand watching it run than reading theory.

The board as a digestive tract

Everything starts on a GitHub board. I drop tasks into columns there, and the only one the system cares about is Ready for implement. The board does two jobs at once: it is the work queue and it is the loop’s external memory, the notebook that lives outside the conversation and remembers what is done and what is left. I think of it as a gut: tasks enter at one end and move along by peristalsis, one at a time, until they leave digested through “Closed”.
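The external-memory idea fits in a few lines. This is a toy stand-in, not the GitHub Projects API. The hypothetical `Board` class keeps columns of ticket ids in a local JSON file. Because of that, a fresh process or a reset conversation picks up exactly where the last one left off.

```python
import json
from pathlib import Path
from typing import Optional

READY = "Ready for implement"   # the only column the loop reads from
CLOSED = "Closed"


class Board:
    """Toy board: column name -> ordered ticket ids, persisted to a JSON file."""

    def __init__(self, path: Path) -> None:
        self.path = path
        if path.exists():
            self.columns = json.loads(path.read_text())   # state survives restarts
        else:
            self.columns = {READY: [], CLOSED: []}

    def _save(self) -> None:
        self.path.write_text(json.dumps(self.columns, indent=2))

    def add(self, ticket: str, column: str = READY) -> None:
        self.columns.setdefault(column, []).append(ticket)
        self._save()

    def next_ready(self) -> Optional[str]:
        """Oldest ticket in Ready for implement, or None when the queue is empty."""
        ready = self.columns.get(READY, [])
        return ready[0] if ready else None

    def move(self, ticket: str, src: str, dst: str) -> None:
        self.columns[src].remove(ticket)                 # raises ValueError if absent
        self.columns.setdefault(dst, []).append(ticket)
        self._save()
```

Every change is written to disk before the method returns. A brand-new `Board(path)` therefore sees the same queue and the same "Closed" column as the process that made the change.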

The orchestrator dispatches; the subagents execute

A main orchestrator keeps pulling tickets from Ready for implement and hands them to an execution subagent. This is, to the letter, the orchestrator-workers pattern Anthropic describes in its agent guide: a brain that decomposes and delegates, and hands that do. The brain never touches the code; the hands never decide the strategy.
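A stripped-down sketch of the orchestrator-workers split follows. The hypothetical `plan_for` and `worker` functions stand in for model calls. The orchestrator only decides the plan. The worker only executes it and hands back a one-line summary, the way a subagent returns a summary instead of its whole transcript.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Ticket:
    id: str
    kind: str      # e.g. "bug" or "feature"; drives the orchestrator's plan


def plan_for(ticket: Ticket) -> list[str]:
    """Orchestrator side: decides the strategy, never touches code."""
    if ticket.kind == "bug":
        return ["reproduce", "fix", "add regression test"]
    return ["design", "implement", "test"]


def worker(ticket: Ticket, steps: list[str]) -> str:
    """Worker side: executes the plan in its own context, returns only a summary."""
    log = [f"{step} done" for step in steps]   # the long transcript stays in here
    return f"{ticket.id}: {len(log)} steps completed"


def orchestrate(queue: deque) -> list[str]:
    summaries = []
    while queue:
        ticket = queue.popleft()
        summaries.append(worker(ticket, plan_for(ticket)))  # delegate, keep only the summary
    return summaries
```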

Skill overloading: the right enzyme for each substrate

Here is my favorite part. Depending on the task, the orchestrator tells the executor which skills to load: a deliberate “overload” of capabilities. A cell doesn’t manufacture all its enzymes at once: it expresses the one it needs for the substrate in front of it. Same with the subagent: if there is SQL to touch, it loads the data skills; if it is front-end work, it loads the front-end ones. Clean context, minimal tools, zero noise.
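Skill selection can be as simple as a lookup table. The skill names and labels below are invented for illustration. The sketch only builds the brief the orchestrator would hand over. It does not model how Claude Code actually discovers or loads skills.

```python
SKILLS_BY_LABEL = {                      # hypothetical label -> skill mapping
    "sql": {"data-modeling", "sql-migrations"},
    "frontend": {"react-components", "accessibility"},
    "api": {"http-contracts"},
}
BASE_SKILLS = {"repo-conventions"}       # loaded for every ticket


def skills_for(labels: list[str]) -> list[str]:
    """Express only the 'enzymes' this ticket needs; unknown labels add nothing."""
    chosen = set(BASE_SKILLS)
    for label in labels:
        chosen |= SKILLS_BY_LABEL.get(label, set())
    return sorted(chosen)


def executor_brief(ticket_id: str, labels: list[str]) -> str:
    """The instruction the orchestrator would hand to the executor subagent."""
    return f"Work on {ticket_id}. Load only these skills: {', '.join(skills_for(labels))}."
```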

The reviewer and the QA: the loop inside the loop

When the executor finishes, the work passes to a reviewer and a QA. The QA runs the tests and, most interestingly, proposes improvements: a small loop opens between QA and executor that iterates until the thing passes. That has a name in the literature: the evaluator-optimizer pattern, one model that generates and another that evaluates, in a loop, until the result holds up. It is exactly the golden rule of loops: nobody checks their own work. The one who cooks isn’t the one who tastes for salt.
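This is the evaluator-optimizer loop reduced to its skeleton. `toy_executor` and `toy_qa` are hypothetical stand-ins for the two subagents. What matters is the shape. Generation and evaluation are separate functions, and feedback flows back from one to the other. A round cap means a draft that never passes gets escalated instead of looping forever.

```python
from typing import Callable, Optional

Generate = Callable[[str, Optional[str]], str]   # (task, feedback) -> draft
Evaluate = Callable[[str], Optional[str]]        # draft -> feedback, or None if it passes


def evaluator_optimizer(task: str, generate: Generate, evaluate: Evaluate,
                        max_rounds: int = 5) -> tuple[str, int, bool]:
    """Generate, evaluate, feed back; stop when the evaluator passes it or the cap hits."""
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)     # the executor cooks
        feedback = evaluate(draft)           # the QA tastes: a different function
        if feedback is None:
            return draft, round_no, True
    return draft, max_rounds, False          # never passed: escalate to a human


def toy_executor(task: str, feedback: Optional[str]) -> str:
    """Hypothetical executor: only adds tests once QA asks for them."""
    return task + (" + tests" if feedback == "missing tests" else "")


def toy_qa(draft: str) -> Optional[str]:
    """Hypothetical QA: passes only drafts that include tests."""
    return None if draft.endswith("+ tests") else "missing tests"
```

`evaluator_optimizer("fix login", toy_executor, toy_qa)` passes on the second round, once QA's feedback has reached the executor.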

The Definition of Done opens the last gate

With the inner loop closed, the ticket returns to the orchestrator, which checks it against the Definition of Done. Only if it passes does the ticket move to “Closed”. And on to the next, until the backlog is empty. Two nested loops: the big one drains the queue; the small one polishes each piece.
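Here are the last gate and the outer loop, sketched with a hypothetical Definition of Done made of named, checkable conditions. `polish` stands in for the whole inner executor/QA loop. My setup doesn't say what happens to a ticket that fails the gate. In this sketch it is kept out of "Closed", and the failing checks are recorded.

```python
from typing import Callable

# Hypothetical Definition of Done: each entry is a named, checkable condition.
DEFINITION_OF_DONE: dict[str, Callable[[dict], bool]] = {
    "tests pass": lambda t: t.get("tests_passed", False),
    "reviewed": lambda t: t.get("review_approved", False),
    "docs updated": lambda t: t.get("docs_updated", False),
}


def failed_checks(ticket: dict) -> list[str]:
    return [name for name, check in DEFINITION_OF_DONE.items() if not check(ticket)]


def drain(queue: list[dict], polish: Callable[[dict], dict]) -> tuple[list[str], dict[str, list[str]]]:
    """Outer loop: polish each ticket (inner loop), then gate it on the DoD."""
    closed: list[str] = []
    sent_back: dict[str, list[str]] = {}
    while queue:
        ticket = polish(queue.pop(0))          # inner loop: executor + QA
        missing = failed_checks(ticket)        # the last gate
        if missing:
            sent_back[ticket["id"]] = missing  # not done: never reaches Closed
        else:
            closed.append(ticket["id"])
    return closed, sent_back
```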

Two nested loops: the orchestrator drains the board’s queue; inside, executor and QA iterate until the piece passes the Definition of Done.

3. The Claude Code parts that make it possible

The good news: almost none of this has to be built by hand. Claude Code already ships the parts; the trick is knowing which one switches on each function of the loop.

Subagents — workers with their own context window, their own tools and their own permissions. They do the dirty work in their corner and return only the summary. They are the doer, the reviewer and the QA in my pipeline.
Skills — reusable instructions the agent loads only when needed. The enzyme on demand. The basis of my “skill overloading”.
Hooks — scripts that fire at fixed points in the cycle (before a tool, when it finishes, on stop). They don’t ask the model for anything: they just do it. A Stop hook that runs the tests is a deterministic checker, no argument possible.
/goal and /loop — the built-in checker. With /goal you give it a checkable target and the agent iterates on its own until it is met; a small, fast model checks after each round whether it is there yet. Key: always set a cap (“stop after N turns”).
Dynamic workflows — when a task needs more agents than one conversation can coordinate, Claude writes a script (you trigger it with ultracode) that holds the plan, the branches and the intermediate results. Up to 16 agents in parallel, with a hard cap so it can’t run away.
Agent SDK — the same loop, embedded in your code, with max_turns, max_budget_usd and permission modes. It is the step from “toy in my terminal” to “process in production”.
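The Stop hook that runs the tests is the most concrete of these parts, so here is a minimal sketch in Python. It rests on three assumptions. The project's tests run with `pytest -q`. The script is registered as a command hook on the `Stop` event in Claude Code's settings. And, as I read the hooks docs: exit code 2 blocks the stop and feeds stderr back to Claude; `stop_hook_active` in the JSON input means Claude is already continuing because of a stop hook; and `CLAUDE_PROJECT_DIR` is set when Claude Code spawns a hook. Check the hooks reference before relying on those details.

```python
import json
import os
import subprocess
import sys

TEST_COMMAND = ["pytest", "-q"]   # assumption: replace with your project's test command


def decide(returncode: int, output: str) -> tuple[int, str]:
    """Exit 0 lets Claude stop; exit 2 blocks the stop and sends stderr back to Claude."""
    if returncode == 0:
        return 0, ""
    return 2, "Tests are failing. Fix them before finishing.\n" + output[-2000:]


def main(project_dir: str) -> int:
    payload = json.load(sys.stdin)                 # hook input arrives as JSON on stdin
    if payload.get("stop_hook_active"):
        return 0                                   # already continuing due to this hook: allow the stop
    try:
        run = subprocess.run(TEST_COMMAND, cwd=project_dir,
                             capture_output=True, text=True)
    except FileNotFoundError:
        print(f"test command not found: {TEST_COMMAND[0]}", file=sys.stderr)
        return 1                                   # non-blocking: report it, don't trap Claude
    code, message = decide(run.returncode, run.stdout + run.stderr)
    if message:
        print(message, file=sys.stderr)
    return code


# CLAUDE_PROJECT_DIR is set when Claude Code spawns the hook; run by hand, the script does nothing.
if __name__ == "__main__" and "CLAUDE_PROJECT_DIR" in os.environ:
    sys.exit(main(os.environ["CLAUDE_PROJECT_DIR"]))
```

The `stop_hook_active` check is the simplest possible brake: Claude gets one forced extra round to fix the tests, never an endless one.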

A simple hierarchy

Who holds the plan? A subagent: Claude, turn by turn. A skill: the instructions Claude follows. A workflow: a script the runtime executes. The more autonomy you want, the more it pays to move the plan out of Claude’s context and into code, and the more the brake matters.
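Here is one way "the plan lives in code" could look, as a sketch only. A Python script owns the list of branches and a concurrency limit. A hypothetical `run_agent` coroutine stands in for launching an agent. The value 16 echoes the parallel limit mentioned above; nothing else here reflects how Claude Code implements dynamic workflows.

```python
import asyncio

MAX_PARALLEL = 16   # the parallel ceiling; in this sketch it is just a constant


async def run_agent(branch: str, limiter: asyncio.Semaphore, stats: dict) -> str:
    """Stand-in for launching one agent on one branch of the plan."""
    async with limiter:                       # never more than MAX_PARALLEL at once
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)             # pretend the agent is working
        stats["running"] -= 1
        return f"{branch}: ok"


async def workflow(branches: list[str]) -> tuple[list[str], int]:
    """The plan lives here, in code: which branches run, and under what limit."""
    limiter = asyncio.Semaphore(MAX_PARALLEL)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(*(run_agent(b, limiter, stats) for b in branches))
    return list(results), stats["peak"]
```

`asyncio.run(workflow([f"module-{i}" for i in range(40)]))` returns 40 results in plan order. It never has more than 16 stand-in agents working at the same moment.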

4. Patterns worth stealing

From the recent research I picked up a few ideas I’ve already wired (or am wiring) into my loop:

An independent checker, always. The one who reviews isn’t the one who executes. It sounds obvious and almost nobody respects it.
Set a cap, or the creature never stops eating. max_turns, a dollar budget, “stop after 30 turns”. A loop with no brake isn’t autonomy: it’s a car with no pedal.
Start read-only. Let the loop summarize and propose for a few days before you let it touch anything. Trust earned, not granted.
Separate worktrees. Two parallel agents on the same tree trip over each other. Give them different cells, like cultures that don’t contaminate.
Memory outside the chat. A file (or a board, like mine) that remembers the state even when the conversation resets. Context is volatile; the queue isn’t.
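Most of these patterns are one-liners; the brake is the one worth sketching. The `Budget` class below is hypothetical, not the Agent SDK; its field names only echo the SDK's `max_turns` and `max_budget_usd`. The loop it guards deliberately never finishes on its own, so only a cap can end it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    """Hypothetical run limits; names only echo the SDK's max_turns / max_budget_usd."""
    max_turns: int = 30           # "stop after 30 turns"
    max_cost_usd: float = 5.00    # invented dollar ceiling
    turns: int = 0
    cost_usd: float = 0.0

    def charge(self, turn_cost_usd: float) -> None:
        self.turns += 1
        self.cost_usd += turn_cost_usd

    def exhausted(self) -> Optional[str]:
        if self.turns >= self.max_turns:
            return "turn cap"
        if self.cost_usd >= self.max_cost_usd:
            return "budget cap"
        return None


def run_with_brakes(turn_cost_usd: float, budget: Budget) -> str:
    """A loop with no goal at all: only the brakes can stop it."""
    while True:
        reason = budget.exhausted()
        if reason is not None:
            return f"stopped by {reason} after {budget.turns} turns (${budget.cost_usd:.2f})"
        budget.charge(turn_cost_usd)   # one more turn of (pretend) work
```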

The ladder of autonomy. The top rung —the routine— is the one that takes you out of the picture: it sets a timer and runs the loop even with your laptop closed.

That top rung deserves a paragraph. A routine is the timer that launches the loop on its own, in the cloud, at 8 a.m., without you pressing anything. A checkable goal + a timer = your first genuinely autonomous agent. This is where the metaphor turns dystopian: the creature no longer waits for you to wake it.
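The timer half of that formula is small. This sketch covers only the scheduling decision. It assumes a local clock and ignores time zones and daylight-saving edge cases. Real routines run in the cloud and are set up with the routine builder, not with code like this.

```python
from datetime import datetime, time, timedelta

RUN_AT = time(8, 0)   # 8 a.m., as in the example above


def next_run(now: datetime, run_at: time = RUN_AT) -> datetime:
    """Next time the routine should fire: today at run_at, or tomorrow if that has passed."""
    candidate = datetime.combine(now.date(), run_at, tzinfo=now.tzinfo)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```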

5. LoopMaker: the loop that builds the loop

The next jump isn’t writing loops, but having an assistant assemble them for you. Claude Code already does a version of this in two places: with ultracode you don’t run the task by hand, since Claude writes the workflow for you, and the routine builder is a no-code wizard. The plan stops living in your head and becomes something the machine drafts.

I’m starting to use a wizard of this kind (I call it LoopMaker) that turns the whole setup into a few guided questions: which queue, which Definition of Done, which cap. It is, literally, a loop creating a loop. The cell that builds the ribosome that builds the cell. The machine that designs the machine. It sounds like science fiction and yet it’s a Tuesday.
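LoopMaker is my own tool, and the following is not its code. It is a hypothetical sketch of the idea: three answers go in and one loop spec comes out. The builder refuses when the queue, the Definition of Done, or the cap is missing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSpec:
    queue: str
    definition_of_done: tuple[str, ...]
    max_turns: int


def build_spec(answers: dict[str, str]) -> LoopSpec:
    """Turn the three guided answers into a spec, refusing a loop without a check or a brake."""
    queue = answers.get("queue", "").strip()
    done = tuple(item.strip() for item in answers.get("done", "").split(";") if item.strip())
    cap = answers.get("cap", "").strip()
    if not queue:
        raise ValueError("which queue? a loop needs somewhere to pull work from")
    if not done:
        raise ValueError("no Definition of Done: that's a wish, not a loop")
    if not cap.isdigit() or int(cap) == 0:
        raise ValueError("set a cap: a positive number of turns")
    return LoopSpec(queue, done, int(cap))
```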

In the end it all comes down to a formula that’s silly to state and hard to honor: your leverage = skill × clarity. Skill is reviewing the work and improving the loop. Clarity is being able to say, exactly, what “done” means. The reflex will never be better than its receptor.

So yes: let the creature empty your backlog while you sleep. But make sure you’re the one defining what “finished” means. Because if you don’t tell it, it’ll decide for you.

Sources

Claude Code Docs — How the agent loop works
Claude Code Docs — Create custom subagents
Claude Code Docs — Extend Claude with skills
Anthropic — Building Effective Agents
Addy Osmani — Loop Engineering
Sabrina Ramonov — AI Loop Engineering: Claude Code /goal + Routines