Vibe Coding with Guardrails: My Workflow for Shipping with AI Agents

I've shipped plenty of code with an AI agent riding shotgun. Some of it was great. Some of it I caught during review and quietly deleted. Enough of it broke in ways I didn't expect that I stopped treating "vibe coding" as a workflow and started treating it as a default that needs scaffolding around it.

The term came out of a half-joking post by Andrej Karpathy — the idea being that you let the LLM drive and barely read the code. For throwaway scripts and weekend projects, that's fine. But the second the code is going to live somewhere, be read by someone else, or touch a system I care about, I need guardrails.

What follows is the workflow I've landed on. It's not about being cautious — I'm still moving fast. It's about stacking cheap checks so the agent's failure modes get caught early, while the thing is still easy to fix.

The Problem with Pure Vibe

Left to its own devices, an agent will happily produce plausible-looking code that solves the wrong problem, introduces a subtle bug in an edge case, imports a package that doesn't exist, or silently breaks a contract the rest of the codebase depends on. The code looks right. It compiles. The tests — which the agent also wrote — pass.

The mental model I've found most useful: the agent is fast, well-read, tireless, and starts every session knowing nothing about yesterday. It can write code all day. It does not know your codebase, and it does not know all of the details that matter. Your job is to supply that context and verify the output. Everything below is a way of doing that cheaply — and the lack of memory, which is a liability when you're implementing, turns out to be an asset when you're reviewing.

[ WARNING ]

The Failure Mode to Watch

The dangerous output isn't the code that breaks loudly. It's the code that works in the happy path but silently violates an invariant you didn't think to mention. If you're not reading the diff, you need something reading it.

Step 1: Research Before You Prompt

Vague context in, vague code out. The quality of the prompt determines the quality of the output, and most of my prompt quality comes from work I do before opening the agent at all.

Before I kick off a task, I spend five to fifteen minutes getting oriented. What does the existing code look like? What are the conventions? Where does this new piece slot in? I'll also skim the docs for whatever library I'm about to use, check recent issues in the repo, read a similar PR if one exists.

When I write the prompt, I include the constraints I've surfaced: here's the pattern we use for X, here's the edge case I'm worried about, here's the library version we're on, here's the function this needs to integrate with.

The agent can't read your mind, and it can't smell which details matter. You have to tell it.

Step 2: Invite the Pushback

This is the step that changed everything for me.

Before I ask the agent to build the thing, I ask it to argue against it. Literally: here's what I'm planning. What are the risks? What would a senior engineer push back on? Is there a simpler approach I'm missing? What am I not thinking about?

Often enough to be worth the thirty seconds, it surfaces something real — a library that already does what I'm about to write, a race condition in the design, a schema decision that'll hurt us in six months, a simpler path I hadn't considered. When it doesn't surface anything real, I'm out thirty seconds.

It's a pre-mortem with a consultant who works for free and never gets tired.

The key is to actually consider the pushback instead of defensively justifying your original plan. If the agent's critique is weak, fine, move on. If it's pointing at something you'd glossed over, slow down and fix the design before a single line of code exists.

Step 3: Plan Mode, With a Second Opinion

Once I've digested the pushback and refined the approach, I have the agent enter plan mode. Claude Code has this built in — it produces a structured plan without touching any files.

Then I take that plan to a second agent. Different context window, often a different prompt framing. "Review this plan. What's wrong with it? What's missing? Where would this fail?"

The two agents often disagree. Agent A insists the right pattern is a factory; agent B argues that's over-engineered and suggests a direct approach. When that happens, I either ask them to debate each other or just make the call myself based on what I know about the codebase.

The magic isn't that either agent is right. It's that the disagreement forces me to think about the tradeoff instead of rubber-stamping a plan. Two agents in mild disagreement produce better decisions than one agent in confident agreement with itself.

The Loop

Research

orient, read the code, surface constraints

Pushback

ask the agent to argue against the plan

Plan

structured plan, no files touched

Second opinion

fresh agent reviews the plan

Reconcile

resolve disagreements, pick a path

Implement

watch the shape, interrupt if it drifts

Simplify

/simplify to strip the noise

Scan

periodic security, bug, refactor sweeps

↻ repeat per task · scan runs out-of-band

Step 4: Reconcile, Then Implement

After the plans converge, or I pick a winner, I let the primary agent implement. This is where "vibe" actually earns its name — I'm not reading every line as it streams. I'm watching for shape. Is it touching the files I expect? Is it writing tests? Is the diff size in the ballpark I predicted?

If something feels off while it's streaming, I stop it. Interrupting is cheap. Unwinding a bad 400-line diff is not.

The heuristic I use: if the agent's doing something I didn't anticipate from the plan, that's a signal to pause, not a signal that it figured something out I missed. Sometimes it has. More often it's drifting.

Step 5: `/simplify` When the Diff Gets Ugly

Agents over-engineer. They add abstractions you didn't ask for, sprinkle in try/except blocks that swallow exceptions, write helper functions used in exactly one place, introduce an options dict "for flexibility."

Claude Code ships with a /simplify command that does exactly what I want at this stage: review the diff, strip anything that isn't load-bearing, collapse abstractions used once, remove exception-swallowing try blocks, inline helpers that aren't reused. I didn't build it — it's built in — but it's the step in my workflow I lean on most heavily.

Not every run improves things, but it reliably strips some of the noise without losing functionality. I run it whenever a diff becomes large or the code just feels heavier than the problem warrants.

Custom Subagents

Claude Code lets you define subagents with canned system prompts that live alongside your project. I've built a handful I use constantly:

Reviewer — reads a diff and plays senior-engineer-on-code-review. Pedantic by design.
Planner — takes a feature request and produces a plan with risks and assumptions called out explicitly.
Simplifier — a custom companion to Claude Code's /simplify, tuned to the patterns this codebase tends to over-engineer.
Security Auditor — tuned to the specific stack and risk patterns we actually care about, not a generic OWASP checklist.
Bug Scanner — looks for the specific failure patterns that keep showing up in postmortems.

Here's roughly what my Security Auditor prompt looks like, to make this concrete:

You are a security auditor reviewing a diff in a TypeScript/Rust
service that runs on AWS.

Look for:
- Authentication or authorization gaps on new endpoints
- Input validation missing on anything reaching a database, shell,
  or downstream service
- Secrets, credentials, or PII ending up in logs or error messages
- Injection surfaces (SQL, command, template, prototype pollution)
- IAM policies or role assumptions that are broader than the code needs
- TOCTOU races in resource access
- Unsafe deserialization

Ignore:
- Style, naming, performance (unless performance enables a DoS)
- Theoretical attacks that assume the attacker already has the keys

Output format:
severity (high/medium/low) | file:line | specific issue | suggested fix

One row per finding. No preamble, no summary.

The specificity is the point. A generic "look for security issues" prompt produces generic slop. A prompt that names your stack, lists the categories you actually care about, tells the agent what to ignore, and pins the output format produces findings you can triage in five minutes.

Each subagent is a variation on this pattern: narrow scope, concrete output shape, explicit list of what to ignore.

The payoff is twofold. First, I stop retyping the same meta-prompt fifteen times a day. Second — and this is the bigger win — each subagent enters the conversation fresh. No context bleed from the implementation agent, no recency bias toward the code that was just written, no sunk-cost attachment to the approach under review. They see the diff the way a reviewer in another timezone would: cold, without the story of how it got there.

A good subagent is a fresh pair of eyes you can summon on demand. That's the whole game.

Periodic Sweeps

The per-task loop above is one rhythm. Here's the other: periodic sweeps across the whole codebase, run by the scanning subagents above, not tied to any particular feature.

Security scan. Auth issues, input validation gaps, secrets ending up in logs, injection surfaces, over-permissive IAM, things exposed that shouldn't be.
Bug scan. Off-by-one errors, null handling gaps, race conditions, resource leaks, retry logic that hammers a downstream when it's already unhappy.
Refactoring scan. Code smells, duplicated logic, dead code, anything that'll make the codebase harder to maintain in six months.

These are slow and noisy. Most of what they flag isn't worth fixing, but enough of it is that the ratio earns its keep. It's the agentic equivalent of running every linter you own in paranoid mode, except the linter understands semantics.

I usually run one of these on a Friday afternoon when I don't want to start something new but don't want to coast either.

Steer Your Own Compaction

Context windows are big now, but not infinite — and they degrade well before they fill. Claude Code shows you a context indicator, warns around 80% used, and auto-compacts around 95%, silently summarizing older turns to free up room. (The newer API beta exposes the same primitive at a configurable trigger, default 150K tokens.) Auto-compaction works. The problem is that by the time it fires, the agent is compressing under duress, and the summary is lossier than you'd think.

How lossy? Factory.ai benchmarked summarization-based handoff at roughly 37% retention — about two-thirds of decision-relevant detail gets lost or corrupted across a single compaction. Per-fact accuracy lands around 4 out of 5, which sounds fine until you notice which facts get mangled: file paths get paraphrased or hallucinated, error strings get reworded (breaking grep), line numbers shift, config values round. The exact details an engineer would never paraphrase are the first to go.

The deeper problem is that the agent was already getting worse before compaction kicked in. Chroma's Context Rot study tested 18 frontier models — Claude Opus 4, GPT-4.1, Gemini 2.5 — and found that every model degraded at every context length increment, even on trivial tasks like repeating words. The NoLiMa benchmark (Modarressi et al., 2025) sharpened the point: 11 of 13 frontier models claiming 128K+ support dropped below half their short-context baseline at just 32K tokens. GPT-4o fell from 99.3% to 69.7% accuracy on tasks requiring latent inference rather than literal match — a thirty-point drop at a sixteenth of the advertised window.

So by the time the auto-compactor is summarizing, it's a mediocre summarizer working from already-degraded recall. That's the wrong moment to hand it the keys.

The fix is to compact on your own terms, well before the agent does it for you. When I see the indicator climbing — that 80% warning is the cue, not 95% — I stop and ask the agent to write its own handoff:

"Summarize this session as a brief for a fresh agent picking up where you left off. Include the goal, the constraints we surfaced, the decisions we made and why, the open questions, and what to do next. Be specific about file paths, function names, and exact error strings. Skip anything reconstructable from the diff."

The output is a tight, structured brief — usually 200 to 500 words. I read it, edit it (the agent always over-includes irrelevant detail and under-includes the one weird constraint that took an hour to find), and either paste it into a new session or save it as a scratch file the next one can pull in. Anthropic's own guidance for long-running agents lands in the same place — they prescribe a claude-progress.txt log, structured feature lists, and an init.sh to reconstruct the environment. Hand-written engineering artifacts beat model self-summary, by their own evidence.

A few rules I've landed on:

Watch the indicator. The 80% warning is your cue, not 95%. Mid-task auto-compaction is the worst kind — it strips the scaffolding for the work you're actively doing.
Compact at natural breakpoints. End of a feature, after a major review, before switching subjects. Same instinct as committing code.
Treat the brief as a real artifact. Edit it. Trim what isn't load-bearing. Add what the agent missed. The brief is the bridge between sessions, and a sloppy bridge falls down in the middle.
When in doubt, start fresh with a brief. A fresh agent reading a tight 300-word handoff is almost always sharper than a tired one in a 90%-full context. Context rot is real, and it gets worse the more you stuff in.

Compaction is going to happen. The only question is whether you wrote the summary or the agent did.

Groom the Memory

The same agent that's helpful because it remembers things across sessions becomes unreliable when those memories rot. Claude Code commits notes to memory unprompted — your stylistic preferences, project facts, decisions you made out loud weeks ago — and most of them are right when written. The trouble is they keep applying long after the underlying truth has moved.

A memory that says "the auth service uses JWT" is gospel until you migrate to session cookies and forget to update it. The agent now confidently leans on a fact that hasn't been true for a week. Worse, it cites the stale memory when justifying a decision, and you don't notice because you'd forgotten you ever told it to remember.

This compounds with a well-documented property of long contexts. Liu et al.'s Lost in the Middle found that LLM recall over a long context follows a U-shape: performance is best at the beginning and end and reliably worst for information buried in the middle. The result has been replicated across model families. A bloated memory file doesn't just consume tokens — it pushes load-bearing facts into the recall sag while keeping the noise visible. Anthropic's own guidance on building long-running agents lands in the same place: a curated, retrievable set tends to beat "stuff everything into context" for code tasks, and compaction — the automatic summarization that fires near the context limit — preserves gist while losing fine-grained rationale. By the time you notice the drift, the why behind a decision is already gone, and only a confident-sounding what remains.

The fix is cheap. Every couple of weeks, ask the agent to dump everything it has stored about you and the project. Read it cold. Delete what's wrong, condense what's bloated, rewrite anything that drifted. Ten minutes of pruning beats six months of an agent confidently repeating things that stopped being true on a Tuesday in March.

Treat the memory the way you'd treat a CLAUDE.md: it's the agent's source of truth about your world, and a stale source of truth is worse than none.

The Curve Gets Steeper

Early in a project, you can let the agent rip. The blast radius of any change is small because there's barely anything to break. As the project matures, that inverts. Every new feature has to fit through a narrower hole — respect existing contracts, not regress tests you wrote three weeks ago, play nice with data already in production, and not surprise the six other places that import from the file you're touching.

Tom Cargill called the asymmetry the ninety-ninety rule: the first 90% of the code takes 90% of the time, and the last 10% takes the other 90%. With an agent in the loop, that curve gets steeper, not flatter. Early on, the agent is drafting in an empty room. Near launch, every change is a puzzle piece, and the agent has to hold most of the surrounding puzzle in its head to place it correctly.

Ousterhout calls the underlying phenomenon change amplification: a seemingly simple change requires modifications in many places. It's the first symptom of complexity, and it hits agent-driven workflows especially hard. The agent doesn't automatically see the three callers, the migration that depends on the schema, or the invariant encoded in the test suite two directories over. What a human engineer absorbs as tacit knowledge over months, the agent has to be fed fresh every session.

This is why I plan harder the closer I get to launch, not less. A thirty-minute planning session that surfaces "this breaks the email webhook" is cheaper by an order of magnitude than discovering it from a user report on day three. Boehm's cost-of-change curve — defects get roughly 100x more expensive to fix the later you catch them — was true before agents and is still true with them. The only thing that changes is who is doing the catching.

Practically, the effort shifts as the project matures:

Early: prompts are short, plans are loose, the agent improvises.
Middle: more research before the prompt, more pushback before the plan.
Late: the plan is the deliverable. I spend more time scoping the change, mapping what it touches, and writing the brief than I spend watching the agent type.

The agent doesn't get dumber as the project grows. The job just gets harder, and most of the extra work is supplying context the agent can't infer. Planning deeply is how you pay that tax upfront instead of in postmortems.

One caveat worth naming: this workflow has a shelf life. Agents are getting better at the exact things that make it necessary — persistent memory across sessions, absorbing a codebase's conventions without being hand-fed, reasoning over hundreds of thousands of lines at once. A year or two from now, most of what I've described here will probably read as period-specific scaffolding. That's fine. You build the guardrails the current tools need, and tear them down when the tools outgrow them.

The Loop

Zoom out and the whole thing is a loop: research, pushback, plan, second opinion, reconcile, implement, simplify, scan. Each step is cheap. None of them on its own catches everything. Stacked together, they catch most of what matters, and they do it fast enough that I'm still shipping faster than I would without the agent.

The tools are a lever. The workflow is the fulcrum.

Stay rigorous, keep the agent honest, and may your diffs stay small.

Vibe Coding with Guardrails: My Workflow for Shipping with AI Agents

The Problem with Pure Vibe

Step 1: Research Before You Prompt

Step 2: Invite the Pushback

Step 3: Plan Mode, With a Second Opinion

Step 4: Reconcile, Then Implement

Step 5: `/simplify` When the Diff Gets Ugly

Custom Subagents

Periodic Sweeps

Steer Your Own Compaction

Groom the Memory

The Curve Gets Steeper

The Loop

A Practical Guide to Optimizing Your Finances

Shower Thoughts: Six Ideas Worth Sitting With

The Problem with Pure Vibe

Step 1: Research Before You Prompt

Step 2: Invite the Pushback

Step 3: Plan Mode, With a Second Opinion

Step 4: Reconcile, Then Implement

Step 5: /simplify When the Diff Gets Ugly

Custom Subagents

Periodic Sweeps

Steer Your Own Compaction

Groom the Memory

The Curve Gets Steeper

The Loop

A Practical Guide to Optimizing Your Finances

Shower Thoughts: Six Ideas Worth Sitting With

Step 5: `/simplify` When the Diff Gets Ugly