We Tested 5 Spec-Driven Development Tools. Here's What Actually Worked.

Everyone has opinions about AI-assisted development methodologies. Most opinions come from reading documentation, watching demos, or repeating what someone else said on Twitter.

We wanted data. So we built the same project five times, using five different approaches.

The Test

Project: A CLI time tracker in Node.js/TypeScript

track start [task] - start timing
track stop - stop current task
track list - show today's entries
track summary - show time per task

Simple enough to complete in one session. Complex enough to reveal differences.
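To make the scope concrete, here is a minimal sketch of the logic behind the four commands. It is not the code any of the five runs produced: the TimeEntry shape and the function names are hypothetical, and persistence and argument parsing are left out.

```typescript
// Hypothetical data model, invented for illustration.
interface TimeEntry {
  task: string;
  start: number;      // epoch milliseconds
  end: number | null; // null while the task is still running
}

// track stop: set the end time on any running entry.
function stopTask(entries: TimeEntry[], now: number): TimeEntry[] {
  return entries.map((e) => (e.end === null ? { ...e, end: now } : e));
}

// track start [task]: close whatever is running, then open a new entry.
function startTask(entries: TimeEntry[], task: string, now: number): TimeEntry[] {
  return [...stopTask(entries, now), { task, start: now, end: null }];
}

// track list: entries that started on the same calendar day as `now`.
function listToday(entries: TimeEntry[], now: number): TimeEntry[] {
  const today = new Date(now).toDateString();
  return entries.filter((e) => new Date(e.start).toDateString() === today);
}

// track summary: total milliseconds per task.
// A running entry is counted up to `now`.
function summarize(entries: TimeEntry[], now: number): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const e of entries) {
    const elapsed = (e.end ?? now) - e.start;
    totals[e.task] = (totals[e.task] ?? 0) + elapsed;
  }
  return totals;
}
```

Even at this size there are decisions to make (what happens when you start a task while another is running, how "today" is defined), which is where the approaches begin to diverge.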

The Five Approaches:

Vibe coding - No planning, just prompting
Ralph Loops - Fresh context each iteration
GSD - "Get Shit Done" phased approach
BMAD - Scale-adaptive methodology
Spec Kit - GitHub's enterprise framework

All five produced working code. The differences were in everything else.

The Numbers

Approach	Planning Docs	Source Code	Docs:Code Ratio
Vibe	0 lines	209 lines	0:1
Ralph	177 lines	385 lines	0.46:1
BMAD	156 lines	279 lines	0.56:1
GSD	318 lines	359 lines	0.89:1
Spec Kit	1,724 lines	610 lines	2.8:1

Read that last row again. 2.8 lines of documentation for every line of code. For a CLI time tracker.
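The ratio column is just planning-doc lines divided by source lines. A short sketch that recomputes it from the line counts in the table (the variable and function names are ours, chosen for illustration):

```typescript
// Line counts copied from the table above.
const runs = [
  { approach: "Vibe", docs: 0, code: 209 },
  { approach: "Ralph", docs: 177, code: 385 },
  { approach: "BMAD", docs: 156, code: 279 },
  { approach: "GSD", docs: 318, code: 359 },
  { approach: "Spec Kit", docs: 1724, code: 610 },
];

// Docs:code ratio = planning-doc lines per line of source code.
function docsToCodeRatio(docs: number, code: number): number {
  return docs / code;
}

for (const r of runs) {
  console.log(`${r.approach}: ${docsToCodeRatio(r.docs, r.code).toFixed(2)}:1`);
}
```

To two decimals, Spec Kit comes out at 2.83:1, which the table rounds to 2.8:1.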

What We Learned

Vibe Coding Works (Until It Doesn't)

Zero overhead. One 209-line file. Shipped in the least time.

The catch: one commit with everything. No structure. If something breaks next month, you're reading 209 lines to figure out why.

For throwaway prototypes, vibe coding is fine. For anything you'll maintain, you're borrowing time from your future self.

Ralph Loops Are Underrated

The Ralph technique gave the most structure for the least overhead. Its 177 lines of planning docs (0.46:1, the lowest nonzero ratio in the test) produced clean, atomic commits and maintainable code.

The key insight: memory doesn't live in the AI's context window. It lives in files and git history. When context fills up, a fresh agent picks up where the last left off. The plan file is the handoff mechanism.
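A rough sketch of that handoff, under some loud assumptions: the plan is a Markdown-style checklist (our choice of format, not something the technique prescribes), and runAgent is a placeholder for however you start a fresh agent session. Reading and writing the plan file and committing to git are left out to keep the sketch self-contained.

```typescript
// Assumed plan format: one task per line.
// "- [ ] task" is pending, "- [x] task" is done.
const PENDING = /^- \[ \] (.+)$/;

// Return the first pending task, or null when the plan is complete.
function nextTask(plan: string): string | null {
  for (const line of plan.split("\n")) {
    const match = PENDING.exec(line);
    if (match && match[1] !== undefined) return match[1];
  }
  return null;
}

// Mark a task as done by rewriting its checkbox.
function markDone(plan: string, task: string): string {
  return plan
    .split("\n")
    .map((line) => (line === `- [ ] ${task}` ? `- [x] ${task}` : line))
    .join("\n");
}

// One fresh agent per task. The agent receives only the current plan text
// and its task; no conversation history is carried between iterations.
function ralphLoop(
  plan: string,
  runAgent: (plan: string, task: string) => void
): string {
  let current = plan;
  let task = nextTask(current);
  while (task !== null) {
    runAgent(current, task);
    current = markDone(current, task);
    task = nextTask(current);
  }
  return current;
}
```

The point of the shape is that everything an iteration needs is in the plan text it is handed. Kill the process halfway through and the next run resumes from the first unchecked line.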

If you want structure without ceremony, Ralph is the sweet spot.

GSD Delivers What It Promises

GSD's STATE.md approach tracks progress across conversations. You can close the terminal, come back tomorrow, and the agent knows where you left off.

The 318 lines of planning docs produced the cleanest file structure: seven files with clear separation of concerns. The cost is a 77-file framework to install first.

For solo developers on real projects, GSD works. The overhead pays off when projects span multiple sessions.

BMAD Is Promising But Unproven

BMAD claims "scale-adaptive intelligence": automatically adjusting planning depth based on project complexity. For a CLI tool, it produced light documentation (156 lines) and working code.

Whether it actually scales up for complex projects? Unknown. The methodology is newer and less battle-tested than alternatives.

Spec Kit Is Enterprise Theater

1,724 lines of documentation. 66 tasks. For a four-command CLI tool.

This isn't a criticism. It's a design choice. Spec Kit is built for teams with stakeholders who review specs, compliance requirements to satisfy, and formal handoff processes.

If you need accountability and audit trails, Spec Kit provides them. If you're a solo developer, you're writing specs for an audience of one.

The Real Trade-off

Here's what the numbers don't show: context degradation.

AI agents get worse as context grows. The more history in a conversation, the more likely the model forgets earlier decisions, contradicts itself, or hallucinates.

Vibe coding puts everything in one conversation. Ralph Loops start fresh each iteration. That's not just philosophy. It's practical context management.
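A back-of-the-envelope model shows the difference. The token counts are made up for illustration, not measured in our test, and the model ignores any summarizing or compaction a tool might do.

```typescript
// Single conversation: the context at task n holds everything from tasks 1..n.
function singleConversation(tasks: number, tokensPerTask: number): number[] {
  const sizes: number[] = [];
  for (let n = 1; n <= tasks; n++) {
    sizes.push(n * tokensPerTask);
  }
  return sizes;
}

// Fresh context each iteration: the plan file plus the current task only.
function freshEachIteration(
  tasks: number,
  tokensPerTask: number,
  planTokens: number
): number[] {
  const sizes: number[] = [];
  for (let n = 1; n <= tasks; n++) {
    sizes.push(planTokens + tokensPerTask);
  }
  return sizes;
}
```

With 10 tasks at an assumed 4,000 tokens each and a 2,000-token plan, the single conversation ends at 40,000 tokens of context, while every fresh iteration stays at 6,000.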

The approaches with the lowest nonzero docs:code ratios (Ralph at 0.46:1, BMAD at 0.56:1) also have the best context hygiene. That's not a coincidence.

Our Verdict

Situation	Use This
Throwaway prototype	Vibe coding
Solo dev, quick feature	Ralph Loops
Multi-session project	GSD
Team with stakeholders	Spec Kit
"It depends"	BMAD (but verify the claims)

There's no universal best. The right tool depends on your context: team size, project duration, maintenance requirements, and how much ceremony you're willing to tolerate.

But if you're a solo developer building real software? Ralph Loops or GSD. On this test, at least, the data supports it.

What This Means for Your Projects

Most developers skip evaluation and grab whatever tool has the best marketing. That's how you end up with 1,724 lines of documentation for a time tracker.

Take an hour. Build the same small project with two or three approaches. See what fits your brain.

Don't pick on feel alone. The methodology that feels best during a quick test is not necessarily the right one. The one that produces maintainable code with reasonable overhead? That's the one to scale.
