We Tested 5 Spec-Driven Development Tools. Here's What Actually Worked.
Real data from building the same project five different ways. Vibe coding vs Ralph Loops vs GSD vs BMAD vs Spec Kit—with actual line counts and honest verdicts.
Everyone has opinions about AI-assisted development methodologies. Most opinions come from reading documentation, watching demos, or repeating what someone else said on Twitter.
We wanted data. So we built the same project five times, using five different approaches.
The Test
Project: A CLI time tracker in Node.js/TypeScript
- `track start [task]` - start timing
- `track stop` - stop current task
- `track list` - show today's entries
- `track summary` - show time per task
Simple enough to complete in one session. Complex enough to reveal differences.
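To give a sense of scale, the four commands above reduce to a handful of small functions. This is a minimal sketch, not the code any of the five approaches produced. The `TimeEntry` shape, the function names, and the rule that only one task can run at a time are all assumptions made for illustration, and timestamps are passed in rather than read from the clock so the logic stays easy to test.

```typescript
// Hypothetical data model: one entry per timed task; `end` stays unset while running.
interface TimeEntry {
  task: string;
  start: number; // epoch milliseconds
  end?: number;
}

// track start [task]: open a new entry (assumption: only one task runs at a time).
function startTask(entries: TimeEntry[], task: string, now: number): TimeEntry[] {
  if (entries.some((e) => e.end === undefined)) {
    throw new Error("A task is already running; stop it first.");
  }
  return [...entries, { task, start: now }];
}

// track stop: close whichever entry is still running, if any.
function stopTask(entries: TimeEntry[], now: number): TimeEntry[] {
  return entries.map((e) => (e.end === undefined ? { ...e, end: now } : e));
}

// track list: entries that started on the same local calendar day as `now`.
function listToday(entries: TimeEntry[], now: number): TimeEntry[] {
  const today = new Date(now).toDateString();
  return entries.filter((e) => new Date(e.start).toDateString() === today);
}

// track summary: total milliseconds per task; a running entry counts up to `now`.
function summarize(entries: TimeEntry[], now: number): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of entries) {
    const elapsed = (e.end ?? now) - e.start;
    totals.set(e.task, (totals.get(e.task) ?? 0) + elapsed);
  }
  return totals;
}
```

A real CLI would add argument parsing and persistence on top of this, which is where most of the remaining source lines go.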
The Five Approaches:
- Vibe coding - No planning, just prompting
- Ralph Loops - Fresh context each iteration
- GSD - "Get Shit Done" phased approach
- BMAD - Scale-adaptive methodology
- Spec Kit - GitHub's enterprise framework
All five produced working code. The differences were in everything else.
The Numbers
| Approach | Planning Docs | Source Code | Docs:Code Ratio |
|---|---|---|---|
| Vibe | 0 lines | 209 lines | 0:1 |
| Ralph | 177 lines | 385 lines | 0.46:1 |
| BMAD | 156 lines | 279 lines | 0.56:1 |
| GSD | 318 lines | 359 lines | 0.89:1 |
| Spec Kit | 1,724 lines | 610 lines | 2.8:1 |
Read that last row again. 2.8 lines of documentation for every line of code. For a CLI time tracker.
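The ratio column is just planning-doc lines divided by source lines. A short sketch that recomputes it from the counts in the table (the `Measurement` type and function name are ours, chosen for illustration):

```typescript
// Line counts copied from the table above.
interface Measurement {
  approach: string;
  planningLines: number;
  sourceLines: number;
}

const measurements: Measurement[] = [
  { approach: "Vibe", planningLines: 0, sourceLines: 209 },
  { approach: "Ralph", planningLines: 177, sourceLines: 385 },
  { approach: "BMAD", planningLines: 156, sourceLines: 279 },
  { approach: "GSD", planningLines: 318, sourceLines: 359 },
  { approach: "Spec Kit", planningLines: 1724, sourceLines: 610 },
];

// Planning-doc lines written per line of source code.
function docsToCodeRatio(m: Measurement): number {
  return m.planningLines / m.sourceLines;
}

for (const m of measurements) {
  console.log(`${m.approach}: ${docsToCodeRatio(m).toFixed(2)}:1`);
}
```

At two decimal places Spec Kit comes out at 2.83:1, which the table rounds to 2.8:1.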
What We Learned
Vibe Coding Works (Until It Doesn't)
Zero overhead. One 209-line file. Shipped in the least time.
The catch: one commit with everything. No structure. If something breaks next month, you're reading 209 lines to figure out why.
For throwaway prototypes, vibe coding is fine. For anything you'll maintain, you're borrowing time from your future self.
Ralph Loops Are Underrated
The Ralph technique has the best overhead-to-structure ratio. 177 lines of planning docs produced clean, atomic commits and maintainable code.
The key insight: memory doesn't live in the AI's context window. It lives in files and git history. When context fills up, a fresh agent picks up where the last left off. The plan file is the handoff mechanism.
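The handoff idea can be sketched in a few lines. This is an illustration under stated assumptions, not the canonical Ralph implementation: we assume the plan file is a markdown checklist, and `runAgent` is a hypothetical stand-in for whatever invokes a fresh agent session. The sketch works on the plan as a string; in practice you would read it from disk, write it back, and commit after each task.

```typescript
// Assumption: the plan is a markdown checklist, one task per line:
//   - [ ] unfinished task
//   - [x] finished task
const OPEN_ITEM = /^- \[ \] (.+)$/m;

// Return the first unchecked task, or null when the plan is complete.
function nextTask(plan: string): string | null {
  const match = OPEN_ITEM.exec(plan);
  return match?.[1] ?? null;
}

// Tick off the first unchecked occurrence of `task` in the plan text.
function markDone(plan: string, task: string): string {
  // A replacer function avoids special handling of "$" in task names.
  return plan.replace(`- [ ] ${task}`, () => `- [x] ${task}`);
}

// Hypothetical agent call: it sees only the plan and one task, never prior chat history.
type RunAgent = (plan: string, task: string) => void;

// One iteration per task; the updated plan is the only state carried forward.
function ralphLoop(plan: string, runAgent: RunAgent, maxIterations = 50): string {
  let current = plan;
  for (let i = 0; i < maxIterations; i++) {
    const task = nextTask(current);
    if (task === null) break; // nothing left to do
    runAgent(current, task);
    current = markDone(current, task);
  }
  return current;
}
```

Because each call to `runAgent` receives only the current plan and a single task, nothing depends on what an earlier iteration had in its context window. The `maxIterations` cap is a safety net we added so a plan that never completes cannot loop forever.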
If you want structure without ceremony, Ralph is the sweet spot.
GSD Delivers What It Promises
GSD's STATE.md approach tracks progress across conversations. You can close the terminal, come back tomorrow, and the agent knows where you left off.
The 318 lines of planning docs produced the cleanest file structure—seven files with clear separation of concerns. But there's a 77-file framework to install first.
For solo developers on real projects, GSD works. The overhead pays off when projects span multiple sessions.
BMAD Is Promising But Unproven
BMAD claims "scale-adaptive intelligence"—automatically adjusting planning depth based on project complexity. For a CLI tool, it produced light documentation and working code.
Whether it actually scales up for complex projects? Unknown. The methodology is newer and less battle-tested than alternatives.
Spec Kit Is Enterprise Theater
1,724 lines of documentation. 66 tasks. For a four-command CLI tool.
This isn't a criticism—it's a design choice. Spec Kit is built for teams with stakeholders who need to review specs, compliance requirements, and formal handoff processes.
If you need accountability and audit trails, Spec Kit provides them. If you're a solo developer, you're writing specs for an audience of one.
The Real Trade-off
Here's what the numbers don't show: context degradation.
AI agents get worse as context grows. The more history in a conversation, the more likely the model forgets earlier decisions, contradicts itself, or hallucinates.
Vibe coding puts everything in one conversation. Ralph Loops start fresh each iteration. That's not just philosophy—it's practical context management.
The planned approaches with the leanest docs:code ratios (Ralph at 0.46:1, BMAD at 0.56:1) also have the best context hygiene. That's not a coincidence.
Our Verdict
| Situation | Use This |
|---|---|
| Throwaway prototype | Vibe coding |
| Solo dev, quick feature | Ralph Loops |
| Multi-session project | GSD |
| Team with stakeholders | Spec Kit |
| "It depends" | BMAD (but verify the claims) |
There's no universal best. The right tool depends on your context: team size, project duration, maintenance requirements, and how much ceremony you're willing to tolerate.
But if you're a solo developer building real software? Ralph Loops or GSD. The data supports it.
What This Means for Your Projects
Most developers skip evaluation and grab whatever tool has the best marketing. That's how you end up with 1,724 lines of documentation for a time tracker.
Take an hour. Build the same small project with two or three approaches. See what fits your brain.
The methodology that merely feels right during a quick test is probably wrong. Judge by the output instead: the one that produces maintainable code with reasonable overhead is the one to scale.