We Tested 5 Spec-Driven Development Tools. Here's What Actually Worked.

Real data from building the same project five different ways. Vibe coding vs Ralph Loops vs GSD vs BMAD vs Spec Kit—with actual line counts and honest verdicts.

Matthias Walter

Everyone has opinions about AI-assisted development methodologies. Most opinions come from reading documentation, watching demos, or repeating what someone else said on Twitter.

We wanted data. So we built the same project five times, using five different approaches.

The Test

Project: A CLI time tracker in Node.js/TypeScript

  • track start [task] - start timing
  • track stop - stop current task
  • track list - show today's entries
  • track summary - show time per task

Simple enough to complete in one session. Complex enough to reveal differences.
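To make the scope concrete, here is a minimal sketch of how the four commands above might be modeled. It is not the code any of the five runs produced. The `Entry` shape, the `Tracker` class, the in-memory storage, and the choice to have `start` stop any running task first are all assumptions made for illustration.

```typescript
// Hypothetical model of the four commands. Names and data shapes are
// illustrative assumptions, not output from any of the five approaches.

interface Entry {
  task: string;
  start: number; // epoch milliseconds
  end?: number;  // undefined while the task is still running
}

class Tracker {
  private entries: Entry[] = [];

  // track start [task]: stop whatever is running (an assumption), then begin.
  start(task: string, now: number): void {
    this.stop(now);
    this.entries.push({ task, start: now });
  }

  // track stop: close any entry that has no end time yet.
  stop(now: number): void {
    for (const e of this.entries) {
      if (e.end === undefined) e.end = now;
    }
  }

  // track list: entries that started on the same local calendar day as `now`.
  list(now: number): Entry[] {
    const day = new Date(now).toDateString();
    return this.entries.filter(
      (e) => new Date(e.start).toDateString() === day
    );
  }

  // track summary: total milliseconds per task.
  // A task that is still running is counted up to `now`.
  summary(now: number): Record<string, number> {
    const totals: Record<string, number> = {};
    for (const e of this.entries) {
      const elapsed = (e.end === undefined ? now : e.end) - e.start;
      totals[e.task] = (totals[e.task] || 0) + elapsed;
    }
    return totals;
  }
}
```

Passing `now` in explicitly, rather than calling `Date.now()` inside each method, keeps the logic easy to test. A real CLI would also need argument parsing and persistence between invocations, which this sketch leaves out.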

The Five Approaches:

  1. Vibe coding - No planning, just prompting
  2. Ralph Loops - Fresh context each iteration
  3. GSD - "Get Shit Done" phased approach
  4. BMAD - Scale-adaptive methodology
  5. Spec Kit - GitHub's enterprise framework

All five produced working code. The differences were in everything else.

The Numbers

Approach    Planning Docs    Source Code    Docs:Code Ratio
Vibe        0 lines          209 lines      0:1
Ralph       177 lines        385 lines      0.46:1
BMAD        156 lines        279 lines      0.56:1
GSD         318 lines        359 lines      0.89:1
Spec Kit    1,724 lines      610 lines      2.8:1

Read that last row again. 2.8 lines of documentation for every line of code. For a CLI time tracker.
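The ratio column is just planning-doc lines divided by source lines. As a quick check, the sketch below recomputes it from the line counts in the table. The `Run` type and the `docsToCode` helper are names introduced here for illustration.

```typescript
// Line counts copied from the table above; the type and helper names
// are illustrative assumptions.
interface Run {
  approach: string;
  docLines: number;
  codeLines: number;
}

const runs: Run[] = [
  { approach: "Vibe", docLines: 0, codeLines: 209 },
  { approach: "Ralph", docLines: 177, codeLines: 385 },
  { approach: "BMAD", docLines: 156, codeLines: 279 },
  { approach: "GSD", docLines: 318, codeLines: 359 },
  { approach: "Spec Kit", docLines: 1724, codeLines: 610 },
];

// Planning-doc lines per line of source code, rounded to two decimals.
function docsToCode(run: Run): number {
  return Math.round((run.docLines / run.codeLines) * 100) / 100;
}

for (const run of runs) {
  console.log(`${run.approach}: ${docsToCode(run)}:1`);
}
```

At two decimals the Spec Kit figure comes out as 2.83, which the table rounds to 2.8.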

What We Learned

Vibe Coding Works (Until It Doesn't)

Zero overhead. One 209-line file. Shipped in the least time.

The catch: one commit with everything. No structure. If something breaks next month, you're reading 209 lines to figure out why.

For throwaway prototypes, vibe coding is fine. For anything you'll maintain, you're borrowing time from your future self.

Ralph Loops Are Underrated

The Ralph technique had the lowest docs:code ratio of the four planned approaches (0.46:1). Its 177 lines of planning docs produced clean, atomic commits and maintainable code.

The key insight: memory doesn't live in the AI's context window. It lives in files and git history. When context fills up, a fresh agent picks up where the last left off. The plan file is the handoff mechanism.

If you want structure without ceremony, Ralph is the sweet spot.
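The plan-file handoff described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Ralph technique's actual tooling: the checklist format (`- [ ]` and `- [x]`), the function names, and the `runAgent` callback standing in for a fresh agent session are all hypothetical.

```typescript
// Hypothetical plan format: one task per line, "- [ ] task" when open and
// "- [x] task" when done. The format and all names here are assumptions.

// Return the first unchecked task, or undefined when the plan is finished.
function nextTask(plan: string): string | undefined {
  for (const line of plan.split("\n")) {
    const match = /^- \[ \] (.+)$/.exec(line.trim());
    if (match) return match[1];
  }
  return undefined;
}

// Return a copy of the plan with the given task checked off.
function markDone(plan: string, task: string): string {
  return plan
    .split("\n")
    .map((line) =>
      line.trim() === `- [ ] ${task}` ? `- [x] ${task}` : line
    )
    .join("\n");
}

// Each iteration hands exactly one task to `runAgent`, which stands in for
// a new agent session with an empty context. The only state carried from
// one iteration to the next is the plan text itself.
function runLoop(plan: string, runAgent: (task: string) => void): string {
  let current = plan;
  let task = nextTask(current);
  while (task !== undefined) {
    runAgent(task);
    current = markDone(current, task);
    task = nextTask(current);
  }
  return current;
}
```

In practice the plan would be read from and written back to a file, and each iteration would end with a commit, so that the file and the git history carry the memory rather than the conversation.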

GSD Delivers What It Promises

GSD's STATE.md approach tracks progress across conversations. You can close the terminal, come back tomorrow, and the agent knows where you left off.

The 318 lines of planning docs produced the cleanest file structure—seven files with clear separation of concerns. But there's a 77-file framework to install first.

For solo developers on real projects, GSD works. The overhead pays off when projects span multiple sessions.

BMAD Is Promising But Unproven

BMAD claims "scale-adaptive intelligence"—automatically adjusting planning depth based on project complexity. For a CLI tool, it produced light documentation and working code.

Whether it actually scales up for complex projects? Unknown. The methodology is newer and less battle-tested than alternatives.

Spec Kit Is Enterprise Theater

1,724 lines of documentation. 66 tasks. For a four-command CLI tool.

This isn't a criticism; it's a design choice. Spec Kit is built for teams that have stakeholders who need to review specs, compliance requirements to meet, and formal handoff processes.

If you need accountability and audit trails, Spec Kit provides them. If you're a solo developer, you're writing specs for an audience of one.

The Real Trade-off

Here's what the numbers don't show: context degradation.

AI agents get worse as context grows. The more history in a conversation, the more likely the model forgets earlier decisions, contradicts itself, or hallucinates.

Vibe coding puts everything in one conversation. Ralph Loops start fresh each iteration. That's not just philosophy—it's practical context management.

The planned approaches with the lowest docs:code ratios (Ralph at 0.46:1, BMAD at 0.56:1) also have the best context hygiene. That's not a coincidence.

Our Verdict

Situation                  Use This
Throwaway prototype        Vibe coding
Solo dev, quick feature    Ralph Loops
Multi-session project      GSD
Team with stakeholders     Spec Kit
"It depends"               BMAD (but verify the claims)

There's no universal best. The right tool depends on your context: team size, project duration, maintenance requirements, and how much ceremony you're willing to tolerate.

But if you're a solo developer building real software? Ralph Loops or GSD. The data supports it.

What This Means for Your Projects

Most developers skip evaluation and grab whatever tool has the best marketing. That's how you end up with 1,724 lines of documentation for a time tracker.

Take an hour. Build the same small project with two or three approaches. See what fits your brain.

The methodology that feels right during a quick test is probably wrong. The one that produces maintainable code with reasonable overhead? That's the one to scale.


