Spec-Driven Development: Why Your AI Agent Needs a Contract

Here's a pattern I see constantly: someone fires up Claude Code, types "build me an authentication system," and gets... something. Maybe it works. Maybe it doesn't. Either way, it's probably not what they actually needed.

The problem isn't the AI. The problem is that nobody defined what "authentication system" means.

Enter Spec-Driven Development (SDD): the practice of writing specifications before writing code. It's not a new idea—formal specs have existed for decades. But with AI agents, specs aren't just documentation. They're the contract that tells the AI exactly what to build.

The Hallucination Problem

AI models hallucinate. Not occasionally: whenever a prompt leaves a gap, the model fills it with plausible-sounding details that may or may not be correct.

When you ask Claude Code to "add user authentication," it has to make hundreds of decisions:

What fields does a user have? Email? Username? Both?
How are passwords stored? bcrypt? argon2? scrypt?
What's the session mechanism? JWT? Server-side sessions?
How long do sessions last? 1 hour? 24 hours? Forever?
What happens when a session expires?
Are there roles? Permissions? Admin users?

Without a spec, the AI guesses. Sometimes it guesses well. Sometimes it doesn't. And you won't know which until you've dug through the generated code.

What is Spec-Driven Development?

SDD flips the traditional flow. Instead of:

Code first → Debug → Document (maybe) → Ship

You do:

Spec first → Validate spec → Generate code → Verify against spec

A spec is a detailed description of what the software should do. Not how—that's implementation. A spec defines:

Inputs: What data comes in, in what format
Outputs: What data comes out, in what format
Behavior: What transformations happen
Edge cases: What happens when things go wrong
Constraints: Performance requirements, security rules, etc.

Why Specs Matter for AI Agents

Specs don't eliminate hallucination, but they shrink the gaps the AI has to fill and give it ground truth to check against.

Clear Instructions

Compare these prompts:

Vague: "Add a login endpoint"

Spec-driven:

## POST /api/auth/login

### Request
- Body: { email: string, password: string }
- Both fields required
- Email must be valid format

### Response (Success - 200)
- Body: { token: string, expiresAt: ISO8601 }
- Token is JWT with user ID and roles
- Expires in 24 hours

### Response (Failure - 401)
- Body: { error: "Invalid credentials" }
- Same response for wrong email or wrong password (security)

### Behavior
- Rate limit: 5 attempts per email per 15 minutes
- Lock account after 10 failed attempts
- Log all attempts (success and failure) for audit

The spec leaves far less to interpretation. The AI knows what to build, and you know what to check.

Testable Criteria

Specs define when something is "done." Each requirement in the spec becomes a test:

Does the endpoint accept POST requests?
Does it require both email and password?
Does it return a JWT on success?
Does it rate limit after 5 attempts?

The AI can run these tests and verify its own work. No more "it looks right, I guess?"

Iteration Anchor

When the first implementation doesn't work, the spec gives you something concrete to check it against. You're not debugging "is this what I wanted?" You're debugging "does this match the spec?"

This is especially powerful with Ralph Loops. Each iteration starts fresh, reads the spec, and tries again. The spec is the constant; the implementation is the variable.
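The shape of that loop can be sketched in a few lines of Python. This is an illustration of "spec constant, implementation variable", not a canonical definition of a Ralph Loop; `implement` and `verify` are hypothetical callables (for example, a wrapper around an agent run and a wrapper around your test runner).

```python
from typing import Callable


def iterate_until_spec_passes(
    spec_text: str,
    implement: Callable[[str], str],
    verify: Callable[[str], list[str]],
    max_iterations: int = 5,
) -> tuple[bool, int]:
    """Re-run `implement` on the same spec until `verify` reports no failures.

    Returns (passed, iterations_used).
    """
    for iteration in range(1, max_iterations + 1):
        # Every pass receives the unchanged spec and no memory of earlier attempts.
        artifact = implement(spec_text)
        failures = verify(artifact)
        if not failures:
            return True, iteration
    return False, max_iterations
```

The iteration cap is a design choice in this sketch: if the loop keeps failing, the spec itself is a likely suspect.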

How to Write Specs AI Agents Can Execute

Not all specs are created equal. Here's what makes a spec AI-friendly.

Be Explicit About Data Structures

Bad:

The endpoint returns user information.

Good:

### Response Body
{
  "user": {
    "id": "uuid",
    "email": "string",
    "displayName": "string | null",
    "createdAt": "ISO8601",
    "roles": ["admin" | "user"]
  }
}
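An explicit shape like this maps almost directly onto a type. Below is a minimal Python sketch using the field names from the spec above; the `parse_user` helper and its strictness rules are illustrative assumptions, and it does not validate the UUID or timestamp formats.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Role = Literal["admin", "user"]
ALLOWED_ROLES = {"admin", "user"}
EXPECTED_FIELDS = {"id", "email", "displayName", "createdAt", "roles"}


@dataclass(frozen=True)
class User:
    id: str                      # "uuid" in the spec
    email: str
    displayName: Optional[str]   # "string | null" in the spec
    createdAt: str               # ISO8601 timestamp, kept as a string here
    roles: tuple[Role, ...]


def parse_user(payload: dict) -> User:
    """Build a User from a response body, rejecting anything off-spec."""
    raw = payload["user"]
    if set(raw) != EXPECTED_FIELDS:
        raise ValueError(f"unexpected or missing fields: {set(raw) ^ EXPECTED_FIELDS}")
    unknown = set(raw["roles"]) - ALLOWED_ROLES
    if unknown:
        raise ValueError(f"unknown roles: {unknown}")
    return User(raw["id"], raw["email"], raw["displayName"],
                raw["createdAt"], tuple(raw["roles"]))
```

With the vague version ("returns user information") there is nothing to write this check against.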

Define Edge Cases

Bad:

Handle errors appropriately.

Good:

### Error Responses
- 400 Bad Request: Missing required fields, invalid email format
- 401 Unauthorized: Invalid credentials
- 403 Forbidden: Account locked
- 429 Too Many Requests: Rate limit exceeded
- 500 Internal Server Error: Database connection failed

Each error response includes: { "error": "message", "code": "ERROR_CODE" }
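A table like this can live in code as a single lookup, which keeps every error body in the same shape. In this Python sketch only `AUTH_INVALID_CREDENTIALS` appears in this article; the other code strings and the `error_response` helper are hypothetical names chosen for illustration.

```python
# code -> (HTTP status, message). Statuses follow the spec above;
# all code strings except AUTH_INVALID_CREDENTIALS are made up for this example.
ERRORS = {
    "VALIDATION_FAILED": (400, "Missing required fields or invalid email format"),
    "AUTH_INVALID_CREDENTIALS": (401, "Invalid credentials"),
    "ACCOUNT_LOCKED": (403, "Account locked"),
    "RATE_LIMITED": (429, "Rate limit exceeded"),
    "INTERNAL_ERROR": (500, "Internal server error"),
}


def error_response(code: str) -> tuple[int, dict]:
    """Return (status, body) with the body shape the spec requires."""
    status, message = ERRORS[code]
    return status, {"error": message, "code": code}
```

Because every response goes through one function, "each error response includes error and code" becomes a property you can test once.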

Include Examples

### Example: Successful Login

Request:
POST /api/auth/login
Content-Type: application/json
{
  "email": "user@example.com",
  "password": "correct-password"
}

Response: 200 OK
{
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "expiresAt": "2026-01-19T12:00:00Z"
}

### Example: Failed Login

Request:
POST /api/auth/login
Content-Type: application/json
{
  "email": "user@example.com",
  "password": "wrong-password"
}

Response: 401 Unauthorized
{
  "error": "Invalid credentials",
  "code": "AUTH_INVALID_CREDENTIALS"
}

Separate Interface from Implementation

Your spec should define what, not how. Don't write:

Use bcrypt with 12 rounds to hash passwords.
Store sessions in Redis.

Instead, write:

- Passwords must be securely hashed (not plaintext, not MD5)
- Sessions must survive server restarts

Let the AI choose the implementation. If a particular technology is a genuine constraint (say, your team already runs Redis), state it explicitly as a constraint; otherwise keep the spec focused on requirements.
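In code, the same separation looks like an interface plus a swappable implementation. The Python sketch below is one possible reading, not a recommendation: the `PasswordHasher` protocol and `Pbkdf2Hasher` class are hypothetical names, PBKDF2 is just one algorithm that satisfies "securely hashed", and the iteration count is an illustrative value you would tune yourself.

```python
import hashlib
import hmac
import secrets
from typing import Protocol


class PasswordHasher(Protocol):
    """What the spec asks for: hashing and verification, nothing about how."""

    def hash(self, password: str) -> str: ...
    def verify(self, password: str, stored: str) -> bool: ...


class Pbkdf2Hasher:
    """One implementation that satisfies the requirement; others could replace it."""

    def __init__(self, iterations: int = 600_000):  # illustrative default
        self.iterations = iterations

    def hash(self, password: str) -> str:
        salt = secrets.token_bytes(16)  # fresh random salt per password
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, self.iterations)
        return f"{self.iterations}${salt.hex()}${digest.hex()}"

    def verify(self, password: str, stored: str) -> bool:
        iterations, salt_hex, digest_hex = stored.split("$")
        digest = hashlib.pbkdf2_hmac(
            "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
        )
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(digest.hex(), digest_hex)
```

Tests written against `PasswordHasher` keep passing if the implementation later switches to bcrypt or argon2, which is the point of specifying the requirement rather than the mechanism.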

Tools for Spec-Driven Development

GitHub Spec Kit

GitHub released spec-kit as an open-source toolkit for SDD. It provides:

Templates for common spec types (APIs, components, features)
Validation tools to check spec completeness
Integration with CI/CD pipelines

cc-sdd

cc-sdd brings Kiro-style spec commands to Claude Code. You can define specs in a structured format and have Claude Code parse and execute them.

claude-code-spec-workflow

claude-code-spec-workflow automates the spec → implement → verify cycle. It's designed specifically for iterative development with Claude Code.

SDD in Practice: A Real Workflow

Here's how I use SDD with Claude Code.

Step 1: Discovery

Before writing any spec, understand the problem. Talk to stakeholders, users, whoever knows what needs to be built. Take notes.

Step 2: Draft the Spec

Write the spec in markdown. I use a template:

# Feature: [Name]

## Overview
[2-3 sentences describing the feature]

## Requirements
- [ ] [Requirement 1]
- [ ] [Requirement 2]
...

## API Endpoints (if applicable)
### [METHOD] [Path]
...

## Data Models (if applicable)
...

## Business Rules
...

## Edge Cases
...

## Out of Scope
[Things explicitly NOT included]

Step 3: Validate the Spec

Before giving it to Claude Code, review:

Is every requirement testable?
Are there ambiguities an AI might misinterpret?
Are edge cases covered?
Is anything impossible or contradictory?
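Part of this review can be automated. The Python sketch below is a rough first-pass linter, assuming specs follow the template from Step 2; the `lint_spec` name, the required-section list, and the vague-phrase list are assumptions for illustration, and it cannot detect contradictions or missing requirements.

```python
import re

# Sections taken from the template in Step 2; adjust to your own template.
REQUIRED_SECTIONS = ["Overview", "Requirements", "Edge Cases", "Out of Scope"]
# Illustrative list of wording that usually hides an undecided requirement.
VAGUE_PHRASES = ["appropriately", "as needed", "etc.", "and so on", "user-friendly"]


def lint_spec(markdown: str) -> list[str]:
    """Return human-readable warnings; an empty list means no obvious gaps."""
    warnings = []
    # Collect level-2 headings only ("## Title"), not "#" or "###".
    headings = {
        m.group(1).strip()
        for m in re.finditer(r"^##\s+(.+)$", markdown, re.MULTILINE)
    }
    for section in REQUIRED_SECTIONS:
        if not any(h.startswith(section) for h in headings):
            warnings.append(f"missing section: {section}")
    for number, line in enumerate(markdown.splitlines(), start=1):
        for phrase in VAGUE_PHRASES:
            if phrase in line.lower():
                warnings.append(f"line {number}: vague phrase '{phrase}'")
    return warnings
```

A clean result only means the obvious gaps are closed; the four questions above still need a human read.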

Step 4: Execute with Claude Code

claude --print "
Read the spec at docs/specs/auth-feature.md.

Implement the feature exactly as specified.
After each requirement, write a test that verifies it.
Mark the requirement as done in the spec when the test passes.
"

Step 5: Verify and Iterate

Run the tests. If something fails, check:

Is the spec clear? If not, clarify it.
Is the implementation wrong? If so, have Claude Code fix it.
Is the test wrong? If so, fix the test.

Repeat until all requirements pass.
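If requirements are tracked as markdown checkboxes, as in the Step 2 template, "all requirements pass" can be checked mechanically. A small Python sketch, where the `requirement_status` helper is a hypothetical name:

```python
import re

# Matches "- [ ] text" and "- [x] text" checklist lines.
CHECKBOX = re.compile(r"^\s*- \[( |x|X)\] (.+)$")


def requirement_status(markdown: str) -> tuple[list[str], list[str]]:
    """Split checklist items into (done, still_open)."""
    done, still_open = [], []
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if not match:
            continue
        target = done if match.group(1).lower() == "x" else still_open
        target.append(match.group(2).strip())
    return done, still_open
```

A checked box only records that someone marked the item done, so pair this with the actual test run before treating the feature as finished.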

When Specs Fail

Specs aren't magic. They can fail in several ways:

Incomplete Specs

You forgot a requirement. The AI built exactly what you asked for—it's just not what you needed. Solution: better discovery, more edge case thinking.

Contradictory Specs

Requirement A conflicts with Requirement B. The AI picks one (or worse, tries both). Solution: review specs for consistency before implementing.

Overly Rigid Specs

You specified implementation details that constrain the AI unnecessarily. Solution: focus on outcomes, not mechanisms.

Changing Requirements

The spec was right when you wrote it, but now the requirements changed. Solution: update the spec first, then regenerate.

The Spec-First Mindset

SDD isn't just a technique. It's a mindset shift.

Old thinking: "I'll know it when I see it"

Spec-first thinking: "I'll define it before I build it"

This discipline pays off beyond AI. Specs force you to think clearly about what you're building. They catch design flaws early. They double as documentation. They make testing straightforward.

AI just amplifies the benefits. A human developer might successfully interpret a vague requirement. An AI will interpret it too, just not necessarily the way you wanted.

Start Today

You don't need fancy tools to start with SDD. Grab a markdown file and write:

What does this feature do?
What are the inputs and outputs?
What happens when things go wrong?
How will I know it's working?

That's a spec. Give it to Claude Code. See what happens.

I guarantee you'll get better results than "build me an authentication system."
