Most teams using AI for coding jump straight to implementation. They open Cursor or Claude Code, describe what they want, and let the model write code. This works beautifully for greenfield projects. Start from scratch, describe the feature, get working code.
But for brownfield codebases (complex, existing repositories with years of accumulated decisions), this approach breaks down. The model doesn't know your architecture. It doesn't know why you chose Django over FastAPI, why that service is async, or why the data model looks the way it does.
At Horizon, we developed a four-step workflow that solves this. We call it Spec-Driven Development.
The Core Idea: Thinking Cannot Be Delegated
Before explaining the workflow, here's the principle behind it.
There's a leverage hierarchy in software development:
- One bad line of code produces one bad line of code.
- One bad line of plan produces 10 to 100 bad lines of code (wrong solution).
- One bad line of research produces 1,000+ bad lines of code (misunderstanding the system).
- One bad line of specification produces 10,000+ bad lines of code (wrong problem).
The highest-leverage work happens before a single line of code is written. AI amplifies your thinking. If your thinking is good, the amplification produces great results. If your thinking is bad, the amplification produces a spectacular mess.
The developer stays at the center of the entire process, intervening at the points of highest leverage, while AI handles the mechanical parts. This is not about replacing engineers. It's about putting human judgment where it matters most.
The 4 Steps
Step 1: Spec
What it is: Define what you want and why.
This is a product/business need expressed clearly. Not a technical specification yet. Just: what problem are we solving, for whom, and what does success look like?
The spec is written by a human. Always. This is where thinking happens. A bad spec ruins everything downstream regardless of how good the AI is.
Example of a good spec: "Users need to see their discovery cycle progress in real time. Currently they check back manually after days. The progress should update live on their dashboard, showing completion percentage per conversation and overall cycle status."
Example of a bad spec: "Add real-time progress to the dashboard."
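A spec at this level can be as light as a few required sections. The skeleton below is purely illustrative, not a prescribed format; the point is that it forces answers to problem, audience, and success:

```
Spec: <feature name>

Problem: What is broken or missing today, and for whom?
Users: Who experiences this problem?
Success: What observable behavior tells us it's solved?
Out of scope: What we are deliberately not doing.
```

The bad spec above fails this template immediately: it names a feature but answers none of the four questions.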
Step 2: Research
What it is: The AI analyzes the current codebase and produces a compressed AS-IS document.
Given the spec, the AI agent reads through the relevant parts of the codebase and produces a research document that answers: how does this area of the code work today? What models are involved? What services handle this logic? What are the current patterns?
The output is a compacted representation of the current state. Not the full code, but a structured summary of what exists and how it connects.
Critical: This step has a HUMAN REVIEW gate. The engineer reads the research document and validates it. Did the AI correctly understand the system? Did it miss anything? Is the AS-IS accurate?
This review is fast (minutes, not hours) because you're reading a summary, not the code itself. But it catches fundamental misunderstandings before they propagate.
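What a research document covers varies by codebase, but a hypothetical research.md skeleton might look like this (section names are ours, not a fixed convention):

```
Research (AS-IS): <feature area>

Entry points: routes, views, or handlers that touch this area
Models: the relevant tables/fields and how they relate
Services: where the business logic for this area lives today
Patterns: conventions this area follows (async, signals, caching, ...)
Open questions: anything the agent could not determine from the code
```

A skeleton like this is also what makes the human review fast: each section is a claim about the system that the engineer can confirm or reject at a glance.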
Step 3: Plan
What it is: The AI combines the spec and research to produce a TO-BE plan.
Given what needs to happen (spec) and how things work today (research), the AI produces an implementation plan. What files need to change? What new models or services are needed? What's the sequence of changes?
The output is another compacted document: the plan.
Critical: This also has a HUMAN REVIEW gate. The engineer validates the approach. Is this the right architecture? Are we following our patterns? Does this plan account for edge cases?
Again, fast to review because it's a plan, not code. But this is where architectural mistakes get caught.
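In the same spirit, a plan.md might be structured like this (again illustrative, not a required format):

```
Plan (TO-BE): <feature name>

1. <file or module>: <what changes and why>
2. <new model or service>: <its shape and responsibility>
3. <sequencing note>: what must land before what
Edge cases: <what the implementation must handle>
Rollout: migrations, flags, or backfill if needed
```

The numbered sequence is what the implementation step later follows file by file.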
Step 4: Implement
What it is: The AI executes the plan and produces expanded code.
With a validated research document and an approved plan, the model now writes code. Because it has the right context (from research) and the right approach (from the plan), the code is significantly better than if you'd just asked it to "add real-time progress."
The implementation follows the plan step by step. Each file change is scoped and intentional. The model isn't guessing about architecture because the plan already defined it.
Why This Order Matters
The workflow is sequential and each step depends on the previous one. But more importantly, the human review gates are placed at the two highest-leverage points:
After Research (did the AI understand the system?) and after Plan (is the approach correct?).
If both are right, the implementation is almost always right. If either is wrong, no amount of coding skill (human or AI) will save you.
We've seen this play out consistently. When engineers skip the research review, they end up with code that looks correct but violates an architectural pattern that wasn't documented. When they skip the plan review, they get a working feature built in the wrong way.
How We Implement This Technically
Each step runs as a separate AI agent session. The Research agent has access to the codebase and produces a research.md file. The Planning agent reads research.md and the spec, then produces plan.md. The Implementation agent reads plan.md and executes.
We use Claude Code with sub-agents for each phase. The agents have access to system instructions, CLAUDE.md (per-app context), built-in tools, and MCP tools. Each phase spawns sub-agents that do the detailed work, then consolidate into the output document.
The human review happens between phases. The engineer reads the .md file, approves or corrects it, and triggers the next phase.
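The phase-and-gate loop can be sketched in a few lines of Python. Everything here is a hypothetical harness, not our production tooling: `run_agent` stands in for however you invoke an agent session (a CLI call, an SDK, a sub-agent), and `review` stands in for the human gate (in practice, opening the .md file and blocking until the engineer approves or corrects it).

```python
from pathlib import Path
from typing import Callable

def run_pipeline(
    spec: str,
    run_agent: Callable[[str, str], str],  # (phase, prompt) -> document text
    review: Callable[[str, str], str],     # (phase, doc) -> approved/corrected doc
    workdir: Path = Path("."),
) -> str:
    """Spec -> research.md -> plan.md -> implementation, with review gates."""
    # Phase 2: Research. The agent documents the AS-IS state.
    research = run_agent("research", f"Given this spec, document the AS-IS:\n{spec}")
    research = review("research", research)  # human gate 1: is the AS-IS accurate?
    (workdir / "research.md").write_text(research)

    # Phase 3: Plan. The agent combines spec + research into a TO-BE plan.
    plan = run_agent(
        "plan", f"Spec:\n{spec}\n\nResearch:\n{research}\n\nProduce a TO-BE plan."
    )
    plan = review("plan", plan)  # human gate 2: is the approach correct?
    (workdir / "plan.md").write_text(plan)

    # Phase 4: Implement. No gate here; the leverage was upstream.
    return run_agent("implement", f"Execute this plan step by step:\n{plan}")
```

The structure makes the article's point mechanical: the two `review` calls sit exactly at the two highest-leverage points, and the implementation phase consumes only documents that a human has already validated.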
What Changes When You Work This Way
Development feels different. You spend more time thinking about what to build and validating understanding. You spend less time writing code manually. The code that gets produced is more consistent because it follows explicit plans rather than ad-hoc decisions.
For teams working on brownfield codebases, this is the difference between AI that kind of helps and AI that genuinely transforms your velocity.