Why Your Codebase Isn't AI-Ready (And What We Did About It)

We spent six months trying to make AI coding tools work. The models made changes without clear criteria. Development velocity didn't improve. The problem wasn't the AI. The problem was our code.

Miguel Langone


March 10, 2026 · 7 min read

code architecture · developer productivity · AI workflows

We spent six months trying to make AI coding tools work. Cursor, Claude Code, Devin, Codex. We threw everything at our codebase. The models made changes without clear criteria. They modified code without understanding context. Development velocity didn't improve the way we expected.

The instinct was to blame the models. But after months of frustration, we realized something that changed everything: the problem wasn't the AI. The problem was our code.

The Real Problem

When we looked closely at why AI tools were underperforming, we found a pattern. It wasn't that the models were bad at coding. It was that our codebase was giving them bad inputs.

Three things were broken:

Ambiguous code. Our codebase had unclear responsibilities. When we asked a model to add a feature, it couldn't tell where the logic should live because we hadn't made that clear either. If a senior engineer needs 30 minutes to understand where something goes, a model has zero chance.

Too much context, or not enough. We had files hundreds of lines long that, together with the docs and related code a task required, overflowed the model's effective context. Or we'd give the model a task but leave out critical information about how that piece connected to the rest of the system. The model would generate perfectly reasonable code that was completely wrong for our architecture.

No defined architecture of responsibilities. Services, models, views, selectors... we were using our framework (Django) without strict conventions. Every developer had their own style. Every module had its own implicit patterns. The AI couldn't learn patterns that didn't exist consistently.

What We Actually Did

We didn't buy a better tool or switch models. We restructured our entire approach in three moves.

1. We restructured the codebase into smaller, isolated apps

We broke our monolith into focused apps, each with a clear purpose: Analytics App, Core App, Company App, Assistant App (SuperConsultant), Interview App, Discovery Cycle App, Processes App, Knowledge Graph App, Initiative App, Insights App, and more.

Each app is small enough that its entire relevant context fits in a model's context window. When an AI tool works inside the Company App, it doesn't need to understand the Insights App. The boundaries are explicit.

We also partitioned large files. No more 500-line service files. If a file can't fit in context alongside the relevant docs, it's too big.
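A rule like "no file bigger than the context budget" is easy to enforce mechanically. Here's a minimal sketch of such a check; the 400-line budget and the assumption that only `.py` files matter are illustrative, not our actual thresholds:

```python
"""Flag source files that exceed a per-file line budget."""
from pathlib import Path

LINE_BUDGET = 400  # assumed budget; pick one that fits your model's context


def oversized_files(root: str, budget: int = LINE_BUDGET) -> list[tuple[str, int]]:
    """Return (path, line_count) for every .py file over the budget, largest first."""
    hits = []
    for path in Path(root).rglob("*.py"):
        count = sum(1 for _ in path.open(encoding="utf-8"))
        if count > budget:
            hits.append((str(path), count))
    return sorted(hits, key=lambda item: -item[1])
```

A check like this can run in CI so oversized files fail the build instead of accumulating quietly.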

2. We defined and documented everything

For every app, we wrote explicit documentation about how code should be written. Not vague style guides. Concrete rules.

For example, our selector guidelines specify exactly what belongs in a selector (read-only queries, QuerySet construction, aggregations, data preparation for display) and what doesn't (write operations, side effects, business logic, state modification, transactions).
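To make the split concrete, here is a minimal sketch of the selector/service boundary. Plain Python stands in for Django ORM queries, and the `Interview` data and function names are illustrative assumptions, not our actual code:

```python
"""Sketch of the selector/service split: selectors read, services write."""
from dataclasses import dataclass


@dataclass
class Interview:
    id: int
    status: str
    score: float


# --- selectors.py: read-only queries, aggregations, data preparation ---

def completed_interviews(interviews: list[Interview]) -> list[Interview]:
    """Read-only filter; no writes, no side effects."""
    return [i for i in interviews if i.status == "completed"]


def average_score(interviews: list[Interview]) -> float:
    """Aggregation for display; still strictly read-only."""
    done = completed_interviews(interviews)
    return sum(i.score for i in done) / len(done) if done else 0.0


# --- services.py: write operations, state modification, business logic ---

def complete_interview(interview: Interview, score: float) -> None:
    """State modification lives in a service, never a selector."""
    interview.status = "completed"
    interview.score = score
```

The point is that a model (or a reviewer) can classify any function by its module alone: if it mutates state and lives in `selectors.py`, it's in the wrong place.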

We wrote these rules as markdown documents and diagrams. Not as onboarding material for humans, but as context for AI agents. The documentation serves both, but the primary consumer is the model.
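A guideline document in this style might look like the following; the wording below is an illustration of the format, not our actual document:

```markdown
# Selector guidelines

Selectors MAY contain:
- Read-only queries and QuerySet construction
- Aggregations
- Data preparation for display

Selectors MUST NOT contain:
- Write operations or side effects
- Business logic or state modification
- Transactions
```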

3. We use bugs as our evaluation metric

This is the metric that made everything concrete: what percentage of reported bugs can be resolved with a single prompt?

If you give an AI tool a bug report and it can fix it in one shot, your codebase is AI-ready for that area. If it takes three rounds of back-and-forth, your documentation is incomplete or your code structure is ambiguous.

We track this percentage over time. When it drops for a specific app, we know we need to improve the documentation or refactor the structure. It's a direct, measurable signal of AI-readiness.
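The metric itself is trivial to compute once each resolved bug records how many prompts the fix took. A sketch, assuming bug records come from your issue tracker with illustrative field names (`app`, `prompts_to_fix`):

```python
"""Compute the one-shot fix rate per app from resolved bug records."""
from collections import defaultdict


def one_shot_rate(bugs: list[dict]) -> dict[str, float]:
    """Percentage of bugs resolved with a single prompt, grouped by app."""
    totals: dict[str, int] = defaultdict(int)
    one_shots: dict[str, int] = defaultdict(int)
    for bug in bugs:
        totals[bug["app"]] += 1
        if bug["prompts_to_fix"] == 1:
            one_shots[bug["app"]] += 1
    return {app: 100.0 * one_shots[app] / totals[app] for app in totals}
```

Grouping by app is what makes the signal actionable: a drop in one app points at that app's docs or structure, not at the tooling in general.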

The Results

After implementing these changes:

60% reduction in delivery time. We went from multi-week feature cycles to shipping an average of 30 new features per month. The restructured codebase lets AI tools do the mechanical work while engineers focus on architecture and design.

76% increase in team productivity. Not because anyone is working harder, but because the leverage is real. Engineers spend their time on the problems that matter, while well-structured code lets AI handle the rest.

40% fewer bugs in production. This surprised us. Cleaner architecture and explicit patterns didn't just help AI write code. They helped everyone write better code. Clear boundaries reduce the surface area for mistakes.

What This Means For You

If your team is frustrated with AI coding tools, don't switch tools. Look at your codebase. Ask yourself:

Can a model understand where new code should go without being told? Are your files small enough to fit in context? Do you have explicit, written conventions for how every type of code should be structured? Can you measure how AI-ready your code actually is?

If the answer to any of these is no, that's where the work is. The models are good enough. The question is whether your codebase is good enough for them.
