There's a spectrum of AI coding assistance. On one end, you have autocomplete: the model finishes your line. On the other end, you have full autonomy: the model takes a product requirement and ships a working feature.
Most teams are somewhere in the middle. At Horizon, we've been pushing toward the autonomous end with two systems: our all-in-one workflows and an internal tool called Ralph. Both attempt to answer the same question: how much of the development lifecycle can be automated while maintaining quality?
All-in-One Workflows
Our first step toward autonomy was building multi-phase workflows that chain together the Spec-Driven Development process into a single, orchestrated pipeline.
The workflow runs through six phases:
Phase 1: Setup. The agent creates a feature branch, sets up the working environment, and pulls the relevant context.
Phase 2: Research & Planning. A sub-agent flow that reads the codebase, produces a research document, then generates an implementation plan. This phase spawns multiple sub-agents that work in parallel to analyze different parts of the codebase.
Phase 3: Implementation. Another sub-agent flow that executes the plan. Each step in the plan becomes a task for a sub-agent. The agents write code, run tests, and iterate until the implementation matches the plan.
Phase 4: Manual Testing. A prompt-based checkpoint where the engineer runs manual tests to verify the implementation works correctly. This is a human gate with no automation: the engineer checks every functional and non-functional requirement before moving on.
Phase 5: Review Result. A decision point with three options: Approved (proceed to merge), Changes Requested (send back with feedback), or Rejected (close the PR and document why).
If approved, the workflow proceeds to commit and PR creation, then merges to the development branch. If changes are requested, the workflow routes to an "Address CTO Feedback" step where the agent makes the requested changes, then loops back to Phase 4. If rejected, the PR is closed and the rejection reason is documented for future reference.
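The review routing above can be sketched as a small loop. This is an illustrative stand-in for the visual orchestration, not the actual workflow engine; `ask_reviewer` and `log` are hypothetical stubs (the real flow prompts a human at the checkpoint).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the Phase 4/5 review routing. The stubs below
# stand in for the human reviewer and the agent steps.
set -euo pipefail

log() { echo "$1" >> workflow.log; }
ask_reviewer() { echo "approved"; }   # stub: the real gate prompts the engineer

decision="changes_requested"
while [ "$decision" = "changes_requested" ]; do
  log "Phase 4: manual testing checkpoint"
  decision=$(ask_reviewer)
  case "$decision" in
    approved)          log "commit, open PR, merge to development" ;;
    changes_requested) log "address CTO feedback, loop back to Phase 4" ;;
    rejected)          log "close PR, document rejection reason" ;;
  esac
done
```

The `changes_requested` branch is the only one that re-enters the loop, which mirrors the "Address CTO Feedback → Phase 4" route in the pipeline.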
The entire flow is orchestrated visually. You can see each phase, the decision points, and the routing logic. It's a development pipeline where AI does the work and humans make the decisions at checkpoints.
Ralph: The Autonomous Agent Loop
Ralph takes this further. Named after... well, let's just say he's enthusiastic if not always brilliant. But in this case, the simplicity is the point.
Ralph is an autonomous agent loop that processes a PRD end-to-end. The flow:
You write a PRD. Define what you want to build in plain language, structured as a product requirements document.
Convert to prd.json. The PRD is broken into small, atomic user stories. Each story has an ID, title, acceptance criteria, and a passed: false flag.
Run ralph.sh. This starts the autonomous loop.
The loop: Ralph picks the next story where passed is false. It reads the story. It implements the code. It runs tests. If tests pass, it commits the changes, updates prd.json to set passed to true, logs to progress.txt, and also updates AGENTS.md with any patterns it discovered during implementation. Then it moves to the next story. If tests fail, it iterates on the implementation.
The loop continues until all stories are marked as passed, or until it gets stuck (in which case it logs the failure and a human intervenes).
The key detail: Ralph updates AGENTS.md with what it learned. This means each story it completes makes the codebase more documented for future iterations. The system gets smarter as it works.
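The loop above can be sketched in a few lines of shell. This is not Horizon's actual ralph.sh: it assumes jq is installed, uses a stub `run_agent` in place of the real coding agent (which would implement the story and run its tests), and seeds a one-story demo prd.json so the sketch terminates.

```shell
#!/usr/bin/env bash
# Minimal sketch of a Ralph-style loop, not the real ralph.sh.
set -euo pipefail

# Demo fixture: one unfinished story (the real prd.json comes from the PRD).
cat > prd.json <<'EOF'
{"stories":[{"id":"US-001","title":"Demo story","passed":false}]}
EOF

run_agent() {   # stand-in for the coding agent; always "passes" here
  echo "implementing $1"
}

max_attempts=5

while :; do
  # Pick the next story whose passed flag is still false.
  story_id=$(jq -r '[.stories[] | select(.passed == false)][0].id // empty' prd.json)
  if [ -z "$story_id" ]; then
    echo "All stories passed." >> progress.txt
    break
  fi

  attempts=0
  until run_agent "$story_id"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge "$max_attempts" ]; then
      echo "$story_id: stuck, needs a human" >> progress.txt
      exit 1
    fi
  done

  # Story done: flip the flag and log progress. The real Ralph would also
  # commit the changes and append discovered patterns to AGENTS.md here.
  jq --arg id "$story_id" \
     '(.stories[] | select(.id == $id)).passed = true' prd.json > prd.tmp
  mv prd.tmp prd.json
  echo "$story_id: passed" >> progress.txt
done
```

The attempt cap is the "gets stuck" escape hatch: after a few failed iterations the loop logs the failure and exits nonzero so a human can take over.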
What Works and What Doesn't
Both systems work well for well-defined, bounded tasks. Adding a new field to an existing model, creating a new API endpoint that follows established patterns, building a UI component that matches existing conventions. These are the kinds of stories where the documentation layer provides enough context for autonomous execution.
Where they struggle: tasks that require cross-cutting changes affecting multiple apps, tasks that need new architectural patterns not yet documented, and tasks with ambiguous acceptance criteria. In these cases, the human intervention rate goes up significantly.
We're also finding that the quality of the PRD matters enormously. A well-written PRD with clear acceptance criteria produces reliable autonomous execution. A vague PRD produces the same chaos it would with a human developer, just faster.
The Tech Stack
We run these workflows using Claude Code as the foundation, with Amp as the orchestrator for Ralph. The workflow system uses sub-agent flows that can spawn multiple parallel processes.
For code quality gates, we integrate CodeRabbit for automated review and Sentry Seer for error detection. These provide automated checkpoints between the AI implementation and the human review.
The orchestration is the key piece. Without it, you're running individual prompts. With it, you have a pipeline that handles the full lifecycle.
What's Next: Auto Testing + Auto Deploy
The frontier we're exploring is closing the loop entirely: a system that autonomously deploys features and evaluates them in production.
The vision: Ralph implements the feature, automated tests validate the implementation, the system deploys to a staging environment, automated evaluations check the feature against acceptance criteria, and if everything passes, it promotes to production.
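In shell terms, the vision is a chain of gated stages where each one only runs if the previous stage exited cleanly. Every stage name below is invented, and the stubs just log, so only the gating shape is real.

```shell
#!/usr/bin/env bash
# Hypothetical shape of the fully closed loop. All stage names are
# illustrative stand-ins; the stubs log instead of doing real work.
set -euo pipefail

stage() { echo "$1" >> pipeline.log; }

ralph_implement()  { stage "implement feature from PRD"; }
automated_tests()  { stage "run automated tests"; }
deploy_staging()   { stage "deploy to staging"; }
evaluate_feature() { stage "evaluate against acceptance criteria"; }
promote_prod()     { stage "promote to production"; }

# Each && gate only fires if the previous stage succeeded (exit 0),
# so a failing evaluation stops the promotion to production.
ralph_implement && automated_tests && deploy_staging \
  && evaluate_feature && promote_prod
```

Today the last two gates are where human oversight still sits; automating them reliably is the open problem.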
We're not there yet. The deployment and evaluation steps still need human oversight. But the trajectory is clear. Each piece of the development lifecycle that we can automate reliably gets automated, while humans focus on the pieces that require judgment.
Should You Build This?
Honest answer: probably not yet, unless your documentation layer is solid.
Autonomous development loops amplify whatever state your codebase is in. If your architecture is clear, conventions are documented, and tests are comprehensive, autonomy works. If any of those are missing, autonomy produces bugs at machine speed.
Start with the foundations: AI-ready codebase, AGENTS.md per app, spec-driven development workflow. Once those are reliable, autonomous loops become a natural extension.