In the last post, I told the story of how I went from using AI as a code completion tool to building a full agentic development framework in three weeks. That post was the "what happened." This one is the "how it works."

What follows is the framework I've built through three major iterations, refined by real projects and real failures. It's not theoretical — it's the actual process I use every day. It has five phases, a library of specialized skills, and an orchestration pattern designed to solve the single biggest problem in agentic development: context drift.

The Core Philosophy

Before the mechanics, the principles. Everything in this framework flows from a few hard-won beliefs:

Context drift is the primary adversary. Over the course of a long session, the AI will start dropping things from its working memory. Not because it's broken — because it has to manage a finite context window. The things it drops are often the things you care most about: architectural patterns, testing standards, naming conventions. The entire supervisor/worker architecture exists to combat this.

Separate concerns between human-driven discovery, AI-assisted specification, and fully orchestrated development. Each phase uses different tools and has different context boundaries. Clean handoffs between phases prevent the accumulated assumptions of one phase from leaking into the next.

Vertical slices over horizontal layers. Stories deliver user-visible functionality, not technical layers. Foundation work is bounded to the minimum needed to unblock the first milestone. This is scrum orthodoxy, and it applies even more strongly in agentic development because the cost of course correction is lower.

Trust but verify. The AI does the work, but independent agents check the work. Self-review is unreliable: the same biases that produced a decision will happily confirm it. Separate review agents with fresh context catch what self-review misses.

Phase 1: Discovery & Capture

Tool: Otter.ai
Output: Raw transcript

This is where it all starts, and it's deliberately low-tech. I hit record and talk. If I'm working with a client, we talk together. There's no required structure — the goal is to get every idea, requirement, concern, and tangential thought captured. I call it "puking it out." You go down one rabbit hole, realize you need to go back, chase another thread — that's fine. Get it all on tape.

The key insight is that this conversation is the highest-bandwidth way to transfer domain knowledge. Trying to write a requirements document from scratch is slow and lossy. Talking about what you want, how it should work, what the edge cases are — that captures nuance that formal documentation misses. Otter.ai transcribes it in real time, and the transcript becomes the input artifact for the next phase.

Phase 2: PRD Refinement

Tool: Claude Web Interface
Output: PRD + Screen Inventory + Decisions Log

I upload the transcript to Claude in the web interface and ask it to make sense of the conversation and produce a first-pass Product Requirements Document. Claude is remarkably good at this — it organizes the chaos, identifies the themes, and produces something structured from something freeform.

Then we enter what I think of as the convergence loop. I ask Claude: "Do you have any questions, or am I missing anything?" It always does. It identifies gaps I hadn't considered, asks clarifying questions, raises edge cases. I answer, refine, and ask again. When it starts running out of pertinent questions and suggesting marginal things, those diminishing returns are the exit signal. Time to move on.

My friend Mike introduced me to what I call the "Joseph Maneuver" — just keep asking Claude "what am I missing?" or "what should I be thinking about that I'm not?" It's simple, but it's incredibly effective. Claude's suggestions will go from "you haven't defined your authentication model" to "have you considered what happens if the user's timezone changes mid-session." When you've gone from critical to trivial, you're done.
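The exit signal can be made concrete. This is a toy formalization of my mental triage, not anything Claude computes; the severity labels and the two-round window are invented for illustration:

```python
# Triage each round of Claude's questions, then exit once recent rounds
# contain nothing critical. All names here are illustrative.
CRITICAL = "critical"   # "you haven't defined your authentication model"
MARGINAL = "marginal"   # "what if the user's timezone changes mid-session?"

def converged(rounds, window=2):
    """True once the last `window` rounds raised only marginal questions."""
    tail = rounds[-window:]
    return len(tail) == window and all(
        severity == MARGINAL for round_ in tail for severity in round_
    )

history = [
    [CRITICAL, CRITICAL],   # round 1: real gaps
    [CRITICAL, MARGINAL],   # round 2: still finding things
    [MARGINAL],             # round 3: getting thin
    [MARGINAL, MARGINAL],   # round 4: time to move on
]
```

The window matters: one quiet round can be a fluke, but two in a row means the well is dry.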

During this phase, I also define the engineering scope: tech stack, compilable components (what each deployable unit is and what it does), and architectural constraints. I learned the hard way that if you don't explicitly define service boundaries, Claude Code will make assumptions about them during development — and those assumptions might include services talking directly to databases when you expected them to go through an API.

The phase produces three artifacts: the PRD itself, a Screen Inventory (every screen, every modal, every navigation flow, every data dependency), and a Decisions Log. The Decisions Log is something I added after learning that Claude Code benefits enormously from knowing not just what was decided, but why. Each entry follows a simple pattern: what was decided, what alternatives were considered, and why this choice was made.
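To make that pattern concrete, here is what one Decisions Log entry looks like in practice (this particular entry is invented for illustration):

```
Decision: The web client talks only to the API service, never directly to the database.
Alternatives considered: direct database queries from the client; a shared data-access library.
Why: keeps service boundaries explicit so Claude Code doesn't invent its own,
and matches the compilable-components list in the PRD.
```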

Phase 3: Project Bootstrap

Tool: Claude Code
Output: Initialized repo with milestoned backlog

This is the handoff from Claude Web to Claude Code — a clean context boundary. Claude Code gets the PRD, Screen Inventory, and Decisions Log, along with my engineering standards (more on those in a moment).

I have a skill called prd-to-backlog that decomposes the PRD into a milestoned backlog of vertical-slice stories. It reads everything, identifies foundation work (kept minimal), groups features into milestones, and creates GitHub issues with proper labels and milestone markers. Before creating anything, it presents the full plan for human review. I learned not to skip this step.

The backlog uses GitHub Projects with a Kanban board: Inbox, Up Next, In Progress, Waiting/Blocked, Done. Stories follow vertical-slice principles — entity lifecycle operations (create, list, view, delete) travel together as one story, not four.
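For a sense of what the skill ends up doing per story, here is a sketch of the kind of call it makes. The story data is invented; the flags are standard `gh issue create` options:

```python
import subprocess

def issue_command(title, body, labels, milestone):
    """Build the `gh issue create` invocation for one vertical-slice story."""
    cmd = ["gh", "issue", "create", "--title", title,
           "--body", body, "--milestone", milestone]
    for label in labels:
        cmd += ["--label", label]
    return cmd

cmd = issue_command(
    title="Story: Task lifecycle (create, list, view, delete)",
    body="Vertical slice: a user can manage tasks end to end.",
    labels=["story", "m1"],
    milestone="M1: Core tracking",
)
# subprocess.run(cmd, check=True)  # run inside the project repo, after human review
```

Note that the lifecycle operations travel as one issue, not four, per the vertical-slice principle above.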

The Engineering Standards Repo

This deserves its own section because it's the thing that makes everything else work.

After my v1 experience — where Claude made architectural assumptions I didn't agree with and drifted away from patterns I considered non-negotiable — I created a separate repository containing everything I care about: how I write code, how I structure projects, how I test, how I handle errors, how I manage dependencies. All of it.

The repo contains standards for architecture, API design, database patterns, testing (including my strong preference for real fakes over mocking frameworks), security, logging, error handling, git workflow, and more. It also contains the skills that drive the development pipeline, issue templates, and a project-level CLAUDE.md template that every new project uses.

Every time Claude Code starts a session — whether it's a new project or picking up an existing one — it pulls the latest standards, reads them, and syncs the skills. This is the anti-drift mechanism. No matter how long a session runs, every new worker agent starts by re-reading the standards. The non-negotiables are always fresh.
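In spirit, the session-start sync amounts to something like the following. The paths and repo layout here are my illustration, not the actual skill:

```python
import shutil
import subprocess
from pathlib import Path

def pull_standards(repo: Path) -> None:
    # Fast-forward only: the standards repo is canonical, so a local
    # divergence should fail loudly rather than silently merge.
    subprocess.run(["git", "-C", str(repo), "pull", "--ff-only"], check=True)

def sync_skills(standards_repo: Path, project_root: Path) -> Path:
    # Copy the skills into the project so every freshly spawned agent
    # reads the latest versions when it starts.
    src = standards_repo / "skills"
    dst = project_root / ".claude" / "skills"
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```

The important property isn't the copying mechanism; it's that the pull happens at every session start, so no project can quietly pin a stale copy of the standards.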

Phase 4: Orchestrated Development

Tool: Claude Code with the orchestrate skill
Output: Implemented, reviewed, tested, merged code

This is where the supervisor/worker pattern does its thing. The orchestrator picks up a ticket from the backlog and runs it through a multi-stage pipeline, spawning fresh sub-agents for each stage:

Stage 1: Refine. A dedicated agent enriches the story with technical detail and acceptance criteria, informed by the current state of the codebase and the engineering standards.

Stage 2: Implement. A fresh agent implements the story and creates a pull request. This is a clean context — it reads the standards, understands the story, and writes code without carrying baggage from previous stories.

Stage 3: Engineering Review. Another fresh agent reviews the PR against the engineering standards. If it finds issues, it kicks the PR back to the implementer (in fix mode) for corrections. This can loop up to three times. The key here is that the reviewer has independent context — it's not confirming its own assumptions.

Stage 4: Security Review. A separate agent runs an OWASP Top 10 security analysis. Same loop pattern — up to two iterations of findings and fixes. When it passes, the PR gets merged.

Stage 5: Integration Tests. After the implementation PR is merged, a new agent writes integration tests on a separate PR. Those tests get their own engineering review cycle before being merged.

Stage 6: Close. The orchestrator verifies acceptance criteria are met and closes the issue.

The orchestrator can run in supervised mode (hard stop between each ticket, for early-stage projects or when you want close oversight) or autonomous mode (continues ticket-to-ticket, stopping only at milestones or circuit breakers).
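The control flow of the pipeline can be sketched as a skeleton. The agent calls below are stand-ins for spawning fresh Claude Code sub-agents; only the shape (fresh context per stage, bounded review loops, block instead of retrying forever) mirrors the real skill:

```python
ENG_REVIEW_MAX = 3   # engineering review: up to three fix iterations
SEC_REVIEW_MAX = 2   # security review: up to two fix iterations

def review_loop(review, fix, max_iterations):
    """Run review/fix cycles until the review passes or the budget is spent."""
    for _ in range(max_iterations):
        findings = review()
        if not findings:
            return True
        fix(findings)
    return False

def run_ticket(agents):
    story = agents.refine()               # Stage 1: enrich story with technical detail
    pr = agents.implement(story)          # Stage 2: fresh implementer, clean context
    if not review_loop(lambda: agents.eng_review(pr),
                       lambda f: agents.fix(pr, f), ENG_REVIEW_MAX):
        return "blocked: engineering review exhausted"
    if not review_loop(lambda: agents.security_review(pr),
                       lambda f: agents.fix(pr, f), SEC_REVIEW_MAX):
        return "blocked: security review exhausted"
    agents.merge(pr)                      # Stage 4 passed: merge the implementation PR
    agents.integration_tests(pr)          # Stage 5: separate PR, own review cycle (elided)
    return agents.close(story)            # Stage 6: verify acceptance criteria, close issue
```

A blocked ticket doesn't crash the orchestrator; it feeds the circuit-breaker logic described below.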

Milestone Gates and Circuit Breakers

Milestones are review gates — they're the moments where I stop, launch the application, interact with it, and verify it matches what I expected. Each milestone delivers a coherent set of user-visible functionality. The orchestrator stops at every milestone regardless of operating mode, runs a smoke test checklist, and waits for my approval before continuing.

Circuit breakers are the emergency stops. Even in autonomous mode, the orchestrator halts if three consecutive tickets are blocked (something systemic is wrong), if a merge conflict occurs (needs human judgment), if the build breaks on main after a merge (rollback tag available), or if review iterations are exhausted on consecutive tickets (pattern problem, not a one-off).

Before any work begins on a ticket, the orchestrator tags main as a rollback point. If something goes wrong, there's always a clean state to return to.
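The tagging step is small but load-bearing. A minimal sketch, assuming tags named `rollback/issue-<n>` (the naming scheme is my invention):

```python
import subprocess

def tag_rollback_point(issue_number: int) -> str:
    # -f so re-running a ticket moves the tag instead of failing.
    tag = f"rollback/issue-{issue_number}"
    subprocess.run(["git", "tag", "-f", tag, "main"], check=True)
    return tag

def rollback(tag: str) -> None:
    # Return main to the pre-ticket state when a circuit breaker trips.
    subprocess.run(["git", "checkout", "main"], check=True)
    subprocess.run(["git", "reset", "--hard", tag], check=True)
```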

Phase 5: Session Summary & Review

Tool: Human + Claude Code
Output: Session report, refined standards

After a development session, the orchestrator produces a summary: what was completed, what's blocked, what milestones were reached, token usage per stage (for cost tracking), and any PRD amendments discovered during development — things that contradict or are missing from the original PRD.
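Abbreviated and with invented numbers, a session summary reads roughly like:

```
Session summary (2024-xx-xx)
Completed: #12 task lifecycle, #13 daily goal setup
Blocked: #14 GitHub signal ingestion (token scope needs human judgment)
Milestones: M1 reached; smoke-test checklist awaiting approval
Token usage: refine 12%, implement 55%, reviews 28%, tests 5%
PRD amendments: nudge cadence for overlapping goals was never specified
```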

Then I run the application, test flows manually, and debug with Claude Code as needed. Insights from this phase feed back into the engineering standards, the project-level documentation, and the skills themselves. The framework gets better with every project.

AgentZula: The Process in Action

The project I'm building right now — AgentZula — is the first one built entirely within this v3 framework. It's a developer work-tracking and productivity system that passively collects activity signals from Claude Code sessions and GitHub repos, maintains a work log for invoicing, and runs an LLM-powered agent that monitors daily goals, manages a task list, and sends nudges when it's time to switch focus.

It started the way everything starts now: I hit record, talked through what I wanted for about twenty minutes, fed the transcript to Claude Web, refined the PRD through several rounds of the convergence loop, produced a Decisions Log, and handed the whole package to Claude Code. The repo now has a milestoned backlog, a clean architecture, and development is proceeding ticket by ticket through the orchestrated pipeline.

It's also a product I actually need — a system that tracks what I'm doing across multiple client projects so I can invoice accurately and stay on task. The fact that I'm building it with the very process it's designed to support is either poetic or recursive, depending on your perspective.

What This Means

I'm not claiming this is the only way to do agentic development, or even the best way. It's the way I've arrived at after three iterations, multiple projects, and a lot of trial and error. What I am claiming is that agentic development requires a framework — a deliberate structure for how you interact with the AI — and that most of the failure modes people experience aren't limitations of the AI itself. They're architecture problems.

Drift isn't a bug in Claude. It's a consequence of finite context windows that you can engineer around. Hallucinated architecture isn't the AI being stupid. It's the AI filling in gaps you left open. Code quality issues aren't proof that AI can't write good code. They're what happens when you don't define what "good" means and enforce it with independent review.

The framework solves these problems. Not perfectly — nothing is perfect — but systematically.

This blog will continue to document the journey as it evolves. There's more to learn, more to refine, and more to share. If you're a developer thinking about moving into agentic development, or a business leader trying to understand what this means for your organization, I hope this gives you a concrete picture of what the work actually looks like.

It looks like the future. And it's already here.

Kevin Phifer is the founder of Theoretically Impossible Solutions LLC, specializing in agentic AI development and consulting. You can reach him at kevin.phifer@theoreticallyimpossible.org.
