The SpecFlow Process For Using AI For Coding
How SpecStory helps you code with AI in a more controlled way and capture the “Why” Behind AI-Generated Code
I was thrilled to host Jake and Greg, the co-founders of SpecStory, for a recent talk in our Elite AI-Assisted Coding Course. I use their tools all day, every day (their cloud, their API, their local extension) and I believe they are one of the most mature teams out there when it comes to building software with AI.
It’s no secret that AI is useful for development and improves productivity, and new products are being developed because of that, each with their own theories about the best way to use AI. These range from vibe coding tools like Lovable and Vibecodeapp, to cloud agents like Devin and OpenHands, to IDEs like Cursor and Windsurf, to terminal agents like Claude Code and Amp, to notebook interfaces like Solveit and Marimo. The reality (like always) is that these differing approaches are good for different things and in different situations, and developer preference heavily influences which tool a developer chooses to use.
SpecStory is a company that is looking to solve many problems that are arising because of the explosion of AI usage for coding, in a way that acknowledges the range of approaches and uses.
They’re building the tools and workflows that help us use AI better, with a focus on solving the problems related to intent. Their session was extremely practical and useful for today’s developers, so I wanted to share this dive into their core workflow.
Here’s what they shared:
What’s changed?
Jake started by discussing the shifts in software delivery. Features can often be built much quicker now than in the past thanks to advancements in AI. But stability, auditability, and collaboration at speed are problems that this change magnifies. While the coding itself has sped up (thanks to many of the tools I listed above), those tools have not improved the planning, reviewing, and intent parts of developing products to the same degree.
The Core Problem: AI Writes the Code, But Loses the Intent
The code is there, committed to Git, but the conversation, the trade-offs, the discarded ideas, and the subtle decisions that led to that specific implementation are gone. As Jake put it, when you don’t preserve the intent, you “run the risk of losing the why behind the code” [03:41].
This makes maintenance, collaboration, and future development incredibly difficult. SpecStory’s entire approach is built to solve this.
The Workflow: Plan, Implement, Reflect
To demonstrate their process, Jake built a simple static blog generator from scratch during the talk. The entire process follows a simple, powerful loop: Plan, Implement, and Reflect.
It’s a structured methodology for guiding an agent and, crucially, for creating a durable record of your development journey.
Step 1: The Plan - From Brainstorm to Tasks
Before writing a single line of code, the process starts with shaping the idea into a clear plan the AI can follow. This happens in three stages, all using Markdown files.
A. Brainstorming
It begins with a brainstorm.md file. This is a free-form brain dump. The goal is to get all the initial thoughts down without structure. Jake used a voice-to-text tool to talk through the initial idea: create a simple static blog generator that reads Markdown files and converts them to HTML [05:50].
There are many good voice transcription tools. Eleanor frequently uses Monologue, while I normally use MacWhisper. Jake used WisprFlow. SuperWhisper is another good option.
B. Speccing with SpecFlow Templates
Next, that brainstorm is refined into a formal specification using templates from their open-source SpecFlow project. This is where you, the human, provide critical guardrails for the AI.
Instead of a vague prompt, you fill out a structured template that forces you to think about key product decisions upfront [08:10]:
Who is the app for? (e.g., technical writers)
What is the fidelity? (e.g., a prototype, not production-grade)
What is the form factor? (e.g., a command-line tool)
What are the key features? (e.g., read markdown, save to a build directory)
Any technology choices? (e.g., specify a library, or let the AI choose)
This step is a perfect example of human-in-the-loop control. You stay in charge of the key decisions, which prevents the AI from making incorrect assumptions and going down the wrong path [08:14].
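To make the shape of such a spec concrete, here is a minimal sketch that captures the five questions above as structured data and renders them to Markdown. The class name, field names, and output layout are my own assumptions for illustration; this is not the actual SpecFlow template.

```python
from dataclasses import dataclass, field


@dataclass
class ProductSpec:
    """Hypothetical container for the five spec questions (not the SpecFlow format)."""

    audience: str
    fidelity: str
    form_factor: str
    key_features: list[str] = field(default_factory=list)
    technology: str = "let the AI choose"

    def to_markdown(self) -> str:
        """Render the spec as a Markdown document an agent can read."""
        lines = [
            "# Spec",
            f"- Audience: {self.audience}",
            f"- Fidelity: {self.fidelity}",
            f"- Form factor: {self.form_factor}",
            f"- Technology: {self.technology}",
            "",
            "## Key features",
        ]
        lines += [f"- {feature}" for feature in self.key_features]
        return "\n".join(lines) + "\n"


# Values taken from the examples in the list above.
spec = ProductSpec(
    audience="technical writers",
    fidelity="prototype, not production-grade",
    form_factor="command-line tool",
    key_features=["read markdown", "save to a build directory"],
)
print(spec.to_markdown())
```

Leaving `technology` at its default mirrors the "let the AI choose" option, while every other field has to be answered by a human before the agent starts.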
C. Generating a Focused Task List
With a solid spec in hand, the next step is to ask the AI to break it down into an actionable tasks.md file. The prompt here is also specific, asking the agent to identify “just the right next chunk of work” that can be completed in a single session [14:26].
This avoids the “boil the ocean” problem and keeps the development process incremental and focused.
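In the talk the agent itself picks the next chunk. As a rough illustration of the same idea, the sketch below assumes that tasks.md uses GitHub-style checkboxes (`- [ ]` and `- [x]`), which the session did not specify, and pulls out only the first few open tasks. The function name and the example task list are hypothetical.

```python
import re

# Assumed line format: "- [ ] task" (open) or "- [x] task" (done).
TASK_LINE = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def next_chunk(tasks_markdown: str, size: int = 3) -> list[str]:
    """Return up to `size` unchecked tasks, in document order."""
    open_tasks = []
    for line in tasks_markdown.splitlines():
        match = TASK_LINE.match(line)
        if match and match.group(1) == " ":
            open_tasks.append(match.group(2).strip())
    return open_tasks[:size]


# Hypothetical task list for the static blog generator.
example = """\
# Tasks
- [x] Create project structure
- [ ] Read Markdown files from the posts directory
- [ ] Convert each file to HTML
- [ ] Write output to the build directory
- [ ] Add an index page
"""
print(next_chunk(example, size=2))
```

Capping the chunk size is what keeps a session small enough to finish and review in one sitting.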
Step 2: The Implementation - Executing the Plan
Now, it’s time to code. The generated task list is fed directly to the AI agent (in this case, Anthropic’s Claude). The agent then works through the plan, creating the project structure, writing the Python script, and setting up sample files [17:25].
A key moment in the demo highlighted the power of synchronous, interactive agents. The agent initially tried to use pip3 directly, which conflicted with Jake’s virtualized Python setup. He was able to intervene mid-stream with a simple instruction: “For this project, use uv” [18:49].
This blend of planned work and ad-hoc iteration is natural to human developers and essential for efficient AI collaboration.
In a few minutes, the core of the static site generator was complete and working.
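For a sense of scale, here is a minimal sketch of what such a generator can look like. This is not the code the agent wrote in the demo: the function names are my own, and `render_markdown` is a deliberately tiny stand-in that only handles `#` headings and plain paragraphs separated by blank lines, so the example runs with the standard library alone. A real project would swap in a proper Markdown library.

```python
import html
import tempfile
from pathlib import Path


def render_markdown(text: str) -> str:
    """Tiny stand-in renderer: '#' headings and plain paragraphs only."""
    blocks = []
    for block in text.strip().split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # Count leading '#' characters to get the heading level (max 6).
            level = min(len(block) - len(block.lstrip("#")), 6)
            title = html.escape(block.lstrip("#").strip())
            blocks.append(f"<h{level}>{title}</h{level}>")
        else:
            body = html.escape(" ".join(block.splitlines()))
            blocks.append(f"<p>{body}</p>")
    return "\n".join(blocks)


def build_site(source_dir: Path, build_dir: Path) -> list[Path]:
    """Convert every .md file in source_dir to an .html file in build_dir."""
    build_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for md_file in sorted(source_dir.glob("*.md")):
        page = render_markdown(md_file.read_text(encoding="utf-8"))
        out_file = build_dir / f"{md_file.stem}.html"
        out_file.write_text(page, encoding="utf-8")
        written.append(out_file)
    return written


# Usage: build one sample post inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    posts = Path(tmp) / "posts"
    posts.mkdir()
    (posts / "hello.md").write_text("# Hello\n\nFirst post.", encoding="utf-8")
    pages = build_site(posts, Path(tmp) / "build")
    first_page = pages[0].read_text(encoding="utf-8")

print(first_page)
```

Keeping the renderer behind its own function means the choice of Markdown library stays a single, swappable decision.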
Screenshot at [20:10]:
Step 3: The Reflection - The “Archaeology” of Your Code
This is where the SpecStory workflow truly shines. The implementation is done, but the work isn’t over. Now it’s time to capture the intent.
Saving the Conversation with specstory sync
First, Jake ran a simple command: specstory sync [21:50]. This tool connects to the local logs of your AI coding agent (Claude, Cursor, etc.) and saves the entire, unabridged conversation history into a clean Markdown file in your project directory.
This transcript is the raw material from your development session.
Extracting Decisions
Reading a raw, 500-line chat history isn’t practical. So, the next step is to perform an “extraction” on it. Using another SpecFlow template called a Decision Spec, Jake prompted the AI to analyze the chat history and pull out all the significant decisions that were made [26:59].
The prompt was simple: “review that [transcript] and then based on decision spec, let’s extract decisions that were made and write them to decisions.markdown” [27:42].
The result is a clean, scannable decisions.md file that summarizes the “why” of the session. It captured:
The planned decisions from the spec
The implicit decisions the AI made
The ad-hoc decisions the human made
It helps create a perfect summary for a pull request, a guide for a new team member, or a refresher for your future self. It shifts the focus of code review from the line-by-line what to the high-level why.
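To show how the three kinds of decision can be kept apart, here is a small sketch that groups decisions by origin and renders them as Markdown. The data model and output layout are assumptions of mine, not the Decision Spec template from SpecFlow, and in the demo the extraction itself was done by the AI rather than by code like this.

```python
from dataclasses import dataclass

# Labels for the three kinds of decision described above.
SOURCES = ("planned", "implicit", "ad-hoc")


@dataclass(frozen=True)
class Decision:
    """Hypothetical record for one extracted decision."""

    summary: str
    source: str  # one of SOURCES
    rationale: str


def render_decisions(decisions: list[Decision]) -> str:
    """Group decisions by source and render them as Markdown."""
    lines = ["# Decisions"]
    for source in SOURCES:
        group = [d for d in decisions if d.source == source]
        if not group:
            continue  # skip empty sections
        lines += ["", f"## {source.capitalize()}"]
        lines += [f"- {d.summary}: {d.rationale}" for d in group]
    return "\n".join(lines) + "\n"


# Two decisions from the demo, paraphrased.
log = [
    Decision("Build a command-line tool", "planned", "set in the spec"),
    Decision(
        "Use uv instead of pip3",
        "ad-hoc",
        "pip3 conflicted with the local Python setup",
    ),
]
decisions_md = render_decisions(log)
print(decisions_md)
```

Grouping by origin makes it easy for a reviewer to jump straight to the implicit decisions, which are the ones no human explicitly approved.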
Putting the Decisions to Work
This decisions.md file isn’t just a static report; it’s a springboard for further work. Jake demonstrated this by taking an AI-made decision, the choice of the markdown library, and asking a different AI (ChatGPT) to analyze it [37:37].
This “second opinion” provided a nuanced assessment, explaining the pros and cons and suggesting alternatives like CommonMark. It’s a fantastic way to validate the agent’s choices and uncover your own knowledge gaps.
One attendee even suggested prompting an AI to analyze your chat history to reveal your skill level and identify areas for self-improvement [41:11]. It’s clear we’re just scratching the surface of what’s possible with these agent logs.
Key Takeaways
Intent is as Important as Code: The code is just one artifact. The conversation and decisions that produce it are equally valuable for long-term maintenance and collaboration.
Structure Your Prompts: Don’t “vibe code.” Use structured templates like those in SpecFlow to guide the AI, define constraints, and prevent it from making poor assumptions.
Plan, Implement, Reflect: Adopt this simple loop for your projects. Plan the work in Markdown, use the AI to implement it, and then use the AI again to reflect on the chat history and extract key decisions.
Intent Review: The decisions.md file can be a centerpiece of a pull request. It allows for a much higher-level and more valuable discussion about the direction of the project.
The tools and templates that Jake and Greg demonstrated are free and open-source. I highly encourage you to check out the SpecFlow GitHub repository and start integrating this workflow into your own projects. And definitely keep an eye on SpecStory—they’re building some incredibly cool things to make this entire process even more seamless.












