The Ultimate Agentic Engineering Workflow: Building Features with Multi-Agent Systems
YouTube
This video presents an advanced look into a modern agentic engineering workflow, showcasing how to build complex features using autonomous AI agents. The creator details their current toolkit, which centers on GPT-5.5 in its 'extra high fast' configuration as the primary model, Cursor as the integrated development environment, Greptile for automated code reviews, and Wispr Flow for voice-to-text prompting. The demonstration focuses on building an artifacts feature for the Pluto app, an enterprise-focused AI agent platform. This feature allows the AI to generate and preview interactive web components such as HTML, React, and SVG files directly within the chat interface.
The core of the process highlights the symbiotic relationship between human developers and AI sub-agents. The workflow involves delegating research and coding tasks to autonomous entities, followed by a rigorous, automated verification process using stacked pull requests. A key innovation discussed is the use of a recursive feedback loop where Greptile identifies bugs or security vulnerabilities, and Cursor automatically addresses these issues through multiple iterations until a perfect score is achieved. This approach significantly reduces manual debugging time while ensuring code quality through consistent architectural patterns and modular service layers.
This video provides a deep dive into high-level agentic engineering, demonstrating how to build complex interactive features by orchestrating multiple AI agents and specialized development tools. Viewers will learn a specific workflow using GPT-5.5, Cursor, Greptile, and Wispr Flow to create an interactive artifacts preview system within a live application. The content focuses on moving beyond simple text-based AI assistance toward a fully autonomous cycle of research, generation, automated review, and recursive debugging.
Key Takeaways
Primary Stack: Effective agentic engineering currently benefits from high-speed models like GPT-5.5 integrated into Cursor, combined with specialized review tools like Greptile.
The Feedback Loop: Recursive cycles where one AI agent generates code and another critiques it (using tools like Greploop) are essential for achieving production-ready code quality.
Small, Stacked PRs: Breaking large feature updates into multiple small, manageable pull requests (under 1,000 lines each) reduces the chance that AI reviewers hallucinate or overlook critical errors.
Sub-Agent Delegation: Complex tasks should be offloaded to specialized sub-agents to keep the main chat thread unblocked and maintain clear focus.
Timestamps
00:00
The Evolved Agentic Workflow - Introduction to the latest changes in the creator's agentic development process.
00:42
My Tech Stack - Breakdown of the primary tools: GPT-5.5, Cursor, Greptile, and Wispr Flow.
01:50
Project Overview: Pluto Artifacts - Defining the new feature being built: an interactive artifacts system like Claude's.
04:50
The Greploop Mechanism - Explanation of the automated recursive code review and feedback loop.
09:05
Service Layer Architecture - How to structure code so that AI agents can read and write it more effectively.
12:12
The 5-PR Rollout Strategy - Why breaking features into small, manageable chunks is critical for AI agents.
16:00
Debugging the Artifacts Feature - Iterating through UI issues and dark mode themes using automated feedback.
21:10
Autonomous Review Cycle - Demonstrating agents fixing security bugs and logic defects without human intervention.
Target Audience
Software engineers, AI researchers, and tech entrepreneurs interested in leveraging autonomous AI agents to accelerate software development and improve code quality.
Use Cases
- Implementing automated code review pipelines using AI sub-agents.
- Designing complex interactive features for web applications using generative models.
- Setting up a recursive AI debugging loop to handle edge cases and security flaws.
- Architecting modular service layers to make AI-driven code modification easier.
- Scaling development capacity by delegating research tasks to specialized AI agents.
Modular Architecture: Using a service layer architecture makes the codebase more predictable and readable for AI agents, leading to better generation results.
The Agentic Engineering Stack
Modern agentic engineering requires a set of tools that work well together. The video highlights Cursor as the primary IDE, chosen for how deeply it integrates LLMs with the project's file structure. Many developers use CLI-based tools. The creator, however, finds that a GUI that manages context effectively offers a smoother experience for complex feature builds. As the underlying model, the creator uses GPT-5.5 in its 'extra high fast' configuration, noting that intelligence and speed are the most critical factors for real-time collaborative coding.
A significant addition to this stack is Wispr Flow, which facilitates voice-to-text prompting. Because developers can speak faster than they can type, voice-driven instructions allow for more verbose and context-rich prompts, which typically results in more accurate initial code generation. This efficiency at the entry point of the workflow accelerates the entire development cycle.
Automated Code Review with Greploop
One of the most innovative concepts discussed is the integration of Greptile for code review. Instead of a human manually checking every line, the Greptile agent scans PRs for security flaws, logical defects, and edge cases. It provides a confidence score out of five; any score below five triggers a refinement phase.
The creator built 'Greploop', a custom skill that automates the back-and-forth between the reviewer and the generator, as a step toward truly autonomous development. In this loop, the generator agent reads Greptile's feedback, makes the necessary adjustments, pushes a new commit, and requests another review. The cycle repeats automatically until Greptile returns a perfect five-out-of-five score. As a result, human intervention is needed only at a strategic level rather than for granular bug fixes.
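The loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the creator's actual Greploop skill:
- `request_review` and `apply_fixes` are hypothetical callables. The first stands in for "ask Greptile to review the latest commit". The second stands in for "let the generator agent address the comments, commit, and push".
- The `Review` shape (a 1 to 5 score plus comment strings) is an assumption, not Greptile's real response format.
- The `max_rounds` cap is an addition of this sketch. It stops the loop from running forever if the reviewer never converges.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    """Hypothetical review result: a 1-5 confidence score plus comments."""
    score: int
    comments: List[str] = field(default_factory=list)


def greploop(
    request_review: Callable[[], Review],
    apply_fixes: Callable[[List[str]], None],
    target: int = 5,
    max_rounds: int = 5,
) -> Tuple[Review, int]:
    """Alternate review and fix passes until the score reaches `target`
    or `max_rounds` fix passes have run. Returns the last review and
    the number of fix passes made."""
    review = request_review()
    rounds = 0
    while review.score < target and rounds < max_rounds:
        apply_fixes(review.comments)  # generator agent edits, commits, pushes
        rounds += 1
        review = request_review()     # reviewer re-checks the new commit
    return review, rounds


# Usage with stubbed agents: the reviewer scores 3, then 4, then 5.
scores = iter([3, 4, 5])


def fake_review() -> Review:
    s = next(scores)
    return Review(score=s, comments=[] if s == 5 else [f"issue found at score {s}"])


applied: List[str] = []
final, rounds = greploop(fake_review, applied.extend)
print(final.score, rounds)  # 5 2
```

In the stubbed run, the loop makes two fix passes (after the scores of 3 and 4) and stops once the reviewer returns 5.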
Architectural Considerations for AI
To make a codebase 'agent-friendly,' developers must prioritize clear structure and modularity. The video introduces the concept of a Service Layer Architecture, which separates orchestration logic from reusable operational mechanics. This structure prevents 'God files' (massive, multi-functional files) that tend to confuse LLMs due to context window limitations and high complexity.
When each function has explicit parameters and returns structured outputs, agents can easily find where to insert new logic or update existing features. The video also recommends breaking large features into 'stacked PRs', where each PR builds on the previous one in small, verified increments. This is presented as the most reliable way to ship large-scale updates without breaking existing functionality.
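Here is a minimal Python sketch of the pattern applied to the artifacts feature, to make the separation concrete. Every name in it is hypothetical and not taken from the Pluto codebase, including `validate_artifact`, `preview_title`, `create_artifact_preview`, and the result dataclasses. In a real project, the operations and orchestration layers would normally live in separate modules. The point is that each function takes explicit parameters and returns a structured result, so an agent can see exactly where a new check or step belongs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# ---- Operations layer: small, reusable mechanics ----

ALLOWED_KINDS = {"html", "react", "svg"}  # artifact kinds mentioned in the video


@dataclass(frozen=True)
class ValidationResult:
    ok: bool
    errors: Tuple[str, ...] = ()


def validate_artifact(kind: str, source: str) -> ValidationResult:
    """Check one artifact; returns every problem found, not just the first."""
    errors = []
    if kind not in ALLOWED_KINDS:
        errors.append(f"unsupported kind: {kind}")
    if not source.strip():
        errors.append("empty source")
    return ValidationResult(ok=not errors, errors=tuple(errors))


def preview_title(kind: str, name: str) -> str:
    """Format the label shown above the preview pane."""
    return f"{name} ({kind.upper()})"


# ---- Orchestration layer: sequences operations, no low-level logic ----

@dataclass(frozen=True)
class PreviewResult:
    title: Optional[str]
    errors: Tuple[str, ...]


def create_artifact_preview(kind: str, name: str, source: str) -> PreviewResult:
    check = validate_artifact(kind, source)
    if not check.ok:
        return PreviewResult(title=None, errors=check.errors)
    return PreviewResult(title=preview_title(kind, name), errors=())


print(create_artifact_preview("svg", "Logo", "<svg></svg>"))
# PreviewResult(title='Logo (SVG)', errors=())
print(create_artifact_preview("pdf", "Report", "  "))
# PreviewResult(title=None, errors=('unsupported kind: pdf', 'empty source'))
```

Adding a new rule, such as a size limit, would touch only `validate_artifact`. The orchestration function stays unchanged.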
Practical Applications
Developers can apply these lessons by integrating automated review bots into their GitHub or GitLab pipelines to provide immediate feedback on every commit. Additionally, software architects should consider refactoring legacy code into a service layer pattern to prepare their systems for future AI-driven maintenance. Implementing a sub-agent strategy for time-consuming tasks like research or unit testing can also significantly boost individual developer productivity, allowing them to act more as 'Product Engineers' who manage the vision while the agents handle the implementation details.
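At its core, a sub-agent strategy means fanning work out and collecting the results later while the main thread keeps going. The sketch below assumes each sub-agent can be wrapped as an ordinary Python callable. `research_subagent` is a hypothetical placeholder that returns a canned string instead of calling a model, and the research topics are illustrative only. The sketch uses the standard-library `ThreadPoolExecutor`, which suits I/O-bound work such as waiting on API responses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def research_subagent(topic: str) -> str:
    """Hypothetical stand-in for a sub-agent; a real one would call a model."""
    return f"summary of {topic}"


def delegate_research(
    topics: List[str],
    agent: Callable[[str], str],
    main_work: Callable[[], str],
    max_workers: int = 4,
) -> Tuple[str, Dict[str, str]]:
    """Hand each topic to a sub-agent, keep the main thread busy with
    other work, then gather the sub-agents' findings."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {topic: pool.submit(agent, topic) for topic in topics}
        main_result = main_work()  # main thread continues without waiting on sub-agents
        findings = {topic: fut.result() for topic, fut in futures.items()}
    return main_result, findings


main_result, findings = delegate_research(
    ["existing artifact viewers", "SVG rendering options"],
    research_subagent,
    main_work=lambda: "drafted artifact panel layout",
)
print(main_result)
print(findings)
```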
Frequently Asked Questions
Why use GPT-5.5 instead of Claude Opus for coding?
Claude Opus is highly capable. Still, the creator finds that the 'extra high fast' configuration of GPT-5.5 offers a better balance of reasoning ability and response speed. In a collaborative workflow, the agent iterates through dozens of files and review rounds, so lower latency noticeably improves the developer's experience and overall throughput.
What are the main challenges with large Pull Requests for AI agents?
Large PRs (more than 2,000 lines) often go beyond what current code review agents can reason about reliably. As a result, the agents overlook subtle bugs or make contradictory suggestions. Keeping PRs under 1,000 lines, and ideally much smaller, helps the review agent stay precise and give actionable feedback.
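One way to keep each review small is to plan the stack up front. This Python sketch greedily packs an ordered list of `(file, changed_lines)` units into PRs of at most 1,000 lines. Each PR is based on the branch of the one before it. The greedy packing, the `stack/pr-N` branch names, and the file paths are all assumptions made for illustration. In practice, PR boundaries should follow logical units of work (for example, service layer first, then API, then UI) rather than line counts alone. Note also that a single unit larger than the budget still lands in its own oversized PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StackedPR:
    index: int
    base: str  # branch this PR targets (trunk or the previous PR's branch)
    files: List[str] = field(default_factory=list)
    lines: int = 0


def plan_stack(
    changes: List[Tuple[str, int]], budget: int = 1000, trunk: str = "main"
) -> List[StackedPR]:
    """Greedily pack ordered (file, changed_lines) units into PRs of at most
    `budget` lines. Each new PR is based on the previous PR's branch."""
    stack: List[StackedPR] = []
    current: Optional[StackedPR] = None
    for path, n in changes:
        if current is None or current.lines + n > budget:
            base = trunk if not stack else f"stack/pr-{len(stack)}"
            current = StackedPR(index=len(stack) + 1, base=base)
            stack.append(current)
        current.files.append(path)
        current.lines += n
    return stack


changes = [
    ("services/artifacts.py", 420),
    ("api/routes.py", 380),
    ("ui/ArtifactPanel.tsx", 650),
    ("ui/theme.css", 120),
    ("tests/test_artifacts.py", 500),
]
stack = plan_stack(changes)
for pr in stack:
    print(pr.index, pr.base, pr.lines, pr.files)
# 1 main 800 ['services/artifacts.py', 'api/routes.py']
# 2 stack/pr-1 770 ['ui/ArtifactPanel.tsx', 'ui/theme.css']
# 3 stack/pr-2 500 ['tests/test_artifacts.py']
```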
How does a service layer architecture improve AI generation?
Service layers compartmentalize logic into distinct, predictable modules. When an AI agent needs to modify a feature, it doesn't have to scan the entire application; it only needs to understand the specific service responsible for that task. This reduces noise in the context window and makes the agent's changes less likely to cause regressions in unrelated areas of the code.
25:30
Final Feature Demo - Testing the completed artifacts system with a complex research task.
Autonomous Agentic Engineering Workflows | AI-Powered Code Review and Feedback Loops | Building Multi-Agent Enterprise Applications | Modular Software Architecture for AI Generation | Optimizing AI Prompting and Deployment