## Why Do Most AI Agent Demos Fail in Production?
Most "agent" demos end right where real work begins: the messy part where the first attempt fails, tools time out, and the user says, "okay, but do it again—correctly."
Common production failure modes:
| Demo Behavior | Production Reality |
|---|---|
| Perfect first attempt | Tools time out, retries needed |
| Simple single task | Multi-step with dependencies |
| Controlled environment | Network failures, auth expiry |
| No user interruption | "Stop, wrong cluster!" |
Skyflo's Engine exists for that reality. It's not a 14-node science project. It's a compact loop designed to do three things well:
- Plan with enough grounding to be safe
- Execute tools with policy enforcement
- Verify outcomes and decide what happens next
## What Is Skyflo's LangGraph Workflow Architecture?
Skyflo's workflow is a LangGraph graph with four nodes that map cleanly to mental models:
```text
entry → model → gate → final
          ↑       │
          └───────┘  (loop for multi-turn)
```

| Node | Purpose | Key Responsibility |
|---|---|---|
| `entry` | Setup | Prepare context, init streaming |
| `model` | Think | LLM reasoning, tool proposals |
| `gate` | Act | Policy enforcement, tool execution |
| `final` | Summarize | End-state summary for operators |
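To make the shape concrete, here's a minimal sketch of how a four-node graph like this could be wired in LangGraph. The node bodies, state fields, and routing conditions are illustrative assumptions, not Skyflo's actual source:

```python
# Illustrative wiring of a four-node LangGraph workflow. Node bodies are
# stubbed; names and state fields are assumptions, not Skyflo's source.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class RunState(TypedDict):
    messages: list        # conversation history
    pending_tools: list   # tool calls proposed by the model
    done: bool            # set when no further turns are needed

def entry(state: RunState) -> dict:
    return {}  # prepare context, run ID, streaming (stubbed)

def model(state: RunState) -> dict:
    return {}  # LLM call; may populate pending_tools (stubbed)

def gate(state: RunState) -> dict:
    return {}  # policy check + tool execution (stubbed)

def final(state: RunState) -> dict:
    return {}  # end-state summary (stubbed)

builder = StateGraph(RunState)
for name, fn in [("entry", entry), ("model", model), ("gate", gate), ("final", final)]:
    builder.add_node(name, fn)

builder.add_edge(START, "entry")
builder.add_edge("entry", "model")
# After the model speaks: run tools, or wrap up.
builder.add_conditional_edges("model", lambda s: "gate" if s["pending_tools"] else "final")
# After tools run: loop back for another turn, or wrap up.
builder.add_conditional_edges("gate", lambda s: "final" if s["done"] else "model")
builder.add_edge("final", END)

workflow = builder.compile()
```

Keeping the routing in two one-line conditions is the point: every transition an operator might ask about is visible in one place.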
### Why only four nodes?
We've tried the "many agents with many roles" approach. It reads nicely. In practice it tends to bring:
| Problem | Impact |
|---|---|
| Increased latency | Each agent = network + LLM call |
| Duplicated context | Same info passed between agents |
| Complex debugging | "Which agent said what?" |
Operators don't want elegant internals. They want a single timeline they can trust.
## What Happens in the Entry Phase?
The entry phase does the boring setup that makes everything else predictable:
| Task | Purpose |
|---|---|
| Prepare message list | System prompt + history + new user prompt |
| Generate run ID | Unique identifier for this execution |
| Initialize streaming | UI can start rendering immediately |
| Set stop flags | Ready to honor cancellation requests |
| Apply rate limiting | Prevent runaway executions |
Key principle: If your agent doesn't reliably stop mid-stream, you're building a demo, not a tool.
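In code, that setup can stay small. Here's a hypothetical sketch; the prompt, the `stop_flags` dict, and the return shape are all assumptions used for illustration:

```python
# Hypothetical entry-phase setup; names and structures are illustrative.
import uuid

SYSTEM_PROMPT = "You are a Kubernetes operations agent."  # assumed prompt
stop_flags: dict[str, bool] = {}  # run_id -> "user asked us to stop"

def entry(history: list[dict], user_prompt: str) -> dict:
    run_id = str(uuid.uuid4())  # unique identifier for this execution
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,                                  # prior turns
        {"role": "user", "content": user_prompt},  # the new request
    ]
    stop_flags[run_id] = False  # ready to honor cancellation from turn one
    # Streaming init and rate limiting would also happen here.
    return {"run_id": run_id, "messages": messages}
```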
## How Does the Model Phase Work?
In the model phase, Skyflo calls an LLM provider (via LiteLLM). The model can:
| Action | Description |
|---|---|
| Interpret intent | Understand what the user wants |
| Propose a plan | Sequence of steps to achieve goal |
| Select tools | Decide which tool calls are needed |
| Ask questions | Request clarification when ambiguous |
Critical distinction: The model does not execute tools directly. It outputs structured tool call requests:

```json
{
  "tool_calls": [
    {
      "name": "kubectl_get_pods",
      "arguments": {"namespace": "production"}
    }
  ]
}
```

This keeps "thinking" separate from "acting", which is a big safety win.
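In practice, the model phase is one provider-agnostic call. Here's a sketch using LiteLLM's OpenAI-style interface; the model name, prompt, and tool schema are placeholders, not Skyflo's configuration:

```python
import json
import litellm

# Placeholder tool schema in OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "kubectl_get_pods",
        "description": "List pods in a namespace",
        "parameters": {
            "type": "object",
            "properties": {"namespace": {"type": "string"}},
            "required": ["namespace"],
        },
    },
}]

response = litellm.completion(
    model="gpt-4o",  # any LiteLLM-supported provider
    messages=[{"role": "user", "content": "What's running in production?"}],
    tools=tools,
)

# The model only *requests* tool calls; nothing executes at this point.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```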
## What Is the Gate Phase and Why Is It Critical?
The gate phase is where tool calls actually execute via the MCP server. This is where policy lives.
Two rules govern the gate:
| Tool Type | Behavior |
|---|---|
| Read-only (`readOnlyHint: true`) | Execute immediately, no approval |
| Write operations | Halt, require explicit user approval |

What happens at the gate:
| Step | Action |
|---|---|
| 1 | Receive tool call request from model |
| 2 | Check tool metadata for `readOnlyHint` |
| 3 | If read-only: execute via MCP server |
| 4 | If write: pause, request approval |
| 5 | On approval: execute, stream results |
| 6 | Return results to model for next turn |
This is where you want validation, logging, and consistent error handling, not scattered across prompts, UI, and client code.
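A stripped-down version of that logic might look like the sketch below; the registry shape and the `execute_via_mcp` helper are hypothetical stand-ins for the real MCP plumbing:

```python
def execute_via_mcp(tool_call: dict) -> dict:
    return {"status": "ok"}  # would forward to the MCP server; stubbed here

def gate(tool_call: dict, registry: dict, approved: bool = False) -> dict:
    meta = registry.get(tool_call["name"])
    if meta is None:
        # A hallucinated or unknown tool never reaches execution.
        return {"status": "rejected", "reason": "unknown tool"}

    if meta.get("readOnlyHint"):
        return execute_via_mcp(tool_call)  # safe: run immediately

    if not approved:
        # Write operation: pause and surface a tools.pending event.
        return {"status": "pending_approval", "tool": tool_call["name"]}

    return execute_via_mcp(tool_call)  # approved write: execute, stream results
```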
## What Does the Final Phase Produce?
Streaming is great, but it's also noisy. Operators need an end-state summary:
| Summary Element | Purpose |
|---|---|
| What happened | Sequence of operations performed |
| What changed | Resources modified (or not) |
| What to check next | Verification steps, related resources |
| Confidence level | Certainty about outcomes |
Good summaries are:
- Short and concrete
- Specific to what was attempted
- Honest about uncertainty
Bad summaries:
- Repeat the entire stream
- Pretend certainty where there isn't any
- Generic without context
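For illustration only, a good summary for a hypothetical deployment restart might read:

```text
Restarted deployment payments-api in namespace production; 3/3 replicas
ready. No other resources were modified. Next: watch error rates for a
few minutes and check recent events in the namespace. Confidence: high
that the rollout completed; downstream impact not yet verified.
```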
## How Does Auto-Continue Work Without Infinite Loops?
Skyflo is designed to keep moving until a task is done. But "auto-continue" can become "infinite loop" if you aren't careful.
Skyflo's conservative approach:
| Control | How It Works |
|---|---|
| Model decides | LLM determines if another turn needed |
| Max turns limit | Hard cap on iterations (configurable) |
| Stop signals | Honored immediately at any phase |
| Approval gates | Natural pause points for write operations |
This is the difference between a chat assistant and a system you can use during a real incident.
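Put together, the control loop stays small. This sketch combines all four guards; the phase helpers are trivial stubs so the control flow is the focus, and every name here is assumed rather than taken from Skyflo's source:

```python
MAX_TURNS = 20  # hard cap on iterations; configurable in practice
stop_flags: dict[str, bool] = {}  # set to True when a client requests a stop

def model_phase(state: dict) -> dict:
    return {**state, "pending_tools": [], "done": True}  # stub

def gate_phase(state: dict) -> dict:
    return state  # stub; may pause for approval in the real thing

def finalize(state: dict, reason: str) -> dict:
    return {**state, "reason": reason}  # stub

def run(state: dict) -> dict:
    for _ in range(MAX_TURNS):                   # hard cap: no infinite loops
        if stop_flags.get(state["run_id"]):      # stop signals win, always
            return finalize(state, reason="stopped by user")
        state = model_phase(state)               # model decides if more work remains
        if state["pending_tools"]:
            state = gate_phase(state)            # natural pause point for writes
        elif state.get("done"):
            return finalize(state, reason="complete")
    return finalize(state, reason="max turns reached")
```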
## What Events Does Skyflo Stream?
Skyflo streams events over SSE from two endpoints:
```text
POST /api/v1/agent/chat              # Main interaction
POST /api/v1/agent/approvals/{id}    # After approval granted
```

Event types:
| Event | Purpose | Example |
|---|---|---|
| `token` | Incremental text | LLM output character-by-character |
| `tool.executing` | Tool start | "Running kubectl get pods..." |
| `tool.result` | Tool complete | Pod list JSON |
| `tool.error` | Tool failed | Error message, stack trace |
| `tools.pending` | Awaiting approval | What needs to be approved |
| `workflow_complete` | Done | Final summary |
| `workflow_error` | Failed | Why workflow stopped |

Key insight: If you only stream text, the user is watching the narration, not the work. Tool events are the work.
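Consuming the stream is straightforward with any SSE-capable client. This sketch uses httpx; the host, request payload, and data field names are assumptions, while the event names come from the table above:

```python
import json
import httpx

with httpx.stream(
    "POST",
    "http://localhost:8080/api/v1/agent/chat",  # host and port assumed
    json={"message": "list pods in production"},
    timeout=None,  # long-lived stream
) as resp:
    event = None
    for line in resp.iter_lines():
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:") and event:
            data = json.loads(line.split(":", 1)[1])
            if event == "token":
                print(data.get("text", ""), end="", flush=True)  # the narration
            elif event.startswith("tool"):
                print(f"\n[{event}] {data}")  # the actual work
            elif event in ("workflow_complete", "workflow_error"):
                print(f"\n[{event}] {data}")
                break
```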
## Why Does Redis Handle Internal Event Coordination?
SSE streams are client-facing. Internally, Skyflo uses Redis pub/sub channels keyed by run ID:
| Feature | How Redis Helps |
|---|---|
| Stop signals | Any client can interrupt the run |
| Event consistency | All subscribers see same sequence |
| Multi-client support | Web UI, Slack, CLI all work |
| Decoupled state | Workflow continues if client disconnects |
Redis isn't glamorous, but it's practical for real-time coordination without baking state into HTTP connections.
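The mechanics are ordinary pub/sub. In this sketch, channels are keyed by run ID; the channel naming scheme and event shapes are assumptions, not Skyflo's internals:

```python
import json
import redis

r = redis.Redis()

def publish_event(run_id: str, event: dict) -> None:
    # Every subscriber (web UI, Slack, CLI) sees the same sequence.
    r.publish(f"run:{run_id}:events", json.dumps(event))

def request_stop(run_id: str) -> None:
    # Any client can interrupt the run, not just the one that started it.
    r.publish(f"run:{run_id}:control", json.dumps({"type": "stop"}))

# A subscriber rejoining an in-flight run:
sub = r.pubsub()
sub.subscribe("run:abc123:events", "run:abc123:control")
for msg in sub.listen():
    if msg["type"] == "message":
        print(msg["channel"], json.loads(msg["data"]))
```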
## When Should You Enable Postgres Checkpointing?
LangGraph supports Postgres-backed checkpointing. In Skyflo, it's optional.
| Environment | Checkpointing | Reason |
|---|---|---|
| Local dev | Optional | Simpler setup, faster iteration |
| Production | Recommended | Resilience, resumability |
| Long workflows | Required | Survives interruptions |
Checkpointing matters when:
| Scenario | Without Checkpointing | With Checkpointing |
|---|---|---|
| Pod restart | Workflow lost | Resume from last state |
| Network blip | Start over | Continue after reconnect |
| Page refresh | Lose context | Rejoin existing run |
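Enabling it is a compile-time change. This sketch uses LangGraph's `PostgresSaver` from the `langgraph-checkpoint-postgres` package, reusing the `builder` from the earlier graph sketch; the connection string and thread ID are placeholders:

```python
from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:pass@localhost:5432/skyflo"  # placeholder

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # create checkpoint tables on first use
    workflow = builder.compile(checkpointer=checkpointer)

    # The thread_id keys the checkpoint: rejoin the same run after a pod
    # restart, a network blip, or a page refresh.
    config = {"configurable": {"thread_id": "run-abc123"}}
    workflow.invoke({"messages": [], "pending_tools": [], "done": False}, config)
```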
## Why Must Safety Live Outside the Model?
If you remember one thing from this article:
You can't prompt your way into safety.
Prompting helps. Policy enforcement wins.
| Scenario | Prompting Only | Policy Enforcement |
|---|---|---|
| Model hallucinates | Might execute dangerous command | Gate rejects invalid tool |
| User is ambiguous | Model guesses wrong | Approval forces clarification |
| Environment changed | Stale context causes errors | Validation catches mismatch |
By enforcing approvals and tool execution rules in the Engine gate, Skyflo remains safe even when:
- The model is wrong
- The user is ambiguous
- The environment behaves unexpectedly
That's the boring kind of robustness that makes software feel "enterprise-grade" even when it's open source.
Related articles:
- MCP in Practice: Standardizing DevOps Tools
- SSE Done Right: Streaming Tokens + Tool Events
- Why Human-in-the-Loop Is Non-Negotiable for AI in Production Ops
## FAQ: LangGraph Workflows for AI Agents
**What is LangGraph?** LangGraph is a library for building stateful, multi-step AI agent workflows as graphs. Nodes represent phases (like planning or execution), and edges define transitions between phases.

**Why use a graph-based workflow instead of a simple loop?** Graphs make state transitions explicit, enable conditional branching (like approval gates), and are easier to debug because you can trace exactly which node produced which output.

**What is the "gate" phase in an AI agent workflow?** The gate is where policy enforcement happens. It checks whether tool calls require approval, validates parameters, executes via the MCP server, and handles errors, all before results return to the model.

**How do you prevent infinite loops in auto-continuing agents?** Combine multiple controls: let the model decide if more turns are needed, enforce hard limits on iterations, and honor stop signals immediately. Approval gates also create natural pause points.

**Why should tool execution be separate from LLM reasoning?** Separating thinking (the model proposes tool calls) from acting (the gate executes them) enables policy enforcement between intent and action. The model can't execute anything directly; it can only request.

**Is Postgres checkpointing required for Skyflo?** No, it's optional. Use it in production for resilience and resumability. Skip it in local development for simplicity. The choice depends on your reliability requirements.