## Why Do Most AI Agent Demos Fail in Production?
Most "agent" demos end right where real work begins: the messy part where the first attempt fails, tools time out, and the user says, "okay, but do it again—correctly."
Common production failure modes:
| Demo Behavior | Production Reality |
|---|---|
| Perfect first attempt | Tools time out, retries needed |
| Simple single task | Multi-step with dependencies |
| Controlled environment | Network failures, auth expiry |
| No user interruption | "Stop, wrong cluster!" |
Skyflo's Engine exists for that reality. It's not a 14-node science project. It's a compact loop designed to do three things well:
- Plan with enough grounding to be safe
- Execute tools with policy enforcement
- Verify outcomes and decide what happens next
## What Is Skyflo's LangGraph Workflow Architecture?
Skyflo's workflow is a LangGraph graph with four nodes that map cleanly to mental models:
```text
entry → model → gate → final
          ↑       │
          └───────┘  (loop for multi-turn)
```

| Node | Purpose | Key Responsibility |
|---|---|---|
| `entry` | Setup | Prepare context, init streaming |
| `model` | Think | LLM reasoning, tool proposals |
| `gate` | Act | Policy enforcement, tool execution |
| `final` | Summarize | End-state summary for operators |
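To make the shape concrete, here's a minimal sketch of how a four-node graph like this could be wired in LangGraph. The node bodies, state fields, and routing conditions are illustrative assumptions, not Skyflo's actual source:

```python
# Illustrative wiring of a four-node LangGraph workflow. Node bodies are
# stubbed; names and state fields are assumptions, not Skyflo's source.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class RunState(TypedDict):
    messages: list        # conversation history
    pending_tools: list   # tool calls proposed by the model
    done: bool            # set when no further turns are needed

def entry(state: RunState) -> dict:
    return {}  # prepare context, run ID, streaming (stubbed)

def model(state: RunState) -> dict:
    return {}  # LLM call; may populate pending_tools (stubbed)

def gate(state: RunState) -> dict:
    return {}  # policy check + tool execution (stubbed)

def final(state: RunState) -> dict:
    return {}  # end-state summary (stubbed)

builder = StateGraph(RunState)
for name, fn in [("entry", entry), ("model", model), ("gate", gate), ("final", final)]:
    builder.add_node(name, fn)

builder.add_edge(START, "entry")
builder.add_edge("entry", "model")
# After the model speaks: run tools, or wrap up.
builder.add_conditional_edges("model", lambda s: "gate" if s["pending_tools"] else "final")
# After tools run: loop back for another turn, or wrap up.
builder.add_conditional_edges("gate", lambda s: "final" if s["done"] else "model")
builder.add_edge("final", END)

workflow = builder.compile()
```

Keeping the routing in two one-line conditions is the point: every transition an operator might ask about is visible in one place.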
### Why only four nodes?
We've tried the "many agents with many roles" approach. It reads nicely. In practice it tends to bring:
| Problem | Impact |
|---|---|
| Increased latency | Each agent = network + LLM call |
| Duplicated context | Same info passed between agents |
| Complex debugging | "Which agent said what?" |
Operators don't want elegant internals. They want a single timeline they can trust.
## What Happens in the Entry Phase?
The entry phase does the boring setup that makes everything else predictable:
| Task | Purpose |
|---|---|
| Prepare message list | System prompt + history + new user prompt |
| Generate run ID | Unique identifier for this execution |
| Initialize streaming | UI can start rendering immediately |
| Set stop flags | Ready to honor cancellation requests |
| Apply rate limiting | Prevent runaway executions |
Key principle: If your agent doesn't reliably stop mid-stream, you're building a demo, not a tool.
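In code, that setup can stay small. Here's a hypothetical sketch; the prompt, the `stop_flags` dict, and the return shape are all assumptions used for illustration:

```python
# Hypothetical entry-phase setup; names and structures are illustrative.
import uuid

SYSTEM_PROMPT = "You are a Kubernetes operations agent."  # assumed prompt
stop_flags: dict[str, bool] = {}  # run_id -> "user asked us to stop"

def entry(history: list[dict], user_prompt: str) -> dict:
    run_id = str(uuid.uuid4())  # unique identifier for this execution
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,                                  # prior turns
        {"role": "user", "content": user_prompt},  # the new request
    ]
    stop_flags[run_id] = False  # ready to honor cancellation from turn one
    # Streaming init and rate limiting would also happen here.
    return {"run_id": run_id, "messages": messages}
```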
## How Does the Model Phase Work?
In the model phase, Skyflo calls an LLM provider (via LiteLLM). The model can:
| Action | Description |
|---|---|
| Interpret intent | Understand what the user wants |
| Propose a plan | Sequence of steps to achieve goal |
| Select tools | Decide which tool calls are needed |
| Ask questions | Request clarification when ambiguous |
Critical distinction: The model does not execute tools directly. It outputs structured tool call requests:

```json
{
  "tool_calls": [
    {
      "name": "kubectl_get_pods",
      "arguments": {"namespace": "production"}
    }
  ]
}
```

This keeps "thinking" separate from "acting", which is a big safety win.
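In practice, the model phase is one provider-agnostic call. Here's a sketch using LiteLLM's OpenAI-style interface; the model name, prompt, and tool schema are placeholders, not Skyflo's configuration:

```python
import json
import litellm

# Placeholder tool schema in OpenAI function-calling format.
tools = [{
    "type": "function",
    "function": {
        "name": "kubectl_get_pods",
        "description": "List pods in a namespace",
        "parameters": {
            "type": "object",
            "properties": {"namespace": {"type": "string"}},
            "required": ["namespace"],
        },
    },
}]

response = litellm.completion(
    model="gpt-4o",  # any LiteLLM-supported provider
    messages=[{"role": "user", "content": "What's running in production?"}],
    tools=tools,
)

# The model only *requests* tool calls; nothing executes at this point.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```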
## What Is the Gate Phase and Why Is It Critical?
The gate phase is where tool calls actually execute via the MCP server. This is where policy lives.
Two rules govern the gate:
| Tool Type | Behavior |
|---|---|
| Read-only (`readOnlyHint: true`) | Execute immediately, no approval |
| Write operations | Halt, require explicit user approval |

What happens at the gate:
| Step | Action |
|---|---|
| 1 | Receive tool call request from model |
| 2 | Check tool metadata for `readOnlyHint` |
| 3 | If read-only: execute via MCP server |
| 4 | If write: pause, request approval |
| 5 | On approval: execute, stream results |
| 6 | Return results to model for next turn |
This is where you want validation, logging, and consistent error handling, not scattered across prompts, UI, and client code.
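A stripped-down version of that logic might look like the sketch below; the registry shape and the `execute_via_mcp` helper are hypothetical stand-ins for the real MCP plumbing:

```python
def execute_via_mcp(tool_call: dict) -> dict:
    return {"status": "ok"}  # would forward to the MCP server; stubbed here

def gate(tool_call: dict, registry: dict, approved: bool = False) -> dict:
    meta = registry.get(tool_call["name"])
    if meta is None:
        # A hallucinated or unknown tool never reaches execution.
        return {"status": "rejected", "reason": "unknown tool"}

    if meta.get("readOnlyHint"):
        return execute_via_mcp(tool_call)  # safe: run immediately

    if not approved:
        # Write operation: pause and surface a tools.pending event.
        return {"status": "pending_approval", "tool": tool_call["name"]}

    return execute_via_mcp(tool_call)  # approved write: execute, stream results
```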
## What Does the Final Phase Produce?
Streaming is great, but it's also noisy. Operators need an end-state summary:
| Summary Element | Purpose |
|---|---|
| What happened | Sequence of operations performed |
| What changed | Resources modified (or not) |
| What to check next | Verification steps, related resources |
| Confidence level | Certainty about outcomes |
Good summaries are:
- Short and concrete
- Specific to what was attempted
- Honest about uncertainty
Bad summaries:
- Repeat the entire stream
- Pretend certainty where there isn't any
- Generic without context
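For illustration only, a good summary for a hypothetical deployment restart might read:

```text
Restarted deployment payments-api in namespace production; 3/3 replicas
ready. No other resources were modified. Next: watch error rates for a
few minutes and check recent events in the namespace. Confidence: high
that the rollout completed; downstream impact not yet verified.
```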
## How Does Auto-Continue Work Without Infinite Loops?
Skyflo is designed to keep moving until a task is done. But "auto-continue" can become "infinite loop" if you aren't careful.
Skyflo's conservative approach:
| Control | How It Works |
|---|---|
| Model decides | LLM determines if another turn needed |
| Max turns limit | Hard cap on iterations (configurable) |
| Stop signals | Honored immediately at any phase |
| Approval gates | Natural pause points for write operations |
This is the difference between a chat assistant and a system you can use during a real incident.
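Put together, the control loop stays small. This sketch combines all four guards; the phase helpers are trivial stubs so the control flow is the focus, and every name here is assumed rather than taken from Skyflo's source:

```python
MAX_TURNS = 20  # hard cap on iterations; configurable in practice
stop_flags: dict[str, bool] = {}  # set to True when a client requests a stop

def model_phase(state: dict) -> dict:
    return {**state, "pending_tools": [], "done": True}  # stub

def gate_phase(state: dict) -> dict:
    return state  # stub; may pause for approval in the real thing

def finalize(state: dict, reason: str) -> dict:
    return {**state, "reason": reason}  # stub

def run(state: dict) -> dict:
    for _ in range(MAX_TURNS):                   # hard cap: no infinite loops
        if stop_flags.get(state["run_id"]):      # stop signals win, always
            return finalize(state, reason="stopped by user")
        state = model_phase(state)               # model decides if more work remains
        if state["pending_tools"]:
            state = gate_phase(state)            # natural pause point for writes
        elif state.get("done"):
            return finalize(state, reason="complete")
    return finalize(state, reason="max turns reached")
```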
## What Events Does Skyflo Stream?
Skyflo streams events over SSE from two endpoints:
```text
POST /api/v1/agent/chat              # Main interaction
POST /api/v1/agent/approvals/{id}    # After approval granted
```

Event types:
| Event | Purpose | Example |
|---|---|---|
| `token` | Incremental text | LLM output character-by-character |
| `tool.executing` | Tool start | "Running kubectl get pods..." |
| `tool.result` | Tool complete | Pod list JSON |
| `tool.error` | Tool failed | Error message, stack trace |
| `tools.pending` | Awaiting approval | What needs to be approved |
| `workflow_complete` | Done | Final summary |
| `workflow_error` | Failed | Why workflow stopped |

Key insight: If you only stream text, the user is watching the narration, not the work. Tool events are the work.
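Consuming the stream is straightforward with any SSE-capable client. This sketch uses httpx; the host, request payload, and data field names are assumptions, while the event names come from the table above:

```python
import json
import httpx

with httpx.stream(
    "POST",
    "http://localhost:8080/api/v1/agent/chat",  # host and port assumed
    json={"message": "list pods in production"},
    timeout=None,  # long-lived stream
) as resp:
    event = None
    for line in resp.iter_lines():
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:") and event:
            data = json.loads(line.split(":", 1)[1])
            if event == "token":
                print(data.get("text", ""), end="", flush=True)  # the narration
            elif event.startswith("tool"):
                print(f"\n[{event}] {data}")  # the actual work
            elif event in ("workflow_complete", "workflow_error"):
                print(f"\n[{event}] {data}")
                break
```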
## Why Does Redis Handle Internal Event Coordination?
SSE streams are client-facing. Internally, Skyflo uses Redis pub/sub channels keyed by run ID:
| Feature | How Redis Helps |
|---|---|
| Stop signals | Any client can interrupt the run |
| Event consistency | All subscribers see same sequence |
| Multi-client support | Web UI, Slack, CLI all work |
| Decoupled state | Workflow continues if client disconnects |
Redis isn't glamorous, but it's practical for real-time coordination without baking state into HTTP connections.
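The mechanics are ordinary pub/sub. In this sketch, channels are keyed by run ID; the channel naming scheme and event shapes are assumptions, not Skyflo's internals:

```python
import json
import redis

r = redis.Redis()

def publish_event(run_id: str, event: dict) -> None:
    # Every subscriber (web UI, Slack, CLI) sees the same sequence.
    r.publish(f"run:{run_id}:events", json.dumps(event))

def request_stop(run_id: str) -> None:
    # Any client can interrupt the run, not just the one that started it.
    r.publish(f"run:{run_id}:control", json.dumps({"type": "stop"}))

# A subscriber rejoining an in-flight run:
sub = r.pubsub()
sub.subscribe("run:abc123:events", "run:abc123:control")
for msg in sub.listen():
    if msg["type"] == "message":
        print(msg["channel"], json.loads(msg["data"]))
```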
## When Should You Enable Postgres Checkpointing?
LangGraph supports Postgres-backed checkpointing. In Skyflo, it's optional.
| Environment | Checkpointing | Reason |
|---|---|---|
| Local dev | Optional | Simpler setup, faster iteration |
| Production | Recommended | Resilience, resumability |
| Long workflows | Required | Survives interruptions |
Checkpointing matters when:
| Scenario | Without Checkpointing | With Checkpointing |
|---|---|---|
| Pod restart | Workflow lost | Resume from last state |
| Network blip | Start over | Continue after reconnect |
| Page refresh | Lose context | Rejoin existing run |
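Enabling it is a compile-time change. This sketch uses LangGraph's `PostgresSaver` from the `langgraph-checkpoint-postgres` package, reusing the `builder` from the earlier graph sketch; the connection string and thread ID are placeholders:

```python
from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:pass@localhost:5432/skyflo"  # placeholder

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # create checkpoint tables on first use
    workflow = builder.compile(checkpointer=checkpointer)

    # The thread_id keys the checkpoint: rejoin the same run after a pod
    # restart, a network blip, or a page refresh.
    config = {"configurable": {"thread_id": "run-abc123"}}
    workflow.invoke({"messages": [], "pending_tools": [], "done": False}, config)
```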
## Why Must Safety Live Outside the Model?
If you remember one thing from this article:
You can't prompt your way into safety.
Prompting helps. Policy enforcement wins.
| Scenario | Prompting Only | Policy Enforcement |
|---|---|---|
| Model hallucinates | Might execute dangerous command | Gate rejects invalid tool |
| User is ambiguous | Model guesses wrong | Approval forces clarification |
| Environment changed | Stale context causes errors | Validation catches mismatch |
By enforcing approvals and tool execution rules in the Engine gate, Skyflo remains safe even when:
- The model is wrong
- The user is ambiguous
- The environment behaves unexpectedly
That's the boring kind of robustness that makes software feel "enterprise-grade" even when it's open source.
Related articles:
- MCP in Practice: Standardizing DevOps Tools
- SSE Done Right: Streaming Tokens + Tool Events
- Why Human-in-the-Loop Is Non-Negotiable for AI in Production Ops
## FAQ: LangGraph Workflows for AI Agents
**What is LangGraph?** LangGraph is a library for building stateful, multi-step AI agent workflows as graphs. Nodes represent phases (like planning or execution), and edges define transitions between phases.

**Why use a graph-based workflow instead of a simple loop?** Graphs make state transitions explicit, enable conditional branching (like approval gates), and are easier to debug because you can trace exactly which node produced which output.

**What is the "gate" phase in an AI agent workflow?** The gate is where policy enforcement happens. It checks whether tool calls require approval, validates parameters, executes via the MCP server, and handles errors, all before results return to the model.

**How do you prevent infinite loops in auto-continuing agents?** Combine multiple controls: let the model decide if more turns are needed, enforce hard limits on iterations, and honor stop signals immediately. Approval gates also create natural pause points.

**Why should tool execution be separate from LLM reasoning?** Separating thinking (the model proposes tool calls) from acting (the gate executes them) enables policy enforcement between intent and action. The model can't execute anything directly; it can only request.

**Is Postgres checkpointing required for Skyflo?** No, it's optional. Use it in production for resilience and resumability. Skip it in local development for simplicity. The choice depends on your reliability requirements.