
Inside Skyflo’s LangGraph Workflow: Plan → Execute → Verify (Without the Hype)

How Skyflo compiles a compact graph, streams progress, and decides when to continue, stop, or request approval.

10 min read
architecture · langgraph · engine · streaming

Why Do Most AI Agent Demos Fail in Production?

Most "agent" demos end right where real work begins: the messy part where the first attempt fails, tools time out, and the user says, "okay, but do it again—correctly."

Common production failure modes:

| Demo Behavior | Production Reality |
| --- | --- |
| Perfect first attempt | Tools time out, retry needed |
| Simple single task | Multi-step with dependencies |
| Controlled environment | Network failures, auth expiry |
| No user interruption | "Stop, wrong cluster!" |

Skyflo's Engine exists for that reality. It's not a 14-node science project. It's a compact loop designed to do three things well:

  1. Plan with enough grounding to be safe
  2. Execute tools with policy enforcement
  3. Verify outcomes and decide what happens next

What Is Skyflo's LangGraph Workflow Architecture?

Skyflo's workflow is a LangGraph graph with four nodes that map cleanly to mental models:

```
entry → model → gate → final
        ↑         |
        └─────────┘  (loop for multi-turn)
```

| Node | Purpose | Key Responsibility |
| --- | --- | --- |
| entry | Setup | Prepare context, init streaming |
| model | Think | LLM reasoning, tool proposals |
| gate | Act | Policy enforcement, tool execution |
| final | Summarize | End-state summary for operators |
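
For readers who want the shape in code: here's a minimal sketch of this loop using LangGraph's StateGraph API. The state fields and node bodies are illustrative assumptions, not Skyflo's actual source.

```python
# A minimal sketch of the four-node loop with LangGraph's StateGraph API.
# State fields and node bodies are illustrative, not Skyflo's actual source.
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class RunState(TypedDict, total=False):
    messages: list
    pending_tool_calls: list
    needs_another_turn: bool


def entry(state: RunState) -> dict:
    return {}  # setup: messages, run ID, streaming (detailed below)

def model(state: RunState) -> dict:
    return {}  # LLM reasoning; proposes tool calls, never executes them

def gate(state: RunState) -> dict:
    return {"needs_another_turn": False}  # policy check + tool execution

def final(state: RunState) -> dict:
    return {}  # end-state summary for operators

def route(state: RunState) -> str:
    # Loop back to the model for multi-turn work, or finish.
    return "model" if state.get("needs_another_turn") else "final"


builder = StateGraph(RunState)
builder.add_node("entry", entry)
builder.add_node("model", model)
builder.add_node("gate", gate)
builder.add_node("final", final)
builder.add_edge(START, "entry")
builder.add_edge("entry", "model")
builder.add_edge("model", "gate")
builder.add_conditional_edges("gate", route, {"model": "model", "final": "final"})
builder.add_edge("final", END)
graph = builder.compile()
```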

Why only four nodes?

We've tried the "many agents with many roles" approach. It reads nicely. It also tends to:

| Problem | Impact |
| --- | --- |
| Increased latency | Each agent = network + LLM call |
| Duplicated context | Same info passed between agents |
| Complex debugging | "Which agent said what?" |

Operators don't want elegant internals. They want a single timeline they can trust.


What Happens in the Entry Phase?

The entry phase does the boring setup that makes everything else predictable:

| Task | Purpose |
| --- | --- |
| Prepare message list | System prompt + history + new user prompt |
| Generate run ID | Unique identifier for this execution |
| Initialize streaming | UI can start rendering immediately |
| Set stop flags | Ready to honor cancellation requests |
| Apply rate limiting | Prevent runaway executions |
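
A sketch of what that setup might look like inside the entry node; the field names (run_id, stop_requested, turns) are assumptions for illustration:

```python
# Illustrative entry-node setup; field names are assumptions, not Skyflo's.
import uuid

SYSTEM_PROMPT = {"role": "system",
                 "content": "You are a Kubernetes operations agent."}

def entry(state: dict) -> dict:
    return {
        "run_id": str(uuid.uuid4()),  # unique identifier for this execution
        # System prompt + prior history + the new user prompt
        "messages": [SYSTEM_PROMPT,
                     *state.get("history", []),
                     {"role": "user", "content": state["user_prompt"]}],
        "stop_requested": False,      # ready to honor cancellation mid-run
        "turns": 0,                   # consumed later by the max-turns cap
    }
```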

Key principle: If your agent doesn't reliably stop mid-stream, you're building a demo, not a tool.


How Does the Model Phase Work?

In the model phase, Skyflo calls an LLM provider (via LiteLLM). The model can:

| Action | Description |
| --- | --- |
| Interpret intent | Understand what the user wants |
| Propose a plan | Sequence of steps to achieve goal |
| Select tools | Decide which tool calls are needed |
| Ask questions | Request clarification when ambiguous |

Critical distinction: The model does not execute tools directly. It outputs structured tool call requests.

```json
{
  "tool_calls": [
    {
      "name": "kubectl_get_pods",
      "arguments": {"namespace": "production"}
    }
  ]
}
```

This keeps "thinking" separate from "acting"—a big safety win.
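
A hedged sketch of the model phase using LiteLLM's OpenAI-compatible completion call; the tool schema and state fields are assumptions:

```python
# Sketch of the model phase via LiteLLM; the tool schema is illustrative.
from litellm import completion

TOOLS = [{
    "type": "function",
    "function": {
        "name": "kubectl_get_pods",
        "description": "List pods in a namespace",
        "parameters": {
            "type": "object",
            "properties": {"namespace": {"type": "string"}},
            "required": ["namespace"],
        },
    },
}]

def model(state: dict) -> dict:
    response = completion(model="gpt-4o", messages=state["messages"],
                          tools=TOOLS)
    message = response.choices[0].message
    # The model only *proposes* tool calls here; nothing executes yet.
    return {
        "messages": state["messages"] + [message],
        "pending_tool_calls": message.tool_calls or [],
        "turns": state["turns"] + 1,
    }
```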


What Is the Gate Phase and Why Is It Critical?

The gate phase is where tool calls actually execute via the MCP server. This is where policy lives.

Two rules govern the gate:

| Tool Type | Behavior |
| --- | --- |
| Read-only (readOnlyHint: true) | Execute immediately, no approval |
| Write operations | Halt, require explicit user approval |

What happens at the gate:

  1. Receive tool call request from model
  2. Check tool metadata for readOnlyHint
  3. If read-only: execute via MCP server
  4. If write: pause, request approval
  5. On approval: execute, stream results
  6. Return results to model for next turn

This is where you want validation, logging, and consistent error handling—not scattered across prompts, UI, and client code.
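
In code, the gate's decision might look like the sketch below. TOOL_METADATA and mcp_execute are hypothetical stand-ins for Skyflo's tool registry and MCP client, not its real API:

```python
# Hedged gate sketch; TOOL_METADATA and mcp_execute are hypothetical
# stand-ins for the tool registry and MCP client.
TOOL_METADATA = {"kubectl_get_pods": {"readOnlyHint": True}}

def mcp_execute(call: dict) -> dict:
    """Stand-in for executing a tool call via the MCP server."""
    return {"tool": call["name"], "output": "..."}

def gate(state: dict) -> dict:
    results = []
    for call in state["pending_tool_calls"]:
        meta = TOOL_METADATA.get(call["name"], {})
        if not meta.get("readOnlyHint", False):
            # Write operation: halt and surface a tools.pending event.
            return {"awaiting_approval": True}
        results.append(mcp_execute(call))  # read-only: execute immediately
    # Results flow back to the model for its next turn.
    return {"tool_results": results, "awaiting_approval": False}
```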


What Does the Final Phase Produce?

Streaming is great, but it's also noisy. Operators need an end-state summary:

| Summary Element | Purpose |
| --- | --- |
| What happened | Sequence of operations performed |
| What changed | Resources modified (or not) |
| What to check next | Verification steps, related resources |
| Confidence level | Certainty about outcomes |

Good summaries are:

  • Short and concrete
  • Specific to what was attempted
  • Honest about uncertainty

Bad summaries:

  • Repeat the entire stream
  • Pretend certainty where there isn't any
  • Generic without context
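
One way to produce such a summary in the final node, sketched with an assumed prompt (the wording and field names are not Skyflo's):

```python
# Illustrative final-phase summary; the prompt wording is an assumption.
from litellm import completion

def final(state: dict) -> dict:
    ask = {"role": "user", "content": (
        "Summarize this run: what happened, what changed, what to check "
        "next, and your confidence. Be short, concrete, and honest about "
        "uncertainty."
    )}
    response = completion(model="gpt-4o", messages=state["messages"] + [ask])
    return {"summary": response.choices[0].message.content}
```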

How Does Auto-Continue Work Without Infinite Loops?

Skyflo is designed to keep moving until a task is done. But "auto-continue" can become "infinite loop" if you aren't careful.

Skyflo's conservative approach:

| Control | How It Works |
| --- | --- |
| Model decides | LLM determines if another turn is needed |
| Max turns limit | Hard cap on iterations (configurable) |
| Stop signals | Honored immediately at any phase |
| Approval gates | Natural pause points for write operations |
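
Combined, those controls reduce to a small routing function; this sketch uses assumed field names and an assumed limit:

```python
# Loop-control sketch; field names and the limit are assumptions.
MAX_TURNS = 10  # hard cap on iterations (configurable)

def route_after_gate(state: dict) -> str:
    if state.get("stop_requested"):      # stop signals win immediately
        return "final"
    if state.get("awaiting_approval"):   # approval gates pause the loop
        return "final"
    if state["turns"] >= MAX_TURNS:      # hard cap regardless of model intent
        return "final"
    # Otherwise, the model decides whether another turn is needed.
    return "model" if state.get("needs_another_turn") else "final"
```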

This is the difference between a chat assistant and a system you can use during a real incident.


What Events Does Skyflo Stream?

Skyflo streams events over SSE from two endpoints:

```bash
POST /api/v1/agent/chat           # Main interaction
POST /api/v1/agent/approvals/{id} # After approval granted
```

Event types:

| Event | Purpose | Example |
| --- | --- | --- |
| token | Incremental text | LLM output character-by-character |
| tool.executing | Tool start | "Running kubectl get pods..." |
| tool.result | Tool complete | Pod list JSON |
| tool.error | Tool failed | Error message, stack trace |
| tools.pending | Awaiting approval | What needs to be approved |
| workflow_complete | Done | Final summary |
| workflow_error | Failed | Why workflow stopped |
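
A hedged client-side sketch for consuming that stream with httpx; the base URL and the payload fields beyond the event names above are assumptions:

```python
# Consuming the SSE stream; URL and payload shape are assumptions.
import json

import httpx

with httpx.stream("POST", "http://localhost:8080/api/v1/agent/chat",
                  json={"message": "list pods in production"}) as response:
    for line in response.iter_lines():
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])  # assumed JSON payload
        if event["type"] == "token":
            print(event["content"], end="", flush=True)
        elif event["type"] == "tool.executing":
            print(f"\n→ {event['tool']} running...")
        elif event["type"] == "workflow_complete":
            print(f"\n✓ {event['summary']}")
```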

Key insight: If you only stream text, the user is watching the narration, not the work. Tool events are the work.


Why Does Redis Handle Internal Event Coordination?

SSE streams are client-facing. Internally, Skyflo uses Redis pub/sub channels keyed by run ID:

| Feature | How Redis Helps |
| --- | --- |
| Stop signals | Any client can interrupt the run |
| Event consistency | All subscribers see same sequence |
| Multi-client support | Web UI, Slack, CLI all work |
| Decoupled state | Workflow continues if client disconnects |
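
A minimal sketch of that coordination with redis-py; the channel naming scheme is an assumption:

```python
# Run-scoped pub/sub sketch; channel names are assumptions.
import json

import redis

r = redis.Redis()

def publish_event(run_id: str, event: dict) -> None:
    # Every subscriber (web UI, Slack, CLI) sees the same sequence.
    r.publish(f"run:{run_id}:events", json.dumps(event))

def request_stop(run_id: str) -> None:
    # Any client can interrupt the run by publishing a stop signal.
    r.publish(f"run:{run_id}:control", json.dumps({"type": "stop"}))
```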

Redis isn't glamorous, but it's practical for real-time coordination without baking state into HTTP connections.


When Should You Enable Postgres Checkpointing?

LangGraph supports Postgres-backed checkpointing. In Skyflo, it's optional.

| Environment | Checkpointing | Reason |
| --- | --- | --- |
| Local dev | Optional | Simpler setup, faster iteration |
| Production | Recommended | Resilience, resumability |
| Long workflows | Required | Survives interruptions |

Checkpointing matters when:

| Scenario | Without Checkpointing | With Checkpointing |
| --- | --- | --- |
| Pod restart | Workflow lost | Resume from last state |
| Network blip | Start over | Continue after reconnect |
| Page refresh | Lose context | Rejoin existing run |
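
Enabling it amounts to compiling the graph with a checkpointer. A sketch using LangGraph's Postgres saver, reusing the builder from the graph sketch earlier; the connection string is a placeholder:

```python
# Sketch of Postgres checkpointing; the DSN is a placeholder and `builder`
# comes from the StateGraph sketch earlier in this post.
from langgraph.checkpoint.postgres import PostgresSaver

DB_URI = "postgresql://user:pass@localhost:5432/skyflo"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # create checkpoint tables on first run
    graph = builder.compile(checkpointer=checkpointer)
    # thread_id ties checkpoints to a run, so clients can rejoin it later.
    config = {"configurable": {"thread_id": "run-123"}}
    graph.invoke({"user_prompt": "list pods in production"}, config)
```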

Why Must Safety Live Outside the Model?

If you remember one thing from this article:

You can't prompt your way into safety.

Prompting helps. Policy enforcement wins.

| Scenario | Prompting Only | Policy Enforcement |
| --- | --- | --- |
| Model hallucinates | Might execute dangerous command | Gate rejects invalid tool |
| User is ambiguous | Model guesses wrong | Approval forces clarification |
| Environment changed | Stale context causes errors | Validation catches mismatch |

By enforcing approvals and tool execution rules in the Engine gate, Skyflo remains safe even when:

  • The model is wrong
  • The user is ambiguous
  • The environment behaves unexpectedly

That's the boring kind of robustness that makes software feel "enterprise-grade" even when it's open source.


FAQ: LangGraph Workflows for AI Agents

What is LangGraph? LangGraph is a library for building stateful, multi-step AI agent workflows as graphs. Nodes represent phases (like planning or execution), and edges define transitions between phases.

Why use a graph-based workflow instead of a simple loop? Graphs make state transitions explicit, enable conditional branching (like approval gates), and are easier to debug because you can trace exactly which node produced which output.

What is the "gate" phase in an AI agent workflow? The gate is where policy enforcement happens. It checks if tool calls require approval, validates parameters, executes via the MCP server, and handles errors—all before results return to the model.

How do you prevent infinite loops in auto-continuing agents? Combine multiple controls: let the model decide if more turns are needed, enforce hard limits on iterations, and honor stop signals immediately. Approval gates also create natural pause points.

Why should tool execution be separate from LLM reasoning? Separating thinking (model proposes tool calls) from acting (gate executes them) enables policy enforcement between intent and action. The model can't execute anything directly—it can only request.

Is Postgres checkpointing required for Skyflo? No, it's optional. Use it in production for resilience and resumability. Skip it in local development for simplicity. The choice depends on your reliability requirements.
