v0.2.0: The Rebuild — From WebSockets to SSE and a Cleaner Agent Core

Why Did Skyflo v0.2.0 Require a Rebuild?

Every project has a moment when you realize you're spending more time maintaining scaffolding than building the product.

For Skyflo, that moment became v0.2.0, a rebuild focused on one outcome:

Make the agent feel faster, simpler, and more reliable, even when it's doing hard things.

Key changes in v0.2.0:

Component	Before	After
Agent architecture	Multi-agent (planner/executor/verifier)	Single LangGraph workflow
Streaming transport	WebSockets	Server-Sent Events (SSE)
Tool server	Multiple endpoints	Single FastMCP server
Workflow state	In-memory only	Optional Postgres checkpointing

What Is the "More Agents" Trap in AI Systems?

Early agent architectures often split roles into separate agents:

Agent	Role
Planner	Decides what to do
Executor	Runs the tools
Verifier	Checks the results

It reads well on paper. In practice, you pay for it:

Problem	Impact
More latency	Each agent adds network + LLM call time
Duplicated context	Same information passed between agents
Debugging complexity	"Which agent said what?"
State synchronization	Keeping agents aligned is hard

v0.2.0's solution: consolidate the three agents into one compact LangGraph workflow with explicit phases (entry, model, gate, final). The system became easier to reason about and easier to build on.
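
To make "one workflow with explicit phases" concrete, here is a minimal sketch in plain Python. It deliberately does not use the LangGraph API, and it is not Skyflo's actual code: the `State` fields, the stubbed model and gate logic, and the tool name `kubectl_get_pods` are all assumptions for illustration. The point is only that every phase reads and writes one shared state object and names the phase that runs next.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class State:
    """Single shared state object (hypothetical shape)."""
    request: str
    pending_tool: Optional[str] = None
    answer: Optional[str] = None
    trace: List[str] = field(default_factory=list)


# Each phase mutates the state and returns the name of the next phase.
Phase = Callable[[State], str]


def entry(state: State) -> str:
    state.trace.append("entry")
    return "model"


def model(state: State) -> str:
    # Stand-in for an LLM call: propose one tool, then finish.
    state.trace.append("model")
    if state.pending_tool is None and state.answer is None:
        state.pending_tool = "kubectl_get_pods"  # hypothetical tool name
        return "gate"
    return "final"


def gate(state: State) -> str:
    # Stand-in for approval plus tool execution.
    state.trace.append("gate")
    state.answer = f"ran {state.pending_tool}"
    state.pending_tool = None
    return "model"


def final(state: State) -> str:
    state.trace.append("final")
    return "END"


PHASES: Dict[str, Phase] = {
    "entry": entry, "model": model, "gate": gate, "final": final,
}


def run(state: State, max_steps: int = 20) -> State:
    """Drive the workflow until a phase returns END."""
    current = "entry"
    for _ in range(max_steps):
        if current == "END":
            return state
        current = PHASES[current](state)
    raise RuntimeError("workflow did not terminate")
```

Calling `run(State(request="list pods"))` leaves a `trace` that answers "which phase did what?" in a single list, which is exactly the question that was hard to answer across three separate agents.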

Why Did Skyflo Switch from WebSockets to SSE?

WebSockets are great until you ship them into a thousand different Kubernetes setups.

WebSocket challenges in production:

Issue	Symptom
Connection lifecycle	Complex state management
Proxy compatibility	Many proxies need explicit configuration to pass WebSocket upgrades
Timeout handling	Unexpected disconnects at proxy timeouts, often around 60 seconds
Load balancer stickiness	Session affinity required

SSE advantages:

Benefit	Why It Matters
Plain HTTP	Works with standard infrastructure
One-way events	Simpler than bidirectional
Auto-reconnect	Built into EventSource API
Proxy-friendly	No special proxy configuration

SSE isn't glamorous. It just works, and "works" is a feature.
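
Part of why SSE "just works" is that the wire format is only a few lines of text per event. The sketch below shows that framing with the standard library alone. The event names (`token`, `end`) and payload shapes are made up for the example and are not Skyflo's real event schema, and the parser is simplified compared with a full client.

```python
import json
from typing import Dict, Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits no raw newlines, so a single data line suffices.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # a blank line ends the event


def parse_sse(stream: str) -> Iterator[Dict[str, str]]:
    """Parse a complete stream into field dicts (simplified client)."""
    for block in stream.split("\n\n"):
        fields: Dict[str, str] = {}
        for line in block.split("\n"):
            if not line or line.startswith(":"):
                continue  # empty line or comment / keep-alive
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if name == "data" and "data" in fields:
                fields["data"] += "\n" + value  # multi-line data
            else:
                fields[name] = value
        if fields:
            yield fields
```

In the browser, `EventSource` does the parsing for you, and after a dropped connection it reconnects and sends the last seen `id` back in the `Last-Event-ID` request header, so a server that sets `id` on each event has what it needs to pick up where the stream left off.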

How Does Postgres Checkpointing Improve Workflow Resilience?

Long-running workflows fail in boring ways:

Failure Mode	What Happens
Pod restart	Workflow state lost mid-run
Network blip	Stream interruption
Page refresh	Browser loses connection
Service deploy	Engine restarts during operation

Skyflo v0.2.0 added optional Postgres checkpointing:

Workflow State → Postgres → Resumable on reconnect

Benefits:

Feature	Value
Persistence	Workflow state survives restarts
Resumability	Continue from last checkpoint
Audit trail	Historical workflow states stored
Optional	Not required for local development

If you've ever lost a 5-minute diagnostic run right before the important output, you'll understand why this matters.
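
The save-then-resume idea can be sketched in a few dozen lines. Note the assumptions: this uses `sqlite3` as a stand-in for Postgres so the example runs anywhere, and the `checkpoints` table, the `CheckpointStore` class, and the `run_workflow` loop are hypothetical names for illustration rather than Skyflo's schema or LangGraph's checkpointer.

```python
import json
import sqlite3
from typing import Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS checkpoints (
    thread_id TEXT    NOT NULL,
    step      INTEGER NOT NULL,
    state     TEXT    NOT NULL,
    PRIMARY KEY (thread_id, step)
)
"""


class CheckpointStore:
    """Keeps one row per completed step (sqlite3 stands in for Postgres)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)

    def save(self, thread_id: str, step: int, state: dict) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints (thread_id, step, state) "
            "VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()

    def latest(self, thread_id: str) -> Optional[Tuple[int, dict]]:
        row = self.conn.execute(
            "SELECT step, state FROM checkpoints "
            "WHERE thread_id = ? ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return None if row is None else (row[0], json.loads(row[1]))


def run_workflow(store: CheckpointStore, thread_id: str, total_steps: int,
                 crash_at: Optional[int] = None) -> dict:
    """Resume from the last checkpoint, if any, and run to completion."""
    resumed = store.latest(thread_id)
    step, state = resumed if resumed else (0, {"done": []})
    while step < total_steps:
        if crash_at is not None and step == crash_at:
            raise RuntimeError("simulated pod restart")
        state["done"].append(f"step-{step}")
        step += 1
        store.save(thread_id, step, state)  # checkpoint after each step
    return state
```

If a run crashes at step 2, the next call to `run_workflow` with the same `thread_id` starts from step 2 instead of step 0, and the older rows stay in the table as a simple history of the run.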

Why Consolidate Tools into a Single MCP Server?

Tools are the agent's hands. v0.2.0 consolidated its MCP (Model Context Protocol) tooling into a single FastMCP server.

Before (multiple endpoints):

Issue	Impact
Inconsistent discovery	Different tools, different patterns
Validation variations	Each endpoint validated differently
Policy enforcement	Hard to apply consistent rules

After (single FastMCP server):

Benefit	Impact
Consistent discovery	All tools in one catalog
Uniform validation	Same validation logic everywhere
Central policy	Approval rules applied consistently
Foundation for growth	Easy to add Helm, Jenkins, Argo tools
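
Here is a rough sketch of why one server makes those guarantees cheap. It is plain Python, not the FastMCP API, and every name in it (`ToolServer`, `get_pods`, `helm_rollback`, the `mutating` flag) is an assumption for illustration. Because every call goes through one `call` method, validation and approval live in one place. The tools only build command strings here; nothing is executed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    func: Callable[..., Any]
    required: Dict[str, type]  # argument name -> expected type
    mutating: bool             # mutating tools need approval


class ToolServer:
    """One catalog, one validation path, one policy check."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def tool(self, required: Dict[str, type], mutating: bool = False):
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[func.__name__] = Tool(
                func.__name__, func, required, mutating
            )
            return func
        return decorator

    def catalog(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, args: Dict[str, Any],
             approved: bool = False) -> Any:
        tool = self._tools[name]
        # Uniform validation: same check for every tool.
        for arg, expected in tool.required.items():
            if not isinstance(args.get(arg), expected):
                raise ValueError(
                    f"{name}: '{arg}' must be {expected.__name__}"
                )
        # Central policy: mutating tools always need approval.
        if tool.mutating and not approved:
            raise PermissionError(f"{name} requires approval")
        return tool.func(**args)


server = ToolServer()


@server.tool(required={"namespace": str})
def get_pods(namespace: str) -> str:
    return f"kubectl get pods -n {namespace}"


@server.tool(required={"release": str, "revision": int}, mutating=True)
def helm_rollback(release: str, revision: int) -> str:
    return f"helm rollback {release} {revision}"
```

Adding a new tool is one decorated function, and it inherits the same discovery, validation, and approval behavior without any extra wiring.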

What Was the Real Win from the v0.2.0 Rebuild?

The headline changes were nice, but the best outcome was harder to tweet:

Skyflo became easier to operate.

Metric	Improvement
Components to monitor	Fewer
Failure modes	Reduced
Debugging time	Shorter
New developer onboarding	Faster

Fewer moving parts means fewer 3am mysteries. And in DevOps, reducing mysteries is basically the product.

FAQ: Skyflo v0.2.0 Architecture Changes

Why did Skyflo switch from WebSockets to SSE?
SSE is simpler, works with standard HTTP infrastructure, doesn't require special proxy configuration, and handles reconnection automatically. WebSockets added complexity without proportional benefits for one-way streaming.

What is the "more agents" trap?
It is the pattern of splitting AI agent responsibilities across multiple specialized agents (planner, executor, verifier). While architecturally elegant, it increases latency, duplicates context, and makes debugging harder.

How does LangGraph improve agent architecture?
LangGraph provides a graph-based workflow with explicit phases (entry, model, gate, final) that is easier to debug, test, and reason about than a distributed multi-agent system.

Is Postgres checkpointing required for Skyflo?
No. Checkpointing is optional and primarily valuable for production deployments. Local development works fine without it.

What tools does the consolidated MCP server support?
The FastMCP server supports Kubernetes (kubectl), Helm, Argo Rollouts, and Jenkins tools, all with consistent schemas, validation, and policy enforcement.