v0.2.0: The Rebuild — From WebSockets to SSE and a Cleaner Agent Core

What we learned rebuilding Skyflo’s core loop, and why “simpler” was the biggest performance unlock.

Why Did Skyflo v0.2.0 Require a Rebuild?

Every project has a moment where you realize you're spending more time maintaining scaffolding than building the product.

For Skyflo, that moment became v0.2.0—a rebuild focused on one outcome:

Make the agent feel faster, simpler, and more reliable—even when it's doing hard things.

Key changes in v0.2.0:

| Component | Before | After |
|---|---|---|
| Agent architecture | Multi-agent (planner/executor/verifier) | Single LangGraph workflow |
| Streaming transport | WebSockets | Server-Sent Events (SSE) |
| Tool server | Multiple endpoints | Single FastMCP server |
| Workflow state | In-memory only | Optional Postgres checkpointing |

What Is the "More Agents" Trap in AI Systems?

Early agent architectures often split roles into separate agents:

| Agent | Role |
|---|---|
| Planner | Decides what to do |
| Executor | Runs the tools |
| Verifier | Checks the results |

It reads well on paper. In practice, you pay for it:

| Problem | Impact |
|---|---|
| More latency | Each agent adds network + LLM call time |
| Duplicated context | Same information passed between agents |
| Debugging complexity | "Which agent said what?" |
| State synchronization | Keeping agents aligned is hard |

v0.2.0's solution: Consolidate into a compact LangGraph workflow with explicit phases (entry, model, gate, final). The system became easier to reason about and easier to build on.
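To make "explicit phases" concrete, here is a minimal sketch in plain Python. It does not use the LangGraph API and it is not Skyflo's code. The phase names (entry, model, gate, final) come from the FAQ below; what each phase does, the `RunState` fields, and the `kubectl get pods` tool call are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RunState:
    """One shared state object, instead of context copied between agents."""
    request: str
    messages: List[str] = field(default_factory=list)
    pending_tool: Optional[str] = None
    approved: bool = False
    trace: List[str] = field(default_factory=list)  # phases visited, in order


def entry(state: RunState) -> str:
    state.messages.append(f"user: {state.request}")
    return "model"


def model(state: RunState) -> str:
    # Stand-in for an LLM call: propose one tool call, then finish.
    if state.pending_tool is None and not state.approved:
        state.pending_tool = "kubectl get pods"  # hypothetical tool call
        return "gate"
    state.messages.append("assistant: done")
    return "final"


def gate(state: RunState) -> str:
    # Stand-in for an approval check before the tool runs (assumed behavior).
    state.approved = True
    state.messages.append(f"tool: ran {state.pending_tool}")
    state.pending_tool = None
    return "model"


def final(state: RunState) -> str:
    return "end"


PHASES: Dict[str, Callable[[RunState], str]] = {
    "entry": entry,
    "model": model,
    "gate": gate,
    "final": final,
}


def run(request: str, max_steps: int = 10) -> RunState:
    """Walk the phases until 'end', recording each phase in state.trace."""
    state = RunState(request=request)
    phase = "entry"
    for _ in range(max_steps):
        state.trace.append(phase)
        phase = PHASES[phase](state)
        if phase == "end":
            break
    return state
```

Because every phase reads and writes the same state and records itself in `trace`, the question "which agent said what?" turns into "which phase ran when?", which is a list you can print.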


Why Did Skyflo Switch from WebSockets to SSE?

WebSockets are great until you ship them into a thousand different Kubernetes setups.

WebSocket challenges in production:

| Issue | Symptom |
|---|---|
| Connection lifecycle | Complex state management |
| Proxy compatibility | Many proxies don't handle WS well |
| Timeout handling | Random disconnects at 60 seconds |
| Load balancer stickiness | Session affinity required |

SSE advantages:

| Benefit | Why It Matters |
|---|---|
| Plain HTTP | Works with standard infrastructure |
| One-way events | Simpler than bidirectional |
| Auto-reconnect | Built into EventSource API |
| Proxy-friendly | No special proxy configuration |

SSE isn't glamorous. It just works, and "works" is a feature.
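Part of why SSE "just works" is that the wire format is plain text over an ordinary HTTP response: a few `field: value` lines, then a blank line to end the event. The sketch below formats and parses that framing with the standard library only. The event names (`token`, `final`) and payload shapes are assumptions for illustration, not Skyflo's actual event schema.

```python
import json
from typing import Dict, Iterator, List, Optional


def format_sse(data: dict, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream wire format."""
    lines: List[str] = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"  # the blank line terminates the event


def parse_sse(stream: str) -> Iterator[Dict[str, str]]:
    """Parse a text/event-stream body into dicts with id/event/data keys."""
    event: Dict[str, str] = {}
    data_lines: List[str] = []
    for line in stream.split("\n"):
        if line == "":
            # Blank line: dispatch the event if it carried any data.
            if data_lines:
                event["data"] = "\n".join(data_lines)
                yield event
            event, data_lines = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "data":
            data_lines.append(value)
        elif name in ("event", "id"):
            event[name] = value
```

The `id` field is what makes auto-reconnect useful: when a browser's EventSource reconnects, it sends the last id it saw in a `Last-Event-ID` request header, so the server can pick up from there.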


How Does Postgres Checkpointing Improve Workflow Resilience?

Long-running workflows fail in boring ways:

| Failure Mode | What Happens |
|---|---|
| Pod restart | Workflow state lost mid-run |
| Network blip | Stream interruption |
| Page refresh | Browser loses connection |
| Service deploy | Engine restarts during operation |

Skyflo v0.2.0 added optional Postgres checkpointing:

Workflow State → Postgres → Resumable on reconnect

Benefits:

| Feature | Value |
|---|---|
| Persistence | Workflow state survives restarts |
| Resumability | Continue from last checkpoint |
| Audit trail | Historical workflow states stored |
| Optional | Not required for local development |

If you've ever lost a 5-minute diagnostic run right before the important output, you'll understand why this matters.
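The save-then-resume idea can be sketched in a few lines. This is not Skyflo's implementation: `sqlite3` stands in for Postgres so the example runs with the standard library alone, and the `checkpoints` table, its columns, and the step names are all hypothetical.

```python
import json
import sqlite3
from typing import List, Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # sqlite3 is a stand-in for Postgres here; the schema is hypothetical.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        " run_id TEXT NOT NULL, step INTEGER NOT NULL, state TEXT NOT NULL,"
        " PRIMARY KEY (run_id, step))"
    )
    return conn


def save_checkpoint(conn: sqlite3.Connection, run_id: str,
                    step: int, state: dict) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO checkpoints (run_id, step, state)"
        " VALUES (?, ?, ?)",
        (run_id, step, json.dumps(state)),
    )
    conn.commit()


def latest_checkpoint(conn: sqlite3.Connection,
                      run_id: str) -> Optional[Tuple[int, dict]]:
    row = conn.execute(
        "SELECT step, state FROM checkpoints"
        " WHERE run_id = ? ORDER BY step DESC LIMIT 1",
        (run_id,),
    ).fetchone()
    return None if row is None else (row[0], json.loads(row[1]))


def run_workflow(conn: sqlite3.Connection, run_id: str, steps: List[str],
                 crash_after: Optional[int] = None) -> dict:
    """Run steps in order, resuming after the last saved checkpoint."""
    resumed = latest_checkpoint(conn, run_id)
    if resumed is None:
        start, state = 0, {"done": []}
    else:
        start, state = resumed[0] + 1, resumed[1]
    for i in range(start, len(steps)):
        if crash_after is not None and i > crash_after:
            raise RuntimeError("simulated pod restart")
        state["done"].append(steps[i])      # stand-in for doing real work
        save_checkpoint(conn, run_id, i, state)
    return state
```

Calling `run_workflow` again with the same `run_id` after a crash skips the finished steps and continues from the next one. Each step keeps its own row, so the table also serves as a simple history of workflow states.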


Why Consolidate Tools into a Single MCP Server?

Tools are the agent's hands. v0.2.0 consolidated the Model Context Protocol (MCP) tool endpoints into a single FastMCP server.

Before (multiple endpoints):

| Issue | Impact |
|---|---|
| Inconsistent discovery | Different tools, different patterns |
| Validation variations | Each endpoint validated differently |
| Policy enforcement | Hard to apply consistent rules |

After (single FastMCP server):

| Benefit | Impact |
|---|---|
| Consistent discovery | All tools in one catalog |
| Uniform validation | Same validation logic everywhere |
| Central policy | Approval rules applied consistently |
| Foundation for growth | Easy to add Helm, Jenkins, Argo tools |
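Here is a rough sketch of what "one catalog, one validation path, one policy check" can look like. It is plain Python, not the FastMCP API and not Skyflo's code. The `Tool` and `ToolRegistry` classes, the tool names (`get_pods`, `helm_rollback`), and the `needs_approval` flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    params: Dict[str, type]         # parameter name -> required type
    handler: Callable[..., Any]
    needs_approval: bool = False    # True for operations that change state


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if tool.name in self._tools:
            raise ValueError(f"duplicate tool: {tool.name}")
        self._tools[tool.name] = tool

    def catalog(self) -> List[str]:
        """Single place to discover every tool."""
        return sorted(self._tools)

    def call(self, name: str, args: Dict[str, Any],
             approved: bool = False) -> Any:
        tool = self._tools.get(name)
        if tool is None:
            raise KeyError(f"unknown tool: {name}")
        # Uniform validation: the same checks run for every tool.
        if set(args) != set(tool.params):
            raise ValueError(f"{name} expects {sorted(tool.params)}")
        for key, expected in tool.params.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        # Central policy: one approval rule, applied before any handler runs.
        if tool.needs_approval and not approved:
            raise PermissionError(f"{name} requires approval")
        return tool.handler(**args)


def build_registry() -> ToolRegistry:
    registry = ToolRegistry()
    registry.register(Tool(
        "get_pods", {"namespace": str},
        lambda namespace: f"pods in {namespace}",
    ))
    registry.register(Tool(
        "helm_rollback", {"release": str, "revision": int},
        lambda release, revision: f"rolled back {release} to revision {revision}",
        needs_approval=True,
    ))
    return registry
```

Adding a new tool means one more `register` call. Validation and the approval rule come along for free, which is the point of having a single server.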

What Was the Real Win from the v0.2.0 Rebuild?

The headline changes were nice, but the best outcome was harder to tweet:

Skyflo became easier to operate.

| Metric | Improvement |
|---|---|
| Components to monitor | Fewer |
| Failure modes | Reduced |
| Debugging time | Shorter |
| New developer onboarding | Faster |

Fewer moving parts means fewer 3am mysteries. And in DevOps, reducing mysteries is basically the product.


FAQ: Skyflo v0.2.0 Architecture Changes

Why did Skyflo switch from WebSockets to SSE?
SSE is simpler, works with standard HTTP infrastructure, doesn't require special proxy configuration, and handles reconnection automatically. WebSockets added complexity without proportional benefits for one-way streaming.

What is the "more agents" trap?
The pattern of splitting AI agent responsibilities across multiple specialized agents (planner, executor, verifier). While architecturally elegant, it increases latency, duplicates context, and makes debugging harder.

How does LangGraph improve agent architecture?
LangGraph provides a graph-based workflow with explicit phases (entry, model, gate, final) that's easier to debug, test, and reason about than distributed multi-agent systems.

Is Postgres checkpointing required for Skyflo?
No. Checkpointing is optional and primarily valuable for production deployments. Local development works fine without it.

What tools does the consolidated MCP server support?
The FastMCP server supports Kubernetes (kubectl), Helm, Argo Rollouts, and Jenkins tools, all with consistent schemas, validation, and policy enforcement.
