Why Does Real-Time Streaming Change User Trust in AI Agents?
People underestimate how much "real-time" changes trust.
| Agent Behavior | User Perception |
|---|---|
| 90-second wait, then full response | "I tolerate it" |
| Streaming what it's doing live | "It's like a teammate" |
If an agent takes 90 seconds and then prints a single final paragraph, you don't trust it—you tolerate it. If the agent streams what it's doing, you start treating it like a collaborator.
Skyflo streams over Server-Sent Events (SSE). It's not trendy. It's just the right tool for the job.
Why Choose SSE Over WebSockets for AI Agent Streaming?
WebSockets are powerful, but they introduce complexity:
| Aspect | WebSockets | SSE |
|---|---|---|
| Connection state | Stateful, complex lifecycle | Stateless HTTP |
| Proxy support | Often problematic | Standard HTTP works |
| Failure modes | Many partial failure states | Simpler error handling |
| Reconnection | Manual implementation | Built into EventSource |
| Direction | Bidirectional | Server → Client (sufficient for streaming) |
SSE wins in self-hosted environments (Kubernetes clusters with wildly different ingress setups) because boring compatibility beats powerful features that break in production.
What Events Should AI Agents Stream Beyond Tokens?
Streaming only text is a common early mistake. It looks cool, but it hides the actual work.
Skyflo streams four event types:
| Event Type | Purpose | Example |
|---|---|---|
| token | LLM output narration | "I'll check the pod status..." |
| workflow | State transitions | executing, awaiting_approval |
| tool | Real work execution | Tool name, arguments, results |
| approval | Safety boundaries | Pending approval details |
This makes the UI feel deterministic even when the model is probabilistic.
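As a sketch, the four event types map naturally onto SSE's `event:`/`data:` framing. The payload field names below are illustrative, not Skyflo's actual schema:

```python
import json

def sse_event(event_type: str, data: dict) -> str:
    """Format one SSE frame: an `event:` line, a `data:` line, a blank line."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"

# Hypothetical payloads for the four event kinds Skyflo streams:
frames = [
    sse_event("token", {"text": "I'll check the pod status..."}),
    sse_event("workflow", {"state": "awaiting_approval"}),
    sse_event("tool", {"name": "kubectl_get_pods", "status": "ok"}),
    sse_event("approval", {"call_id": "abc123", "action": "delete_pod"}),
]
```

Because each frame carries a named event type, the UI can route tokens to the chat transcript, workflow states to a status bar, and approvals to a modal, instead of guessing from raw text.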
What SSE Endpoints Does Skyflo's Engine Expose?
Skyflo's Engine exposes SSE on two primary endpoints:
```
# Main chat interaction
POST /api/v1/agent/chat

# Approval flow continuation
POST /api/v1/agent/approvals/{call_id}
```

Why approvals continue as a stream: Operators want to see exactly what happened after they clicked "approve"—the tool execution, any errors, and the verification results.
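Since both endpoints are POSTs, the browser's `EventSource` (which only issues GETs) can't consume them directly; clients read the response body and parse the SSE framing themselves. A minimal parser sketch, not Skyflo's actual client code:

```python
def parse_sse(lines):
    """Turn an iterable of text lines into (event, data) pairs.
    A blank line terminates one SSE event; lines starting with ':' are comments."""
    event, data = None, []
    for line in lines:
        if line.startswith(":"):
            continue  # heartbeat/comment line, carries no payload
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and (event or data):
            yield (event or "message", "\n".join(data))
            event, data = None, []

raw = [
    "event: workflow", 'data: {"state": "executing"}', "",
    ": ping", "",
    "event: token", 'data: {"text": "Checking..."}', "",
]
events = list(parse_sse(raw))
```

The same parser works whether the lines come from `httpx.stream(...)`, a CLI pipe, or a test fixture, which keeps the transport swappable.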
Why Are Heartbeats Critical for SSE Connections?
SSE connections can die due to proxy idle timeouts. Heartbeats solve two problems:
| Problem | How Heartbeats Help |
|---|---|
| Client uncertainty | Confirms the run is still alive |
| Proxy timeouts | Prevents "idle" connection termination |
Important: Heartbeats alone won't save you if your proxy buffers events or times out aggressively. You need proper proxy configuration.
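One way to interleave heartbeats with real events is a timeout on the event queue: if nothing arrives within the interval, emit an SSE comment frame. A sketch with asyncio (the interval is shortened for demonstration; real deployments typically use 15-30 seconds):

```python
import asyncio

HEARTBEAT_INTERVAL = 0.05  # demo value; production intervals are usually 15-30s

async def stream_with_heartbeats(queue):
    """Yield frames from the workflow queue; when nothing arrives within the
    interval, emit an SSE comment so proxies never see an idle connection."""
    while True:
        try:
            frame = await asyncio.wait_for(queue.get(), timeout=HEARTBEAT_INTERVAL)
        except asyncio.TimeoutError:
            yield ": ping\n\n"  # comment frame: EventSource ignores it
            continue
        if frame is None:  # sentinel: run complete
            return
        yield frame

async def demo():
    q: asyncio.Queue = asyncio.Queue()

    async def workflow():
        await asyncio.sleep(0.12)  # a slow tool call, long enough to need heartbeats
        await q.put("event: token\ndata: {}\n\n")
        await q.put(None)

    task = asyncio.ensure_future(workflow())
    frames = [f async for f in stream_with_heartbeats(q)]
    await task
    return frames

frames = asyncio.run(demo())
```

Comment frames (lines beginning with `:`) are part of the SSE spec: they count as traffic for proxy idle timers but are invisible to `EventSource` consumers.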
How Do You Configure NGINX for Long-Running SSE Streams?
If you proxy SSE through NGINX, the default configuration often ruins long-running streams.
Required NGINX directives for SSE:
```nginx
# Prevent 60-second timeout (default)
proxy_read_timeout 3600s;
proxy_send_timeout 3600s;

# Disable buffering for real-time delivery
proxy_buffering off;

# Enable streaming
chunked_transfer_encoding on;
```

Symptoms without proper configuration:
| Symptom | Cause |
|---|---|
| Streams cut off mid-tool | proxy_read_timeout too short |
| Events delivered in bursts | proxy_buffering on (default) |
| 499/504 errors | Timeout during long operations |
| Delayed tool results | Buffering accumulates events |
How Does Redis Pub/Sub Improve SSE Reliability?
Connections drop. Browsers refresh. Wi‑Fi dies.
The problem: If workflow state is tied to the SSE connection, you lose everything on disconnect.
Skyflo's solution: Redis pub/sub keyed by run ID:
```
Workflow → Redis pub/sub (run_id) → SSE stream(s)
                ↑
          Source of truth
```

Benefits of Redis-backed streaming:
| Feature | Benefit |
|---|---|
| Stop signals | Can interrupt from any client |
| Consistent events | All clients see same sequence |
| Multi-client support | Slack bridge, CLI, web UI |
| Decoupled state | Workflow continues if client disconnects |
Key principle: Treat the SSE stream as a view, not the source of truth.
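To make the fan-out semantics concrete, here is an in-memory stand-in for the pub/sub layer. Assumption: Skyflo uses real Redis channels keyed by run ID; plain lists mimic the behavior here so the sketch runs without a Redis server:

```python
import json
from collections import defaultdict

class Broker:
    """In-memory stand-in for Redis pub/sub keyed by run ID."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # run_id -> list of subscriber queues

    def subscribe(self, run_id):
        queue = []
        self.subscribers[run_id].append(queue)
        return queue

    def publish(self, run_id, event_type, payload):
        msg = json.dumps({"type": event_type, "payload": payload})
        for queue in self.subscribers[run_id]:
            queue.append(msg)  # every subscriber sees the same sequence

broker = Broker()
web_ui = broker.subscribe("run-42")   # browser SSE stream
cli = broker.subscribe("run-42")      # CLI attached to the same run
broker.publish("run-42", "workflow", {"state": "executing"})
broker.publish("run-42", "token", {"text": "Checking pods..."})
```

The workflow only ever talks to the broker; a dropped browser connection removes one view without touching the run itself.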
Why Is "Stop" the Most Important AI Agent Feature?
In ops, the most important button isn't "send."
It's "stop."
| Scenario | Why Stop Matters |
|---|---|
| Wrong operation | Cancel before damage |
| Runaway execution | End expensive LLM calls |
| Changed context | New information invalidates plan |
| User error | Mistyped prompt, wrong intent |
Skyflo's stop implementation:
- Honored mid-stream at any point
- Propagates through Redis pub/sub
- Cleans up pending tool executions
- Returns control immediately
If you're building an agent, implement stop early. Everything else is a nice demo until you do.
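A minimal sketch of phase-boundary stop checks. The Redis wiring is replaced by an in-process flag for clarity; in Skyflo the signal propagates through the run's pub/sub channel:

```python
class StopRequested(Exception):
    """Raised at a phase boundary once a stop signal has been observed."""

class Run:
    """Cooperative cancellation sketch. Assumption: the flag is backed by a
    Redis stop message keyed by run ID; a plain bool stands in here."""
    def __init__(self):
        self._stop = False

    def request_stop(self):
        self._stop = True  # in production: publish to the run's channel

    def checkpoint(self):
        """Call between phases and between tool calls to honor stop promptly."""
        if self._stop:
            raise StopRequested()

run = Run()
completed = []
try:
    for phase in ["plan", "execute", "verify"]:
        run.checkpoint()
        completed.append(phase)
        if phase == "execute":
            run.request_stop()  # simulate the user clicking "stop" mid-run
except StopRequested:
    pass  # clean up pending tool executions here, then return control
```

The key design choice is cooperative cancellation: the workflow checks the flag at well-defined boundaries rather than being killed mid-operation, so cleanup is deterministic.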
Related articles:
- v0.2.0: The Rebuild — From WebSockets to SSE
- Inside Skyflo's LangGraph Workflow: Plan → Execute → Verify
- Real-Time Token Metrics: TTFT, TTR, Cached Tokens, and Cost
FAQ: SSE Streaming for AI Agents
What is Server-Sent Events (SSE)? SSE is a standard HTTP-based technology for servers to push updates to clients. Unlike WebSockets, it's one-directional (server to client) and works with standard HTTP infrastructure without special proxy configuration.
Why does SSE work better than WebSockets for AI agents? SSE is simpler, works with standard proxies and load balancers, handles reconnection automatically via the EventSource API, and doesn't require stateful connection management.
What NGINX configuration is required for SSE? Set proxy_read_timeout and proxy_send_timeout to at least 3600s, disable proxy_buffering, and enable chunked_transfer_encoding. Without these, streams will time out or events will be buffered into bursts.
How do heartbeats work in SSE? The server periodically sends a small "heartbeat" event to keep the connection alive and prove the workflow is still running. This prevents proxy idle timeouts and reassures clients.
Why use Redis pub/sub with SSE? Redis decouples workflow state from the HTTP connection. This enables stop signals from any client, consistent event delivery across multiple clients, and workflow continuation even if the browser disconnects.
How do you implement stop functionality for AI agents? Publish a stop signal to the Redis channel keyed by run ID. The workflow checks for stop signals at each phase boundary and terminates gracefully, returning control to the user.