Why Does Real-Time Streaming Change User Trust in AI Agents?
People underestimate how much "real-time" changes trust.
| Agent Behavior | User Perception |
|---|---|
| 90-second wait, then full response | "I tolerate it" |
| Streaming what it's doing live | "It's like a teammate" |
If an agent takes 90 seconds and then prints a single final paragraph, you don't trust it—you tolerate it. If the agent streams what it's doing, you start treating it like a collaborator.
Skyflo streams over Server-Sent Events (SSE). It's not trendy. It's just the right tool for the job.
Why Choose SSE Over WebSockets for AI Agent Streaming?
WebSockets are powerful, but they introduce complexity:
| Aspect | WebSockets | SSE |
|---|---|---|
| Connection state | Stateful, complex lifecycle | Stateless HTTP |
| Proxy support | Often problematic | Standard HTTP works |
| Failure modes | Many partial failure states | Simpler error handling |
| Reconnection | Manual implementation | Built into EventSource |
| Direction | Bidirectional | Server → Client (sufficient for streaming) |
SSE wins in self-hosted environments (Kubernetes clusters with wildly different ingress setups) because boring compatibility beats powerful features that break in production.
What Events Should AI Agents Stream Beyond Tokens?
Streaming only text is a common early mistake. It looks cool, but it hides the actual work.
Skyflo streams four event types:
| Event Type | Purpose | Example |
|---|---|---|
| token | LLM output narration | "I'll check the pod status..." |
| workflow | State transitions | executing, awaiting_approval |
| tool | Real work execution | Tool name, arguments, results |
| approval | Safety boundaries | Pending approval details |
This makes the UI feel deterministic even when the model is probabilistic.
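As a sketch, the four event types map naturally onto SSE's `event:`/`data:` framing. The payload field names below are illustrative, not Skyflo's actual schema:

```python
import json

def sse_event(event_type: str, data: dict) -> str:
    """Format one SSE frame: an `event:` line, a `data:` line, a blank line."""
    return f"event: {event_type}\ndata: {json.dumps(data)}\n\n"

# Hypothetical payloads for the four event kinds Skyflo streams:
frames = [
    sse_event("token", {"text": "I'll check the pod status..."}),
    sse_event("workflow", {"state": "awaiting_approval"}),
    sse_event("tool", {"name": "kubectl_get_pods", "status": "ok"}),
    sse_event("approval", {"call_id": "abc123", "action": "delete_pod"}),
]
```

Because each frame carries a named event type, the UI can route tokens to the chat transcript, workflow states to a status bar, and approvals to a modal, instead of guessing from raw text.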
What SSE Endpoints Does Skyflo's Engine Expose?
Skyflo's Engine exposes SSE on two primary endpoints:
```
# Main chat interaction
POST /api/v1/agent/chat

# Approval flow continuation
POST /api/v1/agent/approvals/{call_id}
```

Why approvals continue as a stream: Operators want to see exactly what happened after they clicked "approve"—the tool execution, any errors, and the verification results.
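Since both endpoints are POSTs, the browser's `EventSource` (which only issues GETs) can't consume them directly; clients read the response body and parse the SSE framing themselves. A minimal parser sketch, not Skyflo's actual client code:

```python
def parse_sse(lines):
    """Turn an iterable of text lines into (event, data) pairs.
    A blank line terminates one SSE event; lines starting with ':' are comments."""
    event, data = None, []
    for line in lines:
        if line.startswith(":"):
            continue  # heartbeat/comment line, carries no payload
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and (event or data):
            yield (event or "message", "\n".join(data))
            event, data = None, []

raw = [
    "event: workflow", 'data: {"state": "executing"}', "",
    ": ping", "",
    "event: token", 'data: {"text": "Checking..."}', "",
]
events = list(parse_sse(raw))
```

The same parser works whether the lines come from `httpx.stream(...)`, a CLI pipe, or a test fixture, which keeps the transport swappable.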
Why Are Heartbeats Critical for SSE Connections?
SSE connections can die due to proxy idle timeouts. Heartbeats solve two problems:
| Problem | How Heartbeats Help |
|---|---|
| Client uncertainty | Confirms the run is still alive |
| Proxy timeouts | Prevents "idle" connection termination |
Important: Heartbeats alone won't save you if your proxy buffers events or times out aggressively. You need proper proxy configuration.
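One way to interleave heartbeats with real events is a timeout on the event queue: if nothing arrives within the interval, emit an SSE comment frame. A sketch with asyncio (the interval is shortened for demonstration; real deployments typically use 15-30 seconds):

```python
import asyncio

HEARTBEAT_INTERVAL = 0.05  # demo value; production intervals are usually 15-30s

async def stream_with_heartbeats(queue):
    """Yield frames from the workflow queue; when nothing arrives within the
    interval, emit an SSE comment so proxies never see an idle connection."""
    while True:
        try:
            frame = await asyncio.wait_for(queue.get(), timeout=HEARTBEAT_INTERVAL)
        except asyncio.TimeoutError:
            yield ": ping\n\n"  # comment frame: EventSource ignores it
            continue
        if frame is None:  # sentinel: run complete
            return
        yield frame

async def demo():
    q: asyncio.Queue = asyncio.Queue()

    async def workflow():
        await asyncio.sleep(0.12)  # a slow tool call, long enough to need heartbeats
        await q.put("event: token\ndata: {}\n\n")
        await q.put(None)

    task = asyncio.ensure_future(workflow())
    frames = [f async for f in stream_with_heartbeats(q)]
    await task
    return frames

frames = asyncio.run(demo())
```

Comment frames (lines beginning with `:`) are part of the SSE spec: they count as traffic for proxy idle timers but are invisible to `EventSource` consumers.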
How Do You Configure NGINX for Long-Running SSE Streams?
If you proxy SSE through NGINX, the default configuration often ruins long-running streams.
Required NGINX directives for SSE:
```nginx
# Prevent 60-second timeout (default)
proxy_read_timeout 3600s;
proxy_send_timeout 3600s;

# Disable buffering for real-time delivery
proxy_buffering off;

# Enable streaming
chunked_transfer_encoding on;
```

Symptoms without proper configuration:
| Symptom | Cause |
|---|---|
| Streams cut off mid-tool | proxy_read_timeout too short |
| Events delivered in bursts | proxy_buffering on (default) |
| 499/504 errors | Timeout during long operations |
| Delayed tool results | Buffering accumulates events |
How Does Redis Pub/Sub Improve SSE Reliability?
Connections drop. Browsers refresh. Wi‑Fi dies.
The problem: If workflow state is tied to the SSE connection, you lose everything on disconnect.
Skyflo's solution: Redis pub/sub keyed by run ID:
```
Workflow → Redis pub/sub (run_id) → SSE stream(s)
                ↑
          Source of truth
```

Benefits of Redis-backed streaming:
| Feature | Benefit |
|---|---|
| Stop signals | Can interrupt from any client |
| Consistent events | All clients see same sequence |
| Multi-client support | Slack bridge, CLI, web UI |
| Decoupled state | Workflow continues if client disconnects |
Key principle: Treat the SSE stream as a view, not the source of truth.
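To make the fan-out semantics concrete, here is an in-memory stand-in for the pub/sub layer. Assumption: Skyflo uses real Redis channels keyed by run ID; plain lists mimic the behavior here so the sketch runs without a Redis server:

```python
import json
from collections import defaultdict

class Broker:
    """In-memory stand-in for Redis pub/sub keyed by run ID."""
    def __init__(self):
        self.subscribers = defaultdict(list)  # run_id -> list of subscriber queues

    def subscribe(self, run_id):
        queue = []
        self.subscribers[run_id].append(queue)
        return queue

    def publish(self, run_id, event_type, payload):
        msg = json.dumps({"type": event_type, "payload": payload})
        for queue in self.subscribers[run_id]:
            queue.append(msg)  # every subscriber sees the same sequence

broker = Broker()
web_ui = broker.subscribe("run-42")   # browser SSE stream
cli = broker.subscribe("run-42")      # CLI attached to the same run
broker.publish("run-42", "workflow", {"state": "executing"})
broker.publish("run-42", "token", {"text": "Checking pods..."})
```

The workflow only ever talks to the broker; a dropped browser connection removes one view without touching the run itself.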
Why Is "Stop" the Most Important AI Agent Feature?
In ops, the most important button isn't "send."
It's "stop."
| Scenario | Why Stop Matters |
|---|---|
| Wrong operation | Cancel before damage |
| Runaway execution | End expensive LLM calls |
| Changed context | New information invalidates plan |
| User error | Mistyped prompt, wrong intent |
Skyflo's stop implementation:
- Honored mid-stream at any point
- Propagates through Redis pub/sub
- Cleans up pending tool executions
- Returns control immediately
If you're building an agent, implement stop early. Everything else is a nice demo until you do.
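A minimal sketch of phase-boundary stop checks. The Redis wiring is replaced by an in-process flag for clarity; in Skyflo the signal propagates through the run's pub/sub channel:

```python
class StopRequested(Exception):
    """Raised at a phase boundary once a stop signal has been observed."""

class Run:
    """Cooperative cancellation sketch. Assumption: the flag is backed by a
    Redis stop message keyed by run ID; a plain bool stands in here."""
    def __init__(self):
        self._stop = False

    def request_stop(self):
        self._stop = True  # in production: publish to the run's channel

    def checkpoint(self):
        """Call between phases and between tool calls to honor stop promptly."""
        if self._stop:
            raise StopRequested()

run = Run()
completed = []
try:
    for phase in ["plan", "execute", "verify"]:
        run.checkpoint()
        completed.append(phase)
        if phase == "execute":
            run.request_stop()  # simulate the user clicking "stop" mid-run
except StopRequested:
    pass  # clean up pending tool executions here, then return control
```

The key design choice is cooperative cancellation: the workflow checks the flag at well-defined boundaries rather than being killed mid-operation, so cleanup is deterministic.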
Related articles:
- v0.2.0: The Rebuild — From WebSockets to SSE
- Inside Skyflo's LangGraph Workflow: Plan → Execute → Verify
- Real-Time Token Metrics: TTFT, TTR, Cached Tokens, and Cost
FAQ: SSE Streaming for AI Agents
What is Server-Sent Events (SSE)? SSE is a standard HTTP-based technology for servers to push updates to clients. Unlike WebSockets, it's one-directional (server to client) and works with standard HTTP infrastructure without special proxy configuration.
Why does SSE work better than WebSockets for AI agents? SSE is simpler, works with standard proxies and load balancers, handles reconnection automatically via the EventSource API, and doesn't require stateful connection management.
What NGINX configuration is required for SSE? Set proxy_read_timeout and proxy_send_timeout to at least 3600s, disable proxy_buffering, and enable chunked_transfer_encoding. Without these, streams will time out or events will be buffered into bursts.
How do heartbeats work in SSE? The server periodically sends a small "heartbeat" event to keep the connection alive and prove the workflow is still running. This prevents proxy idle timeouts and reassures clients.
Why use Redis pub/sub with SSE? Redis decouples workflow state from the HTTP connection. This enables stop signals from any client, consistent event delivery across multiple clients, and workflow continuation even if the browser disconnects.
How do you implement stop functionality for AI agents? Publish a stop signal to the Redis channel keyed by run ID. The workflow checks for stop signals at each phase boundary and terminates gracefully, returning control to the user.