Blog

Deep dives on Skyflo’s architecture, safety model, and releases — written for DevOps and SRE teams shipping real systems.

2025-12-14 · 9 min

Token + Latency Analytics: Building a Dashboard That Engineers Actually Use

Turning TTFT/TTR and cost into trends, budgets, and actionable insights across your conversations.

roadmap, analytics, metrics, ui

2025-12-07 · 12 min

Slack as an Ops Console: Bringing Human‑in‑the‑Loop to Where Work Happens

A single-tenant Slack bridge plan: streamed updates, approvals in-thread, and guardrails that don’t feel heavy.

slack, integrations, roadmap, safety

2025-11-30 · 10 min

Auto‑Summarization for Long Conversations: Keep Context, Cut the Tax

A design for summarizing older turns when you approach context limits—without losing the details operators care about.

roadmap, engine, context

2025-11-23 · 13 min

Programmatic Tool Calling: When an LLM Should Write Glue Code

Loops, batching, parallelism, and summarization—where code beats prompts, and how to sandbox it safely.

roadmap, orchestration, security, ai-agents

2025-11-16 · 12 min

The Case for Tool Search: Shrinking Context Without Losing Capability

A roadmap post: defer tool schemas until needed, reduce token bloat, and keep the agent accurate under pressure.

roadmap, context, mcp, ai-agents

2025-11-09 · 8 min

Kubernetes Metrics for AI Agents: `kubectl top` Tools and What They Unlock

Adding read-only metrics tools so an agent can answer the question everyone asks first: “what’s hot right now?”

kubernetes, metrics, mcp

2025-11-02 · 7 min

Helm Template as a Safety Primitive: Preview Before You Touch the Cluster

Rendering manifests with inline values, catching surprises early, and building a diff-first culture.

helm, kubernetes, safety

2025-10-26 · 8 min

Kubernetes Rollbacks with Confidence: Rollout History + Undo as First‑Class Tools

Shipping safe rollback primitives for deployments/daemonsets/statefulsets—and where approvals belong.

kubernetes, mcp, safety

2025-10-19 · 9 min

Designing a Terminal‑Inspired UI That’s Actually Accessible

Focus, live regions, contrast, and keyboard navigation—what we changed to make a command-center UI work for everyone.

accessibility, ui, design

2025-10-12 · 11 min

Real‑Time Token Metrics: TTFT, TTR, Cached Tokens, and Cost (Trust Builders)

Operators don’t trust black boxes. Here’s how we expose LLM latency and usage without spamming the UI.

observability, metrics, ui, engine

2025-10-05 · 10 min

FastMCP Streamable HTTP: Migrating Off Legacy SSE Transport

Why we moved, what broke, and how Streamable HTTP made MCP communication simpler and more reliable.

mcp, reliability, http, architecture

2025-09-28 · 9 min

v0.3.2: Batch Approvals Without Losing Safety (Approve All, Safely)

Designing bulk approval controls that respect read-only tools, remain idempotent, and keep the operator in control.

release, ui, safety

2025-09-21 · 8 min

v0.3.1: Chat Queueing + Server‑Side History Search (UX for Real Operators)

Why fast history, debounced search, and prompt queueing matter when you’re triaging an incident at 2am.

release, ui, ux, reliability

2025-09-14 · 9 min

Storing Integration Credentials the Boring Way: Kubernetes Secrets + References

How Skyflo avoids leaking secrets into prompts, keeps credentials server-side, and still feels seamless in the UI.

security, kubernetes, integrations

2025-09-07 · 12 min

Jenkins in Skyflo: Secure Auth, CSRF, and Parameter‑Aware Builds

A deep dive into the Jenkins toolset, integration-aware discovery, and why builds must be parameter-first.

jenkins, ci-cd, integrations, security

2025-08-31 · 7 min

v0.2.0: The Rebuild — From WebSockets to SSE and a Cleaner Agent Core

What we learned rebuilding Skyflo’s core loop, and why “simpler” was the biggest performance unlock.

release, architecture, sse

2025-08-24 · 8 min

SSE Done Right: Streaming Tokens + Tool Events Without Melting Your Proxy

A hands-on guide to reliable server-sent events for long-running infra tasks, including NGINX hardening.

sse, reliability, nginx, engine

2025-08-17 · 11 min

MCP in Practice: Standardizing DevOps Tools So AI Can’t Go Rogue

Why Skyflo’s MCP server exists, how tools are validated, and what “readOnlyHint” really buys you in prod.

mcp, tooling, security, kubernetes

2025-08-10 · 10 min

Inside Skyflo’s LangGraph Workflow: Plan → Execute → Verify (Without the Hype)

How Skyflo compiles a compact graph, streams progress, and decides when to continue, stop, or request approval.

architecture, langgraph, engine, streaming

2025-08-03 · 9 min

Why Human-in-the-Loop Is Non‑Negotiable for AI in Production Ops

A practical look at approvals, safety gates, and why “agent autonomy” should still ship with guardrails.

safety, security, kubernetes, ai-agents