Why Do Kubernetes Rollbacks Often Fail in Practice?
Rollbacks are one of those operations teams talk about as if they're trivial—until they need one during an incident.
In real incidents, rollbacks fail because:
- No one remembers which revision was "good"
- Rollout history is cluttered and confusing
- Engineers panic and "just restart things" instead of rolling back
- Someone targets the wrong resource type (Deployment vs. StatefulSet vs. DaemonSet)
Skyflo treats rollbacks as a structured workflow, not a single command.
What Is the Safe Rollback Workflow?
A safe rollback follows this sequence:
| Step | Operation | Access Level |
|---|---|---|
| 1 | Inspect rollout history | Read-only |
| 2 | Identify target revision | Read-only |
| 3 | Request approval for rollback | Human decision |
| 4 | Execute rollback | Write (after approval) |
| 5 | Verify rollout status | Read-only |
The key insight: never undo without first inspecting history. If an agent executes `kubectl rollout undo` without checking which revisions exist, you'll eventually roll back to the wrong one.
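The sequence in the table can be sketched as a single function that refuses to reach the write step unless history was inspected and a human approved. This is a minimal illustration, not Skyflo's implementation: `safe_rollback`, `RollbackOutcome`, and the five injected callables are hypothetical names standing in for whatever actually talks to the cluster and to the operator.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RollbackOutcome:
    executed: bool = False   # True only if the undo callable was invoked
    verified: bool = False   # result of the post-rollback status check
    steps: List[str] = field(default_factory=list)


def safe_rollback(
    list_revisions: Callable[[], List[int]],
    pick_target: Callable[[List[int]], Optional[int]],
    ask_approval: Callable[[int], bool],
    undo_to: Callable[[int], None],
    rollout_healthy: Callable[[], bool],
) -> RollbackOutcome:
    """Run the five steps in order; stop before any write if a check fails."""
    outcome = RollbackOutcome()

    # Step 1 (read-only): inspect rollout history.
    revisions = list_revisions()
    outcome.steps.append("inspect_history")

    # Step 2 (read-only): identify the target; it must exist in the history.
    target = pick_target(revisions)
    outcome.steps.append("identify_target")
    if target is None or target not in revisions:
        return outcome

    # Step 3 (human decision): nothing is written without approval.
    outcome.steps.append("request_approval")
    if not ask_approval(target):
        return outcome

    # Step 4 (write): execute the rollback.
    undo_to(target)
    outcome.executed = True
    outcome.steps.append("execute_rollback")

    # Step 5 (read-only): verify rollout status.
    outcome.verified = rollout_healthy()
    outcome.steps.append("verify_status")
    return outcome
```

Passing the cluster operations in as callables keeps the ordering logic testable without a cluster: a denied approval or an unknown revision returns early, and the undo callable is never reached.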
How Does Skyflo Expose Rollout Tools via MCP?
Skyflo provides two MCP tools that enforce the safe sequence:
`k8s_rollout_history` (read-only)
`kubectl rollout history deployment/api-server -n production`
Returns the revision list with change causes, timestamps, and annotations.
`k8s_rollout_undo` (requires approval)
`kubectl rollout undo deployment/api-server -n production --to-revision=3`
Only executes after explicit human approval.
The agent's natural flow becomes:
- "Let me check the rollout history for this deployment"
- "I see revision 4 (current) was deployed 30 minutes ago, revision 3 was stable for 2 weeks"
- "I recommend rolling back to revision 3. [Approve] [Deny]"
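The history check in that flow can be illustrated with a small parser for the plain-text table that `kubectl rollout history` prints (a resource line, a `REVISION  CHANGE-CAUSE` header, then one row per revision). This is a sketch under assumptions: `Revision`, `parse_rollout_history`, and `previous_revision` are hypothetical helper names, and the `SAMPLE` text, including its change causes, is made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    number: int
    change_cause: Optional[str]  # None when kubectl prints "<none>"


def parse_rollout_history(text: str) -> List[Revision]:
    """Parse the REVISION / CHANGE-CAUSE table into a list sorted by revision."""
    revisions = []
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip the resource name line, the header row, and blank lines.
        if not parts or not parts[0].isdigit():
            continue
        cause = parts[1].strip() if len(parts) > 1 else ""
        has_cause = bool(cause) and cause != "<none>"
        revisions.append(Revision(int(parts[0]), cause if has_cause else None))
    return sorted(revisions, key=lambda r: r.number)


def previous_revision(revisions: List[Revision]) -> Optional[Revision]:
    """Return the revision just before the current (highest-numbered) one.

    Expects the sorted list produced by parse_rollout_history.
    """
    return revisions[-2] if len(revisions) >= 2 else None


# Made-up sample output; revision numbers may have gaps after earlier rollbacks.
SAMPLE = """deployment.apps/api-server
REVISION  CHANGE-CAUSE
1         <none>
3         image: api-server:1.8.2
4         image: api-server:1.9.0
"""
```

With the sample above, `previous_revision(parse_rollout_history(SAMPLE))` yields revision 3, which is the candidate an agent would then present for approval rather than act on directly.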
Why Do Resource Types Matter for Rollbacks?
Rollout commands behave differently across Kubernetes resource types:
| Resource | Rollout Support | Notes |
|---|---|---|
| Deployment | Full | Most common rollback target |
| DaemonSet | Full | Rolling updates across nodes |
| StatefulSet | Full | Ordered rollback with pod identity |
| ReplicaSet | None | Managed by Deployments |
Skyflo's tools explicitly require resource type as a parameter rather than guessing. Guessing is how you end up rolling back a deployment when you meant a statefulset.
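One way to picture an explicit resource-type parameter is a command builder that rejects anything outside the three kinds in the table. This is a hedged sketch, not Skyflo's actual tool code: `ROLLOUT_KINDS` and `build_undo_command` are hypothetical names, and the function only builds the argument list without running it.

```python
from typing import List

# Kinds that support rollout undo, per the table above.
ROLLOUT_KINDS = {"deployment", "daemonset", "statefulset"}


def build_undo_command(kind: str, name: str, namespace: str, to_revision: int) -> List[str]:
    """Build the argv for `kubectl rollout undo`; never guess the kind."""
    normalized = kind.strip().lower()
    if normalized not in ROLLOUT_KINDS:
        raise ValueError(
            f"unsupported kind {kind!r}; expected one of {sorted(ROLLOUT_KINDS)}"
        )
    if not name or not namespace:
        raise ValueError("name and namespace must both be given explicitly")
    if to_revision < 1:
        raise ValueError("to_revision must be a positive revision number")
    return [
        "kubectl", "rollout", "undo",
        f"{normalized}/{name}",
        "-n", namespace,
        f"--to-revision={to_revision}",
    ]
```

Calling `build_undo_command("deployment", "api-server", "production", 3)` produces the same command shown earlier, while passing `"replicaset"` raises a `ValueError` instead of silently picking a different workload.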
Why Do Approvals Belong on Undo Operations?
Rollout history is read-only, so it executes immediately; it only gathers information.
Rollout undo changes production state. Therefore:
- Undo requires explicit approval
- The approval prompt shows the namespace, resource name, and target revision
- The audit trail records who approved, when, and which revision
This gives operators a chance to verify they're rolling back:
- The correct namespace
- The correct resource
- The correct revision
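The approval prompt and audit trail described above can be modeled with two small records. This is only an illustrative sketch: `ApprovalRequest`, `AuditEntry`, and `record_decision` are hypothetical names, and the `kind` field is an assumption added so the prompt can name the resource unambiguously.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ApprovalRequest:
    namespace: str
    kind: str            # assumed field: resource type, e.g. "deployment"
    name: str
    target_revision: int

    def summary(self) -> str:
        """Text shown to the operator before anything is changed."""
        return (
            f"Roll back {self.kind}/{self.name} in namespace "
            f"{self.namespace} to revision {self.target_revision}?"
        )


@dataclass(frozen=True)
class AuditEntry:
    approver: str
    approved: bool
    decided_at: datetime
    request: ApprovalRequest


def record_decision(
    request: ApprovalRequest,
    approver: str,
    approved: bool,
    now: Optional[datetime] = None,
) -> AuditEntry:
    """Capture who decided, what they decided, when, and for which revision."""
    decided_at = now if now is not None else datetime.now(timezone.utc)
    return AuditEntry(approver, approved, decided_at, request)
```

Keeping the request immutable means the record that was approved is the same one that gets executed and logged, so the audit entry cannot drift from what the operator actually saw.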
FAQ: Kubernetes Rollbacks with AI Agents
What is kubectl rollout history? kubectl rollout history lists the recorded revisions of a deployment, daemonset, or statefulset along with each revision's change cause. Adding --revision=N shows the details of a single revision.
What is kubectl rollout undo? kubectl rollout undo reverts a workload to a previous revision. Without --to-revision, it rolls back to the immediately previous revision.
Why should you check history before rolling back? Checking history tells you which revision you would revert to. A blind undo can land on a revision that is just as broken as the current one, or on an older one that was broken for a different reason.
Do all Kubernetes resources support rollout? No. Deployments, DaemonSets, and StatefulSets support rollout. ReplicaSets, Pods, and Services do not have rollout history.