# Multi-LLM Support
Skyflo uses LiteLLM for multi-provider LLM support, so there is no vendor lock-in on the AI layer. Configure the model and API key in Helm values; the Engine routes all LLM calls through LiteLLM.
## Supported Providers
| Provider | Model Example | Config Key |
|---|---|---|
| Google Gemini (recommended) | gemini/gemini-2.5-pro | geminiApiKey |
| OpenAI | openai/gpt-5.3-codex | openaiApiKey |
| Anthropic | anthropic/claude-sonnet-4-6 | anthropicApiKey |
| Moonshot | moonshot/kimi-k2.5 | moonshotApiKey |
| DeepSeek | deepseek/deepseek-reasoner | deepseekApiKey |
| Groq | groq/openai/gpt-oss-120b | groqApiKey |
| Mistral | mistral/mistral-large | mistralApiKey |
| Ollama (self-hosted) | ollama/llama3.1:8b | ollamaApiKey (+ set llmHost) |
| AWS Bedrock | bedrock/... | awsAccessKeyId + awsSecretAccessKey |
| Azure OpenAI | azure/... | azureApiKey |
Beyond the providers above, LiteLLM also supports HuggingFace, Databricks, Fireworks AI, Together AI, NVIDIA NIM, Perplexity, and xAI, among others. See LiteLLM's full list of supported models.
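Any provider in the table plugs in the same way: set the model string and the matching key. For example, pointing the Engine at Anthropic instead of Gemini looks like this (the API key value is a placeholder):

```yaml
engine:
  secrets:
    llmModel: anthropic/claude-sonnet-4-6
    anthropicApiKey: sk-ant-...   # placeholder, use your own key
```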
## Configuration
Set engine.secrets.llmModel to the LiteLLM model string, and set the corresponding API key in engine.secrets:

```yaml
engine:
  secrets:
    llmModel: gemini/gemini-2.5-pro
    geminiApiKey: AI...
```

For self-hosted models (Ollama or other local endpoints), also set engine.secrets.llmHost:

```yaml
engine:
  secrets:
    llmModel: ollama/llama3.1:8b
    llmHost: http://ollama.default.svc:11434
```

## Thinking / Reasoning Models
Skyflo auto-detects reasoning capabilities using LiteLLM's model registry. Models with native reasoning support are enabled automatically:
- OpenAI o-series (o1, o3)
- Anthropic Claude with extended thinking
- DeepSeek-R1
When detected, reasoning is enabled at high effort by default. The reasoning process streams to the Command Center in collapsible thinking blocks, giving full visibility into the model's chain of thought.
Setting llmReasoningEffort to high yields the best results with Skyflo.
### Configuration
Override defaults via Helm values under engine.secrets:
| Setting | Default | Description |
|---|---|---|
| llmReasoningEffort | high | Reasoning effort level (low, medium, or high) |
| llmThinkingBudgetTokens | 10000 | Anthropic-specific thinking budget, in tokens |
| llmMaxTokens | 16384 | Max output tokens when thinking is enabled |
Higher effort yields more thorough reasoning for complex multi-step operations but increases latency and cost.
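As a concrete example, the settings above can be overridden together in a values fragment like this (keys come from the table; the model and numbers are illustrative, and the API key is a placeholder):

```yaml
engine:
  secrets:
    llmModel: anthropic/claude-sonnet-4-6
    anthropicApiKey: sk-ant-...        # placeholder
    llmReasoningEffort: medium         # trade reasoning depth for latency
    llmThinkingBudgetTokens: 8000      # Anthropic-specific thinking budget
    llmMaxTokens: 16384
```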
## Best Practice
Start with gemini/gemini-2.5-pro for production use. It handles multi-step planning and tool orchestration well across Kubernetes, Helm, and CI/CD workflows.
For read-heavy workflows (discovery, logs, status), lighter models can suffice. For mutations and rollbacks, prefer a model with strong reasoning.
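Since the model is just a Helm value, switching providers is a values change plus an upgrade. A sketch, assuming a release named skyflo installed from a skyflo/skyflo chart (substitute your own release and chart names):

```shell
# Point the Engine at a different provider, keeping all other values intact.
helm upgrade skyflo skyflo/skyflo \
  --reuse-values \
  --set engine.secrets.llmModel=anthropic/claude-sonnet-4-6 \
  --set engine.secrets.anthropicApiKey=$ANTHROPIC_API_KEY
```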
