
# Multi-LLM Support

Skyflo uses LiteLLM for multi-provider LLM support, so there is no vendor lock-in on the AI layer. Configure the model and API key in Helm values; the Engine routes all LLM calls through LiteLLM.

## Supported Providers

| Provider | Model Example | Config Key |
| --- | --- | --- |
| Google Gemini (recommended) | gemini/gemini-2.5-pro | geminiApiKey |
| OpenAI | openai/gpt-5.3-codex | openaiApiKey |
| Anthropic | anthropic/claude-sonnet-4-6 | anthropicApiKey |
| Moonshot | moonshot/kimi-k2.5 | moonshotApiKey |
| DeepSeek | deepseek/deepseek-reasoner | deepseekApiKey |
| Groq | groq/openai/gpt-oss-120b | groqApiKey |
| Mistral | mistral/mistral-large | mistralApiKey |
| Ollama (self-hosted) | ollama/llama3.1:8b | ollamaApiKey (+ set llmHost) |
| AWS Bedrock | bedrock/... | awsAccessKeyId + awsSecretAccessKey |
| Azure OpenAI | azure/... | azureApiKey |

Beyond the providers above, LiteLLM also supports HuggingFace, Databricks, Fireworks AI, Together AI, NVIDIA NIM, Perplexity, and xAI, among others. See LiteLLM's full list of supported models.
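Every model string in the table follows LiteLLM's provider-prefix convention: the segment before the first slash selects the provider, and the remainder names the model. A minimal sketch of that convention (the `split_model_string` helper is illustrative, not part of Skyflo or LiteLLM):

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split a LiteLLM-style model string into (provider, model_name)."""
    provider, sep, name = model.partition("/")
    if not sep:
        # Bare model names with no prefix are treated as OpenAI by LiteLLM.
        return "openai", model
    return provider, name

print(split_model_string("gemini/gemini-2.5-pro"))   # ('gemini', 'gemini-2.5-pro')
print(split_model_string("groq/openai/gpt-oss-120b"))  # ('groq', 'openai/gpt-oss-120b')
```

Note that only the first slash matters, which is why nested strings like groq/openai/gpt-oss-120b route correctly: Groq is the provider, and the rest is passed through as the model name.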

## Configuration

Set engine.secrets.llmModel to the LiteLLM model string. Set the corresponding API key in engine.secrets:

```yaml
engine:
  secrets:
    llmModel: gemini/gemini-2.5-pro
    geminiApiKey: AI...
```

For self-hosted models (Ollama or other local endpoints), also set engine.secrets.llmHost:

```yaml
engine:
  secrets:
    llmModel: ollama/llama3.1:8b
    llmHost: http://ollama.default.svc:11434
```

## Thinking / Reasoning Models

Skyflo auto-detects reasoning capabilities using LiteLLM's model registry and enables reasoning automatically for models with native support:

  • OpenAI o-series (o1, o3)
  • Anthropic Claude with extended thinking
  • DeepSeek-R1

When detected, reasoning is enabled at high effort by default. The reasoning process streams to the Command Center in collapsible thinking blocks, giving full visibility into the model's chain of thought.

Setting llmReasoningEffort to high yields the best results with Skyflo.
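The auto-detection described above can be sketched as a registry lookup. This is a hedged illustration: Skyflo actually consults LiteLLM's model registry, and `REASONING_MODELS`, `supports_reasoning`, and `reasoning_params` here are stand-in names, not real Skyflo or LiteLLM APIs.

```python
# Illustrative stand-in for LiteLLM's model registry (not the real data).
REASONING_MODELS = {
    "o1", "o3",              # OpenAI o-series
    "claude-sonnet-4-6",     # Anthropic extended thinking (example)
    "deepseek-reasoner",     # DeepSeek-R1
}

def supports_reasoning(model: str) -> bool:
    """Check the bare model name (provider prefix stripped) against the registry."""
    name = model.rsplit("/", 1)[-1]
    return name in REASONING_MODELS

def reasoning_params(model: str, effort: str = "high") -> dict:
    """Return reasoning settings only when the model natively supports it."""
    if not supports_reasoning(model):
        return {}
    return {"reasoning_effort": effort}

print(reasoning_params("deepseek/deepseek-reasoner"))  # {'reasoning_effort': 'high'}
print(reasoning_params("mistral/mistral-large"))       # {}
```

The key behavior to note is that reasoning settings are simply omitted for models without native support, so non-reasoning models run unchanged.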

### Configuration

Override defaults via Helm values under engine.secrets:

| Setting | Default | Description |
| --- | --- | --- |
| llmReasoningEffort | high | Reasoning effort level (low, medium, high) |
| llmThinkingBudgetTokens | 10000 | Anthropic-specific thinking budget |
| llmMaxTokens | 16384 | Max output tokens when thinking is enabled |

Higher effort yields more thorough reasoning for complex multi-step operations but increases latency and cost.
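For example, a Helm values fragment that sets these keys explicitly (the values shown mirror the defaults from the table above):

```yaml
engine:
  secrets:
    llmReasoningEffort: high          # low | medium | high
    llmThinkingBudgetTokens: 10000    # Anthropic-specific thinking budget
    llmMaxTokens: 16384               # max output tokens when thinking is enabled
```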

## Best Practice

Start with gemini/gemini-2.5-pro for production use. It handles multi-step planning and tool orchestration well across Kubernetes, Helm, and CI/CD workflows.

For read-heavy workflows (discovery, logs, status), lighter models can suffice. For mutations and rollbacks, prefer a model with strong reasoning.