
# Multi-LLM Support

Skyflo uses LiteLLM for multi-provider LLM support, so there is no vendor lock-in on the AI layer. Configure the model and API key in Helm values; the Engine routes all LLM calls through LiteLLM.

## Supported Providers

| Provider | Model Example | Config Key |
| --- | --- | --- |
| Google Gemini (recommended) | gemini/gemini-2.5-pro | geminiApiKey |
| OpenAI | openai/gpt-5.3-codex | openaiApiKey |
| Anthropic | anthropic/claude-sonnet-4-6 | anthropicApiKey |
| Moonshot | moonshot/kimi-k2.5 | moonshotApiKey |
| DeepSeek | deepseek/deepseek-reasoner | deepseekApiKey |
| Groq | groq/openai/gpt-oss-120b | groqApiKey |
| Mistral | mistral/mistral-large | mistralApiKey |
| Ollama (self-hosted) | ollama/llama3.1:8b | ollamaApiKey (+ set llmHost) |
| AWS Bedrock | bedrock/... | awsAccessKeyId + awsSecretAccessKey |
| Azure OpenAI | azure/... | azureApiKey |

Beyond the providers above, LiteLLM also supports HuggingFace, Databricks, Fireworks AI, Together AI, NVIDIA NIM, Perplexity, and xAI, among others. See LiteLLM's full list of supported models.
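Every model string in the table follows LiteLLM's provider-prefix convention: the segment before the first slash selects the provider, and the remainder names the model. A minimal sketch of that convention (the `split_model_string` helper is illustrative, not part of Skyflo or LiteLLM):

```python
def split_model_string(model: str) -> tuple[str, str]:
    """Split a LiteLLM-style model string into (provider, model_name)."""
    provider, sep, name = model.partition("/")
    if not sep:
        # Bare model names with no prefix are treated as OpenAI by LiteLLM.
        return "openai", model
    return provider, name

print(split_model_string("gemini/gemini-2.5-pro"))   # ('gemini', 'gemini-2.5-pro')
print(split_model_string("groq/openai/gpt-oss-120b"))  # ('groq', 'openai/gpt-oss-120b')
```

Note that only the first slash matters, which is why nested strings like groq/openai/gpt-oss-120b route correctly: Groq is the provider, and the rest is passed through as the model name.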

## Configuration

Set engine.secrets.llmModel to the LiteLLM model string. Set the corresponding API key in engine.secrets:

```yaml
engine:
  secrets:
    llmModel: gemini/gemini-2.5-pro
    geminiApiKey: AI...
```

For self-hosted models (Ollama or other local endpoints), also set engine.secrets.llmHost:

```yaml
engine:
  secrets:
    llmModel: ollama/llama3.1:8b
    llmHost: http://ollama.default.svc:11434
```

## Thinking / Reasoning Models

Skyflo auto-detects reasoning capabilities using LiteLLM's model registry and enables reasoning automatically for models with native support:

  • OpenAI o-series (o1, o3)
  • Anthropic Claude with extended thinking
  • DeepSeek-R1

When detected, reasoning is enabled at high effort by default. The reasoning process streams to the Command Center in collapsible thinking blocks, giving full visibility into the model's chain of thought.

Setting llmReasoningEffort to high yields the best results with Skyflo.
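The auto-detection described above can be sketched as a registry lookup. This is a hedged illustration: Skyflo actually consults LiteLLM's model registry, and `REASONING_MODELS`, `supports_reasoning`, and `reasoning_params` here are stand-in names, not real Skyflo or LiteLLM APIs.

```python
# Illustrative stand-in for LiteLLM's model registry (not the real data).
REASONING_MODELS = {
    "o1", "o3",              # OpenAI o-series
    "claude-sonnet-4-6",     # Anthropic extended thinking (example)
    "deepseek-reasoner",     # DeepSeek-R1
}

def supports_reasoning(model: str) -> bool:
    """Check the bare model name (provider prefix stripped) against the registry."""
    name = model.rsplit("/", 1)[-1]
    return name in REASONING_MODELS

def reasoning_params(model: str, effort: str = "high") -> dict:
    """Return reasoning settings only when the model natively supports it."""
    if not supports_reasoning(model):
        return {}
    return {"reasoning_effort": effort}

print(reasoning_params("deepseek/deepseek-reasoner"))  # {'reasoning_effort': 'high'}
print(reasoning_params("mistral/mistral-large"))       # {}
```

The key behavior to note is that reasoning settings are simply omitted for models without native support, so non-reasoning models run unchanged.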

### Configuration

Override defaults via Helm values under engine.secrets:

| Setting | Default | Description |
| --- | --- | --- |
| llmReasoningEffort | high | Reasoning effort level (low, medium, high) |
| llmThinkingBudgetTokens | 10000 | Anthropic-specific thinking budget |
| llmMaxTokens | 16384 | Max output tokens when thinking is enabled |

Higher effort yields more thorough reasoning for complex multi-step operations but increases latency and cost.
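For example, a Helm values fragment that sets these keys explicitly (the values shown mirror the defaults from the table above):

```yaml
engine:
  secrets:
    llmReasoningEffort: high          # low | medium | high
    llmThinkingBudgetTokens: 10000    # Anthropic-specific thinking budget
    llmMaxTokens: 16384               # max output tokens when thinking is enabled
```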

## Best Practice

Start with gemini/gemini-2.5-pro for production use. It handles multi-step planning and tool orchestration well across Kubernetes, Helm, and CI/CD workflows.

For read-heavy workflows (discovery, logs, status), lighter models can suffice. For mutations and rollbacks, prefer a model with strong reasoning.