The Case for Tool Search: Shrinking Context Without Losing Capability

A roadmap post: defer tool schemas until needed, reduce token bloat, and keep the agent accurate under pressure.

12 min read
Tags: roadmap · context · mcp · ai-agents

What Is the Tool Context Problem in AI Agents?

As an AI agent's tool catalog grows, its performance degrades unless you architect for scale. The naive approach—loading every tool schema into the model context—works until you have:

  • 50+ tools with detailed schemas
  • Rich parameter descriptions and examples
  • Long conversation history
  • Complex nested tool definitions

Then you hit context bloat: the model becomes slower, less accurate, and more expensive per turn.


What Are the Hidden Costs of "All Tools Always" Architecture?

Loading all tool definitions upfront has compounding costs:

  • Token cost: Every turn pays for 50+ tool schemas (~5,000-15,000 tokens)
  • Attention dilution: Model becomes less confident in tool selection
  • Latency: Larger prompts mean slower inference
  • Scalability ceiling: Tool count becomes a hard constraint
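A back-of-envelope sketch makes the token cost concrete. The per-schema token counts below are illustrative assumptions, not measurements:

```python
# Back-of-envelope estimate of per-turn overhead from pre-loaded tool schemas.
# The per-schema token counts are illustrative assumptions, not measurements.
TOOL_COUNT = 50
TOKENS_PER_LEAN_SCHEMA = 100   # name + short description + a few parameters
TOKENS_PER_RICH_SCHEMA = 300   # long descriptions, examples, nested objects

low = TOOL_COUNT * TOKENS_PER_LEAN_SCHEMA   # 5,000 tokens
high = TOOL_COUNT * TOKENS_PER_RICH_SCHEMA  # 15,000 tokens

print(f"Every turn pays ~{low:,}-{high:,} tokens before any conversation history")
```

And that cost recurs on every turn, before a single word of conversation history is counted.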

When operators ask "why is the agent confused about which tool to use?"—often, the answer is context overload.


How Does Tool Search Solve Context Bloat?

Tool search inverts the loading model:

Before (All Tools Always):

```
[System prompt] + [50 tool schemas] + [Conversation] → LLM
```

After (Tool Search):

```
[System prompt] + [5 core tools + search_tools] + [Conversation] → LLM
                              ↓
                    [On demand: load specific tool schema]
```

The agent should:

  1. Keep a small always-loaded tool set — Core discovery tools, common operations
  2. Use `search_tools` capability — When the agent needs something outside the core set
  3. Load tool details on demand — Full schema fetched only when needed

This mirrors how humans work. We don't memorize every kubectl flag—we search documentation when needed.
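The three steps above can be sketched as a small discovery loop. Everything here is hypothetical: the registry, the tool names, and the `search_tools`/`get_tool_schema` helpers stand in for a real MCP client.

```python
# Minimal sketch of the on-demand loading loop. The registry and the two
# helper functions are hypothetical stand-ins for a real MCP client.
REGISTRY = {
    "helm_rollback": {
        "description": "Roll back a Helm release to a previous revision",
        "tags": ["helm"],
        "schema": {"release": "string", "revision": "integer"},
    },
    "k8s_get_pods": {
        "description": "List pods in a namespace",
        "tags": ["k8s"],
        "schema": {"namespace": "string"},
    },
}

CORE_TOOLS = {"k8s_get_pods"}  # step 1: always loaded; the rest is searchable

def search_tools(query: str) -> list[dict]:
    """Step 2: find candidate tools by keyword instead of pre-loading schemas."""
    q = query.lower()
    return [
        {"name": name, "description": meta["description"]}
        for name, meta in REGISTRY.items()
        if q in name.lower() or q in meta["description"].lower()
    ]

def get_tool_schema(tool_name: str) -> dict:
    """Step 3: fetch the full schema only once the agent commits to a tool."""
    return REGISTRY[tool_name]["schema"]

# A Helm rollback is outside the core set, so the agent searches,
# then loads just that one schema into context.
hits = search_tools("rollback")
schema = get_tool_schema(hits[0]["name"])
print(hits[0]["name"], schema)
```

Only the matching schema ever enters the prompt; the other 49 stay on the server.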


How Do You Implement Tool Search in MCP?

A practical implementation exposes two MCP tools:

1. `search_tools` (always loaded)

```json
{
  "name": "search_tools",
  "description": "Search available tools by keyword or category",
  "parameters": {
    "query": "string - search term",
    "category": "string - optional filter (k8s, helm, jenkins)"
  },
  "returns": [
    { "name": "tool_name", "title": "Human Title", "description": "Brief desc", "tags": ["k8s"] }
  ]
}
```

2. `get_tool_schema` (always loaded)

```json
{
  "name": "get_tool_schema",
  "description": "Get full schema for a specific tool",
  "parameters": {
    "tool_name": "string - exact tool name from search"
  },
  "returns": "Full tool schema with parameters"
}
```

This keeps baseline context small (~500 tokens for discovery tools) while maintaining access to unlimited capability.
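To see the baseline effect, here is a rough comparison of what enters the prompt each turn under the two designs. The catalog contents and the chars/4 token heuristic are illustrative assumptions; real counts depend on your schemas and tokenizer.

```python
import json

# Illustrative comparison: "all tools always" vs. "discovery tools only".
# len(json)/4 is a crude token proxy, not a real tokenizer.
def approx_tokens(obj) -> int:
    return len(json.dumps(obj)) // 4

discovery_tools = [
    {
        "name": "search_tools",
        "description": "Search available tools by keyword or category",
        "parameters": {"query": "string - search term",
                       "category": "string - optional filter (k8s, helm, jenkins)"},
    },
    {
        "name": "get_tool_schema",
        "description": "Get full schema for a specific tool",
        "parameters": {"tool_name": "string - exact tool name from search"},
    },
]

# 50 made-up full schemas standing in for a real catalog
full_catalog = [
    {"name": f"tool_{i}", "description": "long description " * 10,
     "parameters": {f"arg_{j}": "string" for j in range(5)}}
    for i in range(50)
]

baseline = approx_tokens(discovery_tools)
all_tools = approx_tokens(full_catalog)
print(f"discovery-only baseline ≈ {baseline} tokens; all-tools ≈ {all_tools} tokens")
```

Even with this crude proxy, the discovery-only baseline stays in the low hundreds of tokens while the full catalog runs into the thousands.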


What Is the Tradeoff?

Tool search adds one extra LLM turn whenever a new tool is needed. In exchange:

  • Smaller baseline prompts: 80%+ reduction in tool schema tokens
  • Sharper tool selection: Model sees only relevant tools for the task
  • Longer conversations: Stay within context budget much longer
  • Unlimited scaling: Add tools without degrading performance

For ops agents that need broad capability (Kubernetes + Helm + Jenkins + Argo + more), tool search is one of the highest-ROI architectural decisions you can make early.


FAQ: Tool Search for AI Agents

What is tool search in AI agents? Tool search is a pattern where agents discover and load tool schemas on demand rather than having all tools pre-loaded in context, reducing token usage and improving accuracy.

How much context does tool search save? Tool search can reduce tool-related context by 80%+ by replacing 50+ full schemas (~10,000+ tokens) with 2-3 discovery tools (~500 tokens).

Does tool search add latency? Yes, one additional LLM turn when a new tool is needed. However, the overall conversation becomes faster because smaller context means faster inference per turn.

When should tools be always-loaded vs. searchable? Always load: core discovery tools, very common operations (k8s get, describe). Make searchable: specialized tools, integrations that aren't always relevant.
