What Is the Tool Context Problem in AI Agents?
As an AI agent's tool catalog grows, its performance degrades unless you architect for scale. The naive approach—loading every tool schema into the model context—works until you have:
- 50+ tools with detailed schemas
- Rich parameter descriptions and examples
- Long conversation history
- Complex nested tool definitions
Then you hit context bloat: the model becomes slower, less accurate, and more expensive per turn.
What Are the Hidden Costs of "All Tools Always" Architecture?
Loading all tool definitions upfront has compounding costs:
| Cost Type | Impact |
|---|---|
| Token cost | Every turn pays for 50+ tool schemas (~5,000-15,000 tokens) |
| Attention dilution | Model becomes less confident in tool selection |
| Latency | Larger prompts mean slower inference |
| Scalability ceiling | Tool count becomes a hard constraint |
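As rough arithmetic: at ~200 tokens per schema, 50 tools add ~10,000 tokens to every single turn, so a 30-turn conversation spends on the order of 300,000 tokens on tool definitions alone.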
When operators ask "why is the agent confused about which tool to use?", the answer is often context overload.
How Does Tool Search Solve Context Bloat?
Tool search inverts the loading model:
Before (All Tools Always):

```
[System prompt] + [50 tool schemas] + [Conversation] → LLM
```

After (Tool Search):

```
[System prompt] + [5 core tools + search_tools] + [Conversation] → LLM
        ↓
[On demand: load specific tool schema]
```

The agent should:
- Keep a small always-loaded tool set — Core discovery tools, common operations
- Use `search_tools` capability — When the agent needs something outside the core set
- Load tool details on demand — Full schema fetched only when needed
This mirrors how humans work. We don't memorize every kubectl flag—we search documentation when needed.
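As a sketch of that discovery loop (function and tool names here are illustrative assumptions, not a specific agent framework or SDK):

```python
# Sketch of the on-demand discovery loop. Function and tool names are
# illustrative assumptions, not a specific agent framework or SDK.

CORE_SCHEMAS = {
    "k8s_get": {"parameters": {"resource": "string"}},
    "k8s_describe": {"parameters": {"resource": "string", "name": "string"}},
}

def resolve_tool(keyword: str, search, fetch_schema) -> dict:
    """Return a tool schema: from the always-loaded core set if possible,
    otherwise discovered and fetched on demand."""
    if keyword in CORE_SCHEMAS:
        return CORE_SCHEMAS[keyword]           # already in context, no extra turn
    matches = search(keyword)                  # lightweight summaries only
    if not matches:
        raise LookupError(f"no tool matches {keyword!r}")
    return fetch_schema(matches[0]["name"])    # full schema for exactly one tool
```

Here `search` and `fetch_schema` stand in for the `search_tools` and `get_tool_schema` calls described in the next section.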
How Do You Implement Tool Search in MCP?
A practical implementation exposes two MCP tools:
1. `search_tools` (always loaded)
```json
{
  "name": "search_tools",
  "description": "Search available tools by keyword or category",
  "parameters": {
    "query": "string - search term",
    "category": "string - optional filter (k8s, helm, jenkins)"
  },
  "returns": [
    { "name": "tool_name", "title": "Human Title", "description": "Brief desc", "tags": ["k8s"] }
  ]
}
```

2. `get_tool_schema` (always loaded)
```json
{
  "name": "get_tool_schema",
  "description": "Get full schema for a specific tool",
  "parameters": {
    "tool_name": "string - exact tool name from search"
  },
  "returns": "Full tool schema with parameters"
}
```

This keeps baseline context small (~500 tokens for discovery tools) while maintaining access to unlimited capability.
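To make this concrete, here is a minimal in-memory sketch of a registry that could back these two tools. It is an illustration under assumptions, not the MCP SDK: the `ToolEntry` type, the `helm_rollback` entry, and its schema are all hypothetical.

```python
# Minimal in-memory registry backing search_tools / get_tool_schema.
# All tool names, tags, and schemas below are illustrative.
from dataclasses import dataclass

@dataclass
class ToolEntry:
    name: str
    title: str
    description: str
    tags: list[str]
    schema: dict  # full parameter schema, sent to the model only on demand

REGISTRY: dict[str, ToolEntry] = {
    "helm_rollback": ToolEntry(
        name="helm_rollback",
        title="Helm Rollback",
        description="Roll back a Helm release to a previous revision",
        tags=["helm"],
        schema={"release": "string", "revision": "integer - optional"},
    ),
    # ...dozens more entries; only their summaries are ever searched
}

def search_tools(query: str, category: str | None = None) -> list[dict]:
    """Return lightweight summaries, never full schemas."""
    q = query.lower()
    return [
        {"name": e.name, "title": e.title, "description": e.description, "tags": e.tags}
        for e in REGISTRY.values()
        if (not category or category in e.tags)
        and (q in e.name or q in e.description.lower())
    ]

def get_tool_schema(tool_name: str) -> dict:
    """Fetch the full schema for exactly one tool, on demand."""
    entry = REGISTRY[tool_name]
    return {"name": entry.name, "parameters": entry.schema}
```

With this shape, `search_tools("rollback", category="helm")` returns only the one-line summary; the agent pays for the full parameter schema only after calling `get_tool_schema("helm_rollback")`.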
What Are the Tradeoffs of Tool Search?
Tool search adds one extra LLM turn when a new tool is needed. In exchange:
| Benefit | Impact |
|---|---|
| Smaller baseline prompts | 80%+ reduction in tool schema tokens |
| Sharper tool selection | Model sees only relevant tools for the task |
| Longer conversations | Stay within context budget much longer |
| Unlimited scaling | Add tools without degrading performance |
For ops agents that need broad capability (Kubernetes + Helm + Jenkins + Argo + more), tool search is one of the highest-ROI architectural decisions you can make early.
Related articles:
- Auto-Summarization for Long Conversations
- MCP in Practice: Standardizing DevOps Tools So AI Can't Go Rogue
FAQ: Tool Search for AI Agents
What is tool search in AI agents? Tool search is a pattern where agents discover and load tool schemas on demand rather than having all tools pre-loaded in context, reducing token usage and improving accuracy.
How much context does tool search save? Tool search can reduce tool-related context by 80%+ by replacing 50+ full schemas (~10,000+ tokens) with 2-3 discovery tools (~500 tokens).
Does tool search add latency? Yes, one additional LLM turn when a new tool is needed. However, the overall conversation becomes faster because smaller context means faster inference per turn.
When should tools be always-loaded vs. searchable? Always load: core discovery tools, very common operations (k8s get, describe). Make searchable: specialized tools, integrations that aren't always relevant.