What Is the Tool Context Problem in AI Agents?
As an AI agent's tool catalog grows, its performance degrades unless you architect for scale. The naive approach—loading every tool schema into the model context—works until you have:
- 50+ tools with detailed schemas
- Rich parameter descriptions and examples
- Long conversation history
- Complex nested tool definitions
Then you hit context bloat: the model becomes slower, less accurate, and more expensive per turn.
What Are the Hidden Costs of "All Tools Always" Architecture?
Loading all tool definitions upfront has compounding costs:
| Cost Type | Impact |
|---|---|
| Token cost | Every turn pays for 50+ tool schemas (~5,000-15,000 tokens) |
| Attention dilution | Model becomes less confident in tool selection |
| Latency | Larger prompts mean slower inference |
| Scalability ceiling | Tool count becomes a hard constraint |
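As rough arithmetic: at ~200 tokens per schema, 50 tools add ~10,000 tokens to every single turn, so a 30-turn conversation spends on the order of 300,000 tokens on tool definitions alone.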
When operators ask "why is the agent confused about which tool to use?", the answer is often context overload.
How Does Tool Search Solve Context Bloat?
Tool search inverts the loading model:
Before (All Tools Always):

```
[System prompt] + [50 tool schemas] + [Conversation] → LLM
```

After (Tool Search):

```
[System prompt] + [5 core tools + search_tools] + [Conversation] → LLM
        ↓
[On demand: load specific tool schema]
```

The agent should:
- Keep a small always-loaded tool set — Core discovery tools, common operations
- Use `search_tools` capability — When the agent needs something outside the core set
- Load tool details on demand — Full schema fetched only when needed
This mirrors how humans work. We don't memorize every kubectl flag—we search documentation when needed.
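As a sketch of that discovery loop (function and tool names here are illustrative assumptions, not a specific agent framework or SDK):

```python
# Sketch of the on-demand discovery loop. Function and tool names are
# illustrative assumptions, not a specific agent framework or SDK.

CORE_SCHEMAS = {
    "k8s_get": {"parameters": {"resource": "string"}},
    "k8s_describe": {"parameters": {"resource": "string", "name": "string"}},
}

def resolve_tool(keyword: str, search, fetch_schema) -> dict:
    """Return a tool schema: from the always-loaded core set if possible,
    otherwise discovered and fetched on demand."""
    if keyword in CORE_SCHEMAS:
        return CORE_SCHEMAS[keyword]           # already in context, no extra turn
    matches = search(keyword)                  # lightweight summaries only
    if not matches:
        raise LookupError(f"no tool matches {keyword!r}")
    return fetch_schema(matches[0]["name"])    # full schema for exactly one tool
```

Here `search` and `fetch_schema` stand in for the `search_tools` and `get_tool_schema` calls described in the next section.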
How Do You Implement Tool Search in MCP?
A practical implementation exposes two MCP tools:
1. `search_tools` (always loaded)
```json
{
  "name": "search_tools",
  "description": "Search available tools by keyword or category",
  "parameters": {
    "query": "string - search term",
    "category": "string - optional filter (k8s, helm, jenkins)"
  },
  "returns": [
    { "name": "tool_name", "title": "Human Title", "description": "Brief desc", "tags": ["k8s"] }
  ]
}
```

2. `get_tool_schema` (always loaded)
```json
{
  "name": "get_tool_schema",
  "description": "Get full schema for a specific tool",
  "parameters": {
    "tool_name": "string - exact tool name from search"
  },
  "returns": "Full tool schema with parameters"
}
```

This keeps baseline context small (~500 tokens for discovery tools) while maintaining access to unlimited capability.
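To make this concrete, here is a minimal in-memory sketch of a registry that could back these two tools. It is an illustration under assumptions, not the MCP SDK: the `ToolEntry` type, the `helm_rollback` entry, and its schema are all hypothetical.

```python
# Minimal in-memory registry backing search_tools / get_tool_schema.
# All tool names, tags, and schemas below are illustrative.
from dataclasses import dataclass

@dataclass
class ToolEntry:
    name: str
    title: str
    description: str
    tags: list[str]
    schema: dict  # full parameter schema, sent to the model only on demand

REGISTRY: dict[str, ToolEntry] = {
    "helm_rollback": ToolEntry(
        name="helm_rollback",
        title="Helm Rollback",
        description="Roll back a Helm release to a previous revision",
        tags=["helm"],
        schema={"release": "string", "revision": "integer - optional"},
    ),
    # ...dozens more entries; only their summaries are ever searched
}

def search_tools(query: str, category: str | None = None) -> list[dict]:
    """Return lightweight summaries, never full schemas."""
    q = query.lower()
    return [
        {"name": e.name, "title": e.title, "description": e.description, "tags": e.tags}
        for e in REGISTRY.values()
        if (not category or category in e.tags)
        and (q in e.name or q in e.description.lower())
    ]

def get_tool_schema(tool_name: str) -> dict:
    """Fetch the full schema for exactly one tool, on demand."""
    entry = REGISTRY[tool_name]
    return {"name": entry.name, "parameters": entry.schema}
```

With this shape, `search_tools("rollback", category="helm")` returns only the one-line summary; the agent pays for the full parameter schema only after calling `get_tool_schema("helm_rollback")`.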
What Are the Tradeoffs of Tool Search?
Tool search adds one extra LLM turn when a new tool is needed. In exchange:
| Benefit | Impact |
|---|---|
| Smaller baseline prompts | 80%+ reduction in tool schema tokens |
| Sharper tool selection | Model sees only relevant tools for the task |
| Longer conversations | Stay within context budget much longer |
| Unlimited scaling | Add tools without degrading performance |
For ops agents that need broad capability (Kubernetes + Helm + Jenkins + Argo + more), tool search is one of the highest-ROI architectural decisions you can make early.
Related articles:
- Auto-Summarization for Long Conversations
- MCP in Practice: Standardizing DevOps Tools So AI Can't Go Rogue
FAQ: Tool Search for AI Agents
What is tool search in AI agents? Tool search is a pattern where agents discover and load tool schemas on demand rather than having all tools pre-loaded in context, reducing token usage and improving accuracy.
How much context does tool search save? Tool search can reduce tool-related context by 80%+ by replacing 50+ full schemas (~10,000+ tokens) with 2-3 discovery tools (~500 tokens).
Does tool search add latency? Yes, one additional LLM turn when a new tool is needed. However, the overall conversation becomes faster because smaller context means faster inference per turn.
When should tools be always-loaded vs. searchable? Always load: core discovery tools, very common operations (k8s get, describe). Make searchable: specialized tools, integrations that aren't always relevant.