Programmatic Tool Calling: When an LLM Should Write Glue Code

What Is Programmatic Tool Calling and When Should You Use It?

Programmatic Tool Calling (PTC) is when an AI agent writes code that orchestrates multiple tool calls and has that code executed in a sandbox, rather than requesting each tool call in its own model turn.

Natural language orchestration becomes inefficient once a task fans out over many resources. Consider this request:

"Check the health of all pods in production across 12 namespaces and summarize the top failure modes."

Without PTC: The agent makes 12+ sequential tool calls, each requiring a full LLM turn, burning tokens and adding latency.

With PTC: The agent writes a small script that calls tools in parallel, aggregates results locally, and returns a concise summary.
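A simplified cost model makes the difference concrete. It assumes a base context of C_0 tokens, n tool calls, k tokens per raw tool output, a generated script of s tokens, and a printed summary of m tokens. It counts only input tokens the model must process and ignores prompt caching and the model's own tool-call tokens, so treat it as a rough illustration rather than a measurement:

```latex
% Sequential: turn i re-reads C_0 plus the i-1 outputs returned so far;
% turn n+1 reads all n outputs to write the summary.
T_{\text{seq}} = \sum_{i=1}^{n+1} \bigl(C_0 + (i-1)\,k\bigr) = (n+1)\,C_0 + k\,\frac{n(n+1)}{2}

% PTC: one turn writes the s-token script, one more reads the m-token summary.
T_{\text{ptc}} \approx C_0 + (C_0 + s + m) = 2\,C_0 + s + m

% Example: n = 12 gives k \cdot 12 \cdot 13 / 2 = 78k tokens of raw output re-read.
```

Under these assumptions the sequential cost grows quadratically in n, while the PTC cost depends on n only through the size of the script and the summary.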

PTC is ideal when you need:

Loops over multiple resources
Parallel execution across namespaces/clusters
Local aggregation before returning results
Reduced context pollution from intermediate outputs

Why Does Code Beat Natural Language for Loops and Batching?

LLMs have fundamental limitations when it comes to repetitive operations:

Task	LLM via Natural Language	Code
Long sequences	Poor consistency, drift	Deterministic
Parallel execution	Limited; each round of calls costs a full model turn	Native async support
Intermediate results	Pollute context window	Stay local to script
Batching	Inconsistent grouping	Precise control
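The "Parallel execution" and "Batching" rows are where code is most concrete. The sketch below fans one tool call out per target while capping how many run at once with `asyncio.Semaphore`. `call_tool` here is a hypothetical stand-in that sleeps and echoes its arguments (and records peak concurrency); in a real PTC sandbox the runtime would supply it.

```python
import asyncio

in_flight = 0  # calls currently running
peak = 0       # highest number of simultaneous calls observed


async def call_tool(name: str, **kwargs) -> dict:
    # Hypothetical stand-in for the sandbox-provided tool bridge.
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return {"tool": name, "args": kwargs}


async def fan_out(targets: list[str], max_concurrency: int = 4) -> list[dict]:
    """Call one tool per target, with at most max_concurrency calls in flight."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(target: str) -> dict:
        async with sem:  # wait for a free slot before calling the tool
            return await call_tool("k8s_get", resource_type="pods", namespace=target)

    # gather preserves input order, so results line up with targets
    return await asyncio.gather(*(one(t) for t in targets))


namespaces = [f"ns-{i}" for i in range(12)]
results = asyncio.run(fan_out(namespaces))
print(len(results), peak)  # 12 4
```

The semaphore is the "precise control" knob: the script decides batch size explicitly instead of relying on the model to group calls consistently.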

The architecture becomes:

LLM writes a small, sandboxed script
Script calls call_tool(...) repeatedly (possibly in parallel)
Only the final printed output enters the conversation context

This keeps context clean while enabling complex orchestration.
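A minimal host-side sketch of that loop: the runner injects a `call_tool` function into the script's namespace, executes the script, awaits its `main()`, and returns only what it printed. The tool registry, the `list_namespaces` and `count_pods` tools, and `run_script` are hypothetical names. This sketch applies none of the safety controls discussed in the next section and only illustrates the data flow.

```python
import asyncio
import contextlib
import io

# Hypothetical tool registry; real tools would wrap kubectl, cloud APIs, etc.
TOOLS = {
    "list_namespaces": lambda: ["default", "prod", "staging"],
    "count_pods": lambda namespace: {"default": 3, "prod": 10, "staging": 4}[namespace],
}


async def call_tool(name: str, **kwargs):
    """Bridge exposed to the script; dispatches to the registry."""
    return TOOLS[name](**kwargs)


def run_script(source: str) -> str:
    """Execute LLM-written source and return only its printed output.

    WARNING: no sandboxing; exec() on untrusted code is unsafe.
    """
    namespace = {"call_tool": call_tool}
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(source, namespace)           # defines main()
        asyncio.run(namespace["main"]())  # run the script's entry point
    return buf.getvalue()


# What an LLM-written script might look like
script = """
import asyncio

async def main():
    namespaces = await call_tool("list_namespaces")
    counts = await asyncio.gather(
        *[call_tool("count_pods", namespace=ns) for ns in namespaces]
    )
    print(sum(counts))
"""

print(run_script(script), end="")  # 17
```

Only the string returned by `run_script` would be appended to the conversation; the per-namespace counts never leave the script.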

How Do You Sandbox LLM-Generated Code Safely?

The moment you allow "execute code," aggressive sandboxing becomes mandatory:

Security Control	Implementation
Import restrictions	Whitelist only safe modules (json, asyncio, etc.)
Execution timeouts	Hard limit (e.g., 30 seconds) per script
Tool call limits	Maximum number of tool invocations per script
Tool restrictions	Default to read-only tools; require explicit opt-in for writes
Resource limits	Memory and CPU constraints on execution environment
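The first two rows can be sketched with the standard library: a static pass with `ast` that rejects imports outside an allowlist, and `asyncio.wait_for` for a hard timeout. `ALLOWED_MODULES`, `check_imports`, and `run_with_timeout` are illustrative names. Treat this as one layer only: a static import check can be bypassed (for example through `__import__`), and cancelling a coroutine does not stop CPU-bound code that never awaits, so process- or container-level limits are still needed.

```python
import ast
import asyncio

ALLOWED_MODULES = {"json", "asyncio", "collections", "re"}  # illustrative allowlist


def check_imports(source: str) -> None:
    """Raise ValueError if the script imports anything outside the allowlist."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # "from . import x" has module=None, so it is rejected below
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name.split(".")[0] not in ALLOWED_MODULES:
                raise ValueError(f"import of {name!r} is not allowed")


async def run_with_timeout(coro, seconds: float = 30.0):
    """Cancel the script's coroutine if it exceeds the time budget.

    Only effective while the script awaits; a tight CPU loop is not interrupted.
    """
    return await asyncio.wait_for(coro, timeout=seconds)


check_imports("import json\nfrom collections import Counter")  # passes

try:
    check_imports("import subprocess")
except ValueError as e:
    print(e)  # import of 'subprocess' is not allowed


async def slow_script():
    await asyncio.sleep(1)


try:
    asyncio.run(run_with_timeout(slow_script(), seconds=0.05))
except asyncio.TimeoutError:
    print("timed out")
```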

PTC should be an orchestration mechanism, not a backdoor shell. The sandbox must ensure the code can only call pre-defined tools; it should have no way to make arbitrary system calls or network requests.

What Does Programmatic Tool Calling Look Like in Practice?

Here's an example of PTC for checking pod health across namespaces:

```python
import asyncio
import json

# call_tool is injected by the PTC sandbox runtime, which is also assumed to
# await main(); each call is assumed to return the tool's parsed JSON output.

async def main():
    namespaces = ["default", "kube-system", "prod", "staging"]

    # Parallel tool calls; the raw outputs stay inside the script
    results = await asyncio.gather(*[
        call_tool("k8s_get", resource_type="pods", namespace=ns, output="json")
        for ns in namespaces
    ])

    # Local aggregation; completed pods (phase "Succeeded") are not failures
    unhealthy = []
    for ns, pods in zip(namespaces, results):
        for pod in pods.get("items", []):
            if pod["status"]["phase"] not in ("Running", "Succeeded"):
                unhealthy.append(f"{ns}/{pod['metadata']['name']}")

    # Only this printed output enters the conversation
    print(json.dumps({
        "namespaces_checked": len(namespaces),
        "unhealthy_pods": unhealthy[:10],  # Limit output size
        "total_unhealthy": len(unhealthy)
    }, indent=2))
```

The conversation sees only the final JSON summary, not one raw tool output per namespace.
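The script above checks only the pod phase, which can hide the "failure modes" the original request asked about: a container in `CrashLoopBackOff` typically leaves its pod in the `Running` phase, and `ImagePullBackOff` leaves it `Pending`. A local helper could instead count the `waiting` and `terminated` reasons in each pod's `status.containerStatuses`. The helper name `top_failure_modes` and the sample data are hypothetical; the field names follow the Kubernetes Pod status JSON.

```python
from collections import Counter


def top_failure_modes(pods_by_ns: dict[str, dict], limit: int = 5) -> list[tuple[str, int]]:
    """Count non-benign container waiting/terminated reasons across namespaces.

    pods_by_ns maps namespace -> the pod-list JSON returned for that namespace.
    """
    reasons = Counter()
    for pods in pods_by_ns.values():
        for pod in pods.get("items", []):
            for cs in pod.get("status", {}).get("containerStatuses", []):
                state = cs.get("state", {})
                waiting = state.get("waiting", {}).get("reason")
                terminated = state.get("terminated", {}).get("reason")
                # Skip transient startup and normal completion
                if waiting and waiting != "ContainerCreating":
                    reasons[waiting] += 1
                elif terminated and terminated != "Completed":
                    reasons[terminated] += 1
    return reasons.most_common(limit)


sample = {
    "prod": {"items": [
        {"status": {"phase": "Running", "containerStatuses": [
            {"state": {"waiting": {"reason": "CrashLoopBackOff"}}}]}},
        {"status": {"phase": "Pending", "containerStatuses": [
            {"state": {"waiting": {"reason": "ImagePullBackOff"}}}]}},
    ]},
    "staging": {"items": [
        {"status": {"phase": "Running", "containerStatuses": [
            {"state": {"waiting": {"reason": "CrashLoopBackOff"}}},
            {"state": {"running": {}}}]}},
    ]},
}

print(top_failure_modes(sample))  # [('CrashLoopBackOff', 2), ('ImagePullBackOff', 1)]
```

Inside the PTC script, the counted reasons could be added to the printed summary, so the conversation receives a handful of counts instead of every pod object.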

FAQ: Programmatic Tool Calling for AI Agents

What is programmatic tool calling (PTC)?
PTC is when an AI agent generates executable code to orchestrate multiple tool calls, enabling parallel execution and local aggregation without polluting conversation context.

When should an AI agent use code instead of natural language?
Use code for loops over multiple resources, parallel execution across namespaces, batch operations, and any task where intermediate results would bloat context.

How do you secure LLM-generated code execution?
Sandbox execution with import restrictions, timeouts, tool call limits, and default read-only access. The code should only interact with pre-defined tools, never arbitrary system calls.

Does PTC replace regular tool calling?
No. PTC complements regular tool calling. Simple, single-resource operations should use direct tool calls. PTC is for complex orchestration that would otherwise require many sequential turns.