
Programmatic Tool Calling: When an LLM Should Write Glue Code

Loops, batching, parallelism, and summarization—where code beats prompts, and how to sandbox it safely.


What Is Programmatic Tool Calling and When Should You Use It?

Programmatic Tool Calling (PTC) is when an AI agent generates and executes code to orchestrate multiple tool calls, rather than issuing each call as a separate natural-language turn.

There's a point where natural language orchestration becomes inefficient. Consider this request:

"Check the health of all pods in production across 12 namespaces and summarize the top failure modes."

Without PTC: The agent makes 12+ sequential tool calls, each requiring a full LLM turn, burning tokens and adding latency.

With PTC: The agent writes a small script that calls tools in parallel, aggregates results locally, and returns a concise summary.

PTC is ideal when you need:

  • Loops over multiple resources
  • Parallel execution across namespaces/clusters
  • Local aggregation before returning results
  • Reduced context pollution from intermediate outputs
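To make the latency argument concrete, here is a minimal sketch comparing calls awaited one at a time with calls fanned out through `asyncio.gather`. The `fake_call_tool` function is a hypothetical stand-in that only sleeps, and the 50 ms delay is an arbitrary assumption, not a measurement of any real tool.

```python
import asyncio
import time

async def fake_call_tool(namespace: str) -> dict:
    """Hypothetical stand-in for a tool call; sleeps to simulate I/O latency."""
    await asyncio.sleep(0.05)
    return {"namespace": namespace, "items": []}

async def run_sequential(namespaces):
    # One call at a time: total time is roughly the sum of all latencies
    return [await fake_call_tool(ns) for ns in namespaces]

async def run_parallel(namespaces):
    # All calls in flight at once: total time is roughly the slowest call
    return await asyncio.gather(*(fake_call_tool(ns) for ns in namespaces))

def timed(coro_fn, namespaces):
    start = time.perf_counter()
    results = asyncio.run(coro_fn(namespaces))
    return results, time.perf_counter() - start

NAMESPACES = [f"ns-{i}" for i in range(12)]
seq_results, seq_seconds = timed(run_sequential, NAMESPACES)
par_results, par_seconds = timed(run_parallel, NAMESPACES)
print(f"sequential: {seq_seconds:.2f}s, parallel: {par_seconds:.2f}s")
```

This only models tool latency. In the non-PTC case each call would also cost a full LLM turn, which the sketch does not attempt to simulate.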

Why Does Code Beat Natural Language for Loops and Batching?

Orchestrating repetitive operations one LLM turn at a time has practical limitations compared with a script:

| Task | LLM via Natural Language | Code |
|---|---|---|
| Long sequences | Poor consistency, drift | Deterministic |
| Parallel execution | Sequential only | Native async support |
| Intermediate results | Pollute context window | Stay local to script |
| Batching | Inconsistent grouping | Precise control |

The architecture becomes:

  1. LLM writes a small, sandboxed script
  2. Script calls call_tool(...) repeatedly (possibly in parallel)
  3. Only the final printed output enters the conversation context

This keeps context clean while enabling complex orchestration.
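The three steps above can be sketched as a tiny runner. This is an illustration under stated assumptions, not a description of any particular product: `run_script`, the stub `call_tool`, and the convention that scripts define `async def main()` are all hypothetical. Note that plain `exec` provides no isolation by itself; the sketch only shows how printed output becomes the sole thing handed back to the conversation.

```python
import asyncio
import contextlib
import io

async def call_tool(name: str, **kwargs) -> dict:
    """Hypothetical tool gateway; a real one would dispatch to registered tools."""
    return {"tool": name, "args": kwargs, "items": ["a", "b", "c"]}

def run_script(source: str) -> str:
    """Run a script that defines `async def main()` and return only what it printed."""
    scope = {"call_tool": call_tool, "asyncio": asyncio}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, scope)              # NOT a sandbox by itself
        asyncio.run(scope["main"]())
    return buffer.getvalue()

SCRIPT = '''
async def main():
    results = await asyncio.gather(*[call_tool("list", shard=i) for i in range(3)])
    total = sum(len(r["items"]) for r in results)   # intermediate data stays local
    print(f"total_items={total}")
'''

# Only this string would be appended to the conversation context
context_message = run_script(SCRIPT)
print(context_message)
```

The three raw tool results never leave the script; the conversation receives the single line `total_items=9`.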


How Do You Sandbox LLM-Generated Code Safely?

The moment you allow "execute code," aggressive sandboxing becomes mandatory:

| Security Control | Implementation |
|---|---|
| Import restrictions | Whitelist only safe modules (json, asyncio, etc.) |
| Execution timeouts | Hard limit (e.g., 30 seconds) per script |
| Tool call limits | Maximum number of tool invocations per script |
| Tool restrictions | Default to read-only tools; require explicit opt-in for writes |
| Resource limits | Memory and CPU constraints on execution environment |
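As a rough illustration of the first two controls, the sketch below statically checks a script's imports against an allowlist and wraps execution in a time budget. The function names and the `ALLOWED_MODULES` set are assumptions made for this example; a production sandbox would layer these checks on top of process-level isolation rather than rely on them alone.

```python
import ast
import asyncio

ALLOWED_MODULES = {"json", "asyncio"}   # assumption: mirrors the examples in the table

def check_imports(source: str) -> list:
    """Return the names of imported modules that are not on the allowlist."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]   # relative imports have no module name
        else:
            continue
        violations += [n for n in names if n.split(".")[0] not in ALLOWED_MODULES]
    return violations

async def run_with_timeout(coro, seconds: float):
    """Cancel the script's coroutine if it exceeds the time budget."""
    try:
        return await asyncio.wait_for(coro, timeout=seconds)
    except asyncio.TimeoutError:
        return "timed out"

print(check_imports("import json\nimport os\nfrom subprocess import run"))
print(asyncio.run(run_with_timeout(asyncio.sleep(5), seconds=0.1)))
```

Two caveats apply. A static import check does not catch dynamic imports such as `__import__`, and `asyncio.wait_for` can only cancel at an `await` point, so a CPU-bound loop still needs the process-level memory and CPU limits listed in the table.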

PTC should be an orchestration mechanism, not a backdoor shell. The code can only call pre-defined tools—it cannot make arbitrary system calls or network requests.
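One way to enforce "pre-defined tools only" is to route every call through a gateway that applies the tool call limit and the read-only default. The `ToolGateway` class, `ToolPolicyError`, and the two stub tools below are hypothetical names invented for this sketch.

```python
import asyncio

class ToolPolicyError(Exception):
    """Raised when a script exceeds its budget or calls a tool it may not use."""

class ToolGateway:
    def __init__(self, tools, read_only, max_calls=50, allow_writes=False):
        self._tools = tools                # name -> async callable
        self._read_only = set(read_only)   # tools permitted by default
        self._max_calls = max_calls
        self._allow_writes = allow_writes
        self.calls = 0

    async def call_tool(self, name, **kwargs):
        if name not in self._tools:
            raise ToolPolicyError(f"unknown tool: {name}")
        if name not in self._read_only and not self._allow_writes:
            raise ToolPolicyError(f"write tool requires opt-in: {name}")
        if self.calls >= self._max_calls:
            raise ToolPolicyError("tool call limit reached")
        self.calls += 1
        return await self._tools[name](**kwargs)

# Hypothetical tools, for illustration only
async def k8s_get(**kwargs):
    return {"items": []}

async def k8s_delete(**kwargs):
    return {"deleted": True}

gateway = ToolGateway(
    tools={"k8s_get": k8s_get, "k8s_delete": k8s_delete},
    read_only={"k8s_get"},
    max_calls=2,
)

async def demo():
    outcomes = []
    for name in ["k8s_get", "k8s_delete", "k8s_get", "k8s_get"]:
        try:
            await gateway.call_tool(name, namespace="prod")
            outcomes.append("ok")
        except ToolPolicyError as err:
            outcomes.append(str(err))
    return outcomes

outcomes = asyncio.run(demo())
print(outcomes)
```

The budget check and the counter increment happen before the first `await`, so parallel calls issued through `asyncio.gather` cannot slip past the limit within a single event loop.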


What Does Programmatic Tool Calling Look Like in Practice?

Here's an example of PTC for checking pod health across namespaces:

python
import asyncio
import json

# call_tool is assumed to be provided by the sandbox runtime; it is not defined here.

async def main():
    namespaces = ["default", "kube-system", "prod", "staging"]

    # Parallel tool calls; raw outputs stay inside the script, not the conversation
    results = await asyncio.gather(*[
        call_tool("k8s_get", resource_type="pods", namespace=ns, output="json")
        for ns in namespaces
    ])

    # Local aggregation
    unhealthy = []
    for ns, pods in zip(namespaces, results):
        for pod in pods.get("items", []):
            # "Succeeded" pods are completed workloads, not failures
            phase = pod.get("status", {}).get("phase")
            if phase not in ("Running", "Succeeded"):
                unhealthy.append(f"{ns}/{pod['metadata']['name']}")

    # Only this output enters the conversation
    print(json.dumps({
        "namespaces_checked": len(namespaces),
        "unhealthy_pods": unhealthy[:10],  # Limit output size
        "total_unhealthy": len(unhealthy)
    }, indent=2))

# The sandbox runtime is assumed to await main(); use asyncio.run(main()) to run it standalone.

The conversation sees only the final JSON summary, not one raw tool output per namespace.
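The opening request asked for the top failure modes, which goes one step beyond listing pod names. A possible local aggregation is sketched below. It assumes the tool returns pod objects in the standard Kubernetes JSON shape (`status.phase` and `status.containerStatuses[].state.waiting.reason`); the helper names and the sample pods are invented for illustration.

```python
from collections import Counter
from typing import Optional

def failure_reason(pod: dict) -> Optional[str]:
    """Return a short failure label for a pod, or None if it looks healthy."""
    status = pod.get("status", {})
    for container in status.get("containerStatuses", []):
        waiting = container.get("state", {}).get("waiting")
        if waiting and waiting.get("reason"):
            return waiting["reason"]      # e.g. CrashLoopBackOff
    phase = status.get("phase", "Unknown")
    return None if phase in ("Running", "Succeeded") else phase

def top_failure_modes(pods: list, limit: int = 3) -> list:
    """Count failure labels and return the most frequent ones."""
    reasons = (failure_reason(p) for p in pods)
    return Counter(r for r in reasons if r).most_common(limit)

def waiting_pod(phase: str, reason: str) -> dict:
    """Build a made-up pod whose container is stuck in a waiting state."""
    return {"status": {"phase": phase,
                       "containerStatuses": [{"state": {"waiting": {"reason": reason}}}]}}

# Made-up sample data
SAMPLE_PODS = [
    {"status": {"phase": "Running"}},
    waiting_pod("Running", "CrashLoopBackOff"),
    waiting_pod("Pending", "ImagePullBackOff"),
    waiting_pod("Running", "CrashLoopBackOff"),
    {"status": {"phase": "Failed"}},
]

modes = top_failure_modes(SAMPLE_PODS)
print(modes)
```

Checking container states as well as the pod phase matters here because a pod can report the `Running` phase while one of its containers is waiting in `CrashLoopBackOff`.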

FAQ: Programmatic Tool Calling for AI Agents

What is programmatic tool calling (PTC)?
PTC is when an AI agent generates executable code to orchestrate multiple tool calls, enabling parallel execution and local aggregation without polluting conversation context.

When should an AI agent use code instead of natural language?
Use code for loops over multiple resources, parallel execution across namespaces, batch operations, and any task where intermediate results would bloat context.

How do you secure LLM-generated code execution?
Sandbox execution with import restrictions, timeouts, tool call limits, and default read-only access. The code should only interact with pre-defined tools, never arbitrary system calls.

Does PTC replace regular tool calling?
No. PTC complements regular tool calling. Simple, single-resource operations should use direct tool calls. PTC is for complex orchestration that would otherwise require many sequential turns.
