What LLM Metrics Should AI Agents Track and Display?
LLMs introduce a new class of operational questions that traditional monitoring doesn't answer:
- Why did this response take 12 seconds?
- Why did costs suddenly spike?
- Are we hitting provider rate limits?
- Is prompt caching actually working?
If you can't answer these, your AI agent becomes a black box that users avoid.
Core metrics Skyflo tracks and emits:
| Metric | Definition | Why It Matters |
|---|---|---|
| Prompt tokens | Input tokens sent to the model | Primary cost driver |
| Completion tokens | Output tokens generated | Secondary cost driver |
| Cached tokens | Tokens served from cache | Cost savings indicator |
| Estimated cost | USD cost for this turn | Budget visibility |
| TTFT | Time to First Token | Perceived responsiveness |
| TTR | Time to Respond: total time from request to the final token | Total latency |
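As a concrete sketch, these per-turn metrics map naturally onto a small record type. The field names below are illustrative, not Skyflo's actual schema:

```python
from dataclasses import dataclass

@dataclass
class TurnMetrics:
    """Per-turn LLM metrics; field names are illustrative, not Skyflo's actual schema."""
    prompt_tokens: int         # input tokens sent to the model
    completion_tokens: int     # output tokens generated
    cached_tokens: int         # prompt tokens served from the provider's cache
    estimated_cost_usd: float  # cost derived from token counts and per-token pricing
    ttft_ms: float             # time to first token, in milliseconds
    ttr_ms: float              # time to complete the response, in milliseconds
```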
What Is TTFT and Why Does It Matter for User Experience?
TTFT (Time to First Token) measures the latency between sending a request and receiving the first token of the response.
TTFT matters because it's what users feel:
| TTFT | User Perception |
|---|---|
| < 500ms | "Instant" — feels like autocomplete |
| 500ms - 2s | "Fast" — acceptable for complex queries |
| 2s - 5s | "Slow" — users start to disengage |
| > 5s | "Broken" — users assume something failed |
Displaying TTFT in real-time helps users understand whether slowness is network, model, or prompt size—and empowers them to adjust.
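The timing pattern itself is simple. Below is a minimal sketch that records TTFT and TTR around any streaming response; the stream is assumed to be an async iterator of text chunks wrapped around a provider's streaming API (the wrapper itself is not shown and is an assumption, the measurement logic is the point):

```python
import time
from typing import AsyncIterator


async def measure_streaming_latency(stream: AsyncIterator[str]) -> tuple[str, float, float]:
    """Consume a token stream and return (full_text, ttft_seconds, ttr_seconds).

    `stream` is any async iterator of text chunks, e.g. a thin wrapper around a
    provider's streaming chat endpoint (that wrapper is assumed, not shown here).
    """
    start = time.monotonic()
    ttft: float | None = None
    chunks: list[str] = []

    async for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start   # first token arrived: TTFT
        chunks.append(chunk)

    ttr = time.monotonic() - start            # stream finished: TTR
    return "".join(chunks), ttft if ttft is not None else ttr, ttr
```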
How Do Real-Time Metrics Change User Behavior?
When you display TTFT/TTR live, users stop guessing and start collaborating with the system:
Before metrics visibility:
"The agent is slow. It's probably broken. Let me try refreshing."
After metrics visibility:
"TTFT is 3.2s—this model is slow for interactive work. Let me switch to a faster model for debugging."
Users learn to:
- Choose appropriate models for different tasks
- Request concise responses when latency matters
- Understand cost/quality tradeoffs
- Identify when caching is helping
Why Should You Persist LLM Metrics?
Real-time metrics inform in-the-moment decisions. Persisted metrics enable optimization:
| Question | Requires Persisted Metrics |
|---|---|
| "Which conversations are expensive?" | Yes |
| "Which task types cause long TTR?" | Yes |
| "How does caching affect weekly costs?" | Yes |
| "Is model X faster than model Y for our workload?" | Yes |
By storing metrics alongside conversation history, you can:
- Build cost dashboards per team/project
- Identify optimization opportunities
- Set alerts on cost anomalies
- Justify model selection decisions
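With rows like these persisted, the first question in the table above ("Which conversations are expensive?") becomes a short aggregation. This sketch runs against the illustrative table from the previous example:

```python
import sqlite3


def top_expensive_conversations(conn: sqlite3.Connection, limit: int = 10):
    """Rank conversations by total estimated cost from persisted per-turn metrics."""
    return conn.execute("""
        SELECT conversation_id,
               SUM(estimated_cost_usd)                AS total_cost_usd,
               SUM(prompt_tokens + completion_tokens) AS total_tokens
        FROM llm_turn_metrics
        GROUP BY conversation_id
        ORDER BY total_cost_usd DESC
        LIMIT ?
    """, (limit,)).fetchall()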
How Do Metrics Build Trust in AI Systems?
Users don't need perfect predictability from AI systems. They need transparency and control:
| Transparency Feature | Trust Impact |
|---|---|
| Show latency | Users understand wait times |
| Show cost | Users feel budget control |
| Show retries | Users know the system is working |
| Show rate limiting | Users understand external constraints |
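One way to surface all of this at once is to emit a per-turn metrics event alongside the response. The payload below is a hedged sketch; the event name, field names, and values are assumptions, not Skyflo's actual wire format:

```python
# Illustrative per-turn metrics event pushed to the UI over the agent's
# streaming channel; event name, fields, and values are assumptions.
metrics_event = {
    "type": "llm_metrics",
    "conversation_id": "conv-123",   # hypothetical identifier
    "model": "example-model",        # placeholder model name
    "prompt_tokens": 1842,
    "completion_tokens": 312,
    "cached_tokens": 1500,           # prompt tokens served from cache
    "estimated_cost_usd": 0.0102,    # derived from token counts and pricing
    "ttft_ms": 420.0,                # time to first token
    "ttr_ms": 3870.0,                # total response time
    "retries": 0,                    # provider retries for this turn
    "rate_limited": False,           # whether the provider throttled the request
}
```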
Metrics don't just inform engineering decisions—they build trust with every user who interacts with the system.
FAQ: LLM Token Metrics and Observability
What is TTFT (Time to First Token)? TTFT measures the latency from sending a request to receiving the first token. It's the primary indicator of perceived responsiveness.
What is TTR (Time to Respond)? TTR measures total response latency from request to completion. It includes TTFT plus the time to generate all completion tokens.
How are LLM costs calculated? Costs are calculated from token counts multiplied by per-token pricing: (prompt_tokens × input_price) + (completion_tokens × output_price).
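A minimal sketch of that formula in code; the default prices are placeholders (real per-token pricing varies by provider and model, and cached prompt tokens are often billed at a discounted rate):

```python
def estimate_turn_cost(prompt_tokens: int, completion_tokens: int,
                       input_price_per_1m: float = 3.00,
                       output_price_per_1m: float = 15.00) -> float:
    """Estimate USD cost for one turn. Default prices are placeholders per 1M tokens."""
    return (prompt_tokens / 1_000_000) * input_price_per_1m \
         + (completion_tokens / 1_000_000) * output_price_per_1m


# Example: 1,842 prompt tokens + 312 completion tokens at the placeholder prices
# -> 1842/1e6 * 3.00 + 312/1e6 * 15.00 ≈ $0.0102
```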
Why show metrics in real-time instead of just in dashboards? Real-time metrics enable users to make in-the-moment decisions (e.g., cancel a slow request, switch models) and build immediate trust in system behavior.