What LLM Metrics Should AI Agents Track and Display?
LLMs introduce a new class of operational questions that traditional monitoring doesn't answer:
- Why did this response take 12 seconds?
- Why did costs suddenly spike?
- Are we hitting provider rate limits?
- Is prompt caching actually working?
If you can't answer these, your AI agent becomes a black box that users avoid.
Core metrics Skyflo tracks and emits:
| Metric | Definition | Why It Matters |
|---|---|---|
| Prompt tokens | Input tokens sent to the model | Primary cost driver |
| Completion tokens | Output tokens generated | Secondary cost driver |
| Cached tokens | Tokens served from cache | Cost savings indicator |
| Estimated cost | USD cost for this turn | Budget visibility |
| TTFT | Time to First Token | Perceived responsiveness |
| TTR | Time to Respond: total time from request to the final token | Total latency |
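As a concrete sketch, these per-turn metrics map naturally onto a small record type. The field names below are illustrative, not Skyflo's actual schema:

```python
from dataclasses import dataclass

@dataclass
class TurnMetrics:
    """Per-turn LLM metrics; field names are illustrative, not Skyflo's actual schema."""
    prompt_tokens: int         # input tokens sent to the model
    completion_tokens: int     # output tokens generated
    cached_tokens: int         # prompt tokens served from the provider's cache
    estimated_cost_usd: float  # cost derived from token counts and per-token pricing
    ttft_ms: float             # time to first token, in milliseconds
    ttr_ms: float              # time to complete the response, in milliseconds
```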
What Is TTFT and Why Does It Matter for User Experience?
TTFT (Time to First Token) measures the latency between sending a request and receiving the first token of the response.
TTFT matters because it's what users feel:
| TTFT | User Perception |
|---|---|
| < 500ms | "Instant" — feels like autocomplete |
| 500ms - 2s | "Fast" — acceptable for complex queries |
| 2s - 5s | "Slow" — users start to disengage |
| > 5s | "Broken" — users assume something failed |
Displaying TTFT in real-time helps users understand whether slowness is network, model, or prompt size—and empowers them to adjust.
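The timing pattern itself is simple. Below is a minimal sketch that records TTFT and TTR around any streaming response; the stream is assumed to be an async iterator of text chunks wrapped around a provider's streaming API (the wrapper itself is not shown and is an assumption, the measurement logic is the point):

```python
import time
from typing import AsyncIterator


async def measure_streaming_latency(stream: AsyncIterator[str]) -> tuple[str, float, float]:
    """Consume a token stream and return (full_text, ttft_seconds, ttr_seconds).

    `stream` is any async iterator of text chunks, e.g. a thin wrapper around a
    provider's streaming chat endpoint (that wrapper is assumed, not shown here).
    """
    start = time.monotonic()
    ttft: float | None = None
    chunks: list[str] = []

    async for chunk in stream:
        if ttft is None:
            ttft = time.monotonic() - start   # first token arrived: TTFT
        chunks.append(chunk)

    ttr = time.monotonic() - start            # stream finished: TTR
    return "".join(chunks), ttft if ttft is not None else ttr, ttr
```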
How Do Real-Time Metrics Change User Behavior?
When you display TTFT/TTR live, users stop guessing and start collaborating with the system:
Before metrics visibility:
"The agent is slow. It's probably broken. Let me try refreshing."
After metrics visibility:
"TTFT is 3.2s—this model is slow for interactive work. Let me switch to a faster model for debugging."
Users learn to:
- Choose appropriate models for different tasks
- Request concise responses when latency matters
- Understand cost/quality tradeoffs
- Identify when caching is helping
Why Should You Persist LLM Metrics?
Real-time metrics inform in-the-moment decisions. Persisted metrics enable optimization:
| Question | Requires Persisted Metrics |
|---|---|
| "Which conversations are expensive?" | Yes |
| "Which task types cause long TTR?" | Yes |
| "How does caching affect weekly costs?" | Yes |
| "Is model X faster than model Y for our workload?" | Yes |
By storing metrics alongside conversation history, you can:
- Build cost dashboards per team/project
- Identify optimization opportunities
- Set alerts on cost anomalies
- Justify model selection decisions
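With rows like these persisted, the first question in the table above ("Which conversations are expensive?") becomes a short aggregation. This sketch runs against the illustrative table from the previous example:

```python
import sqlite3


def top_expensive_conversations(conn: sqlite3.Connection, limit: int = 10):
    """Rank conversations by total estimated cost from persisted per-turn metrics."""
    return conn.execute("""
        SELECT conversation_id,
               SUM(estimated_cost_usd)                AS total_cost_usd,
               SUM(prompt_tokens + completion_tokens) AS total_tokens
        FROM llm_turn_metrics
        GROUP BY conversation_id
        ORDER BY total_cost_usd DESC
        LIMIT ?
    """, (limit,)).fetchall()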
How Do Metrics Build Trust in AI Systems?
Users don't need perfect predictability from AI systems. They need transparency and control:
| Transparency Feature | Trust Impact |
|---|---|
| Show latency | Users understand wait times |
| Show cost | Users feel budget control |
| Show retries | Users know the system is working |
| Show rate limiting | Users understand external constraints |
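One way to surface all of this at once is to emit a per-turn metrics event alongside the response. The payload below is a hedged sketch; the event name, field names, and values are assumptions, not Skyflo's actual wire format:

```python
# Illustrative per-turn metrics event pushed to the UI over the agent's
# streaming channel; event name, fields, and values are assumptions.
metrics_event = {
    "type": "llm_metrics",
    "conversation_id": "conv-123",   # hypothetical identifier
    "model": "example-model",        # placeholder model name
    "prompt_tokens": 1842,
    "completion_tokens": 312,
    "cached_tokens": 1500,           # prompt tokens served from cache
    "estimated_cost_usd": 0.0102,    # derived from token counts and pricing
    "ttft_ms": 420.0,                # time to first token
    "ttr_ms": 3870.0,                # total response time
    "retries": 0,                    # provider retries for this turn
    "rate_limited": False,           # whether the provider throttled the request
}
```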
Metrics don't just inform engineering decisions—they build trust with every user who interacts with the system.
FAQ: LLM Token Metrics and Observability
What is TTFT (Time to First Token)? TTFT measures the latency from sending a request to receiving the first token. It's the primary indicator of perceived responsiveness.
What is TTR (Time to Respond)? TTR measures total response latency from request to completion. It includes TTFT plus the time to generate all completion tokens.
How are LLM costs calculated? Costs are calculated from token counts multiplied by per-token pricing: (prompt_tokens × input_price) + (completion_tokens × output_price).
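A minimal sketch of that formula in code; the default prices are placeholders (real per-token pricing varies by provider and model, and cached prompt tokens are often billed at a discounted rate):

```python
def estimate_turn_cost(prompt_tokens: int, completion_tokens: int,
                       input_price_per_1m: float = 3.00,
                       output_price_per_1m: float = 15.00) -> float:
    """Estimate USD cost for one turn. Default prices are placeholders per 1M tokens."""
    return (prompt_tokens / 1_000_000) * input_price_per_1m \
         + (completion_tokens / 1_000_000) * output_price_per_1m


# Example: 1,842 prompt tokens + 312 completion tokens at the placeholder prices
# -> 1842/1e6 * 3.00 + 312/1e6 * 15.00 ≈ $0.0102
```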
Why show metrics in real-time instead of just in dashboards? Real-time metrics enable users to make in-the-moment decisions (e.g., cancel a slow request, switch models) and build immediate trust in system behavior.