Metrics
22 predefined metrics across 4 categories
Accuracy (7 metrics)

| Metric | Description |
| --- | --- |
| `expected_outcome` | Whether the agent achieved the scenario's intended outcome |
| `hallucination` | Agent fabricated information not grounded in tool responses |
| `tool_call_success` | All expected tools were called with correct arguments |
| `transcription_accuracy` | Accuracy of the agent's speech-to-text understanding |
| `relevancy` | Response relevance to user queries |
| `response_consistency` | Consistent answers across the conversation |
| `voicemail_detection` | Agent correctly detected a voicemail greeting |

Conversation Quality (6 metrics)

| Metric | Description |
| --- | --- |
| `ai_interrupting_user` | Count of times the agent interrupted the user |
| `stop_time_after_interruption_ms` | Milliseconds for the agent to stop speaking after a user interruption |
| `latency_ms` | Response latency with P50/P90/P95/P99 percentiles |
| `infrastructure_issues` | No connection drops, audio gaps, or timeout errors |
| `silence_detection` | Total silence duration in the conversation |
| `unnecessary_repetition` | Agent avoided repeating the same information |

Customer Experience (4 metrics)

| Metric | Description |
| --- | --- |
| `csat` | Customer satisfaction score (0-100) |
| `sentiment` | Overall conversation sentiment: positive/neutral/negative |
| `topic_of_call` | Categorization of the call subject |
| `dropoff_node` | Where in the conversation the user dropped off |

Speech Quality (5 metrics)

| Metric | Description |
| --- | --- |
| `average_pitch_hz` | Mean pitch frequency of the agent's voice |
| `talk_ratio` | Ratio of agent speaking time to total time |
| `speaking_rate_wpm` | Agent words per minute |
| `ringing_duration` | Time before the agent picks up |
| `gibberish` | Agent did not produce unintelligible speech |
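The `latency_ms` metric reports P50/P90/P95/P99 percentiles. The exact method is not specified here; a minimal sketch using the nearest-rank definition (the sample latencies are made up for illustration) might look like:

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ranked = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ranked)) - 1)  # 1-based rank -> 0-based index
    return ranked[k]

# Hypothetical per-turn response latencies in milliseconds
latencies = [820, 640, 710, 1430, 980, 560, 1210, 890, 760, 1020]

summary = {f"p{p}": percentile(latencies, p) for p in (50, 90, 95, 99)}
print(summary)  # {'p50': 820, 'p90': 1210, 'p95': 1430, 'p99': 1430}
```

With small samples the tail percentiles (P95/P99) collapse onto the maximum, as here; interpolating definitions give smoother values on larger datasets.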
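Similarly, `talk_ratio` and `speaking_rate_wpm` can be derived from timestamped agent speech segments. The segment format below (start, end, transcript) is an assumption for illustration, not this product's actual data model:

```python
# Hypothetical agent speech segments: (start_s, end_s, transcript)
segments = [
    (0.0, 12.0, "Hi thanks for calling how can I help"),
    (20.0, 32.0, "Sure I can check that order for you"),
]
call_duration_s = 60.0  # total conversation length

agent_speech_s = sum(end - start for start, end, _ in segments)
talk_ratio = agent_speech_s / call_duration_s            # agent speaking time / total time

words = sum(len(text.split()) for _, _, text in segments)
speaking_rate_wpm = words / (agent_speech_s / 60.0)      # words per minute of speech

print(talk_ratio, speaking_rate_wpm)  # 0.4 40.0
```

Note the rate is normalized by speaking time, not call time, so silences do not drag the WPM figure down.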