Monitoring is the backbone of any production voice AI system. Without proper monitoring, you're flying blind: unable to detect issues, optimize performance, or understand user behavior.
Real-time Detection:
Quality Assurance:
Business Intelligence:
Modern voice systems require structured logging in JSON format for easy parsing and analysis.
Standard Fields:
```json
{
  "timestamp": "2025-01-24T10:15:22Z",
  "session_id": "abcd-1234-5678-efgh",
  "call_id": "CA1234567890abcdef",
  "user_id": "user_12345",
  "phone_number": "+15551234567",
  "event_type": "call_start",
  "component": "ivr_gateway",
  "latency_ms": 180,
  "status": "success",
  "metadata": {
    "intent_detected": "CheckBalance",
    "ivr_node": "BalanceMenu",
    "confidence_score": 0.92
  }
}
```
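Log lines in this shape are easy to aggregate downstream. As an illustrative sketch (field names follow the example above), here is how a stream of such JSON log lines can be rolled up into average latency per component:

```python
import json
from collections import defaultdict

def average_latency_by_component(log_lines):
    """Aggregate mean latency_ms per component from JSON log lines."""
    totals = defaultdict(lambda: [0.0, 0])  # component -> [sum, count]
    for line in log_lines:
        entry = json.loads(line)
        if "latency_ms" in entry and "component" in entry:
            acc = totals[entry["component"]]
            acc[0] += entry["latency_ms"]
            acc[1] += 1
    return {comp: total / count for comp, (total, count) in totals.items()}

lines = [
    '{"component": "ivr_gateway", "latency_ms": 180}',
    '{"component": "ivr_gateway", "latency_ms": 220}',
    '{"component": "tts", "latency_ms": 400}',
]
print(average_latency_by_component(lines))
# {'ivr_gateway': 200.0, 'tts': 400.0}
```

In production the same aggregation is usually pushed to a log analytics backend rather than done in-process, but the structured format is what makes either approach trivial.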
Call Lifecycle Events:
Performance Events:
User Interaction Events:
Speech Recognition Metrics:
Conversation Quality Metrics:
Customer Experience Metrics:
Technical Performance Metrics:
```python
def levenshtein_distance(source, target):
    """Edit distance between two word sequences."""
    previous = list(range(len(target) + 1))
    for i, src in enumerate(source, 1):
        current = [i]
        for j, tgt in enumerate(target, 1):
            current.append(min(previous[j] + 1,                   # deletion
                               current[j - 1] + 1,                # insertion
                               previous[j - 1] + (src != tgt)))   # substitution
        previous = current
    return previous[-1]

# ASR Accuracy Calculation
def calculate_asr_accuracy(recognized_text, actual_text):
    """Calculate accuracy as 1 - Word Error Rate (WER)."""
    recognized_words = recognized_text.lower().split()
    actual_words = actual_text.lower().split()
    # Word-level Levenshtein distance
    distance = levenshtein_distance(recognized_words, actual_words)
    wer = distance / len(actual_words)
    # Note: WER can exceed 1.0 when there are many insertions,
    # in which case this "accuracy" goes negative
    return 1 - wer

# First Call Resolution Rate
def calculate_fcr_rate(total_calls, resolved_calls):
    """Calculate First Call Resolution rate as a percentage."""
    return (resolved_calls / total_calls) * 100

# Average Handling Time
def calculate_aht(call_durations):
    """Calculate Average Handling Time (same unit as the input durations)."""
    return sum(call_durations) / len(call_durations)
```
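As a quick worked example of the FCR and AHT formulas above (the numbers are made up for illustration):

```python
# 170 of 200 calls resolved on first contact
fcr_rate = (170 / 200) * 100   # 85.0 %

# three calls lasting 120 s, 180 s, 240 s
aht = (120 + 180 + 240) / 3    # 180.0 s

print(fcr_rate, aht)
# 85.0 180.0
```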
Amazon CloudWatch:
Azure Monitor:
Google Cloud Operations:
Prometheus + Grafana:
ELK Stack (Elasticsearch, Logstash, Kibana):
Jaeger/Zipkin:
Twilio Voice Insights:
Genesys Cloud CX Analytics:
Asterisk Monitoring:
Critical Thresholds:
```yaml
alerts:
  - name: "High TTS Latency"
    condition: "tts_latency_ms > 1000"
    severity: "critical"
    notification: ["slack", "pagerduty"]
  - name: "High Error Rate"
    condition: "error_rate > 0.02"
    severity: "warning"
    notification: ["slack"]
  - name: "Low ASR Accuracy"
    condition: "asr_accuracy < 0.85"
    severity: "warning"
    notification: ["email", "slack"]
  - name: "System Down"
    condition: "uptime < 0.99"
    severity: "critical"
    notification: ["pagerduty", "phone"]
```
Notification Channels:
Key Dashboard Components:
1. Logs (What Happened):
2. Metrics (How Much):
3. Traces (Where/When):
Trace Correlation:
```python
# Example trace correlation
def handle_voice_request(request):
    trace_id = generate_trace_id()

    # Log with trace correlation
    logger.info("Voice request received", extra={
        "trace_id": trace_id,
        "session_id": request.session_id,
        "call_id": request.call_id
    })

    # Process through different services
    with tracer.start_span("stt_processing", trace_id=trace_id):
        text = process_speech(request.audio)
    with tracer.start_span("intent_detection", trace_id=trace_id):
        intent = detect_intent(text)
    with tracer.start_span("tts_generation", trace_id=trace_id):
        response = generate_speech(intent.response)
    return response
```
Voice Anomaly Detection:
Machine Learning Models:
```python
# Example anomaly detection
def detect_voice_anomaly(audio_segment):
    """Detect anomalies in voice patterns"""
    model = load_anomaly_detection_model()

    # Extract features from the raw audio segment
    features = extract_audio_features(audio_segment)

    # Predict anomaly score
    anomaly_score = model.predict(features)
    if anomaly_score > ANOMALY_THRESHOLD:
        logger.warning("Voice anomaly detected", extra={
            "anomaly_score": anomaly_score,
            "features": features
        })
        # Trigger appropriate response
        escalate_call()
    return anomaly_score
```
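`load_anomaly_detection_model` above is a placeholder for whatever model you deploy. As a minimal, self-contained stand-in, a z-score detector flags a feature value that deviates sharply from a rolling baseline (the threshold and sample values here are illustrative):

```python
from statistics import mean, stdev

def zscore_anomaly(history, value, threshold=3.0):
    """Flag `value` as anomalous if it lies more than `threshold`
    standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False, 0.0  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu, 0.0
    z = abs(value - mu) / sigma
    return z > threshold, z

# e.g. per-utterance energy levels from recent calls
baseline = [0.42, 0.45, 0.40, 0.44, 0.43, 0.41]
is_anomaly, z = zscore_anomaly(baseline, 0.95)
print(is_anomaly)  # True: 0.95 is far outside the baseline spread
```

A learned model (isolation forest, autoencoder, etc.) replaces the z-score once you have enough labeled or representative traffic; the alerting path stays the same.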
```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict

class VoiceSystemLogger:
    """Structured logger for voice AI systems"""

    def __init__(self, service_name: str):
        self.service_name = service_name
        self.logger = logging.getLogger(service_name)

    def _timestamp(self) -> str:
        # timezone-aware replacement for the deprecated datetime.utcnow()
        return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")

    def log_call_event(self, event_type: str, session_id: str,
                       call_id: str, metadata: Dict[str, Any]):
        """Log call-related events"""
        log_entry = {
            "timestamp": self._timestamp(),
            "service": self.service_name,
            "event_type": event_type,
            "session_id": session_id,
            "call_id": call_id,
            "metadata": metadata
        }
        self.logger.info(json.dumps(log_entry))

    def log_performance_metric(self, metric_name: str, value: float,
                               session_id: str, metadata: Dict[str, Any] = None):
        """Log performance metrics"""
        log_entry = {
            "timestamp": self._timestamp(),
            "service": self.service_name,
            "metric_name": metric_name,
            "value": value,
            "session_id": session_id,
            "metadata": metadata or {}
        }
        self.logger.info(json.dumps(log_entry))
```
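One wiring detail the class above leaves out: `logging.getLogger` attaches no handler by itself, so the structured lines go nowhere until you add one. A trimmed, self-contained sketch (`emit_event` stands in for the logger methods, and the in-memory buffer stands in for stdout or a log shipper):

```python
import io
import json
import logging

# Route structured log lines somewhere inspectable; a real deployment
# would use a StreamHandler to stdout, a file, or a shipping agent.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
logger = logging.getLogger("ivr_gateway")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

def emit_event(event_type, session_id, **metadata):
    """Minimal stand-in for VoiceSystemLogger.log_call_event."""
    logger.info(json.dumps({
        "event_type": event_type,
        "session_id": session_id,
        "metadata": metadata
    }))

emit_event("call_start", "abcd-1234", intent_detected="CheckBalance")
entry = json.loads(buffer.getvalue())
print(entry["event_type"])  # call_start
```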
```python
import dash
from dash import dcc, html

def create_monitoring_dashboard():
    """Create real-time monitoring dashboard"""
    app = dash.Dash(__name__)
    app.layout = html.Div([
        html.H1("Voice AI System Monitor"),

        # System Health
        html.Div([
            html.H2("System Health"),
            dcc.Graph(id="system-health"),
            dcc.Interval(id="health-interval", interval=30000)  # 30 seconds
        ]),

        # Performance Metrics
        html.Div([
            html.H2("Performance Metrics"),
            dcc.Graph(id="performance-metrics"),
            dcc.Interval(id="performance-interval", interval=60000)  # 1 minute
        ]),

        # Call Volume
        html.Div([
            html.H2("Call Volume"),
            dcc.Graph(id="call-volume"),
            dcc.Interval(id="volume-interval", interval=300000)  # 5 minutes
        ])
    ])
    return app
```
Monitoring and analytics are essential for the success of any voice AI platform. They provide:
A well-implemented monitoring strategy ensures:
✅ This closes Chapter 6.
Chapter 7 will cover advanced voice AI features including emotion detection, speaker identification, and multilingual support for global call centers.