Monitoring & Analytics¶
Monitor system health, track usage metrics, analyze credit consumption, and set up alerts for your SaaS LiteLLM platform.
Overview¶
The monitoring system provides comprehensive insights into:
- System health and performance
- Usage metrics and analytics
- Credit consumption tracking
- Cost analysis
- Team activity monitoring
- Performance metrics
- Alert notifications
Dashboard Overview¶
Access the Monitoring Dashboard¶
Local Development:
Production:
Key Metrics at a Glance¶
The monitoring dashboard displays:
- Platform Health
  - API uptime status
  - Database connection health
  - LiteLLM proxy status
  - Average response times
- Usage Statistics
  - Total jobs today/this month
  - Active teams count
  - Total API calls
  - Token consumption
- Financial Metrics
  - Total costs (USD)
  - Credits allocated vs. used
  - Cost per team breakdown
  - Revenue analytics
- Performance Indicators
  - Average latency
  - Success/failure rates
  - Model usage distribution
  - Error rates by type
System Health Monitoring¶
Health Check Endpoint¶
The SaaS API provides a health check endpoint:
Response:
Component Health Checks¶
Monitor all system components:
# health_check.py
import asyncio

import httpx
from sqlalchemy import create_engine, text


async def check_system_health():
    """
    Comprehensive health check for all system components
    """
    health_status = {
        "saas_api": False,
        "litellm_proxy": False,
        "database": False,
        "overall": "unhealthy"
    }

    # 1. Check SaaS API
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get("http://localhost:8003/health")
            health_status["saas_api"] = response.status_code == 200
    except Exception as e:
        print(f"SaaS API check failed: {e}")

    # 2. Check LiteLLM Proxy
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get("http://localhost:8002/health")
            health_status["litellm_proxy"] = response.status_code == 200
    except Exception as e:
        print(f"LiteLLM Proxy check failed: {e}")

    # 3. Check Database (synchronous call; it briefly blocks the event loop)
    try:
        engine = create_engine("postgresql://...")
        with engine.connect() as conn:
            result = conn.execute(text("SELECT 1"))
            health_status["database"] = result.fetchone()[0] == 1
    except Exception as e:
        print(f"Database check failed: {e}")

    # Overall health: all components up = healthy, some up = degraded, none up = unhealthy
    components = [health_status["saas_api"], health_status["litellm_proxy"], health_status["database"]]
    if all(components):
        health_status["overall"] = "healthy"
    elif any(components):
        health_status["overall"] = "degraded"
    return health_status


# Usage: a bare `await` is only valid inside a coroutine, so run the check with asyncio.run()
if __name__ == "__main__":
    health = asyncio.run(check_system_health())
    print(health)
Schedule health checks:
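One way to run the check periodically is a plain asyncio loop. The sketch below is an assumption rather than the platform's own scheduler (cron or a task queue would work equally well); `run_health_checks`, `on_result`, and `max_runs` are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional


async def run_health_checks(
    check: Callable[[], Awaitable[Dict[str, Any]]],
    on_result: Callable[[Dict[str, Any]], None],
    interval_seconds: float = 60.0,
    max_runs: Optional[int] = None,
) -> int:
    """Call `check` repeatedly and pass each result to `on_result`.

    Runs forever when `max_runs` is None. Returns the number of runs performed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            result = await check()
        except Exception as exc:
            # A failing check should not stop the loop; report it as unhealthy.
            result = {"overall": "unhealthy", "error": str(exc)}
        on_result(result)
        runs += 1
        # Sleep only if another run will follow.
        if max_runs is None or runs < max_runs:
            await asyncio.sleep(interval_seconds)
    return runs
```

With the function from `health_check.py`, a call could look like `asyncio.run(run_health_checks(check_system_health, print))`.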
Real-Time Health Dashboard¶
// components/HealthDashboard.jsx
import { useEffect, useState } from 'react';

export function HealthDashboard() {
  const [health, setHealth] = useState(null);

  useEffect(() => {
    const checkHealth = async () => {
      const response = await fetch('/api/health-check');
      const data = await response.json();
      setHealth(data);
    };

    // Check health immediately, then every 30 seconds
    checkHealth();
    const interval = setInterval(checkHealth, 30000);
    return () => clearInterval(interval);
  }, []);

  return (
    <div className="health-dashboard">
      <ServiceStatus name="SaaS API" status={health?.saas_api} />
      <ServiceStatus name="LiteLLM Proxy" status={health?.litellm_proxy} />
      <ServiceStatus name="Database" status={health?.database} />
      <OverallHealth status={health?.overall} />
    </div>
  );
}
Usage Metrics & Analytics¶
Team Usage Summary¶
Get comprehensive usage statistics for any team:
API Endpoint:
Parameters:
- period: Format "YYYY-MM" (e.g., "2025-10") or "YYYY-MM-DD"
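The two accepted `period` formats map naturally onto a date range. The following is a minimal sketch of how a client or server might interpret them; `parse_period` is a hypothetical helper, and the half-open range convention (end date exclusive) is an assumption, not documented API behavior.

```python
from datetime import date, datetime, timedelta
from typing import Tuple


def parse_period(period: str) -> Tuple[date, date]:
    """Return (start, end) for a period string, with `end` exclusive."""
    if len(period) == 7:  # "YYYY-MM": the whole calendar month
        start = datetime.strptime(period, "%Y-%m").date()
        # First day of the following month, handling the December rollover.
        if start.month == 12:
            end = date(start.year + 1, 1, 1)
        else:
            end = date(start.year, start.month + 1, 1)
        return start, end
    if len(period) == 10:  # "YYYY-MM-DD": a single day
        start = datetime.strptime(period, "%Y-%m-%d").date()
        return start, start + timedelta(days=1)
    raise ValueError(f"Unsupported period format: {period!r}")
```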
Example Request:
curl -X GET "http://localhost:8003/api/teams/team_abc123/usage?period=2025-10" \
-H "Authorization: Bearer sk-team-key-abc123"
Response:
{
"team_id": "team_abc123",
"period": "2025-10",
"summary": {
"total_jobs": 1234,
"successful_jobs": 1180,
"failed_jobs": 54,
"total_cost_usd": 156.78,
"total_tokens": 1250000,
"avg_cost_per_job": 0.1270
},
"job_types": {
"document_analysis": {
"count": 456,
"cost_usd": 67.89
},
"content_generation": {
"count": 378,
"cost_usd": 45.23
},
"data_extraction": {
"count": 400,
"cost_usd": 43.66
}
}
}
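Rates such as success percentage are not part of the response above, but they can be derived from the `summary` block. A small sketch, assuming the response shape shown; `summarize_usage` is a hypothetical helper name.

```python
from typing import Any, Dict


def summarize_usage(response: Dict[str, Any]) -> Dict[str, float]:
    """Derive success/failure rates and average cost from a usage response."""
    summary = response["summary"]
    total = summary["total_jobs"]
    if total == 0:
        # Avoid division by zero for teams with no jobs in the period.
        return {"success_rate": 0.0, "failure_rate": 0.0, "avg_cost_per_job": 0.0}
    return {
        "success_rate": round(summary["successful_jobs"] / total * 100, 2),
        "failure_rate": round(summary["failed_jobs"] / total * 100, 2),
        "avg_cost_per_job": round(summary["total_cost_usd"] / total, 4),
    }
```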
Organization-Wide Usage¶
Track usage across all teams in an organization:
API Endpoint:
Example Request:
Response:
{
"organization_id": "org_xyz789",
"period": "2025-10",
"summary": {
"total_jobs": 5678,
"completed_jobs": 5432,
"failed_jobs": 246,
"credits_used": 5432,
"total_cost_usd": 678.90,
"total_tokens": 5600000
},
"teams": {
"team_abc123": {
"jobs": 1234,
"credits_used": 1180
},
"team_def456": {
"jobs": 2345,
"credits_used": 2301
},
"team_ghi789": {
"jobs": 2099,
"credits_used": 1951
}
}
}
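In the sample response the per-team figures add up to the organization summary. A quick consistency check along those lines can catch stale or partial aggregates; this is a sketch that assumes the response shape above, and `check_team_rollup` is a hypothetical name.

```python
from typing import Any, Dict


def check_team_rollup(response: Dict[str, Any]) -> Dict[str, int]:
    """Return (sum over teams) minus (organization summary) for jobs and credits.

    Both differences are 0 when the per-team numbers match the summary.
    """
    teams = list(response["teams"].values())
    summary = response["summary"]
    return {
        "jobs_diff": sum(t["jobs"] for t in teams) - summary["total_jobs"],
        "credits_diff": sum(t["credits_used"] for t in teams) - summary["credits_used"],
    }
```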
Database Queries for Analytics¶
The system uses several tables for tracking usage:
1. Jobs Table - Individual job tracking
-- Get job statistics for the current month
SELECT
job_type,
status,
COUNT(*) as job_count,
AVG(EXTRACT(EPOCH FROM (completed_at - created_at))) as avg_duration_seconds
FROM jobs
WHERE created_at >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY job_type, status
ORDER BY job_count DESC;
2. LLM Calls Table - Individual API call tracking
-- Get model usage statistics
SELECT
model_group_used,
resolved_model,
COUNT(*) as call_count,
SUM(total_tokens) as total_tokens,
SUM(cost_usd) as total_cost_usd,
AVG(latency_ms) as avg_latency_ms
FROM llm_calls
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY model_group_used, resolved_model
ORDER BY call_count DESC;
3. Job Cost Summaries Table - Aggregated job costs
-- Get cost summary for all jobs
SELECT
j.team_id,
j.organization_id,
COUNT(DISTINCT j.job_id) as total_jobs,
SUM(jcs.total_calls) as total_api_calls,
SUM(jcs.total_tokens) as total_tokens,
SUM(jcs.total_cost_usd) as total_cost_usd,
AVG(jcs.avg_latency_ms) as avg_latency_ms
FROM jobs j
JOIN job_cost_summaries jcs ON j.job_id = jcs.job_id
WHERE j.created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY j.team_id, j.organization_id
ORDER BY total_cost_usd DESC;
4. Team Usage Summaries Table - Pre-calculated analytics
-- Get monthly usage summaries for all teams
SELECT
team_id,
period,
period_type,
total_jobs,
successful_jobs,
failed_jobs,
total_cost_usd,
total_tokens,
job_type_breakdown
FROM team_usage_summaries
WHERE period_type = 'monthly'
AND period >= TO_CHAR(CURRENT_DATE - INTERVAL '6 months', 'YYYY-MM')
ORDER BY period DESC, total_cost_usd DESC;
Usage Analytics Dashboard¶
// pages/monitoring/usage.jsx
import { useEffect, useState } from 'react';
import { BarChart, LineChart } from '@/components/Charts';
// StatCard and TeamUsageTable are assumed to be imported from the project's own components.

export default function UsageAnalytics() {
  const [period, setPeriod] = useState('2025-10');
  const [data, setData] = useState(null);

  useEffect(() => {
    // Fetch usage data whenever the selected period changes
    fetch(`/api/analytics/usage?period=${period}`)
      .then(res => res.json())
      .then(data => setData(data));
  }, [period]);

  return (
    <div className="usage-analytics">
      <h1>Usage Analytics - {period}</h1>
      <div className="stats-grid">
        <StatCard
          title="Total Jobs"
          value={data?.total_jobs}
          trend={data?.jobs_trend}
        />
        <StatCard
          title="Success Rate"
          value={`${data?.success_rate}%`}
          trend={data?.success_trend}
        />
        <StatCard
          title="Total Cost"
          value={`$${data?.total_cost}`}
          trend={data?.cost_trend}
        />
        <StatCard
          title="Avg Latency"
          value={`${data?.avg_latency}ms`}
          trend={data?.latency_trend}
        />
      </div>
      <BarChart
        title="Jobs by Type"
        data={data?.job_types}
        xAxis="job_type"
        yAxis="count"
      />
      <LineChart
        title="Daily Usage Trend"
        data={data?.daily_usage}
        xAxis="date"
        yAxis="jobs"
      />
      <TeamUsageTable teams={data?.teams} />
    </div>
  );
}
Credit Consumption Tracking¶
Credit Balance Monitoring¶
Check credit balance for any team:
API Endpoint:
Response:
{
"team_id": "team_abc123",
"organization_id": "org_xyz789",
"credits_allocated": 1000,
"credits_used": 245,
"credits_remaining": 755,
"credit_limit": 1000,
"auto_refill": false,
"refill_amount": null,
"refill_period": null,
"created_at": "2025-10-01T00:00:00Z",
"updated_at": "2025-10-15T14:30:00Z"
}
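A balance response like the one above can be reduced to a simple status for dashboards. The sketch below is illustrative only: `credit_status`, the status labels, and the default 10% threshold are assumptions, not part of the API.

```python
from typing import Any, Dict


def credit_status(balance: Dict[str, Any], low_percent: float = 10.0) -> Dict[str, Any]:
    """Classify a credit balance as 'ok', 'low', 'exhausted', or 'unallocated'."""
    allocated = balance["credits_allocated"]
    remaining = balance["credits_remaining"]
    if allocated <= 0:
        return {"remaining_percent": 0.0, "status": "unallocated"}
    remaining_percent = round(remaining / allocated * 100, 2)
    if remaining <= 0:
        status = "exhausted"
    elif remaining_percent <= low_percent:
        status = "low"
    else:
        status = "ok"
    return {"remaining_percent": remaining_percent, "status": status}
```

For the sample balance (755 of 1000 credits remaining) this reports 75.5% remaining and a status of `ok`.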
Credit Transaction History¶
Track all credit transactions:
API Endpoint:
Response:
{
"team_id": "team_abc123",
"total": 15,
"transactions": [
{
"transaction_id": "550e8400-e29b-41d4-a716-446655440000",
"team_id": "team_abc123",
"organization_id": "org_xyz789",
"job_id": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"transaction_type": "deduction",
"credits_amount": -1,
"credits_before": 756,
"credits_after": 755,
"reason": "Job document_analysis completed successfully",
"created_at": "2025-10-15T14:30:00Z"
},
{
"transaction_id": "660e8400-e29b-41d4-a716-446655440001",
"team_id": "team_abc123",
"organization_id": "org_xyz789",
"job_id": null,
"transaction_type": "allocation",
"credits_amount": 500,
"credits_before": 256,
"credits_after": 756,
"reason": "Monthly credit allocation",
"created_at": "2025-10-01T00:00:00Z"
}
]
}
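Each transaction carries `credits_before`, `credits_amount`, and `credits_after`, so the ledger can be checked for arithmetic consistency. A minimal sketch, assuming the response shape above; `find_ledger_errors` is a hypothetical helper.

```python
from typing import Any, Dict, List


def find_ledger_errors(transactions: List[Dict[str, Any]]) -> List[str]:
    """Return IDs of transactions where before + amount does not equal after."""
    bad_ids = []
    for tx in transactions:
        expected_after = tx["credits_before"] + tx["credits_amount"]
        if expected_after != tx["credits_after"]:
            bad_ids.append(tx["transaction_id"])
    return bad_ids
```

Both sample transactions pass this check: 756 - 1 = 755 and 256 + 500 = 756.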
Credit Consumption Analytics¶
Query credit usage patterns:
-- Daily credit consumption trend
SELECT
DATE(created_at) as date,
SUM(CASE WHEN transaction_type = 'deduction' THEN ABS(credits_amount) ELSE 0 END) as credits_used,
SUM(CASE WHEN transaction_type = 'allocation' THEN credits_amount ELSE 0 END) as credits_added
FROM credit_transactions
WHERE team_id = 'team_abc123'
AND created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE(created_at)
ORDER BY date DESC;
-- Top teams by credit consumption
SELECT
tc.team_id,
tc.organization_id,
tc.credits_allocated,
tc.credits_used,
tc.credits_remaining,
ROUND((tc.credits_used::numeric / NULLIF(tc.credits_allocated, 0) * 100), 2) as usage_percentage -- PostgreSQL's two-argument ROUND requires numeric, not float
FROM team_credits tc
WHERE tc.credits_allocated > 0
ORDER BY usage_percentage DESC
LIMIT 20;
Low Credit Alerts¶
Monitor teams approaching credit exhaustion:
# services/credit_alerts.py
from sqlalchemy import and_
from sqlalchemy.orm import Session

from models.credits import TeamCredits
from models.organizations import Organization


def check_low_credit_teams(db: Session, threshold_percent: float = 20.0):
    """
    Find teams whose remaining credits are at or below the threshold percentage
    """
    teams = db.query(TeamCredits).filter(
        and_(
            TeamCredits.credits_allocated > 0,
            # Multiply by 100.0 first so integer columns are not truncated by integer division
            (TeamCredits.credits_remaining * 100.0 / TeamCredits.credits_allocated) <= threshold_percent
        )
    ).all()

    alerts = []
    for team in teams:
        org = db.query(Organization).filter(
            Organization.organization_id == team.organization_id
        ).first()
        alerts.append({
            "team_id": team.team_id,
            "organization_id": team.organization_id,
            "organization_name": org.name if org else "Unknown",
            "credits_remaining": team.credits_remaining,
            "credits_allocated": team.credits_allocated,
            "usage_percent": round((team.credits_used / team.credits_allocated * 100), 2)
        })
    return alerts


# Usage (assumes `db` is an open session and `send_alert` is defined elsewhere)
low_credit_teams = check_low_credit_teams(db, threshold_percent=10.0)
for team in low_credit_teams:
    send_alert(team)
Performance Metrics¶
Latency Tracking¶
Monitor API response times:
-- Average latency by model group
SELECT
model_group_used,
COUNT(*) as call_count,
AVG(latency_ms) as avg_latency_ms,
MIN(latency_ms) as min_latency_ms,
MAX(latency_ms) as max_latency_ms,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY latency_ms) as p50_latency_ms,
PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY latency_ms) as p95_latency_ms,
PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY latency_ms) as p99_latency_ms
FROM llm_calls
WHERE created_at >= NOW() - INTERVAL '24 hours' -- NOW() gives a rolling 24-hour window; CURRENT_DATE would start at yesterday's midnight
AND latency_ms IS NOT NULL
GROUP BY model_group_used
ORDER BY avg_latency_ms DESC;
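`PERCENTILE_CONT` interpolates linearly between the two nearest ordered values. The Python sketch below mirrors that behavior for checking results outside the database; `percentile_cont` here is an illustrative re-implementation, not code from the platform.

```python
import math
from typing import Sequence


def percentile_cont(values: Sequence[float], fraction: float) -> float:
    """Continuous percentile with linear interpolation between neighbors."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    # Zero-based fractional rank into the sorted list.
    rank = fraction * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    weight = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight
```

For latencies of 100, 200, 300, and 400 ms, the median falls halfway between 200 and 300, so the function returns 250.0.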
Error Rate Monitoring¶
Track failure rates:
-- Error rates by job type
SELECT
job_type,
COUNT(*) as total_jobs,
SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END) as failed_jobs,
ROUND((SUM(CASE WHEN status = 'failed' THEN 1 ELSE 0 END)::numeric / COUNT(*) * 100), 2) as error_rate -- PostgreSQL's two-argument ROUND requires numeric, not float
FROM jobs
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY job_type
ORDER BY error_rate DESC;
Success Rate Dashboard¶
// components/PerformanceMetrics.jsx
export function PerformanceMetrics({ data }) {
  return (
    <div className="performance-metrics">
      <MetricCard
        title="Success Rate"
        value={`${data.success_rate}%`}
        description="Jobs completed successfully"
        trend={data.success_trend}
      />
      <MetricCard
        title="Average Latency"
        value={`${data.avg_latency}ms`}
        description="Mean API response time"
        trend={data.latency_trend}
      />
      <MetricCard
        title="Error Rate"
        value={`${data.error_rate}%`}
        description="Failed requests"
        trend={data.error_trend}
      />
      <MetricCard
        title="Throughput"
        value={`${data.requests_per_minute}/min`}
        description="Requests per minute"
        trend={data.throughput_trend}
      />
    </div>
  );
}
Alerts & Notifications¶
Alert Types¶
1. System Alerts
   - Service downtime
   - Database connection failures
   - High error rates
   - Performance degradation
2. Usage Alerts
   - Unusual traffic spikes
   - Team approaching credit limit
   - Zero credit remaining
   - High cost anomalies
3. Security Alerts
   - Multiple failed authentication attempts
   - Suspicious API usage patterns
   - Rate limit violations
   - Unauthorized access attempts
Alert Configuration¶
# config/alerts.py
ALERT_THRESHOLDS = {
    "credit_low": 10,          # Percentage of credits remaining
    "credit_critical": 0,      # Credits remaining
    "error_rate_high": 5.0,    # Percentage
    "latency_high": 5000,      # Milliseconds
    "cost_spike": 200.0,       # Percent increase
}

ALERT_CHANNELS = {
    "email": ["admin@company.com", "ops@company.com"],
    "slack": "https://hooks.slack.com/services/...",
    "webhook": "https://your-monitoring-system.com/alerts"
}
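To make the threshold semantics concrete, here is a hedged sketch of how a metrics snapshot might be compared against `ALERT_THRESHOLDS`. The function name and the metric keys (`credits_remaining_percent`, `p95_latency_ms`, `previous_cost_usd`, and so on) are hypothetical, as is the choice to treat `cost_spike` as a percent increase over a previous period.

```python
from typing import Any, Dict, List


def evaluate_thresholds(metrics: Dict[str, Any], thresholds: Dict[str, float]) -> List[str]:
    """Return the names of alerts triggered by a metrics snapshot."""
    triggered: List[str] = []

    # Credits: "critical" takes precedence over "low" so only one credit alert fires.
    if metrics["credits_remaining"] <= thresholds["credit_critical"]:
        triggered.append("credit_critical")
    elif metrics["credits_remaining_percent"] <= thresholds["credit_low"]:
        triggered.append("credit_low")

    if metrics["error_rate_percent"] > thresholds["error_rate_high"]:
        triggered.append("error_rate_high")

    if metrics["p95_latency_ms"] > thresholds["latency_high"]:
        triggered.append("latency_high")

    # Cost spike: percent increase relative to the previous period (assumed meaning).
    previous = metrics["previous_cost_usd"]
    if previous > 0:
        increase_percent = (metrics["current_cost_usd"] - previous) / previous * 100
        if increase_percent >= thresholds["cost_spike"]:
            triggered.append("cost_spike")

    return triggered
```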
Alert Implementation¶
# services/alert_manager.py
import logging
from datetime import datetime, timezone
from typing import Any, Dict

import httpx

logger = logging.getLogger(__name__)


class AlertManager:
    def __init__(self, config: Dict[str, Any]):
        self.config = config

    async def send_alert(self, alert_type: str, message: str, severity: str = "warning"):
        """
        Send alert to configured channels
        """
        alert_data = {
            "type": alert_type,
            "message": message,
            "severity": severity,
            "timestamp": datetime.now(timezone.utc).isoformat()
        }
        # Send to email
        if "email" in self.config:
            await self._send_email(alert_data)
        # Send to Slack
        if "slack" in self.config:
            await self._send_slack(alert_data)
        # Send to webhook
        if "webhook" in self.config:
            await self._send_webhook(alert_data)

    async def _send_email(self, alert_data: Dict[str, Any]):
        """Placeholder: email delivery is not shown in this guide, so the alert is only logged."""
        logger.info("Email alert for %s: %s", self.config["email"], alert_data)

    async def _send_webhook(self, alert_data: Dict[str, Any]):
        """Assumed behavior: POST the alert as JSON to the configured webhook URL."""
        async with httpx.AsyncClient() as client:
            await client.post(self.config["webhook"], json=alert_data)

    async def _send_slack(self, alert_data: Dict[str, Any]):
        """Send alert to Slack"""
        webhook_url = self.config["slack"]
        payload = {
            "text": f"🚨 {alert_data['severity'].upper()}: {alert_data['message']}",
            "attachments": [
                {
                    "color": self._get_color(alert_data['severity']),
                    "fields": [
                        {"title": "Type", "value": alert_data['type'], "short": True},
                        {"title": "Time", "value": alert_data['timestamp'], "short": True}
                    ]
                }
            ]
        }
        async with httpx.AsyncClient() as client:
            await client.post(webhook_url, json=payload)

    def _get_color(self, severity: str) -> str:
        colors = {
            "info": "#36a64f",
            "warning": "#ff9800",
            "error": "#f44336",
            "critical": "#9c27b0"
        }
        return colors.get(severity, "#808080")


# Usage (inside an async function; `team` is a team credits record,
# ALERT_CHANNELS and ALERT_THRESHOLDS come from config/alerts.py)
alert_manager = AlertManager(ALERT_CHANNELS)

# Critical credit alert
if team.credits_remaining <= ALERT_THRESHOLDS["credit_critical"]:
    await alert_manager.send_alert(
        alert_type="credit_critical",
        message=f"Team {team.team_id} has {team.credits_remaining} credits remaining",
        severity="critical"
    )
Scheduled Alert Checks¶
# scripts/check_alerts.py
import asyncio

from config.alerts import ALERT_CHANNELS, ALERT_THRESHOLDS
from services.alert_manager import AlertManager
from services.credit_alerts import check_low_credit_teams

# Assumed to be provided elsewhere in the project (not shown in this guide):
# `db` (a database session) and the helpers `check_error_rates` and `check_latency`.


async def check_all_alerts():
    """
    Run all alert checks
    """
    alert_manager = AlertManager(ALERT_CHANNELS)

    # Check low credits
    low_credit_teams = check_low_credit_teams(db, ALERT_THRESHOLDS["credit_low"])
    for team in low_credit_teams:
        await alert_manager.send_alert(
            "credit_low",
            f"Team {team['team_id']} has {team['credits_remaining']} credits ({team['usage_percent']}% used)",
            "warning"
        )

    # Check high error rates
    error_stats = check_error_rates(db)
    for stat in error_stats:
        if stat['error_rate'] > ALERT_THRESHOLDS["error_rate_high"]:
            await alert_manager.send_alert(
                "error_rate_high",
                f"Job type {stat['job_type']} has {stat['error_rate']}% error rate",
                "error"
            )

    # Check high latency
    latency_stats = check_latency(db)
    for stat in latency_stats:
        if stat['p95_latency'] > ALERT_THRESHOLDS["latency_high"]:
            await alert_manager.send_alert(
                "latency_high",
                f"Model group {stat['model_group']} has {stat['p95_latency']}ms P95 latency",
                "warning"
            )


if __name__ == "__main__":
    asyncio.run(check_all_alerts())
Schedule via cron:
Monitoring Best Practices¶
1. Set Up Comprehensive Logging¶
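# config/logging.py
import logging
import os
from logging.handlers import RotatingFileHandler


def setup_logging():
    logger = logging.getLogger('saas_llm')
    logger.setLevel(logging.INFO)

    # Skip setup if a handler is already attached, so repeated calls do not duplicate log lines
    if logger.handlers:
        return logger

    # The handler fails if the target directory does not exist
    os.makedirs('logs', exist_ok=True)

    # File handler with rotation
    handler = RotatingFileHandler(
        'logs/saas_llm.log',
        maxBytes=10485760,  # 10MB
        backupCount=10
    )
    formatter = logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    )
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    return logger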
2. Track Key Business Metrics¶
-- Create a view for quick dashboard queries
CREATE VIEW monitoring_dashboard AS
SELECT
COUNT(DISTINCT j.team_id) as active_teams,
COUNT(j.job_id) as total_jobs_today,
SUM(CASE WHEN j.status = 'completed' THEN 1 ELSE 0 END) as successful_jobs_today,
SUM(CASE WHEN j.status = 'failed' THEN 1 ELSE 0 END) as failed_jobs_today,
SUM(jcs.total_cost_usd) as total_cost_today,
AVG(jcs.avg_latency_ms) as avg_latency_today
FROM jobs j
LEFT JOIN job_cost_summaries jcs ON j.job_id = jcs.job_id
WHERE j.created_at >= CURRENT_DATE;
3. Implement Real-Time Dashboards¶
Use WebSockets for live updates:
// Real-time monitoring
const ws = new WebSocket('ws://localhost:8003/ws/monitoring');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  // Update dashboard in real-time
  updateMetrics(data);
};
4. Regular Performance Reviews¶
Schedule weekly/monthly performance reviews:
-- Weekly performance report
SELECT
DATE_TRUNC('week', j.created_at) as week,
COUNT(*) as total_jobs,
AVG(jcs.total_cost_usd) as avg_cost_per_job,
AVG(jcs.avg_latency_ms) as avg_latency,
SUM(CASE WHEN j.status = 'failed' THEN 1 ELSE 0 END)::float / COUNT(*) * 100 as error_rate
FROM jobs j
JOIN job_cost_summaries jcs ON j.job_id = jcs.job_id
WHERE j.created_at >= CURRENT_DATE - INTERVAL '12 weeks'
GROUP BY week
ORDER BY week DESC;
Troubleshooting¶
High Latency Issues¶
Problem: Average latency above 2000ms
Investigation:
-- Find slow API calls
SELECT
call_id,
job_id,
model_group_used,
resolved_model,
latency_ms,
total_tokens,
created_at
FROM llm_calls
WHERE latency_ms > 5000
AND created_at >= NOW() - INTERVAL '24 hours' -- NOW() gives a rolling 24-hour window; CURRENT_DATE would start at yesterday's midnight
ORDER BY latency_ms DESC
LIMIT 50;
Solutions:
1. Check LiteLLM proxy logs
2. Verify model provider status
3. Consider adding timeout configurations
4. Review model selection strategy
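As an illustration of the timeout suggestion, the sketch below wraps any awaitable call in a per-attempt timeout with a bounded number of retries. It uses only the standard library; `call_with_timeout`, its parameter names, and the default values are assumptions, not settings taken from LiteLLM or this platform.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_timeout(
    make_call: Callable[[], Awaitable[T]],
    timeout_seconds: float = 30.0,
    retries: int = 2,
    backoff_seconds: float = 0.5,
) -> T:
    """Await `make_call()` with a per-attempt timeout, retrying on timeout."""
    last_error: Optional[BaseException] = None
    for attempt in range(retries + 1):
        try:
            # wait_for cancels the call if it exceeds the timeout.
            return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
        except asyncio.TimeoutError as exc:
            last_error = exc
            if attempt < retries:
                # Exponential backoff between attempts.
                await asyncio.sleep(backoff_seconds * (2 ** attempt))
    raise TimeoutError(
        f"call did not finish within {timeout_seconds}s after {retries + 1} attempts"
    ) from last_error
```

`make_call` is a zero-argument function that returns a fresh awaitable on each attempt, for example `lambda: client.get(url)` with an `httpx.AsyncClient`.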
Missing Usage Data¶
Problem: Usage summaries not updating
Solution: Manually recalculate summaries
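# scripts/recalculate_usage.py
from models.credits import TeamCredits
from services.usage_calculator import calculate_team_usage

# `db` is assumed to be the project's database session (its setup is not shown in this guide).


def recalculate_all_usage(period: str):
    teams = db.query(TeamCredits).all()
    for team in teams:
        summary = calculate_team_usage(
            db,
            team_id=team.team_id,
            period=period
        )
        # Store summary (merge inserts or updates the existing row)
        db.merge(summary)
    db.commit()


# Usage
recalculate_all_usage("2025-10")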
Next Steps¶
Now that you understand monitoring:
- Set Up Alerts - Configure notifications
- Analyze Costs - Deep dive into costs
- Optimize Performance - Improve latency
Additional Resources¶
- API Reference - Monitoring endpoints
- Database Schema - Tracking tables
- Performance Tuning - Optimization guide