Architecture¶
Understand the layered architecture of SaaS LiteLLM and how it's built on top of LiteLLM to provide a SaaS-ready API.
Overview¶
SaaS LiteLLM uses a layered architecture that wraps the LiteLLM proxy in a SaaS-ready API, enabling:

- Job-based cost tracking - Multiple LLM calls grouped into jobs
- Team isolation - Teams never directly access LiteLLM
- Hidden complexity - Model selection and pricing completely abstracted
- Per-job cost aggregation - Track costs per business operation
- Flexible pricing - Set your own markup and pricing strategy

Built on LiteLLM

SaaS LiteLLM leverages LiteLLM as its foundation for unified LLM API access to 100+ providers (OpenAI, Anthropic, Google, Azure, etc.). The SaaS layer adds job-based cost tracking, multi-tenancy, and business features on top of LiteLLM's core routing capabilities.
System Architecture¶
graph TD
A[Your SaaS Application] -->|HTTP/JSON| B[SaaS API Layer :8003]
B -->|Virtual Keys| C[LiteLLM Proxy :8002]
C -->|API Calls| D[OpenAI]
C -->|API Calls| E[Anthropic]
C -->|API Calls| F[Google & Others]
B -.Job Tracking.-> G[PostgreSQL Database]
B -.Cost Analytics.-> G
C -.Caching.-> H[Redis Cache]
I[Admin Dashboard :3002] -.Management.-> B
style A fill:#E3F2FD
style B fill:#4CAF50
style C fill:#2196F3
style G fill:#FF9800
style H fill:#F44336
style I fill:#9C27B0

Component Breakdown¶
1. Your SaaS Application¶
What it is:

- Your customer-facing application (web app, mobile app, API client)
- The application that your teams/customers use directly

What it does:

- Makes API calls to the SaaS API layer using virtual keys
- Implements your business logic and user interface
- Never interacts with LiteLLM directly
Example:
# Your application code
import requests

job_id = "..."  # returned when you create a job (see Data Flow below)

response = requests.post(
    f"https://your-saas-api.com/api/jobs/{job_id}/llm-call",
    headers={"Authorization": "Bearer sk-your-virtual-key"},
    json={"messages": [{"role": "user", "content": "Analyze..."}]},
)
2. SaaS API Layer (Port 8003)¶
What it is:

- FastAPI application that wraps LiteLLM
- The layer that provides job-based endpoints
- This is what you expose to your teams

What it does:

- Authenticates requests using virtual keys
- Creates and manages jobs for cost tracking
- Proxies LLM calls to LiteLLM with job context
- Aggregates costs per job in PostgreSQL
- Enforces team budgets and access controls
- Provides usage analytics

Key Endpoints:

- POST /api/jobs/create - Create a new job
- POST /api/jobs/{job_id}/llm-call - Make an LLM call
- POST /api/jobs/{job_id}/llm-call-stream - Streaming LLM call (consumed as shown below)
- POST /api/jobs/{job_id}/complete - Complete a job
- GET /api/teams/{team_id}/usage - Get team usage stats
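The non-streaming call is shown in the example above. For the streaming endpoint, here is a minimal consumption sketch; the endpoint path and auth header come from this page, while the exact SSE event format is an assumption:

import requests

job_id = "..."  # returned by POST /api/jobs/create

with requests.post(
    f"https://your-saas-api.com/api/jobs/{job_id}/llm-call-stream",
    headers={"Authorization": "Bearer sk-your-virtual-key"},
    json={"messages": [{"role": "user", "content": "Summarize this report"}]},
    stream=True,
) as response:
    for line in response.iter_lines():
        # SSE frames arrive as "data: ..." lines; the payload format is an assumption here
        if line.startswith(b"data: "):
            print(line.removeprefix(b"data: ").decode(), end="", flush=True)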
3. LiteLLM Proxy (Port 8002)¶
What it is:

- Standard LiteLLM proxy server (the foundation of SaaS LiteLLM)
- Handles actual LLM routing to 100+ providers
- This is internal only - never exposed to teams

What it does:

- Routes requests to appropriate LLM providers (OpenAI, Anthropic, Google, Azure, AWS Bedrock, etc.)
- Provides a unified OpenAI-compatible API across all providers
- Manages rate limiting per team (TPM/RPM limits)
- Handles caching in Redis for cost savings
- Manages fallbacks and retries
- Tracks usage in LiteLLM's own database tables
- Load balances across multiple models/providers
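Internally, the SaaS API reaches this proxy through LiteLLM's OpenAI-compatible API. A sketch of that hop, assuming the local port from the diagram above (the key and model name are placeholders):

import requests

# Internal hop: SaaS API -> LiteLLM proxy (never exposed to teams)
litellm_response = requests.post(
    "http://localhost:8002/chat/completions",
    headers={"Authorization": "Bearer sk-team-virtual-key"},  # virtual key, not a provider key
    json={
        "model": "gpt-4o",  # placeholder; real model selection is abstracted away
        "messages": [{"role": "user", "content": "Analyze..."}],
    },
)
# LiteLLM attaches cost metadata to the response, which the SaaS layer
# records per job (see Data Flow below)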
Why it's hidden:

- Teams don't need to understand LiteLLM configuration
- Model selection is abstracted away
- Pricing is completely hidden
- You maintain full control over infrastructure
- Provider switching is transparent to teams
4. PostgreSQL Database¶
What it stores:
Your SaaS Tables:

- jobs - Job metadata and status
- llm_calls - Individual LLM calls per job
- job_cost_summaries - Aggregated costs per job
- organizations - Organization management
- teams - Team management with credit allocation
- model_access_groups - Control which teams access which models
- model_aliases - Model configuration and pricing
- credit_transactions - Credit history and transactions

LiteLLM Tables (auto-created by LiteLLM):

- LiteLLM_VerificationToken - Virtual keys
- LiteLLM_UserTable - LiteLLM users
- LiteLLM_TeamTable - LiteLLM teams
- LiteLLM_SpendLogs - Usage tracking
Benefits:

- Historical tracking of all jobs and calls
- Cost analytics per team, organization, or job type (see the sketch below)
- Credit transaction history
- Usage reporting and insights
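For example, per-team cost analytics can be derived by joining the jobs and llm_calls tables. A hedged sketch; only the table names come from this page, the column names are assumptions:

# Hypothetical rollup query; cost_usd, job_id, team_id, created_at are assumed columns
PER_TEAM_COST = """
SELECT j.team_id, SUM(c.cost_usd) AS total_cost
FROM jobs j
JOIN llm_calls c ON c.job_id = j.job_id
WHERE j.created_at >= NOW() - INTERVAL '30 days'
GROUP BY j.team_id
ORDER BY total_cost DESC;
"""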
5. Redis Cache¶
What it does:

- Caches LLM responses for identical requests
- Reduces costs by avoiding duplicate API calls
- Improves latency for repeated queries

Configuration:

- Configurable TTL per model
- Automatic cache key generation based on the request
- Transparent to your application code
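Conceptually, cache keys are derived from the normalized request so that identical requests map to the same Redis entry. A sketch of the idea (this is not LiteLLM's actual implementation):

import hashlib
import json

def cache_key(model: str, messages: list, **params) -> str:
    # Hash the normalized request body so identical requests collide on purpose
    payload = json.dumps(
        {"model": model, "messages": messages, **params}, sort_keys=True
    )
    return "llm-cache:" + hashlib.sha256(payload.encode()).hexdigest()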
6. Admin Dashboard (Port 3002)¶
What it is:

- Next.js application for platform management
- Web UI for administrators

What it does:

- Create and manage organizations
- Create and manage teams
- Configure model access groups
- Allocate credits to teams
- Suspend/resume teams
- Monitor usage and costs
- View analytics and reports

Who uses it:

- Platform administrators
- Finance/billing teams
- Customer support
Data Flow¶
Let's trace a typical request through the system:
sequenceDiagram
participant App as Your SaaS App
participant API as SaaS API :8003
participant DB as PostgreSQL
participant LLM as LiteLLM :8002
participant Provider as OpenAI/Anthropic
participant Cache as Redis
App->>API: 1. POST /api/jobs/create
API->>DB: Create job record
DB-->>API: job_id
API-->>App: Return job_id
App->>API: 2. POST /api/jobs/{job_id}/llm-call
API->>DB: Update job status to "in_progress"
API->>LLM: Forward request with virtual key
LLM->>Cache: Check cache
alt Cache hit
Cache-->>LLM: Return cached response
else Cache miss
LLM->>Provider: Make API call
Provider-->>LLM: Return response
LLM->>Cache: Store in cache
end
LLM-->>API: Return response + cost metadata
API->>DB: Record llm_call with costs
API-->>App: Return response (no cost info)
App->>API: 3. POST /api/jobs/{job_id}/complete
API->>DB: Update job status to "completed"
API->>DB: Aggregate costs for job
DB-->>API: Total costs
API-->>App: Return job summary

Step-by-Step Flow¶
1. Create Job

   - Your app creates a job for tracking
   - SaaS API stores the job in PostgreSQL
   - Returns job_id to your app

2. Make LLM Calls

   - Your app makes one or more LLM calls using the job_id
   - SaaS API forwards each call to LiteLLM with a virtual key
   - LiteLLM checks the Redis cache first
   - If not cached, it calls the provider (OpenAI, Anthropic, etc.)
   - Response is returned to your app (without cost info)
   - Cost metadata is stored in PostgreSQL

3. Complete Job

   - Your app marks the job as completed
   - SaaS API aggregates all costs for the job
   - Returns a summary (for internal use only)
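Putting the three steps together, a minimal client-side sketch; the endpoints are the ones listed above, while request and response field names beyond job_id are assumptions:

import requests

BASE = "https://your-saas-api.com"  # placeholder host
HEADERS = {"Authorization": "Bearer sk-your-virtual-key"}

# 1. Create a job ("job_type" is an assumed field name)
job = requests.post(
    f"{BASE}/api/jobs/create", headers=HEADERS,
    json={"job_type": "document_analysis"},
).json()
job_id = job["job_id"]

# 2. Make one or more LLM calls under that job
requests.post(
    f"{BASE}/api/jobs/{job_id}/llm-call", headers=HEADERS,
    json={"messages": [{"role": "user", "content": "Extract the key points"}]},
)

# 3. Complete the job; the server aggregates costs
summary = requests.post(f"{BASE}/api/jobs/{job_id}/complete", headers=HEADERS).json()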
Why This Architecture?¶
🎯 Complete Abstraction¶
Problem: Teams shouldn't need to understand LiteLLM, models, tokens, or pricing.
Solution: The SaaS API layer provides a simple, business-oriented interface. Teams only see jobs and calls, not models or costs.
💰 Job-Based Cost Tracking¶
Problem: A single business operation (like "analyze document") often requires multiple LLM calls. How do you track total cost?
Solution: Group related LLM calls into jobs. Track aggregate cost per job, not per API call.
Example:
# One job = Multiple LLM calls = One aggregated cost
job_id = create_job("document_analysis")
extract_text(job_id) # Call 1: $0.005
classify_content(job_id) # Call 2: $0.008
generate_summary(job_id) # Call 3: $0.010
complete_job(job_id)
# Total cost: $0.023
# You charge: $0.10 (flat rate)
# Your profit: $0.077
🔒 Team Isolation¶
Problem: How do you manage multiple teams with different budgets and access levels?
Solution: Organizations → Teams → Model Access Groups hierarchy with credit allocation and budget controls.
💵 Flexible Pricing¶
Problem: Your pricing strategy might not match actual LLM costs (e.g., you want flat-rate pricing).
Solution: Actual costs are tracked internally but never exposed. You set your own pricing strategy.
Pricing Options:

- **Flat rate per job** - "$0.10 per document analysis"
- **Tiered pricing** - "$0.05 for first 100 jobs, $0.03 after"
- **Markup pricing** - "Actual cost + 30% markup"
- **Subscription** - "Unlimited jobs for $99/month"
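To make the markup option concrete, a tiny sketch using the job from the earlier example (the function name and rounding are illustrative):

def price_for_job(actual_cost: float, markup: float = 0.30) -> float:
    # Markup pricing: internal LLM cost plus a fixed percentage
    return round(actual_cost * (1 + markup), 4)

price_for_job(0.023)  # 0.023 * 1.30 = 0.0299 billed to the team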
📊 Usage Analytics¶
Problem: You need to understand which workflows are expensive, which teams use the most, and where to optimize.
Solution: Detailed analytics per team, organization, job type, and time period.
🛡️ Budget Protection¶
Problem: Teams could accidentally run up huge costs.
Solution: Credit allocation with suspend/pause capabilities. Teams can't exceed their budget.
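A hypothetical sketch of such a gate, checked before any LLM call is forwarded (the Team fields are illustrative, not the actual schema):

from dataclasses import dataclass

@dataclass
class Team:
    credits_remaining: float
    suspended: bool

def can_start_job(team: Team) -> bool:
    # Refuse suspended teams and exhausted budgets before spending anything
    return not team.suspended and team.credits_remaining > 0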
Scalability Considerations¶
Horizontal Scaling¶
SaaS API Layer:

- Stateless FastAPI application
- Can run multiple instances behind a load balancer
- Each instance connects to the same PostgreSQL and Redis

LiteLLM Proxy:

- Also stateless
- Can run multiple instances for high availability
- Shares PostgreSQL and Redis for coordination
Database Optimization¶
PostgreSQL:

- Indexes on team_id, job_id, and created_at for fast queries
- Partitioning for the large llm_calls table
- Read replicas for analytics queries

Redis:

- Separate cache instances for different regions
- Eviction policy: LRU (Least Recently Used)
- Monitoring for cache hit rates
Cost Optimization¶
- Redis caching reduces duplicate API calls
- Model fallbacks use cheaper models when appropriate
- Rate limiting prevents runaway costs
- Budget controls enforce hard limits
Security Model¶
API Keys and Authentication¶
| Layer | Authentication | Purpose |
|---|---|---|
| Your App → SaaS API | Virtual keys (Bearer token) | Team authentication |
| SaaS API → LiteLLM | Virtual keys (internal) | Internal routing |
| LiteLLM → Providers | Provider API keys | Provider authentication |
| Admin → Dashboard | Your auth system | Admin access |
Data Isolation¶
- All queries filtered by team_id
- Job IDs are UUIDs (non-guessable)
- LiteLLM master key never exposed to teams
- Model pricing completely abstracted
- Internal tables (costs, models) never exposed via API
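For instance, a hypothetical lookup that enforces the team_id filter (the connection API and column names are illustrative; only the table name comes from this page):

def get_job(conn, current_team_id: str, job_id: str):
    # Scoping every query to the caller's team means another team's job_id,
    # even if leaked, returns no rows
    return conn.execute(
        "SELECT * FROM jobs WHERE team_id = %s AND job_id = %s",
        (current_team_id, job_id),
    ).fetchone()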
Access Controls¶
- Model Access Groups - Control which teams can use which models
- Credit Limits - Teams can't exceed allocated credits
- Rate Limits - TPM/RPM limits per team
- Suspend/Pause - Disable teams if needed
Deployment Modes¶
Local Development¶
- SaaS API on port 8003
- LiteLLM on port 8002
- PostgreSQL on port 5432
- Redis on port 6380
- Admin Dashboard on port 3002
Production (Railway)¶
- SaaS API on Railway with custom domain
- LiteLLM as internal Railway service
- PostgreSQL as Railway addon
- Redis as Railway addon
- Admin Dashboard on Vercel or Railway
Docker Compose¶
- All services in Docker containers
- Easy orchestration and scaling
- Volume mounts for persistence
Key Benefits¶
✅ Teams never see LiteLLM - Complete abstraction layer
✅ Job-based cost tracking - True cost per business operation
✅ Model flexibility - Change models without affecting clients
✅ Pricing control - Set your own markup and pricing strategy
✅ Usage analytics - Detailed insights per team and job type
✅ Budget protection - Prevent runaway costs with credit limits
✅ Multi-call jobs - Track related LLM calls as a single unit
✅ Streaming support - Server-Sent Events for real-time responses
✅ Caching - Automatic response caching for cost savings
✅ Rate limiting - Per-team TPM/RPM limits
Next Steps¶
Now that you understand the architecture:
- Complete the Installation - Get all services running
- Set Up Admin Dashboard - Create teams and organizations
- Learn Integration Patterns - Integrate into your app
- Try Examples - Run working code
Additional Resources¶
- Database Schema - Detailed table schemas
- Credit System - How credits work
- Model Resolution - How models are selected
- Streaming Architecture - SSE implementation details