Introduction to SaaS LiteLLM¶
What is SaaS LiteLLM?¶
SaaS LiteLLM is a production-ready platform built on top of LiteLLM and designed specifically for multi-tenant SaaS applications. It provides a complete abstraction layer over LiteLLM with job-based cost tracking, allowing you to build LLM-powered SaaS products without exposing infrastructure complexity to your customers.
Built on LiteLLM
SaaS LiteLLM is built on top of LiteLLM, which provides unified API access to 100+ LLM providers including OpenAI, Anthropic, Google, Azure, Cohere, and many more. LiteLLM handles the provider routing, while SaaS LiteLLM adds the SaaS-ready features on top.
SaaS LiteLLM wraps the LiteLLM proxy with a SaaS-oriented API layer that:
- Groups multiple LLM calls into jobs for better cost tracking
- Completely hides models, pricing, and infrastructure from your teams
- Provides per-team isolation with independent budgets and access controls
- Enables flexible pricing strategies while tracking actual provider costs
- Includes an admin dashboard for managing teams and model access
Why Use SaaS LiteLLM?¶
The Problem¶
If you're building a SaaS application that uses LLMs, you face several challenges:
- Cost Attribution - How do you track costs per customer or per business operation when a single workflow makes multiple LLM calls?
- Pricing Strategy - How do you charge customers without exposing your actual LLM costs?
- Multi-Tenancy - How do you isolate teams with different budgets and access levels?
- Complexity - Teams shouldn't need to understand models, tokens, or LiteLLM configuration
The Solution¶
SaaS LiteLLM solves these problems by:
- Job-Based Tracking - Group related LLM calls into jobs (e.g., "document_analysis", "chat_session") and track aggregate costs
- Cost Abstraction - Teams never see actual costs or models - you can implement any pricing strategy
- Built-in Multi-Tenancy - Organizations, teams, model access groups, and credit allocation
- Simple API - Clean, business-oriented API instead of raw LLM endpoints
Key Features¶
🎯 SaaS-Ready Architecture¶
- Job-Based Workflow - Create a job, make multiple LLM calls, complete the job, get aggregated costs
- Hidden Complexity - Teams interact with your SaaS API, never seeing LiteLLM, models, or pricing
- Cost Aggregation - Track true costs per business operation, not per API call
- Usage Analytics - Detailed insights per team, organization, and job type
💰 Business Features¶
- Cost Transparency - See actual LiteLLM costs vs. what you charge customers
- Flexible Pricing - Implement flat rate, tiered, markup-based, or custom pricing (see the sketch after this list)
- Budget Controls - Per-team credit allocation with suspend/pause capabilities
- Profit Tracking - Calculate margins per job, team, or organization
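To make the pricing/profit idea concrete, here is a minimal sketch in plain Python. The helper functions and rates are hypothetical, not part of the platform API; the only platform-supplied input is the per-job `actual_cost_usd` figure.

```python
# Hypothetical pricing helpers -- illustrative only, not part of the platform API.

def flat_rate_price(actual_cost_usd: float, rate: float = 0.10) -> float:
    """Charge a fixed price per job, regardless of provider cost."""
    return rate

def markup_price(actual_cost_usd: float, markup: float = 3.0) -> float:
    """Charge a multiple of the actual provider cost."""
    return round(actual_cost_usd * markup, 4)

# Example: a job whose aggregated provider cost was $0.0234
actual = 0.0234
charged = flat_rate_price(actual)
print(f"charged={charged}, cost={actual}, margin={charged - actual:.4f}")
# charged=0.1, cost=0.0234, margin=0.0766
```

Because the platform reports actual cost per job rather than per call, any such strategy can be applied after the fact without changing how teams consume the API.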
🔧 Technical Features¶
- Multi-Tenant - Organizations → Teams → Model Access Groups architecture
- Model Access Control - Control which teams can access which models via access groups
- Virtual Keys - Automatic virtual key generation (completely hidden from teams)
- Multiple Providers - Support for OpenAI, Anthropic, Google, and 100+ models via LiteLLM
- Streaming Support - Server-Sent Events (SSE) for real-time streaming responses (client sketch after this list)
- Redis Caching - Automatic response caching for cost savings and performance
- Rate Limiting - Per-team TPM/RPM limits
- Admin Dashboard - Next.js dashboard for managing the platform
- Type Safety - Pydantic models throughout for request/response validation
- Production Ready - Deploy to Railway with Docker in minutes
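Here is a minimal client sketch for the streaming feature. It assumes the job LLM-call endpoint accepts a `stream` flag and emits standard SSE `data:` lines; the exact parameter name and event format may differ in your deployment, so treat this as illustrative.

```python
import requests

API = "http://localhost:8003/api"
job_id = "job_abc123"  # obtained from /api/jobs/create

# Hypothetical: assumes the llm-call endpoint supports a "stream" flag
# and returns Server-Sent Events ("data: ..." lines).
with requests.post(
    f"{API}/jobs/{job_id}/llm-call",
    json={
        "messages": [{"role": "user", "content": "Summarize this document..."}],
        "stream": True,
    },
    stream=True,  # keep the HTTP connection open so SSE lines arrive as sent
) as response:
    for line in response.iter_lines():
        if line and line.startswith(b"data: "):
            chunk = line[len(b"data: "):]
            if chunk != b"[DONE]":
                print(chunk.decode(), end="", flush=True)
```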
Architecture Overview¶
SaaS LiteLLM uses a layered architecture that abstracts LiteLLM behind your SaaS API:
```mermaid
graph TD
    A[Your SaaS Application] --> B[SaaS API :8003]
    B --> C[LiteLLM Proxy :8002]
    C --> D[PostgreSQL Database]
    C --> E[Redis Cache]
    C --> F[OpenAI]
    C --> G[Anthropic]
    C --> H[Other Providers]
    B -.Job Tracking.-> D

    style B fill:#4CAF50
    style C fill:#2196F3
    style D fill:#FF9800
    style E fill:#F44336
```

Component Breakdown¶
1. Your SaaS Application¶
- Your customer-facing application (web app, mobile app, etc.)
- Makes API calls to the SaaS API layer
- Teams never see LiteLLM or models directly
2. SaaS API Layer (Port 8003)¶
- FastAPI application that wraps LiteLLM
- Provides job-based endpoints: /api/jobs/create, /api/jobs/{id}/llm-call, etc.
- Handles authentication, team isolation, and cost tracking
- This is what you expose to your teams
3. LiteLLM Proxy (Port 8002)¶
- Standard LiteLLM proxy server
- Handles actual LLM routing to providers (OpenAI, Anthropic, etc.)
- Manages virtual keys, rate limiting, and caching
- This is internal only - never exposed to teams
4. PostgreSQL Database¶
- Stores jobs, LLM calls, teams, organizations, and usage data
- Provides cost aggregation and analytics
- Enables historical tracking and reporting
5. Redis Cache¶
- Caches LLM responses for identical requests
- Reduces costs and improves latency
- Configurable TTL per model
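Caching is transparent to callers, so a simple way to observe it is to send the same request twice and compare latency. This is an illustrative sketch; whether a given call is actually cached depends on your deployment's configuration and the model's TTL settings.

```python
import time
import requests

API = "http://localhost:8003/api"
job_id = "job_abc123"  # obtained from /api/jobs/create
payload = {"messages": [{"role": "user", "content": "What is a vector database?"}]}

# Identical requests: if caching is enabled for the model, the second
# should be served from Redis -- noticeably faster, at zero provider cost.
for attempt in range(2):
    start = time.perf_counter()
    requests.post(f"{API}/jobs/{job_id}/llm-call", json=payload, timeout=60)
    print(f"attempt {attempt + 1}: {time.perf_counter() - start:.2f}s")
```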
6. Admin Dashboard (Port 3002)¶
- Next.js application for platform management
- Create organizations, teams, model access groups
- Allocate credits, suspend/resume teams
- Monitor usage and costs
Use Cases¶
SaaS LiteLLM is perfect for these scenarios:
Document Processing SaaS¶
- Job: Document analysis workflow
- LLM Calls: Extract text → Summarize → Classify → Generate insights
- Benefit: Track total cost per document, not per API call
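A minimal sketch of this pipeline using the job endpoints described above; the prompts and payload fields are placeholders:

```python
import requests

API = "http://localhost:8003/api"

# One job spans the whole pipeline, so costs aggregate per document.
job = requests.post(f"{API}/jobs/create", json={
    "team_id": "acme-corp",
    "job_type": "document_analysis",
    "metadata": {"document_id": "doc_123"},
}).json()
job_id = job["job_id"]

# Each pipeline stage is a separate LLM call within the same job.
for prompt in (
    "Extract the text from this document...",
    "Summarize the extracted text...",
    "Classify the document...",
    "Generate insights...",
):
    requests.post(f"{API}/jobs/{job_id}/llm-call", json={
        "messages": [{"role": "user", "content": prompt}],
    })

# Completing the job returns the aggregated cost for all four calls.
result = requests.post(f"{API}/jobs/{job_id}/complete",
                       json={"status": "completed"}).json()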
Chat Application¶
- Job: Chat session (conversation with context)
- LLM Calls: Multiple messages in a conversation
- Benefit: Track cost per session, charge per conversation
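The chat-session pattern looks similar, with conversation history carried in `messages` across calls within one job. The response field names below are an assumption (OpenAI-style shape); check the API reference for the actual format.

```python
import requests

API = "http://localhost:8003/api"

# One job per conversation; its aggregated cost is the session's cost.
job = requests.post(f"{API}/jobs/create", json={
    "team_id": "acme-corp",
    "job_type": "chat_session",
}).json()
job_id = job["job_id"]

history = []  # conversation context, carried across calls

def chat(user_message: str) -> str:
    """Send one user turn within the job, preserving context."""
    history.append({"role": "user", "content": user_message})
    reply = requests.post(f"{API}/jobs/{job_id}/llm-call",
                          json={"messages": history}).json()
    # Assumes an OpenAI-style response shape; the exact field names
    # depend on the API -- see the API reference.
    text = reply["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": text})
    return text

print(chat("What's in the attached report?"))
print(chat("Summarize that in one sentence."))
```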
Data Extraction Platform¶
- Job: Extract structured data from unstructured text
- LLM Calls: Parse → Validate → Transform → Enrich
- Benefit: Flat-rate pricing regardless of text length
AI Writing Assistant¶
- Job: Content generation task
- LLM Calls: Research → Outline → Write → Edit → Polish
- Benefit: Predictable pricing per content piece
API Translation Service¶
- Job: Multi-language translation task
- LLM Calls: One call per language
- Benefit: Track cost per translation job
How It Works¶
Here's a simple workflow:
```python
import requests

API = "http://localhost:8003/api"

# 1. Create a job for tracking
job = requests.post(f"{API}/jobs/create", json={
    "team_id": "acme-corp",
    "job_type": "document_analysis",
    "metadata": {"document_id": "doc_123"}
}).json()
job_id = job["job_id"]

# 2. Make LLM calls within the job
response = requests.post(f"{API}/jobs/{job_id}/llm-call", json={
    "messages": [
        {"role": "user", "content": "Analyze this document..."}
    ]
}).json()

# 3. Complete the job and get costs
result = requests.post(f"{API}/jobs/{job_id}/complete", json={
    "status": "completed"
}).json()

# Internal tracking shows:
# - actual_cost_usd: $0.0234
# - You can charge: $0.10 (flat rate)
# - Your profit: $0.0766
```
Key Points:

- Teams only see your SaaS API, never LiteLLM
- No model names, token counts, or costs exposed
- You control pricing strategy completely
- All costs tracked per job automatically
What is LiteLLM?¶
LiteLLM is an open-source library that provides a unified interface to 100+ LLM providers. It standardizes the API across different providers so you can easily switch between (see the example after this list):
- OpenAI (GPT-4, GPT-3.5-turbo, etc.)
- Anthropic (Claude 3 Opus, Sonnet, Haiku, etc.)
- Google (Gemini Pro, PaLM, etc.)
- Azure OpenAI Service
- AWS Bedrock (Claude, Llama, etc.)
- Cohere, Replicate, Hugging Face, Together AI
- And 95+ more providers
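For context, this is what LiteLLM's unified interface looks like when used directly as a library: the call shape stays the same and only the model string changes. (Provider API keys are assumed to be set in the environment.)

```python
from litellm import completion

messages = [{"role": "user", "content": "Hello, world"}]

# Same call for every provider; only the model string changes.
openai_response = completion(model="gpt-4", messages=messages)
anthropic_response = completion(model="claude-3-opus-20240229", messages=messages)

# Responses follow the OpenAI format regardless of provider.
print(openai_response.choices[0].message.content)
print(anthropic_response.choices[0].message.content)
```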
What LiteLLM Provides:
- ✅ Unified API format (OpenAI-compatible)
- ✅ Provider-specific authentication handling
- ✅ Rate limiting and automatic retries
- ✅ Fallback routing between models
- ✅ Cost tracking per API call
- ✅ Response caching
- ✅ Load balancing across providers
What SaaS LiteLLM Adds on Top¶
SaaS LiteLLM takes LiteLLM's powerful routing capabilities and adds a complete SaaS-ready layer:
✅ Job-Based Cost Tracking - Group multiple LLM calls into business operations (e.g., "document_analysis"), not individual API calls
✅ Multi-Tenant Architecture - Full organization → teams → model access groups hierarchy with credit allocation
✅ Simplified Billing - Charge 1 credit per job instead of tracking tokens per call
✅ Team Isolation - Completely hide models, pricing, and infrastructure from your teams
✅ Admin Dashboard - Web UI for managing teams, credits, model access, and monitoring usage
✅ SaaS API Layer - Clean REST API designed for customer-facing applications
Comparison with Standard LiteLLM¶
| Feature | Standard LiteLLM | SaaS LiteLLM (Built on LiteLLM) |
|---|---|---|
| Foundation | Core routing library | LiteLLM + SaaS wrapper |
| API Style | Raw LLM endpoints | Job-based workflow |
| Cost Tracking | Per API call | Per business operation (job) |
| Team Visibility | See models, costs | Hidden - abstracted away |
| Pricing Model | Pass-through | Flexible - set your own |
| Multi-Tenancy | Virtual keys only | Organizations + Teams + Access Groups |
| Admin Interface | Basic UI | Full dashboard with credit management |
| Budget Controls | Rate limits | Credits, suspend/pause, budget modes |
| Billing | Token-based | Credit-based (per job) |
| Use Case | Admin/internal use | Customer-facing SaaS |
What Makes This "SaaS-Ready"?¶
- Complete Abstraction - Teams never interact with LiteLLM directly
- Business-Oriented - API designed around jobs/tasks, not models/tokens
- Cost Management - Built-in credit system with allocation and tracking
- Multi-Tenant - Full organization/team hierarchy with isolation
- Admin Tools - Dashboard for managing teams and monitoring usage
- Flexible Pricing - Decouple what you charge from what you pay
- Production Features - Streaming, caching, rate limiting, error handling
Next Steps¶
Ready to get started?
- Quickstart Guide - Get up and running in 5 minutes
- Installation Guide - Detailed setup instructions
- Architecture Deep Dive - Understand the full system design
- Integration Guide - Integrate into your app
Additional Resources¶
- Examples - Working code examples
- API Reference - Complete API documentation
- Admin Dashboard Guide - Manage your platform