Model Resolution Reference¶
Complete technical reference for the model group resolution system in SaaS LiteLLM. This system allows you to define named model groups (like "ResumeAgent", "ParsingAgent") with primary models and fallbacks, then control which teams have access to which groups.
Overview¶
The model resolution system provides:
- Named Model Groups: Abstract model names from implementation (e.g., "ResumeAgent" instead of "gpt-4-turbo")
- Primary + Fallback Models: Define fallback chains for reliability
- Access Control: Per-team permissions for model groups
- Easy Model Updates: Change underlying models without code changes
- Model Versioning: Switch between model versions centrally
Core Concepts¶
Model Groups¶
A model group is a named collection of models with a priority order:
```
ResumeAgent:
  Priority 0 (Primary):  gpt-4-turbo
  Priority 1 (Fallback): gpt-4
  Priority 2 (Fallback): gpt-3.5-turbo
```
Your application requests model_group="ResumeAgent", and the system resolves it to the actual model.
Why Model Groups?¶
Problem: Hard-coding model names creates maintenance nightmares:
```python
# ❌ Bad - Hard-coded model names
response = litellm.completion(
    model="gpt-4-turbo-2024-04-09",  # What if we want to change this?
    messages=messages
)
```
Solution: Use model groups:
```python
# ✅ Good - Abstract model group
response = saas_api.llm_call(
    job_id=job_id,
    model_group="ResumeAgent",  # Resolved to actual model by system
    messages=messages
)
```
Benefits:
- Update models centrally in database (no code changes)
- Different teams can use different models for same group
- Easy A/B testing of models
- Rollback model changes instantly
- Add fallbacks without code changes
Database Schema¶
model_groups Table¶
Defines named model groups.
```sql
CREATE TABLE model_groups (
    model_group_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    group_name VARCHAR(100) UNIQUE NOT NULL,
    display_name VARCHAR(200),
    description TEXT,
    status VARCHAR(50) DEFAULT 'active',
    created_at TIMESTAMP NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
```
Columns:
- `model_group_id`: Unique identifier
- `group_name`: Code-friendly name (e.g., "ResumeAgent")
- `display_name`: Human-readable name (e.g., "Resume Analysis Agent")
- `description`: Purpose and usage notes
- `status`: `active` or `inactive` (inactive groups can't be used)
Example:
```sql
INSERT INTO model_groups (group_name, display_name, description)
VALUES (
    'ResumeAgent',
    'Resume Analysis Agent',
    'High-quality model for resume parsing, skills extraction, and job matching'
);
```
model_group_models Table¶
Maps models to groups with priority for fallback handling.
```sql
CREATE TABLE model_group_models (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    model_group_id UUID NOT NULL REFERENCES model_groups(model_group_id) ON DELETE CASCADE,
    model_name VARCHAR(200) NOT NULL,
    priority INTEGER DEFAULT 0,
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_model_group_models_lookup
    ON model_group_models(model_group_id, priority, is_active);
```
Columns:
- `model_group_id`: Parent group
- `model_name`: LiteLLM model identifier (e.g., "gpt-4-turbo", "claude-3-opus")
- `priority`: 0 = primary, 1 = first fallback, 2 = second fallback, etc.
- `is_active`: Whether this model is enabled (allows disabling without deletion)
Priority System:
- Priority 0: Primary model (always tried first)
- Priority 1: First fallback (used if primary fails or unavailable)
- Priority 2+: Additional fallbacks
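In application terms, resolution simply orders a group's active models by priority and splits off the head. A minimal, framework-free sketch (the in-memory tuples below are illustrative, not real configuration):

```python
# Minimal sketch of priority-based resolution over an in-memory list.
# Each tuple is (model_name, priority); values are illustrative only.
models = [
    ("gpt-3.5-turbo", 2),
    ("gpt-4-turbo", 0),
    ("gpt-4", 1),
]

# Sort ascending by priority so priority 0 comes first
ordered = sorted(models, key=lambda m: m[1])

primary = ordered[0][0]                  # the priority-0 model
fallbacks = [m[0] for m in ordered[1:]]  # priority 1+ models, in order

print(primary, fallbacks)
```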
Example:
```sql
-- Get model group ID
SELECT model_group_id FROM model_groups WHERE group_name = 'ResumeAgent';
-- Returns: 'a1b2c3d4-...'

-- Add models with priorities
INSERT INTO model_group_models (model_group_id, model_name, priority)
VALUES
    ('a1b2c3d4-...', 'gpt-4-turbo', 0),    -- Primary
    ('a1b2c3d4-...', 'gpt-4', 1),          -- First fallback
    ('a1b2c3d4-...', 'gpt-3.5-turbo', 2);  -- Second fallback
```
team_model_groups Table¶
Junction table controlling which teams have access to which model groups.
```sql
CREATE TABLE team_model_groups (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    team_id VARCHAR(255) NOT NULL,
    model_group_id UUID NOT NULL REFERENCES model_groups(model_group_id) ON DELETE CASCADE,
    assigned_at TIMESTAMP NOT NULL DEFAULT NOW(),
    CONSTRAINT unique_team_model_group UNIQUE(team_id, model_group_id)
);

CREATE INDEX idx_team_model_groups_team ON team_model_groups(team_id);
CREATE INDEX idx_team_model_groups_group ON team_model_groups(model_group_id);
```
Columns:
- `team_id`: Team identifier
- `model_group_id`: Model group this team can access
- `assigned_at`: When access was granted
Example:
```sql
-- Grant team access to model group
INSERT INTO team_model_groups (team_id, model_group_id)
SELECT 'team-alpha', model_group_id
FROM model_groups
WHERE group_name = 'ResumeAgent';

-- Grant access to multiple groups
INSERT INTO team_model_groups (team_id, model_group_id)
SELECT 'team-alpha', model_group_id
FROM model_groups
WHERE group_name IN ('ResumeAgent', 'ParsingAgent', 'ChatAgent');
```
Model Resolver Service¶
The ModelResolver service handles model group resolution and access control.
Location¶
src/services/model_resolver.py
Class Definition¶
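As a hedged sketch of the shape the service takes (exact signatures may differ from `src/services/model_resolver.py`; the real class uses a SQLAlchemy `Session` and the ORM models `ModelGroup`, `ModelGroupModel`, and `TeamModelGroup`):

```python
# Hypothetical shell of the resolver service; method bodies are shown in
# the sections that follow.
from typing import List, Tuple


class ModelResolutionError(Exception):
    """Raised when a model group cannot be resolved for a team."""


class ModelResolver:
    def __init__(self, db):
        self.db = db  # SQLAlchemy Session in the real service

    def verify_team_access(self, team_id: str, model_group_name: str) -> bool:
        """Return True if the team is assigned the named model group."""
        raise NotImplementedError

    def resolve_model_group(
        self,
        team_id: str,
        model_group_name: str,
        include_fallbacks: bool = True,
    ) -> Tuple[str, List[str]]:
        """Return (primary_model, fallback_models) or raise ModelResolutionError."""
        raise NotImplementedError
```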
Key Methods¶
1. verify_team_access()¶
Check if a team has access to a specific model group.
```python
def verify_team_access(self, team_id: str, model_group_name: str) -> bool:
    # Get model group
    model_group = self.db.query(ModelGroup).filter(
        ModelGroup.group_name == model_group_name
    ).first()

    if not model_group:
        return False

    # Check if team has this group assigned
    assignment = self.db.query(TeamModelGroup).filter(
        and_(
            TeamModelGroup.team_id == team_id,
            TeamModelGroup.model_group_id == model_group.model_group_id
        )
    ).first()

    return assignment is not None
```
Returns: True if team has access, False otherwise
Example:
```python
resolver = ModelResolver(db)

if resolver.verify_team_access("team-alpha", "ResumeAgent"):
    print("Access granted")
else:
    print("Access denied")
```
2. resolve_model_group()¶
Resolve a model group name to actual model(s).
```python
def resolve_model_group(
    self,
    team_id: str,
    model_group_name: str,
    include_fallbacks: bool = True
) -> Tuple[str, List[str]]:
    ...
```
Parameters:
- `team_id`: Team requesting the model
- `model_group_name`: Name of model group (e.g., "ResumeAgent")
- `include_fallbacks`: Whether to return fallback models (default: `True`)
Returns: Tuple of (primary_model, [fallback_models])
Raises: `ModelResolutionError` if:
- Team doesn't have access to the group
- Model group not found or inactive
- No active models configured for the group
Implementation:
```python
def resolve_model_group(
    self,
    team_id: str,
    model_group_name: str,
    include_fallbacks: bool = True
) -> Tuple[str, List[str]]:
    # 1. Verify access
    if not self.verify_team_access(team_id, model_group_name):
        raise ModelResolutionError(
            f"Team '{team_id}' does not have access to model group '{model_group_name}'"
        )

    # 2. Get model group
    model_group = self.db.query(ModelGroup).filter(
        and_(
            ModelGroup.group_name == model_group_name,
            ModelGroup.status == "active"
        )
    ).first()

    if not model_group:
        raise ModelResolutionError(
            f"Model group '{model_group_name}' not found or inactive"
        )

    # 3. Get models sorted by priority
    models = self.db.query(ModelGroupModel).filter(
        and_(
            ModelGroupModel.model_group_id == model_group.model_group_id,
            ModelGroupModel.is_active == True
        )
    ).order_by(ModelGroupModel.priority).all()

    if not models:
        raise ModelResolutionError(
            f"No active models configured for group '{model_group_name}'"
        )

    # 4. Return primary + fallbacks
    primary_model = models[0].model_name
    fallback_models = [m.model_name for m in models[1:]] if include_fallbacks else []

    return primary_model, fallback_models
```
Example:
```python
resolver = ModelResolver(db)

primary, fallbacks = resolver.resolve_model_group(
    team_id="team-alpha",
    model_group_name="ResumeAgent"
)

print(f"Primary: {primary}")
# Output: Primary: gpt-4-turbo

print(f"Fallbacks: {fallbacks}")
# Output: Fallbacks: ['gpt-4', 'gpt-3.5-turbo']
```
3. get_model_group_by_name()¶
Retrieve model group details.
```python
def get_model_group_by_name(self, model_group_name: str) -> Optional[ModelGroup]:
    return self.db.query(ModelGroup).filter(
        ModelGroup.group_name == model_group_name
    ).first()
```
Returns: ModelGroup object or None
4. get_team_model_groups()¶
Get all model groups assigned to a team.
```python
def get_team_model_groups(self, team_id: str) -> List[ModelGroup]:
    assignments = self.db.query(TeamModelGroup).filter(
        TeamModelGroup.team_id == team_id
    ).all()

    model_group_ids = [a.model_group_id for a in assignments]

    return self.db.query(ModelGroup).filter(
        ModelGroup.model_group_id.in_(model_group_ids)
    ).all()
```
Returns: List of ModelGroup objects
Example:
```python
resolver = ModelResolver(db)

groups = resolver.get_team_model_groups("team-alpha")
for group in groups:
    print(f"- {group.group_name}: {group.display_name}")

# Output:
# - ResumeAgent: Resume Analysis Agent
# - ParsingAgent: Document Parsing Agent
# - ChatAgent: General Chat Agent
```
Resolution Flow¶
Complete Resolution Process¶
```mermaid
sequenceDiagram
    participant App as Application
    participant API as SaaS API
    participant Resolver as ModelResolver
    participant DB as PostgreSQL
    participant LLM as LiteLLM

    App->>API: POST /api/jobs/{job_id}/llm-call<br/>model_group="ResumeAgent"
    API->>Resolver: resolve_model_group(team_id, "ResumeAgent")
    Resolver->>DB: Check team_model_groups
    DB-->>Resolver: Access granted
    Resolver->>DB: Get model_groups WHERE group_name='ResumeAgent'
    DB-->>Resolver: model_group_id, status='active'
    Resolver->>DB: Get model_group_models<br/>ORDER BY priority
    DB-->>Resolver: [gpt-4-turbo, gpt-4, gpt-3.5-turbo]
    Resolver-->>API: primary='gpt-4-turbo', fallbacks=['gpt-4', ...]
    API->>LLM: Call with model='gpt-4-turbo'
    LLM-->>API: Response
    API->>DB: Record llm_call:<br/>model_group_used='ResumeAgent'<br/>resolved_model='gpt-4-turbo'
    API-->>App: Response + metadata
```

Step-by-Step¶
- Application makes a request with a `model_group` parameter
- API receives the request and extracts `team_id` from the auth token
- Resolver checks access via the `team_model_groups` table
- Access granted → Resolver fetches the model group
- Models are retrieved ordered by priority (ascending)
- Primary (priority 0) and fallbacks (priority 1+) are returned
- API calls LiteLLM with the resolved primary model
- The call is recorded in the database with both `model_group_used` and `resolved_model`
- The response is returned to the application
API Integration¶
Make LLM Call with Model Group¶
From src/saas_api.py:
```python
@app.post("/api/jobs/{job_id}/llm-call")
async def make_llm_call(
    job_id: str,
    request: LLMCallRequest,  # Contains model_group field
    db: Session = Depends(get_db),
    authenticated_team_id: str = Depends(verify_virtual_key)
):
    # Get job
    job = db.query(Job).filter(Job.job_id == uuid.UUID(job_id)).first()

    # Verify job belongs to team
    if job.team_id != authenticated_team_id:
        raise HTTPException(403, "Job does not belong to your team")

    # Get team's virtual key
    team_credits = db.query(TeamCredits).filter(
        TeamCredits.team_id == job.team_id
    ).first()

    if not team_credits.virtual_key:
        raise HTTPException(500, "Team has no virtual key configured")

    # Resolve model group to actual model
    model_resolver = ModelResolver(db)
    try:
        primary_model, fallback_models = model_resolver.resolve_model_group(
            team_id=job.team_id,
            model_group_name=request.model_group
        )
    except ModelResolutionError as e:
        raise HTTPException(403, str(e))

    # Track model group usage in job
    if not job.model_groups_used:
        job.model_groups_used = []
    if request.model_group not in job.model_groups_used:
        job.model_groups_used.append(request.model_group)
        db.commit()

    # Call LiteLLM with resolved model
    litellm_response = await call_litellm(
        model=primary_model,  # Use resolved primary model
        messages=request.messages,
        virtual_key=team_credits.virtual_key,
        team_id=job.team_id,
        temperature=request.temperature,
        max_tokens=request.max_tokens
    )

    # Store LLM call record
    llm_call = LLMCall(
        job_id=job.job_id,
        model_used=litellm_response.get("model", primary_model),
        model_group_used=request.model_group,  # Track requested group
        resolved_model=primary_model,          # Track resolved model
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
        total_tokens=usage.get("total_tokens", 0),
        cost_usd=cost_usd,
        latency_ms=latency_ms
    )
    db.add(llm_call)
    db.commit()

    return response
```
Model Tracking Fields¶
Understanding the Three Model Fields¶
The `llm_calls` table tracks three different model identifiers to provide complete visibility into model resolution:
| Field | Type | Description | Example Value |
|-------|------|-------------|---------------|
| `model_group_used` | string | The model group name requested by the application | `"ResumeAgent"` |
| `resolved_model` | string | The model your system resolved to from the model group | `"gpt-4-turbo"` |
| `model_used` | string | The actual model LiteLLM used (may differ if LiteLLM applies fallbacks) | `"gpt-4-turbo-2024-04-09"` |
Why Three Fields?¶
This three-tier tracking provides a complete audit trail:
**Example Scenario:**
1. **Application requests** `model_group="ResumeAgent"`
2. **Your system resolves** to `resolved_model="gpt-4-turbo"` (from model group configuration)
3. **LiteLLM returns** `model="gpt-4-turbo-2024-04-09"` (specific version/deployment)
**Tracking All Three Allows:**
- **`model_group_used`**: Track which logical model groups are most popular
- **`resolved_model`**: Verify your resolution logic worked correctly
- **`model_used`**: See the actual model/version LiteLLM used (important for cost tracking and debugging)
Typical Values¶
```python
# Example 1: Standard resolution
{
    "model_group_used": "ResumeAgent",      # What app requested
    "resolved_model": "gpt-4-turbo",        # What you resolved to
    "model_used": "gpt-4-turbo-2024-04-09"  # What LiteLLM used
}

# Example 2: Direct model request (not using model groups)
{
    "model_group_used": None,    # No group used
    "resolved_model": "gpt-4",   # Direct model request
    "model_used": "gpt-4-0613"   # Specific version used
}

# Example 3: Fallback scenario
{
    "model_group_used": "ResumeAgent",  # What app requested
    "resolved_model": "gpt-4-turbo",    # What you resolved to
    "model_used": "gpt-4"               # LiteLLM fell back to this
}
```
Querying Model Usage¶
Track most-used model groups:
```sql
SELECT
    model_group_used,
    COUNT(*) AS call_count,
    AVG(cost_usd) AS avg_cost,
    AVG(latency_ms) AS avg_latency
FROM llm_calls
WHERE model_group_used IS NOT NULL
  AND created_at >= NOW() - INTERVAL '7 days'
GROUP BY model_group_used
ORDER BY call_count DESC;
```
Verify resolution accuracy:
```sql
-- Check if resolved models match actual models used
SELECT
    model_group_used,
    resolved_model,
    model_used,
    COUNT(*) AS instances
FROM llm_calls
WHERE model_group_used IS NOT NULL
  AND created_at >= NOW() - INTERVAL '1 day'
GROUP BY model_group_used, resolved_model, model_used
ORDER BY instances DESC;
```
Detect unexpected fallbacks:
```sql
-- Find cases where LiteLLM used a different model than resolved
SELECT
    job_id,
    model_group_used,
    resolved_model,
    model_used,
    created_at
FROM llm_calls
WHERE resolved_model != model_used
  AND model_group_used IS NOT NULL
ORDER BY created_at DESC
LIMIT 100;
```
Request Format¶
POST `/api/jobs/{job_id}/llm-call`

```json
{
  "model_group": "ResumeAgent",
  "messages": [
    {"role": "user", "content": "Analyze this resume..."}
  ],
  "temperature": 0.7,
  "max_tokens": 2000,
  "purpose": "Skills extraction"
}
```
Response Format¶
```json
{
  "call_id": "call-uuid-123",
  "response": {
    "content": "This resume shows strong Python skills...",
    "finish_reason": "stop"
  },
  "metadata": {
    "tokens_used": 1850,
    "latency_ms": 2340,
    "model_group": "ResumeAgent"
  }
}
```
Note: The actual resolved model is not exposed to the client (keeps implementation details hidden).
Error Handling¶
ModelResolutionError¶
Raised when resolution fails.
Scenarios:
- Team lacks access: the team has no entry in `team_model_groups` for the requested group
- Model group not found: no active group matches the requested name
- No models configured: the group exists but has no active models

HTTP Response: status code `403 Forbidden`.
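A framework-free sketch of how the error maps to a 403 response (the function name and payload shape here are illustrative; in the actual endpoint the error is caught inline and re-raised as an `HTTPException`):

```python
# Illustrative only: translate ModelResolutionError into an HTTP status
# and detail string without assuming a specific web framework.
class ModelResolutionError(Exception):
    pass


def to_http_error(exc: Exception) -> tuple:
    """Return (status_code, detail) for an API error response."""
    if isinstance(exc, ModelResolutionError):
        return 403, str(exc)
    return 500, "Internal server error"


status, detail = to_http_error(
    ModelResolutionError(
        "Team 'team-alpha' does not have access to model group 'ChatAgent'"
    )
)
print(status, detail)
```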
Common Operations¶
1. Create Model Group¶
```sql
-- Create group
INSERT INTO model_groups (group_name, display_name, description)
VALUES (
    'ChatAgent',
    'General Chat Agent',
    'Conversational AI for customer support and general queries'
)
RETURNING model_group_id;
-- Returns: 'xyz-uuid-789'

-- Add models
INSERT INTO model_group_models (model_group_id, model_name, priority)
VALUES
    ('xyz-uuid-789', 'gpt-4-turbo', 0),
    ('xyz-uuid-789', 'gpt-3.5-turbo', 1);
```
2. Assign Model Group to Team¶
```sql
INSERT INTO team_model_groups (team_id, model_group_id)
SELECT 'team-beta', model_group_id
FROM model_groups
WHERE group_name = 'ChatAgent';
```
3. Update Primary Model¶
```sql
-- Swap priorities to change primary
UPDATE model_group_models
SET priority = CASE
    WHEN model_name = 'gpt-4-turbo' THEN 1
    WHEN model_name = 'gpt-4' THEN 0
    ELSE priority  -- leave other models' priorities untouched
END
WHERE model_group_id = (
    SELECT model_group_id FROM model_groups WHERE group_name = 'ResumeAgent'
);
```
4. Add Fallback Model¶
```sql
INSERT INTO model_group_models (model_group_id, model_name, priority)
SELECT
    model_group_id,
    'claude-3-opus',
    3  -- Third fallback
FROM model_groups
WHERE group_name = 'ResumeAgent';
```
5. Disable Model (without deletion)¶
```sql
UPDATE model_group_models
SET is_active = FALSE
WHERE model_name = 'gpt-3.5-turbo'
  AND model_group_id = (
    SELECT model_group_id FROM model_groups WHERE group_name = 'ResumeAgent'
  );
```
6. Revoke Team Access¶
```sql
DELETE FROM team_model_groups
WHERE team_id = 'team-alpha'
  AND model_group_id = (
    SELECT model_group_id FROM model_groups WHERE group_name = 'ResumeAgent'
  );
```
Analytics Queries¶
Model Group Usage by Team¶
```sql
SELECT
    llm.model_group_used,
    llm.resolved_model,
    COUNT(*) AS call_count,
    AVG(llm.latency_ms) AS avg_latency_ms,
    SUM(llm.total_tokens) AS total_tokens,
    SUM(llm.cost_usd) AS total_cost_usd
FROM llm_calls llm
JOIN jobs j ON llm.job_id = j.job_id
WHERE j.team_id = 'team-alpha'
  AND llm.created_at >= NOW() - INTERVAL '30 days'
GROUP BY llm.model_group_used, llm.resolved_model
ORDER BY call_count DESC;
```
Model Group Access Report¶
```sql
SELECT
    mg.group_name,
    mg.display_name,
    COUNT(DISTINCT tmg.team_id) AS team_count,
    -- Postgres requires the ORDER BY expression in STRING_AGG(DISTINCT ...)
    -- to match the aggregated argument
    STRING_AGG(DISTINCT mgm.model_name, ', ' ORDER BY mgm.model_name) AS models
FROM model_groups mg
LEFT JOIN team_model_groups tmg ON mg.model_group_id = tmg.model_group_id
LEFT JOIN model_group_models mgm ON mg.model_group_id = mgm.model_group_id
WHERE mg.status = 'active'
  AND mgm.is_active = TRUE
GROUP BY mg.group_name, mg.display_name
ORDER BY team_count DESC;
```
Teams Without Model Group Access¶
```sql
SELECT DISTINCT j.team_id
FROM jobs j
LEFT JOIN team_model_groups tmg ON j.team_id = tmg.team_id
WHERE tmg.team_id IS NULL
  AND j.created_at >= NOW() - INTERVAL '7 days';
```
Best Practices¶
1. Use Descriptive Group Names¶
```sql
-- ✅ Good - Clear purpose
'ResumeAgent', 'DocumentParser', 'ChatAssistant'

-- ❌ Bad - Unclear
'Agent1', 'ModelA', 'Default'
```
2. Always Have Fallbacks¶
Configure at least 2 models per group for reliability:
```sql
INSERT INTO model_group_models (model_group_id, model_name, priority)
VALUES
    (group_id, 'gpt-4-turbo', 0),    -- Primary
    (group_id, 'gpt-3.5-turbo', 1);  -- Fallback
```
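A fallback chain only helps if the caller actually walks it when the primary fails. A minimal sketch of that loop (`call_model` and `flaky` are stand-ins for illustration, not real APIs):

```python
# Hypothetical fallback walk: try the resolved primary first, then each
# fallback in priority order; re-raise the last error if all fail.
def call_with_fallbacks(primary, fallbacks, call_model):
    """call_model(model_name) -> response; raises on failure."""
    last_error = None
    for model in [primary] + list(fallbacks):
        try:
            return model, call_model(model)
        except Exception as exc:  # in practice, catch provider-specific errors
            last_error = exc
    raise last_error


# Simulated: the primary "fails", the first fallback succeeds.
def flaky(model):
    if model == "gpt-4-turbo":
        raise RuntimeError("rate limited")
    return f"ok from {model}"


used, response = call_with_fallbacks("gpt-4-turbo", ["gpt-3.5-turbo"], flaky)
print(used, response)
```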
3. Test Model Changes¶
Before switching primary models, test with a subset of teams:
```sql
-- Create test group with new model
INSERT INTO model_groups (group_name, display_name)
VALUES ('ResumeAgent-Beta', 'Resume Agent (Testing GPT-4.5)');

-- Assign to test teams only
INSERT INTO team_model_groups (team_id, model_group_id)
SELECT 'team-test', model_group_id
FROM model_groups
WHERE group_name = 'ResumeAgent-Beta';
```
4. Track Resolution History¶
The `llm_calls` table tracks both `model_group_used` and `resolved_model`, allowing you to:
- See which groups are most popular
- Identify when models change
- Audit resolution decisions
5. Document Model Group Purposes¶
Use the description field:
```sql
UPDATE model_groups
SET description = 'Use for: Resume parsing, skills extraction, job matching. Optimized for structured outputs. Fallback to GPT-4 if quota exceeded.'
WHERE group_name = 'ResumeAgent';
```
Advanced Patterns¶
Per-Team Model Overrides¶
Allow specific teams to use different models for the same group:
```sql
-- Create team-specific variant
INSERT INTO model_groups (group_name, display_name)
VALUES ('ResumeAgent-Premium', 'Resume Agent (Enterprise)');

INSERT INTO model_group_models (model_group_id, model_name, priority)
SELECT model_group_id, 'gpt-4-turbo', 0
FROM model_groups
WHERE group_name = 'ResumeAgent-Premium';

-- Assign premium version to enterprise team
INSERT INTO team_model_groups (team_id, model_group_id)
SELECT 'team-enterprise', model_group_id
FROM model_groups
WHERE group_name = 'ResumeAgent-Premium';
```
Cost-Based Routing¶
Route to cheaper models for non-critical tasks:
```sql
-- Budget model group
INSERT INTO model_groups (group_name, display_name)
VALUES ('ChatAgent-Budget', 'Chat Agent (Economy)');

INSERT INTO model_group_models (model_group_id, model_name, priority)
SELECT model_group_id, 'gpt-3.5-turbo', 0
FROM model_groups
WHERE group_name = 'ChatAgent-Budget';

-- Assign to free-tier teams
INSERT INTO team_model_groups (team_id, model_group_id)
SELECT team_id, (SELECT model_group_id FROM model_groups WHERE group_name = 'ChatAgent-Budget')
FROM team_credits
WHERE credits_remaining < 100;
```
A/B Testing Models¶
Split traffic between models to compare performance:
```sql
-- Create A/B variants
INSERT INTO model_groups (group_name, display_name)
VALUES
    ('ResumeAgent-A', 'Resume Agent (GPT-4 Test)'),
    ('ResumeAgent-B', 'Resume Agent (Claude Test)');

-- Configure models
INSERT INTO model_group_models (model_group_id, model_name, priority)
VALUES
    ((SELECT model_group_id FROM model_groups WHERE group_name = 'ResumeAgent-A'), 'gpt-4-turbo', 0),
    ((SELECT model_group_id FROM model_groups WHERE group_name = 'ResumeAgent-B'), 'claude-3-opus', 0);

-- Assign 50% of teams to each
-- (Application logic splits teams randomly)
```
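One way the application-side split can be made deterministic (a sketch, not part of the system) is hashing the team ID, so each team always lands in the same variant and the population divides roughly in half:

```python
# Hypothetical deterministic A/B assignment by team ID.
import hashlib


def assign_variant(team_id, variants=("ResumeAgent-A", "ResumeAgent-B")):
    digest = hashlib.sha256(team_id.encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Stable: the same team always maps to the same variant.
print(assign_variant("team-alpha"))
```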
See Also¶
- Database Schema Reference - Model group table definitions
- API Reference: Model Groups - Model group management endpoints
- LiteLLM Model List - Supported model identifiers