Job Workflow¶
Learn about the job-based workflow that powers SaaS LiteLLM's cost tracking and billing system.
What is a Job?¶
A Job is a logical grouping of related LLM calls that represent a single business operation or workflow. Instead of tracking and billing individual LLM calls, the platform groups them into jobs for simplified cost management.
Why Jobs?¶
Traditional LLM APIs charge per API call. This creates complexity when a single workflow makes multiple calls:
| Traditional Approach | Job-Based Approach |
|---|---|
| ❌ Charge per API call | ✅ Charge per completed job |
| ❌ Hard to track workflow costs | ✅ Aggregate costs per workflow |
| ❌ Complex billing | ✅ Simple billing (1 credit = 1 job) |
| ❌ API-centric | ✅ Business-centric |
Real-World Example: Resume Analysis¶
Your resume analysis tool performs multiple LLM operations:
Job: "resume_analysis"
├── LLM Call 1: Parse resume text
├── LLM Call 2: Compare requirements with candidate qualifications
└── LLM Call 3: Generate executive summary
Result: 1 Job = 1 Credit charged (not 3 credits!)
Key Benefits:

- Simplified Billing - Pay per job completion, not per LLM call
- Cost Aggregation - Track total cost across multiple calls
- Usage Analytics - Understand which workflows are expensive
- Failure Handling - Failed jobs don't consume credits
Job Lifecycle¶
A job goes through several states during its lifetime:
stateDiagram-v2
    [*] --> pending: Create Job (POST /api/jobs/create)
    pending --> in_progress: First LLM Call
    in_progress --> in_progress: Additional LLM Calls
    in_progress --> completed: Complete Job (status: completed)
    in_progress --> failed: Complete Job (status: failed)
    completed --> [*]: 1 credit deducted ✓
    failed --> [*]: No credit deducted

    note right of completed
        Credit deducted ONLY for completed jobs
        with all successful LLM calls
    end note

    note right of failed
        Failed jobs don't consume credits,
        allowing you to retry without cost
    end note

Job States¶
| State | Description | Credit Impact |
|---|---|---|
| pending | Job created, no LLM calls yet | None |
| in_progress | At least one LLM call made | None (until completion) |
| completed | Job successfully completed | 1 credit deducted |
| failed | Job failed or cancelled | No credit deducted |
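For reference, the transitions in the diagram above can be captured in a small lookup table. This is an illustrative client-side sketch derived from the state diagram, not part of the API:

# Illustrative sketch of the job state machine shown above.
# Terminal states (completed/failed) have no outgoing transitions.
VALID_TRANSITIONS = {
    "pending": {"in_progress"},
    "in_progress": {"in_progress", "completed", "failed"},
    "completed": set(),
    "failed": set(),
}

def can_transition(current: str, target: str) -> bool:
    """Return True if a job may move from `current` to `target`."""
    return target in VALID_TRANSITIONS.get(current, set())

assert can_transition("pending", "in_progress")
assert not can_transition("completed", "in_progress")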
Workflow Options¶
SaaS LiteLLM supports two workflow patterns:
1. Single-Call Workflow (Recommended for Simple Use Cases)¶
Best for chat applications, single-turn responses, and simple tasks requiring only one LLM call.
Advantages:

- ~66% latency reduction (1 API call vs 3)
- Simpler code
- Automatic error handling
- Built-in credit deduction
When to use:

- Chat applications
- Single-turn text generation
- Simple classification tasks
- Any workflow with just one LLM call
Jump to: Single-Call Workflow →
2. Multi-Step Workflow (For Complex Operations)¶
Best for complex workflows requiring multiple LLM calls, agentic workflows, or batch processing.
Advantages:

- Granular control over each step
- Support for multiple LLM calls per job
- Retry individual calls without starting over
- Track intermediate results
When to use:

- Multi-step document analysis
- Agentic workflows
- Batch processing
- Complex chains requiring multiple calls
Jump to: Multi-Step Workflow →
Single-Call Workflow¶
For simple workflows that only need a single LLM call, use the combined create-and-call endpoint:
Endpoint¶

POST /api/jobs/create-and-call
Example: Chat Response¶
Python:

import requests

API = "http://localhost:8003/api"
headers = {
    "Authorization": "Bearer sk-your-virtual-key",
    "Content-Type": "application/json"
}

# Single API call - creates job, calls LLM, and completes job
response = requests.post(
    f"{API}/jobs/create-and-call",
    headers=headers,
    json={
        "team_id": "acme-corp",
        "job_type": "chat_response",
        "model": "gpt-4",  # Model alias or group
        "messages": [
            {"role": "user", "content": "What is Python?"}
        ],
        "temperature": 0.7,
        "max_tokens": 500
    }
)

result = response.json()
print(f"Response: {result['response']['content']}")
print(f"Tokens used: {result['metadata']['tokens_used']}")
print(f"Credits remaining: {result['costs']['credits_remaining']}")
JavaScript:

const API = "http://localhost:8003/api";
const headers = {
    'Authorization': 'Bearer sk-your-virtual-key',
    'Content-Type': 'application/json'
};

// Single API call - creates job, calls LLM, and completes job
const response = await fetch(`${API}/jobs/create-and-call`, {
    method: 'POST',
    headers: headers,
    body: JSON.stringify({
        team_id: 'acme-corp',
        job_type: 'chat_response',
        model: 'gpt-4',
        messages: [
            {role: 'user', content: 'What is Python?'}
        ],
        temperature: 0.7,
        max_tokens: 500
    })
});

const result = await response.json();
console.log(`Response: ${result.response.content}`);
console.log(`Tokens used: ${result.metadata.tokens_used}`);
console.log(`Credits remaining: ${result.costs.credits_remaining}`);
cURL:

curl -X POST http://localhost:8003/api/jobs/create-and-call \
  -H "Authorization: Bearer sk-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "acme-corp",
    "job_type": "chat_response",
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "What is Python?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
Response:
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "response": {
    "content": "Python is a high-level programming language...",
    "finish_reason": "stop"
  },
  "metadata": {
    "tokens_used": 256,
    "latency_ms": 1340,
    "model": "gpt-4"
  },
  "costs": {
    "total_calls": 1,
    "successful_calls": 1,
    "failed_calls": 0,
    "total_tokens": 256,
    "total_cost_usd": 0.0128,
    "avg_latency_ms": 1340,
    "credit_applied": true,
    "credits_remaining": 999
  },
  "completed_at": "2024-10-14T12:00:05.340Z"
}
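Because every successful response includes the team's remaining credit balance, a client can watch for low balances as part of normal request handling. A minimal sketch using only the fields shown above (the threshold is a hypothetical value you would tune for your app):

LOW_CREDIT_THRESHOLD = 50  # hypothetical threshold, tune for your app

def check_credits(result: dict) -> None:
    """Warn when the balance returned with a completed job runs low."""
    remaining = result.get("costs", {}).get("credits_remaining")
    if remaining is not None and remaining < LOW_CREDIT_THRESHOLD:
        print(f"Warning: only {remaining} credits left")

check_credits(result)  # 'result' from the create-and-call example above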
Error Handling¶
The single-call endpoint automatically handles errors:
try:
    response = requests.post(
        f"{API}/jobs/create-and-call",
        headers=headers,
        json={
            "team_id": "acme-corp",
            "job_type": "chat",
            "model": "gpt-4",
            "messages": messages
        },
        timeout=30
    )
    response.raise_for_status()
    result = response.json()

except requests.exceptions.HTTPError as e:
    if e.response.status_code == 403:
        print(f"Access error: {e.response.json()['detail']}")
    elif e.response.status_code == 500:
        print(f"LLM call failed: {e.response.json()['detail']}")
    else:
        print(f"Request failed: {e}")

except requests.exceptions.Timeout:
    print("Request timed out")
Performance Comparison¶
| Workflow Type | API Calls | Latency | Best For |
|---|---|---|---|
| Single-Call | 1 | ~1.5s | Chat apps, simple tasks |
| Multi-Step | 3+ | ~4.5s+ | Complex workflows, multiple calls |
Latency breakdown:

Single-Call: [Create + Call + Complete]      = ~1.5s

Multi-Step:  [Create] → [Call] → [Complete]  = ~4.5s
              ~0.1s      ~1.4s     ~0.1s
Multi-Step Workflow¶
For complex workflows requiring multiple LLM calls, use the full job lifecycle:
1. Create Job¶
Create a job to start tracking LLM calls:
Python:

import requests

API = "http://localhost:8003/api"
headers = {
    "Authorization": "Bearer sk-your-virtual-key",
    "Content-Type": "application/json"
}

# Create job
response = requests.post(
    f"{API}/jobs/create",
    headers=headers,
    json={
        "team_id": "acme-corp",
        "user_id": "john@acme.com",
        "job_type": "document_analysis",
        "metadata": {
            "document_id": "doc_123",
            "document_name": "report.pdf",
            "pages": 5
        }
    }
)

job = response.json()
job_id = job["job_id"]
print(f"Created job: {job_id}")
JavaScript:

const API = "http://localhost:8003/api";
const headers = {
    'Authorization': 'Bearer sk-your-virtual-key',
    'Content-Type': 'application/json'
};

// Create job
const response = await fetch(`${API}/jobs/create`, {
    method: 'POST',
    headers: headers,
    body: JSON.stringify({
        team_id: 'acme-corp',
        user_id: 'john@acme.com',
        job_type: 'document_analysis',
        metadata: {
            document_id: 'doc_123',
            document_name: 'report.pdf',
            pages: 5
        }
    })
});

const job = await response.json();
const jobId = job.job_id;
console.log(`Created job: ${jobId}`);
cURL:

curl -X POST http://localhost:8003/api/jobs/create \
  -H "Authorization: Bearer sk-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "acme-corp",
    "user_id": "john@acme.com",
    "job_type": "document_analysis",
    "metadata": {
      "document_id": "doc_123",
      "document_name": "report.pdf",
      "pages": 5
    }
  }'
Response:
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "created_at": "2024-10-14T12:00:00.000Z"
}
2. Make LLM Calls¶
Make one or more LLM calls associated with the job:
# Make multiple LLM calls for the same job
for step in ["parse", "analyze", "summarize"]:
    response = requests.post(
        f"{API}/jobs/{job_id}/llm-call",
        headers=headers,
        json={
            "messages": [
                {"role": "user", "content": f"{step} this document..."}
            ],
            "purpose": step  # Optional: for tracking
        }
    )
    result = response.json()
    print(f"{step}: {result['response']['content'][:50]}...")
Response:
{
  "call_id": "call-uuid-1",
  "response": {
    "content": "Here is the analysis...",
    "finish_reason": "stop"
  },
  "metadata": {
    "tokens_used": 450,
    "latency_ms": 1250
  }
}
Job Status Changes
After the first LLM call, the job status automatically changes from pending to in_progress.
3. Complete Job¶
Complete the job to trigger credit deduction and get cost summary:
# Complete job successfully
response = requests.post(
    f"{API}/jobs/{job_id}/complete",
    headers=headers,
    json={
        "status": "completed",
        "metadata": {
            "result": "success",
            "output_file": "analysis_123.json"
        }
    }
)

result = response.json()
print("Job completed!")
print(f"Total calls: {result['costs']['total_calls']}")
print(f"Total cost: ${result['costs']['total_cost_usd']:.4f}")
print(f"Credit deducted: {result['costs']['credit_applied']}")
print(f"Credits remaining: {result['costs']['credits_remaining']}")
Response:
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "completed_at": "2024-10-14T12:05:23.000Z",
  "costs": {
    "total_calls": 3,
    "successful_calls": 3,
    "failed_calls": 0,
    "total_tokens": 1350,
    "total_cost_usd": 0.0045,
    "avg_latency_ms": 1200,
    "credit_applied": true,
    "credits_remaining": 999
  },
  "calls": [
    {
      "call_id": "call-uuid-1",
      "purpose": "parse",
      "tokens": 450,
      "latency_ms": 1250,
      "error": null
    },
    {
      "call_id": "call-uuid-2",
      "purpose": "analyze",
      "tokens": 480,
      "latency_ms": 1180,
      "error": null
    },
    {
      "call_id": "call-uuid-3",
      "purpose": "summarize",
      "tokens": 420,
      "latency_ms": 1170,
      "error": null
    }
  ]
}
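The calls array in the completion response gives you per-call accounting. For example, a small loop can turn it into a readable cost report; this sketch uses only the fields shown in the response above:

# Sketch: print a per-call breakdown from the completion response above.
completion = result  # the parsed /complete response
for call in completion["calls"]:
    status = "ok" if call["error"] is None else f"error: {call['error']}"
    print(f"{call['purpose']:>12}  {call['tokens']:>5} tokens  "
          f"{call['latency_ms']:>5} ms  {status}")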
Credit Deduction Rules¶
Credits are ONLY deducted when ALL of these conditions are met:

- ✅ Job status is "completed" (not "failed")
- ✅ All LLM calls succeeded (no failed calls)
- ✅ Credit hasn't already been applied

Key Points:

- 1 Job = 1 Credit regardless of:
    - Number of LLM calls (could be 1 or 100)
    - Actual USD cost (tracked separately for analytics)
    - Models used (different models in same job)
    - Time duration (seconds or hours)
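To make the rule concrete, here is how a client might restate it. This is an illustrative sketch of the three conditions listed above, not the server's actual implementation:

def credit_should_apply(job: dict) -> bool:
    """Sketch of the deduction rule: completed status, zero failed
    calls, and no credit applied yet. Mirrors the list above."""
    costs = job.get("costs", {})
    return (
        job.get("status") == "completed"
        and costs.get("failed_calls", 0) == 0
        and not costs.get("credit_applied", False)
    )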
Handling Failures¶
If something goes wrong, complete the job as failed to avoid credit deduction:
try:
    # make_llm_call / complete_job are your own wrappers around the
    # /llm-call and /complete endpoints shown above
    for step in workflow_steps:
        make_llm_call(job_id, step)

    # Complete successfully
    complete_job(job_id, "completed")

except Exception as e:
    # Complete as failed (no credit deducted)
    requests.post(
        f"{API}/jobs/{job_id}/complete",
        headers=headers,
        json={
            "status": "failed",
            "error_message": str(e),
            "metadata": {"error_type": type(e).__name__}
        }
    )
    print(f"Job failed: {e} (no credit charged)")
Always Complete Jobs
Always complete jobs (even as failed) to prevent "zombie jobs" that stay in in_progress state forever. This keeps your analytics clean.
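Because failed jobs don't consume credits, retrying is effectively free. A minimal retry sketch around the single-call endpoint, reusing the API and headers variables from earlier examples; the backoff policy is an illustrative choice, and it assumes the endpoint marks the job failed on error, as the error-handling section above implies:

import time
import requests

def call_with_retry(payload: dict, max_attempts: int = 3) -> dict:
    """Retry the create-and-call endpoint with exponential backoff.
    Failed attempts leave failed jobs behind, which consume no credits."""
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.post(
                f"{API}/jobs/create-and-call",
                headers=headers, json=payload, timeout=30
            )
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException:
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # back off: 2s, 4s, ...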
Job Metadata¶
Use metadata to store custom information about your jobs:
Job Creation Metadata¶
job = requests.post(f"{API}/jobs/create", headers=headers, json={
    "team_id": "acme-corp",
    "user_id": "john@acme.com",
    "job_type": "document_analysis",
    "metadata": {
        # Your app's context
        "task_id": "task_123",
        "user_email": "john@acme.com",
        "document_name": "report.pdf",
        "document_size_mb": 2.5,
        "priority": "high",
        "source": "upload",
        "workflow_version": "v2.1"
    }
}).json()
Completion Metadata¶
result = requests.post(
    f"{API}/jobs/{job_id}/complete",
    headers=headers,
    json={
        "status": "completed",
        "metadata": {
            # Your app's results
            "output_file": "analysis_123.json",
            "confidence_score": 0.95,
            "warnings": ["Low quality image on page 3"],
            "processing_time_seconds": 12.5,
            "pages_processed": 5
        }
    }
).json()
Complete Example: Multi-Step Workflow¶
Here's a complete example showing a multi-step document analysis workflow:
import requests
from typing import Dict


class DocumentAnalyzer:
    def __init__(self, base_url: str, virtual_key: str):
        self.base_url = base_url
        self.headers = {
            "Authorization": f"Bearer {virtual_key}",
            "Content-Type": "application/json"
        }

    def analyze_document(
        self,
        team_id: str,
        document_text: str,
        document_name: str
    ) -> Dict:
        """
        Complete document analysis workflow:
        1. Extract key information
        2. Classify document type
        3. Generate summary

        3 LLM calls = 1 Job = 1 Credit
        """
        # Step 1: Create job
        job = requests.post(
            f"{self.base_url}/api/jobs/create",
            headers=self.headers,
            json={
                "team_id": team_id,
                "job_type": "document_analysis",
                "metadata": {
                    "document_name": document_name,
                    "document_length": len(document_text)
                }
            }
        ).json()
        job_id = job["job_id"]
        print(f"✓ Created job: {job_id}")

        try:
            # Step 2: Extract key information
            extraction = requests.post(
                f"{self.base_url}/api/jobs/{job_id}/llm-call",
                headers=self.headers,
                json={
                    "messages": [
                        {
                            "role": "system",
                            "content": "Extract key information from documents."
                        },
                        {
                            "role": "user",
                            "content": f"Extract key info from:\n\n{document_text}"
                        }
                    ],
                    "purpose": "extraction"
                }
            ).json()
            extracted_info = extraction["response"]["content"]
            print("✓ Extracted information")

            # Step 3: Classify document type
            classification = requests.post(
                f"{self.base_url}/api/jobs/{job_id}/llm-call",
                headers=self.headers,
                json={
                    "messages": [
                        {
                            "role": "system",
                            "content": "Classify document types."
                        },
                        {
                            "role": "user",
                            "content": f"Classify: {extracted_info}"
                        }
                    ],
                    "purpose": "classification"
                }
            ).json()
            doc_type = classification["response"]["content"]
            print(f"✓ Classified as: {doc_type}")

            # Step 4: Generate summary
            summary = requests.post(
                f"{self.base_url}/api/jobs/{job_id}/llm-call",
                headers=self.headers,
                json={
                    "messages": [
                        {
                            "role": "system",
                            "content": "Generate concise summaries."
                        },
                        {
                            "role": "user",
                            "content": f"Summarize: {extracted_info}"
                        }
                    ],
                    "purpose": "summarization"
                }
            ).json()
            doc_summary = summary["response"]["content"]
            print("✓ Generated summary")

            # Step 5: Complete job successfully
            completion = requests.post(
                f"{self.base_url}/api/jobs/{job_id}/complete",
                headers=self.headers,
                json={
                    "status": "completed",
                    "metadata": {
                        "result": "success",
                        "document_type": doc_type,
                        "summary_length": len(doc_summary)
                    }
                }
            ).json()

            print("\n✅ Job completed!")
            print(f"  Total calls: {completion['costs']['total_calls']}")
            print(f"  Total cost: ${completion['costs']['total_cost_usd']:.4f}")
            print(f"  Credits remaining: {completion['costs']['credits_remaining']}")

            return {
                "extracted_info": extracted_info,
                "document_type": doc_type,
                "summary": doc_summary,
                "costs": completion["costs"]
            }

        except Exception as e:
            # Complete as failed (no credit charge)
            requests.post(
                f"{self.base_url}/api/jobs/{job_id}/complete",
                headers=self.headers,
                json={
                    "status": "failed",
                    "error_message": str(e)
                }
            )
            print(f"❌ Job failed: {e} (no credit charged)")
            raise


# Usage
analyzer = DocumentAnalyzer(
    base_url="http://localhost:8003",
    virtual_key="sk-your-virtual-key"
)

result = analyzer.analyze_document(
    team_id="acme-corp",
    document_text="Your document text here...",
    document_name="report.pdf"
)

print("\n" + "=" * 50)
print("ANALYSIS RESULTS")
print("=" * 50)
print(f"Type: {result['document_type']}")
print(f"Summary: {result['summary']}")
Output:
✓ Created job: 7f3d9a8b-4c21-4e89-b5d3-2a1c8f6e9b0d
✓ Extracted information
✓ Classified as: Financial Report
✓ Generated summary
✅ Job completed!
Total calls: 3
Total cost: $0.0045
Credits remaining: 999
==================================================
ANALYSIS RESULTS
==================================================
Type: Financial Report
Summary: This quarterly financial report shows...
Best Practices¶
1. Always Complete Jobs¶
# ✅ GOOD: Always complete jobs
try:
    make_llm_calls(job_id)
    complete_job(job_id, "completed")
except Exception as e:
    complete_job(job_id, "failed", error_message=str(e))

# ❌ BAD: Leaving jobs incomplete
try:
    make_llm_calls(job_id)
    complete_job(job_id, "completed")
except Exception as e:
    pass  # Job stays in in_progress forever!
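One way to enforce this pattern is a context manager that guarantees completion on both paths. A sketch assuming the /create and /complete endpoints shown earlier and the API and headers variables from previous examples:

from contextlib import contextmanager
import requests

@contextmanager
def tracked_job(team_id: str, job_type: str):
    """Create a job, yield its id, and always complete it:
    'completed' on success, 'failed' if the block raises."""
    job = requests.post(
        f"{API}/jobs/create",
        headers=headers,
        json={"team_id": team_id, "job_type": job_type}
    ).json()
    job_id = job["job_id"]
    try:
        yield job_id
    except Exception as e:
        requests.post(
            f"{API}/jobs/{job_id}/complete",
            headers=headers,
            json={"status": "failed", "error_message": str(e)}
        )
        raise
    else:
        requests.post(
            f"{API}/jobs/{job_id}/complete",
            headers=headers,
            json={"status": "completed"}
        )

# Usage: the job is completed (or failed) no matter what happens inside
# with tracked_job("acme-corp", "document_analysis") as job_id:
#     ... make /llm-call requests with job_id ...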
2. Use Meaningful Job Types¶
# ✅ GOOD: Descriptive job types
job_types = [
    "document_analysis",
    "resume_parsing",
    "chat_session",
    "data_extraction",
    "content_generation"
]

# ❌ BAD: Generic job types
job_types = ["job1", "job2", "task", "process"]
3. Add Contextual Metadata¶
# ✅ GOOD: Rich metadata
metadata = {
    "task_id": "task_123",
    "user_email": "john@acme.com",
    "document_name": "report.pdf",
    "priority": "high",
    "workflow_version": "v2.1"
}
# ❌ BAD: Empty or minimal metadata
metadata = {}
4. Use Purpose for Call Tracking¶
# ✅ GOOD: Label each LLM call
purposes = ["parsing", "classification", "summarization"]
# ❌ BAD: No purpose tracking
purpose = None
Viewing Job Details¶
You can retrieve job details at any time:
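The exact retrieval endpoint isn't shown in this section; assuming it follows the same resource pattern as the other job endpoints (e.g. GET /api/jobs/{job_id}), a request would look like this sketch:

# Hypothetical retrieval call - assumes GET /api/jobs/{job_id},
# mirroring the job endpoints used above; verify against your API.
response = requests.get(f"{API}/jobs/{job_id}", headers=headers)
job_details = response.json()
print(job_details["status"])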
Response:
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "team_id": "acme-corp",
  "user_id": "john@acme.com",
  "job_type": "document_analysis",
  "status": "completed",
  "created_at": "2024-10-14T12:00:00.000Z",
  "completed_at": "2024-10-14T12:05:23.000Z",
  "metadata": {
    "document_id": "doc_123",
    "result": "success"
  },
  "costs": {
    "total_calls": 3,
    "total_cost_usd": 0.0045,
    "credit_applied": true
  },
  "calls": [...]
}
Next Steps¶
- Non-Streaming Calls - Learn about standard LLM calls
- Streaming Calls - Implement real-time streaming
- Error Handling - Handle errors gracefully
- Examples - See working code examples