Type-Safe Python Client¶
The easiest way to integrate with SaaS LiteLLM is to use our type-safe Python client, which offers full async support and Pydantic validation.
Download the Client
Complete source code, installation guide, and usage examples.
Why Use the Typed Client?¶
Raw API:

```python
# Complex, verbose, error-prone
import requests

response = requests.post(
    "http://localhost:8003/api/jobs/create",
    headers={"Authorization": f"Bearer {VIRTUAL_KEY}"},
    json={"team_id": "acme-corp", "job_type": "analysis"}
)
job = response.json()

llm_response = requests.post(
    f"http://localhost:8003/api/jobs/{job['job_id']}/llm-call",
    headers={"Authorization": f"Bearer {VIRTUAL_KEY}"},
    json={"messages": [{"role": "user", "content": "Hello"}]}
)
# ...
```
Typed Client:

```python
# Clean, typed, easy
from examples.typed_client import SaaSLLMClient

async with SaaSLLMClient(
    base_url="http://localhost:8003",
    team_id="acme-corp",
    virtual_key="sk-your-key"
) as client:
    job_id = await client.create_job("analysis")
    response = await client.chat(job_id, [
        {"role": "user", "content": "Hello"}
    ])
    await client.complete_job(job_id, "completed")
```
- ✅ Type hints and autocomplete
- ✅ Automatic error handling
- ✅ Context manager support
- ✅ Pydantic validation
- ✅ Async/await support
- ✅ Cleaner code
Installation¶
Copy the Client¶
The typed client is in examples/typed_client.py:
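Copy it into your project. The target filename below is an assumption chosen to match the `saas_litellm_client` import used in the later examples; adjust the paths to your layout:

```shell
# Copy the typed client into your project under the module name the examples import
cp examples/typed_client.py your_project/saas_litellm_client.py
```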
Install Dependencies¶
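The client relies on httpx for async HTTP and pydantic for validation (exact version pins are not specified here):

```shell
pip install httpx pydantic
```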
Quick Start¶
Basic Usage¶
```python
import asyncio
from saas_litellm_client import SaaSLLMClient

async def main():
    async with SaaSLLMClient(
        base_url="http://localhost:8003",
        team_id="acme-corp",
        virtual_key="sk-your-virtual-key-here"
    ) as client:
        # Create job
        job_id = await client.create_job("my_first_job")
        print(f"Created job: {job_id}")

        # Make LLM call
        response = await client.chat(
            job_id=job_id,
            messages=[
                {"role": "user", "content": "What is Python?"}
            ]
        )
        print(f"Response: {response.choices[0].message['content']}")

        # Complete job
        result = await client.complete_job(job_id, "completed")
        print(f"Credits remaining: {result.credits_remaining}")

if __name__ == "__main__":
    asyncio.run(main())
```
Client Methods¶
create_job()¶
Create a new job for tracking:
```python
job_id = await client.create_job(
    job_type="document_analysis",
    metadata={"document_id": "doc_123", "user": "john"}
)
```

Parameters:

- `job_type` (str): Type of job (e.g., "analysis", "chat", "extraction")
- `metadata` (dict, optional): Custom data

Returns: `str` - Job ID (UUID)
chat()¶
Make a non-streaming LLM call:
```python
response = await client.chat(
    job_id=job_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    temperature=0.7,
    max_tokens=500
)

content = response.choices[0].message["content"]
```

Parameters:

- `job_id` (str): Job ID from `create_job()`
- `messages` (list): Chat messages
- `temperature` (float, optional): 0.0-2.0, default 0.7
- `max_tokens` (int, optional): Max response length
- `top_p` (float, optional): Nucleus sampling
- `frequency_penalty` (float, optional): Reduce repetition
- `presence_penalty` (float, optional): Encourage new topics
- `stop` (str|list, optional): Stop sequences

Returns: `ChatCompletionResponse` - Pydantic model with the response
chat_stream()¶
Make a streaming LLM call:
```python
async for chunk in client.chat_stream(
    job_id=job_id,
    messages=[{"role": "user", "content": "Tell me a story"}]
):
    if chunk.choices:
        content = chunk.choices[0].delta.get("content", "")
        print(content, end="", flush=True)
```

Parameters: Same as `chat()`

Yields: `ChatCompletionChunk` - Streaming chunks
structured_output()¶
Get type-safe structured responses with Pydantic models:
```python
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str

person = await client.structured_output(
    job_id=job_id,
    messages=[{
        "role": "user",
        "content": "Extract: John Smith, 35, john@example.com"
    }],
    response_model=Person
)

print(f"Name: {person.name}, Age: {person.age}")
```

Parameters:

- `job_id` (str): Job ID
- `messages` (list): Chat messages
- `response_model` (Type[BaseModel]): Pydantic model class
- Other `chat()` parameters

Returns: An instance of your Pydantic model
complete_job()¶
Mark a job as completed:

```python
result = await client.complete_job(
    job_id=job_id,
    status="completed",
    metadata={"result": "success", "output_file": "result.json"}
)

print(f"Credits remaining: {result.credits_remaining}")
print(f"Total calls: {result.total_calls}")
```

Parameters:

- `job_id` (str): Job ID
- `status` (str): "completed" or "failed"
- `metadata` (dict, optional): Additional data

Returns: `JobCompletionResult` - Pydantic model with results
Complete Example¶
Document Analysis¶
```python
import asyncio
from saas_litellm_client import SaaSLLMClient

async def analyze_document(document_text: str):
    """Analyze a document: extract key points and summarize"""
    async with SaaSLLMClient(
        base_url="http://localhost:8003",
        team_id="acme-corp",
        virtual_key="sk-your-key"
    ) as client:
        # Create job
        job_id = await client.create_job(
            job_type="document_analysis",
            metadata={"document_length": len(document_text)}
        )

        try:
            # Extract key points
            key_points_response = await client.chat(
                job_id=job_id,
                messages=[
                    {
                        "role": "system",
                        "content": "Extract key points as bullet points"
                    },
                    {
                        "role": "user",
                        "content": f"Extract key points from:\n\n{document_text}"
                    }
                ],
                temperature=0.3,
                max_tokens=500
            )
            key_points = key_points_response.choices[0].message["content"]
            print(f"Key Points:\n{key_points}\n")

            # Generate summary
            summary_response = await client.chat(
                job_id=job_id,
                messages=[
                    {
                        "role": "system",
                        "content": "Create concise summaries"
                    },
                    {
                        "role": "user",
                        "content": f"Summarize in 2-3 sentences:\n\n{document_text}"
                    }
                ],
                temperature=0.5,
                max_tokens=200
            )
            summary = summary_response.choices[0].message["content"]
            print(f"Summary:\n{summary}\n")

            # Complete job
            result = await client.complete_job(job_id, "completed")
            print(f"Analysis complete! Credits remaining: {result.credits_remaining}")

            return {
                "key_points": key_points,
                "summary": summary
            }
        except Exception:
            # Mark job as failed
            await client.complete_job(job_id, "failed")
            raise

if __name__ == "__main__":
    document = """
    Artificial intelligence (AI) is transforming industries worldwide.
    Machine learning algorithms can process vast amounts of data and
    identify patterns. This technology is being applied in healthcare,
    finance, and transportation for various innovative solutions.
    """
    result = asyncio.run(analyze_document(document))
```
Streaming Chat¶
```python
import asyncio
from saas_litellm_client import SaaSLLMClient

async def interactive_chat():
    """Interactive streaming chat session"""
    async with SaaSLLMClient(
        base_url="http://localhost:8003",
        team_id="acme-corp",
        virtual_key="sk-your-key"
    ) as client:
        # Create job for chat session
        job_id = await client.create_job("chat_session")
        messages = []

        while True:
            # Get user input
            user_input = input("\nYou: ")
            if user_input.lower() in ['quit', 'exit', 'bye']:
                break

            # Add to conversation
            messages.append({"role": "user", "content": user_input})

            # Stream response
            print("Assistant: ", end="", flush=True)
            assistant_response = ""

            async for chunk in client.chat_stream(
                job_id=job_id,
                messages=messages,
                temperature=0.7
            ):
                if chunk.choices:
                    content = chunk.choices[0].delta.get("content", "")
                    assistant_response += content
                    print(content, end="", flush=True)

            # Add assistant response to conversation
            messages.append({"role": "assistant", "content": assistant_response})

        # Complete job
        result = await client.complete_job(job_id, "completed")
        print(f"\n\nChat ended. Credits remaining: {result.credits_remaining}")

if __name__ == "__main__":
    asyncio.run(interactive_chat())
```
Structured Data Extraction¶
```python
import asyncio
from pydantic import BaseModel
from saas_litellm_client import SaaSLLMClient

class Resume(BaseModel):
    name: str
    email: str
    phone: str
    years_experience: int
    skills: list[str]
    education: str

async def parse_resume(resume_text: str):
    """Extract structured data from resume"""
    async with SaaSLLMClient(
        base_url="http://localhost:8003",
        team_id="acme-corp",
        virtual_key="sk-your-key"
    ) as client:
        job_id = await client.create_job("resume_parsing")

        try:
            # Get structured output
            resume = await client.structured_output(
                job_id=job_id,
                messages=[{
                    "role": "user",
                    "content": f"Extract structured data from this resume:\n\n{resume_text}"
                }],
                response_model=Resume
            )

            # resume is now a fully typed Resume object!
            print(f"Name: {resume.name}")
            print(f"Email: {resume.email}")
            print(f"Experience: {resume.years_experience} years")
            print(f"Skills: {', '.join(resume.skills)}")

            await client.complete_job(job_id, "completed")
            return resume
        except Exception:
            await client.complete_job(job_id, "failed")
            raise

if __name__ == "__main__":
    resume_text = """
    John Doe
    Email: john@example.com
    Phone: (555) 123-4567

    EXPERIENCE: 5 years as a software engineer
    SKILLS: Python, JavaScript, React, Docker, Kubernetes
    EDUCATION: BS in Computer Science, MIT
    """
    result = asyncio.run(parse_resume(resume_text))
```
Error Handling¶
Basic Error Handling¶
```python
from httpx import HTTPStatusError

async with SaaSLLMClient(...) as client:
    try:
        job_id = await client.create_job("test")
        response = await client.chat(job_id, messages)
        await client.complete_job(job_id, "completed")
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print("Authentication failed - check your virtual key")
        elif e.response.status_code == 403:
            print("Access denied - check credits or team status")
        elif e.response.status_code == 429:
            print("Rate limited - wait and retry")
        else:
            print(f"HTTP error: {e}")
        raise
    except Exception as e:
        print(f"Unexpected error: {e}")
        raise
```
Retry Logic¶
```python
import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def make_llm_call_with_retry(client, job_id, messages):
    """Make LLM call with automatic retries"""
    return await client.chat(job_id, messages)

# Usage
async with SaaSLLMClient(...) as client:
    job_id = await client.create_job("test")
    try:
        response = await make_llm_call_with_retry(client, job_id, messages)
        await client.complete_job(job_id, "completed")
    except Exception:
        await client.complete_job(job_id, "failed")
        raise
```
Configuration¶
Environment Variables¶
```python
import os
from saas_litellm_client import SaaSLLMClient

# Load from environment
API_URL = os.environ.get("SAAS_LITELLM_API_URL", "http://localhost:8003")
TEAM_ID = os.environ["SAAS_LITELLM_TEAM_ID"]
VIRTUAL_KEY = os.environ["SAAS_LITELLM_VIRTUAL_KEY"]

async with SaaSLLMClient(
    base_url=API_URL,
    team_id=TEAM_ID,
    virtual_key=VIRTUAL_KEY
) as client:
    # Use client
    pass
```

`.env` file:

```
SAAS_LITELLM_API_URL=https://api.yourcompany.com
SAAS_LITELLM_TEAM_ID=acme-prod
SAAS_LITELLM_VIRTUAL_KEY=sk-your-actual-key-here
```
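To populate `os.environ` from that `.env` file without extra dependencies, a minimal stdlib loader like the sketch below can work. `load_env_file` is a hypothetical helper, not part of the client; a library such as python-dotenv handles quoting and edge cases more robustly:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments, no quoting rules.

    Uses setdefault so real environment variables take precedence over the file.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```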
Custom Timeout¶
```python
async with SaaSLLMClient(
    base_url="http://localhost:8003",
    team_id="acme-corp",
    virtual_key="sk-your-key",
    timeout=60.0  # 60 seconds
) as client:
    # Use client
    pass
```
Advanced Usage¶
Concurrent Jobs¶
Process multiple jobs concurrently:
```python
import asyncio

async def process_document(client, document):
    """Process one document"""
    job_id = await client.create_job("analysis")
    response = await client.chat(job_id, [
        {"role": "user", "content": f"Analyze: {document}"}
    ])
    await client.complete_job(job_id, "completed")
    return response

async def process_batch(documents):
    """Process multiple documents concurrently"""
    async with SaaSLLMClient(...) as client:
        tasks = [process_document(client, doc) for doc in documents]
        results = await asyncio.gather(*tasks)
        return results

# Process 10 documents at once
documents = ["doc1", "doc2", ..., "doc10"]
results = asyncio.run(process_batch(documents))
```
Context Manager Options¶
```python
# Option 1: Context manager (recommended)
async with SaaSLLMClient(...) as client:
    # Automatic cleanup
    pass

# Option 2: Manual lifecycle
client = SaaSLLMClient(...)
try:
    job_id = await client.create_job("test")
    # ...
finally:
    await client.close()
```
Best Practices¶
- **Always use context manager** (`async with`) for automatic cleanup
- **Handle errors properly** - Always complete jobs, even on failure
- **Use structured outputs** for type safety when extracting data
- **Set timeouts** - Don't let requests hang forever
- **Monitor credits** - Check balance periodically
- **Reuse client** - Don't create a new client for each request
- **Environment variables** - Never hardcode credentials
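Monitoring credits can be as simple as checking the `credits_remaining` field that `complete_job()` returns. The helper and threshold below are illustrative assumptions, not part of the client:

```python
LOW_CREDIT_THRESHOLD = 100  # illustrative value; tune for your plan

def credits_ok(credits_remaining: int, threshold: int = LOW_CREDIT_THRESHOLD) -> bool:
    """Return True if the balance is at or above the alert threshold."""
    if credits_remaining < threshold:
        print(f"WARNING: only {credits_remaining} credits remaining")
        return False
    return True

# After each job:
# result = await client.complete_job(job_id, "completed")
# credits_ok(result.credits_remaining)
```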
Troubleshooting¶
Import Error¶
Problem: `ImportError: No module named 'httpx'`
Solution:
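Install the client's runtime dependencies (httpx and pydantic, per the client's imports):

```shell
pip install httpx pydantic
```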
Authentication Error¶
Problem: 401 Unauthorized
Solutions:

1. Check that the virtual key is correct
2. Verify the `Authorization: Bearer sk-...` header format
3. Check that the team is active (not suspended)
Timeout Error¶
Problem: Request times out
Solutions:

1. Increase the timeout: `SaaSLLMClient(..., timeout=120)`
2. Check that the API is running and accessible
3. For long responses, use streaming instead
Next Steps¶
Now that you understand the typed client:
- See More Examples - Additional code examples
- Learn About Streaming - Real-time responses
- Structured Outputs - Type-safe data extraction
- Error Handling - Comprehensive error handling
- Best Practices - Production-ready patterns