Type-Safe Python Client¶

The easiest way to integrate with SaaS LiteLLM is to use our type-safe Python client, which provides full async support and Pydantic validation.

Download the Client

Get the Typed Client

Complete source code, installation guide, and usage examples.

Why Use the Typed Client?¶

Raw API:

# Complex, verbose, error-prone
import requests

VIRTUAL_KEY = "sk-your-key"  # placeholder virtual key

response = requests.post(
    "http://localhost:8003/api/jobs/create",
    headers={"Authorization": f"Bearer {VIRTUAL_KEY}"},
    json={"team_id": "acme-corp", "job_type": "analysis"}
)
job = response.json()

llm_response = requests.post(
    f"http://localhost:8003/api/jobs/{job['job_id']}/llm-call",
    headers={"Authorization": f"Bearer {VIRTUAL_KEY}"},
    json={"messages": [{"role": "user", "content": "Hello"}]}
)
# ...

Typed Client:

# Clean, typed, easy
from examples.typed_client import SaaSLLMClient

async with SaaSLLMClient(
    base_url="http://localhost:8003",
    team_id="acme-corp",
    virtual_key="sk-your-key"
) as client:
    job_id = await client.create_job("analysis")
    response = await client.chat(job_id, [
        {"role": "user", "content": "Hello"}
    ])
    await client.complete_job(job_id, "completed")

✅ Type hints and autocomplete
✅ Automatic error handling
✅ Context manager support
✅ Pydantic validation
✅ Async/await support
✅ Cleaner code

Installation¶

Copy the Client¶

The typed client is in examples/typed_client.py:

# Copy to your project
cp examples/typed_client.py your_project/saas_litellm_client.py

Install Dependencies¶

pip install httpx pydantic

Quick Start¶

Basic Usage¶

import asyncio
from saas_litellm_client import SaaSLLMClient

async def main():
    async with SaaSLLMClient(
        base_url="http://localhost:8003",
        team_id="acme-corp",
        virtual_key="sk-your-virtual-key-here"
    ) as client:

        # Create job
        job_id = await client.create_job("my_first_job")
        print(f"Created job: {job_id}")

        # Make LLM call
        response = await client.chat(
            job_id=job_id,
            messages=[
                {"role": "user", "content": "What is Python?"}
            ]
        )

        print(f"Response: {response.choices[0].message['content']}")

        # Complete job
        result = await client.complete_job(job_id, "completed")
        print(f"Credits remaining: {result.credits_remaining}")

if __name__ == "__main__":
    asyncio.run(main())

Client Methods¶

create_job()¶

Create a new job for tracking:

job_id = await client.create_job(
    job_type="document_analysis",
    metadata={"document_id": "doc_123", "user": "john"}
)

Parameters:
- job_type (str): Type of job (e.g., "analysis", "chat", "extraction")
- metadata (dict, optional): Custom data

Returns: str - Job ID (UUID)

chat()¶

Make a non-streaming LLM call:

response = await client.chat(
    job_id=job_id,
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Explain quantum computing"}
    ],
    temperature=0.7,
    max_tokens=500
)

content = response.choices[0].message["content"]

Parameters:
- job_id (str): Job ID from create_job()
- messages (list): Chat messages
- temperature (float, optional): 0.0-2.0, default 0.7
- max_tokens (int, optional): Max response length
- top_p (float, optional): Nucleus sampling
- frequency_penalty (float, optional): Reduce repetition
- presence_penalty (float, optional): Encourage new topics
- stop (str|list, optional): Stop sequences

Returns: ChatCompletionResponse - Pydantic model with response
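Sampling options can be checked locally before a request is sent. The following is a minimal sketch, not part of the typed client: `build_chat_options` is a hypothetical helper that enforces the temperature range documented above and drops unset options, assuming `chat()` accepts these as keyword arguments as in the example.

```python
from typing import Any, Optional, Sequence, Union


def build_chat_options(
    temperature: float = 0.7,
    max_tokens: Optional[int] = None,
    top_p: Optional[float] = None,
    stop: Union[str, Sequence[str], None] = None,
) -> dict[str, Any]:
    """Validate chat options and return only the ones that were set.

    Hypothetical helper; it is not part of the typed client.
    """
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be in [0.0, 2.0], got {temperature}")
    if max_tokens is not None and max_tokens <= 0:
        raise ValueError(f"max_tokens must be positive, got {max_tokens}")

    options = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "stop": stop,
    }
    # Drop unset options so they are not sent at all.
    return {key: value for key, value in options.items() if value is not None}


# Intended use (inside an async function, with a real client):
#   response = await client.chat(
#       job_id=job_id,
#       messages=messages,
#       **build_chat_options(temperature=0.3, max_tokens=500),
#   )
```

Failing fast on an out-of-range value avoids spending a request on input the API would likely reject.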

chat_stream()¶

Make a streaming LLM call:

async for chunk in client.chat_stream(
    job_id=job_id,
    messages=[{"role": "user", "content": "Tell me a story"}]
):
    if chunk.choices:
        content = chunk.choices[0].delta.get("content", "")
        print(content, end="", flush=True)

Parameters: Same as chat()

Yields: ChatCompletionChunk - Streaming chunks
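When the full text is needed after streaming, the deltas can be joined as they arrive. The sketch below assumes chunks shaped like the loop above (`chunk.choices[0].delta` is a dict). `FakeChunk`, `FakeChoice`, and `fake_stream` are hypothetical stand-ins so the example runs without a server.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, AsyncIterator


@dataclass
class FakeChoice:
    """Stand-in for a streamed choice; delta is a plain dict."""
    delta: dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeChunk:
    """Stand-in for ChatCompletionChunk, used only to run this sketch offline."""
    choices: list[FakeChoice] = field(default_factory=list)


async def collect_stream(chunks: AsyncIterator[Any]) -> str:
    """Join the text deltas of a stream into the full response."""
    parts: list[str] = []
    async for chunk in chunks:
        if not chunk.choices:
            continue  # chunks without choices carry no text
        # `or ""` also covers a delta whose content is missing or None
        parts.append(chunk.choices[0].delta.get("content") or "")
    return "".join(parts)


async def fake_stream() -> AsyncIterator[FakeChunk]:
    """Hypothetical stream that mimics the shape of client.chat_stream() output."""
    yield FakeChunk([FakeChoice({"role": "assistant"})])
    yield FakeChunk([FakeChoice({"content": "Once "})])
    yield FakeChunk([])
    yield FakeChunk([FakeChoice({"content": "upon a time"})])


if __name__ == "__main__":
    print(asyncio.run(collect_stream(fake_stream())))  # prints "Once upon a time"
```

With the real client, `collect_stream(client.chat_stream(job_id=job_id, messages=messages))` would be the equivalent call.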

structured_output()¶

Get type-safe structured responses with Pydantic models:

from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str

person = await client.structured_output(
    job_id=job_id,
    messages=[{
        "role": "user",
        "content": "Extract: John Smith, 35, john@example.com"
    }],
    response_model=Person
)

print(f"Name: {person.name}, Age: {person.age}")

Parameters:
- job_id (str): Job ID
- messages (list): Chat messages
- response_model (Type[BaseModel]): Pydantic model class
- Other chat parameters

Returns: Instance of your Pydantic model

complete_job()¶

Mark job as completed:

result = await client.complete_job(
    job_id=job_id,
    status="completed",
    metadata={"result": "success", "output_file": "result.json"}
)

print(f"Credits remaining: {result.credits_remaining}")
print(f"Total calls: {result.total_calls}")

Parameters:
- job_id (str): Job ID
- status (str): "completed" or "failed"
- metadata (dict, optional): Additional data

Returns: JobCompletionResult - Pydantic model with results
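Because every job should end with a status, the create/complete pair can be wrapped in a small async context manager. This is a sketch under assumptions: `tracked_job` is a hypothetical helper, not part of the typed client, and `FakeClient` only records calls so the example runs offline.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator


@asynccontextmanager
async def tracked_job(client: Any, job_type: str) -> AsyncIterator[str]:
    """Create a job, yield its ID, and always report a final status."""
    job_id = await client.create_job(job_type)
    try:
        yield job_id
    except Exception:
        # Report the failure, then let the original exception propagate.
        await client.complete_job(job_id, "failed")
        raise
    else:
        await client.complete_job(job_id, "completed")


class FakeClient:
    """Offline stand-in that records calls; not the real SaaSLLMClient."""

    def __init__(self) -> None:
        self.completed: list[tuple[str, str]] = []

    async def create_job(self, job_type: str) -> str:
        return f"job-{job_type}"

    async def complete_job(self, job_id: str, status: str) -> None:
        self.completed.append((job_id, status))


async def demo() -> list[tuple[str, str]]:
    client = FakeClient()
    async with tracked_job(client, "ok"):
        pass  # LLM calls would go here
    try:
        async with tracked_job(client, "bad"):
            raise RuntimeError("simulated failure")
    except RuntimeError:
        pass
    return client.completed
```

The wrapper keeps the try/except bookkeeping in one place, so calling code only contains the LLM calls themselves.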

Complete Example¶

Document Analysis¶

import asyncio
from saas_litellm_client import SaaSLLMClient

async def analyze_document(document_text: str):
    """Analyze a document: extract key points and summarize"""

    async with SaaSLLMClient(
        base_url="http://localhost:8003",
        team_id="acme-corp",
        virtual_key="sk-your-key"
    ) as client:

        # Create job
        job_id = await client.create_job(
            job_type="document_analysis",
            metadata={"document_length": len(document_text)}
        )

        try:
            # Extract key points
            key_points_response = await client.chat(
                job_id=job_id,
                messages=[
                    {
                        "role": "system",
                        "content": "Extract key points as bullet points"
                    },
                    {
                        "role": "user",
                        "content": f"Extract key points from:\n\n{document_text}"
                    }
                ],
                temperature=0.3,
                max_tokens=500
            )

            key_points = key_points_response.choices[0].message["content"]
            print(f"Key Points:\n{key_points}\n")

            # Generate summary
            summary_response = await client.chat(
                job_id=job_id,
                messages=[
                    {
                        "role": "system",
                        "content": "Create concise summaries"
                    },
                    {
                        "role": "user",
                        "content": f"Summarize in 2-3 sentences:\n\n{document_text}"
                    }
                ],
                temperature=0.5,
                max_tokens=200
            )

            summary = summary_response.choices[0].message["content"]
            print(f"Summary:\n{summary}\n")

            # Complete job
            result = await client.complete_job(job_id, "completed")
            print(f"Analysis complete! Credits remaining: {result.credits_remaining}")

            return {
                "key_points": key_points,
                "summary": summary
            }

        except Exception:
            # Mark job as failed
            await client.complete_job(job_id, "failed")
            raise

if __name__ == "__main__":
    document = """
    Artificial intelligence (AI) is transforming industries worldwide.
    Machine learning algorithms can process vast amounts of data and
    identify patterns. This technology is being applied in healthcare,
    finance, and transportation for various innovative solutions.
    """

    result = asyncio.run(analyze_document(document))

Streaming Chat¶

import asyncio
from saas_litellm_client import SaaSLLMClient

async def interactive_chat():
    """Interactive streaming chat session"""

    async with SaaSLLMClient(
        base_url="http://localhost:8003",
        team_id="acme-corp",
        virtual_key="sk-your-key"
    ) as client:

        # Create job for chat session
        job_id = await client.create_job("chat_session")
        messages = []

        while True:
            # Get user input
            user_input = input("\nYou: ")
            if user_input.lower() in ['quit', 'exit', 'bye']:
                break

            # Add to conversation
            messages.append({"role": "user", "content": user_input})

            # Stream response
            print("Assistant: ", end="", flush=True)
            assistant_response = ""

            async for chunk in client.chat_stream(
                job_id=job_id,
                messages=messages,
                temperature=0.7
            ):
                if chunk.choices:
                    content = chunk.choices[0].delta.get("content", "")
                    assistant_response += content
                    print(content, end="", flush=True)

            # Add assistant response to conversation
            messages.append({"role": "assistant", "content": assistant_response})

        # Complete job
        result = await client.complete_job(job_id, "completed")
        print(f"\n\nChat ended. Credits remaining: {result.credits_remaining}")

if __name__ == "__main__":
    asyncio.run(interactive_chat())

Structured Data Extraction¶

import asyncio
from pydantic import BaseModel
from saas_litellm_client import SaaSLLMClient

class Resume(BaseModel):
    name: str
    email: str
    phone: str
    years_experience: int
    skills: list[str]
    education: str

async def parse_resume(resume_text: str):
    """Extract structured data from resume"""

    async with SaaSLLMClient(
        base_url="http://localhost:8003",
        team_id="acme-corp",
        virtual_key="sk-your-key"
    ) as client:

        job_id = await client.create_job("resume_parsing")

        try:
            # Get structured output
            resume = await client.structured_output(
                job_id=job_id,
                messages=[{
                    "role": "user",
                    "content": f"Extract structured data from this resume:\n\n{resume_text}"
                }],
                response_model=Resume
            )

            # resume is now a fully typed Resume object!
            print(f"Name: {resume.name}")
            print(f"Email: {resume.email}")
            print(f"Experience: {resume.years_experience} years")
            print(f"Skills: {', '.join(resume.skills)}")

            await client.complete_job(job_id, "completed")
            return resume

        except Exception:
            await client.complete_job(job_id, "failed")
            raise

if __name__ == "__main__":
    resume_text = """
    John Doe
    Email: john@example.com
    Phone: (555) 123-4567

    EXPERIENCE: 5 years as a software engineer

    SKILLS: Python, JavaScript, React, Docker, Kubernetes

    EDUCATION: BS in Computer Science, MIT
    """

    result = asyncio.run(parse_resume(resume_text))

Error Handling¶

Basic Error Handling¶

from httpx import HTTPStatusError
from saas_litellm_client import SaaSLLMClient

async with SaaSLLMClient(...) as client:
    try:
        job_id = await client.create_job("test")
        response = await client.chat(job_id, messages)
        await client.complete_job(job_id, "completed")

    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print("Authentication failed - check your virtual key")
        elif e.response.status_code == 403:
            print("Access denied - check credits or team status")
        elif e.response.status_code == 429:
            print("Rate limited - wait and retry")
        else:
            print(f"HTTP error: {e}")
        raise

    except Exception as e:
        print(f"Unexpected error: {e}")
        raise
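The status-code branches above can be factored into small lookup helpers. In this sketch the hint messages come from the example above, while the set of retryable codes is an assumption about which failures are usually temporary, not something the API documents.

```python
# Assumption: these codes usually indicate a temporary condition.
RETRYABLE_STATUS_CODES = frozenset({429, 500, 502, 503, 504})

# Hints for the status codes handled in the example above.
STATUS_HINTS = {
    401: "Authentication failed - check your virtual key",
    403: "Access denied - check credits or team status",
    429: "Rate limited - wait and retry",
}


def describe_status(status_code: int) -> str:
    """Return a human-readable hint for an HTTP status code."""
    return STATUS_HINTS.get(status_code, f"HTTP error: {status_code}")


def is_retryable(status_code: int) -> bool:
    """Return True when retrying the same request might succeed."""
    return status_code in RETRYABLE_STATUS_CODES
```

Inside an `except HTTPStatusError as e:` block, both helpers would take `e.response.status_code` as their argument.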

Retry Logic¶

# Requires an extra dependency: pip install tenacity
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
async def make_llm_call_with_retry(client, job_id, messages):
    """Make LLM call with automatic retries"""
    return await client.chat(job_id, messages)

# Usage
async with SaaSLLMClient(...) as client:
    job_id = await client.create_job("test")

    try:
        response = await make_llm_call_with_retry(client, job_id, messages)
        await client.complete_job(job_id, "completed")
    except Exception:
        await client.complete_job(job_id, "failed")
        raise
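As configured above, the tenacity decorator retries on any exception, including authentication failures that will not succeed on a second attempt. The following sketch shows the same exponential backoff idea using only the standard library, retrying only on errors marked as transient. `TransientError` and `retry_async` are hypothetical names; how client errors map to `TransientError` is left to the caller.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (for example HTTP 429)."""


async def retry_async(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 2.0,
    max_delay: float = 10.0,
) -> T:
    """Await call() up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises


async def demo() -> tuple[str, int]:
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:
            raise TransientError("rate limited")
        return "ok"

    result = await retry_async(flaky, attempts=3, base_delay=0.01)
    return result, calls
```

Any exception other than `TransientError` propagates immediately, so a bad key or malformed request fails on the first attempt.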

Configuration¶

Environment Variables¶

import os
from saas_litellm_client import SaaSLLMClient

# Load from environment
API_URL = os.environ.get("SAAS_LITELLM_API_URL", "http://localhost:8003")
TEAM_ID = os.environ["SAAS_LITELLM_TEAM_ID"]
VIRTUAL_KEY = os.environ["SAAS_LITELLM_VIRTUAL_KEY"]

async with SaaSLLMClient(
    base_url=API_URL,
    team_id=TEAM_ID,
    virtual_key=VIRTUAL_KEY
) as client:
    # Use client
    pass

.env file:

SAAS_LITELLM_API_URL=https://api.yourcompany.com
SAAS_LITELLM_TEAM_ID=acme-prod
SAAS_LITELLM_VIRTUAL_KEY=sk-your-actual-key-here
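Note that `os.environ` does not read a .env file by itself; the variables must be exported by the shell or loaded by a tool such as python-dotenv. The sketch below gathers the three variables into one settings object and reports all missing names at once. `ClientSettings` and `load_settings` are hypothetical helpers, not part of the typed client.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARS = ("SAAS_LITELLM_TEAM_ID", "SAAS_LITELLM_VIRTUAL_KEY")


@dataclass(frozen=True)
class ClientSettings:
    """Connection settings for the client (hypothetical container)."""
    base_url: str
    team_id: str
    virtual_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> ClientSettings:
    """Read client settings from `env`, failing with one clear message."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    return ClientSettings(
        # Same default as the example above.
        base_url=env.get("SAAS_LITELLM_API_URL", "http://localhost:8003"),
        team_id=env["SAAS_LITELLM_TEAM_ID"],
        virtual_key=env["SAAS_LITELLM_VIRTUAL_KEY"],
    )
```

The resulting fields can be passed straight to `SaaSLLMClient(base_url=..., team_id=..., virtual_key=...)`. Accepting a mapping also makes the loader easy to exercise in tests without touching the real environment.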

Custom Timeout¶

async with SaaSLLMClient(
    base_url="http://localhost:8003",
    team_id="acme-corp",
    virtual_key="sk-your-key",
    timeout=60.0  # 60 seconds
) as client:
    # Use client
    pass
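The `timeout` argument presumably applies to the client's HTTP requests. If a single call also needs an overall deadline, `asyncio.wait_for` can enforce one independently of the client. This is a minimal sketch: `with_deadline` and `DeadlineExceeded` are hypothetical names, and `slow_call` stands in for a real `client.chat()` call.

```python
import asyncio
from typing import Awaitable, TypeVar

T = TypeVar("T")


class DeadlineExceeded(Exception):
    """Hypothetical error raised when a call exceeds its overall deadline."""


async def with_deadline(awaitable: Awaitable[T], seconds: float) -> T:
    """Await `awaitable`, cancelling it if it takes longer than `seconds`."""
    try:
        return await asyncio.wait_for(awaitable, timeout=seconds)
    except asyncio.TimeoutError:
        raise DeadlineExceeded(
            f"call did not finish within {seconds} seconds"
        ) from None


async def slow_call(delay: float) -> str:
    """Stand-in for a slow client.chat() call."""
    await asyncio.sleep(delay)
    return "done"


async def demo() -> tuple[str, bool]:
    fast = await with_deadline(slow_call(0.01), seconds=1.0)
    try:
        await with_deadline(slow_call(1.0), seconds=0.05)
        timed_out = False
    except DeadlineExceeded:
        timed_out = True
    return fast, timed_out
```

With the real client this would read `await with_deadline(client.chat(job_id, messages), seconds=30)`.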

Advanced Usage¶

Concurrent Jobs¶

Process multiple jobs concurrently:

import asyncio
from saas_litellm_client import SaaSLLMClient

async def process_document(client, document):
    """Process one document"""
    job_id = await client.create_job("analysis")
    response = await client.chat(job_id, [
        {"role": "user", "content": f"Analyze: {document}"}
    ])
    await client.complete_job(job_id, "completed")
    return response

async def process_batch(documents):
    """Process multiple documents concurrently"""
    async with SaaSLLMClient(...) as client:
        tasks = [process_document(client, doc) for doc in documents]
        results = await asyncio.gather(*tasks)
        return results

# Process 10 documents at once
documents = [f"doc{i}" for i in range(1, 11)]
results = asyncio.run(process_batch(documents))
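A plain `asyncio.gather` starts every task at once and raises as soon as one document fails. For larger batches, a semaphore can cap the number of in-flight jobs, and `return_exceptions=True` keeps one failure from discarding the other results. `gather_bounded` below is a hypothetical helper, and the `worker` in `demo` stands in for `process_document`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def gather_bounded(
    items: Iterable[Any],
    worker: Callable[[Any], Awaitable[Any]],
    limit: int = 5,
) -> list[Any]:
    """Run worker(item) for every item, with at most `limit` running at once.

    Results keep input order. A failed item yields its exception object
    instead of aborting the rest of the batch.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: Any) -> Any:
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(
        *(run_one(item) for item in items), return_exceptions=True
    )


async def demo() -> tuple[list[Any], int]:
    running = 0
    peak = 0

    async def worker(doc: str) -> str:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)  # track the highest concurrency seen
        await asyncio.sleep(0.01)
        running -= 1
        if doc == "doc3":
            raise ValueError("simulated failure")
        return doc.upper()

    docs = [f"doc{i}" for i in range(1, 7)]
    results = await gather_bounded(docs, worker, limit=2)
    return results, peak
```

With the real client, the worker could be `lambda doc: process_document(client, doc)`. Callers should check each result with `isinstance(result, Exception)` before using it.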

Context Manager Options¶

# Option 1: Context manager (recommended)
async with SaaSLLMClient(...) as client:
    # Automatic cleanup
    pass

# Option 2: Manual lifecycle
client = SaaSLLMClient(...)
try:
    job_id = await client.create_job("test")
    # ...
finally:
    await client.close()

Best Practices¶

Always use a context manager (async with) for automatic cleanup
Handle errors properly - Always complete jobs, even on failure
Use structured outputs for type safety when extracting data
Set timeouts - Don't let requests hang forever
Monitor credits - Check the balance periodically
Reuse the client - Don't create a new client for each request
Use environment variables - Never hardcode credentials

Troubleshooting¶

Import Error¶

Problem: ModuleNotFoundError: No module named 'httpx'

Solution:

pip install httpx pydantic

Authentication Error¶

Problem: 401 Unauthorized

Solutions:
1. Check virtual key is correct
2. Verify Authorization: Bearer sk-... format
3. Check team is active (not suspended)

Timeout Error¶

Problem: Request times out

Solutions:
1. Increase timeout: SaaSLLMClient(..., timeout=120)
2. Check API is running and accessible
3. For long responses, use streaming instead

Next Steps¶

Now that you understand the typed client:

See More Examples - Additional code examples
Learn About Streaming - Real-time responses
Structured Outputs - Type-safe data extraction
Error Handling - Comprehensive error handling
Best Practices - Production-ready patterns