# Jobs API
The Jobs API provides endpoints for managing job lifecycle in the SaaS LiteLLM platform. Jobs are logical groupings of related LLM calls that represent a single business operation or workflow.
## Overview
Jobs enable simplified billing where 1 job = 1 credit, regardless of how many LLM calls are made within that job. This provides a business-centric approach to cost management.
Base URL: /api/jobs
Authentication: All endpoints require a Bearer token (virtual API key) in the Authorization header.
## Endpoints
### Create Job
Create a new job for tracking multiple LLM calls.
Endpoint: POST /api/jobs/create
Authentication: Required (virtual key)
Request Body:
```json
{
  "team_id": "string",
  "user_id": "string (optional)",
  "job_type": "string",
  "metadata": {
    "key": "value"
  }
}
```
Request Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| team_id | string | Yes | Team identifier (must match authenticated team) |
| user_id | string | No | Optional user identifier for tracking |
| job_type | string | Yes | Type of job (e.g., "document_analysis", "chat_session") |
| metadata | object | No | Custom metadata for the job |
Response (200 OK):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "pending",
  "created_at": "2025-10-14T12:00:00.000Z"
}
```
Response Fields:
| Field | Type | Description |
|---|---|---|
| job_id | string (UUID) | Unique identifier for the job |
| status | string | Initial status (always "pending") |
| created_at | string (ISO 8601) | Job creation timestamp |
Example Request:
```bash
curl -X POST http://localhost:8003/api/jobs/create \
  -H "Authorization: Bearer sk-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "acme-corp",
    "user_id": "john@acme.com",
    "job_type": "resume_analysis",
    "metadata": {
      "document_id": "doc_123",
      "document_name": "resume.pdf"
    }
  }'
```
```python
import requests

API_URL = "http://localhost:8003/api"
VIRTUAL_KEY = "sk-your-virtual-key"

headers = {
    "Authorization": f"Bearer {VIRTUAL_KEY}",
    "Content-Type": "application/json"
}

response = requests.post(
    f"{API_URL}/jobs/create",
    headers=headers,
    json={
        "team_id": "acme-corp",
        "user_id": "john@acme.com",
        "job_type": "resume_analysis",
        "metadata": {
            "document_id": "doc_123",
            "document_name": "resume.pdf"
        }
    }
)

job = response.json()
print(f"Created job: {job['job_id']}")
```
```javascript
const API_URL = "http://localhost:8003/api";
const VIRTUAL_KEY = "sk-your-virtual-key";

const response = await fetch(`${API_URL}/jobs/create`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${VIRTUAL_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    team_id: 'acme-corp',
    user_id: 'john@acme.com',
    job_type: 'resume_analysis',
    metadata: {
      document_id: 'doc_123',
      document_name: 'resume.pdf'
    }
  })
});

const job = await response.json();
console.log(`Created job: ${job.job_id}`);
```
Error Responses:
| Status Code | Error | Description |
|---|---|---|
| 401 | Unauthorized | Invalid or missing virtual key |
| 403 | Forbidden | Virtual key does not belong to the specified team |
| 422 | Validation Error | Invalid request data |
| 500 | Internal Server Error | Server error |
Example Error Response:
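Error payloads follow the FastAPI-style `detail` convention used by the error-handling examples later on this page (the exact message varies by error):

```json
{
  "detail": "Virtual key does not belong to the specified team"
}
```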
### Get Job
Retrieve details about a specific job.
Endpoint: GET /api/jobs/{job_id}
Authentication: Required (virtual key)
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| job_id | string (UUID) | The job identifier |
Response (200 OK):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "team_id": "acme-corp",
  "user_id": "john@acme.com",
  "job_type": "resume_analysis",
  "status": "in_progress",
  "created_at": "2025-10-14T12:00:00.000Z",
  "started_at": "2025-10-14T12:00:05.000Z",
  "completed_at": null,
  "model_groups_used": ["ResumeAgent"],
  "credit_applied": false,
  "metadata": {
    "document_id": "doc_123",
    "document_name": "resume.pdf"
  }
}
```
Response Fields:
| Field | Type | Description |
|---|---|---|
| job_id | string (UUID) | Unique job identifier |
| team_id | string | Team that owns this job |
| user_id | string | User who created the job (optional) |
| job_type | string | Type of job |
| status | string | Current status (pending, in_progress, completed, failed) |
| created_at | string (ISO 8601) | Job creation timestamp |
| started_at | string (ISO 8601) | When the first LLM call was made (null if pending) |
| completed_at | string (ISO 8601) | When the job was completed (null if not completed) |
| model_groups_used | array | List of model groups used in this job |
| credit_applied | boolean | Whether a credit was deducted for this job |
| metadata | object | Custom metadata |
Example Request:
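```bash
curl http://localhost:8003/api/jobs/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer sk-your-virtual-key"
```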
Error Responses:
| Status Code | Error | Description |
|---|---|---|
| 401 | Unauthorized | Invalid or missing virtual key |
| 403 | Forbidden | Job does not belong to your team |
| 404 | Not Found | Job not found |
### Complete Job
Mark a job as complete and trigger credit deduction (if applicable).
Endpoint: POST /api/jobs/{job_id}/complete
Authentication: Required (virtual key)
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| job_id | string (UUID) | The job identifier |
Request Body:
```json
{
  "status": "completed",
  "metadata": {
    "result": "success",
    "output_file": "analysis_123.json"
  },
  "error_message": "Optional error message if status is 'failed'"
}
```
Request Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| status | string | Yes | Final job status: "completed" or "failed" |
| metadata | object | No | Additional metadata to merge with existing metadata |
| error_message | string | No | Error message (only if status is "failed") |
Response (200 OK):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "completed_at": "2025-10-14T12:05:23.000Z",
  "costs": {
    "total_calls": 3,
    "successful_calls": 3,
    "failed_calls": 0,
    "total_tokens": 1350,
    "total_cost_usd": 0.0045,
    "avg_latency_ms": 1200,
    "credit_applied": true,
    "credits_remaining": 999
  },
  "calls": [
    {
      "call_id": "call-uuid-1",
      "purpose": "parse",
      "model_group": "ResumeAgent",
      "tokens": 450,
      "latency_ms": 1250,
      "error": null
    },
    {
      "call_id": "call-uuid-2",
      "purpose": "analyze",
      "model_group": "ResumeAgent",
      "tokens": 480,
      "latency_ms": 1180,
      "error": null
    },
    {
      "call_id": "call-uuid-3",
      "purpose": "summarize",
      "model_group": "ResumeAgent",
      "tokens": 420,
      "latency_ms": 1170,
      "error": null
    }
  ]
}
```
Response Fields:
| Field | Type | Description |
|---|---|---|
| job_id | string (UUID) | Job identifier |
| status | string | Final job status |
| completed_at | string (ISO 8601) | Completion timestamp |
| costs | object | Aggregated cost information |
| costs.total_calls | integer | Total number of LLM calls |
| costs.successful_calls | integer | Number of successful calls |
| costs.failed_calls | integer | Number of failed calls |
| costs.total_tokens | integer | Total tokens used across all calls |
| costs.total_cost_usd | number | Total cost in USD (for internal tracking) |
| costs.avg_latency_ms | integer | Average latency in milliseconds |
| costs.credit_applied | boolean | Whether a credit was deducted |
| costs.credits_remaining | integer | Credits remaining for the team |
| calls | array | List of all LLM calls made in this job |
Example Request:
```python
response = requests.post(
    f"{API_URL}/jobs/{job_id}/complete",
    headers=headers,
    json={
        "status": "completed",
        "metadata": {
            "result": "success",
            "output_file": "analysis_123.json"
        }
    }
)

result = response.json()
print(f"Total calls: {result['costs']['total_calls']}")
print(f"Total cost: ${result['costs']['total_cost_usd']:.4f}")
print(f"Credit deducted: {result['costs']['credit_applied']}")
print(f"Credits remaining: {result['costs']['credits_remaining']}")
```
```javascript
const response = await fetch(`${API_URL}/jobs/${jobId}/complete`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${VIRTUAL_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    status: 'completed',
    metadata: {
      result: 'success',
      output_file: 'analysis_123.json'
    }
  })
});

const result = await response.json();
console.log(`Total calls: ${result.costs.total_calls}`);
console.log(`Credit deducted: ${result.costs.credit_applied}`);
console.log(`Credits remaining: ${result.costs.credits_remaining}`);
```
Credit Deduction Rules:
Credits are ONLY deducted when ALL of these conditions are met:
- Job status is "completed" (not "failed")
- All LLM calls succeeded (no failed calls)
- Credit hasn't already been applied
Example: Completing Failed Job (No Credit Deduction):
```bash
curl -X POST http://localhost:8003/api/jobs/{job_id}/complete \
  -H "Authorization: Bearer sk-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "status": "failed",
    "error_message": "Document parsing failed",
    "metadata": {
      "error_type": "ParsingError"
    }
  }'
```
Error Responses:
| Status Code | Error | Description |
|---|---|---|
| 401 | Unauthorized | Invalid or missing virtual key |
| 403 | Forbidden | Job does not belong to your team |
| 404 | Not Found | Job not found |
| 422 | Validation Error | Invalid status value |
### Update Job Metadata
Append metadata to a job during execution. Useful for enriching job context as work progresses.
Endpoint: PATCH /api/jobs/{job_id}/metadata
Authentication: Required (virtual key)
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| job_id | string (UUID) | The job identifier |
Request Body:
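```json
{
  "metadata": {
    "conversation_turn": 3,
    "user_sentiment": "positive"
  }
}
```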
Request Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| metadata | object | Yes | Metadata to merge with existing job metadata |
Response (200 OK):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "metadata": {
    "document_id": "doc_123",
    "conversation_turn": 3,
    "user_sentiment": "positive",
    "tokens_so_far": 450
  },
  "updated_at": "2025-10-14T12:03:45.000Z"
}
```
Example Request:
```javascript
const response = await fetch(`${API_URL}/jobs/${jobId}/metadata`, {
  method: 'PATCH',
  headers: {
    'Authorization': `Bearer ${VIRTUAL_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    metadata: {
      conversation_turn: 3,
      user_sentiment: 'positive'
    }
  })
});

const result = await response.json();
console.log(`Updated metadata:`, result.metadata);
```
Use Cases:

- Chat Applications
    - Track conversation history
    - Record turn numbers
    - Store user sentiment or feedback
- Multi-Step Workflows
    - Record intermediate results
    - Track progress through pipeline
    - Store decision points
- Agent Workflows
    - Log tool calls and results
    - Track reasoning steps
    - Record agent state changes
Example: Chat Application Tracking
```python
# Create job for chat session
job_response = requests.post(
    f"{API_URL}/jobs/create",
    headers=headers,
    json={
        "team_id": "acme-corp",
        "job_type": "chat_session",
        "metadata": {
            "session_id": "sess_123",
            "user_id": "user_456"
        }
    }
)
job_id = job_response.json()["job_id"]

# User turn 1 (in a real app, messages would include prior conversation turns)
messages_turn_1 = [
    {"role": "user", "content": "How do I deploy to production?"}
]
llm_response_1 = requests.post(
    f"{API_URL}/jobs/{job_id}/llm-call",
    headers=headers,
    json={
        "model": "gpt-4",
        "messages": messages_turn_1,
        "call_metadata": {
            "turn": 1,
            "user_message": "How do I deploy to production?"
        }
    }
)

# Update job metadata after turn 1
requests.patch(
    f"{API_URL}/jobs/{job_id}/metadata",
    headers=headers,
    json={
        "metadata": {
            "turns_completed": 1,
            "last_topic": "deployment"
        }
    }
)

# User turn 2
messages_turn_2 = [
    {"role": "user", "content": "What about environment variables?"}
]
llm_response_2 = requests.post(
    f"{API_URL}/jobs/{job_id}/llm-call",
    headers=headers,
    json={
        "model": "gpt-4",
        "messages": messages_turn_2,
        "call_metadata": {
            "turn": 2,
            "user_message": "What about environment variables?"
        }
    }
)

# Update metadata after turn 2
requests.patch(
    f"{API_URL}/jobs/{job_id}/metadata",
    headers=headers,
    json={
        "metadata": {
            "turns_completed": 2,
            "last_topic": "environment_configuration"
        }
    }
)

# Complete the chat session
requests.post(
    f"{API_URL}/jobs/{job_id}/complete",
    headers=headers,
    json={
        "status": "completed",
        "metadata": {
            "satisfaction_rating": 5,
            "resolved": True
        }
    }
)
```
Error Responses:
| Status Code | Error | Description |
|---|---|---|
| 401 | Unauthorized | Invalid or missing virtual key |
| 403 | Forbidden | Job does not belong to your team |
| 404 | Not Found | Job not found |
| 422 | Validation Error | Invalid metadata format |
Notes:
- Metadata is merged with existing job metadata, not replaced (see the sketch after these notes)
- Use dot notation or nested objects to organize metadata
- Metadata persists through job lifecycle and is returned in job details
- Maximum metadata size: 10KB per job
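Because updates merge rather than replace, successive PATCH calls accumulate keys. A minimal sketch, reusing `API_URL`, `headers`, and `job_id` from the examples above; the assumption that a repeated key takes the newer value is illustrative:

```python
# First update: record early progress
requests.patch(
    f"{API_URL}/jobs/{job_id}/metadata",
    headers=headers,
    json={"metadata": {"step": "parse", "progress": 0.3}},
)

# Second update: "progress" is updated, "step" is preserved by the merge,
# and "pages_done" is added alongside the existing keys
response = requests.patch(
    f"{API_URL}/jobs/{job_id}/metadata",
    headers=headers,
    json={"metadata": {"progress": 0.6, "pages_done": 4}},
)
print(response.json()["metadata"])
# e.g. {"step": "parse", "progress": 0.6, "pages_done": 4, ...}
```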
### Single-Call Job (Create, Call, and Complete)
Create a job, make a single LLM call, and complete the job in one request. This is a convenience endpoint for simple workflows that only need one LLM call.
Endpoint: POST /api/jobs/create-and-call
Authentication: Required (virtual key)
Performance Benefits:

- ~66% latency reduction (1 API call vs 3)
- Single network round-trip
- Automatic job lifecycle management
Request Body:
```json
{
  "team_id": "acme-corp",
  "job_type": "chat_response",
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "What is Python?"
    }
  ],
  "user_id": "optional-user-id",
  "job_metadata": {
    "session_id": "sess_123"
  },
  "purpose": "chat",
  "temperature": 0.7,
  "max_tokens": 500
}
```
Request Fields:
| Field | Type | Required | Description |
|---|---|---|---|
| team_id | string | Yes | Team identifier (must match authenticated team) |
| job_type | string | Yes | Type of job (e.g., "chat_response", "text_generation") |
| model | string | Yes | Model alias or model group name |
| messages | array | Yes | OpenAI-compatible messages array |
| user_id | string | No | Optional user identifier for tracking |
| job_metadata | object | No | Custom metadata for the job |
| purpose | string | No | Optional label for the LLM call (e.g., "chat", "generation") |
| temperature | number | No | Sampling temperature (0.0-2.0, default: 0.7) |
| max_tokens | integer | No | Maximum tokens to generate |
| response_format | object | No | Structured output format (e.g., {"type": "json_object"}) |
| tools | array | No | Function calling tools |
| tool_choice | any | No | Tool choice strategy ("auto", "none", or specific tool) |
| top_p | number | No | Nucleus sampling parameter |
| frequency_penalty | number | No | Frequency penalty (-2.0 to 2.0) |
| presence_penalty | number | No | Presence penalty (-2.0 to 2.0) |
| stop | array | No | Stop sequences |
Response (200 OK):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "completed",
  "response": {
    "content": "Python is a high-level programming language...",
    "finish_reason": "stop"
  },
  "metadata": {
    "tokens_used": 256,
    "latency_ms": 1340,
    "model": "gpt-4"
  },
  "costs": {
    "total_calls": 1,
    "successful_calls": 1,
    "failed_calls": 0,
    "total_tokens": 256,
    "total_cost_usd": 0.0128,
    "avg_latency_ms": 1340,
    "credit_applied": true,
    "credits_remaining": 999
  },
  "completed_at": "2025-10-14T12:00:05.340Z"
}
```
Response Fields:
| Field | Type | Description |
|---|---|---|
| job_id | string (UUID) | Unique job identifier |
| status | string | Always "completed" for successful calls |
| response.content | string | The generated response content |
| response.finish_reason | string | Why generation stopped: "stop", "length", or "content_filter" |
| metadata.tokens_used | integer | Total tokens used (prompt + completion) |
| metadata.latency_ms | integer | Call latency in milliseconds |
| metadata.model | string | Model alias or group that was requested |
| costs | object | Aggregated cost information |
| costs.credit_applied | boolean | Whether a credit was deducted (always true for successful calls) |
| costs.credits_remaining | integer | Credits remaining for the team |
| completed_at | string (ISO 8601) | Job completion timestamp |
Example Request:
```bash
curl -X POST http://localhost:8003/api/jobs/create-and-call \
  -H "Authorization: Bearer sk-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "acme-corp",
    "job_type": "chat_response",
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "What is Python?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
```
```python
import requests

API_URL = "http://localhost:8003/api"
VIRTUAL_KEY = "sk-your-virtual-key"

headers = {
    "Authorization": f"Bearer {VIRTUAL_KEY}",
    "Content-Type": "application/json"
}

response = requests.post(
    f"{API_URL}/jobs/create-and-call",
    headers=headers,
    json={
        "team_id": "acme-corp",
        "job_type": "chat_response",
        "model": "gpt-4",
        "messages": [
            {"role": "user", "content": "What is Python?"}
        ],
        "temperature": 0.7,
        "max_tokens": 500
    }
)

result = response.json()
print(f"Response: {result['response']['content']}")
print(f"Tokens used: {result['metadata']['tokens_used']}")
print(f"Credits remaining: {result['costs']['credits_remaining']}")
```
```javascript
const API_URL = "http://localhost:8003/api";
const VIRTUAL_KEY = "sk-your-virtual-key";

const response = await fetch(`${API_URL}/jobs/create-and-call`, {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${VIRTUAL_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    team_id: 'acme-corp',
    job_type: 'chat_response',
    model: 'gpt-4',
    messages: [
      {role: 'user', content: 'What is Python?'}
    ],
    temperature: 0.7,
    max_tokens: 500
  })
});

const result = await response.json();
console.log(`Response: ${result.response.content}`);
console.log(`Tokens used: ${result.metadata.tokens_used}`);
console.log(`Credits remaining: ${result.costs.credits_remaining}`);
```
When to Use This Endpoint:
✅ Best for:

- Chat applications with single-turn responses
- Simple text generation tasks
- Quick question/answer scenarios
- Any workflow with exactly one LLM call

❌ Not ideal for:

- Complex workflows requiring multiple LLM calls
- Agentic workflows with branching logic
- Batch processing multiple documents
- Scenarios where you need to inspect intermediate results
Performance Comparison:
| Workflow Type | API Calls | Latency | Use Case |
|---|---|---|---|
| Single-Call | 1 | ~1.5s | Chat apps, simple tasks |
| Multi-Step | 3+ | ~4.5s+ | Complex workflows |
Error Handling:
If the LLM call fails, the job is automatically marked as failed (no credit deducted):
```python
messages = [{"role": "user", "content": "What is Python?"}]  # example payload

try:
    response = requests.post(
        f"{API_URL}/jobs/create-and-call",
        headers=headers,
        json={
            "team_id": "acme-corp",
            "job_type": "chat",
            "model": "gpt-4",
            "messages": messages
        },
        timeout=30
    )
    response.raise_for_status()
    result = response.json()
except requests.exceptions.HTTPError as e:
    if e.response.status_code == 403:
        print(f"Access error: {e.response.json()['detail']}")
    elif e.response.status_code == 500:
        print(f"LLM call failed: {e.response.json()['detail']}")
except requests.exceptions.Timeout:
    print("Request timed out")
```
Error Responses:
| Status Code | Error | Description |
|---|---|---|
| 401 | Unauthorized | Invalid or missing virtual key |
| 403 | Forbidden | Virtual key does not belong to team, or model access denied |
| 422 | Validation Error | Invalid request data |
| 500 | Internal Server Error | LLM call failed (job marked as failed, no credit charged) |
### Single-Call Job with Streaming (Create, Call, and Complete)
Create a job, stream LLM response in real-time via Server-Sent Events (SSE), and complete the job automatically. This is the recommended endpoint for chat applications and any scenario requiring real-time token streaming.
Endpoint: POST /api/jobs/create-and-call-stream
Authentication: Required (virtual key)
Performance Benefits:

- Real-time streaming responses (lowest time-to-first-token)
- Single API call (automatic job lifecycle)
- Server-Sent Events (SSE) protocol
- Automatic credit deduction on completion
Request Body:
Same as /api/jobs/create-and-call - see above for full parameter list.
```json
{
  "team_id": "acme-corp",
  "job_type": "chat_response",
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Tell me a short story"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 500
}
```
Response:
Server-Sent Events (SSE) stream with `Content-Type: text/event-stream`
SSE Event Format:
Each event follows the SSE format: `data: {JSON}\n\n`
```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":"Once"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":" upon"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4","choices":[{"index":0,"delta":{"content":" a"},"finish_reason":null}]}

data: [DONE]
```
Stream Events:
| Event | Description |
|---|---|
| Chunk with delta.role | First chunk, contains role ("assistant") |
| Chunk with delta.content | Text content chunk to append to response |
| Chunk with finish_reason | Final chunk; reason: "stop", "length", or "content_filter" |
| [DONE] | Stream complete (job auto-completed, credits deducted) |
Example Request:
```bash
curl -N -X POST http://localhost:8003/api/jobs/create-and-call-stream \
  -H "Authorization: Bearer sk-your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "acme-corp",
    "job_type": "chat_response",
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Tell me a short story"}
    ],
    "temperature": 0.7
  }'
```

Note: the `-N` flag disables curl's output buffering for real-time streaming.
```python
import requests
import json

API_URL = "http://localhost:8003/api"
VIRTUAL_KEY = "sk-your-virtual-key"

headers = {
    "Authorization": f"Bearer {VIRTUAL_KEY}",
    "Content-Type": "application/json"
}

response = requests.post(
    f"{API_URL}/jobs/create-and-call-stream",
    headers=headers,
    json={
        "team_id": "acme-corp",
        "job_type": "chat_response",
        "model": "gpt-4",
        "messages": [
            {"role": "user", "content": "Tell me a short story"}
        ],
        "temperature": 0.7
    },
    stream=True  # Important: enable streaming
)

# Process Server-Sent Events
accumulated = ""
for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data_str = line[6:]  # Remove 'data: ' prefix
            if data_str == '[DONE]':
                print("\n\nStream complete!")
                break
            try:
                chunk = json.loads(data_str)
                if chunk.get("choices"):
                    delta = chunk["choices"][0].get("delta", {})
                    content = delta.get("content", "")
                    if content:
                        accumulated += content
                        print(content, end="", flush=True)
            except json.JSONDecodeError:
                continue

print(f"\n\nFull response: {accumulated}")
```
```python
import httpx
import json
import asyncio

async def stream_chat():
    API_URL = "http://localhost:8003/api"
    VIRTUAL_KEY = "sk-your-virtual-key"

    headers = {
        "Authorization": f"Bearer {VIRTUAL_KEY}",
        "Content-Type": "application/json"
    }

    async with httpx.AsyncClient() as client:
        async with client.stream(
            "POST",
            f"{API_URL}/jobs/create-and-call-stream",
            headers=headers,
            json={
                "team_id": "acme-corp",
                "job_type": "chat",
                "model": "gpt-4",
                "messages": [
                    {"role": "user", "content": "Tell me a short story"}
                ]
            },
            timeout=60.0
        ) as response:
            accumulated = ""
            async for line in response.aiter_lines():
                if line.startswith('data: '):
                    data_str = line[6:]
                    if data_str == '[DONE]':
                        print("\n\nStream complete!")
                        break
                    try:
                        chunk = json.loads(data_str)
                        if chunk.get("choices"):
                            delta = chunk["choices"][0].get("delta", {})
                            content = delta.get("content", "")
                            if content:
                                accumulated += content
                                print(content, end="", flush=True)
                    except json.JSONDecodeError:
                        continue
            return accumulated

# Run the async function
response = asyncio.run(stream_chat())
print(f"\n\nFinal response: {response}")
```
```javascript
const API_URL = "http://localhost:8003/api";
const VIRTUAL_KEY = "sk-your-virtual-key";

async function streamChat() {
  const response = await fetch(`${API_URL}/jobs/create-and-call-stream`, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${VIRTUAL_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      team_id: 'acme-corp',
      job_type: 'chat_response',
      model: 'gpt-4',
      messages: [
        {role: 'user', content: 'Tell me a short story'}
      ],
      temperature: 0.7
    })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let accumulated = '';

  while (true) {
    const {done, value} = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data === '[DONE]') {
          console.log('\n\nStream complete!');
          return accumulated;
        }
        try {
          const parsed = JSON.parse(data);
          if (parsed.choices && parsed.choices[0].delta.content) {
            const content = parsed.choices[0].delta.content;
            accumulated += content;
            process.stdout.write(content); // Real-time output
          }
        } catch (e) {
          // Skip invalid JSON
        }
      }
    }
  }

  return accumulated;
}

// Use the function
streamChat().then(response => {
  console.log(`\n\nFull response: ${response}`);
});
```
When to Use This Endpoint:
✅ Best for:

- Chat applications (real-time user experience)
- Any scenario requiring streaming responses
- Single LLM call workflows
- Lowest time-to-first-token latency

❌ Not ideal for:

- Complex workflows with multiple LLM calls
- When you need non-streaming responses (use /api/jobs/create-and-call)
- Scenarios where streaming is not supported by the client
Streaming vs Non-Streaming Comparison:
| Endpoint | Response Type | Time to First Token | Use Case |
|---|---|---|---|
| /create-and-call-stream | SSE Stream | ~300ms | Chat apps, real-time UX |
| /create-and-call | JSON | ~1500ms | Simple responses, batch processing |
Error Handling:
Errors are sent as SSE events:
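For example, a mid-stream failure might arrive as an event carrying an `error` field (the exact message text varies by failure):

```
data: {"error": "LLM call failed: rate limit exceeded"}
```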
Example error handling in Python:
```python
for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data_str = line[6:]
            if data_str == '[DONE]':
                break
            try:
                chunk = json.loads(data_str)
                # Check for errors
                if "error" in chunk:
                    print(f"Error: {chunk['error']}")
                    break
                # Process normal chunk
                if chunk.get("choices"):
                    delta = chunk["choices"][0].get("delta", {})
                    content = delta.get("content", "")
                    if content:
                        print(content, end="", flush=True)
            except json.JSONDecodeError:
                continue
```
Automatic Job Completion:
When the stream completes:

1. Job is automatically marked as "completed"
2. Credits are deducted based on budget mode (see Teams API)
3. LLM call is stored in the database
4. Cost summary is calculated
Error Responses:
| Status Code | Error | Description |
|---|---|---|
| 401 | Unauthorized | Invalid or missing virtual key |
| 403 | Forbidden | Virtual key does not belong to team, or model access denied |
| 422 | Validation Error | Invalid request data |
| 500 | Internal Server Error | LLM streaming failed (job marked as failed, no credit charged) |
SSE Client Requirements:
- Set `stream=True` in the requests library (Python)
- Use `response.body.getReader()` in JavaScript fetch
- Handle line-by-line parsing (`\n\n` delimiters)
- Parse JSON after removing the `data: ` prefix
See Also:
- Streaming Guide - Detailed streaming documentation
- Non-Streaming Single-Call - JSON response version
- Multi-Step Streaming - For complex workflows
### Get Job Costs
Get detailed cost breakdown for a job (internal analytics).
Endpoint: GET /api/jobs/{job_id}/costs
Authentication: Required (virtual key)
Path Parameters:
| Parameter | Type | Description |
|---|---|---|
| job_id | string (UUID) | The job identifier |
Response (200 OK):
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "team_id": "acme-corp",
  "job_type": "resume_analysis",
  "status": "completed",
  "costs": {
    "total_cost_usd": 0.0045,
    "breakdown": [
      {
        "call_id": "call-uuid-1",
        "model": "gpt-4-turbo",
        "purpose": "parse",
        "prompt_tokens": 200,
        "completion_tokens": 250,
        "cost_usd": 0.0015,
        "created_at": "2025-10-14T12:00:10.000Z"
      },
      {
        "call_id": "call-uuid-2",
        "model": "gpt-4-turbo",
        "purpose": "analyze",
        "prompt_tokens": 220,
        "completion_tokens": 260,
        "cost_usd": 0.0016,
        "created_at": "2025-10-14T12:00:15.000Z"
      },
      {
        "call_id": "call-uuid-3",
        "model": "gpt-4-turbo",
        "purpose": "summarize",
        "prompt_tokens": 180,
        "completion_tokens": 240,
        "cost_usd": 0.0014,
        "created_at": "2025-10-14T12:00:20.000Z"
      }
    ]
  }
}
```
Example Request:
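```bash
curl http://localhost:8003/api/jobs/550e8400-e29b-41d4-a716-446655440000/costs \
  -H "Authorization: Bearer sk-your-virtual-key"
```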
Error Responses:
| Status Code | Error | Description |
|---|---|---|
| 401 | Unauthorized | Invalid or missing virtual key |
| 403 | Forbidden | Job does not belong to your team |
| 404 | Not Found | Job not found |
## Job Lifecycle
Jobs progress through the following states:
```mermaid
stateDiagram-v2
    [*] --> pending: Create Job
    pending --> in_progress: First LLM Call
    in_progress --> in_progress: Additional LLM Calls
    in_progress --> completed: Complete (status: completed)
    in_progress --> failed: Complete (status: failed)
    completed --> [*]: 1 credit deducted
    failed --> [*]: No credit deducted
```

### Job States
| State | Description | Credit Impact |
|---|---|---|
| pending | Job created, no LLM calls yet | None |
| in_progress | At least one LLM call made | None (until completion) |
| completed | Job successfully completed | 1 credit deducted |
| failed | Job failed or cancelled | No credit deducted |
## Complete Workflow Example
Here's a complete example showing the full job workflow:
```python
import requests

API_URL = "http://localhost:8003/api"
VIRTUAL_KEY = "sk-your-virtual-key"

headers = {
    "Authorization": f"Bearer {VIRTUAL_KEY}",
    "Content-Type": "application/json"
}

# 1. Create job
job_response = requests.post(
    f"{API_URL}/jobs/create",
    headers=headers,
    json={
        "team_id": "acme-corp",
        "user_id": "john@acme.com",
        "job_type": "document_analysis",
        "metadata": {
            "document_id": "doc_123",
            "document_name": "report.pdf"
        }
    }
)
job = job_response.json()
job_id = job["job_id"]
print(f"Created job: {job_id}")

# 2. Make LLM calls (see LLM Calls API documentation)
# ...

# 3. Complete job
complete_response = requests.post(
    f"{API_URL}/jobs/{job_id}/complete",
    headers=headers,
    json={
        "status": "completed",
        "metadata": {
            "result": "success"
        }
    }
)
result = complete_response.json()
print("Job completed!")
print(f"Total calls: {result['costs']['total_calls']}")
print(f"Total cost: ${result['costs']['total_cost_usd']:.4f}")
print(f"Credit deducted: {result['costs']['credit_applied']}")
print(f"Credits remaining: {result['costs']['credits_remaining']}")
```
## Rate Limiting
Jobs API endpoints are subject to rate limiting per team:
- Requests per minute (RPM): Configurable per team
- Default: 100 RPM
When rate limited, you'll receive a 429 Too Many Requests response. Implement exponential backoff for retries.
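A minimal backoff sketch using the `requests` setup from the earlier examples; the retry count and delays are illustrative, not prescribed by the API:

```python
import time
import requests

def post_with_backoff(url, payload, headers, max_retries=5):
    """POST with exponential backoff on 429 Too Many Requests."""
    delay = 1.0
    for _ in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        time.sleep(delay)  # Rate limited: wait before retrying
        delay *= 2         # Double the delay each attempt
    raise RuntimeError("Rate limited: retries exhausted")
```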
## Best Practices
- Always complete jobs - Even if failed, complete the job to prevent "zombie jobs" (a pattern is sketched after this list)
- Use meaningful job types - Use descriptive types like "resume_parsing" not "job1"
- Add contextual metadata - Include relevant context for analytics and debugging
- Handle errors gracefully - Complete failed jobs with status "failed"
- Monitor credit balance - Check credits before creating jobs
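One way to honor the first and fourth practices is to complete the job from an exception handler, so it is never left pending. A sketch reusing `API_URL` and `headers` from the workflow example above:

```python
job_id = requests.post(
    f"{API_URL}/jobs/create",
    headers=headers,
    json={"team_id": "acme-corp", "job_type": "document_analysis"},
).json()["job_id"]

try:
    # ... make the LLM calls for this job ...
    final = {"status": "completed", "metadata": {"result": "success"}}
except Exception as exc:
    # Complete with status "failed" so no credit is charged
    # and the job is never left as a zombie
    final = {"status": "failed", "error_message": str(exc)}

requests.post(f"{API_URL}/jobs/{job_id}/complete", headers=headers, json=final)
```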
## See Also
- LLM Calls API - Make LLM calls within jobs
- Job Workflow Guide - Detailed workflow documentation
- Teams API - Manage teams and credits
- Authentication Guide - API authentication details