The rLLM SDK is a lightweight toolkit for automatic LLM trace collection using session contexts and trajectory decorators. It lets you track, manage, and analyze LLM calls in anything from simple functions to complex multi-agent workflows.

Core Concepts

Sessions

Sessions track all LLM calls within a context for debugging and analysis. They automatically capture traces and metadata, and provide access to the collected data.
from rllm.sdk import session, get_chat_client

llm = get_chat_client(api_key="sk-...")

# Create a session to track all LLM calls
with session(experiment="v1") as sess:
    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}]
    )
    # Access all traces from this session
    print(f"Collected {len(sess.llm_calls)} traces")

Trajectories

Trajectories represent multi-step workflows where each LLM call becomes a step with assignable rewards. Use the @trajectory decorator to automatically convert function execution into structured trajectories.
from rllm.sdk import trajectory, get_chat_client_async

llm = get_chat_client_async(api_key="sk-...")

@trajectory(name="solver")
async def solve_math_problem(problem: str):
    # Each LLM call automatically becomes a step
    response1 = await llm.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": f"Solve: {problem}"}]
    )
    response2 = await llm.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Is this correct?"}]
    )
    return response2.choices[0].message.content

# Returns TrajectoryView instead of string
traj = await solve_math_problem("What is 2+2?")
print(f"Steps: {len(traj.steps)}")  # 2
traj.steps[0].reward = 1.0  # Set rewards on each step
traj.reward = sum(s.reward for s in traj.steps)

Installation

The SDK is included in the rllm package:
pip install rllm
For OpenTelemetry support (distributed tracing):
pip install rllm[otel]

Quick Start

Basic Usage

from rllm.sdk import session, get_chat_client

# Initialize chat client
llm = get_chat_client(api_key="sk-...")

# Track LLM calls in a session
with session(experiment="v1", task="greeting"):
    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Say hello"}]
    )
    print(response.choices[0].message.content)

Nested Sessions with Metadata Inheritance

Sessions can be nested, and metadata is automatically merged:
with session(experiment="v1"):
    with session(task="math"):
        # All traces get: {experiment: "v1", task: "math"}
        llm.chat.completions.create(...)

Architecture

rllm/sdk/
├── __init__.py              # Public exports
├── protocol.py              # Data models (Trace, StepView, TrajectoryView)
├── decorators.py            # @trajectory decorator
├── shortcuts.py             # session(), get_chat_client()
├── session/
│   ├── contextvar.py        # ContextVarSession (default backend)
│   ├── opentelemetry.py     # OpenTelemetrySession (W3C baggage-based)
│   ├── session_buffer.py    # SessionBuffer (ephemeral trace storage)
│   └── base.py              # SessionProtocol, wrap_with_session_context()
├── chat/
│   └── openai.py            # Tracked OpenAI chat clients
├── proxy/
│   ├── litellm_callbacks.py # TracingCallback, SamplingParametersCallback
│   ├── metadata_slug.py     # URL metadata encoding/decoding
│   └── middleware.py        # MetadataRoutingMiddleware (ASGI)
└── tracers/
    ├── memory.py            # InMemorySessionTracer
    └── sqlite.py            # SqliteTracer

Data Models

The SDK uses three primary data models:

Trace

Low-level trace from a single LLM call.
class Trace(BaseModel):
    trace_id: str
    session_name: str
    name: str
    input: LLMInput
    output: LLMOutput
    model: str
    latency_ms: float
    tokens: dict[str, int]
    metadata: dict = Field(default_factory=dict)
    timestamp: float
    parent_trace_id: str | None = None
    cost: float | None = None
    environment: str | None = None

StepView

Trace wrapper with a reward field for RL training.
class StepView(BaseModel):
    id: str                      # Trace ID
    input: Any | None = None     # LLM input
    output: Any | None = None    # LLM output
    action: Any | None = None    # Parsed action
    reward: float = 0.0          # Step reward
    metadata: dict | None = None

TrajectoryView

Collection of steps forming a complete workflow.
class TrajectoryView(BaseModel):
    name: str = "agent"
    steps: list[StepView] = Field(default_factory=list)
    reward: float = 0.0
    input: dict | None = None    # Function arguments
    output: Any = None           # Function return value
    metadata: dict | None = None
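To show how the two views fit together, here is a sketch of the step-to-trajectory reward flow using dataclass stand-ins for the Pydantic models above (field subsets only, for illustration):

```python
from dataclasses import dataclass, field
from typing import Any

# Dataclass stand-ins mirroring a subset of StepView / TrajectoryView fields.
@dataclass
class Step:
    id: str
    output: Any = None
    reward: float = 0.0

@dataclass
class Trajectory:
    name: str = "agent"
    steps: list = field(default_factory=list)
    reward: float = 0.0

# One step per LLM call; step rewards are assigned individually,
# then aggregated into the trajectory-level reward.
traj = Trajectory(name="solver", steps=[Step(id="t1"), Step(id="t2")])
traj.steps[0].reward = 1.0
traj.reward = sum(s.reward for s in traj.steps)
print(traj.reward)  # 1.0
```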

Core Functions

Session Management

from rllm.sdk import (
    session,
    get_current_session,
    get_current_session_name,
    get_current_metadata,
    get_active_session_uids,
)

# Create session with auto-generated name
session(**metadata) -> SessionContext

# Get current session (ContextVar backend only)
get_current_session() -> ContextVarSession | None

# Get session name (works with all backends)
get_current_session_name() -> str | None

# Get current metadata
get_current_metadata() -> dict

# Get active session UID chain
get_active_session_uids() -> list[str]

Chat Clients

from rllm.sdk import get_chat_client, get_chat_client_async

# Synchronous client
get_chat_client(
    provider="openai",
    use_proxy=True,
    api_key="sk-...",
    base_url="https://api.openai.com/v1",
) -> ProxyTrackedChatClient

# Async client
get_chat_client_async(
    provider="openai",
    use_proxy=True,
    **kwargs
) -> ProxyTrackedAsyncChatClient

Trajectory Decorator

from rllm.sdk import trajectory

@trajectory(name: str = "agent", **metadata)
def workflow_function(...):
    # Function body with LLM calls
    pass
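Conceptually, the decorator wraps the function, records each LLM call made during execution as a step, and returns a trajectory instead of the raw return value. A toy sketch of that pattern (not the actual rLLM implementation; `toy_trajectory` and `fake_llm_call` are hypothetical placeholders):

```python
import functools

# Module-level step buffer, opened by the decorator (illustration only).
_current_steps = None

def toy_trajectory(name="agent", **metadata):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            global _current_steps
            _current_steps = []  # open a fresh step buffer for this run
            output = fn(*args, **kwargs)
            traj = {"name": name, "metadata": metadata,
                    "steps": _current_steps, "output": output}
            _current_steps = None
            return traj  # trajectory instead of the raw return value
        return wrapper
    return decorator

def fake_llm_call(prompt):
    # Placeholder: a tracked client would record the call as a step here.
    _current_steps.append({"input": prompt, "reward": 0.0})
    return f"answer to {prompt!r}"

@toy_trajectory(name="solver", experiment="v1")
def solve(problem):
    fake_llm_call(f"Solve: {problem}")
    return fake_llm_call("Is this correct?")

traj = solve("2+2")
print(len(traj["steps"]))  # 2
```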

Design Principles

  1. Minimal API surface: Simple, focused functions
  2. Context-based: Uses Python’s contextvars for automatic propagation
  3. Distributed-ready: OpenTelemetry backend for cross-process tracing
  4. Pluggable storage: Supports in-memory, SQLite, or custom backends
  5. Type-safe: Full type annotations with Pydantic models
  6. Async-native: First-class async/await support
  7. Proxy-integrated: Built-in support for LiteLLM proxy routing

Next Steps