The rLLM SDK integrates with popular agent frameworks and infrastructure tools. This guide shows how to use the SDK with LangGraph, SmolAgent, Strands, the LiteLLM proxy, and OpenTelemetry.
LangGraph Integration
Deprecated for training use. The patterns below use the SDK's traced chat clients (get_chat_client / get_chat_client_async) to capture LangChain LLM calls in-process. For training under AgentTrainer, the rLLM model gateway captures token-accurate traces automatically via URL-routed sessions; pointing ChatOpenAI at config.base_url is sufficient. See the langgraph_math cookbook for the AgentFlow-protocol replacement pattern.
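For the training path described in the note above, no traced client is needed at all. A minimal sketch, where the config object and its base_url attribute follow the note and are assumptions rather than a verbatim API:
# Training-time sketch (assumption): under AgentTrainer the workflow receives a
# config whose base_url points at the rLLM model gateway. Routing ChatOpenAI
# through that URL is enough; no traced client or proxy flag is required.
from langchain_openai import ChatOpenAI

def build_model(config):
    # config.base_url is the gateway URL supplied by the trainer (see note above)
    return ChatOpenAI(model="gpt-4", base_url=config.base_url, api_key="EMPTY")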
LangGraph is a library for building stateful, multi-actor applications with LLMs. The rLLM SDK tracks all LLM calls made through LangGraph agents.
Basic Setup
import os
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode, tools_condition
from rllm.sdk import get_chat_client, get_chat_client_async, session
from rllm.sdk.session.base import _ensure_tracer_initialized

# Initialize rLLM SDK chat clients
sync_client = get_chat_client(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="http://localhost:4000/v1",  # LiteLLM proxy
    use_proxy=True,
)
async_client = get_chat_client_async(
    api_key=os.getenv("OPENAI_API_KEY"),
    base_url="http://localhost:4000/v1",
    use_proxy=True,
)

# Configure tracer (optional, for distributed tracing)
_ensure_tracer_initialized("search_agent")

# Pass both clients to ChatOpenAI
response_model = ChatOpenAI(
    model="gpt-4",
    temperature=0.7,
    max_tokens=2048,
    client=sync_client,         # For sync operations
    async_client=async_client,  # For async operations
)
Complete LangGraph Agent Example
# retriever_tool is assumed to be defined elsewhere (see the source file below)
async def agent_step(state: MessagesState):
    """Agent decides whether to call tools or provide final answer."""
    response = await response_model.bind_tools([retriever_tool]).ainvoke(state["messages"])
    return {"messages": [response]}

# Build the workflow
workflow = StateGraph(MessagesState)
workflow.add_node("agent", agent_step)
workflow.add_node("tools", ToolNode([retriever_tool]))
workflow.add_edge(START, "agent")
workflow.add_conditional_edges(
    "agent",
    tools_condition,
    {
        "tools": "tools",
        END: END,
    },
)
workflow.add_edge("tools", "agent")
graph = workflow.compile()

# Run with session tracking
async def run_agent(question: str):
    with session(experiment="v1", task="qa") as sess:
        async for chunk in graph.astream(
            {"messages": [{"role": "user", "content": question}]},
            {"recursion_limit": 20},
        ):
            # Process chunks
            pass
        # Access collected traces
        print(f"Collected {len(sess.llm_calls)} traces")
        return sess.llm_calls

await run_agent("What is the capital of France?")
Key Points:
Pass both client and async_client to ChatOpenAI
Use get_chat_client() and get_chat_client_async() from rLLM SDK
Wrap execution in session() context to collect traces
All LLM calls are automatically tracked
Source: examples/sdk/langgraph/search_agent_langgraph.py
SmolAgent Integration
SmolAgent is HuggingFace’s lightweight agent framework. The rLLM SDK can track SmolAgent executions.
Basic Setup
from rllm.sdk import session, get_chat_client
from smolagents import CodeAgent, HfApiModel

# Initialize rLLM SDK client
llm = get_chat_client(
    api_key="hf_...",
    base_url="https://api-inference.huggingface.co/v1",
)

# Create SmolAgent with tracked client
agent = CodeAgent(
    tools=[],
    model=HfApiModel(client=llm),
)

# Track agent execution
with session(experiment="smol_v1", agent="code") as sess:
    result = agent.run("Calculate 2+2")

print(f"Collected {len(sess.llm_calls)} traces")
Advanced SmolAgent Usage
from rllm.sdk import trajectory

# search_tool, calculator_tool, and evaluate_step are assumed to be defined elsewhere
@trajectory(name="smol_agent")
async def run_smol_agent(task: str):
    agent = CodeAgent(
        tools=[search_tool, calculator_tool],
        model=HfApiModel(client=llm),
    )
    result = agent.run(task)
    return result

# Returns TrajectoryView with all agent steps
traj = await run_smol_agent("Search for Python tutorials")
print(f"Steps: {len(traj.steps)}")
print(f"Result: {traj.output}")

# Assign rewards
for step in traj.steps:
    step.reward = evaluate_step(step)
traj.reward = sum(s.reward for s in traj.steps)
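The reward loop above assumes an evaluate_step helper. A toy stand-in, purely illustrative (any function that maps a step to a float works):
def evaluate_step(step) -> float:
    """Toy step-level reward: 1.0 if the step produced non-empty output, else 0.0.

    Replace with a task-specific scorer. The step.output attribute mirrors the
    usage in the end-to-end example later in this guide and is an assumption here.
    """
    return 1.0 if getattr(step, "output", None) else 0.0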
Strands Integration
Strands is a framework for building production-grade AI agents. The rLLM SDK integrates via tracked chat clients.
Basic Setup
from rllm.sdk import session, get_chat_client_async
from strands import Agent, Tool

# Initialize rLLM SDK client
llm = get_chat_client_async(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",
)

# Create Strands agent with tracked client (search_tool and summarize_tool defined elsewhere)
agent = Agent(
    name="research_assistant",
    model=llm,
    tools=[search_tool, summarize_tool],
)

# Track execution
with session(experiment="strands_v1") as sess:
    result = await agent.run("Research quantum computing")

print(f"Collected {len(sess.llm_calls)} traces")
Multi-Agent Strands Workflow
from rllm.sdk import trajectory

@trajectory(name="multi_agent_workflow")
async def run_multi_agent(task: str):
    # Create multiple agents
    researcher = Agent(name="researcher", model=llm, tools=[search_tool])
    writer = Agent(name="writer", model=llm, tools=[summarize_tool])
    # Research phase
    research_results = await researcher.run(f"Research: {task}")
    # Writing phase
    final_output = await writer.run(f"Write report: {research_results}")
    return final_output

# Track entire workflow
traj = await run_multi_agent("AI safety research")
print(f"Total steps: {len(traj.steps)}")
print(f"Final output: {traj.output}")
LiteLLM Proxy Integration
The rLLM SDK includes deep integration with LiteLLM proxy for metadata routing and trace collection.
Proxy Architecture
The SDK uses metadata slug encoding to route session context through the proxy:
Client (SDK) → Metadata Slug → LiteLLM Proxy → LLM Provider
                    ↓
             Session Context
   (session_name, experiment, etc.)
The SDK encodes metadata into the URL path for proxy routing:
from rllm.sdk.proxy import (
    encode_metadata_slug,
    decode_metadata_slug,
    build_proxied_base_url,
)

# Encode metadata into URL
metadata = {
    "session_name": "sess_123",
    "experiment": "v1",
    "task": "math",
}
slug = encode_metadata_slug(metadata)
print(slug)  # "rllm1:eyJleHBlcmltZW50IjoidjEiLCAic2Vzc2lvbl9uYW1lIjoic2Vzc18xMjMiLCAidGFzayI6Im1hdGgifQ"

# Build proxied base URL
proxied_url = build_proxied_base_url(
    "http://localhost:4000/v1",
    metadata,
)
print(proxied_url)
# "http://localhost:4000/meta/rllm1:eyJleHBlcmltZW50IjoidjEiLCAic2Vzc2lvbl9uYW1lIjoic2Vzc18xMjMiLCAidGFzayI6Im1hdGgifQ/v1"

# Decode metadata from slug
decoded = decode_metadata_slug(slug)
print(decoded)  # {"session_name": "sess_123", "experiment": "v1", "task": "math"}
The SDK automatically routes metadata when use_proxy=True:
from rllm.sdk import session, get_chat_client

llm = get_chat_client(
    api_key="EMPTY",
    base_url="http://localhost:4000/v1",
    use_proxy=True,  # Enable automatic metadata routing
)

with session(experiment="v1", task="math"):
    # Metadata automatically encoded into URL:
    # http://localhost:4000/meta/rllm1:eyJleHBlcmltZW50IjoidjEiLCAidGFzayI6Im1hdGgifQ/v1/chat/completions
    response = llm.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
Proxy Middleware
Add the MetadataRoutingMiddleware to your ASGI app to decode metadata slugs:
from fastapi import FastAPI, Request
from rllm.sdk.proxy import MetadataRoutingMiddleware

app = FastAPI()

@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    # Metadata automatically extracted from URL
    # and available in request scope
    pass

# Wrap the FastAPI app last, after routes are registered, and serve the
# wrapped ASGI app (e.g. with uvicorn) so the middleware sees the /meta/<slug>/ prefix
asgi_app = MetadataRoutingMiddleware(app)
The middleware (a conceptual sketch follows this list):
Extracts metadata slug from request path
Decodes metadata and injects into request scope
Cleans the path before forwarding to LiteLLM
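For intuition, here is a minimal ASGI middleware that performs the same kind of path rewrite. This is an illustrative sketch, not the actual MetadataRoutingMiddleware implementation; it only assumes the /meta/<slug>/ URL shape and the decode_metadata_slug helper shown earlier, and the rllm_metadata scope key is hypothetical.
from rllm.sdk.proxy import decode_metadata_slug

class PathMetadataMiddleware:
    """Illustrative ASGI middleware: pull an rLLM metadata slug out of the path."""

    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and scope["path"].startswith("/meta/"):
            # /meta/<slug>/v1/chat/completions -> /v1/chat/completions
            _, _, rest = scope["path"].partition("/meta/")
            slug, _, clean_path = rest.partition("/")
            scope = dict(scope)
            scope["path"] = "/" + clean_path
            scope["rllm_metadata"] = decode_metadata_slug(slug)  # hypothetical scope key
        await self.app(scope, receive, send)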
LiteLLM Callbacks
The SDK provides callbacks for LiteLLM proxy integration:
from rllm.sdk.proxy import TracingCallback, SamplingParametersCallback
# TracingCallback: Collect traces from LLM calls
tracing_callback = TracingCallback()
# SamplingParametersCallback: Handle sampling parameters
sampling_callback = SamplingParametersCallback()
# Use with LiteLLM
callbacks = [tracing_callback, sampling_callback]
TracingCallback: Automatically collects traces and stores them based on session context.
SamplingParametersCallback: Handles custom sampling parameters (temperature, top_p, etc.).
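How these callbacks get registered depends on how you run LiteLLM. The bundled litellm_server shown below presumably wires them in for you; if you drive LiteLLM directly as a library, one hedged option (assuming the callbacks follow litellm's CustomLogger interface) is to attach them to litellm's standard callback list:
import litellm

# Assumption: TracingCallback / SamplingParametersCallback implement litellm's
# CustomLogger hooks, so they can be registered on the global callback list.
litellm.callbacks = [tracing_callback, sampling_callback]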
Starting the Proxy
Start the LiteLLM proxy with rLLM configuration:
#!/bin/bash
# Set ulimit and connection limits
ulimit -n 65536
export AIOHTTP_CONNECTOR_LIMIT=4096
export AIOHTTP_KEEPALIVE_TIMEOUT=60

# Start LiteLLM proxy
python -m rllm.sdk.proxy.litellm_server \
    --config litellm_proxy_config.yaml \
    --host 127.0.0.1 \
    --port 4000 \
    --state-dir /tmp/litellm_proxy \
    --cs-endpoint http://localhost:8000 \
    --cs-api-key "your-api-key" \
    --project rllm-agent-sdk \
    --admin-token my-shared-secret
Proxy Configuration (litellm_proxy_config.yaml):
model_list:
  - model_name: gpt-4
    litellm_params:
      model: openai/gpt-4
      api_key: sk-...
  - model_name: claude-3-opus
    litellm_params:
      model: anthropic/claude-3-opus-20240229
      api_key: sk-ant-...

general_settings:
  master_key: my-shared-secret
  database_url: sqlite:///litellm_proxy.db
Proxy Manager (Subprocess Mode)
The SDK can automatically manage the proxy lifecycle:
from rllm.sdk import get_chat_client, session
from rllm.sdk.proxy import ProxyManager

# Automatic proxy management
with ProxyManager(
    config_path="litellm_proxy_config.yaml",
    host="127.0.0.1",
    port=4000,
) as proxy:
    # Proxy automatically started
    llm = get_chat_client(
        base_url="http://localhost:4000/v1",
        use_proxy=True,
    )
    with session(experiment="v1"):
        llm.chat.completions.create(...)
# Proxy automatically stopped on exit
This is useful for training scripts that need temporary proxy instances.
Source: examples/sdk/README.md
OpenTelemetry Integration
For distributed tracing across multiple services, use the OpenTelemetry backend:
Configuration
Edit rllm/sdk/config.yaml:
session_backend: "opentelemetry"
Install dependencies:
Basic Usage
import httpx

from rllm.sdk.session import otel_session, configure_default_tracer
from rllm.sdk import get_chat_client_async

# Configure tracer once per process
configure_default_tracer(service_name="my-agent")

llm = get_chat_client_async(
    api_key="sk-...",
    base_url="https://api.openai.com/v1",
)

# Client service
with otel_session(name="client") as client_sess:
    # Make HTTP call to server
    # Session context automatically propagated via W3C baggage headers
    httpx.post("http://server/api", ...)

# Server service
with otel_session(name="handler") as server_sess:
    # Automatically inherits client's session context
    await llm.chat.completions.create(...)

    # Access traces (async)
    traces = await server_sess.llm_calls_async()
Cross-Service Context Propagation
OpenTelemetry sessions use W3C baggage for automatic context propagation:
# Service A (client)
with otel_session(experiment="v1", user="alice"):
    # HTTP request automatically includes baggage headers:
    # baggage: rllm-session={"metadata":{"experiment":"v1","user":"alice"},...}
    httpx.post("http://service-b/api", ...)

# Service B (server)
with otel_session(task="processing"):
    # Automatically inherits experiment="v1", user="alice"
    # Merges with task="processing"
    metadata = get_current_metadata()
    print(metadata)  # {"experiment": "v1", "user": "alice", "task": "processing"}
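To see the header mechanics directly with stock OpenTelemetry APIs, independent of the SDK's helpers, you can inject the current context into a carrier dict. The rllm-session key and JSON value below mirror the example header above and are shown for illustration only:
from opentelemetry import baggage, context, propagate

# Attach baggage to the current context (a stand-in for what otel_session does)
ctx = baggage.set_baggage("rllm-session", '{"metadata": {"experiment": "v1"}}')
token = context.attach(ctx)
try:
    headers = {}
    propagate.inject(headers)      # adds the W3C `baggage` header (plus `traceparent` when a span is active)
    print(headers.get("baggage"))  # e.g. rllm-session=%7B%22metadata%22%3A...
finally:
    context.detach(token)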
Key Features:
Baggage is the single source of truth
Automatic HTTP header propagation
Compatible with OpenTelemetry observability tools
Works across process boundaries
Framework Comparison
| Framework | Integration Method | Session Support | Trajectory Support | Use Case |
| --- | --- | --- | --- | --- |
| LangGraph | Tracked chat client | ✅ | ✅ | Multi-agent workflows |
| SmolAgent | Tracked chat client | ✅ | ✅ | Code generation agents |
| Strands | Tracked chat client | ✅ | ✅ | Production AI apps |
| LiteLLM Proxy | Metadata routing | ✅ | ✅ | Infrastructure layer |
| OpenTelemetry | W3C baggage | ✅ | ✅ | Distributed tracing |
Best Practices
Use tracked clients: Always initialize LLM clients with get_chat_client() or get_chat_client_async()
Enable proxy for training: Set use_proxy=True when routing through the LiteLLM proxy
Wrap execution in sessions: Use the session() context manager to collect traces
Leverage trajectories for RL: Use the @trajectory decorator for reinforcement learning workflows
Configure OpenTelemetry for distributed systems: Use otel_session for multi-service architectures
Set metadata strategically: Add experiment-tracking metadata at the session level (see the sketch after this list)
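A small sketch of the last point, following the session() keyword pattern used throughout this guide; the extra keys (user, dataset) and the run_evaluation() workload are illustrative, not a fixed schema:
from rllm.sdk import session

# Keyword arguments to session() become metadata attached to the traces
# collected inside the block (pattern follows the examples above).
with session(experiment="v2", task="qa", user="alice", dataset="hotpotqa") as sess:
    run_evaluation()  # hypothetical workload that makes traced LLM calls
print(f"Collected {len(sess.llm_calls)} traces tagged with experiment=v2")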
Example: End-to-End LangGraph Training
Complete example combining LangGraph, LiteLLM proxy, and trajectory tracking:
import asyncio

from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, MessagesState, StateGraph
from rllm.sdk import get_chat_client_async, trajectory
from rllm.sdk.session.base import _ensure_tracer_initialized

# Initialize
_ensure_tracer_initialized("rag_agent")
llm = get_chat_client_async(
    base_url="http://localhost:4000/v1",
    api_key="EMPTY",
    use_proxy=True,
)
model = ChatOpenAI(model="gpt-4", async_client=llm)

# Build LangGraph workflow (async node so the model call is properly awaited)
async def agent_node(state: MessagesState):
    return {"messages": [await model.ainvoke(state["messages"])]}

workflow = StateGraph(MessagesState)
workflow.add_node("agent", agent_node)
workflow.add_edge(START, "agent")
workflow.add_edge("agent", END)
graph = workflow.compile()

# Trajectory-tracked execution
@trajectory(name="rag_agent")
async def run_rag_agent(question: str):
    result = await graph.ainvoke(
        {"messages": [{"role": "user", "content": question}]}
    )
    return result["messages"][-1].content

# Execute and collect training data
async def main():
    questions = [
        "What is quantum computing?",
        "Explain neural networks",
        "What is reinforcement learning?",
    ]
    trajectories = []
    for q in questions:
        traj = await run_rag_agent(q)
        # Assign rewards (evaluate_quality is assumed to be defined elsewhere)
        for step in traj.steps:
            step.reward = evaluate_quality(step.output)
        traj.reward = sum(s.reward for s in traj.steps) / len(traj.steps)
        trajectories.append(traj)
    # Use trajectories for training
    return trajectories

trajectories = asyncio.run(main())
print(f"Collected {len(trajectories)} trajectories for training")
Next Steps
Training: Use collected trajectories for RL training
API Reference: Explore the complete API documentation