
A multi-turn math agent authored with LangGraph’s create_react_agent, trained end-to-end with rLLM. Demonstrates that any LangGraph agent integrates with rLLM training without writing a callback handler — pointing LangChain’s ChatOpenAI at config.base_url is enough, because the rLLM model gateway captures every LLM call. This cookbook is the AgentFlow-protocol counterpart of the legacy rllm.sdk.integrations.langgraph callback-handler approach (now deprecated). It is intentionally a near-clone of cookbooks/math_tool_agent so you can compare a hand-rolled tool loop against a LangGraph one on the same dataset.

Pattern

| Aspect | Value |
| --- | --- |
| Loop shape | Multi-turn (LangGraph `recursion_limit=25`) |
| Tools | One: `calculate` — an AST-based safe arithmetic interpreter |
| Termination | LangGraph stops when the LLM emits no further tool calls |
| Reward shape | 1.0 if the final answer matches ground truth (mathd + sympy), else 0.0 |
| Return type | `None` — the gateway captures everything; the framework auto-builds the `Episode` |
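The `calculate` tool itself is not reproduced on this page. As a rough sketch, an AST-based safe arithmetic interpreter along these lines would fit the description in the table; the cookbook's actual implementation may differ in supported operators and error handling:

```python
import ast
import operator

# Hypothetical sketch of an AST-based safe calculator tool. Only a whitelist
# of arithmetic operators is evaluated; anything else raises.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expression: str) -> str:
    """Safely evaluate an arithmetic expression like '3 * (4 + 5)'."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")
    return str(_eval(ast.parse(expression, mode="eval").body))
```

Because the interpreter walks the AST rather than calling `eval`, the model cannot smuggle arbitrary Python into a tool call.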

Why so little code?

```python
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

# `rllm`, `Task`, `AgentConfig`, `calculate`, and SYSTEM_PROMPT are
# provided by the cookbook module.

@rllm.rollout(name="langgraph-math")
async def langgraph_math(task: Task, config: AgentConfig) -> None:
    llm = ChatOpenAI(
        model=config.model,
        base_url=config.base_url,
        api_key="EMPTY",
        temperature=1.0,
    )
    agent = create_react_agent(llm, tools=[calculate], prompt=SYSTEM_PROMPT)
    await agent.ainvoke({"messages": [("user", task.instruction)]})
    return None
```
That’s the whole agent. No callback handler, no message format conversion, no manual Step / Trajectory construction. The mechanism:
  • LangChain’s ChatOpenAI(base_url=…) issues OpenAI Chat Completions requests against the gateway session URL the trainer provides.
  • The gateway middleware extracts the session id from the URL path (/sessions/{sid}/v1/...) and persists every request/response as a TraceRecord keyed by that session.
  • The flow returns None. The framework’s coercion (rllm.types._coerce_to_episode) builds an empty single-trajectory Episode. During enrichment the gateway’s traces become the trajectory’s Steps, populated with prompt/response token IDs and per-token logprobs ready for training.
  • The evaluator reads the agent’s final assistant message from episode.trajectories[-1].steps[-1].model_response and grades it against ground truth.
Because the trajectory’s name is "langgraph-math" (set on @rllm.rollout), all rollouts of the same task hash to the same f"{task_id}:langgraph-math" key when the trainer builds TrajectoryGroups for GRPO advantage computation.
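The grouping step can be sketched like this. The helper names are illustrative; rLLM's actual `TrajectoryGroup` construction lives in the trainer:

```python
from collections import defaultdict

# Hedged sketch of grouping rollouts by the f"{task_id}:{rollout_name}" key
# for GRPO advantage computation, as described above.
def group_key(task_id: str, rollout_name: str) -> str:
    return f"{task_id}:{rollout_name}"

def build_groups(rollouts):
    """rollouts: iterable of (task_id, rollout_name, reward) tuples."""
    groups = defaultdict(list)
    for task_id, name, reward in rollouts:
        groups[group_key(task_id, name)].append(reward)
    return groups

def grpo_advantages(rewards):
    """Mean-centered rewards within one group (simplified GRPO baseline)."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]
```

Rollouts of the same task under the same rollout name land in one group, so their rewards are baselined against each other.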

Install

```shell
uv pip install -e ".[tinker]"                          # rllm + tinker backend
uv pip install --no-deps -e cookbooks/langgraph_math   # this cookbook + LangGraph deps
rllm agent list                                        # should show "langgraph_math"
```

Datasets

Same datasets as math_tool_agent so you can compare learning curves between a hand-rolled tool loop and a LangGraph one:
```shell
rllm dataset pull deepscaler_math   # ~40K AIME/AMC/Omni-MATH/STILL competition math (train)
rllm dataset pull math500           # 500-problem test benchmark
```

Eval

```shell
rllm eval math500 \
    --agent langgraph_math \
    --evaluator langgraph_math \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --base-url http://localhost:8000/v1 \
    --max-examples 20
```

Training

```shell
# Tinker (single-machine LoRA)
bash cookbooks/langgraph_math/train_tinker.sh

# Verl (distributed GPU)
bash cookbooks/langgraph_math/train_verl.sh
```
The tool-call training uses verl’s vLLM tool-call parser (already wired into train_verl.sh):
```
+actor_rollout_ref.rollout.engine_kwargs.vllm.enable_auto_tool_choice=true
+actor_rollout_ref.rollout.engine_kwargs.vllm.tool_call_parser=hermes
```

Evaluator

The flow returned None, so episode.artifacts is empty. The evaluator extracts the answer directly from the gateway-captured trajectory — that’s the canonical “evaluator parses the Trajectory” pattern under AgentFlow:
```python
def _last_assistant_text(episode: Episode) -> str:
    """Walk back through Steps to find the last assistant message."""
    if not episode.trajectories:
        return ""
    for step in reversed(episode.trajectories[-1].steps):
        if step.model_response:
            return step.model_response
    return ""

@rllm.evaluator
def langgraph_math_evaluator(task: dict, episode: Episode) -> EvalOutput:
    answer_text = _extract_answer(_last_assistant_text(episode))
    ground_truth = str(task.get("answer") or task.get("ground_truth") or "")
    is_correct = grade_answer_mathd(answer_text, ground_truth) or grade_answer_sympy(answer_text, ground_truth)
    return EvalOutput(
        reward=1.0 if is_correct else 0.0,
        is_correct=is_correct,
        signals=[Signal(name="accuracy", value=1.0 if is_correct else 0.0)],
    )
```
The walk-back loop matters: in a tool-using ReAct agent the trajectory ends with an assistant turn, but the tool-message Steps in between have an empty model_response, so the evaluator walks backwards until it reaches the last assistant turn.
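The walk-back can be demonstrated on a toy trajectory. The dataclasses below are minimal stand-ins for rLLM's real `Episode`/`Trajectory`/`Step` types, just to make the shape concrete:

```python
from dataclasses import dataclass, field

# Toy stand-ins mimicking the episode.trajectories[-1].steps[-1].model_response
# shape described above; the real classes live in rLLM, not here.
@dataclass
class Step:
    model_response: str = ""

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)

@dataclass
class Episode:
    trajectories: list = field(default_factory=list)

def last_assistant_text(episode: Episode) -> str:
    if not episode.trajectories:
        return ""
    for step in reversed(episode.trajectories[-1].steps):
        if step.model_response:
            return step.model_response
    return ""

ep = Episode([Trajectory([
    Step("Let me compute 6 * 7."),  # assistant turn that issued a tool call
    Step(""),                        # tool-message Step: empty model_response
    Step("The answer is 42."),       # final assistant turn
])])
```

Here the final Step happens to be the assistant turn, but if the trajectory ended on a tool message the loop would skip past it.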

Files

| File | Description |
| --- | --- |
| `langgraph_math.py` | The `create_react_agent` AgentFlow + safe calculator |
| `evaluator.py` | Reads the answer from the gateway-captured trajectory |
| `train.py` + `train_{tinker,verl}.sh` | Hydra entry points |
| `pyproject.toml` | Plugin entry-point declarations |
| `test.py` | Unit tests for the calculator, parsing, and evaluation |

Migration from rllm.sdk.integrations.langgraph

If you previously used RLLMTrajectoryCallbackHandler to capture LangChain’s LLM calls into rLLM Trajectory objects in-process, that path is deprecated for training. The gateway now provides token-accurate trace capture out of band, so for any LangGraph agent that uses ChatOpenAI (or any other LangChain client that accepts a base_url) the integration collapses to:
```python
llm = ChatOpenAI(base_url=config.base_url, api_key="EMPTY", model=config.model)
# ...build your graph as usual...
return None
```
The callback handler still works for non-rLLM-training contexts where you need in-process trajectory snapshots, but for training under AgentTrainer you should use this cookbook’s pattern instead.

On GitHub

cookbooks/langgraph_math

Full source, README, and runnable launch scripts