A multi-turn math agent authored with LangGraph’s `create_react_agent`, trained end-to-end with rLLM. It demonstrates that any LangGraph agent integrates with rLLM training without writing a callback handler: pointing LangChain’s `ChatOpenAI` at `config.base_url` is enough, because the rLLM model gateway captures every LLM call.
This cookbook is the AgentFlow-protocol counterpart of the legacy `rllm.sdk.integrations.langgraph` callback-handler approach (now deprecated). It is intentionally a near-clone of `cookbooks/math_tool_agent` so you can compare a hand-rolled tool loop against a LangGraph one on the same dataset.
Pattern
| Aspect | Value |
|---|---|
| Loop shape | Multi-turn (LangGraph recursion_limit=25) |
| Tools | One: `calculate` — AST-based safe arithmetic interpreter |
| Termination | LangGraph stops when the LLM emits no further tool calls |
| Reward shape | 1.0 if final answer matches ground truth (mathd + sympy), else 0.0 |
| Return type | None — the gateway captures everything; the framework auto-builds the Episode |
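The `calculate` tool can be sketched as follows. This is a hedged sketch of an AST-based safe arithmetic interpreter; the cookbook’s actual implementation lives in `langgraph_math.py` and may differ in detail:

```python
import ast
import operator

# Whitelisted operators: anything outside this table is rejected, so the
# model cannot trigger arbitrary code execution through this tool.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.Mod: operator.mod,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}

def calculate(expression: str) -> float:
    """Safely evaluate an arithmetic expression by walking its AST."""
    def _eval(node: ast.AST) -> float:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression node: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))
```

Because only numeric literals and the whitelisted operators evaluate, inputs like `__import__('os')` fail with `ValueError` instead of executing.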
Why so little code?
No manual `Step` / `Trajectory` construction is needed. The mechanism:
- LangChain’s `ChatOpenAI(base_url=…)` issues OpenAI Chat Completions requests against the gateway session URL the trainer provides.
- The gateway middleware extracts the session id from the URL path (`/sessions/{sid}/v1/...`) and persists every request/response as a `TraceRecord` keyed by that session.
- The flow returns `None`. The framework’s coercion (`rllm.types._coerce_to_episode`) builds an empty single-trajectory `Episode`. During enrichment the gateway’s traces become the trajectory’s `Step`s, populated with prompt/response token IDs and per-token logprobs ready for training.
- The evaluator reads the agent’s final assistant message from `episode.trajectories[-1].steps[-1].model_response` and grades it against ground truth.
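The `None`-to-`Episode` coercion and trace enrichment can be illustrated with a toy version. These dataclasses are deliberately simplified stand-ins; rLLM’s real types in `rllm.types` are richer:

```python
# Toy illustration of the coercion + enrichment steps described above.
# Step/Trajectory/Episode here are simplified stand-ins, not rLLM's classes.
from dataclasses import dataclass, field

@dataclass
class Step:
    model_response: str
    prompt_token_ids: list
    response_token_ids: list

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)

@dataclass
class Episode:
    trajectories: list = field(default_factory=lambda: [Trajectory()])

def coerce_to_episode(flow_return):
    # A flow that returns None yields an empty single-trajectory Episode.
    return flow_return if isinstance(flow_return, Episode) else Episode()

def enrich(episode, traces):
    # Each gateway trace record becomes one Step on the trajectory.
    for t in traces:
        episode.trajectories[-1].steps.append(
            Step(t["response_text"], t["prompt_ids"], t["response_ids"])
        )
    return episode
```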
Because the flow name is `"langgraph-math"` (set on `@rllm.rollout`), all rollouts of the same task hash to the same `f"{task_id}:langgraph-math"` key when the trainer builds `TrajectoryGroup`s for GRPO advantage computation.
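That grouping key can be demonstrated with a small sketch. The dict-shaped rollouts and `group_rollouts` helper here are hypothetical, but the key format matches the description above:

```python
# Illustration of the f"{task_id}:{flow_name}" grouping key: rollouts of
# the same task under the same flow name land in one group, which is the
# unit over which GRPO advantages are computed.
from collections import defaultdict

def group_rollouts(rollouts):
    groups = defaultdict(list)
    for r in rollouts:
        groups[f"{r['task_id']}:{r['name']}"].append(r)
    return groups
```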
Architecture
Install
Datasets
Same datasets as `math_tool_agent`, so you can compare learning curves between a hand-rolled tool loop and a LangGraph one:
Eval
Training
Launch training with the provided scripts (e.g. `train_verl.sh`):
Evaluator
The flow returned `None`, so `episode.artifacts` is empty. The evaluator extracts the answer directly from the gateway-captured trajectory; this is the canonical “evaluator parses the `Trajectory`” pattern under AgentFlow: the final answer lives in the last step’s `model_response`. Because trailing steps can be tool results rather than assistant text, the evaluator walks backwards until it finds the last assistant turn.
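A hedged sketch of that backward walk, using a list of step dicts with a `model_response` field as a simplification of rLLM’s real `Step` type:

```python
# Scan the trajectory's steps from the end and return the last step that
# carries assistant text; trailing tool/system steps are skipped.
def last_assistant_response(steps):
    for step in reversed(steps):
        text = step.get("model_response")
        if text:
            return text
    return None
```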
Files
| File | Description |
|---|---|
| `langgraph_math.py` | The `create_react_agent` AgentFlow + safe calculator |
| `evaluator.py` | Reads the answer from the gateway-captured trajectory |
| `train.py` + `train_{tinker,verl}.sh` | Hydra entry points |
| `pyproject.toml` | Plugin entry-point declarations |
| `test.py` | Unit tests for calculator, parsing, and evaluation |
Migration from rllm.sdk.integrations.langgraph
If you previously used `RLLMTrajectoryCallbackHandler` to capture LangChain’s LLM calls into rLLM `Trajectory` objects in-process, that path is deprecated for training. The gateway now provides token-accurate trace capture out of band, so for any LangGraph agent that uses `ChatOpenAI` (or any other LangChain client that accepts a `base_url`) the integration collapses to:
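Something like the following wiring fragment. This is a sketch under the assumptions of this README (`config.base_url` is the trainer-provided session URL; the `api_key` and `model` values are illustrative placeholders):

```python
from langchain_openai import ChatOpenAI

# Point LangChain's client at the gateway session URL; the gateway
# captures every Chat Completions call, so no callback handler is
# attached. api_key/model values below are placeholders.
llm = ChatOpenAI(
    base_url=config.base_url,  # e.g. the /sessions/{sid}/v1 URL from the trainer
    api_key="EMPTY",
    model="served-model-name",
)
```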
If you are training with `AgentTrainer`, you should use this cookbook’s pattern instead.
On GitHub
`cookbooks/langgraph_math`
Full source, README, and runnable launch scripts

