Documentation Index

Fetch the complete documentation index at: https://docs.rllm-project.com/llms.txt

Use this file to discover all available pages before exploring further.

A cookbook is a small Python package that ships an AgentFlow + an Evaluator together with a prepare_data.py and a pair of train_{tinker,verl}.sh scripts. Each cookbook is self-contained: install it once with pip install -e cookbooks/<name>, and the rLLM CLI discovers the agent and the evaluator by name through Python entry points.
rllm eval  <dataset> --agent <name> --evaluator <name>
rllm train <dataset> --agent <name> --evaluator <name>
Each flow is a plain async function that calls an OpenAI-compatible endpoint and returns an Episode. See AgentFlow & Evaluator for the protocol.
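The "plain async function returning an Episode" shape can be sketched as follows. This is an illustration only: `Episode` here is a hypothetical stand-in for rLLM's own class (which carries more fields), and the client is any OpenAI-compatible async client.

```python
import asyncio
from dataclasses import dataclass, field

# Stand-in for rLLM's Episode type -- illustration only; the real
# class lives in rllm and has a richer schema.
@dataclass
class Episode:
    messages: list
    artifacts: dict = field(default_factory=dict)

async def my_flow(client, task: dict) -> Episode:
    """A single-turn flow: one chat completion, answer stored in artifacts."""
    messages = [{"role": "user", "content": task["question"]}]
    resp = await client.chat.completions.create(
        model="my-model",  # placeholder model name
        messages=messages,
    )
    answer = resp.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    # The evaluator will read episode.artifacts["answer"].
    return Episode(messages=messages, artifacts={"answer": answer})
```

Multi-turn flows follow the same contract: loop over tool calls or environment steps inside the function body, then return one Episode at the end.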

Available cookbooks

| Cookbook | Pattern | Description |
| --- | --- | --- |
| frozenlake | Multi-turn gym | Navigates procedurally generated FrozenLake grids; demonstrates direct integration with a Gymnasium environment. |
| math | Single-turn | Solves competition / textbook math with \boxed{} answer extraction. |
| math_tool_agent | Multi-turn + tool | Solves math with a calculator tool via native OpenAI function calling. |
| deepcoder | Single-turn | Competition coding; the answer is extracted from a fenced `python` code block and the evaluator runs it against hidden tests. |
| finqa | Multi-turn + 4 tools | Financial QA over SEC 10-K tables: get_table_names / get_table_info / sql_query / calculator, with judge-LLM grading. |
| geo3k | VLM single-turn | Multimodal geometry on the Geometry3K dataset. |
| solver_judge_flow | Multi-agent | Solver–judge pattern on the countdown task. |
Click into any row for the deep-dive page — install flow, dataset, eval and train commands, and key code snippets. The full source, including the launch scripts, lives at cookbooks/ on GitHub.

Anatomy of a cookbook

Every cookbook follows the same shape:
cookbooks/<name>/
├── README.md           # walkthrough
├── pyproject.toml      # entry-point declarations
├── <name>_flow.py      # the AgentFlow function
├── <name>_eval.py      # the Evaluator function (or evaluator.py)
├── prepare_data.py     # DatasetRegistry registration / HF download
├── train.py            # Hydra entry point used by train_*.sh
├── train_tinker.sh     # single-machine LoRA training
├── train_verl.sh       # distributed multi-GPU training
└── test.py             # unit tests

Entry-point declaration

In pyproject.toml, the cookbook registers its flow and evaluator under two well-known groups:
[project.entry-points."rllm.agents"]
my_agent = "my_flow:my_flow"

[project.entry-points."rllm.evaluators"]
my_eval = "my_eval:my_evaluator"
The CLI’s --agent <name> and --evaluator <name> flags resolve through these groups (see rllm.eval.agent_loader and rllm.eval.evaluator_loader).

Module-name collision gotcha

Top-level Python module names must be unique across all installed cookbooks — pip install -e puts each cookbook’s modules at the import root. If two cookbooks both ship a top-level evaluator.py, only one wins; the other is silently shadowed at import time. Convention: prefix module names with the cookbook name. cookbooks/math/ ships math_flow.py + math_eval.py, not flow.py + evaluator.py. cookbooks/finqa/ ships finqa_flow.py + finqa_eval.py + finqa_tools.py + finqa_constants.py.

Install + run

# 1. Install rLLM with the backend you want to train on
uv pip install -e ".[tinker]"      # or .[verl]

# 2. Install the cookbook (--no-deps reuses rllm's pinned deps)
uv pip install --no-deps -e cookbooks/math

# 3. Verify discovery
rllm agent list                    # should list "math"

# 4. Pull the dataset and run
rllm dataset pull math500
rllm eval math500 --agent math --evaluator math --max-examples 20
The same flow works for any cookbook — substitute the cookbook name and dataset.

Authoring a new cookbook

The cleanest starting point is to copy an existing cookbook that matches your interaction shape:
| Your agent looks like… | Copy from |
| --- | --- |
| One LLM call, parse answer | cookbooks/math |
| Multi-turn with a tool | cookbooks/math_tool_agent |
| Multi-turn with multiple tools + complex grading | cookbooks/finqa |
| Drives a Gym environment | cookbooks/frozenlake |
| Solver + judge multi-agent | cookbooks/solver_judge_flow |
Then:
  1. Rename the modules (prefix with your cookbook name to avoid collisions).
  2. Rewrite the flow body — call the LLM, drive your loop, return an Episode with the model’s final answer in episode.artifacts["answer"].
  3. Rewrite the evaluator — read artifacts["answer"], return an EvalOutput.
  4. Update the entry-point names in pyproject.toml.
  5. pip install --no-deps -e cookbooks/<name> and test with rllm eval.