\boxed{ANSWER}; the evaluator pulls the boxed value and grades it via mathd + sympy equivalence.
This is the no-tool counterpart to math_tool_agent, which uses a calculator tool. Use the math cookbook when you want to train on chain-of-thought alone, with no tool calls.
Pattern
| Aspect | Value |
|---|---|
| Loop shape | Single-turn (one LLM call per task) |
| Tools | None — answer is parsed out of the response text |
| Termination | Single LLM call returns; evaluator grades the boxed answer |
| Reward shape | 1.0 if \boxed{…} matches ground truth (mathd + sympy), else 0.0 |
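The pattern in the table can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the cookbook's code: the names `extract_boxed`, `grade`, and `run_single_turn` are hypothetical, the `llm` argument is a stand-in callable rather than a real client, and an exact `Fraction` comparison is swapped in for the mathd + sympy equivalence check that the real evaluator performs.

```python
from fractions import Fraction
from typing import Callable, Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward, tracking brace depth so nested LaTeX like \frac{1}{2} survives.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # unbalanced braces: treat as no answer


def _as_number(s: str) -> Optional[Fraction]:
    """Parse simple numerics ("72", "0.5", "1/2", "1,000"); None otherwise."""
    try:
        return Fraction(s.replace(",", "").replace("$", "").strip())
    except (ValueError, ZeroDivisionError):
        return None


def grade(response: str, ground_truth: str) -> float:
    """Binary reward: 1.0 if the boxed answer matches, else 0.0.

    Assumption: numeric comparison via Fraction plus a whitespace-insensitive
    string match, standing in for mathd + sympy equivalence.
    """
    answer = extract_boxed(response)
    if answer is None:
        return 0.0
    a, b = _as_number(answer), _as_number(ground_truth)
    if a is not None and b is not None:
        return 1.0 if a == b else 0.0
    same = answer.replace(" ", "") == ground_truth.replace(" ", "")
    return 1.0 if same else 0.0


def run_single_turn(question: str, ground_truth: str,
                    llm: Callable[[str], str]) -> float:
    """One model call, no tools; the reward comes from the response text alone."""
    prompt = (f"{question}\nReason step by step, then give the final answer "
              f"as \\boxed{{ANSWER}}.")
    response = llm(prompt)  # hypothetical stand-in for the real model call
    return grade(response, ground_truth)
```

With a stub such as `lambda prompt: "2 + 2 = 4, so \\boxed{4}"` passed as `llm`, `run_single_turn("What is 2 + 2?", "4", llm)` yields a reward of 1.0, and a response with no `\boxed{}` yields 0.0.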
Architecture
Install
Datasets
Eval
gpt-5.4-mini: 100% (10/10) on a smoke run.
Training
gsm8k_lora use case), set --lora-rank higher and pass --train-dataset gsm8k:
Key code
rllm.eval.reward_fns.math.evaluate:
Files
| File | Description |
|---|---|
| math_flow.py | The single-turn AgentFlow |
| math_eval.py | Wraps rllm.eval.reward_fns.math |
| train.py + train_{tinker,verl}.sh | Hydra entry points |
| pyproject.toml | Plugin entry-point declarations |
| test.py | 7 unit tests covering correct / wrong / no-fence / LaTeX equivalence |
On GitHub: cookbooks/math (full source, README, and runnable launch scripts)

