math.
Pattern
| Aspect | Value |
|---|---|
| Loop shape | Multi-turn (up to 5 calculator calls per task) |
| Tools | One: calculate — asteval-based safe arithmetic interpreter |
| Termination | Model emits <answer>NUMBER</answer> (no tool call) or hits MAX_TURNS |
| Reward shape | 1.0 if final answer matches ground truth (mathd + sympy), else 0.0 |
Architecture
The evaluator checks the final<answer> against the ground truth via numeric comparison.
Install
Datasets
Eval
Training
train_verl.sh.)
Key code
The flow drives a fixed-iteration tool-calling loop:sqrt, log, sin, cos, factorial, comb, gcd, lcm, pi, e, tau, … — anything outside the whitelist raises an error string the model sees.
Files
| File | Description |
|---|---|
math_tool_agent.py | The multi-turn AgentFlow + safe calculator |
evaluator.py | Numeric / symbolic answer comparison |
train.py + train_{tinker,verl}.sh | Hydra entry points |
pyproject.toml | Plugin entry-point declarations |
test.py | Unit tests covering calculator and answer parsing |
On GitHub
cookbooks/math_tool_agent
Full source, README, and runnable launch scripts

