Overview
The math tool agent demonstrates:
- How to use rLLM’s ToolAgent for tool-based reasoning
- Integration with a Python interpreter for code execution
- Training on mathematical competition datasets (AIME 2024)
- Evaluating performance with Pass@K metrics
Prerequisites
- rLLM framework installed
- vLLM or SGLang for model serving
- Base model: Qwen/Qwen3-4B (or similar)
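Before running inference, a model server exposing an OpenAI-compatible API must be up. As one possibility (the exact flags depend on your environment), vLLM can serve the base model on the port the example's default `base_url` expects:

```shell
# Serve Qwen/Qwen3-4B with vLLM on port 30000
# (matches the default base_url http://localhost:30000/v1).
vllm serve Qwen/Qwen3-4B --port 30000

# Or, with SGLang:
# python -m sglang.launch_server --model-path Qwen/Qwen3-4B --port 30000
```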
Setup
Prepare the dataset
First, download and prepare the AIME 2024 and DeepScaleR math datasets. The preparation step will:
- Download AIME 2024 dataset from HuggingFace
- Download DeepScaleR math dataset for training
- Register both datasets with rLLM’s DatasetRegistry
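The preparation step is typically a single script invocation; the script name below is an assumption for illustration, so check the example directory for the actual entry point:

```shell
# Hypothetical entry point -- the real script name may differ in your checkout.
python prepare_math_data.py
```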
Running the Agent
Once your model server is running and datasets are prepared, run the inference script.
Code Implementation
The core implementation lives in run_math_with_tool.py.
Expected Output
The script will:
- Load the AIME 2024 test dataset
- Repeat each problem 8 times for Pass@K evaluation
- Run parallel inference using the async agent execution engine
- Evaluate results and report accuracy metrics
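With repeated sampling, Pass@K is usually estimated with the unbiased estimator 1 − C(n−c, k)/C(n, k), where n samples were drawn and c of them are correct. A minimal sketch of that computation (not taken from the rLLM source):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimate given n samples, c of them correct."""
    if n - c < k:
        return 1.0  # every size-k subset must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 8 samples per problem, 2 of them correct:
print(round(pass_at_k(8, 2, 1), 4))  # Pass@1 -> 0.25
print(pass_at_k(8, 2, 8))            # Pass@8 -> 1.0
```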
Training
To train your own math reasoning agent with tool usage, run the training script with the parameters below.
Key Training Parameters
- Model: Qwen/Qwen3-4B
- Algorithm: GRPO (Group Relative Policy Optimization)
- Training Dataset: DeepScaleR math dataset
- Evaluation Dataset: AIME 2024
- Batch Size: 64
- Learning Rate: 1e-6
- Max Response Length: 16,384 tokens
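GRPO, the algorithm listed above, replaces a learned value baseline with group-relative normalization: each response's reward is standardized against the other responses sampled for the same prompt. A minimal sketch of that advantage computation (illustrative, not the rLLM implementation):

```python
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standardize rewards within one group of rollouts for a single prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Group of 4 rollouts for one math problem: 1.0 = correct, 0.0 = incorrect.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 2) for a in advs])  # correct answers get positive advantage
```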
Configuration Options
You can modify these parameters in the inference script:
- n_parallel_agents: Number of parallel agents (default: 64)
- model_name: Model to use (default: “Qwen/Qwen3-4B”)
- base_url: API server URL (default: “http://localhost:30000/v1”)
- max_response_length: Maximum response length (default: 16384)
- max_prompt_length: Maximum prompt length (default: 2048)
- temperature: Sampling temperature (default: 0.6)
- top_p: Top-p sampling (default: 0.95)
Next Steps
- Try the FrozenLake example for classic RL environments
- Explore SDK examples for simplified training workflows
- Learn about DeepScaleR for advanced math reasoning

