Read the full write-up in the Notion blog post for complete details.
## Results
| Model | Parameters | AIME 2024 Pass@1 |
|---|---|---|
| DeepScaleR | 1.5B | 43.1% |
| O1-Preview | Unknown | 42.0% |
## Approach
DeepScaleR iteratively scales DeepSeek's GRPO algorithm from 8K to 16K to 24K context length for thinking, training on top of DeepSeek-R1-Distill-1.5B on math competition problems. See the cookbooks/math cookbook (single-turn math with \boxed{} answers) for the AgentFlow-based reproducer.
Released: February 2025
