Read the full write-up in the Notion blog post.
DeepScaleR is a 1.5B math reasoning model that achieves 43.1% Pass@1 on AIME, surpassing O1-Preview. It demonstrates that small models can match frontier-level performance when RL training is scaled effectively.

Results

| Model      | Parameters | AIME Pass@1 |
| ---------- | ---------- | ----------- |
| DeepScaleR | 1.5B       | 43.1%       |
| O1-Preview | Unknown    | 42.0%       |

Approach

DeepScaleR iteratively scales DeepSeek's GRPO algorithm from an 8K to a 16K to a 24K thinking context length, training on top of DeepSeek-R1-Distill-Qwen-1.5B on math competition problems. See the DeepScaleR example for instructions on reproducing this with rLLM.

Released: February 2025
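The two ingredients above can be sketched in a few lines: GRPO replaces a learned critic with a group-relative baseline (each sampled response's reward is normalized by the mean and standard deviation of its group), and training proceeds through a staged context-length schedule. This is a minimal illustrative sketch, not rLLM's actual implementation; the function and schedule names are hypothetical.

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages for one prompt's sampled responses.

    GRPO's critic-free baseline: subtract the group mean reward and
    divide by the group standard deviation.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Hypothetical staged schedule mirroring the 8K -> 16K -> 24K scaling
# described above; each stage raises the thinking context length.
CONTEXT_SCHEDULE = [8192, 16384, 24576]

def context_length_for(stage):
    """Context length for a given training stage (clamped to the last stage)."""
    return CONTEXT_SCHEDULE[min(stage, len(CONTEXT_SCHEDULE) - 1)]
```

With binary math-verification rewards (1 for a correct final answer, 0 otherwise), responses that beat their group average get positive advantages and the rest get negative ones, which is what drives the policy update.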