rLLM
Train your AI agents with RL. Any framework. Minimal code changes.
Why rLLM?
rLLM works with any agent framework — LangGraph, SmolAgents, Strands, OpenAI Agents SDK, Google ADK, or plain openai.OpenAI. Just swap the client. Add @rllm.rollout to wrap your agent code, and rLLM traces every LLM call automatically (see the sketch below the agent list). The framework has powered state-of-the-art agents including:
- rLLM-FinQA-4B: A 4B financial analysis agent that outperforms Qwen3-235B (59.7% vs 51.4%) and rivals Gemini 2.5 Pro on the Snorkel Finance Benchmark
- DeepSWE: A 32B software engineering agent achieving 59% on SWEBench-Verified
- DeepCoder-14B: A 14B coding model achieving 60.6% on LiveCodeBench, matching o3-mini performance
- DeepScaleR-1.5B: A 1.5B model surpassing O1-Preview with 43.1% on AIME
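A minimal sketch of that wrapping step (the @rllm.rollout decorator is named above; the import path, decorator usage, and client setup here are illustrative assumptions, not verified against rLLM's API):

```python
import openai

import rllm  # assumed import path; check the rLLM docs for the exact one

client = openai.OpenAI()  # or a LangGraph / SmolAgents / ADK client instead


@rllm.rollout  # rLLM traces every LLM call made inside this function
def solve(task: str) -> str:
    # Existing agent code, unchanged: rLLM intercepts the client's
    # chat-completion calls and records each one as a Step in a Trajectory.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": task}],
    )
    return response.choices[0].message.content
```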
Tongyi DeepResearch
Open-source AI research assistant by Alibaba NLP
PettingLLMs
Multi-agent RL for language systems
V1
Pairwise self-verification for parallel reasoners
SETA
Scaling environments for terminal agent training
Terminal-Bench-RL
Training long-horizon terminal agents with RL
LLM-in-Sandbox
Building general agents by running LLMs in a sandbox
Experiential RL
Experience-reflection-consolidation training loop
Cogito, Ergo Ludo
An agent that learns to play by reasoning and planning
View all projects
See the full list of projects and case studies built with rLLM
The rLLM CLI
The fastest way to use rLLM is through its command-line interface. Evaluate any model on 50+ benchmarks and launch RL training with a single command:
CLI quick start
Go from zero to evaluation and training in 5 minutes
Key features
Works with any agent framework
LangGraph, SmolAgents, Strands, OpenAI Agents SDK, Google ADK, or plain
openai.OpenAI — just swap the client
Near-zero code changes
Add @rllm.rollout to wrap your agent code, and rLLM traces every LLM call automatically
CLI-first workflow
Evaluate, train, and scaffold agents from the command line with 50+ built-in benchmarks
Battle-tested results
rLLM-trained agents beat models 50x their size — 4B outperforms 235B on finance, 1.5B surpasses O1-Preview on math
Multiple RL algorithms
GRPO, REINFORCE, RLOO, rejection sampling, and more — pick the algorithm that fits your task (a GRPO sketch follows this feature list)
Two training backends
verl for distributed multi-GPU training, tinker for single-machine / CPU setups — same API either way
Advanced training features
LoRA training, VLM support, rejection sampling, and multi-agent workflows out of the box
Production ready
Proven in real-world deployments, with Docker support and comprehensive documentation
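To make the first of those algorithms concrete: GRPO estimates advantages by normalizing each rollout's reward against the group of rollouts sampled for the same task, so no learned value function is needed. A minimal sketch of that computation (the algorithm in general form, not rLLM's implementation):

```python
from statistics import mean, stdev


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-normalized advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight rollouts of the same task, scored 0/1 by a reward function:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]))
```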
How it works
rLLM follows a simple pipeline: run your agent → collect traces → compute rewards → update the model. Your agent runs as-is — rLLM's SDK intercepts LLM calls and structures them into Episodes (one task) containing Trajectories (one agent run) made of Steps (one LLM call); a sketch of this hierarchy follows the list below. A reward function scores the result, and the RL algorithm updates the model weights. The same agent code works for both eval and training. Under the hood:
- Workflow Engine runs N parallel agent instances to collect rollouts
- LiteLLM Proxy routes requests and captures token IDs + logprobs
- Transform Pipeline groups trajectories for advantage computation
- Training Backend (verl or tinker) handles the policy update
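A minimal sketch of that trace hierarchy (the names Episode, Trajectory, and Step come from the description above; the fields are illustrative assumptions, not rLLM's actual classes):

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One LLM call."""
    prompt: str
    completion: str
    token_ids: list[int] = field(default_factory=list)   # captured by the proxy
    logprobs: list[float] = field(default_factory=list)


@dataclass
class Trajectory:
    """One agent run: an ordered list of Steps plus its score."""
    steps: list[Step] = field(default_factory=list)
    reward: float = 0.0  # assigned by the reward function after the run


@dataclass
class Episode:
    """One task, with the N parallel Trajectories collected for it."""
    task: str = ""
    trajectories: list[Trajectory] = field(default_factory=list)
```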
Get started
Installation
Install rLLM with pip, uv, or Docker in minutes
CLI quick start
Evaluate and train from the command line in 5 minutes
Core concepts
Learn about agents, environments, and the execution engine
Examples
Build a math reasoning agent with tools
Community and support
Slack
Join our community to ask questions and share your projects
GitHub
Contribute to the project, report issues, or browse the source code
Blog
Read about the latest releases, research, and use cases
Twitter/X
Follow us for updates and announcements

