rLLM
Train your AI agents with RL. Any framework. Minimal code changes.
Why rLLM?
rLLM works with any agent framework — LangGraph, SmolAgents, Strands, OpenAI Agents SDK, Google ADK, or plain openai.OpenAI. Just swap the client. Add @rllm.rollout to wrap your agent code, and rLLM traces every LLM call automatically (see the sketch below the agent list). The framework has powered state-of-the-art agents including:
- rLLM-FinQA-4B: A 4B financial analysis agent that outperforms Qwen3-235B (59.7% vs 51.4%) and rivals Gemini 2.5 Pro on the Snorkel Finance Benchmark
- DeepSWE: A 32B software engineering agent achieving 59% on SWEBench-Verified
- DeepCoder-14B: A 14B coding model achieving 60.6% on LiveCodeBench, matching o3-mini performance
- DeepScaleR-1.5B: A 1.5B model surpassing O1-Preview with 43.1% on AIME
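A minimal sketch of that wrapping step (the @rllm.rollout decorator is named above; the import path, decorator usage, and client setup here are illustrative assumptions, not verified against rLLM's API):

```python
import openai

import rllm  # assumed import path; check the rLLM docs for the exact one

client = openai.OpenAI()  # or a LangGraph / SmolAgents / ADK client instead


@rllm.rollout  # rLLM traces every LLM call made inside this function
def solve(task: str) -> str:
    # Existing agent code, unchanged: rLLM intercepts the client's
    # chat-completion calls and records each one as a Step in a Trajectory.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": task}],
    )
    return response.choices[0].message.content
```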
Tongyi DeepResearch
Open-source AI research assistant by Alibaba NLP
PettingLLMs
Multi-agent RL for language systems
V1
Pairwise self-verification for parallel reasoners
SETA
Scaling environments for terminal agent training
Terminal-Bench-RL
Training long-horizon terminal agents with RL
LLM-in-Sandbox
Building general agents by running LLMs in a sandbox
Experiential RL
Experience-reflection-consolidation training loop
Cogito, Ergo Ludo
An agent that learns to play by reasoning and planning
View all projects
See the full list of projects and case studies built with rLLM
The rLLM CLI
The fastest way to use rLLM is through its command-line interface. Evaluate any model on 50+ benchmarks and launch RL training with a single command:
CLI quick start
Go from zero to evaluation and training in 5 minutes
Key features
Works with any agent framework
LangGraph, SmolAgents, Strands, OpenAI Agents SDK, Google ADK, or plain
openai.OpenAI — just swap the client
Near-zero code changes
Add @rllm.rollout to wrap your agent code, and rLLM traces every LLM call automatically
CLI-first workflow
Evaluate, train, and scaffold agents from the command line with 50+ built-in benchmarks
Battle-tested results
rLLM-trained agents beat models 50x their size — 4B outperforms 235B on finance, 1.5B surpasses O1-Preview on math
Multiple RL algorithms
GRPO, REINFORCE, RLOO, rejection sampling, and more — pick the algorithm that fits your task (a GRPO sketch follows this feature list)
Two training backends
verl for distributed multi-GPU training, tinker for single-machine / CPU setups — same API either way
Advanced training features
LoRA training, VLM support, rejection sampling, and multi-agent workflows out of the box
Production ready
Proven in real-world deployments, with Docker support and comprehensive documentation
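To make the first of those algorithms concrete: GRPO estimates advantages by normalizing each rollout's reward against the group of rollouts sampled for the same task, so no learned value function is needed. A minimal sketch of that computation (the algorithm in general form, not rLLM's implementation):

```python
from statistics import mean, stdev


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-normalized advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Eight rollouts of the same task, scored 0/1 by a reward function:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]))
```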
How it works
rLLM follows a simple pipeline: run your agent → collect traces → compute rewards → update the model. Your agent runs as-is — rLLM's SDK intercepts LLM calls and structures them into Episodes (one task) containing Trajectories (one agent run) made of Steps (one LLM call); a sketch of this hierarchy follows the list below. A reward function scores the result, and the RL algorithm updates the model weights. The same agent code works for both eval and training. Under the hood:
- Workflow Engine runs N parallel agent instances to collect rollouts
- LiteLLM Proxy routes requests and captures token IDs + logprobs
- Transform Pipeline groups trajectories for advantage computation
- Training Backend (verl or tinker) handles the policy update
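A minimal sketch of that trace hierarchy (the names Episode, Trajectory, and Step come from the description above; the fields are illustrative assumptions, not rLLM's actual classes):

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One LLM call."""
    prompt: str
    completion: str
    token_ids: list[int] = field(default_factory=list)   # captured by the proxy
    logprobs: list[float] = field(default_factory=list)


@dataclass
class Trajectory:
    """One agent run: an ordered list of Steps plus its score."""
    steps: list[Step] = field(default_factory=list)
    reward: float = 0.0  # assigned by the reward function after the run


@dataclass
class Episode:
    """One task, with the N parallel Trajectories collected for it."""
    task: str = ""
    trajectories: list[Trajectory] = field(default_factory=list)
```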
Get started
Installation
Install rLLM with pip, uv, or Docker in minutes
CLI quick start
Evaluate and train from the command line in 5 minutes
Core concepts
Learn about agents, environments, and the execution engine
Examples
Build a math reasoning agent with tools
Community and support
Slack
Join our community to ask questions and share your projects
GitHub
Contribute to the project, report issues, or browse the source code
Blog
Read about the latest releases, research, and use cases
Twitter/X
Follow us for updates and announcements

