# rLLM

## Docs

- [AWS Bedrock AgentCore](https://docs.rllm-project.com/agent-runtimes/agentcore.md): Train agents using AWS Bedrock AgentCore Runtime for secure, massively parallel rollouts without managing any infra
- [AgentFlow](https://docs.rllm-project.com/api/agentflow.md): The protocol for authoring agents that run identically at eval time and at training time
- [Data](https://docs.rllm-project.com/api/data.md): Dataset management and preprocessing utilities
- [Parsers](https://docs.rllm-project.com/api/parsers.md): Parsers for tool calling and chat formatting
- [Rewards](https://docs.rllm-project.com/api/rewards.md): Reward functions for evaluating agent performance
- [RolloutEngine](https://docs.rllm-project.com/api/rollout-engine.md): Model inference engine for agent rollouts
- [Tools](https://docs.rllm-project.com/api/tools.md): Tool system for enabling agent capabilities
- [Trainer](https://docs.rllm-project.com/api/trainer.md): Training infrastructure for RL-based agent learning
- [AgentWorkflowEngine](https://docs.rllm-project.com/api/workflow-engine.md): Engine for executing workflow-based agent training and evaluation
- [Workflows](https://docs.rllm-project.com/api/workflows.md): Workflow orchestration for agent execution and training
- [Backend Comparison](https://docs.rllm-project.com/backends/comparison.md): Compare verl and tinker backends to choose the right one for your use case
- [Tinker Backend](https://docs.rllm-project.com/backends/tinker.md): Training with tinker - async-first RL training with unified architecture
- [verl Backend](https://docs.rllm-project.com/backends/verl.md): Training with verl - distributed RL training with vLLM and SGLang support
- [Deepcoder](https://docs.rllm-project.com/cookbooks/deepcoder.md): Single-turn coding agent with hidden-test grading
- [FinQA](https://docs.rllm-project.com/cookbooks/finqa.md): Multi-turn financial-QA agent with 4 tools over SEC 10-K tables
- [FrozenLake](https://docs.rllm-project.com/cookbooks/frozenlake.md): Multi-turn AgentFlow that drives a Gymnasium environment
- [Geo3K](https://docs.rllm-project.com/cookbooks/geo3k.md): Single-turn VLM geometry agent on the Geometry3K dataset
- [Math](https://docs.rllm-project.com/cookbooks/math.md): Single-turn math agent with \boxed{} answer extraction
- [Math Tool Agent](https://docs.rllm-project.com/cookbooks/math_tool_agent.md): Multi-turn math agent with calculator tool via OpenAI function calling
- [Cookbooks](https://docs.rllm-project.com/cookbooks/overview.md): End-to-end AgentFlow examples that ship as installable plugins
- [Solver-Judge Flow](https://docs.rllm-project.com/cookbooks/solver_judge_flow.md): Multi-agent solver-judge system on the countdown task
- [AgentFlow and Evaluator](https://docs.rllm-project.com/core-concepts/agentflow-evaluator.md): The two protocols that define how agents run tasks and how results are scored
- [rLLM CLI](https://docs.rllm-project.com/core-concepts/cli-and-ui.md): Evaluate, train, and monitor agents from the command line and web dashboard
- [Episodes, trajectories, and steps](https://docs.rllm-project.com/core-concepts/episodes-trajectories-steps.md): The core data structures that represent agent interactions in rLLM
- [AgentTrainer and the training loop](https://docs.rllm-project.com/core-concepts/training.md): How rLLM closes the RL loop using AgentFlow and Evaluator
- [Supported datasets](https://docs.rllm-project.com/datasets.md): All benchmark datasets available for evaluation and training with rllm eval
- [Advantage estimator](https://docs.rllm-project.com/experimental/advantage-estimator.md): How the rLLM advantage estimator works, its role-level customization capabilities, and how to register custom estimators
- [Backend protocol](https://docs.rllm-project.com/experimental/backend-protocol.md): The abstract interface that decouples the unified trainer from specific training infrastructure, enabling pluggable backends
- [Configuration](https://docs.rllm-project.com/experimental/configuration.md): The unified configuration system that separates backend-agnostic settings from backend-specific configurations
- [Pre-computing advantage](https://docs.rllm-project.com/experimental/precompute-advantage.md): How to set step.advantage during workflow rollout and let the unified trainer consume it directly, enabling mixed-mode training
- [rLLM UI](https://docs.rllm-project.com/experimental/ui.md): Web interface for monitoring and analyzing rLLM training runs in real time
- [Unified trainer](https://docs.rllm-project.com/experimental/unified-trainer.md): The central orchestrator for backend-agnostic training in rLLM, managing the full training loop from episode generation to policy updates
- [Customizing the training loop](https://docs.rllm-project.com/guides/customizing-training.md): How to customize advantage computation, trajectory grouping, rejection sampling, and other training behavior without modifying rLLM source code
- [Distributed Training](https://docs.rllm-project.com/guides/distributed-training.md): Scale your RL training across multiple GPUs and nodes with Ray
- [Introduction to rLLM](https://docs.rllm-project.com/index.md): Train your AI agents with RL. Any framework. Minimal code changes.
- [Installation](https://docs.rllm-project.com/installation.md): Install rLLM with pip, uv, or Docker
- [Cogito, Ergo Ludo](https://docs.rllm-project.com/projects/cogito-ergo-ludo.md): An agent that learns to play games by reasoning and planning
- [Cut the Bill, Keep the Turns](https://docs.rllm-project.com/projects/cut-the-bill.md): Cost-efficient multi-turn search with reinforcement learning
- [DeepCoder](https://docs.rllm-project.com/projects/deep-coder.md): A fully open-source 14B coder matching O3-mini on competitive programming
- [DeepScaleR](https://docs.rllm-project.com/projects/deep-scaler.md): A 1.5B model that surpasses O1-Preview by scaling RL on math reasoning
- [DeepSWE](https://docs.rllm-project.com/projects/deep-swe.md): A 32B software engineering agent achieving 59% on SWE-Bench-Verified
- [Experiential Reinforcement Learning](https://docs.rllm-project.com/projects/experiential-rl.md): Reinforcement learning with an experience-reflection-consolidation loop
- [LLM-in-Sandbox](https://docs.rllm-project.com/projects/llm-in-sandbox.md): Building general agents by running LLMs in a virtual computer
- [PettingLLMs](https://docs.rllm-project.com/projects/petting-llms.md): On-policy reinforcement learning for multi-agent language systems
- [SETA](https://docs.rllm-project.com/projects/seta.md): Scaling environments for terminal agent training by CAMEL-AI
- [Terminal-Bench-RL](https://docs.rllm-project.com/projects/terminal-bench-rl.md): Training long-horizon terminal agents with reinforcement learning
- [Tongyi DeepResearch](https://docs.rllm-project.com/projects/tongyi-deep-research.md): Open-source AI research assistant by Alibaba NLP
- [V1: Parallel Self-Verification](https://docs.rllm-project.com/projects/v1-parallel-reasoners.md): Unifying generation and pairwise self-verification for parallel reasoners
- [Quick start (CLI)](https://docs.rllm-project.com/quickstart-cli.md): Evaluate and train your first agent using the rllm command line in minutes
- [Building a solver-judge workflow](https://docs.rllm-project.com/tutorials/solver-judge-workflow.md): A hands-on tutorial for building a multi-agent solver-judge AgentFlow in rLLM and training it end-to-end with the unified trainer

## Optional

- [Blog](https://rllm-project.com/blog.html)