# rLLM

## Docs

- [AWS Bedrock AgentCore](https://docs.rllm-project.com/agent-runtimes/agentcore.md): Train agents using AWS Bedrock AgentCore Runtime for secure, massively parallel rollouts without managing any infra
- [AgentFlow](https://docs.rllm-project.com/api/agentflow.md): The protocol for authoring agents that run identically at eval time and at training time
- [Data](https://docs.rllm-project.com/api/data.md): Dataset management and preprocessing utilities
- [Parsers](https://docs.rllm-project.com/api/parsers.md): Parsers for tool calling and chat formatting
- [Rewards](https://docs.rllm-project.com/api/rewards.md): Reward functions for evaluating agent performance
- [RolloutEngine](https://docs.rllm-project.com/api/rollout-engine.md): Model inference engine for agent rollouts
- [Tools](https://docs.rllm-project.com/api/tools.md): Tool system for enabling agent capabilities
- [Trainer](https://docs.rllm-project.com/api/trainer.md): Training infrastructure for RL-based agent learning
- [AgentWorkflowEngine](https://docs.rllm-project.com/api/workflow-engine.md): Engine for executing workflow-based agent training and evaluation
- [Workflows](https://docs.rllm-project.com/api/workflows.md): Workflow orchestration for agent execution and training
- [Backend Comparison](https://docs.rllm-project.com/backends/comparison.md): Compare verl and tinker backends to choose the right one for your use case
- [Tinker Backend](https://docs.rllm-project.com/backends/tinker.md): Training with tinker - async-first RL training with unified architecture
- [verl Backend](https://docs.rllm-project.com/backends/verl.md): Training with verl - distributed RL training with vLLM and SGLang support
- [Deepcoder](https://docs.rllm-project.com/cookbooks/deepcoder.md): Single-turn coding agent with hidden-test grading
- [FinQA](https://docs.rllm-project.com/cookbooks/finqa.md): Multi-turn financial-QA agent with 4 tools over SEC 10-K tables
- [FrozenLake](https://docs.rllm-project.com/cookbooks/frozenlake.md): Multi-turn AgentFlow that drives a Gymnasium environment
- [Geo3K](https://docs.rllm-project.com/cookbooks/geo3k.md): Single-turn VLM geometry agent on the Geometry3K dataset
- [Math](https://docs.rllm-project.com/cookbooks/math.md): Single-turn math agent with \boxed{} answer extraction
- [Math Tool Agent](https://docs.rllm-project.com/cookbooks/math_tool_agent.md): Multi-turn math agent with calculator tool via OpenAI function calling
- [Cookbooks](https://docs.rllm-project.com/cookbooks/overview.md): End-to-end AgentFlow examples that ship as installable plugins
- [Solver-Judge Flow](https://docs.rllm-project.com/cookbooks/solver_judge_flow.md): Multi-agent solver-judge system on the countdown task
- [AgentFlow and Evaluator](https://docs.rllm-project.com/core-concepts/agentflow-evaluator.md): The two protocols that define how agents run tasks and how results are scored
- [rLLM CLI](https://docs.rllm-project.com/core-concepts/cli-and-ui.md): Evaluate, train, and monitor agents from the command line and web dashboard
- [Episodes, trajectories, and steps](https://docs.rllm-project.com/core-concepts/episodes-trajectories-steps.md): The core data structures that represent agent interactions in rLLM
- [AgentTrainer and the training loop](https://docs.rllm-project.com/core-concepts/training.md): How rLLM closes the RL loop using AgentFlow and Evaluator
- [Supported datasets](https://docs.rllm-project.com/datasets.md): All benchmark datasets available for evaluation and training with rllm eval
- [Advantage estimator](https://docs.rllm-project.com/experimental/advantage-estimator.md): How the rLLM advantage estimator works, its role-level customization capabilities, and how to register custom estimators
- [Backend protocol](https://docs.rllm-project.com/experimental/backend-protocol.md): The abstract interface that decouples the unified trainer from specific training infrastructure, enabling pluggable backends
- [Configuration](https://docs.rllm-project.com/experimental/configuration.md): The unified configuration system that separates backend-agnostic settings from backend-specific configurations
- [Pre-computing advantage](https://docs.rllm-project.com/experimental/precompute-advantage.md): How to set step.advantage during workflow rollout and let the unified trainer consume it directly, enabling mixed-mode training
- [rLLM UI](https://docs.rllm-project.com/experimental/ui.md): Web interface for monitoring and analyzing rLLM training runs in real time
- [Unified trainer](https://docs.rllm-project.com/experimental/unified-trainer.md): The central orchestrator for backend-agnostic training in rLLM, managing the full training loop from episode generation to policy updates
- [Customizing the training loop](https://docs.rllm-project.com/guides/customizing-training.md): How to customize advantage computation, trajectory grouping, rejection sampling, and other training behavior without modifying rLLM source code
- [Distributed Training](https://docs.rllm-project.com/guides/distributed-training.md): Scale your RL training across multiple GPUs and nodes with Ray
- [Introduction to rLLM](https://docs.rllm-project.com/index.md): Train your AI agents with RL. Any framework. Minimal code changes.
- [Installation](https://docs.rllm-project.com/installation.md): Install rLLM with pip, uv, or Docker
- [Cogito, Ergo Ludo](https://docs.rllm-project.com/projects/cogito-ergo-ludo.md): An agent that learns to play games by reasoning and planning
- [Cut the Bill, Keep the Turns](https://docs.rllm-project.com/projects/cut-the-bill.md): Cost-efficient multi-turn search with reinforcement learning
- [DeepCoder](https://docs.rllm-project.com/projects/deep-coder.md): A fully open-source 14B coder matching O3-mini on competitive programming
- [DeepScaleR](https://docs.rllm-project.com/projects/deep-scaler.md): A 1.5B model that surpasses O1-Preview by scaling RL on math reasoning
- [DeepSWE](https://docs.rllm-project.com/projects/deep-swe.md): A 32B software engineering agent achieving 59% on SWE-Bench-Verified
- [Experiential Reinforcement Learning](https://docs.rllm-project.com/projects/experiential-rl.md): Reinforcement learning with an experience-reflection-consolidation loop
- [LLM-in-Sandbox](https://docs.rllm-project.com/projects/llm-in-sandbox.md): Building general agents by running LLMs in a virtual computer
- [PettingLLMs](https://docs.rllm-project.com/projects/petting-llms.md): On-policy reinforcement learning for multi-agent language systems
- [SETA](https://docs.rllm-project.com/projects/seta.md): Scaling environments for terminal agent training by CAMEL-AI
- [Terminal-Bench-RL](https://docs.rllm-project.com/projects/terminal-bench-rl.md): Training long-horizon terminal agents with reinforcement learning
- [Tongyi DeepResearch](https://docs.rllm-project.com/projects/tongyi-deep-research.md): Open-source AI research assistant by Alibaba NLP
- [V1: Parallel Self-Verification](https://docs.rllm-project.com/projects/v1-parallel-reasoners.md): Unifying generation and pairwise self-verification for parallel reasoners
- [Quick start (CLI)](https://docs.rllm-project.com/quickstart-cli.md): Evaluate and train your first agent using the rllm command line in minutes
- [Building a solver-judge workflow](https://docs.rllm-project.com/tutorials/solver-judge-workflow.md): A hands-on tutorial for building a multi-agent solver-judge AgentFlow in rLLM and training it end-to-end with the unified trainer

## Optional

- [Blog](https://rllm-project.com/blog.html)