rLLM supports two training backends: verl and tinker. This guide helps you choose the right backend for your project.

Quick Comparison

verl

Distributed, production-ready backend for large-scale training

tinker

Async-first backend for flexible and rapid development

Feature Comparison

| Feature | verl | tinker |
| --- | --- | --- |
| Python Version | >= 3.10 | >= 3.11 |
| Architecture | Ray-based distributed | Async-first service-based |
| Multi-GPU | ✅ Full support | ⚠️ Limited |
| Multi-Node | ✅ Full support | ❌ Not supported |
| LoRA | ✅ Via configuration | ✅ Native support |
| VLM Support | ✅ Qwen2-VL, Qwen3-VL | ⚠️ Limited |
| Distributed Training | ✅ FSDP, tensor parallel | ⚠️ Single node |
| Inference Engine | vLLM, SGLang | tinker service |
| Configuration | Complex (Hydra + verl) | Simple (Hydra) |
| Learning Curve | Steeper | Gentler |
| Async Support | Built-in | Native |
| Checkpointing | Advanced (Ray) | Standard |
| Resource Management | Ray resource pools | Service-based |
| Production Ready | ✅ Yes | ⚠️ Development |

Detailed Comparison

Architecture

Ray-Based Distributed System

verl uses Ray to orchestrate distributed worker groups:
┌─────────────────────────────────────────┐
│          Ray Cluster                    │
│                                         │
│  ┌──────────────┐  ┌──────────────┐   │
│  │ Actor-Rollout│  │    Critic    │   │
│  │   Workers    │  │   Workers    │   │
│  └──────────────┘  └──────────────┘   │
│                                         │
│  ┌──────────────┐  ┌──────────────┐   │
│  │  Reference   │  │    vLLM/     │   │
│  │   Policy     │  │   SGLang     │   │
│  └──────────────┘  └──────────────┘   │
└─────────────────────────────────────────┘
Key Components:
  • Actor-Rollout Workers: Combined training and generation
  • Critic Workers: Value function estimation
  • Reference Policy: Frozen policy for KL divergence
  • Hybrid Engine: Efficient async trajectory generation
Use Cases:
  • Large-scale distributed training
  • Multi-node GPU clusters
  • Production deployments
  • Vision-language models
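The hybrid engine's async trajectory generation can be sketched with `asyncio`. The function names below are illustrative placeholders, not the actual verl API; a real engine would call out to vLLM or SGLang instead of sleeping:

```python
import asyncio
import random

async def generate_trajectory(prompt: str) -> dict:
    """Simulate one rollout; stands in for a vLLM/SGLang generation call."""
    await asyncio.sleep(random.uniform(0.001, 0.005))  # fake generation latency
    return {"prompt": prompt, "response": f"answer to {prompt}", "reward": 1.0}

async def collect_batch(prompts: list[str]) -> list[dict]:
    """Launch all rollouts concurrently instead of sequentially."""
    return await asyncio.gather(*(generate_trajectory(p) for p in prompts))

trajectories = asyncio.run(collect_batch([f"q{i}" for i in range(8)]))
print(len(trajectories))
```

The point of the pattern is that slow, variable-latency generations overlap rather than serialize, which is what makes high-throughput rollout collection possible.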

Installation & Dependencies

# Python >= 3.10
uv pip install "rllm[verl] @ git+https://github.com/rllm-org/rllm.git"

# Dependencies:
# - verl==0.6.1
# - vllm>=0.10.2,<=0.11.0
# - torch>=2.8.0
# - flash-attn>=2.8.1
# - qwen-vl-utils (for VLM)

Configuration Complexity

More Complex Configuration

verl requires configuring Ray resources, worker groups, and FSDP:
actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-Math-7B-Instruct
    lora:
      rank: 64
      alpha: 128
  actor:
    fsdp_config:
      param_offload: false
      grad_offload: false
  rollout:
    mode: async  # Required
    n: 16
  
resource_pool_config:
  actor_rollout_gpu: 4
  critic_gpu: 2
  ref_policy_gpu: 2

data:
  train_batch_size: 32
  max_prompt_length: 2048
  max_response_length: 2048

algorithm:
  adv_estimator: grpo
  gamma: 1.0

trainer:
  total_epochs: 3
  save_freq: 100
Pros:
  • Fine-grained control over resources
  • Advanced features (FSDP, tensor parallel)
  • Production-tested configurations
Cons:
  • Steeper learning curve
  • More configuration options
  • Requires Ray knowledge

LoRA Support

Configuration-Based LoRA
actor_rollout_ref:
  model:
    path: Qwen/Qwen2.5-Math-7B-Instruct
    lora:
      rank: 64
      alpha: 128
      target_modules:
        - q_proj
        - k_proj
        - v_proj
        - o_proj
        - gate_proj
        - up_proj
        - down_proj
Override from the command line:

python train_agent.py \
  actor_rollout_ref.model.lora.rank=64 \
  actor_rollout_ref.model.lora.alpha=128
Features:
  • Full control over target modules
  • Integrated with FSDP
  • Reference policy without LoRA
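To see why LoRA is attractive, here is a rough trainable-parameter count for the target modules above. The per-module shapes are hypothetical stand-ins for a ~7B model, not exact Qwen2.5 dimensions:

```python
# Approximate per-layer module shapes as (out_features, in_features); illustrative only.
hidden = 3584  # hypothetical hidden size
modules = {
    "q_proj": (hidden, hidden), "k_proj": (512, hidden), "v_proj": (512, hidden),
    "o_proj": (hidden, hidden),
    "gate_proj": (18944, hidden), "up_proj": (18944, hidden), "down_proj": (hidden, 18944),
}

def lora_params(rank: int) -> int:
    # Each adapted weight W (out x in) gains two low-rank factors:
    # A (rank x in) and B (out x rank), so rank * (in + out) new parameters.
    return sum(rank * (out + inp) for out, inp in modules.values())

full = sum(out * inp for out, inp in modules.values())
per_layer = lora_params(rank=64)
print(f"LoRA trains {per_layer / full:.2%} of each adapted layer's weights")
```

Even at rank 64 across all seven projections, the adapters are a small fraction of the full weights, which is why LoRA runs comfortably where full fine-tuning does not.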

Vision-Language Models (VLM)

Full VLM Support

verl supports Qwen2-VL and Qwen3-VL with multimodal processing:
import hydra
from rllm.trainer.agent_trainer import AgentTrainer

@hydra.main(
    config_path="pkg://rllm.trainer.config",
    config_name="agent_ppo_trainer",
    version_base=None
)
def main(config):
    trainer = AgentTrainer(
        workflow_class=Geo3KWorkflow,
        workflow_args={"reward_function": f1_reward_fn},
        config=config,
        train_dataset=train_dataset,
        val_dataset=test_dataset,
    )
    trainer.train()
Launch with a VLM checkpoint:

python train_vlm.py \
  actor_rollout_ref.model.path=Qwen/Qwen2-VL-7B-Instruct \
  data.return_multi_modal_inputs=true
Supported Models:
  • Qwen2-VL-7B-Instruct
  • Qwen2-VL-72B-Instruct
  • Qwen3-VL models
Features:
  • Image grid position IDs
  • Multimodal processors
  • Vision-aware tokenization
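A multimodal training sample in the chat format commonly used by Qwen-VL processors looks roughly like this. The field names follow that convention; the exact schema rLLM expects may differ, and the image path is a made-up example:

```python
# Hypothetical multimodal sample; check rLLM's dataset docs for the exact schema.
sample = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": "geo3k/train/0001.png"},
                {"type": "text", "text": "Find the measure of angle ABC."},
            ],
        }
    ],
}

# A multimodal processor walks the content list and routes image parts
# to the vision tower and text parts to the tokenizer.
image_parts = [p for m in sample["messages"] for p in m["content"] if p["type"] == "image"]
print(len(image_parts))
```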

Distributed Training

Full Distributed Support

verl supports multi-GPU and multi-node training:
# Multi-GPU on single node
python train_agent.py \
  resource_pool_config.actor_rollout_gpu=8 \
  actor_rollout_ref.actor.fsdp_config.param_offload=false

# Multi-node cluster
ray start --head --port=6379
# On other nodes:
ray start --address=head-node-ip:6379

python train_agent.py \
  resource_pool_config.actor_rollout_gpu=32
Features:
  • FSDP (Fully Sharded Data Parallel)
  • Tensor parallelism via vLLM
  • Resource pool management
  • Ray cluster orchestration
Resource Configuration:
resource_pool_config:
  actor_rollout_gpu: 8  # Actor-rollout workers
  critic_gpu: 2         # Critic workers
  ref_policy_gpu: 2     # Reference policy workers
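A quick sanity check that a resource pool configuration fits the cluster can be scripted before launching. `validate_pool` is a hypothetical helper, not part of verl:

```python
def validate_pool(pool: dict[str, int], available_gpus: int) -> int:
    """Return the total GPUs requested, raising if the cluster cannot satisfy them."""
    requested = sum(pool.values())
    if requested > available_gpus:
        raise ValueError(f"requested {requested} GPUs but only {available_gpus} available")
    return requested

pool = {"actor_rollout_gpu": 8, "critic_gpu": 2, "ref_policy_gpu": 2}
print(validate_pool(pool, available_gpus=16))  # 12
```

Catching an over-subscribed pool up front is cheaper than waiting for Ray placement groups to hang at scheduling time.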

When to Use Each Backend

Use verl When:

  • Training on multiple GPUs or nodes
  • Production deployments requiring reliability
  • Large models (> 7B parameters) needing FSDP
  • High-throughput training pipelines
  • Training Qwen2-VL or Qwen3-VL models
  • Multimodal agent training
  • Image-based reasoning tasks
  • OCR and visual question answering
  • Custom advantage estimators
  • Critic network training
  • Reference policy with KL divergence
  • Complex reward shaping
  • Multi-node GPU clusters
  • Tensor parallel inference
  • Memory-constrained large models
  • High-throughput rollout generation

Use tinker When:

  • Quick experiments and iteration
  • Testing new agent architectures
  • Developing custom workflows
  • Learning rLLM framework
  • Parameter-efficient fine-tuning
  • Limited GPU memory (single GPU)
  • Fast adaptation of pretrained models
  • Deployment to Fireworks AI
  • Training on a single machine
  • Small to medium models (< 7B)
  • Development environments
  • Limited computational resources
  • Building custom agent workflows
  • Multi-step reasoning tasks
  • Tool-using agents
  • Async-first architectures

Performance Characteristics

Training Speed

| Metric | verl | tinker |
| --- | --- | --- |
| Single GPU | Fast | Fast |
| Multi-GPU | Very Fast (scaling) | Not supported |
| Startup Time | Slower (Ray init) | Faster |
| Throughput | High (distributed) | Medium (single node) |
| Memory Efficiency | High (FSDP) | Medium |

Resource Requirements

Minimum Requirements:
  • 1 GPU with 24GB+ VRAM (for 7B models)
  • 32GB+ system RAM
  • Python >= 3.10
  • CUDA 11.8+ or 12.1+
Recommended for Production:
  • 4-8 GPUs (A100 or H100)
  • 128GB+ system RAM
  • NVMe storage for checkpoints
  • Multi-node Ray cluster
Memory Usage (7B model, batch_size=32):
  • Full fine-tuning: ~40GB VRAM
  • LoRA (rank=64): ~28GB VRAM
  • With FSDP: ~20GB per GPU (4 GPUs)
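The FSDP numbers above follow from parameter sharding. A back-of-the-envelope calculation for the weight shard alone (gradients, optimizer states, and activations add on top, so treat this as a lower bound, not a prediction):

```python
def fsdp_weight_shard_gb(n_params: float, bytes_per_param: int, world_size: int) -> float:
    """FSDP shards parameters evenly across ranks; this is only the weight shard.
    Gradients, optimizer states, and activations add further memory on top."""
    return n_params * bytes_per_param / world_size / 1024**3

seven_b = 7e9
for gpus in (1, 2, 4, 8):
    print(f"{gpus} GPUs: ~{fsdp_weight_shard_gb(seven_b, 2, gpus):.1f} GB of bf16 weights per GPU")
```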

Migration Between Backends

From tinker to verl

1. Update Configuration

Convert tinker config to verl format:
- model:
-   name: "Qwen/Qwen2.5-Math-7B-Instruct"
-   lora_rank: 32
+ actor_rollout_ref:
+   model:
+     path: "Qwen/Qwen2.5-Math-7B-Instruct"
+     lora:
+       rank: 32
2. Update Training Script

Change backend parameter:
trainer = AgentTrainer(
    config=config,
    agent_class=MathAgent,
    env_class=SingleTurnEnvironment,
    backend="verl",  # Changed from "tinker"
    # ...
)
3. Install verl Backend

uv pip install -e .[verl]
4. Update Hydra Config

Use verl config file:
@hydra.main(
    config_path="pkg://rllm.trainer.config",
    config_name="agent_ppo_trainer",  # verl config
    version_base=None
)
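The config translation in step 1 can be mechanized. `convert_tinker_to_verl` below is a hypothetical helper covering only the fields shown above, not a complete converter:

```python
def convert_tinker_to_verl(tinker_cfg: dict) -> dict:
    """Map a tinker-style model block onto verl's actor_rollout_ref layout.
    Handles only model.name and model.lora_rank; extend as needed."""
    model = tinker_cfg["model"]
    verl_model = {"path": model["name"]}
    if "lora_rank" in model:
        verl_model["lora"] = {"rank": model["lora_rank"]}
    return {"actor_rollout_ref": {"model": verl_model}}

cfg = {"model": {"name": "Qwen/Qwen2.5-Math-7B-Instruct", "lora_rank": 32}}
print(convert_tinker_to_verl(cfg))
```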

From verl to tinker

1. Check Python Version

Ensure Python >= 3.11:
python --version  # Must be 3.11+
uv venv --python 3.11
2. Simplify Configuration

Convert verl config to tinker format:
- actor_rollout_ref:
-   model:
-     path: "Qwen/Qwen2.5-Math-7B-Instruct"
-     lora:
-       rank: 32
+ model:
+   name: "Qwen/Qwen2.5-Math-7B-Instruct"
+   lora_rank: 32
3. Update Training Script

trainer = AgentTrainer(
    config=config,
    agent_class=MathAgent,
    env_class=SingleTurnEnvironment,
    backend="tinker",  # Changed from default/verl
    # ...
)
4. Install tinker Backend

uv pip install -e .[tinker]

Recommendations by Use Case

Research & Experimentation

Recommendation: Start with tinker, scale to verl if needed
  • Begin with tinker for rapid iteration
  • Switch to verl when:
    • Need multi-GPU training
    • Training VLM models
    • Scaling to larger datasets

Production Deployment

Recommendation: Use verl
  • Production-tested infrastructure
  • Scalable to multi-node clusters
  • Better resource management
  • Advanced checkpointing

LoRA Fine-Tuning

Recommendation: tinker or verl (equal)
  • tinker: Simpler configuration
  • verl: Better for distributed LoRA

Vision-Language Tasks

Recommendation: Use verl
  • Full Qwen-VL support
  • Multimodal processors
  • Tested on vision datasets

Summary

Choose verl for:

  • Production deployments
  • Multi-GPU/multi-node training
  • Vision-language models
  • Large-scale experiments

Choose tinker for:

  • Rapid prototyping
  • Single-node training
  • LoRA fine-tuning
  • Workflow development
Both backends are actively maintained and share the same core rLLM framework. Your choice depends on scale and requirements, not quality.

See Also