The tinker backend is rLLM’s async-first training backend that provides a unified architecture for both agent and workflow training. It’s designed for flexibility and ease of use, with built-in support for LoRA and seamless integration with the tinker service.
Overview
The tinker backend features:
- Async-First Design: Native async/await support throughout the training pipeline
- Unified Architecture: Single codebase for agent and workflow training
- Service-Based: Uses tinker service for model serving and training
- Simplified API: Cleaner configuration and easier setup
Python Version: Requires Python >= 3.11 for the tinker backend
Installation
Install rLLM with the tinker backend:
Dependencies
The tinker backend includes (from pyproject.toml):
Basic Usage
Agent Training
Train a math agent with the tinker backend. The recommended path is to use the cookbooks/math cookbook, which already wires an AgentFlow + Evaluator against the unified trainer:
train.py
Workflow Training
The tinker backend also supports workflow-based training:
train_workflow_tinker.py
Configuration
The tinker backend uses the tinker_rl_trainer.yaml configuration:
Model Configuration
- Model path (HuggingFace or local)
- LoRA rank (parameter-efficient fine-tuning)
- Train LoRA on the output embedding layer
- Train LoRA on attention layers
- Train LoRA on MLP layers
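The LoRA flags above decide which weight matrices get low-rank adapters. As a rough sketch of the mechanics (illustrative only, not the tinker implementation), an adapted layer computes W·x + (alpha/r)·B·A·x, where A and B are the trainable rank-r factors and B starts at zero so training begins from the base model:

```python
def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

# Toy sizes: hidden dim 3, LoRA rank r = 1
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen base weight
A = [[0.5, -0.5, 0.25]]   # trainable down-projection (r x d)
B = [[0.0], [0.0], [0.0]] # trainable up-projection, zero-init (d x r)
alpha, r = 16.0, 1

def lora_forward(x):
    # base path plus the low-rank update, scaled by alpha / r
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]

x = [1.0, 2.0, 3.0]
print(lora_forward(x))  # B is zero-init, so this equals the base output
```

Because B is zero-initialized, the adapter contributes nothing at step 0 and only the low-rank factors receive gradients during training.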
Training Configuration
- Number of rollouts per prompt (for GRPO)
- Number of rollouts per validation prompt
- Learning rate for the optimizer
- Maximum sequence length (prompt + response)
- Number of minibatches per update (currently only 1 is fully tested)
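Since only one minibatch per update is fully tested, minibatching is mostly a forward-looking knob; conceptually it just splits each rollout batch into equal chunks, roughly like this (illustrative sketch, not the trainer’s actual code):

```python
def split_minibatches(batch, num_minibatches=1):
    # Split a rollout batch into equal-sized minibatches; with the
    # default num_minibatches=1 the whole batch is a single update.
    if len(batch) % num_minibatches != 0:
        raise ValueError("batch size must be divisible by num_minibatches")
    size = len(batch) // num_minibatches
    return [batch[i * size:(i + 1) * size] for i in range(num_minibatches)]

print(split_minibatches(list(range(8)), 2))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```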
Algorithm Configuration
- Advantage estimator: “grpo”, “reinforce”, or “distill”
- Discount factor for rewards
- Grouping level: “trajectory” or “step”
- Normalize advantages by standard deviation in GRPO
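For intuition, the “grpo” estimator with standard-deviation normalization can be sketched in plain Python: the rollouts for one prompt form a group, and each reward is centered on the group mean (toy illustration, not the trainer’s code):

```python
import statistics

def grpo_advantages(group_rewards, norm_by_std=True):
    # Group-relative advantage: reward minus the group mean, optionally
    # normalized by the group's standard deviation.
    mean = statistics.fmean(group_rewards)
    adv = [r - mean for r in group_rewards]
    if norm_by_std:
        std = statistics.pstdev(group_rewards)
        if std > 0:  # guard against all-identical rewards
            adv = [a / std for a in adv]
    return adv

# 4 rollouts of the same prompt, binary rewards
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # [1.0, -1.0, -1.0, 1.0]
```

This is why the number of rollouts per prompt matters for GRPO: the group itself provides the baseline, with no separate value network.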
Data Configuration
- Training batch size
- Validation batch size
- Maximum prompt length in tokens
- Maximum response length in tokens
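These limits cap sequence lengths before training. A common convention, assumed here for illustration (the actual truncation side may differ), keeps the tail of the prompt and the head of the response:

```python
def truncate_pair(prompt_ids, response_ids, max_prompt_len, max_response_len):
    # Keep the last max_prompt_len prompt tokens (most recent context)
    # and the first max_response_len response tokens.
    return prompt_ids[-max_prompt_len:], response_ids[:max_response_len]

p, r = truncate_pair(list(range(10)), list(range(100, 110)), 4, 3)
print(p, r)  # [6, 7, 8, 9] [100, 101, 102]
```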
Trainer Configuration
- Number of training epochs
- Validation frequency (in steps)
- Checkpoint save frequency (in steps)
- Checkpoint directory
LoRA Training
The tinker backend has native LoRA support built in:
Tinker Service
Local Service
By default, the tinker backend uses a local service:
Remote Service
Connect to a remote tinker service:
Sampling Configuration
Configure sampling parameters:
- Sampling temperature
- Top-p (nucleus) sampling parameter
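As a refresher on what the top-p parameter does, nucleus sampling keeps the smallest set of tokens whose cumulative probability reaches top_p and renormalizes over that set. A pure-Python sketch (illustrative only, not the tinker sampler):

```python
import math

def top_p_filter(logits, top_p=0.95):
    # Softmax over the logits, then keep the highest-probability tokens
    # until their cumulative mass reaches top_p, and renormalize.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

filtered = top_p_filter([2.0, 1.0, 0.0, -1.0], top_p=0.9)
print(sorted(filtered))  # token ids that survive the nucleus cut
```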
Rollout Engine Configuration
- Reasoning effort level: “low”, “medium”, or “high”
- Accumulate reasoning tokens across steps
- Disable thinking tokens in responses
- Bypass the renderer and use the parser directly
Checkpointing
The tinker backend provides flexible checkpointing:
Automatic Checkpointing
Resume from Checkpoint
Resume from a tinker checkpoint:
Manual Checkpoint Loading
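One way to load a checkpoint manually is to resolve the most recent step directory first. The step_&lt;n&gt; layout below is a hypothetical convention for illustration; the real tinker checkpoint layout may differ:

```python
import os
import re
import tempfile

def latest_checkpoint(ckpt_dir):
    # Return the highest-numbered step_<n> entry in ckpt_dir, or None.
    # (step_<n> is an assumed layout, purely for illustration.)
    steps = [
        int(m.group(1))
        for name in os.listdir(ckpt_dir)
        if (m := re.fullmatch(r"step_(\d+)", name))
    ]
    return os.path.join(ckpt_dir, f"step_{max(steps)}") if steps else None

with tempfile.TemporaryDirectory() as ckpt_dir:
    for step in (10, 20, 30):
        os.makedirs(os.path.join(ckpt_dir, f"step_{step}"))
    print(os.path.basename(latest_checkpoint(ckpt_dir)))  # step_30
```

Parsing the step number as an integer (rather than sorting names lexically) keeps step_30 ahead of step_100-style neighbors.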
Distillation Support
The tinker backend supports knowledge distillation from teacher models:
Advanced Features
Fused Forward-Backward and Optimizer Step
For better performance, tinker can fuse the forward-backward pass with the optimizer step:
This optimization reduces overhead by combining gradient computation and parameter updates into a single operation.
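Conceptually, the fusion folds the two stages into one call. A toy 1-D example (not the tinker API) shows that separate and fused updates compute the same result, while the fused form never materializes a standalone gradient:

```python
# Toy loss f(w) = (w - 3)^2 with gradient 2 * (w - 3).
def grad(w):
    return 2.0 * (w - 3.0)

def step_separate(w, lr):
    g = grad(w)        # backward pass: materialize the gradient
    return w - lr * g  # then a distinct optimizer step

def step_fused(w, lr):
    # Single combined routine: gradient and update in one expression,
    # so no intermediate gradient buffer needs to be kept around.
    return w - lr * 2.0 * (w - 3.0)

w = 0.0
for _ in range(100):
    w = step_fused(w, 0.1)
print(round(w, 6))  # converges to the minimum at 3.0
```

In a real training loop the saved “buffer” is the full set of per-parameter gradients, which is where the overhead reduction comes from.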
Multi-Step Agents
For multi-turn agent interactions:
Workflow Parallel Tasks
Control parallelism in workflow execution:
Monitoring
Configure logging backends:
Example Configuration
Complete configuration for MATH dataset training:
config.yaml
Performance Optimization
Increase Batch Size
Tune data.train_batch_size and training.group_size for better GPU utilization.
Use LoRA
Enable LoRA for faster training and lower memory usage.
Fuse Operations
Set fuse_forward_backward_and_optim_step=true for reduced overhead.
Parallel Workflows
Increase workflow.n_parallel_tasks for workflow-based training.
Troubleshooting
Python Version Error
tinker requires Python >= 3.11. Upgrade your Python version:
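A quick guard at the top of a training script can surface the version problem early (a simple sketch, not part of rLLM itself):

```python
import sys

def require_python(minimum=(3, 11)):
    # Fail fast if the interpreter is too old for the tinker backend.
    if sys.version_info < minimum:
        raise RuntimeError(
            f"tinker backend requires Python >= {minimum[0]}.{minimum[1]}, "
            f"found {sys.version.split()[0]}"
        )

try:
    require_python()
    print("Python version OK")
except RuntimeError as err:
    print(err)
```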
Sampling Parameter Warning
If you see warnings about temperature or top_p: setting these away from 1.0 can cause logprob issues.
Minibatch Warning
Currently only num_minibatches=1 is fully tested:
Checkpoint Not Found
Ensure the checkpoint directory exists:
Tinker Service Connection Failed
If using remote service, verify the URL:
Comparison with verl
Key differences from the verl backend:
| Feature | tinker | verl |
|---|---|---|
| Python Version | >= 3.11 | >= 3.10 |
| Architecture | Async-first | Ray-based |
| LoRA Support | Native | Via config |
| VLM Support | Limited | Full (Qwen2-VL, Qwen3-VL) |
| Distributed Training | Limited | Multi-node Ray |
| Configuration | Simpler | More complex |
| Service Model | tinker service | vLLM/SGLang |
See Also
verl Backend
Distributed training with verl
Backend Comparison
Compare tinker vs verl features
tinker Cookbook
Official tinker cookbook repository
Agent Trainer
Learn about AgentTrainer API

