FireworksEngine (DeploymentSampler), and policy updates use FireworksPolicyTrainer with WeightSyncer for hot-loading weights into the inference deployment.
Overview
Fireworks backend features:
- Managed infrastructure: trainer jobs and inference deployments are provisioned automatically at startup and torn down at shutdown
- Async-first design: native async/await support inherited from the tinker backend path
- Unified architecture: the same AgentTrainer API for agent and workflow training
- Server-side losses: built-in GRPO, DAPO, CISPO, and GSPO kernels on Firetitan
- Weight hot-loading: WeightSyncer syncs trainer weights to the rollout deployment after each step
Python version: Requires Python >= 3.11 (same as tinker; Fireworks training SDK depends on tinker types).
API key: Set FIREWORKS_API_KEY in your environment before training. The trainer job and inference deployment are created on Fireworks at startup and deleted on shutdown.
Installation
Install rLLM with the Fireworks backend:
Dependencies
The Fireworks backend includes (from pyproject.toml):
training.provision.init_fireworks_infra and the RL loss utilities used by FireworksPolicyTrainer.
Models and training shapes
Fireworks RL training uses two kinds of IDs from the shared public catalog under the fireworks account:
| Concept | ID format | rLLM config field |
|---|---|---|
| Base model | accounts/fireworks/models/<model> | model.name |
| Training shape | accounts/fireworks/trainingShapes/<shape> | fireworks_config.policy_trainer_shape_id |
A base model is the model you fine-tune, referenced by its full accounts path (for example accounts/fireworks/models/qwen3-4b). A training shape is a pre-configured GPU and runtime profile: you pass the full path (for example accounts/fireworks/trainingShapes/qwen3-4b-minimum-lora) and the SDK resolves the pinned version, image tag, GPU layout, and linked deployment shape for you. See the Fireworks training shapes documentation for the searchable catalog, per-model RFT support matrix, and shape roles.
For rLLM you only need to pick a compatible model + training shape pair from that catalog. You do not need to specify versioned shape refs, image tags, or GPU counts manually — the shape owns that infrastructure.
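As a quick pre-flight check, the sketch below verifies that model.name and fireworks_config.policy_trainer_shape_id use the full catalog paths from the table above. check_catalog_ids is a hypothetical helper, not part of rLLM or the Fireworks SDK, and it only checks the path layout: it cannot tell whether a shape exists or is compatible with the chosen model.

```python
import re

# Hypothetical helper (not part of rLLM or the Fireworks SDK): it only checks
# that both IDs use the full "accounts/fireworks/..." layout from the table.
MODEL_RE = re.compile(r"^accounts/fireworks/models/[A-Za-z0-9][A-Za-z0-9._-]*$")
SHAPE_RE = re.compile(r"^accounts/fireworks/trainingShapes/[A-Za-z0-9][A-Za-z0-9._-]*$")


def check_catalog_ids(model_name: str, shape_id: str) -> None:
    """Raise ValueError if either ID is not a full catalog path."""
    if not MODEL_RE.match(model_name):
        raise ValueError(
            f"model.name should look like accounts/fireworks/models/<model>, got {model_name!r}"
        )
    if not SHAPE_RE.match(shape_id):
        raise ValueError(
            "fireworks_config.policy_trainer_shape_id should look like "
            f"accounts/fireworks/trainingShapes/<shape>, got {shape_id!r}"
        )


# A compatible pair taken from the model table in this section.
check_catalog_ids(
    "accounts/fireworks/models/qwen3-4b",
    "accounts/fireworks/trainingShapes/qwen3-4b-minimum-lora",
)
```

A bare name such as qwen3-4b is rejected, which catches the most common mistake of passing a short model name instead of the accounts path.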
Shape roles (RFT / RL)
During reinforcement fine-tuning (RFT), Fireworks deploys separate trainer and inference resources. The catalog lists shapes by role:
| Role | Use in rLLM | When |
|---|---|---|
| LoRA Policy | fireworks_config.policy_trainer_shape_id with model.lora_rank > 0 | Default path — parameter-efficient RL; policy trainer also serves as frozen reference |
| Policy | fireworks_config.policy_trainer_shape_id with model.lora_rank=0 | Full-parameter training |
| Forward-only | fireworks_config.reference_trainer_shape_id | Separate frozen reference for full-parameter RL with KL (kl_beta > 0) |
fireworks_infra.deployments.rollout.deployment_id.
Picking a model for RL
In the training shapes catalog, select a model and check the RFT LoRA or RFT Full-Param row in the training method support matrix. Only models marked as supported there can be used with the Fireworks backend for RL. Examples of models with RFT LoRA support (see the live catalog for GPU totals and context limits):
| Model | Base model ID | Example LoRA training shape |
|---|---|---|
| Qwen 3 4B | accounts/fireworks/models/qwen3-4b | accounts/fireworks/trainingShapes/qwen3-4b-minimum-lora |
| Qwen 3 8B | accounts/fireworks/models/qwen3-8b | accounts/fireworks/trainingShapes/qwen3-8b-256k-h200-lora |
| Qwen 3.5 9B | accounts/fireworks/models/qwen3p5-9b | accounts/fireworks/trainingShapes/qwen3p5-9b-256k-lora |
| Qwen 3.5 35B A3B | accounts/fireworks/models/qwen3p5-35b-a3b | accounts/fireworks/trainingShapes/qwen3p5-35b-a3b-256k-lora |
| Llama 3.3 70B Instruct | accounts/fireworks/models/llama-v3p3-70b-instruct | accounts/fireworks/trainingShapes/llama-v3p3-70b-instruct-128k-lora-b200 |
Basic Usage
Workflow Training
Train a countdown workflow on Fireworks using the unified trainer. See examples/countdown/unified_trainer/ for the full example:
train_countdown_unified_fireworks.py
Agent Training
Use the same AgentTrainer API with an AgentFlow and Evaluator (see the math cookbook):
Architecture
The Fireworks backend extends TinkerBackend and overrides only what differs: FireworksEngine, FireworksPolicyTrainer, DCP checkpointing, weight sync, and model promotion.
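To make "extends and overrides only what differs" concrete, here is a minimal sketch of the subclassing pattern. Both classes and every method name are hypothetical stand-ins, not rLLM's real TinkerBackend or Fireworks classes; they only show that overridden hooks change while shared logic is inherited untouched.

```python
class TinkerBackendSketch:
    """Hypothetical stand-in for a base backend; not rLLM's TinkerBackend."""

    def rollout_engine(self) -> str:
        return "tinker sampler"

    def policy_trainer(self) -> str:
        return "tinker trainer"

    def checkpoint_format(self) -> str:
        return "tinker checkpoint URI"

    def build_dataloader(self) -> str:
        # Shared behavior: the subclass inherits this unchanged.
        return "shared dataloader"


class FireworksBackendSketch(TinkerBackendSketch):
    """Overrides only the pieces that differ on Fireworks."""

    def rollout_engine(self) -> str:
        return "FireworksEngine"

    def policy_trainer(self) -> str:
        return "FireworksPolicyTrainer"

    def checkpoint_format(self) -> str:
        return "DCP on trainer job"


backend = FireworksBackendSketch()
print(backend.rollout_engine(), "/", backend.build_dataloader())
```

Keeping the overrides narrow means fixes to the shared path (data loading, the training loop) apply to both backends at once.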
Configuration
The Fireworks backend uses fireworks.yaml (selected when rllm/backend=fireworks):
Model Configuration
- Fireworks model ID (accounts path)
- HuggingFace tokenizer model for chat template rendering
- LoRA rank. Set to 0 for full-parameter training (requires a reference trainer for KL)
- Train LoRA on the output embedding layer
- Train LoRA on attention layers
- Train LoRA on MLP layers
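The LoRA rank controls how many parameters the policy trainer actually updates. For a weight of shape d_out × d_in, LoRA keeps the weight frozen and learns two factors of shapes (d_out, rank) and (rank, d_in), so each adapted layer trains rank × (d_out + d_in) parameters instead of d_out × d_in. The layer size below is illustrative and not taken from any catalog model; rank 32 matches the default model.lora_rank mentioned under LoRA Training.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter on a d_out x d_in weight.

    LoRA freezes W and learns a low-rank update B @ A, where
    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


def full_trainable_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight is updated (lora_rank = 0)."""
    return d_out * d_in


# Illustrative square layer; not the dimensions of any specific model.
d = 4096
print(lora_trainable_params(d, d, rank=32))  # 262144
print(full_trainable_params(d, d))           # 16777216
```

At this size the adapter trains 1/64 of the layer's parameters, which is why the LoRA path is the cheaper default.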
Training Configuration
- Number of rollouts per prompt (for GRPO)
- Learning rate for the Adam optimizer
- LR schedule: "constant", "linear", or "cosine"
- Warmup steps as a ratio of total steps (0 to 1)
- Maximum sequence length. Auto-derived from the training shape when null
- Timeout for forward / forward_backward / optim_step calls (seconds)
- Source trainer job ID for loading a DCP checkpoint from another job
- Explicit DCP checkpoint name to load; null uses the latest on the source/current job
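The LR schedule and warmup ratio interact as sketched below. This is a generic linear-warmup-then-decay schedule written under stated assumptions (warmup lasts round(warmup_ratio × total_steps) steps, and "linear" and "cosine" decay to zero); lr_at_step is a hypothetical helper, and rLLM's scheduler may differ in details such as a final LR floor.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float,
               schedule: str = "constant", warmup_ratio: float = 0.0) -> float:
    """Learning rate at a 0-indexed step: linear warmup, then the named schedule.

    Assumptions (hypothetical helper, not rLLM's scheduler): warmup lasts
    round(warmup_ratio * total_steps) steps, and "linear"/"cosine" decay
    from base_lr to 0 over the remaining steps.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0.0 <= warmup_ratio <= 1.0:
        raise ValueError("warmup_ratio must be in [0, 1]")
    if schedule not in ("constant", "linear", "cosine"):
        raise ValueError(f"unknown schedule: {schedule!r}")

    warmup_steps = round(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    if schedule == "constant":
        return base_lr

    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    if schedule == "linear":
        return base_lr * (1.0 - progress)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


for s in (0, 9, 10, 55, 99):
    print(s, lr_at_step(s, total_steps=100, base_lr=1e-5, schedule="cosine", warmup_ratio=0.1))
```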
Fireworks Trainer / Deployment Shapes
- Training shape for the policy trainer job
- Replica count for the policy trainer
- Replica count for the inference deployment used during rollouts
- Reference trainer shape (only needed for full-parameter RFT with kl_beta > 0)
- Reference trainer replica count. Leave at 0 for LoRA (the policy serves as the frozen reference)
Validation Configuration
- Number of rollouts per validation prompt
Data Configuration
- Training batch size
- Validation batch size
- Maximum prompt length in tokens
- Maximum response length in tokens
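The data and GRPO settings together determine the per-step rollout load. Assuming the training batch size counts prompts and each prompt is expanded into its group of rollouts, a step issues batch size × rollouts-per-prompt rollouts, and prompt plus response length bounds the tokens each one can use. StepBudget and its field names are hypothetical; they echo the descriptions above rather than rLLM's exact config keys.

```python
from dataclasses import dataclass


@dataclass
class StepBudget:
    # Hypothetical container: field names echo the config descriptions,
    # not necessarily rLLM's exact keys.
    train_batch_size: int      # prompts per training step (assumption)
    group_size: int            # rollouts per prompt (GRPO group)
    max_prompt_length: int     # tokens
    max_response_length: int   # tokens

    def rollouts_per_step(self) -> int:
        return self.train_batch_size * self.group_size

    def max_tokens_per_rollout(self) -> int:
        return self.max_prompt_length + self.max_response_length

    def max_tokens_per_step(self) -> int:
        # Upper bound: every rollout uses its full prompt and response budget.
        return self.rollouts_per_step() * self.max_tokens_per_rollout()


budget = StepBudget(train_batch_size=64, group_size=8,
                    max_prompt_length=1024, max_response_length=2048)
print(budget.rollouts_per_step())    # 512
print(budget.max_tokens_per_step())  # 1572864
```

Doubling the group size doubles rollout traffic per step even though the number of prompts stays the same, which is worth keeping in mind when sizing the deployment replica count.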
Trainer Configuration
- Number of training epochs
- Validation frequency (in steps)
- Checkpoint save frequency (in steps). In async mode, must be a multiple of trigger_parameter_sync_step
- Experiment name (used for promoted model IDs: {experiment_name}-step-{step})
Fireworks Infrastructure
The fireworks_infra section controls provisioning. rLLM mirrors key training knobs into this section before calling the cookbook’s init_fireworks_infra, so you typically configure shapes and replica counts via fireworks_config rather than editing fireworks_infra directly.
Key provisioning fields:
- Attach to an existing deployment (null creates a new one per run)
- Attach to an existing trainer job (null creates a new job)
- Timeout for weight hot-loading into the deployment (seconds)
- Fireworks API base URL
cleanup_on_close=True, cleanup_existing=True). Each run provisions fresh trainer and deployment resources unless you explicitly set job_id or deployment_id to reattach.
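The null-versus-ID rule for job_id and deployment_id reduces to the small decision below. plan_resources is a hypothetical illustration of that rule only; it does not call the cookbook's init_fireworks_infra, and the ID in the usage line is a placeholder.

```python
from typing import Dict, Optional


def plan_resources(job_id: Optional[str], deployment_id: Optional[str]) -> Dict[str, str]:
    """Hypothetical: map the null-vs-ID rule to an action per resource.

    None (null in YAML) -> provision a fresh resource for this run.
    A string ID        -> reattach to that existing resource.
    """
    def action(resource_id: Optional[str]) -> str:
        return "create" if resource_id is None else f"reattach:{resource_id}"

    return {"trainer_job": action(job_id), "deployment": action(deployment_id)}


# Placeholder deployment ID, for illustration only.
print(plan_resources(None, "my-deployment"))
```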
LoRA Training
LoRA is the default path (model.lora_rank=32). With LoRA, the frozen reference policy is reused from the policy trainer — leave reference_trainer_replica_count=0.
For full-parameter training (model.lora_rank=0), enable a reference trainer for KL divergence:
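The rules from the shape-role table and the reference-trainer fields can be checked before launch. reference_trainer_issues is a hypothetical helper that encodes them: LoRA reuses the policy trainer as the frozen reference, and full-parameter training needs a reference trainer only when kl_beta > 0. Requiring at least one reference replica in that case is an assumption.

```python
from typing import List, Optional


def reference_trainer_issues(lora_rank: int, kl_beta: float,
                             reference_trainer_shape_id: Optional[str],
                             reference_trainer_replica_count: int) -> List[str]:
    """Hypothetical pre-launch check of the reference-trainer rules."""
    issues = []
    if lora_rank > 0:
        # LoRA: the policy trainer doubles as the frozen reference.
        if reference_trainer_replica_count != 0:
            issues.append("LoRA reuses the policy as reference; set reference_trainer_replica_count=0")
    elif kl_beta > 0:
        # Full-parameter RL with a KL penalty needs a separate forward-only trainer.
        if not reference_trainer_shape_id:
            issues.append("set reference_trainer_shape_id for full-parameter RL with kl_beta > 0")
        if reference_trainer_replica_count < 1:  # assumption: at least one replica
            issues.append("set reference_trainer_replica_count >= 1 for full-parameter RL with kl_beta > 0")
    return issues


print(reference_trainer_issues(lora_rank=32, kl_beta=0.01,
                               reference_trainer_shape_id=None,
                               reference_trainer_replica_count=0))
```

Full-parameter training without a KL term (kl_beta = 0) passes the check with no reference trainer at all.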
Sampling Configuration
Rollout sampling uses rllm.rollout (not a separate sampling block):
- Training sampling temperature
- Training top-p (nucleus) sampling
- Validation sampling temperature
- Validation top-p sampling
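The troubleshooting notes recommend keeping rollout temperature and top_p at 1.0. One plausible reason, sketched below with generic nucleus sampling (not the Fireworks sampler), is that any other setting reshapes the distribution tokens are drawn from, so the sampled tokens no longer come from the policy distribution the loss assumes. sampling_distribution is a hypothetical helper.

```python
import math
from typing import List


def sampling_distribution(logits: List[float], temperature: float = 1.0,
                          top_p: float = 1.0) -> List[float]:
    """Probabilities a generic temperature + nucleus (top-p) sampler draws from.

    Hypothetical helper for illustration; not the Fireworks sampler.
    """
    if temperature <= 0 or not 0.0 < top_p <= 1.0:
        raise ValueError("need temperature > 0 and 0 < top_p <= 1")
    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    # Renormalize over the kept tokens; everything else gets probability 0.
    return [probs[i] / kept_mass if i in kept else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.0]
print(sampling_distribution(logits))                   # unchanged policy softmax
print(sampling_distribution(logits, top_p=0.5))        # only the top token survives
print(sampling_distribution(logits, temperature=0.5))  # sharper than the policy
```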
Rollout Engine Configuration
- Reasoning effort level: "low", "medium", or "high"
- Accumulate reasoning tokens across steps
- Disable thinking tokens in responses
- Bypass the renderer and use the parser directly
- Optional renderer name for chat template rendering
Concurrency
Fireworks rollout concurrency is controlled via the cookbook ConcurrencyConfig:
- Concurrency mode: "adaptive" or "fixed"
- Starting window for adaptive mode (null = 8 × replica count)
- Maximum concurrent requests
- Target prefill queue duration (seconds) for adaptive mode
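For intuition about adaptive mode, the sketch below shows one way a window could start at 8 × replica count (the documented default when the starting window is null), stay under the maximum, and react to prefill queue duration. The additive-increase / multiplicative-decrease rule is an assumption for illustration, not the cookbook's ConcurrencyConfig algorithm, and AdaptiveWindowSketch is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveWindowSketch:
    """Hypothetical AIMD controller, not the cookbook's ConcurrencyConfig logic.

    Only the default starting window (8 x replica count when unset) comes from
    the description above; the grow/shrink rule is an illustrative assumption.
    """
    replica_count: int
    max_window: int
    target_prefill_s: float
    initial_window: Optional[int] = None

    def __post_init__(self) -> None:
        start = self.initial_window if self.initial_window is not None else 8 * self.replica_count
        self.window = max(1, min(start, self.max_window))

    def update(self, observed_prefill_s: float) -> int:
        if observed_prefill_s > self.target_prefill_s:
            self.window = max(1, self.window // 2)               # queue too long: back off
        else:
            self.window = min(self.max_window, self.window + 1)  # headroom: grow slowly
        return self.window


w = AdaptiveWindowSketch(replica_count=2, max_window=64, target_prefill_s=0.5)
print(w.window, w.update(0.1), w.update(2.0))
```

Raising the maximum window only helps if the deployment can absorb the extra load, which is why the performance tips pair it with more deployment replicas.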
Algorithm Configuration
Fireworks uses server-side built-in loss kernels. Configure via rllm.algorithm:
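The loss itself runs server-side in Firetitan kernels, so nothing below is executed by rLLM. As a client-side illustration of what the default "grpo" estimator and the three loss aggregation modes commonly mean, the sketch computes group-relative advantages (reward minus group mean, divided by group standard deviation) and reduces per-token losses three ways. The epsilon, the use of the population standard deviation, and the helper names are assumptions; the server-side kernels may differ.

```python
import statistics
from typing import List


def grpo_advantages(group_rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for the rollouts of one prompt.

    Common GRPO form (assumed here): (r - group mean) / (group std + eps).
    """
    mean = statistics.fmean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]


def aggregate_loss(token_losses: List[List[float]], mode: str = "token-mean") -> float:
    """Reduce per-token losses (one inner list per sequence) to a scalar."""
    if not token_losses or any(len(seq) == 0 for seq in token_losses):
        raise ValueError("need at least one non-empty sequence")
    if mode == "token-mean":
        # Every token weighs the same, so longer sequences contribute more.
        flat = [x for seq in token_losses for x in seq]
        return sum(flat) / len(flat)
    if mode == "seq-mean-token-sum":
        # Sum within each sequence, then average over sequences.
        return statistics.fmean(sum(seq) for seq in token_losses)
    if mode == "seq-mean-token-mean":
        # Average within each sequence, then average over sequences.
        return statistics.fmean(sum(seq) / len(seq) for seq in token_losses)
    raise ValueError(f"unknown aggregation mode: {mode!r}")


print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
losses = [[1.0, 1.0, 1.0, 1.0], [2.0]]
for mode in ("token-mean", "seq-mean-token-sum", "seq-mean-token-mean"):
    print(mode, aggregate_loss(losses, mode))
```

The same per-token losses give 1.2, 3.0, and 1.5 under the three modes, which is why changing the aggregation mode also changes the effective learning rate.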
- Advantage estimator: "grpo", "reinforce", etc.
- Policy loss: null (GRPO default), "dapo", "cispo", or "gspo"
- Loss aggregation: null (backend default), "token-mean", "seq-mean-token-sum", or "seq-mean-token-mean"
- Router replay mode: "disabled" or "R3" ("R2" is not supported)
- bypass_mode: when true, rollout logprobs are used as the proximal policy (decoupled PPO bypass). Set false for active TIS correction
- Truncated importance sampling (TIS) mode: null, "token", or "sequence" (requires bypass_mode=false)
Async Training
Async training overlaps rollouts with policy updates for higher throughput:
When async training is enabled, save_freq must be a multiple of trigger_parameter_sync_step. Checkpoint promotion requires a sampler snapshot created at sync time.
Checkpointing
Fireworks checkpoints are stored as DCP (Distributed Checkpoint) on the trainer job. After each sync step, weights are hot-loaded to the inference deployment. When saving, checkpoints can be promoted to a Fireworks model ID.
Automatic Checkpointing
Resume from Checkpoint
Resume from the latest DCP on the current or source job:
Promoted model IDs follow the pattern {experiment_name}-step-{global_step}.
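The async constraint on save_freq and the promoted-model naming can both be checked ahead of time. Both helpers are hypothetical, and promoted_model_ids assumes a save at every positive multiple of save_freq up to the final step, which may not match rLLM's exact save schedule (for example, an extra save at the end of training).

```python
from typing import List


def check_save_freq(save_freq: int, trigger_parameter_sync_step: int,
                    async_enabled: bool) -> None:
    """Raise if save_freq breaks the async-mode rule."""
    if save_freq <= 0 or trigger_parameter_sync_step <= 0:
        raise ValueError("save_freq and trigger_parameter_sync_step must be positive")
    if async_enabled and save_freq % trigger_parameter_sync_step != 0:
        raise ValueError(
            f"save_freq={save_freq} must be a multiple of "
            f"trigger_parameter_sync_step={trigger_parameter_sync_step} in async mode"
        )


def promoted_model_ids(experiment_name: str, total_steps: int, save_freq: int) -> List[str]:
    """Expected promoted IDs, assuming a save at every multiple of save_freq."""
    return [f"{experiment_name}-step-{step}"
            for step in range(save_freq, total_steps + 1, save_freq)]


check_save_freq(save_freq=20, trigger_parameter_sync_step=5, async_enabled=True)
print(promoted_model_ids("countdown", total_steps=100, save_freq=25))
```

The multiple-of rule exists because promotion needs a sampler snapshot, and those are only created at sync steps.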
Limitations
| Feature | Fireworks support |
|---|---|
| fuse_forward_backward_and_optim_step | ❌ Not supported |
| router_replay: R2 | ❌ Use R3 or disabled |
| router_replay: R3 | ✅ Supported |
| Distillation (adv_estimator: distill) | ❌ Not documented / tested |
| Full-parameter + KL | ✅ Requires reference trainer |
| LoRA + KL | ✅ Policy reused as reference |
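The unsupported settings in this table can be caught before a run provisions any resources. The sketch uses a flat dict with hypothetical keys for brevity; in the real config these options live in nested sections (for example rllm.algorithm), so adapt the lookups to your config object.

```python
from typing import Any, Dict, List


def fireworks_limitation_errors(cfg: Dict[str, Any]) -> List[str]:
    """Return messages for settings the Fireworks backend does not support.

    Hypothetical flat keys; the real config nests these options.
    """
    errors = []
    if cfg.get("fuse_forward_backward_and_optim_step", False):
        errors.append("fuse_forward_backward_and_optim_step is not supported on Fireworks")
    router_replay = cfg.get("router_replay", "disabled")
    if router_replay not in ("disabled", "R3"):
        errors.append(f"router_replay={router_replay!r} is not supported; use 'R3' or 'disabled'")
    if cfg.get("adv_estimator") == "distill":
        errors.append("adv_estimator='distill' is not documented or tested on Fireworks")
    return errors


print(fireworks_limitation_errors({"router_replay": "R2"}))
```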
Example Configuration
Complete configuration for countdown workflow training:
config.yaml
Performance Optimization
- Enable async training: set rllm.async_training.enable=true to overlap rollouts and policy updates
- Tune concurrency: increase concurrency.max_window and the deployment replica_count for higher rollout throughput
- Use LoRA: LoRA training (model.lora_rank > 0) is faster and avoids provisioning a reference trainer
- Parallel workflows: increase rllm.workflow.n_parallel_tasks for workflow-based training
Troubleshooting
FIREWORKS_API_KEY not set
Export your API key before launching training:
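If you launch training from a Python script, a pre-flight check like the one below fails fast with a clear message instead of partway through provisioning. require_fireworks_api_key is a hypothetical helper; it only reads the FIREWORKS_API_KEY environment variable and never prints the key.

```python
import os


def require_fireworks_api_key() -> str:
    """Return FIREWORKS_API_KEY or raise with an actionable message (hypothetical helper)."""
    key = os.environ.get("FIREWORKS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FIREWORKS_API_KEY not set: export it in the shell that launches training"
        )
    return key
```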
fuse_forward_backward_and_optim_step error
The Fireworks backend does not support fused optimizer steps:
save_freq / sync interval mismatch
In async mode, save_freq must be a multiple of trigger_parameter_sync_step:
Sampling parameter warning
Keep rollout temperature and top_p at 1.0:
Resources still running after crash
If training exits abnormally before shutdown(), trainer jobs and deployments may keep running on Fireworks. Reattach with job_id / deployment_id or delete them from the Fireworks console.
router_replay R2 not supported
Use R3 or disabled:
Comparison with tinker
| Feature | Fireworks | tinker |
|---|---|---|
| Infrastructure | Managed (trainer job + deployment) | Local or remote tinker service |
| Weight sync | WeightSyncer hot-load to deployment | Tinker sampler paths |
| Checkpoint format | DCP on trainer job + model promotion | Tinker checkpoint URIs |
| Fused optim step | ❌ Not supported | ✅ Supported |
| API key / account | FIREWORKS_API_KEY required | Optional (local service) |
| Server-side losses | Firetitan builtin kernels | Tinker forward-backward |
| Async training | ✅ Recommended | ✅ Supported |
See Also
- tinker Backend: local or remote tinker service training
- verl Backend: distributed training with verl
- Backend Comparison: compare training backends
- Unified Trainer: learn about the unified trainer architecture
- Fireworks Training Cookbook: official Fireworks training cookbook

