## Overview
The DeepCoder example demonstrates:

- How to use rLLM's `CompetitionCodingAgent` for programming tasks
- How to train agents with iterative context lengthening (16K → 32K)
- How to evaluate coding performance on LiveCodeBench
- How to scale RL for competitive programming
## Prerequisites
- rLLM framework installed
- vLLM or SGLang for model serving
- Pre-trained model: `agentica-org/DeepCoder-14B-Preview`
- GPU with sufficient memory for 16K-32K context lengths
## Setup

### Prepare coding datasets
Download and prepare the coding competition datasets. This will download:
- LiveCodeBench (evaluation)
- Competitive programming problems (training)
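The preparation step amounts to normalizing raw problems into training records. The sketch below is illustrative only: the field names (`statement`, `test_cases`, `question`, `tests`) are assumptions, not rLLM's actual dataset schema.

```python
# Illustrative sketch: normalize one raw competition problem into a
# training record. Field names are assumptions, not rLLM's real schema.

def prepare_problem(raw: dict) -> dict:
    """Convert a raw problem dict into a (question, tests) record."""
    return {
        "question": raw["statement"].strip(),
        "tests": [
            {"input": t["input"], "expected_output": t["output"]}
            for t in raw["test_cases"]
        ],
    }

raw_problem = {
    "statement": "  Given an integer n, print n squared.  ",
    "test_cases": [
        {"input": "3\n", "output": "9\n"},
        {"input": "10\n", "output": "100\n"},
    ],
}

record = prepare_problem(raw_problem)
print(record["question"])   # Given an integer n, print n squared.
print(len(record["tests"])) # 2
```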
## Running DeepCoder

Execute the coding agent for evaluation.

### Code Implementation
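A minimal sketch of the evaluation flow, with a stubbed generator standing in for a served vLLM/SGLang model. The real example script drives rLLM's `CompetitionCodingAgent`; its API is not reproduced here, so `stub_generate` and `extract_code` are illustrative placeholders.

```python
# Sketch of the evaluation flow with a stubbed model in place of a
# served vLLM/SGLang endpoint. Helper names here are placeholders.

FENCE = "`" * 3  # literal ``` built indirectly so this block renders cleanly

def stub_generate(prompt: str) -> str:
    """Stand-in for the served model; returns a fixed fenced solution."""
    return f"{FENCE}python\nprint(int(input()) ** 2)\n{FENCE}"

def extract_code(response: str) -> str:
    """Pull the first fenced python block out of a model response."""
    tag = FENCE + "python"
    start = response.index(tag) + len(tag)
    end = response.index(FENCE, start)
    return response[start:end].strip()

problems = ["Given an integer n, print n squared."]
solutions = [extract_code(stub_generate(p)) for p in problems]
print(solutions[0])  # print(int(input()) ** 2)
```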
## Expected Results
DeepCoder-14B-Preview on LiveCodeBench v5:

| Metric | Performance |
|---|---|
| Pass@1 | 60.6% |
| Improvement over base | +8.0% |
## Training DeepCoder

Train your own DeepCoder agent with iterative context lengthening.

### Step 1: Train with 16K context
### Step 2: Train with 32K context
Modify `MODEL_PATH` in the script to point to your 16K checkpoint.
## Training Configuration

Key hyperparameters:

- Base Model: DeepSeek-R1-Distill-Qwen-14B
- Algorithm: GRPO (Group Relative Policy Optimization)
- Training Dataset: Competitive programming problems
- Evaluation Dataset: LiveCodeBench v5
- Batch Size: 32
- Learning Rate: 1e-6
- Context Progression: 16K → 32K
- Sampling: n=8 candidates per problem
- Temperature: 0.6
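For reference, the hyperparameters above can be collected into a single config. The key names below are illustrative assumptions; the actual rLLM script may take them as CLI flags or a YAML file instead.

```python
# The hyperparameters above as one dict. Key names are illustrative,
# not the real rLLM config schema.
config = {
    "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
    "algorithm": "grpo",
    "train_batch_size": 32,
    "learning_rate": 1e-6,
    "max_response_length": 16_384,  # raised to 32_768 in the second phase
    "rollout_n": 8,                 # candidates sampled per problem
    "temperature": 0.6,
}
print(config["learning_rate"])  # 1e-06
```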
## Training Script Structure
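Lacking the original script, here is a hedged skeleton of the two-phase flow; `train_phase`, `BASE`, and the checkpoint paths are placeholders, not real rLLM entry points.

```python
# Hedged skeleton of the two-phase training flow. train_phase is a
# placeholder, not a real rLLM entry point.

def train_phase(model_path: str, max_len: int) -> str:
    """Stand-in for one GRPO training run; returns the checkpoint path."""
    print(f"training {model_path} with context length {max_len}")
    return f"checkpoints/deepcoder-ctx{max_len}"

BASE = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"
ckpt_16k = train_phase(BASE, 16_384)      # Step 1: 16K context
ckpt_32k = train_phase(ckpt_16k, 32_768)  # Step 2: MODEL_PATH -> 16K checkpoint
print(ckpt_32k)  # checkpoints/deepcoder-ctx32768
```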
## Iterative Context Lengthening

DeepCoder uses curriculum learning:

- 16K Phase: Learn basic problem-solving patterns
- 32K Phase: Handle complex multi-function implementations
## Key Features

### Long-Form Code Generation
The model generates complete, executable solutions.

### Test-Time Scaling

DeepCoder improves with more samples.

### Code Execution and Validation

The `code_reward_fn` automatically:
- Extracts code from the response
- Executes against test cases
- Returns pass/fail reward signal
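A minimal stand-in illustrating those three steps. The real `code_reward_fn` presumably sandboxes execution; this sketch runs code in a bare subprocess, for illustration only.

```python
# Illustrative stand-in for code_reward_fn: extract a fenced code block,
# run it against each test case, return 1.0 only if all pass.
# WARNING: unsandboxed execution; illustration only.
import subprocess
import sys

FENCE = "`" * 3  # literal ``` built indirectly so this block renders cleanly

def code_reward_fn(response: str, tests: list[dict]) -> float:
    # 1. Extract code from the response
    tag = FENCE + "python"
    try:
        start = response.index(tag) + len(tag)
        end = response.index(FENCE, start)
    except ValueError:
        return 0.0  # no fenced code block found
    code = response[start:end].strip()
    # 2. Execute against each test case
    for t in tests:
        result = subprocess.run(
            [sys.executable, "-c", code],
            input=t["input"], capture_output=True, text=True, timeout=10,
        )
        if result.returncode != 0:
            return 0.0
        if result.stdout.strip() != t["expected_output"].strip():
            return 0.0
    # 3. Return pass/fail reward signal
    return 1.0

response = f"{FENCE}python\nprint(int(input()) ** 2)\n{FENCE}"
tests = [{"input": "3", "expected_output": "9"}]
print(code_reward_fn(response, tests))  # 1.0
```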
## Monitoring Training

Training logs to WandB. Key metrics:

| Metric | Description |
|---|---|
| `critic/score/mean` | Average pass rate per batch |
| `val/pass@1` | LiveCodeBench Pass@1 accuracy |
| `train/response_length` | Average code length |
| `train/compilation_rate` | Fraction of syntactically valid code |
## Evaluation on LiveCodeBench

For comprehensive evaluation:

1. Run the agent on full LiveCodeBench
2. Execute generated code against test cases
3. Compute Pass@1 and Pass@K metrics
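Pass@K is typically computed with the unbiased estimator of Chen et al. (2021): with n samples per problem of which c pass, pass@k = 1 − C(n−c, k)/C(n, k). A short sketch:

```python
# Unbiased pass@k estimator: n samples per problem, c of them correct.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer failures than k draws: some draw must pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With n=8 candidates per problem (as in training) and 2 correct:
print(pass_at_k(8, 2, 1))  # 0.25
print(round(pass_at_k(8, 2, 4), 4))  # 0.7857
```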
## Next Steps
- Explore DeepSWE for software engineering tasks
- Try DeepScaleR for mathematical reasoning
- Learn about RL algorithms

