A single-turn coding agent for competition-style programming problems. The model emits reasoning followed by a fenced ```python block; the evaluator extracts the last block and runs it against hidden test cases.
Pattern
| Aspect | Value |
| --- | --- |
| Loop shape | Single-turn (one LLM call per task) |
| Tools | None — code is parsed out of the response |
| Termination | Single LLM call returns; evaluator runs hidden tests |
| Reward shape | 1.0 if all hidden tests pass, 0.0 otherwise |
Architecture
AgentFlow.run(task, config)
│
├── one LLM call via OpenAI(base_url=config.base_url)
│ model outputs reasoning + ```python ... ```
│
└── store full response in episode.artifacts["answer"]
Evaluator.evaluate(task, episode)
│
└── RewardCodeFn extracts last ```python``` block, runs against
task.metadata["ground_truth"] (hidden tests)
Long chain-of-thought reasoning happens inside the single assistant message — there is no multi-turn revise/feedback loop. This matches the original DeepCoder training setup.
Install
uv pip install -e ".[tinker]" # rllm + tinker backend
uv pip install --no-deps -e cookbooks/deepcoder # this cookbook
rllm agent list # should show "deepcoder"
Dataset
python cookbooks/deepcoder/prepare_data.py
# Smoke-size:
python cookbooks/deepcoder/prepare_data.py --train-size 200 --test-size 50
This pulls agentica-org/DeepCoder-Preview-Dataset (primeintellect + taco + lcbv5 train; codeforces + lcbv5 test) and normalizes the test schemas (TACO’s nested dict → flat list).
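The schema normalization can be sketched as below. The `normalize_tests` helper and the exact field names (`inputs`/`outputs`, `input`/`output`) are assumptions for illustration; the real `prepare_data.py` handles more source formats.

```python
def normalize_tests(raw) -> list[dict]:
    """Normalize heterogeneous test schemas into a flat list of
    {"input": ..., "output": ...} dicts.

    Hypothetical sketch: TACO stores tests as one dict of parallel lists,
    while other sources already use a flat list of per-test dicts.
    """
    if isinstance(raw, dict):
        # TACO-style: {"inputs": [...], "outputs": [...]} -> flat list
        return [
            {"input": i, "output": o}
            for i, o in zip(raw.get("inputs", []), raw.get("outputs", []))
        ]
    if isinstance(raw, list):
        return raw  # already flat
    raise TypeError(f"unrecognized test schema: {type(raw)!r}")
```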
Eval
rllm eval deepcoder \
--agent deepcoder \
--evaluator deepcoder \
--model agentica-org/DeepCoder-14B-Preview \
--base-url http://localhost:8000/v1 \
--split test \
--max-examples 20
Verified end-to-end: rllm eval deepcoder --max-examples 10 against gpt-5.4-mini reports 5/10 correct (50% accuracy), with per-item rewards of 1.0 or 0.0.
Training
# Tinker (single-machine LoRA)
bash cookbooks/deepcoder/train_tinker.sh
# Verl (distributed GPU)
bash cookbooks/deepcoder/train_verl.sh
Key code
The flow is a single LLM call — all the reasoning lives inside the one assistant message:
```python
@rllm.rollout(name="deepcoder")
async def deepcoder_flow(task: Task, config: AgentConfig) -> Episode:
    question = str(task.metadata.get("question") or task.instruction or "")
    client = AsyncOpenAI(base_url=config.base_url, api_key="EMPTY")
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    resp = await client.chat.completions.create(
        model=config.model, messages=messages,
        temperature=0.6, max_tokens=16384, timeout=600,
    )
    content = resp.choices[0].message.content or ""
    messages.append({"role": "assistant", "content": content})
    step = Step(chat_completions=list(messages), model_response=content, action=content, thought=content)
    return Episode(
        trajectories=[Trajectory(name="deepcoder", steps=[step])],
        artifacts={"answer": content},
    )
```
The evaluator delegates to rllm.rewards.code_reward.RewardCodeFn, which runs the extracted code against the hidden tests in a sandboxed subprocess.
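The subprocess execution can be approximated as in the sketch below. The `run_sandboxed` helper is hypothetical, not the library's API, and a real sandbox would also limit memory, filesystem, and network access rather than only wall-clock time.

```python
import os
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, stdin_text: str, timeout: float = 6.0) -> tuple[int, str]:
    """Run untrusted code in a child Python process with a wall-clock timeout.

    Hypothetical sketch: returns (return_code, stdout); (-1, "") signals a timeout.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path],
            input=stdin_text, capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired:
        return -1, ""
    finally:
        os.unlink(path)  # always clean up the temp file
```

Running each hidden test in a fresh process isolates grading from crashes, infinite loops, and leaked global state in the submitted code.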
Files
| File | Description |
| --- | --- |
| deepcoder_flow.py | The single-turn AgentFlow |
| evaluator.py | Wraps RewardCodeFn for hidden-test grading |
| prepare_data.py | Pull + normalize DeepCoder splits via DatasetRegistry |
| train.py + train_{tinker,verl}.sh | Hydra entry points |
| pyproject.toml | Plugin entry-point declarations |
| test.py | 5 unit tests covering correct / wrong / no-fence / multi-block / Task vs dict |
On GitHub
cookbooks/deepcoder Full source, README, and runnable launch scripts