A multi-turn agent flow that trains a model to navigate procedurally generated FrozenLake puzzles via the AgentFlow protocol. This is the cookbook to copy if your agent drives a Gym-style environment.
Pattern
| Aspect | Value |
| --- | --- |
| Loop shape | Multi-turn (up to max_steps per puzzle) |
| Tools | None: the gym env IS the action space (Up/Down/Left/Right) |
| State | Per-task: a freshly-seeded gymnasium.make("FrozenLake-v1", desc=…) |
| Termination | Goal reached, hole reached, or max_steps exhausted |
| Reward shape | Per-task scalar: 1.0 if won, 0.0 otherwise |
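The termination and reward rows above can be read as a single decision rule. The sketch below is illustrative only: `episode_outcome` is a hypothetical helper, not part of the cookbook, and it assumes the standard FrozenLake tile letters (`G` for goal, `H` for hole).

```python
from typing import Tuple


def episode_outcome(tile: str, steps_taken: int, max_steps: int) -> Tuple[bool, float]:
    """Return (done, reward) for the tile the agent has just stepped onto.

    Hypothetical helper mirroring the Pattern table: the episode ends on the
    goal, on a hole, or when the step budget runs out, and only the goal pays.
    """
    if tile == "G":                # goal reached: the only winning outcome
        return True, 1.0
    if tile == "H":                # fell into a hole
        return True, 0.0
    if steps_taken >= max_steps:   # step budget exhausted
        return True, 0.0
    return False, 0.0              # keep playing


print(episode_outcome("G", 3, 10))
print(episode_outcome("F", 10, 10))
```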
Architecture
AgentFlow.run(task, config)
  │
  ├── generate_random_map(seed, size, p)    # deterministic, in-process
  ├── env = gymnasium.make("FrozenLake-v1", …)
  └── Multi-turn loop (up to max_steps turns)
        │
        ├── client.chat.completions.create(...)    # render grid as text, ask for action
        ├── parse_action() → env.step(action)
        └── repeat until terminated / truncated / max_steps
        │
        └── episode.artifacts = {"won": bool, "turns": int, "last_action": str}
The cookbook is fully self-contained — there’s no dependency on rllm.environments. The map is regenerated deterministically from (seed, size, p) every time the flow runs, so the dataset stores only those parameters.
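To illustrate why storing only (seed, size, p) is enough, here is a minimal, self-contained sketch of a seeded map generator. It is an assumption-laden stand-in, not the cookbook's `generate_random_map`: the names `make_map` and `is_solvable` are hypothetical, and the real generator may sample tiles differently.

```python
import random
from collections import deque
from typing import List


def is_solvable(grid: List[str]) -> bool:
    """Breadth-first search from the top-left cell to any goal cell, avoiding holes."""
    n = len(grid)
    seen = {(0, 0)}
    queue = deque([(0, 0)])
    while queue:
        r, c = queue.popleft()
        if grid[r][c] == "G":
            return True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and (nr, nc) not in seen and grid[nr][nc] != "H":
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False


def make_map(seed: int, size: int, p: float) -> List[str]:
    """Hypothetical generator: each cell is frozen with probability p, else a hole."""
    if size < 2 or not 0.0 < p <= 1.0:
        raise ValueError("size must be >= 2 and p must be in (0, 1]")
    rng = random.Random(seed)  # private RNG, so the result depends only on the arguments
    while True:
        cells = [["F" if rng.random() < p else "H" for _ in range(size)] for _ in range(size)]
        cells[0][0] = "S"      # start in the top-left corner
        cells[-1][-1] = "G"    # goal in the bottom-right corner
        grid = ["".join(row) for row in cells]
        if is_solvable(grid):  # resample until a path from S to G exists
            return grid


# The same parameters always reproduce the same puzzle.
assert make_map(seed=7, size=4, p=0.8) == make_map(seed=7, size=4, p=0.8)
print("\n".join(make_map(seed=7, size=4, p=0.8)))
```

Because the generator owns its random state, regenerating the map inside the flow gives the same grid that any earlier run saw for that row.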
Install
uv pip install -e ".[tinker]" # rllm + tinker backend
uv pip install --no-deps -e cookbooks/frozenlake # this cookbook (gymnasium pulled in transitively)
rllm agent list # should show "frozenlake"
Dataset
Procedurally generated — no download. Run once:
python cookbooks/frozenlake/prepare_data.py
# Or with custom sizes:
python cookbooks/frozenlake/prepare_data.py --train-size 5000 --test-size 200 --slippery
Registers frozenlake/{train, test} with DatasetRegistry.
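The row schema is not shown on this page. As a rough sketch, assuming each row carries only the fields the flow reads from `task.metadata` (`seed`, `size`, `p`, `is_slippery`), the rows could be produced as below. The helper name `make_rows` and the sampling ranges are illustrative assumptions, and the `DatasetRegistry` call is deliberately left out.

```python
import random
from typing import Any, Dict, List


def make_rows(n: int, base_seed: int = 0, is_slippery: bool = False) -> List[Dict[str, Any]]:
    """Hypothetical row generator: one small parameter dict per puzzle."""
    rng = random.Random(base_seed)  # seeded, so the dataset itself is reproducible
    rows = []
    for i in range(n):
        rows.append(
            {
                "seed": base_seed + i,                   # unique per row
                "size": rng.randint(2, 8),               # assumed range of grid sizes
                "p": round(rng.uniform(0.6, 0.85), 2),   # assumed range of frozen-tile probability
                "is_slippery": is_slippery,
            }
        )
    return rows


train_rows = make_rows(5, base_seed=0)
test_rows = make_rows(3, base_seed=1_000_000)  # a distant base seed keeps the splits disjoint
print(train_rows[0])
```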
Eval
rllm eval frozenlake \
    --agent frozenlake \
    --evaluator frozenlake \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --base-url http://localhost:8000/v1 \
    --split test \
    --max-examples 20
Training
# Single-machine LoRA (tinker)
bash cookbooks/frozenlake/train_tinker.sh
# Distributed multi-GPU (verl)
bash cookbooks/frozenlake/train_verl.sh
Or via the CLI with default knobs:
rllm train frozenlake \
    --agent frozenlake \
    --evaluator frozenlake \
    --model Qwen/Qwen3-4B-Instruct-2507 \
    --group-size 8 \
    --batch-size 32 \
    --lora-rank 32
Key code
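The flow body is straightforward: on each turn, parse the action the model emits inside triple backticks and pass it to env.step. An unparseable reply still consumes a turn and triggers a retry prompt: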
@rllm.rollout(name="frozenlake")
async def frozenlake_flow(task: Task, config: AgentConfig) -> Episode:
    meta = task.metadata or {}
    desc = generate_random_map(size=meta["size"], p=meta["p"], seed=meta["seed"])
    env = gym.make("FrozenLake-v1", desc=desc, is_slippery=meta["is_slippery"])
    env.reset(seed=meta["seed"])

    client = AsyncOpenAI(base_url=config.base_url, api_key="EMPTY")
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": render_first_turn(env, max_turns)},
    ]

    steps, won = [], False
    for turn in range(max_turns):
        resp = await client.chat.completions.create(model=config.model, messages=messages, ...)
        content = resp.choices[0].message.content or ""
        action = parse_action(content)  # e.g. "```Up```" → 3
        messages.append({"role": "assistant", "content": content})
        steps.append(Step(chat_completions=list(messages), action=_ACTION_LABELS.get(action), …))

        if action is None:
            messages.append({"role": "user", "content": "Please reply with a valid action…"})
            continue

        _, reward, terminated, truncated, _ = env.step(action)
        if terminated:
            won = float(reward) > 0
            break
        if truncated:
            break
        messages.append({"role": "user", "content": render_next_turn(env, turn + 1)})

    return Episode(
        trajectories=[Trajectory(name="frozenlake", steps=steps)],
        artifacts={"won": won, "turns": len(steps)},
        is_correct=won,
    )
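The excerpt calls `parse_action` without showing it. A minimal sketch of such a parser follows, under stated assumptions: the name `parse_action_sketch` is hypothetical, the integer codes follow Gymnasium's FrozenLake convention (Left=0, Down=1, Right=2, Up=3, consistent with the Up → 3 comment above), and taking the last fenced token is a design choice of this sketch, not necessarily the cookbook's.

```python
import re
from typing import Optional

# Gymnasium's FrozenLake action codes.
ACTIONS = {"left": 0, "down": 1, "right": 2, "up": 3}

FENCE = "`" * 3  # three backticks, built at runtime to keep this listing easy to embed
_FENCED_WORD = re.compile(re.escape(FENCE) + r"\s*([A-Za-z]+)\s*" + re.escape(FENCE))


def parse_action_sketch(content: str) -> Optional[int]:
    """Return the action code for the last fenced word in the reply, or None if invalid."""
    matches = _FENCED_WORD.findall(content)
    if not matches:
        return None  # no fenced word at all: the caller should ask the model to retry
    # Prefer the last fenced word so earlier "thinking aloud" does not win.
    return ACTIONS.get(matches[-1].lower())


print(parse_action_sketch(f"The goal is above me, so {FENCE}Up{FENCE}"))
print(parse_action_sketch("I am not sure what to do."))
```

Returning `None` rather than raising matches how the flow treats a bad reply: it appends a corrective user message and moves on to the next turn.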
Files
| File | Description |
| --- | --- |
| frozenlake_flow.py | The AgentFlow + map generator + action parser |
| evaluator.py | Reads artifacts["won"] → EvalOutput |
| prepare_data.py | Generates (seed, size, p) rows + registers via DatasetRegistry |
| train.py + train_{tinker,verl}.sh | Hydra entry points |
| pyproject.toml | Plugin entry-point declarations |
| test.py | 12 unit tests (map gen, parsing, rendering, evaluator) |
On GitHub
cookbooks/frozenlake: full source, README, and runnable launch scripts.