This example demonstrates how to train an RL agent on the classic FrozenLake environment. FrozenLake is a simple gridworld navigation task that serves as an excellent introduction to reinforcement learning concepts in rLLM.

Overview

FrozenLake is a classic RL environment where:
  • The agent navigates a frozen lake grid (4x4 or 8x8)
  • The goal is to reach the frisbee without falling into a hole
  • A slippery surface adds stochasticity to actions
  • Discrete action space: LEFT, DOWN, RIGHT, UP
This example demonstrates:
  • How to use rLLM’s FrozenLakeAgent for gridworld navigation
  • Training with discrete action spaces
  • Handling stochastic environments
  • Evaluating RL agents with success rate metrics

Prerequisites

  • rLLM framework installed
  • vLLM or SGLang for model serving
  • Base model: Qwen/Qwen3-4B (or similar)

Setup

1. Prepare the environment data

Generate FrozenLake environment configurations:
cd examples/frozenlake
python prepare_frozenlake_data.py
This will create train and test datasets with different FrozenLake configurations.
2. Start the model server

Launch a vLLM server for the base model:
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-4B \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16
The server will be accessible at http://localhost:30000/v1.
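Before launching agents, it can help to verify that the endpoint is actually reachable. The `server_is_up` helper below is a hypothetical convenience function (not part of rLLM) that queries the OpenAI-compatible `/models` route:

```python
import json
import urllib.request
import urllib.error


def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an OpenAI-compatible server answers at base_url/models."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.loads(resp.read())
            # OpenAI-compatible servers return {"object": "list", "data": [...]}
            return "data" in payload
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

If this returns False, double-check the `--port` flag passed to the vLLM server.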

Running the Agent

Once the dataset is prepared and the model server is running:
cd examples/frozenlake
python run_frozenlake_agent.py

Code Implementation

Here’s the core implementation from run_frozenlake_agent.py:
import asyncio
from transformers import AutoTokenizer
from rllm.agents.frozenlake_agent import FrozenLakeAgent
from rllm.data.dataset import DatasetRegistry
from rllm.engine.agent_execution_engine import AgentExecutionEngine
from rllm.environments.frozenlake.frozenlake import FrozenLakeEnv
from rllm.utils import compute_pass_at_k

model_name = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Agent configuration
agent_args = {
    "max_steps": 10,
    "use_accumulate_history": True,
}

# Environment configuration
env_args = {
    "max_steps": 8,
    "is_slippery": False,  # Set to True for stochastic version
}

sampling_params = {"temperature": 0.6, "top_p": 0.95, "model": model_name}

# Create execution engine
engine = AgentExecutionEngine(
    agent_class=FrozenLakeAgent,
    env_class=FrozenLakeEnv,
    agent_args=agent_args,
    env_args=env_args,
    engine_name="openai",
    tokenizer=tokenizer,
    sampling_params=sampling_params,
    rollout_engine_args={
        "base_url": "http://localhost:30000/v1",
        "api_key": "None",
    },
    max_response_length=16384,
    max_prompt_length=4096,
    n_parallel_agents=256,
)

# Load test data and execute
test_dataset = DatasetRegistry.load_dataset("frozenlake", "test")
tasks = test_dataset.get_data()

results = asyncio.run(engine.execute_tasks(tasks))
compute_pass_at_k(results)
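rLLM's `compute_pass_at_k` aggregates the rollout results; conceptually, the metric is the standard unbiased pass@k estimator over n sampled attempts with c successes. A minimal sketch of that estimator (for illustration only, not rLLM's actual implementation):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k),
    the probability that at least one of k draws (without replacement)
    from n attempts with c successes is a success."""
    if n - c < k:
        # Fewer failures than draws: a success is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```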

Expected Output

The agent will attempt to navigate the FrozenLake grid:
Evaluating on 100 FrozenLake environments...

Results:
  Total tasks: 100
  Success rate: 0.85
  Average steps: 6.3
  Timeout rate: 0.02

Training the Agent

To train your own FrozenLake agent:
bash examples/frozenlake/train_frozenlake_agent.sh

Training Configuration

  • Model: Qwen/Qwen3-4B
  • Algorithm: PPO (Proximal Policy Optimization)
  • Max Steps: 10 per episode
  • Environment: 4x4 FrozenLake grid
  • Slippery: False (deterministic) or True (stochastic)
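PPO works by maximizing a clipped surrogate objective, which bounds how far each update can move the policy. As a standalone illustration of the clipping step (a sketch of the textbook formula, not rLLM's trainer code):

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO's clipped surrogate for one sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the new/old policy probability ratio and A the advantage."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Taking the min means a large ratio cannot be exploited to inflate the objective: with advantage 1.0 and ratio 2.0, the objective is capped at 1.2.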

Training Script

The training script uses rLLM’s standard training pipeline:
import hydra
from rllm.data.dataset import DatasetRegistry
from rllm.trainer.agent_trainer import AgentTrainer
from rllm.agents.frozenlake_agent import FrozenLakeAgent
from rllm.environments.frozenlake.frozenlake import FrozenLakeEnv

@hydra.main(
    config_path="pkg://rllm.trainer.config",
    config_name="agent_ppo_trainer",
    version_base=None
)
def main(config):
    train_dataset = DatasetRegistry.load_dataset("frozenlake", "train")
    test_dataset = DatasetRegistry.load_dataset("frozenlake", "test")

    trainer = AgentTrainer(
        config=config,
        train_dataset=train_dataset,
        val_dataset=test_dataset,
        agent_class=FrozenLakeAgent,
        env_class=FrozenLakeEnv,
    )
    trainer.train()

if __name__ == "__main__":
    main()

Environment Variations

Deterministic FrozenLake

env_args = {
    "max_steps": 8,
    "is_slippery": False,  # Actions always succeed as intended
}

Stochastic FrozenLake

env_args = {
    "max_steps": 8,
    "is_slippery": True,  # Actions may slip in random directions
}
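If the environment follows Gymnasium's FrozenLake convention, `is_slippery=True` means the intended action executes with probability 1/3, and each of the two perpendicular actions with probability 1/3 (the agent never moves in the opposite direction). A sketch assuming that convention:

```python
import random


def slip(action: int, rng: random.Random) -> int:
    """Slippery transition rule as in Gymnasium's FrozenLake:
    returns the intended action or one of its two perpendicular
    neighbors, each with probability 1/3.
    Actions: 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP."""
    return rng.choice([(action - 1) % 4, action, (action + 1) % 4])
```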

Key Concepts

Gridworld Navigation

The FrozenLake environment teaches agents:
  • Sequential decision making
  • Planning optimal paths
  • Handling stochasticity (when is_slippery=True)
  • Sparse reward signals (only rewarded at goal)

Action Space

The agent has 4 discrete actions:
  • 0: Move LEFT
  • 1: Move DOWN
  • 2: Move RIGHT
  • 3: Move UP
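To make the action encoding concrete, here is an illustrative deterministic step function over a hypothetical 4x4 layout (mirroring Gymnasium's default map, not rLLM's actual environment code). It also shows the sparse reward: 1.0 only on reaching the goal tile.

```python
# Hypothetical 4x4 layout: S=start, F=frozen, H=hole, G=goal
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]

# (row, col) deltas for actions 0..3 = LEFT, DOWN, RIGHT, UP
DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def step(row: int, col: int, action: int) -> tuple[int, int, float, bool]:
    """Apply one deterministic move; moves off the grid leave the agent
    in place. Reward is sparse: 1.0 only on reaching G; the episode
    terminates on G or on a hole."""
    dr, dc = DELTAS[action]
    row = min(max(row + dr, 0), 3)
    col = min(max(col + dc, 0), 3)
    tile = GRID[row][col]
    reward = 1.0 if tile == "G" else 0.0
    done = tile in "GH"
    return row, col, reward, done
```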

Observation Space

The agent receives:
  • Current position in the grid
  • Grid layout (frozen tiles, holes, goal)
  • Previous action history (if use_accumulate_history=True)
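The exact textual observation format is defined by FrozenLakeEnv. Purely as an illustration, a hypothetical `render` helper might mark the agent's position in the grid like this:

```python
def render(grid: list[str], row: int, col: int) -> str:
    """Return the grid as text with the agent's current tile wrapped
    in parentheses, one row per line."""
    lines = []
    for r, line in enumerate(grid):
        cells = [f"({ch})" if (r, c) == (row, col) else f" {ch} "
                 for c, ch in enumerate(line)]
        lines.append("".join(cells))
    return "\n".join(lines)
```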

Advanced Usage

Eval Protocol Integration

For more advanced FrozenLake workflows with Eval Protocol integration, see the Eval Protocol FrozenLake example.
