This example demonstrates how to train an RL agent on the classic FrozenLake environment. FrozenLake is a simple gridworld navigation task that serves as an excellent introduction to reinforcement learning concepts in rLLM.

Overview

FrozenLake is a classic RL environment where:
  • The agent navigates a frozen lake grid (4x4 or 8x8)
  • The goal is to reach the frisbee without falling into a hole
  • A slippery surface adds stochasticity to actions
  • Discrete action space: LEFT, DOWN, RIGHT, UP
This example demonstrates:
  • How to use rLLM’s FrozenLakeAgent for gridworld navigation
  • Training with discrete action spaces
  • Handling stochastic environments
  • Evaluating RL agents with success rate metrics

Prerequisites

  • rLLM framework installed
  • vLLM or SGLang for model serving
  • Base model: Qwen/Qwen3-4B (or similar)

Setup

1. Prepare the environment data

Generate FrozenLake environment configurations:
cd examples/frozenlake
python prepare_frozenlake_data.py
This will create train and test datasets with different FrozenLake configurations.
2. Start the model server

Launch a vLLM server for the base model:
python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen3-4B \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16
The server will be accessible at http://localhost:30000/v1.
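Before launching agents, it can help to verify that the endpoint is actually reachable. The `server_is_up` helper below is a hypothetical convenience function (not part of rLLM) that queries the OpenAI-compatible `/models` route:

```python
import json
import urllib.request
import urllib.error


def server_is_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if an OpenAI-compatible server answers at base_url/models."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            payload = json.loads(resp.read())
            # OpenAI-compatible servers return {"object": "list", "data": [...]}
            return "data" in payload
    except (urllib.error.URLError, OSError, ValueError):
        return False
```

If this returns False, double-check the `--port` flag passed to the vLLM server.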

Running the Agent

Once the dataset is prepared and the model server is running:
cd examples/frozenlake
python run_frozenlake_agent.py

Code Implementation

Here’s the core implementation from run_frozenlake_agent.py:
import asyncio
from transformers import AutoTokenizer
from rllm.agents.frozenlake_agent import FrozenLakeAgent
from rllm.data.dataset import DatasetRegistry
from rllm.engine.agent_execution_engine import AgentExecutionEngine
from rllm.environments.frozenlake.frozenlake import FrozenLakeEnv
from rllm.utils import compute_pass_at_k

model_name = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Agent configuration
agent_args = {
    "max_steps": 10,
    "use_accumulate_history": True,
}

# Environment configuration
env_args = {
    "max_steps": 8,
    "is_slippery": False,  # Set to True for stochastic version
}

sampling_params = {"temperature": 0.6, "top_p": 0.95, "model": model_name}

# Create execution engine
engine = AgentExecutionEngine(
    agent_class=FrozenLakeAgent,
    env_class=FrozenLakeEnv,
    agent_args=agent_args,
    env_args=env_args,
    engine_name="openai",
    tokenizer=tokenizer,
    sampling_params=sampling_params,
    rollout_engine_args={
        "base_url": "http://localhost:30000/v1",
        "api_key": "None",
    },
    max_response_length=16384,
    max_prompt_length=4096,
    n_parallel_agents=256,
)

# Load test data and execute
test_dataset = DatasetRegistry.load_dataset("frozenlake", "test")
tasks = test_dataset.get_data()

results = asyncio.run(engine.execute_tasks(tasks))
compute_pass_at_k(results)
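rLLM's `compute_pass_at_k` aggregates the rollout results; conceptually, the metric is the standard unbiased pass@k estimator over n sampled attempts with c successes. A minimal sketch of that estimator (for illustration only, not rLLM's actual implementation):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k),
    the probability that at least one of k draws (without replacement)
    from n attempts with c successes is a success."""
    if n - c < k:
        # Fewer failures than draws: a success is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```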

Expected Output

The agent will attempt to navigate the FrozenLake grid:
Evaluating on 100 FrozenLake environments...

Results:
  Total tasks: 100
  Success rate: 0.85
  Average steps: 6.3
  Timeout rate: 0.02

Training the Agent

To train your own FrozenLake agent:
bash examples/frozenlake/train_frozenlake_agent.sh

Training Configuration

  • Model: Qwen/Qwen3-4B
  • Algorithm: PPO (Proximal Policy Optimization)
  • Max Steps: 10 per episode
  • Environment: 4x4 FrozenLake grid
  • Slippery: False (deterministic) or True (stochastic)
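PPO works by maximizing a clipped surrogate objective, which bounds how far each update can move the policy. As a standalone illustration of the clipping step (a sketch of the textbook formula, not rLLM's trainer code):

```python
def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO's clipped surrogate for one sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r is the new/old policy probability ratio and A the advantage."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

Taking the min means a large ratio cannot be exploited to inflate the objective: with advantage 1.0 and ratio 2.0, the objective is capped at 1.2.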

Training Script

The training script uses rLLM’s standard training pipeline:
import hydra
from rllm.data.dataset import DatasetRegistry
from rllm.trainer.agent_trainer import AgentTrainer
from rllm.agents.frozenlake_agent import FrozenLakeAgent
from rllm.environments.frozenlake.frozenlake import FrozenLakeEnv

@hydra.main(
    config_path="pkg://rllm.trainer.config",
    config_name="agent_ppo_trainer",
    version_base=None
)
def main(config):
    train_dataset = DatasetRegistry.load_dataset("frozenlake", "train")
    test_dataset = DatasetRegistry.load_dataset("frozenlake", "test")

    trainer = AgentTrainer(
        config=config,
        train_dataset=train_dataset,
        val_dataset=test_dataset,
        agent_class=FrozenLakeAgent,
        env_class=FrozenLakeEnv,
    )
    trainer.train()

if __name__ == "__main__":
    main()

Environment Variations

Deterministic FrozenLake

env_args = {
    "max_steps": 8,
    "is_slippery": False,  # Actions always succeed as intended
}

Stochastic FrozenLake

env_args = {
    "max_steps": 8,
    "is_slippery": True,  # Actions may slip in random directions
}
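If the environment follows Gymnasium's FrozenLake convention, `is_slippery=True` means the intended action executes with probability 1/3, and each of the two perpendicular actions with probability 1/3 (the agent never moves in the opposite direction). A sketch assuming that convention:

```python
import random


def slip(action: int, rng: random.Random) -> int:
    """Slippery transition rule as in Gymnasium's FrozenLake:
    returns the intended action or one of its two perpendicular
    neighbors, each with probability 1/3.
    Actions: 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP."""
    return rng.choice([(action - 1) % 4, action, (action + 1) % 4])
```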

Key Concepts

Gridworld Navigation

The FrozenLake environment teaches agents:
  • Sequential decision making
  • Planning optimal paths
  • Handling stochasticity (when is_slippery=True)
  • Sparse reward signals (only rewarded at goal)

Action Space

The agent has 4 discrete actions:
  • 0: Move LEFT
  • 1: Move DOWN
  • 2: Move RIGHT
  • 3: Move UP
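To make the action encoding concrete, here is an illustrative deterministic step function over a hypothetical 4x4 layout (mirroring Gymnasium's default map, not rLLM's actual environment code). It also shows the sparse reward: 1.0 only on reaching the goal tile.

```python
# Hypothetical 4x4 layout: S=start, F=frozen, H=hole, G=goal
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]

# (row, col) deltas for actions 0..3 = LEFT, DOWN, RIGHT, UP
DELTAS = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def step(row: int, col: int, action: int) -> tuple[int, int, float, bool]:
    """Apply one deterministic move; moves off the grid leave the agent
    in place. Reward is sparse: 1.0 only on reaching G; the episode
    terminates on G or on a hole."""
    dr, dc = DELTAS[action]
    row = min(max(row + dr, 0), 3)
    col = min(max(col + dc, 0), 3)
    tile = GRID[row][col]
    reward = 1.0 if tile == "G" else 0.0
    done = tile in "GH"
    return row, col, reward, done
```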

Observation Space

The agent receives:
  • Current position in the grid
  • Grid layout (frozen tiles, holes, goal)
  • Previous action history (if use_accumulate_history=True)
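The exact textual observation format is defined by FrozenLakeEnv. Purely as an illustration, a hypothetical `render` helper might mark the agent's position in the grid like this:

```python
def render(grid: list[str], row: int, col: int) -> str:
    """Return the grid as text with the agent's current tile wrapped
    in parentheses, one row per line."""
    lines = []
    for r, line in enumerate(grid):
        cells = [f"({ch})" if (r, c) == (row, col) else f" {ch} "
                 for c, ch in enumerate(line)]
        lines.append("".join(cells))
    return "\n".join(lines)
```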

Advanced Usage

Eval Protocol Integration

For more advanced FrozenLake workflows with Eval Protocol integration, see the Eval Protocol FrozenLake example.
