Overview
FrozenLake is a classic RL environment where:
- Agent navigates a frozen lake grid (4x4 or 8x8)
- Goal is to reach the frisbee without falling into holes
- Slippery surface adds stochasticity to actions
- Discrete action space: UP, DOWN, LEFT, RIGHT
This example covers:
- How to use rLLM’s FrozenLakeAgent for gridworld navigation
- Training with discrete action spaces
- Handling stochastic environments
- Evaluating RL agents with success rate metrics
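For orientation, the grid can be written down as plain data. The sketch below uses the default 4x4 map from Gymnasium's FrozenLake purely as an illustration; the configurations generated for this example may use different layouts, and `find_tiles` is a hypothetical helper, not part of rLLM.

```python
# Default 4x4 layout from Gymnasium's FrozenLake, used here only as an example.
# S = start, F = frozen (safe), H = hole, G = goal.
GRID_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]


def find_tiles(grid, tile):
    """Return the (row, col) coordinates of every cell holding `tile`."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, ch in enumerate(row)
        if ch == tile
    ]


start = find_tiles(GRID_4X4, "S")[0]
goal = find_tiles(GRID_4X4, "G")[0]
holes = find_tiles(GRID_4X4, "H")

print("start:", start, "goal:", goal, "holes:", holes)
```

On this layout the agent starts in the top-left corner and the goal sits in the bottom-right corner, with four holes in between.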
Prerequisites
- rLLM framework installed
- vLLM or SGLang for model serving
- Base model: Qwen/Qwen3-4B (or similar)
Setup
Prepare the environment data
Generate the FrozenLake environment configurations. This step creates train and test datasets, each containing different FrozenLake configurations.
Running the Agent
Once the dataset is prepared and the model server is running, run the agent.
Code Implementation
The core implementation is in run_frozenlake_agent.py.
Expected Output
The agent will attempt to navigate the FrozenLake grid.
Training the Agent
To train your own FrozenLake agent, use the configuration and training script described below.
Training Configuration
- Model: Qwen/Qwen3-4B
- Algorithm: PPO (Proximal Policy Optimization)
- Max Steps: 10 per episode
- Environment: 4x4 FrozenLake grid
- Slippery: False (deterministic) or True (stochastic)
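The settings above can be collected in one place so that the deterministic and stochastic runs differ by a single field. This is a minimal sketch: `FrozenLakeRunConfig` and its field names are hypothetical and do not reflect rLLM's actual configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenLakeRunConfig:
    """Hypothetical container for the listed settings (not rLLM's schema)."""

    model: str = "Qwen/Qwen3-4B"
    algorithm: str = "PPO"
    max_steps: int = 10       # step budget per episode
    grid_size: int = 4        # 4 for a 4x4 grid, 8 for an 8x8 grid
    is_slippery: bool = False

    def __post_init__(self):
        # Reject values outside the options this guide mentions.
        if self.grid_size not in (4, 8):
            raise ValueError("grid_size must be 4 or 8")
        if self.max_steps < 1:
            raise ValueError("max_steps must be positive")


deterministic = FrozenLakeRunConfig()
stochastic = FrozenLakeRunConfig(is_slippery=True)
print(deterministic)
print(stochastic)
```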
Training Script
The training script uses rLLM’s standard training pipeline.
Environment Variations
Deterministic FrozenLake
Stochastic FrozenLake
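The difference between the two variants can be sketched as follows. This assumes the default slip model of Gymnasium's FrozenLake, in which a slippery step applies either the intended direction or one of the two perpendicular directions, each with probability 1/3. The environment used in this example may be configured differently, and the helper names are illustrative.

```python
import random

# Action ids follow the mapping used in this guide.
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3


def slip_candidates(action):
    """Return the intended action and its two perpendicular neighbours."""
    return [(action - 1) % 4, action, (action + 1) % 4]


def executed_action(action, is_slippery, rng):
    """Pick the action the environment actually applies."""
    if not is_slippery:
        return action  # deterministic: the agent moves as requested
    # Stochastic: each of the three candidates is equally likely.
    return rng.choice(slip_candidates(action))


rng = random.Random(0)
samples = [executed_action(RIGHT, True, rng) for _ in range(1000)]
print(sorted(set(samples)))
```

Under this model the agent never slides in the direction opposite to the one it chose, so a request for RIGHT can end up as DOWN, RIGHT, or UP, but not LEFT.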
Key Concepts
Gridworld Navigation
The FrozenLake environment teaches agents:
- Sequential decision making
- Planning optimal paths
- Handling stochasticity (when is_slippery=True)
- Sparse reward signals (only rewarded at goal)
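The sparse reward and the success-rate metric can be illustrated by replaying fixed action sequences on a deterministic grid. This is a simplified sketch, not rLLM's environment: the layout is Gymnasium's default 4x4 map, the step budget of 10 matches the training configuration in this guide, and `run_episode` and `success_rate` are hypothetical helpers.

```python
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
# (row delta, col delta) per action id: 0 LEFT, 1 DOWN, 2 RIGHT, 3 UP
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def run_episode(actions, max_steps=10):
    """Replay an action sequence from the start tile and return the episode reward."""
    row, col = 0, 0
    for action in actions[:max_steps]:
        d_row, d_col = MOVES[action]
        # Moves that would leave the grid keep the agent in place.
        row = min(max(row + d_row, 0), len(GRID) - 1)
        col = min(max(col + d_col, 0), len(GRID[0]) - 1)
        tile = GRID[row][col]
        if tile == "G":
            return 1.0  # the only rewarded outcome
        if tile == "H":
            return 0.0  # falling into a hole ends the episode without reward
    return 0.0  # step budget exhausted before reaching the goal


def success_rate(episode_rewards):
    """Fraction of episodes that reached the goal."""
    return sum(r > 0 for r in episode_rewards) / len(episode_rewards)


plans = [
    [1, 1, 2, 2, 1, 2],  # reaches the goal in six moves
    [2, 1],              # RIGHT then DOWN lands in the hole at (1, 1)
    [0] * 10,            # pushes against the left wall until the budget runs out
]
rewards = [run_episode(plan) for plan in plans]
print(rewards, success_rate(rewards))
```

Because only the final transition into the goal is rewarded, the mean episode reward and the success rate coincide in this setup.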
Action Space
The agent has 4 discrete actions:
- 0: Move LEFT
- 1: Move DOWN
- 2: Move RIGHT
- 3: Move UP
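Since the policy is a language model, its text output has to be mapped onto one of these four ids. The sketch below shows one possible way to do that; `Action` and `parse_action` are hypothetical and do not describe how FrozenLakeAgent actually parses responses.

```python
import re
from enum import IntEnum


class Action(IntEnum):
    """Action ids as listed above."""

    LEFT = 0
    DOWN = 1
    RIGHT = 2
    UP = 3


_ACTION_PATTERN = re.compile(r"\b(left|down|right|up)\b", re.IGNORECASE)


def parse_action(text):
    """Return the last direction word in `text` as an Action, or None if absent."""
    matches = _ACTION_PATTERN.findall(text)
    if not matches:
        return None
    return Action[matches[-1].upper()]


print(parse_action("The hole is to the right, so I will go Down."))
```

Taking the last direction word is a design choice that favours the model's final answer over directions mentioned while reasoning; returning None lets the caller decide how to treat an unparseable response.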
Observation Space
The agent receives:
- Current position in the grid
- Grid layout (frozen tiles, holes, goal)
- Previous action history (if use_accumulate_history=True)
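A text observation combining these three pieces might look like the output of the sketch below. The format is an assumption made for illustration: the `P` marker, the "Previous actions" line, and `render_observation` itself are hypothetical, and the prompt rLLM actually builds may differ.

```python
def render_observation(grid, position, history=None):
    """Build a text observation: the grid with the agent marked 'P', plus optional history."""
    lines = []
    for r, grid_row in enumerate(grid):
        cells = ["P" if (r, c) == position else ch for c, ch in enumerate(grid_row)]
        lines.append(" ".join(cells))
    text = "\n".join(lines)
    if history:
        # Only appended when the caller passes a non-empty action history.
        text += "\nPrevious actions: " + ", ".join(history)
    return text


grid = ["SFFF", "FHFH", "FFFH", "HFFG"]
observation = render_observation(grid, (1, 0), history=["DOWN"])
print(observation)
```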
Advanced Usage
Eval Protocol Integration
For more advanced FrozenLake workflows with Eval Protocol integration, see the Eval Protocol FrozenLake example.
Next Steps
- Try the Math Agent example for tool-based reasoning
- Explore SDK examples for simplified workflows
- Learn about building custom agents

