A vision-language agent that solves geometry problems with diagram images via the AgentFlow protocol. Trains on the Geometry3K dataset.Documentation Index
Fetch the complete documentation index at: https://docs.rllm-project.com/llms.txt
Use this file to discover all available pages before exploring further.
Pattern
| Aspect | Value |
|---|---|
| Loop shape | Single-turn (one VLM call per task) |
| Tools | None — answer is parsed out of the response |
| Inputs | Multimodal — text question + base64-encoded diagram image |
| Termination | Single LLM call returns; evaluator extracts \boxed{…} |
| Reward shape | 1.0 if boxed answer matches ground truth (symbolic math), else 0.0 |
Architecture
messages list contains a {"type": "image_url", "image_url": {"url": f"data:image/png;base64,…"}} block alongside the text content.
Install
Dataset
Eval
Training
Files
| File | Description |
|---|---|
geo3k_flow.py | Single-turn VLM AgentFlow with multimodal content blocks |
evaluator.py | \boxed{} extraction + symbolic math grading |
train.py + train_{tinker,verl}.sh | Hydra entry points |
pyproject.toml | Plugin entry-point declarations |
test.py | Unit tests |
On GitHub
cookbooks/geo3k
Full source, README, and runnable launch scripts

