Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.rllm-project.com/llms.txt

Use this file to discover all available pages before exploring further.

A vision-language agent that solves geometry problems with diagram images via the AgentFlow protocol. Trains on the Geometry3K dataset.

Pattern

AspectValue
Loop shapeSingle-turn (one VLM call per task)
ToolsNone — answer is parsed out of the response
InputsMultimodal — text question + base64-encoded diagram image
TerminationSingle LLM call returns; evaluator extracts \boxed{…}
Reward shape1.0 if boxed answer matches ground truth (symbolic math), else 0.0

Architecture

AgentFlow.run(task, config)

  └── Solver
        └── OpenAI(base_url=config.base_url).chat.completions.create(
                messages=[system_prompt, {images + question}]
            )
            → Trajectory(name="solver", steps=[Step(action=response)])

  └── Episode(trajectories=[solver], artifacts={"answer": response})
The cookbook demonstrates the multimodal content-block pattern in an AgentFlow — the messages list contains a {"type": "image_url", "image_url": {"url": f"data:image/png;base64,…"}} block alongside the text content.

Install

uv pip install -e ".[tinker]"                    # rllm + tinker backend
uv pip install --no-deps -e cookbooks/geo3k      # this cookbook
rllm agent list                                  # should show "geo3k"

Dataset

rllm dataset pull geo3k

Eval

rllm eval geo3k \
    --agent geo3k \
    --evaluator geo3k \
    --model Qwen/Qwen3-VL-8B-Instruct \
    --base-url http://localhost:8000/v1 \
    --max-examples 20

Training

# Tinker (single-machine LoRA)
bash cookbooks/geo3k/train_tinker.sh

# Verl (distributed GPU)
bash cookbooks/geo3k/train_verl.sh

Files

FileDescription
geo3k_flow.pySingle-turn VLM AgentFlow with multimodal content blocks
evaluator.py\boxed{} extraction + symbolic math grading
train.py + train_{tinker,verl}.shHydra entry points
pyproject.tomlPlugin entry-point declarations
test.pyUnit tests

On GitHub

cookbooks/geo3k

Full source, README, and runnable launch scripts