Paper

arXiv:2603.04304

V1 is a framework that improves how models verify multiple solution candidates at inference time. Instead of scoring each solution individually, V1 uses pairwise self-verification, in which the model compares two candidates head-to-head, combined with a tournament-based ranking algorithm that allocates verification compute efficiently. The training method jointly develops generation and verification capabilities, achieving improvements of up to 10% on code generation and math reasoning benchmarks.
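
To make the idea of pairwise comparison plus a tournament concrete, here is a minimal sketch in Python. It is not V1's actual ranking algorithm, which this page does not specify. It is a plain single-elimination knockout, chosen only for illustration. The names `knockout_select` and `prefer_first` are hypothetical; `prefer_first(a, b)` stands in for a model call that judges two candidates head-to-head and returns `True` when it prefers `a`.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def knockout_select(
    candidates: Sequence[T],
    prefer_first: Callable[[T, T], bool],
) -> T:
    """Pick one candidate by single-elimination pairwise comparison.

    `prefer_first` is a hypothetical stand-in for pairwise self-verification:
    it returns True when the first argument is judged the better candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    current: List[T] = list(candidates)
    while len(current) > 1:
        next_round: List[T] = []
        # Compare adjacent pairs; the winner of each pair advances.
        for i in range(0, len(current) - 1, 2):
            a, b = current[i], current[i + 1]
            next_round.append(a if prefer_first(a, b) else b)
        # With an odd number of candidates, the last one gets a bye.
        if len(current) % 2 == 1:
            next_round.append(current[-1])
        current = next_round
    return current[0]


if __name__ == "__main__":
    # Toy judge (assumption): candidates are scores, and higher is better.
    best = knockout_select([3, 7, 2, 9, 4], lambda a, b: a >= b)
    print(best)
```

A knockout over n candidates makes exactly n - 1 comparisons, whereas comparing every pair would take n(n - 1)/2. This is one simple way a tournament structure can limit verification compute. A real judge would be noisy, so a practical system would likely need repeated or adaptive comparisons rather than a single pass.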