

arXiv:2603.04304
V1 is a framework that improves how models verify multiple solution candidates during inference. Instead of scoring solutions individually, V1 leverages pairwise self-verification — where models compare two candidates head-to-head — combined with a tournament-based ranking algorithm to efficiently allocate verification compute. The training method jointly develops generation and verification capabilities, achieving improvements of up to 10% on code generation and math reasoning benchmarks.
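The paper itself includes no code here, but the tournament idea can be illustrated with a minimal sketch: a single-elimination bracket over solution candidates, where each match is one pairwise verification call. The names `pairwise_verify` and `tournament_select`, and the numeric `score` field standing in for the model's judgment, are illustrative assumptions, not the paper's implementation.

```python
def pairwise_verify(candidate_a, candidate_b):
    """Hypothetical stand-in for the model's pairwise self-verification:
    returns the candidate judged more likely correct. Here we fake the
    judgment with a numeric 'score' attached to each candidate."""
    return candidate_a if candidate_a["score"] >= candidate_b["score"] else candidate_b

def tournament_select(candidates, verify=pairwise_verify):
    """Single-elimination tournament over solution candidates.
    Each round halves the field via pairwise comparisons, so selecting
    from n candidates costs n-1 comparisons in total, versus O(n^2)
    for exhaustive all-pairs ranking."""
    field = list(candidates)
    while len(field) > 1:
        next_round = []
        # Pair adjacent candidates; an odd one out gets a bye.
        for i in range(0, len(field) - 1, 2):
            next_round.append(verify(field[i], field[i + 1]))
        if len(field) % 2 == 1:
            next_round.append(field[-1])
        field = next_round
    return field[0]

candidates = [{"id": i, "score": s} for i, s in enumerate([0.3, 0.9, 0.5, 0.7])]
best = tournament_select(candidates)  # candidate with id 1 wins the bracket
```

The efficiency claim in the abstract maps directly onto this structure: the tournament allocates verification compute in n-1 head-to-head matches rather than scoring every candidate in isolation or comparing every pair.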