V1: Parallel Self-Verification

Paper

arXiv:2603.04304

V1 is a framework that improves how models verify multiple solution candidates at inference time. Instead of scoring each solution in isolation, V1 uses pairwise self-verification, in which the model compares two candidates head-to-head, combined with a tournament-based ranking algorithm that allocates verification compute efficiently. Its training method develops generation and verification capabilities jointly, with reported improvements of up to 10% on code generation and math reasoning benchmarks.
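To make the idea of pairwise comparison plus tournament ranking concrete, here is a minimal sketch of a single-elimination tournament over candidates. It is an illustration under stated assumptions, not V1's actual algorithm: the function name `tournament_select`, the `compare` callback, and the bracket structure are all hypothetical, and in practice `compare` would be a model call that judges two candidate solutions against each other.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def tournament_select(
    candidates: List[T],
    compare: Callable[[T, T], bool],
    seed: Optional[int] = None,
) -> T:
    """Pick one candidate via a single-elimination tournament.

    `compare(a, b)` is a hypothetical pairwise verifier: it returns True
    when `a` is judged better than (or equal to) `b`. This bracket is an
    assumed stand-in for the paper's tournament-based ranking.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    rng = random.Random(seed)
    pool = list(candidates)
    rng.shuffle(pool)  # randomize the initial bracket

    while len(pool) > 1:
        next_round = []
        # Pair up neighbours; each pair costs one verification call.
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if compare(a, b) else b)
        # With an odd pool, the last candidate advances without a match.
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round

    return pool[0]


# Toy usage: integers stand in for solutions, and "larger is better"
# stands in for the model's head-to-head judgment.
calls = 0


def toy_compare(a: int, b: int) -> bool:
    global calls
    calls += 1
    return a >= b


winner = tournament_select([3, 9, 4, 7, 1], toy_compare, seed=0)
print(winner, calls)
```

A bracket like this uses n - 1 comparisons for n candidates, versus n(n - 1)/2 for comparing every pair, which is one way a tournament can keep verification compute bounded. A noisy comparator can eliminate the best candidate early, so a real system may need repeated or seeded matches.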
