Model
rLLM-FinQA-4B on HuggingFace
Dataset
5,110 Q&A pairs across 207 companies
Blog
Read the full announcement
Performance
rLLM-FinQA-4B achieves 59.7% accuracy on the Snorkel Finance Benchmark, demonstrating that small models trained with RL can outperform much larger models:

| Model | Parameters | Accuracy |
|---|---|---|
| rLLM-FinQA-4B | 4B | 59.7% |
| Gemini 2.5 Pro | Unknown | 60.6% |
| Qwen3-235B | 235B | 51.4% |
The 4B agent outperforms Qwen3-235B by 8.3 percentage points and rivals Gemini 2.5 Pro on Snorkel AI’s expert-curated agentic financial benchmark.
Overview
The FinQA project demonstrates:

- How to use rLLM's `ToolAgent` and `ToolEnvironment` for multi-step financial reasoning
- How to build domain-specific tools in rLLM
- How to train agents with GRPO using LLM-as-judge rewards
- How to achieve state-of-the-art performance with small models using RL
Agent Architecture
The FinQA agent is a ReAct-style tool agent that answers financial questions by querying structured tables extracted from SEC 10-K filings. The agent has access to four specialized tools:

| Tool | Description |
|---|---|
| get_table_names | List available tables for a given company |
| get_table_info | Get table metadata, columns, dtypes, and sample values |
| sql_query | Execute SQL queries on in-memory SQLite tables |
| calculator | Evaluate mathematical expressions |
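Two of these tools can be sketched concretely. The snippet below is a minimal, hypothetical illustration of `get_table_names`, `sql_query`, and `calculator` backed by an in-memory SQLite database; the function names match the table above, but the signatures, the sample table, and the AST-based calculator are assumptions, not the actual rLLM tool implementations.

```python
# Hypothetical sketch of three FinQA tools, assuming a company's tables
# have already been loaded into an in-memory SQLite database.
import ast
import operator
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income_statement (year INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO income_statement VALUES (?, ?)",
    [(2022, 100.0), (2023, 120.0)],
)

def get_table_names() -> list[str]:
    """List available tables for the loaded company."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'"
    ).fetchall()
    return [r[0] for r in rows]

def sql_query(query: str) -> list[tuple]:
    """Execute a SQL query against the in-memory tables."""
    return conn.execute(query).fetchall()

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}

def calculator(expr: str) -> float:
    """Evaluate an arithmetic expression safely via the AST."""
    def ev(node):
        if isinstance(node, ast.Constant):
            return float(node.value)
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)
```

Restricting the calculator to an AST walk (rather than `eval`) keeps model-generated expressions from executing arbitrary code.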
Quick Start
Installation
Follow the installation guide, then install the FinQA dependencies.
Dataset Preparation
Download the rLLM/finqa dataset and prepare it for training and evaluation:

- Download the dataset from HuggingFace (5,110 Q&A pairs)
- Extract company tables to `projects/finqa/data/company_tables/` (207 companies, 6,923 tables)
- Create train/val/test splits (4,030 / 522 / 558 examples)
- Register all splits with the rLLM `DatasetRegistry`
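The split step above can be sketched as follows. This is only an illustrative sketch, with placeholder examples standing in for the real Q&A pairs; the project's actual preparation script may shuffle and split differently.

```python
# Illustrative train/val/test split over the 5,110 Q&A pairs
# (4,030 / 522 / 558, as in the prepared dataset).
import random

examples = list(range(5110))   # stand-in for the 5,110 Q&A pairs
rng = random.Random(42)        # fixed seed for reproducible splits
rng.shuffle(examples)

train = examples[:4030]
val = examples[4030:4552]
test = examples[4552:]
```

Note the counts are consistent: 4,030 + 522 + 558 = 5,110.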
Inference
Start a vLLM server, then run the agent.
Training
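At inference time the agent alternates between model calls and tool calls. The loop below is a minimal sketch of that ReAct-style cycle with the LLM stubbed out; the real agent uses rLLM's `ToolAgent` against the vLLM server, and `stub_llm`, `run_agent`, and the action format here are hypothetical.

```python
# Minimal ReAct-style tool loop with a stubbed policy.
def stub_llm(history):
    """Stand-in policy: call the calculator once, then answer."""
    if not any(step[0] == "tool" for step in history):
        return {"tool": "calculator", "args": "59.7 - 51.4"}
    return {"answer": f"Difference: {round(history[-1][1], 1)} points"}

def calculator(expr: str) -> float:
    return eval(expr, {"__builtins__": {}})  # trusted stub input only

TOOLS = {"calculator": calculator}

def run_agent(question: str, max_turns: int = 4) -> str:
    """Alternate model calls and tool calls until an answer is emitted."""
    history = [("user", question)]
    for _ in range(max_turns):
        action = stub_llm(history)
        if "answer" in action:
            return action["answer"]
        result = TOOLS[action["tool"]](action["args"])
        history.append(("tool", result))
    return "no answer"
```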
Set the required environment variables before training:

| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key for the reward judge |
| PORTKEY_API_KEY | Portkey gateway key for reward-judge caching |
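A small preflight check like the following can catch a missing key before a long training run; this is a sketch, and the training script's own validation may differ.

```python
# Check that the reward-judge credentials are set before training.
import os

REQUIRED = ["OPENAI_API_KEY", "PORTKEY_API_KEY"]

def check_env() -> list[str]:
    """Return the names of any required variables that are unset."""
    return [name for name in REQUIRED if not os.environ.get(name)]
```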
Training with verl Backend
Train the 4B model with the verl backend.
Training with tinker Backend
Train the 30B model with LoRA using the tinker backend.
Implementation Details
Base Model
- Qwen3-4B-Instruct-2507
- Alternative: Qwen3-30B-A3B-Instruct-2507 with LoRA
Dataset
- Source: rLLM/finqa on HuggingFace
- Size: 5,110 Q&A pairs across 207 companies
- Tables: 6,923 tables extracted from SEC 10-K filings
- Splits: 4,030 train / 522 validation / 558 test examples
Training Configuration
- Algorithm: GRPO (Group Relative Policy Optimization)
- Reward: LLM-as-judge using GPT-5-nano
- Caching: Portkey gateway for reward caching
- Backend: verl (default) or tinker (for LoRA)
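The LLM-as-judge reward can be sketched as a binary signal for GRPO. In the snippet below the GPT-5-nano judge is replaced with a simple string-matching stand-in; the real reward calls the judge model through the Portkey gateway, and both function names and the verdict format are assumptions.

```python
# Sketch of a binary LLM-as-judge reward, with the judge model stubbed.
def stub_judge(question: str, reference: str, prediction: str) -> str:
    """Stand-in for the judge: returns 'CORRECT' or 'INCORRECT'."""
    def norm(s: str) -> str:
        return s.strip().rstrip("%").strip()
    return "CORRECT" if norm(prediction) == norm(reference) else "INCORRECT"

def judge_reward(question: str, reference: str, prediction: str) -> float:
    """Binary reward for GRPO: 1.0 if the judge deems the answer correct."""
    verdict = stub_judge(question, reference, prediction)
    return 1.0 if verdict.startswith("CORRECT") else 0.0
```

A binary judge verdict is a natural fit for GRPO, which normalizes rewards within each group of sampled rollouts rather than relying on an absolute scale.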
Code Reference
Financial Agent Runner
Main script for running financial reasoning: `projects/finqa/run_finqa.py`
Training Script
FinQA training configuration: `projects/finqa/train_finqa.py`
Resources
Model on HuggingFace
Download rLLM-FinQA-4B weights
Dataset on HuggingFace
Access the FinQA dataset
Blog Post
Read the announcement blog
GitHub Project
View complete source code

