This project demonstrates training and deploying rLLM-FinQA-4B, a specialized financial question-answering agent fine-tuned from Qwen3-4B-Instruct-2507 using rLLM. The agent uses specialized tools (SQL queries, table lookup, calculators) to perform multi-step reasoning over SEC 10-K financial statements.

Performance

rLLM-FinQA-4B achieves 59.7% accuracy on the Snorkel Finance Benchmark, demonstrating that small models trained with RL can outperform much larger models:

| Model | Parameters | Accuracy |
|---|---|---|
| rLLM-FinQA-4B | 4B | 59.7% |
| Gemini 2.5 Pro | Unknown | 60.6% |
| Qwen3-235B | 235B | 51.4% |
The 4B agent outperforms Qwen3-235B by 8.3 percentage points and rivals Gemini 2.5 Pro on Snorkel AI’s expert-curated agentic financial benchmark.

Overview

The FinQA project demonstrates:
  • How to use rLLM’s ToolAgent and ToolEnvironment for multi-step financial reasoning
  • How to build domain-specific tools in rLLM
  • How to train agents with GRPO using LLM-as-judge rewards
  • How to achieve state-of-the-art performance with small models using RL

Agent Architecture

The FinQA agent is a ReAct-style tool agent that answers financial questions by querying structured tables extracted from SEC 10-K filings. The agent has access to 4 specialized tools:
| Tool | Description |
|---|---|
| get_table_names | List available tables for a given company |
| get_table_info | Get table metadata, columns, dtypes, and sample values |
| sql_query | Execute SQL queries on in-memory SQLite tables |
| calculator | Evaluate mathematical expressions |

All table data is preloaded into in-memory SQLite for low-latency access at runtime.
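The table tools above can be sketched on top of the standard-library `sqlite3` module. This is a hedged illustration, not the project's actual implementation: the table name, columns, and figures below are hypothetical, and the real tools carry per-company scoping.

```python
# Sketch: backing the table tools with an in-memory SQLite database.
# Table schema and values are illustrative, not the real rLLM/finqa data.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE income_statement (fiscal_year INTEGER, revenue REAL, net_income REAL)"
)
conn.executemany(
    "INSERT INTO income_statement VALUES (?, ?, ?)",
    [(2022, 394.3, 99.8), (2023, 383.3, 97.0)],
)

def get_table_names() -> list[str]:
    """List available tables (mirrors the get_table_names tool)."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    return [r[0] for r in rows]

def get_table_info(table: str) -> list[tuple]:
    """Return column metadata via PRAGMA (mirrors get_table_info)."""
    return conn.execute(f"PRAGMA table_info({table})").fetchall()

def sql_query(query: str) -> list[tuple]:
    """Execute a SQL query against the in-memory tables (mirrors sql_query)."""
    return conn.execute(query).fetchall()

print(get_table_names())  # ['income_statement']
print(sql_query("SELECT revenue FROM income_statement WHERE fiscal_year = 2023"))  # [(383.3,)]
```

Keeping everything in one `:memory:` connection is what gives the agent cheap, repeated queries during a multi-step rollout.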

Quick Start

Installation

Follow the installation guide, then install FinQA dependencies:
uv pip install -r projects/finqa/requirements.txt

Dataset Preparation

Download the rLLM/finqa dataset and prepare it for training and evaluation:
python -m projects.finqa.prepare_finqa_data
This will:
  • Download the dataset from HuggingFace (5,110 Q&A pairs)
  • Extract company tables to projects/finqa/data/company_tables/ (207 companies, 6,923 tables)
  • Create train/val/test splits (4,030 / 522 / 558 examples)
  • Register all splits with the rLLM DatasetRegistry
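The split arithmetic above can be checked with a minimal sketch. The shuffle seed is an assumption, and the real `prepare_finqa_data` script may group examples (e.g. by company) rather than splitting uniformly:

```python
# Sketch of the 4,030 / 522 / 558 split over the 5,110 Q&A pairs.
# Seed and flat shuffling are assumptions; see prepare_finqa_data for the
# actual logic.
import random

examples = list(range(5110))   # stand-ins for the 5,110 Q&A pairs
rng = random.Random(42)        # hypothetical seed
rng.shuffle(examples)

train, val, test = examples[:4030], examples[4030:4552], examples[4552:]
print(len(train), len(val), len(test))  # 4030 522 558
```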

Inference

Start a vLLM server and run the agent:
python -m vllm.entrypoints.openai.api_server \
    --model rLLM/rLLM-FinQA-4B \
    --host 0.0.0.0 \
    --port 30000 \
    --dtype bfloat16

python -m projects.finqa.run_finqa
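Under the hood, the runner talks to the vLLM server's OpenAI-compatible `/v1/chat/completions` endpoint. A hedged sketch of the kind of request involved (the question is illustrative, and the real agent also attaches the tool schemas and a system prompt):

```python
# Sketch: the shape of a chat request to the local vLLM server at
# http://localhost:30000/v1/chat/completions. Question text is illustrative.
import json

def build_request(question: str) -> dict:
    return {
        "model": "rLLM/rLLM-FinQA-4B",
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.0,
    }

req = build_request("What was FY2023 revenue?")
print(json.dumps(req, indent=2))
```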

Training

Set the required environment variables before training:
| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key for the reward judge |
| PORTKEY_API_KEY | Portkey gateway key for reward-judge caching |

Training with verl Backend

Train the 4B model with the verl backend:
bash projects/finqa/train_finqa.sh

Training with tinker Backend

Train with LoRA on the 30B model using the tinker backend:
bash projects/finqa/train_finqa_tinker.sh
Training uses GPT-5-nano as the reward judge, with the Portkey gateway caching judge calls to reduce API costs.
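An LLM-as-judge reward boils down to prompting the judge model with the question, reference, and agent answer, then mapping its verdict to a scalar. The prompt wording and reward values below are assumptions, not the project's actual judge template:

```python
# Hedged sketch of an LLM-as-judge reward. The judge (GPT-5-nano here) is
# asked to reply CORRECT or INCORRECT; the prompt and reward mapping are
# illustrative assumptions.
def build_judge_prompt(question: str, reference: str, answer: str) -> str:
    return (
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {answer}\n"
        "Reply with exactly CORRECT or INCORRECT."
    )

def parse_reward(judge_reply: str) -> float:
    """Map the judge's verdict to a binary reward."""
    verdict = judge_reply.strip().upper()
    return 1.0 if verdict.startswith("CORRECT") else 0.0

print(parse_reward("CORRECT"))    # 1.0
print(parse_reward("Incorrect"))  # 0.0
```

Caching the judge through a gateway pays off because GRPO re-evaluates many rollouts per prompt, and identical (question, answer) pairs recur across epochs.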

Implementation Details

Base Model

  • Fine-tuned from Qwen3-4B-Instruct-2507 using rLLM
Dataset

  • Source: rLLM/finqa on HuggingFace
  • Size: 5,110 Q&A pairs across 207 companies
  • Tables: 6,923 tables extracted from SEC 10-K filings
  • Splits: 4,030 train / 522 validation / 558 test examples

Training Configuration

  • Algorithm: GRPO (Group Relative Policy Optimization)
  • Reward: LLM-as-judge using GPT-5-nano
  • Caching: Portkey gateway for reward caching
  • Backend: verl (default) or tinker (for LoRA)

Code Reference

Financial Agent Runner

Main script for running financial reasoning:
projects/finqa/run_finqa.py
--8<-- "projects/finqa/run_finqa.py"

Training Script

FinQA training configuration:
projects/finqa/train_finqa.py
--8<-- "projects/finqa/train_finqa.py"

Resources

Next Steps