

A multi-turn ReAct-style financial-QA agent that answers questions about SEC 10-K financial statements by querying structured tables. Four tools are exposed via native OpenAI function calling.

Pattern

| Aspect | Value |
| --- | --- |
| Loop shape | Multi-turn (up to 20 tool calls per task) |
| Tools | 4: `get_table_names`, `get_table_info`, `sql_query`, `calculator` |
| State | Process-wide SQLite `:memory:` DB pre-loaded with ~6,900 tables across 207 companies |
| Termination | Model emits a `FINAL ANSWER:` block (no tool call) or hits max turns |
| Reward shape | Judge-LLM rubric (gpt-5-nano single-table, gpt-5-mini multi-table) + table-access bonus |
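The process-wide `:memory:` store in the State row could be populated along these lines. This is a sketch: the directory layout, naming scheme, and `load_company_tables` helper are assumptions, not the cookbook's actual loader.

```python
import sqlite3
from pathlib import Path

import pandas as pd

# Hypothetical module-level store: one shared in-memory DB, loaded once at import.
DB = sqlite3.connect(":memory:", check_same_thread=False)

def load_company_tables(root: str) -> list[str]:
    """Load every CSV under root into the shared :memory: DB; return table names."""
    names = []
    for csv_path in sorted(Path(root).rglob("*.csv")):
        # e.g. company_tables/AAPL/revenue.csv -> table "AAPL_revenue"
        name = f"{csv_path.parent.name}_{csv_path.stem}"
        pd.read_csv(csv_path).to_sql(name, DB, index=False, if_exists="replace")
        names.append(name)
    return names
```

Because the connection lives at module scope, every tool call in the process queries the same pre-loaded tables without re-reading the data tree.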

Architecture

AgentFlow.run(task, config)
  ├── Multi-turn loop (up to 20 turns, native OpenAI tool calls)
  │     ├── client.chat.completions.create(messages, tools=TOOL_SPECS)
  │     ├── If msg.tool_calls is empty → that's the final answer, break.
  │     └── Else: dispatch each tool call → append a `tool` message → repeat.
  │           (track table_name in `accessed_tables` whenever `get_table_info`
  │            is invoked, for the table-access bonus)
  └── episode.artifacts = {"answer": full_response, "accessed_tables": [...], "turns": N}

Evaluator.evaluate(task, episode)
  └── extract FINAL ANSWER from artifacts["answer"], grade via judge LLM,
      add table-access bonus, return EvalOutput.
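The evaluator's two reward components can be sketched as follows. `extract_final_answer`, `combine_reward`, and the `bonus` magnitude are illustrative names and values, not the cookbook's actual API.

```python
def extract_final_answer(text: str) -> str:
    """Return everything after the last 'FINAL ANSWER:' marker, or '' if absent."""
    marker = "FINAL ANSWER:"
    idx = text.rfind(marker)
    return text[idx + len(marker):].strip() if idx != -1 else ""

def combine_reward(judge_score: float, accessed_tables: list[str],
                   gold_tables: list[str], bonus: float = 0.1) -> float:
    """Judge rubric score plus a small bonus when every gold table was inspected."""
    hit_all = set(gold_tables).issubset(accessed_tables)
    return judge_score + (bonus if hit_all else 0.0)
```

The bonus gives the policy a dense signal for looking at the right tables even on episodes where the final numeric answer is wrong.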

Install

uv pip install -e ".[tinker]"                      # rllm + tinker backend
uv pip install --no-deps -e cookbooks/finqa        # this cookbook
rllm agent list                                    # should show "finqa"

Dataset

python cookbooks/finqa/prepare_data.py
Downloads the rLLM/finqa tarball, extracts company tables to cookbooks/finqa/data/company_tables/, and registers finqa/{train,val,test} (4,030 / 522 / 558 rows). The data tree is large (~6,900 tables); set the FINQA_TABLES_ROOT env var to point at a shared mount if you have one.

Eval

The judge calls gpt-5-nano / gpt-5-mini directly via openai.OpenAI() — set your OPENAI_API_KEY first:
export OPENAI_API_KEY=sk-

rllm eval finqa \
    --agent finqa \
    --evaluator finqa \
    --model rLLM/rLLM-FinQA-4B \
    --base-url http://localhost:30000/v1 \
    --split test \
    --max-examples 20
If OPENAI_API_KEY is missing the evaluator silently returns reward=0 rather than crashing — useful for smoke tests without the gateway.
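That fallback amounts to a guard like the one below; `judge_or_zero` and its signature are hypothetical, but the behavior matches the description (reward 0 instead of a crash when the key is absent).

```python
import os
from typing import Callable

def judge_or_zero(judge_fn: Callable[[str, str], float],
                  question: str, answer: str) -> float:
    """Skip the judge LLM and return 0.0 when no OPENAI_API_KEY is set.

    Lets rollouts run end-to-end in smoke tests without an OpenAI gateway.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        return 0.0
    return judge_fn(question, answer)
```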

Training

export OPENAI_API_KEY=sk-

# Tinker (LoRA on 30B)
bash cookbooks/finqa/train_tinker.sh

# Verl (distributed multi-GPU)
bash cookbooks/finqa/train_verl.sh

Key code

The flow is the canonical multi-turn-tool-call template:
@rllm.rollout(name="finqa")
async def finqa_flow(task: Task, config: AgentConfig) -> Episode:
    client = AsyncOpenAI(base_url=config.base_url, api_key="EMPTY")
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": task.metadata.get("question")},
    ]

    accessed_tables, steps, final = [], [], ""
    for turn in range(MAX_TURNS):
        resp = await client.chat.completions.create(
            model=config.model, messages=messages, tools=TOOL_SPECS, ...,
        )
        msg = resp.choices[0].message
        messages.append(_msg_to_dict(msg))
        steps.append(Step(chat_completions=list(messages), model_response=msg.content or "", ...))

        if not msg.tool_calls:
            final = msg.content or ""
            break

        for tc in msg.tool_calls:
            output = _exec_tool_call(tc, accessed_tables)
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": output})

    return Episode(
        trajectories=[Trajectory(name="finqa", steps=steps)],
        artifacts={"answer": final, "accessed_tables": accessed_tables, "turns": len(steps)},
    )
Tools are plain Python callables paired with an OpenAI function spec — no Tool base class, no registry. The 4 tools share a process-wide SQLite store loaded once at module import:
TOOL_FNS = {
    "get_table_names": get_table_names,  # list company → tables
    "get_table_info": get_table_info,    # column metadata + sample values
    "sql_query": sql_query,              # filtered SELECT against in-memory DB
    "calculator": calculator,            # asteval safe arithmetic
}

TOOL_SPECS = [
    {"type": "function", "function": {"name": "get_table_names", ...}},
    {"type": "function", "function": {"name": "get_table_info", ...}},
    ...
]

Files

| File | Description |
| --- | --- |
| `finqa_flow.py` | Multi-turn AgentFlow with native tool calling |
| `finqa_tools.py` | The 4 tools as plain functions + OpenAI tool specs |
| `finqa_eval.py` | Judge-LLM correctness + table-access bonus |
| `finqa_constants.py` | Path constants |
| `prepare_data.py` | HF download + DatasetRegistry registration |
| `train.py` + `train_{tinker,verl}.sh` | Hydra entry points |
| `pyproject.toml` | Plugin entry-point declarations |
| `test.py` | 17 unit tests (calculator, tool-spec/fn alignment, FINAL ANSWER parsing, table-access scoring) |
| `prompts/` | System + correctness rubric prompts |

On GitHub

cookbooks/finqa

Full source, README, and runnable launch scripts