Overview
The search agent example demonstrates:
- How to use rLLM’s ToolAgent with search capabilities
- How to write custom retrieval tools in rLLM
- Training agents on multi-hop question answering (HotpotQA)
- Combining reasoning with information retrieval
Prerequisites
- rLLM framework installed
- vLLM or SGLang for model serving
- Base model: Qwen/Qwen3-4B (or similar)
- Retrieval server for document search
- FAISS indices for Wikipedia corpus
Setup
Prepare search datasets
Download and prepare the HotpotQA dataset. This will:
- Download HotpotQA dataset from HuggingFace
- Process multi-hop QA pairs
- Register dataset with rLLM’s DatasetRegistry
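As a rough sketch of the processing step, the snippet below converts raw HotpotQA records (using the HuggingFace `hotpot_qa` field names) into question/answer pairs. The helper name is illustrative, and the actual registration call into rLLM’s DatasetRegistry is omitted.

```python
# Sketch: turn raw HotpotQA records into the (question, ground_truth)
# pairs a search agent trains on. Field names follow the HuggingFace
# "hotpot_qa" schema; the rLLM registration API itself is not shown.

def to_qa_pairs(records):
    """Keep only what RL training needs: the question and gold answer."""
    return [
        {
            "question": rec["question"].strip(),
            "ground_truth": rec["answer"].strip(),
        }
        for rec in records
    ]

# A miniature record in the HotpotQA format (real records also carry
# "supporting_facts" and "context" fields used for evaluation).
sample = [{
    "id": "0",
    "question": "What is the capital of the country where the Eiffel Tower is located?",
    "answer": "Paris",
    "type": "bridge",
    "level": "medium",
}]

print(to_qa_pairs(sample))
```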
Download search indices
Download the pre-built FAISS indices and Wikipedia corpus. This downloads:
- Pre-built FAISS indices for E5 embeddings
- Wikipedia corpus (wiki-18.jsonl)
- Embedding model files
Start retrieval server
Install the dependencies for the retrieval server, then start it. The server will be accessible at http://127.0.0.1:8000.

Running the Search Agent
Once all servers are running, launch the search agent example.

Code Implementation
Custom Retrieval Tool
The LocalRetrievalTool provides document search functionality.
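Since the example code is not reproduced here, the sketch below shows the shape of such a tool: a class exposing a `forward(query)` method that ranks documents against the query. The real LocalRetrievalTool queries the FAISS-backed retrieval server; this version uses a simple keyword-overlap scorer over an in-memory corpus, and the interface names are illustrative rather than rLLM’s exact API.

```python
# Illustrative retrieval tool in the spirit of LocalRetrievalTool.
# The backend here is word-overlap scoring so the flow is visible
# without a running FAISS server; punctuation is not normalized.

class LocalRetrievalTool:
    def __init__(self, corpus, top_k=3):
        self.corpus = corpus          # list of document strings
        self.top_k = top_k

    def forward(self, query: str):
        """Return the top_k documents ranked by word overlap with the query."""
        q_words = set(query.lower().split())
        scored = [
            (len(q_words & set(doc.lower().split())), doc)
            for doc in self.corpus
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for score, doc in scored[: self.top_k] if score > 0]

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Fuji is the tallest mountain in Japan.",
]
tool = LocalRetrievalTool(corpus, top_k=1)
print(tool.forward("Where is the Eiffel Tower located?"))
```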
Expected Behavior
The agent will:
- Receive the question: “What is the capital of the country where the Eiffel Tower is located?”
- First search: “Eiffel Tower location”
- Process results: Extract “France” from documents
- Second search: “capital of France”
- Process results: Extract “Paris” from documents
- Generate the answer: “\boxed{Paris}”
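The two-hop trace above can be sketched as a loop of search-then-extract steps. The “reasoning” below is hard-coded lookups, since the control flow, not the language modeling, is the point.

```python
# Toy sketch of the two-hop behavior described above: search, extract an
# entity, use it to form the next query, then answer.

def search(query, kb):
    """Return documents containing any query word (crude substring match)."""
    return [doc for doc in kb
            if any(w in doc.lower() for w in query.lower().split())]

kb = [
    "The Eiffel Tower is located in Paris, France.",
    "The capital of France is Paris.",
]

# Hop 1: locate the landmark.
hop1 = search("Eiffel Tower location", kb)
country = "France" if any("France" in d for d in hop1) else None

# Hop 2: use the hop-1 entity to form the next query.
hop2 = search(f"capital of {country}", kb)
answer = "Paris" if any("capital" in d.lower() and "Paris" in d
                        for d in hop2) else None

print(answer)  # the agent would emit this wrapped as \boxed{...}
```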
Training the Search Agent
Train your own search agent with the settings below.

Training Configuration
- Model: Qwen/Qwen3-4B
- Algorithm: RLOO (REINFORCE Leave-One-Out)
- Training Dataset: HotpotQA train split (3000 examples)
- Evaluation Dataset: HotpotQA test split (100 examples)
- Batch Size: 64
- Learning Rate: 1e-6
- Max Steps: 10 per episode
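Collected as a plain config dict (the key names are illustrative; the actual training script may take these as CLI flags or YAML):

```python
# The hyperparameters above as one config object. Keys are illustrative,
# not rLLM's exact flag names.
config = {
    "model": "Qwen/Qwen3-4B",
    "algorithm": "rloo",                      # REINFORCE Leave-One-Out
    "train_dataset": ("hotpotqa", "train"),   # 3000 examples
    "val_dataset": ("hotpotqa", "test"),      # 100 examples
    "batch_size": 64,
    "learning_rate": 1e-6,
    "max_steps_per_episode": 10,
}
print(config["model"])
```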
Search System Prompt
The agent uses a specialized system prompt that tells it how to call the search tool and to return its final answer in \boxed{} format.

Multi-Hop Question Answering
HotpotQA requires multi-hop reasoning: each question is answered by chaining facts retrieved from multiple documents, as in the Eiffel Tower example above.

Monitoring Training
Key metrics:

| Metric | Description |
|---|---|
| val/pass@1 | Answer accuracy on the test set |
| train/avg_searches | Average searches per question |
| train/avg_steps | Average reasoning steps |
| critic/score/mean | Average reward per batch |
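These metrics could be computed from a batch of rollouts roughly as follows. The rollout field names are illustrative; rLLM logs them through its own trainer hooks.

```python
# Sketch of how the table's metrics might be aggregated from a batch.
# Field names ("correct", "num_searches", "num_steps") are illustrative.

def batch_metrics(rollouts):
    n = len(rollouts)
    return {
        "val/pass@1": sum(r["correct"] for r in rollouts) / n,
        "train/avg_searches": sum(r["num_searches"] for r in rollouts) / n,
        "train/avg_steps": sum(r["num_steps"] for r in rollouts) / n,
    }

rollouts = [
    {"correct": True,  "num_searches": 2, "num_steps": 4},
    {"correct": False, "num_searches": 3, "num_steps": 6},
]
print(batch_metrics(rollouts))
```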
Advanced Features
Custom Reward Function
The search_reward_fn checks:
- Exact string match (case-insensitive)
- Fuzzy string matching
- Alias matching for entities
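A minimal sketch of those three checks, assuming a 0.9 fuzzy-match cutoff and a hand-written alias table (both illustrative; the real function ships with the example):

```python
import difflib

# Illustrative reward function performing the three checks listed above.
# Threshold and alias table are assumptions, not the example's values.
ALIASES = {"usa": {"united states", "united states of america"}}

def search_reward(prediction: str, ground_truth: str) -> float:
    pred, gold = prediction.strip().lower(), ground_truth.strip().lower()
    # 1. Exact string match (case-insensitive).
    if pred == gold:
        return 1.0
    # 2. Fuzzy string matching (illustrative 0.9 similarity cutoff).
    if difflib.SequenceMatcher(None, pred, gold).ratio() >= 0.9:
        return 1.0
    # 3. Alias matching for entities.
    if pred in ALIASES.get(gold, set()) or gold in ALIASES.get(pred, set()):
        return 1.0
    return 0.0

print(search_reward("Paris", "paris"))
print(search_reward("United States", "USA"))
```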
Tool Use Patterns
The agent learns:
- When to search vs. when to answer
- How to formulate effective search queries
- How to synthesize information from multiple sources
Next Steps
- Try the LangGraph RAG example for SDK-based search
- Explore ToolAgent documentation
- Learn about custom tools

