The RolloutEngine provides an abstraction for model inference across different backends (OpenAI, Fireworks, vLLM, etc.).
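To make the interface concrete before the reference below, here is a minimal sketch of the pattern: an engine exposes an async `get_model_response(messages, ...)` that returns a structured output object. The `EchoEngine` and `FakeOutput` classes are illustrative stand-ins, not part of rllm.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeOutput:
    """Stand-in for a ModelOutput-style result with just two fields."""
    content: str
    finish_reason: str = "stop"


class EchoEngine:
    """Illustrative engine that echoes the last user message back.

    A real backend (OpenAI, Fireworks, vLLM, ...) would issue an API
    call here; only the async interface shape is the point.
    """

    async def get_model_response(self, messages, application_id=None, **sampling_params):
        last_user = next(m["content"] for m in reversed(messages) if m["role"] == "user")
        return FakeOutput(content=f"echo: {last_user}")


async def main():
    engine = EchoEngine()
    out = await engine.get_model_response([{"role": "user", "content": "hi"}])
    print(out.content)  # echo: hi


asyncio.run(main())
```

Because every backend presents this same coroutine-based surface, calling code can swap engines without changing its generation logic.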
## RolloutEngine

Base class for all rollout engines.

```python
from rllm.engine.rollout import RolloutEngine
```
### Methods

#### get_model_response

Generate a model response for the given messages.

```python
output = await engine.get_model_response(
    messages=[{"role": "user", "content": "Hello"}],
    application_id="task_0",
    temperature=0.7,
    max_tokens=2048,
)
```

**Parameters:**

- `messages`: List of chat messages in OpenAI format.
- `application_id`: Unique identifier for tracking requests.
- Additional sampling parameters (`temperature`, `top_p`, `max_tokens`, etc.), passed as keyword arguments.

**Returns:** A `ModelOutput` containing the generated text, token IDs, and metadata.
#### wake_up

Initialize or warm up the engine (implementation-specific).

#### sleep

Shut down or clean up the engine (implementation-specific).
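Since both lifecycle hooks are implementation-specific, the sketch below only illustrates the typical call sequence (wake up, generate, sleep) with a hypothetical in-process engine; whether a real engine's hooks are sync or async, and what they actually do, depends on the backend.

```python
import asyncio


class ToyEngine:
    """Hypothetical engine that tracks its lifecycle state."""

    def __init__(self):
        self.ready = False

    def wake_up(self):
        # A real engine might load weights or open connections here.
        self.ready = True

    def sleep(self):
        # A real engine might free GPU memory or close connections here.
        self.ready = False

    async def get_model_response(self, messages, **kwargs):
        if not self.ready:
            raise RuntimeError("engine is asleep; call wake_up() first")
        return {"content": "ok"}


async def main():
    engine = ToyEngine()
    engine.wake_up()
    try:
        output = await engine.get_model_response([{"role": "user", "content": "ping"}])
        print(output["content"])  # ok
    finally:
        engine.sleep()


asyncio.run(main())
```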
## ModelOutput

Dataclass containing model generation output.

```python
from rllm.engine.rollout import ModelOutput
```

### Fields

- Complete generated text (may include reasoning).
- Content portion of the response (excluding reasoning).
- Reasoning or thought process (if the model supports it).
- List of tool calls made by the model.
- Token IDs for the input prompt.
- Token IDs for the completion.
- Multimodal inputs (e.g., images).
- Log probabilities for the completion tokens.
- Log probabilities for the prompt tokens (aligned to `prompt_ids`).
- Length of the prompt in tokens.
- Length of the completion in tokens.
- Reason generation stopped ("stop", "length", etc.).
### Methods

```python
# Serialize to a plain dictionary
output_dict = model_output.to_dict()

# Reconstruct a ModelOutput from a dictionary
model_output = ModelOutput.from_dict(output_dict)
```
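The `to_dict`/`from_dict` round trip is handy for logging or checkpointing rollouts. The sketch below reproduces the pattern with a stand-in dataclass (`MiniOutput` is illustrative, and its fields mirror only a subset of the list above), using `dataclasses.asdict` plus a `from_dict` classmethod:

```python
from dataclasses import asdict, dataclass


@dataclass
class MiniOutput:
    """Stand-in with a subset of ModelOutput-style fields."""
    content: str = ""
    completion_length: int = 0
    finish_reason: str = "stop"

    def to_dict(self):
        # asdict recursively converts the dataclass to plain dicts/lists,
        # so the result is JSON-serializable.
        return asdict(self)

    @classmethod
    def from_dict(cls, d):
        return cls(**d)


original = MiniOutput(content="Paris", completion_length=3)
restored = MiniOutput.from_dict(original.to_dict())
assert restored == original  # dataclass equality compares field by field
```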
## OpenAIEngine

Rollout engine using OpenAI-compatible APIs.

```python
from rllm.engine.rollout import OpenAIEngine

engine = OpenAIEngine(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    model="Qwen/Qwen3-4B",
)
```

### Constructor

- `base_url`: Base URL for the API endpoint.
- `api_key`: API key for authentication.
- `model`: Name of the model to request from the endpoint.
## FireworksEngine

Rollout engine using the Fireworks AI API.

```python
from rllm.engine.rollout import FireworksEngine

engine = FireworksEngine(
    api_key="your_fireworks_key",
    model="accounts/fireworks/models/deepseek-r1",
)
```
## Example: Basic Usage

```python
import asyncio

from rllm.engine.rollout import OpenAIEngine

engine = OpenAIEngine(
    base_url="http://localhost:4000/v1",
    api_key="EMPTY",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
)


async def generate():
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ]
    output = await engine.get_model_response(
        messages,
        temperature=0.7,
        max_tokens=512,
    )
    print(f"Content: {output.content}")
    print(f"Reasoning: {output.reasoning}")
    print(f"Tokens: {output.completion_length}")
    print(f"Finish reason: {output.finish_reason}")


asyncio.run(generate())
```
## Example: Batch Generation

```python
import asyncio

from rllm.engine.rollout import OpenAIEngine

engine = OpenAIEngine(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    model="Qwen/Qwen3-4B",
)


async def generate_batch():
    tasks = [
        [{"role": "user", "content": "What is 2+2?"}],
        [{"role": "user", "content": "What is the speed of light?"}],
        [{"role": "user", "content": "Who wrote Romeo and Juliet?"}],
    ]
    # Issue all requests concurrently
    results = await asyncio.gather(*[
        engine.get_model_response(messages, application_id=f"task_{i}")
        for i, messages in enumerate(tasks)
    ])
    for i, output in enumerate(results):
        print(f"\nTask {i}:")
        print(f"Response: {output.content}")


asyncio.run(generate_batch())
```
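`asyncio.gather` fires every request at once, which can overwhelm a local server on large batches. One common refinement is to cap in-flight requests with an `asyncio.Semaphore`; the sketch below shows the pattern against an illustrative mock engine (the `MockEngine` class and the concurrency bound of 4 are not part of rllm):

```python
import asyncio


class MockEngine:
    """Illustrative stand-in that answers instantly."""

    async def get_model_response(self, messages, application_id=None, **kwargs):
        return {"application_id": application_id, "content": "ok"}


async def bounded_generate(engine, batches, max_concurrency=8):
    """Run get_model_response over all batches, with at most
    max_concurrency requests in flight at any time."""
    sem = asyncio.Semaphore(max_concurrency)

    async def one(i, messages):
        async with sem:
            return await engine.get_model_response(messages, application_id=f"task_{i}")

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(one(i, m) for i, m in enumerate(batches)))


batches = [[{"role": "user", "content": f"question {i}"}] for i in range(20)]
results = asyncio.run(bounded_generate(MockEngine(), batches, max_concurrency=4))
print(len(results))  # 20
```

Swapping `MockEngine` for a real engine leaves `bounded_generate` unchanged, since it only relies on the shared `get_model_response` interface.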