russo
Testing framework for LLM tool-call accuracy — audio & text
Documentation: https://mohit2152sharma.github.io/russo
Source Code: https://github.com/mohit2152sharma/russo
russo is a testing framework for verifying that LLM agents make the correct tool calls when given audio (or text) input. Think of it as pytest for voice AI tool-calling accuracy.
Why russo?
Voice AI agents powered by LLMs increasingly use tool calling (function calling) to take actions — booking flights, controlling smart homes, querying databases. But how do you verify the agent calls the right tool with the right arguments when it hears a spoken command?
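russo solves this with a simple pipeline: a text prompt is synthesized to audio, the audio is sent to your agent, and the agent's tool calls are evaluated against the calls you expect.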
Key Features
- Provider-agnostic — works with Gemini, OpenAI, or any custom agent via structural typing (protocols)
- Audio-first — synthesize text prompts to audio, send to your agent, evaluate tool calls
- pytest integration — use markers, fixtures, and familiar test patterns
- Built-in caching — skip TTS on repeated runs, saving time and money
- Extensible — swap synthesizers, agents, evaluators, and parsers without inheritance
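
To make the caching bullet concrete, here is a minimal sketch of how skipping TTS on repeated runs could work: a wrapper that stores synthesized audio on disk, keyed by a hash of the prompt text. This is an assumption for illustration, not russo's actual cache; `FakeSynthesizer`, `CachingSynthesizer`, and the on-disk layout are hypothetical names that do not come from the library.

```python
import asyncio
import hashlib
import tempfile
from pathlib import Path


class FakeSynthesizer:
    """Hypothetical stand-in for a real TTS backend; counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    async def synthesize(self, text: str) -> bytes:
        self.calls += 1
        return f"audio:{text}".encode()


class CachingSynthesizer:
    """Hypothetical wrapper: stores audio bytes on disk, keyed by a hash of the text."""

    def __init__(self, inner, cache_dir: Path) -> None:
        self.inner = inner
        self.cache_dir = cache_dir
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    async def synthesize(self, text: str) -> bytes:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        path = self.cache_dir / f"{key}.bin"
        if path.exists():  # cache hit: skip the TTS call entirely
            return path.read_bytes()
        audio = await self.inner.synthesize(text)
        path.write_bytes(audio)  # cache miss: store the result for the next run
        return audio


async def demo() -> int:
    backend = FakeSynthesizer()
    with tempfile.TemporaryDirectory() as tmp:
        synth = CachingSynthesizer(backend, Path(tmp))
        first = await synth.synthesize("Book a flight from Berlin to Rome")
        second = await synth.synthesize("Book a flight from Berlin to Rome")
        assert first == second
    return backend.calls


backend_calls = asyncio.run(demo())
```

The wrapper exposes the same `synthesize` method as the backend it wraps, so it can stand in wherever a synthesizer is expected without any inheritance.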
Quick Example
```python
import russo
from russo.synthesizers import GoogleSynthesizer
from russo.adapters import GeminiLiveAgent
from russo.evaluators import ExactEvaluator

# `await` is only valid inside an async function (or an asyncio REPL such as `python -m asyncio`).
result = await russo.run(
    prompt="Book a flight from Berlin to Rome for tomorrow",
    synthesizer=GoogleSynthesizer(api_key="..."),
    agent=GeminiLiveAgent(api_key="...", tools=[...]),
    evaluator=ExactEvaluator(),
    expect=[
        russo.tool_call("book_flight", from_city="Berlin", to_city="Rome"),
    ],
)
assert result.passed
```
Or with pytest:
```python
import pytest
import russo


@pytest.mark.russo(
    prompt="Book a flight from NYC to LA",
    expect=[russo.tool_call("book_flight", from_city="NYC", to_city="LA")],
)
async def test_book_flight(russo_result):
    russo.assert_tool_calls(russo_result)
```
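
Both examples compare the tool calls you expect against the ones the agent actually made. As a rough illustration of what a strict comparison can look like, the sketch below treats a tool call as a name plus keyword arguments and passes only when the two lists are identical. The `ToolCall` dataclass, `tool_call` helper, and `exact_match` function here are hypothetical and self-contained; they are not russo's own types, and `ExactEvaluator` may apply different matching rules.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical, simplified tool-call record (not russo's own type)."""

    name: str
    arguments: dict = field(default_factory=dict)


def tool_call(name: str, **arguments) -> ToolCall:
    """Build a ToolCall from a name and keyword arguments."""
    return ToolCall(name, arguments)


def exact_match(expected: list, actual: list) -> bool:
    """Pass only if both lists hold the same calls, in the same order,
    with equal names and equal arguments."""
    return expected == actual


expected = [tool_call("book_flight", from_city="Berlin", to_city="Rome")]
good = [tool_call("book_flight", from_city="Berlin", to_city="Rome")]
bad = [tool_call("book_flight", from_city="Berlin", to_city="Paris")]

passed_good = exact_match(expected, good)
passed_bad = exact_match(expected, bad)
```

A strict comparison like this is easy to reason about, but it fails on any extra or reordered argument, which is one reason a framework might offer swappable evaluators.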
Architecture
russo uses a protocol-based design. You never need to inherit from base classes — if your object has the right methods, it works:
```mermaid
graph LR
    A[Text Prompt] --> B[Synthesizer]
    B --> C[Audio]
    C --> D[Agent]
    D --> E[AgentResponse]
    E --> F[Evaluator]
    F --> G[EvalResult]
```
| Protocol | Method | Purpose |
|---|---|---|
| `Synthesizer` | `async synthesize(text) → Audio` | Convert text to audio |
| `Agent` | `async run(audio) → AgentResponse` | Run the agent under test |
| `Evaluator` | `evaluate(expected, actual) → EvalResult` | Compare tool calls |
| `ResponseParser` | `parse(raw) → AgentResponse` | Normalize provider responses |
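
The idea of matching by method shape rather than by base class can be sketched with Python's `typing.Protocol`. The example below is an illustration under stated assumptions: the `Audio` and `AgentResponse` dataclasses and the `Agent` protocol are simplified, hypothetical stand-ins modeled on the table above, not russo's real definitions, and `EchoAgent` is an invented agent.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable


@dataclass
class Audio:
    """Hypothetical audio container (stand-in for russo's Audio)."""

    data: bytes


@dataclass
class AgentResponse:
    """Hypothetical response holding the tool calls the agent made."""

    tool_calls: list = field(default_factory=list)


@runtime_checkable
class Agent(Protocol):
    """Anything with an async run(audio) method counts as an Agent."""

    async def run(self, audio: Audio) -> AgentResponse: ...


class EchoAgent:
    """No base class: it satisfies Agent purely by having the right method."""

    async def run(self, audio: Audio) -> AgentResponse:
        call = {"name": "echo", "arguments": {"size": len(audio.data)}}
        return AgentResponse(tool_calls=[call])


async def call_agent(agent: Agent, audio: Audio) -> AgentResponse:
    """Accepts any object that structurally matches the Agent protocol."""
    return await agent.run(audio)


agent = EchoAgent()
# runtime_checkable isinstance checks only that the method exists, not its signature.
is_agent = isinstance(agent, Agent)
response = asyncio.run(call_agent(agent, Audio(b"\x00\x01")))
```

Static type checkers verify the full method signature against the protocol, while the runtime `isinstance` check only confirms that a `run` attribute is present.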
Next Steps
- Installation — install russo and optional dependencies
- First Test — write your first tool-call test
- Tutorial — deep dive into every component
- API Reference — full API documentation