Architecture¶
russo is designed around a simple pipeline with pluggable components connected by protocols.
The Pipeline¶
graph TD
P["Text Prompt"]
S["Synthesizer"]
C["Cache Layer"]
AG["Agent Under Test"]
PR["Response Parser"]
E["Evaluator"]
R["EvalResult"]
P --> S
S <-.-> C
S -->|Audio| AG
AG -->|Raw Response| PR
PR -->|AgentResponse| E
E --> R
Flow¶
- Text Prompt → The natural language instruction (e.g., "Book a flight from Berlin to Rome")
- Synthesizer → Converts text to audio using a TTS provider. The cache layer intercepts here to avoid redundant API calls.
- Agent → The LLM agent under test receives the audio and returns tool calls. Agents may use a ResponseParser internally to normalize provider-specific formats.
- Evaluator → Compares expected tool calls against actual ones, producing a detailed EvalResult.
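The flow above can be sketched end to end with stand-in components. This is a minimal illustration, not russo's implementation: `FakeSynth`, `CachingSynth`, `fake_agent`, and `run_once` are hypothetical names, the types are simplified dataclasses rather than the real Pydantic models, and the evaluator is reduced to an equality check.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Audio:  # simplified stand-in for russo's Audio model
    data: bytes
    format: str = "wav"


@dataclass
class ToolCall:  # simplified stand-in for russo's ToolCall model
    name: str
    arguments: dict = field(default_factory=dict)


class FakeSynth:
    """Hypothetical TTS provider that counts how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    async def synthesize(self, text: str) -> Audio:
        self.calls += 1
        return Audio(data=text.encode("utf-8"))


class CachingSynth:
    """Hypothetical cache layer: reuses audio for prompts seen before."""

    def __init__(self, inner) -> None:
        self._inner = inner
        self._cache = {}

    async def synthesize(self, text: str) -> Audio:
        if text not in self._cache:
            self._cache[text] = await self._inner.synthesize(text)
        return self._cache[text]


async def fake_agent(audio: Audio) -> list:
    """Stand-in agent: always answers with a flight booking tool call."""
    return [ToolCall("book_flight", {"from": "Berlin", "to": "Rome"})]


async def run_once(prompt, synth, agent, expected) -> bool:
    audio = await synth.synthesize(prompt)  # text prompt -> audio
    actual = await agent(audio)             # audio -> tool calls
    return actual == expected               # evaluator reduced to equality


async def main():
    base = FakeSynth()
    synth = CachingSynth(base)
    prompt = "Book a flight from Berlin to Rome"
    expected = [ToolCall("book_flight", {"from": "Berlin", "to": "Rome"})]
    first = await run_once(prompt, synth, fake_agent, expected)
    second = await run_once(prompt, synth, fake_agent, expected)
    return first, second, base.calls


first, second, calls = asyncio.run(main())
```

Both runs pass, and the underlying synthesizer is only invoked once because the second run is served from the cache.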
Data Types¶
All data flows through Pydantic models:
classDiagram
class Audio {
+bytes data
+str format
+int sample_rate
+int channels
+save(path)
}
class ToolCall {
+str name
+dict arguments
}
class AgentResponse {
+list~ToolCall~ tool_calls
+Any raw
}
class EvalResult {
+bool passed
+list~ToolCall~ expected
+list~ToolCall~ actual
+list~ToolCallMatch~ matches
+match_rate() float
+summary() str
}
class ToolCallMatch {
+ToolCall expected
+ToolCall actual
+bool matched
+str details
}
AgentResponse --> ToolCall
EvalResult --> ToolCall
EvalResult --> ToolCallMatch
ToolCallMatch --> ToolCall
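As a rough Python rendering of the diagram, the sketch below mirrors the fields shown above. It uses stdlib dataclasses as a dependency-free stand-in for the Pydantic models, and the bodies of `save()`, `match_rate()`, and `summary()` are assumptions about plausible behavior, not russo's actual code.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class Audio:
    data: bytes
    format: str
    sample_rate: int
    channels: int

    def save(self, path) -> None:
        # Assumption: save() writes the raw audio bytes to disk.
        Path(path).write_bytes(self.data)


@dataclass
class ToolCall:
    name: str
    arguments: dict = field(default_factory=dict)


@dataclass
class AgentResponse:
    tool_calls: list = field(default_factory=list)
    raw: Any = None  # untouched provider response, kept for debugging


@dataclass
class ToolCallMatch:
    expected: ToolCall
    actual: ToolCall
    matched: bool
    details: str = ""


@dataclass
class EvalResult:
    passed: bool
    expected: list
    actual: list
    matches: list

    def match_rate(self) -> float:
        # Assumed definition: fraction of match entries flagged as matched.
        if not self.matches:
            return 1.0  # assumed convention when there is nothing to match
        return sum(m.matched for m in self.matches) / len(self.matches)

    def summary(self) -> str:
        # Assumed format; the real summary text may differ.
        status = "PASSED" if self.passed else "FAILED"
        hits = sum(m.matched for m in self.matches)
        return f"{status}: {hits}/{len(self.matches)} tool calls matched"


expected = ToolCall("book_flight", {"from": "Berlin", "to": "Rome"})
wrong = ToolCall("book_flight", {"from": "Berlin", "to": "Paris"})
result = EvalResult(
    passed=False,
    expected=[expected],
    actual=[wrong],
    matches=[ToolCallMatch(expected, wrong, matched=False,
                           details="argument 'to' differs")],
)
print(result.summary())
```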
Design Principles¶
Protocol-based (Structural Typing)¶
russo uses typing.Protocol for all extension points. You never inherit from a base class: if your object has methods with the expected names and signatures, it works:
# This is a valid Synthesizer: no inheritance needed
class MySynth:
    async def synthesize(self, text: str) -> Audio:
        ...
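To show how structural typing plays out, here is a small sketch using only the standard library. The `Synthesizer` protocol below is a hypothetical reconstruction based on the `synthesize` signature above, and `Audio` is a simplified stand-in; russo's actual protocol definitions may differ.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Audio:  # simplified stand-in for russo's Audio model
    data: bytes


@runtime_checkable
class Synthesizer(Protocol):
    """Hypothetical protocol: anything with an async synthesize() method."""

    async def synthesize(self, text: str) -> Audio: ...


class MySynth:  # no base class, yet it satisfies the protocol
    async def synthesize(self, text: str) -> Audio:
        return Audio(data=text.encode("utf-8"))


class NotASynth:  # lacks synthesize(), so it does not satisfy the protocol
    def speak(self, text: str) -> bytes:
        return text.encode("utf-8")


async def render(synth: Synthesizer, text: str) -> Audio:
    # Accepts any object that structurally matches Synthesizer.
    return await synth.synthesize(text)


is_synth = isinstance(MySynth(), Synthesizer)
is_not_synth = isinstance(NotASynth(), Synthesizer)
audio = asyncio.run(render(MySynth(), "hello"))
```

Note that `isinstance` against a `runtime_checkable` protocol only checks that the method exists, not its signature; a static type checker is what verifies the full signature.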
Async-first¶
The pipeline is fully async. Synthesizers and agents expose async methods, which makes it natural to call external APIs without blocking.
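One practical consequence, sketched below under assumptions, is that several slow calls can overlap. `slow_synthesize` is a hypothetical stand-in that simulates network latency with `asyncio.sleep`; whether russo itself schedules work concurrently is not stated here.

```python
import asyncio
import time


async def slow_synthesize(text: str) -> bytes:
    """Hypothetical TTS call; the sleep simulates network latency."""
    await asyncio.sleep(0.2)
    return text.encode("utf-8")


async def main():
    prompts = ["Book a flight", "Cancel my order", "Check the weather"]
    start = time.perf_counter()
    # gather() runs the three coroutines concurrently on one event loop.
    clips = await asyncio.gather(*(slow_synthesize(p) for p in prompts))
    elapsed = time.perf_counter() - start
    return clips, elapsed


clips, elapsed = asyncio.run(main())
print(f"{len(clips)} clips in {elapsed:.2f}s")
```

Run sequentially, the three calls would take at least 0.6 seconds; because they overlap, the total stays close to the latency of a single call.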
Provider-agnostic¶
The core pipeline knows nothing about Gemini, OpenAI, or any specific provider. Provider-specific logic lives in adapters and parsers.
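A sketch of what this separation could look like: two hypothetical parser functions turn differently shaped provider payloads into the same `ToolCall` list, so everything downstream sees one format. The payload shapes are simplified illustrations loosely modeled on each provider's function-calling output, not their exact schemas, and the function names are assumptions.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ToolCall:  # simplified stand-in for russo's ToolCall model
    name: str
    arguments: dict = field(default_factory=dict)


def parse_openai_style(raw: dict) -> list:
    # Assumed shape: arguments arrive as a JSON-encoded string.
    return [
        ToolCall(c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in raw.get("tool_calls", [])
    ]


def parse_gemini_style(raw: dict) -> list:
    # Assumed shape: arguments arrive as a plain mapping.
    return [
        ToolCall(p["function_call"]["name"], dict(p["function_call"]["args"]))
        for p in raw.get("parts", [])
        if "function_call" in p
    ]


openai_raw = {
    "tool_calls": [
        {"function": {"name": "book_flight",
                      "arguments": '{"from": "Berlin", "to": "Rome"}'}}
    ]
}
gemini_raw = {
    "parts": [
        {"function_call": {"name": "book_flight",
                           "args": {"from": "Berlin", "to": "Rome"}}}
    ]
}

from_openai = parse_openai_style(openai_raw)
from_gemini = parse_gemini_style(gemini_raw)
```

Both payloads normalize to the same value, which is what lets a single evaluator serve every provider.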
Pydantic Models¶
All data types are Pydantic models, giving you:
- Automatic validation
- Serialization / deserialization
- Rich repr for debugging
- Type safety
Module Layout¶
russo/
├── __init__.py # Public API surface
├── _types.py # Pydantic data models
├── _protocols.py # Protocol definitions
├── _pipeline.py # The run() function
├── _cache.py # AudioCache + CachedSynthesizer
├── _helpers.py # tool_call() + @agent decorator
├── _assertions.py # assert_tool_calls()
├── adapters/ # Agent adapters (Gemini, OpenAI, HTTP, WS)
├── synthesizers/ # TTS providers (Google)
├── evaluators/ # Matching logic (ExactEvaluator)
├── parsers/ # Response normalizers (Gemini, OpenAI)
├── report/ # Terminal + HTML reporting
├── pytest_plugin.py # pytest integration
├── cli.py # CLI runner
├── config.py # Config file loading
├── models.py # Extended models for CLI/config mode
├── interfaces.py # ABC interfaces for CLI/config mode
├── pipeline.py # CLI pipeline runner
└── registry.py # Component registry for config mode
Private modules (prefixed with _) implement the core API, and __init__.py is its public surface. Public modules and packages contain provider-specific implementations and integrations.