Pipeline¶

The pipeline is the core of russo. It chains three components together: Synthesizer → Agent → Evaluator.

`russo.run()`¶

The run function is the main entry point:

result = await russo.run(
    prompt="Book a flight from Berlin to Rome",
    synthesizer=my_synthesizer,
    agent=my_agent,
    evaluator=ExactEvaluator(),
    expect=[russo.tool_call("book_flight", from_city="Berlin", to_city="Rome")],
)

All arguments are keyword-only. Here's what each does:

Argument	Type	Purpose
`prompt`	`str`	Text to synthesize into audio
`synthesizer`	`Synthesizer`	Converts text → audio
`agent`	`Agent`	The agent under test (audio → tool calls)
`evaluator`	`Evaluator`	Compares expected vs actual tool calls
`expect`	`list[ToolCall]`	The tool calls you expect the agent to make

What Happens Inside¶

async def run(*, prompt, synthesizer, agent, evaluator, expect):
    audio = await synthesizer.synthesize(prompt)    # 1. Text → Audio
    response = await agent.run(audio)               # 2. Audio → AgentResponse
    return evaluator.evaluate(                      # 3. Compare
        expected=expect,
        actual=response.tool_calls,
    )

That's it. Three steps. Each step is pluggable.

The Result¶

russo.run() returns an EvalResult:

result.passed       # bool — did all expected tool calls match?
result.match_rate   # float — fraction of expected calls that matched (0.0–1.0)
result.expected     # list[ToolCall] — what you expected
result.actual       # list[ToolCall] — what the agent returned
result.matches      # list[ToolCallMatch] — per-call match details
result.summary()    # str — human-readable summary

Assertions¶

Use russo.assert_tool_calls() for rich error messages:

russo.assert_tool_calls(result)
# Raises ToolCallAssertionError with detailed diff if it fails

Or use standard assertions:

assert result.passed
assert result.match_rate >= 0.8  # at least 80% match

API Reference¶

See russo.run() for the full API docs.