Pipeline
The run() function orchestrates the full russo pipeline.
_pipeline
Pipeline orchestrator — chains Synthesizer -> Agent -> Evaluator.
run (async)
run(*, prompt: str, synthesizer: Synthesizer, agent: Agent, evaluator: Evaluator, expect: list[ToolCall]) -> EvalResult
Run the full russo pipeline.
- Synthesize audio from the text prompt.
- Pass audio to the agent under test.
- Evaluate the agent's tool calls against expectations.
| PARAMETER | DESCRIPTION |
|---|---|
| `prompt` | The text prompt to synthesize into audio. TYPE: `str` |
| `synthesizer` | Converts text to audio. TYPE: `Synthesizer` |
| `agent` | The agent under test. TYPE: `Agent` |
| `evaluator` | Compares expected vs actual tool calls. TYPE: `Evaluator` |
| `expect` | The expected tool calls. TYPE: `list[ToolCall]` |
| RETURNS | DESCRIPTION |
|---|---|
| `EvalResult` | EvalResult with pass/fail and per-call match details. |
Source code in src/russo/_pipeline.py
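The three steps that run() chains can be sketched as follows. This is a minimal, self-contained illustration and not the code in src/russo/_pipeline.py: the stand-in classes and the method names `synthesize`, `run`, and `evaluate` are assumptions made for this example, as are the fields of the stand-in `ToolCall` and `EvalResult` types.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical stand-in for russo's ToolCall."""
    name: str
    arguments: dict = field(default_factory=dict)


@dataclass
class EvalResult:
    """Hypothetical stand-in: overall pass/fail plus one flag per expected call."""
    passed: bool
    matches: list[bool]


class EchoSynthesizer:
    """Fake synthesizer: encodes the text instead of producing real audio."""
    async def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class KeywordAgent:
    """Fake agent: emits a weather tool call when the 'audio' mentions weather."""
    async def run(self, audio: bytes) -> list[ToolCall]:
        if "weather" in audio.decode("utf-8"):
            return [ToolCall("get_weather", {"city": "Berlin"})]
        return []


class ExactEvaluator:
    """Fake evaluator: an expected call matches if an identical call was made."""
    def evaluate(self, expected: list[ToolCall], actual: list[ToolCall]) -> EvalResult:
        matches = [call in actual for call in expected]
        return EvalResult(passed=all(matches), matches=matches)


async def run_sketch(*, prompt, synthesizer, agent, evaluator, expect) -> EvalResult:
    audio = await synthesizer.synthesize(prompt)  # 1. text -> audio
    calls = await agent.run(audio)                # 2. audio -> tool calls
    return evaluator.evaluate(expect, calls)      # 3. compare with expectations


expected = [ToolCall("get_weather", {"city": "Berlin"})]
result = asyncio.run(
    run_sketch(
        prompt="What's the weather in Berlin?",
        synthesizer=EchoSynthesizer(),
        agent=KeywordAgent(),
        evaluator=ExactEvaluator(),
        expect=expected,
    )
)
print(result.passed, result.matches)
```

The keyword-only call shape mirrors the documented signature of run(); everything behind it is a placeholder that keeps the example runnable without audio hardware or a real agent.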
run_concurrent (async)
run_concurrent(*, prompts: str | list[str], synthesizer: Synthesizer, agent: Agent, evaluator: Evaluator, expect: list[ToolCall], runs: int = 1, max_concurrency: int | None = None) -> BatchResult
Run the pipeline concurrently for multiple prompts and/or multiple runs.
Three scenarios:
- Single prompt, N runs: `prompts="text", runs=N`
- Multiple prompts, 1 run each: `prompts=["a", "b", "c"]`
- Multiple prompts, N runs each: `prompts=["a", "b"], runs=N`
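One way to picture the three scenarios is as a list of (prompt, run index) jobs. The helper below, `expand_jobs`, is a hypothetical name introduced for this sketch; the source does not say how run_concurrent normalizes its input or in what order it schedules runs, so both are assumptions here.

```python
from __future__ import annotations


def expand_jobs(prompts: str | list[str], runs: int = 1) -> list[tuple[str, int]]:
    """Expand prompts x runs into individual jobs (assumed ordering: by prompt)."""
    # A bare string is treated as a single prompt, not as a sequence of characters.
    prompt_list = [prompts] if isinstance(prompts, str) else list(prompts)
    return [(prompt, run_index) for prompt in prompt_list for run_index in range(runs)]


print(expand_jobs("text", runs=3))        # single prompt, 3 runs
print(expand_jobs(["a", "b", "c"]))       # multiple prompts, 1 run each
print(expand_jobs(["a", "b"], runs=2))    # multiple prompts, 2 runs each
```

Under this reading, the total number of pipeline runs is the number of prompts multiplied by `runs`.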
| PARAMETER | DESCRIPTION |
|---|---|
| `prompts` | One or more text prompts to test. TYPE: `str \| list[str]` |
| `synthesizer` | Converts text to audio. TYPE: `Synthesizer` |
| `agent` | The agent under test. TYPE: `Agent` |
| `evaluator` | Compares expected vs actual tool calls. TYPE: `Evaluator` |
| `expect` | The expected tool calls (same for every prompt). TYPE: `list[ToolCall]` |
| `runs` | Number of times to run each prompt (default 1). TYPE: `int` |
| `max_concurrency` | Cap on simultaneous pipeline runs ( TYPE: `int \| None` |
| RETURNS | DESCRIPTION |
|---|---|
| `BatchResult` | BatchResult with per-run details and aggregate statistics. |
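A concurrency cap like `max_concurrency` is commonly implemented with `asyncio.Semaphore` around each job, combined with `asyncio.gather`. The sketch below shows that general technique only; it is not taken from russo's source. `run_bounded` and `fake_pipeline` are hypothetical names, treating `None` as "no cap" is an assumption, and the pass rate stands in for whatever aggregate statistics BatchResult actually reports.

```python
from __future__ import annotations

import asyncio


async def run_bounded(jobs, worker, max_concurrency: int | None = None) -> list:
    """Run worker(job) for every job, with at most max_concurrency in flight."""
    # Assumption for this sketch: None means no cap at all.
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency else None

    async def guarded(job):
        if semaphore is None:
            return await worker(job)
        async with semaphore:  # blocks while max_concurrency jobs are active
            return await worker(job)

    # gather returns results in the same order as the jobs were given.
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo() -> tuple[list[bool], int]:
    active = 0
    peak = 0

    async def fake_pipeline(job: int) -> bool:
        """Stand-in for one pipeline run; even-numbered jobs 'pass'."""
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # simulate I/O-bound work
        active -= 1
        return job % 2 == 0

    results = await run_bounded(range(6), fake_pipeline, max_concurrency=2)
    return results, peak


results, peak = asyncio.run(demo())
pass_rate = sum(results) / len(results)
print(f"peak concurrency: {peak}, pass rate: {pass_rate:.2f}")
```

A semaphore keeps the scheduling simple: all jobs are created up front, and the cap is enforced only at the point where work actually starts.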