Types

Core data types for russo. All types are Pydantic models.

_types

All data flowing through the pipeline is a Pydantic model, giving us validation, serialization, and rich repr for free.

Audio

Bases: BaseModel

Audio data with format metadata.

save

save(path: str | Path) -> Path

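Save audio to a file. If the path ends in .wav (case-insensitive), the data is written as PCM frames inside a WAV container using the model's channels, sample_width, and sample_rate. For any other suffix the bytes are written unchanged. Missing parent directories are created, and the destination is returned as a Path.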

Usage

audio.save("output.wav")

Source code in src/russo/_types.py
def save(self, path: str | Path) -> Path:
    """Save audio to a file. Wraps raw PCM in a WAV container if needed.

    Usage:
        audio.save("output.wav")
    """
    import wave

    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)

    if p.suffix.lower() == ".wav":
        with wave.open(str(p), "wb") as wf:
            wf.setnchannels(self.channels)
            wf.setsampwidth(self.sample_width)
            wf.setframerate(self.sample_rate)
            wf.writeframes(self.data)
    else:
        # For non-WAV formats, write raw bytes
        p.write_bytes(self.data)
    return p
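
As a rough illustration of the two branches above, the sketch below uses a hypothetical `PcmClip` dataclass as a stand-in for `Audio`. It mirrors only the four attributes that `save` reads; the default values are assumptions, and the real model's constructor may differ. It writes one second of silence to a `.wav` path and to a `.pcm` path, then reads the WAV header back with the standard `wave` module.

```python
from __future__ import annotations

import tempfile
import wave
from dataclasses import dataclass
from pathlib import Path


@dataclass
class PcmClip:
    """Hypothetical stand-in for Audio; mirrors only what save() reads."""

    data: bytes
    sample_rate: int = 16000  # assumed default, for illustration only
    channels: int = 1         # assumed default
    sample_width: int = 2     # bytes per sample (2 = 16-bit PCM), assumed default

    def save(self, path: str | Path) -> Path:
        p = Path(path)
        p.parent.mkdir(parents=True, exist_ok=True)
        if p.suffix.lower() == ".wav":
            # WAV branch: wrap the raw frames in a container with a header.
            with wave.open(str(p), "wb") as wf:
                wf.setnchannels(self.channels)
                wf.setsampwidth(self.sample_width)
                wf.setframerate(self.sample_rate)
                wf.writeframes(self.data)
        else:
            # Any other suffix: bytes go to disk unchanged.
            p.write_bytes(self.data)
        return p


# One second of 16-bit mono silence at 16 kHz.
clip = PcmClip(data=b"\x00\x00" * 16000)

with tempfile.TemporaryDirectory() as tmp:
    wav_path = clip.save(Path(tmp) / "nested" / "silence.wav")
    raw_path = clip.save(Path(tmp) / "silence.pcm")

    with wave.open(str(wav_path), "rb") as wf:
        wav_params = (wf.getnchannels(), wf.getsampwidth(), wf.getframerate(), wf.getnframes())

    raw_size = raw_path.stat().st_size
    wav_size = wav_path.stat().st_size

print(wav_params, raw_size, wav_size)
```

The `.wav` file ends up slightly larger than the `.pcm` file because of the container header, while the `.pcm` file holds exactly the bytes in `data`.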

ToolCall

Bases: BaseModel

A normalized tool/function call representation.

Provider-agnostic — parsers convert provider-specific formats into this.

AgentResponse

Bases: BaseModel

Normalized response from an agent, containing extracted tool calls.

raw class-attribute instance-attribute

raw: Any | None = None

The raw, unparsed response from the provider (for debugging).

ToolCallMatch

Bases: BaseModel

Result of comparing a single expected tool call against actuals.

EvalResult

Bases: BaseModel

Full evaluation result for a test scenario.

match_rate property

match_rate: float

Fraction of expected tool calls that matched.
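
One plausible reading of that fraction is sketched below. `Match` and the free function `match_rate` are hypothetical stand-ins, not russo's implementation, and the value returned when nothing was expected is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical stand-in for ToolCallMatch; only `matched` is modelled."""

    matched: bool


def match_rate(matches: list[Match]) -> float:
    """Fraction of expected tool calls that matched.

    Assumption: with no expected calls the rate is 1.0. The real
    property may handle the empty case differently.
    """
    if not matches:
        return 1.0
    return sum(m.matched for m in matches) / len(matches)


rate = match_rate([Match(True), Match(True), Match(False), Match(True)])
formatted = f"{rate:.0%}"  # the same format spec summary() applies
print(rate, formatted)
```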

summary

summary() -> str

Human-readable summary of the evaluation.

Source code in src/russo/_types.py
def summary(self) -> str:
    """Human-readable summary of the evaluation."""
    status = "PASSED" if self.passed else "FAILED"
    lines = [f"{status} ({self.match_rate:.0%} match rate)"]
    for m in self.matches:
        icon = "+" if m.matched else "-"
        actual_str = f" -> {m.actual.name}({m.actual.arguments})" if m.actual else " -> (no match)"
        lines.append(f"  [{icon}] {m.expected.name}({m.expected.arguments}){actual_str}")
        if m.details:
            lines.append(f"      {m.details}")
    return "\n".join(lines)
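
To show what the report looks like, the sketch below copies the summary logic into a free function and feeds it `SimpleNamespace` stand-ins. The attribute names come from the listing above; the tool names, arguments, and details text are made up for illustration.

```python
from types import SimpleNamespace as NS


def summarize(result) -> str:
    """Free-function copy of the summary logic in the listing above."""
    status = "PASSED" if result.passed else "FAILED"
    lines = [f"{status} ({result.match_rate:.0%} match rate)"]
    for m in result.matches:
        icon = "+" if m.matched else "-"
        actual_str = f" -> {m.actual.name}({m.actual.arguments})" if m.actual else " -> (no match)"
        lines.append(f"  [{icon}] {m.expected.name}({m.expected.arguments}){actual_str}")
        if m.details:
            lines.append(f"      {m.details}")
    return "\n".join(lines)


# Hypothetical data: one expected call matched, one had no counterpart.
weather = NS(name="get_weather", arguments={"city": "Berlin"})
alarm = NS(name="set_alarm", arguments={"time": "07:00"})
result = NS(
    passed=False,
    match_rate=0.5,
    matches=[
        NS(expected=weather, actual=weather, matched=True, details=None),
        NS(expected=alarm, actual=None, matched=False, details="no call named set_alarm"),
    ],
)

report = summarize(result)
print(report)
# FAILED (50% match rate)
#   [+] get_weather({'city': 'Berlin'}) -> get_weather({'city': 'Berlin'})
#   [-] set_alarm({'time': '07:00'}) -> (no match)
#       no call named set_alarm
```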

SingleRunResult

Bases: BaseModel

Result of a single pipeline run within a batch.

BatchResult

Bases: BaseModel

Aggregated results from running the pipeline multiple times.

Covers three scenarios:

- Single prompt, N runs (reliability testing)
- Multiple prompts, 1 run each (variant testing)
- Multiple prompts, N runs each (full matrix)
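
The three scenarios are all instances of a prompts-by-runs grid. The sketch below makes that concrete with a hypothetical `plan_runs` helper, which is not part of russo. The prompts are invented, and starting `run_index` at 0 is an assumption.

```python
from itertools import product


def plan_runs(prompts: list[str], runs_per_prompt: int) -> list[tuple[str, int]]:
    """Return (prompt, run_index) pairs. Hypothetical helper for illustration."""
    return list(product(prompts, range(runs_per_prompt)))


# Single prompt, N runs (reliability testing)
reliability = plan_runs(["book a table for two"], 3)

# Multiple prompts, 1 run each (variant testing)
variants = plan_runs(["book a table", "reserve a table"], 1)

# Multiple prompts, N runs each (full matrix)
matrix = plan_runs(["book a table", "reserve a table"], 3)

print(len(reliability), len(variants), len(matrix))
```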

passed property

passed: bool

True only if every single run passed.

pass_rate property

pass_rate: float

Fraction of runs that passed.

match_rate property

match_rate: float

Average match rate across all runs.
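
The three aggregates answer different questions, which a small example makes clearer. `Run` is a hypothetical, flattened stand-in for `SingleRunResult`, and the formulas are one straightforward reading of the descriptions above, not a copy of russo's code. Empty batches are not handled here.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Run:
    """Hypothetical stand-in for SingleRunResult, flattened to two fields."""

    passed: bool
    match_rate: float


runs = [Run(True, 1.0), Run(True, 1.0), Run(False, 0.5), Run(True, 1.0)]

# passed: strict, a single failing run fails the whole batch.
all_passed = all(r.passed for r in runs)

# pass_rate: share of runs that passed.
pass_rate = sum(r.passed for r in runs) / len(runs)

# match_rate: mean of the per-run match rates.
avg_match_rate = fmean(r.match_rate for r in runs)

print(all_passed, pass_rate, avg_match_rate)
```

Here one failing run out of four makes the batch fail overall, even though the pass rate is 75% and the average match rate is higher still, since the failing run matched half of its expected calls.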

summary

summary() -> str

Human-readable summary grouped by prompt.

Source code in src/russo/_types.py
def summary(self) -> str:
    """Human-readable summary grouped by prompt."""
    status = "PASSED" if self.passed else "FAILED"
    lines = [
        f"{status} ({self.pass_rate:.0%} pass rate, {self.total} runs)",
        f"  Passed: {self.passed_count}/{self.total}",
    ]

    prompts: dict[str, list[SingleRunResult]] = {}
    for r in self.runs:
        prompts.setdefault(r.prompt, []).append(r)

    for prompt, results in prompts.items():
        prompt_passed = sum(1 for r in results if r.eval_result.passed)
        lines.append(f"  Prompt: {prompt!r}")
        lines.append(f"    {prompt_passed}/{len(results)} passed")
        for r in results:
            icon = "+" if r.eval_result.passed else "-"
            lines.append(f"    [{icon}] run {r.run_index}: {r.eval_result.match_rate:.0%} match")

    return "\n".join(lines)