Synthesizers

Synthesizers convert text prompts into audio. This is the first step of the russo pipeline.

Protocol

class Synthesizer(Protocol):
    async def synthesize(self, text: str) -> Audio: ...
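
Because Synthesizer is a Protocol, conformance is structural: any object with a matching async synthesize method qualifies. The sketch below illustrates this with stand-ins so that it runs without russo installed. The Audio dataclass and EchoSynthesizer are hypothetical placeholders, not russo's real classes, and the shape of Audio (bytes plus a format tag) is assumed from the custom synthesizer example on this page.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class Audio:
    """Stand-in for russo.Audio (assumed shape: raw bytes plus a format tag)."""
    data: bytes
    format: str


@runtime_checkable
class Synthesizer(Protocol):
    async def synthesize(self, text: str) -> Audio: ...


class EchoSynthesizer:
    """Hypothetical synthesizer: no base class, just a matching method."""

    async def synthesize(self, text: str) -> Audio:
        # Pretend the UTF-8 bytes of the prompt are the audio payload.
        return Audio(data=text.encode("utf-8"), format="wav")


async def run(synth: Synthesizer, prompt: str) -> Audio:
    # Accepts anything that structurally satisfies the protocol.
    return await synth.synthesize(prompt)


audio = asyncio.run(run(EchoSynthesizer(), "hello"))
```

The runtime_checkable decorator is added to the stand-in only so that isinstance can be demonstrated; whether russo's own protocol supports isinstance checks is not stated here.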

Built-in: Google TTS

from russo.synthesizers import GoogleSynthesizer

synth = GoogleSynthesizer(
    api_key="...",
    voice="Kore",                              # optional
    model="gemini-2.5-flash-preview-tts",      # optional
)

# synthesize is a coroutine, so call it from within an async function
audio = await synth.synthesize("Book a flight from Berlin to Rome")
audio.save("output.wav")  # save to file

Authentication Modes

API Key (Google AI)

synth = GoogleSynthesizer(api_key="AIza...")

Vertex AI

synth = GoogleSynthesizer(
    project="my-gcp-project",
    location="us-central1",
)
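
One way to switch between the two modes without editing code is to derive the constructor arguments from the environment. The helper below is a minimal sketch: google_auth_kwargs and the environment variable names it reads are assumptions made for this example, not part of russo.

```python
import os
from typing import Mapping


def google_auth_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Pick GoogleSynthesizer keyword arguments from environment variables.

    The variable names below are assumptions chosen for this example.
    """
    api_key = env.get("GOOGLE_API_KEY")
    if api_key:
        # API key mode (Google AI)
        return {"api_key": api_key}

    project = env.get("GOOGLE_CLOUD_PROJECT")
    if project:
        # Vertex AI mode; the fallback location is an arbitrary example value.
        return {
            "project": project,
            "location": env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
        }

    raise RuntimeError("Set GOOGLE_API_KEY or GOOGLE_CLOUD_PROJECT")
```

The result can then be passed along as GoogleSynthesizer(**google_auth_kwargs()).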

Custom Synthesizers

Implement the protocol directly. No inheritance is needed: any class with a matching async synthesize method satisfies it.

import russo

class ElevenLabsSynthesizer:
    def __init__(self, api_key: str, voice_id: str = "default"):
        self.api_key = api_key
        self.voice_id = voice_id

    async def synthesize(self, text: str) -> russo.Audio:
        # eleven_labs_tts is a placeholder for your own async ElevenLabs client call
        audio_bytes = await eleven_labs_tts(text, self.voice_id, self.api_key)
        return russo.Audio(data=audio_bytes, format="mp3")
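
Many TTS SDKs expose only blocking calls, while synthesize must be async. A possible adapter, sketched here with stand-ins so that it runs on its own, pushes the blocking call onto a worker thread with asyncio.to_thread. The names blocking_tts and ThreadedSynthesizer are hypothetical, and Audio is a local placeholder for russo.Audio.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Audio:
    """Stand-in for russo.Audio so the sketch runs on its own."""
    data: bytes
    format: str


def blocking_tts(text: str) -> bytes:
    """Hypothetical synchronous SDK call; returns fake audio bytes."""
    return b"AUDIO:" + text.encode("utf-8")


class ThreadedSynthesizer:
    """Adapts a blocking TTS function to the async synthesize() method."""

    async def synthesize(self, text: str) -> Audio:
        # Run the blocking call in a worker thread so the event loop stays free.
        audio_bytes = await asyncio.to_thread(blocking_tts, text)
        return Audio(data=audio_bytes, format="mp3")


result = asyncio.run(ThreadedSynthesizer().synthesize("hi"))
```

In a real implementation the return statement would build a russo.Audio instead of the local stand-in.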

Caching

Wrap any synthesizer with CachedSynthesizer to avoid repeated TTS calls:

from russo import CachedSynthesizer
from russo.synthesizers import GoogleSynthesizer

cached = CachedSynthesizer(
    GoogleSynthesizer(api_key="..."),
    cache_key_extra={"voice": "Kore"},  # invalidate cache on config change
)

See Caching for details.
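
To illustrate the idea behind cache_key_extra, the sketch below shows one way a caching wrapper could key its entries on both the prompt and the extra configuration. It is an in-memory illustration under stated assumptions, not russo's CachedSynthesizer: InMemoryCachedSynthesizer, CountingSynthesizer, and the hashing scheme are all invented for this example.

```python
import asyncio
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Audio:
    """Stand-in for russo.Audio."""
    data: bytes
    format: str


class CountingSynthesizer:
    """Hypothetical backend that records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    async def synthesize(self, text: str) -> Audio:
        self.calls += 1
        return Audio(data=text.encode("utf-8"), format="wav")


class InMemoryCachedSynthesizer:
    """Illustrative cache wrapper; NOT russo's CachedSynthesizer."""

    def __init__(self, inner, cache_key_extra=None):
        self.inner = inner
        self.extra = cache_key_extra or {}
        self._cache: dict[str, Audio] = {}

    def _key(self, text: str) -> str:
        # Hash the prompt together with the extra config so that changing
        # either one produces a different cache entry.
        payload = json.dumps({"text": text, "extra": self.extra}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    async def synthesize(self, text: str) -> Audio:
        key = self._key(text)
        if key not in self._cache:
            self._cache[key] = await self.inner.synthesize(text)
        return self._cache[key]


async def demo() -> int:
    backend = CountingSynthesizer()
    cached = InMemoryCachedSynthesizer(backend, cache_key_extra={"voice": "Kore"})
    await cached.synthesize("hello")
    await cached.synthesize("hello")  # served from the cache
    return backend.calls


calls = asyncio.run(demo())
```

Because the extra configuration is part of the key, the same prompt under a different voice setting maps to a different entry, so stale audio from the old configuration is not reused.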

API Reference

See GoogleSynthesizer for the full API docs.