OmniVoice Studio bundles 6 TTS engines + MCP server, local-first, FSL license

OmniVoice Studio shipped this week. It is a single-developer project (debpalash on GitHub) that bundles six TTS engines behind a unified local-first interface, with a built-in MCP server for programmatic access. The pitch is direct: ElevenLabs charges $5-$330/month and routes every audio file through its cloud; OmniVoice runs everything locally with no subscription. The bundled engines are OmniVoice (default), CosyVoice 3, MLX-Audio, VoxCPM2, MOSS-TTS-Nano and KittenTTS. These are newer, lesser-known engines rather than the familiar open-source lineup (Kokoro, F5-TTS, Bark, Coqui, ChatTTS), so voice quality will vary engine by engine. The default OmniVoice engine claims support for 646 languages; transcription uses WhisperX, covering 99 languages.

The capability worth flagging: zero-shot voice cloning from as little as 3 seconds of reference audio, via a diffusion-based TTS model conditioned on the clip. For builders who have been paying ElevenLabs for voice-clone APIs, that takes the recurring cost to zero. The hardware floor is 8GB RAM and 4GB VRAM with automatic CPU offload; 16GB RAM and 8GB+ VRAM are recommended; CPU-only mode works but runs ~3× slower. Architecture: a React frontend at localhost:5173, a FastAPI backend on port 8000, Server-Sent Events for streaming updates, WebSocket for dictation, and the MCP server, which lets agents (Claude Code, Cursor, custom) call TTS without a separate vendor key. Repo: github.com/debpalash/OmniVoice-Studio.
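For anyone scripting against the backend rather than the React UI, the Server-Sent Events stream is the part that needs a little parsing. The following is a minimal sketch of reading the standard SSE wire format (`event:` and `data:` fields, events separated by a blank line). The event names and payloads in the sample are invented for illustration; the article does not document OmniVoice Studio's actual endpoints or event schema.

```python
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per Server-Sent Event from an iterable of text lines."""
    event, data = "message", []  # "message" is the default SSE event type
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; emit it only if it carried data.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space after the colon is dropped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Hypothetical stream contents, NOT OmniVoice Studio's real event schema.
sample = [
    ": keep-alive\n",
    "event: progress\n",
    'data: {"pct": 50}\n',
    "\n",
    "data: done\n",
    "\n",
]
events = list(parse_sse(sample))
```

In practice the lines would come from a streaming HTTP response on the FastAPI backend (port 8000 per the article); which path serves the stream is something to confirm in the repo.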

The license is the load-bearing gotcha, and builders need to read it before they ship. **FSL-1.1-ALv2** is the Functional Source License with an Apache 2.0 future license. Under the standard FSL text, any use is permitted except a "Competing Use": making the software available to others in a commercial product or service that substitutes for it or offers substantially similar functionality. Internal use and non-commercial education and research are explicitly allowed. Two years after a given release is published, that release converts to Apache 2.0. This means a startup that ships a competing commercial product on OmniVoice Studio today is out of compliance with the license terms until the code it uses converts in 2028, unless it negotiates separately with the maintainer. For internal tooling at a company, it's fine. For selling what amounts to a hosted or repackaged OmniVoice Studio, it's not yet usable. The pattern is the same as Sentry's FSL move: source available to the community, commercial protection for the originator.
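The 2028 figure is just date arithmetic on the two-year delay. Here is a minimal sketch, assuming a two-year delay counted from the day a release is published; the release date below is a placeholder chosen to match the article's 2028 estimate, not a date taken from the repo.

```python
from datetime import date


def apache_conversion_date(released: date) -> date:
    """Return the second anniversary of a release date.

    A Feb 29 release has no Feb 29 two years later, so fall back to Feb 28.
    """
    try:
        return released.replace(year=released.year + 2)
    except ValueError:
        return released.replace(year=released.year + 2, day=28)


# Placeholder release date (assumption): check the tagged release in the repo.
hypothetical_release = date(2026, 5, 1)
converts_on = apache_conversion_date(hypothetical_release)
print(converts_on.isoformat())
```

Because the delay runs from each release's own publication date, later releases would convert later than earlier ones, so the date that matters is the one for the version actually being shipped.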

Monday morning: if you're building voice agents and your current ElevenLabs bill is hurting, OmniVoice Studio is worth a local install to evaluate quality on the engines you care about. The 3-second voice clone is the demo to run first; the 646-language claim deserves spot-checking on the languages you actually need. Hooking it into an existing agent via the MCP server should be a single configuration entry for any client that speaks MCP. Honest unknowns: this is a single-developer project, no production deployments are cited, no quality benchmarks against ElevenLabs have been published, the engine bundling means the quality bar varies per voice path, and the FSL license restricts competing commercial use until the delay period expires. For research, internal tooling, or evaluation, it's free and local. For shipping a product, read the license first, and watch whether the project survives the bus-factor question that all solo-dev releases carry.
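For MCP clients that read a JSON config with an `mcpServers` map (the convention used by Claude Code and Cursor), wiring in a server usually means adding one entry. The sketch below builds such an entry, assuming the OmniVoice MCP server can be launched as a local stdio process. The server name and the launch command are placeholders: the article does not give the real command, so take it from the repo's README.

```python
import json


def mcp_server_entry(name: str, command: str, args: list[str]) -> dict:
    """Build a stdio-style `mcpServers` config fragment for an MCP client."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


# Placeholder values (assumptions): replace with the launch command from the README.
config = mcp_server_entry(
    name="omnivoice",
    command="<command-that-starts-the-OmniVoice-MCP-server>",
    args=[],
)
print(json.dumps(config, indent=2))
```

The printed fragment would be merged into the client's existing MCP config file instead of overwriting it, so that other configured servers are preserved.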
