Google's TTS gets audio tags, but the real story is Flash Lite pricing

Google launched Gemini 3.1 Flash TTS with "audio tags" that let developers control vocal style, pacing, and delivery through natural language commands embedded in text. The model supports 70+ languages, includes SynthID watermarking, and scored 1,211 on the Artificial Analysis TTS leaderboard. It's rolling out in preview through the Gemini API, Google AI Studio, Vertex AI, and Google Vids.

The TTS release feels incremental: audio tags are essentially prompt engineering for speech synthesis, not a fundamental breakthrough. More interesting is how it fits Google's broader Gemini 3 strategy. While Google adds features to specialized models like TTS, the real action is in Flash Lite, its cheapest and fastest general-purpose model, which is reshaping the economics of high-volume AI workloads. The segmentation is clear: premium reasoning with Pro, balanced performance with Flash, and now rock-bottom pricing with Flash Lite.

What Google's announcement glosses over is the complexity creep in its pricing. As other sources note, Gemini now spans five models across three service tiers, with prompt-size thresholds layered on top. That yields dozens of price combinations and makes cost estimation a nightmare. The "most attractive quadrant" positioning for TTS sounds nice, but developers need a calculator just to estimate their bills. Meanwhile, Flash Lite's structural cost advantage at 10M+ monthly calls suggests Google is betting on volume over margin.
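To make the "dozens of price combinations" point concrete, here is a rough Python sketch of how such a price grid multiplies out and how a prompt-size threshold can swing a monthly bill. Everything in it is an assumption for illustration: the model and tier names are placeholders, the single threshold with two prompt bands and separate input/output rates is a guess at the structure, and the rates are invented numbers, not Google's published prices.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical placeholders: five models, three service tiers (per the
# article), plus an ASSUMED single prompt-size threshold (two bands) and
# ASSUMED separate input/output rates.
MODELS = ["model_a", "model_b", "model_c", "model_d", "model_e"]
TIERS = ["tier_1", "tier_2", "tier_3"]
PROMPT_BANDS = ["short_prompt", "long_prompt"]
DIRECTIONS = ["input", "output"]


def count_price_points() -> int:
    """Count distinct rates a developer would have to look up."""
    return len(list(product(MODELS, TIERS, PROMPT_BANDS, DIRECTIONS)))


@dataclass(frozen=True)
class Rate:
    """Illustrative rate card entry; all values are made up."""
    input_per_million: float       # USD per 1M input tokens
    output_per_million: float      # USD per 1M output tokens
    long_prompt_multiplier: float  # applied when the prompt exceeds the threshold
    threshold_tokens: int          # prompt size at which the higher band starts


def monthly_cost(rate: Rate, calls: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend for identical calls under one rate entry."""
    over_threshold = input_tokens > rate.threshold_tokens
    multiplier = rate.long_prompt_multiplier if over_threshold else 1.0
    per_call = (
        input_tokens * rate.input_per_million
        + output_tokens * rate.output_per_million
    ) / 1_000_000
    return calls * per_call * multiplier


if __name__ == "__main__":
    example = Rate(1.0, 2.0, 2.0, 1_000)  # invented numbers
    print(count_price_points())
    print(monthly_cost(example, 10_000_000, 500, 500))
    print(monthly_cost(example, 10_000_000, 2_000, 500))
```

Under these assumed numbers, the grid has 60 rates to track, and a prompt that crosses the threshold costs more per token on top of being longer. That compounding is the kind of swing worth modeling before committing to a high-volume workload.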

For developers, the TTS tags are useful but not game-changing: you're still tweaking prompts, just with different syntax. The bigger opportunity is Flash Lite for high-throughput workloads that don't need deep reasoning. But budget carefully. Google's multi-dimensional pricing means your costs can swing widely based on usage patterns you might not anticipate.
