VibeVoice - Microsoft's Open Source Multi-Speaker Text-to-Speech AI Model for Podcasts and Long-Form Audio
What is VibeVoice - Microsoft's Multi-Speaker TTS Model
VibeVoice is Microsoft's open-source text-to-speech (TTS) model, purpose-built for multi-speaker, long-form, conversational audio. It can generate roughly 90 minutes of natural, turn-taking dialogue with up to four speakers in a single pass, which makes it well suited to podcasts, audiobooks, and e-learning narration.
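To make the four-speaker limit concrete, here is a minimal sketch of how a dialogue script could be checked before synthesis. The "Speaker N: text" line format and the helper name parse_script are assumptions made for illustration; they are not confirmed here as VibeVoice's actual input format or API.

```python
import re

MAX_SPEAKERS = 4  # speaker limit stated in the paragraph above

# Assumed (hypothetical) script format: one turn per line, "Speaker N: text".
LINE_PATTERN = re.compile(r"^\s*(Speaker \d+)\s*:\s*(.+)$")


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a script into (speaker, text) turns and enforce the speaker limit."""
    turns = []
    for raw in script.splitlines():
        if not raw.strip():
            continue  # skip blank lines between turns
        match = LINE_PATTERN.match(raw)
        if match is None:
            raise ValueError(f"Line has no speaker label: {raw!r}")
        turns.append((match.group(1), match.group(2).strip()))

    speakers = {speaker for speaker, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"Script uses {len(speakers)} speakers; the limit is {MAX_SPEAKERS}"
        )
    return turns


if __name__ == "__main__":
    demo = "Speaker 1: Welcome to the show.\nSpeaker 2: Glad to be here."
    for speaker, text in parse_script(demo):
        print(f"{speaker} -> {text}")
```

Validating the script up front catches unlabeled lines and a fifth speaker before any long generation job starts.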
VibeVoice uses continuous speech tokenizers that run at a very low frame rate (about 7.5 Hz) together with a next-token diffusion decoder. The low frame rate keeps long sequences short enough to model, which helps the system hold speaker identity and natural prosody steady across long recordings. For creators, it works as a podcast voice generator, handles long-form text-to-speech narration, and supports multi-speaker dialogue synthesis.
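A quick back-of-the-envelope calculation shows why the low frame rate matters for long audio. The sketch below uses only the figures quoted on this page (about 7.5 Hz and about 90 minutes); the 50 Hz comparison rate is an assumed value chosen for contrast, not a figure from the VibeVoice project.

```python
FRAME_RATE_HZ = 7.5        # approximate tokenizer frame rate quoted above
COMPARISON_RATE_HZ = 50.0  # assumed rate for contrast only (hypothetical)


def speech_frames(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


if __name__ == "__main__":
    print(speech_frames(90))                      # 40500
    print(speech_frames(90, COMPARISON_RATE_HZ))  # 270000
```

At about 7.5 frames per second, a 90-minute session is on the order of 40,500 frames, so the sequence the model has to track stays several times shorter than it would at the higher assumed rate.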
The project is released under the MIT license. You can run it locally or try it through hosted demos.