Microsoft releases VibeVoice, generates 90-minute, 4-speaker audio

Comments (3)

watsonmusic · 2h ago

VibeVoice is a novel framework for generating expressive, long-form, multi-speaker conversational audio, such as podcasts, from text. It addresses significant challenges in traditional Text-to-Speech (TTS) systems, particularly scalability, speaker consistency, and natural turn-taking. A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of 7.5 Hz. These tokenizers preserve audio fidelity while making long sequences much cheaper to process. VibeVoice employs a next-token diffusion framework: a Large Language Model (LLM) handles textual context and dialogue flow, and a diffusion head generates high-fidelity acoustic details. The model can synthesize speech up to 90 minutes long with up to 4 distinct speakers, surpassing the typical 1-2 speaker limits of many prior models.
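
To put the 7.5 Hz figure in perspective, here is a rough back-of-the-envelope sketch of how many tokenizer frames a 90-minute session would need. The 50 Hz baseline is a hypothetical comparison rate chosen only for illustration, not a number from the VibeVoice materials, and `frames_for` is a helper made up for this example.

```python
def frames_for(duration_minutes: float, frame_rate_hz: float) -> int:
    """Number of tokenizer frames needed to cover the given duration."""
    return round(duration_minutes * 60 * frame_rate_hz)


VIBEVOICE_HZ = 7.5  # frame rate stated in the comment above
BASELINE_HZ = 50.0  # hypothetical comparison rate (assumption, illustration only)

vibevoice_frames = frames_for(90, VIBEVOICE_HZ)
baseline_frames = frames_for(90, BASELINE_HZ)

print(vibevoice_frames)                              # 40500
print(baseline_frames)                               # 270000
print(round(baseline_frames / vibevoice_frames, 2))  # 6.67
```

Under these assumptions, a full 90-minute session comes to about 40,500 frames, which is what makes such long sequences plausible for an LLM to process.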

watsonmusic · 2h ago

https://huggingface.co/microsoft/VibeVoice-1.5B

watsonmusic · 2h ago

https://github.com/microsoft/VibeVoice
