Microsoft Brings Native Audio Creation to Copilot With New Expressive Voice Options

Microsoft has introduced a native audio generation feature to its Copilot platform, expanding its AI capabilities beyond text and images. With this update, users can give Copilot a written script and have it converted into a natural-sounding AI voiceover in different expressive styles. Microsoft claims that, unlike traditional text-to-speech tools, its system delivers audio that feels more authentic and less robotic. The feature is powered by the company’s in-house MAI-Voice-1 AI model, first unveiled in late August.

Mustafa Suleyman, CEO of Microsoft AI, announced the feature in a post on X (formerly Twitter). Suleyman said that audio generation is currently available through Copilot Labs, but only to users signing in with a personal Microsoft account. The limited release suggests that Microsoft intends to test the feature on a smaller scale before rolling it out more broadly across its ecosystem of apps and services.

At launch, Copilot offers users three distinct voice modes. The first, Scripted mode, delivers a straightforward and literal read of the input text, making it well suited to use cases such as formal announcements, e-learning, and document narration. The result is a clear, professional tone without unnecessary dramatization.

The second mode, called Emotive, is designed to add flair and expression. By varying pitch, tone, and pacing, it creates a more dynamic and engaging delivery that feels closer to human storytelling. Microsoft says this style is best for marketing, advertising, or entertainment contexts where dramatic impact matters. A third style, which Suleyman has hinted at but not yet fully detailed, is expected to broaden Copilot’s voice versatility further, giving users additional creative options.