Meta Previews Voicebox, A Generative AI Text to Speech Tool

Meta Platforms, the company behind Facebook and Instagram, has unveiled a generative AI text to speech tool called Voicebox AI.

Written By
David Curry
Jul 17, 2023

Meta Platforms, the company behind Facebook and Instagram, has previewed a text to speech generative artificial intelligence tool, named Voicebox AI, which it claims outperforms all existing models.

Like ChatGPT and DALL-E, the generative model takes text as its input. Rather than generating text or images, however, Voicebox renders the words as speech in a variety of voices, and it can also cut out unwanted audio, pauses, and other audio issues.

According to Meta, Voicebox can match an audio style from only two seconds of sample audio. It can also recreate the person’s voice in several languages; English, French, German, Spanish, Polish, and Portuguese are the first that Meta has added.

In the preview video, Meta CEO Mark Zuckerberg appears to demonstrate the capabilities of Voicebox. Meta did not say in the press release whether it is actually Zuckerberg speaking or whether the team used an audio sample; we assume it’s the latter.

Meta has not said when it plans to make Voicebox available to the wider public. As with many generative AI research projects, there are plenty of ways bad actors could use this tool to commit fraud and spread misinformation. Meta has been rather averse to launching AI tools to the general public, although this may be changing as the company has switched its focus from virtual reality and the metaverse to AI.

“Prior to Voicebox, generative AI for speech required specific training for each task using carefully prepared training data,” said Matt Le, a research engineer at Meta AI. “Voicebox uses a new approach to learn just from raw audio and an accompanying transcription. Unlike autoregressive models for audio generation, Voicebox can modify any part of a given sample, not just the end of an audio clip it is given.”

In a research paper published around the same time as the press release, Meta AI said that Voicebox is able to generate a diverse set of audio samples twenty times faster than VALL-E, Microsoft’s own text to speech generative tool. It should be noted that neither tool is widely available, and claims made by either research team cannot be fully verified due to a lack of access.

Outside of pranking friends, the audio editing and noise reduction tools should be valuable to audio and sound engineers, who would previously have spent hours removing noise from videos or cleaning up portions of dialogue. It’s not clear how Meta would market this service to engineers, however, as it is not a competitor in the media editing space.

Text to speech does appear to be the next generative system to be taken up by the masses. Image generation and editing tools, in the form of DALL-E, Midjourney, and Stable Diffusion, were the first to break through. OpenAI’s ChatGPT was the first generative text tool to hit the public web, and it has been a huge success for the AI research lab, gaining over 100 million users.

Whether these tools will have the same broad appeal as ChatGPT remains to be seen. They look to be more in the realm of DALL-E and other image generation tools, which could be assets for digital artists and people in media but aren’t as valuable to the wider public.

David Curry

David is a technology writer with several years’ experience covering all aspects of IoT, from technology to networks to security.
