


Meta releases audio AI model that simulates a real person's speech from just a 2-second sample
Meta recently released Voicebox, an AI model with significant advantages in audio simulation.
Voicebox reportedly needs only a 2-second audio sample to capture a speaker's timbre and vocal details, and can then generate speech in that voice from input text.
Voicebox is a generative AI model that helps with audio editing, sampling, and styling.
In the future, this technology could help creators edit audio tracks easily. It could also help people with damaged vocal cords "speak" again, let visually impaired people hear their friends' written messages read aloud, and let people speak any foreign language in their own voice.
It can also automatically fill in missing content in a voice clip based on the audio before and after the gap.
According to Meta, Voicebox could give AI assistants and NPCs in a future metaverse natural, realistic voices, greatly improving user immersion.
Voicebox’s versatility supports a variety of tasks, including:
Contextual text-to-speech synthesis: Using audio samples as short as two seconds, Voicebox can match audio styles and use them for text-to-speech generation.
Voice Editing and Noise Reduction: Voicebox can recreate parts of speech interrupted by noise or replace misspoken words without having to re-record the entire speech. For example, you can identify a segment of speech interrupted by a barking dog, crop it, and then instruct Voicebox to regenerate the segment—like an eraser for audio editing.
Cross-language conversion: Given a sample of someone's speech and a text in English, French, German, Spanish, Polish, or Portuguese, Voicebox can generate a reading of the text in any of these languages, even if the sample speech and the text are in different languages. In the future, people could use this feature to communicate in a more natural and authentic way, even if they don't speak the same language.
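The voice-editing item in the list above amounts to cutting a span out of a clip and regenerating it from the audio on both sides. A minimal sketch of that bookkeeping in Python is shown here. The article does not describe Voicebox's interface, so `split_for_infill`, `infill`, and the `regenerate` callable are hypothetical names, and the stand-in below only writes silence where a real model would produce speech.

```python
def split_for_infill(samples, sample_rate, start_s, end_s):
    """Split a clip around the span to regenerate.

    Returns (before, gap_length, after), where gap_length is the number
    of samples that were cut out between start_s and end_s (in seconds).
    """
    if not 0 <= start_s < end_s:
        raise ValueError("need 0 <= start_s < end_s")
    start = int(round(start_s * sample_rate))
    end = min(int(round(end_s * sample_rate)), len(samples))
    if start >= len(samples):
        raise ValueError("span starts after the clip ends")
    return samples[:start], end - start, samples[end:]


def infill(samples, sample_rate, start_s, end_s, regenerate):
    """Replace the span [start_s, end_s) with whatever `regenerate` returns.

    `regenerate(before, after, gap_length)` is a hypothetical hook standing
    in for a speech model that sees the context on both sides of the gap.
    """
    before, gap_length, after = split_for_infill(samples, sample_rate, start_s, end_s)
    new_segment = list(regenerate(before, after, gap_length))
    if len(new_segment) != gap_length:
        raise ValueError("regenerated segment must match the gap length")
    return before + new_segment + after


def silence(before, after, gap_length):
    """Stand-in for a model: fills the gap with zeros (silence)."""
    return [0] * gap_length


# Toy clip: 10 samples at 10 samples per second, with the span 0.3 s to 0.6 s cut out.
clip = list(range(10))
edited = infill(clip, sample_rate=10, start_s=0.3, end_s=0.6, regenerate=silence)
```

The point of the split is that the untouched audio before and after the gap is passed along unchanged, which is what lets a model match the surrounding voice rather than re-record the whole clip.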
Voicebox is built on flow matching, a method that has been shown to improve on diffusion models. Voicebox outperforms VALL-E, the previous state-of-the-art English model, in intelligibility (a word error rate of 1.9% vs. 5.9% for VALL-E) and audio similarity (0.681 vs. 0.580 for VALL-E), while being 20x faster. For cross-language style transfer, Voicebox outperforms YourTTS, reducing the average word error rate from 10.9% to 5.2% and improving audio similarity from 0.335 to 0.481.
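To make "flow matching" more concrete, here is a minimal Python sketch of how one training example is built in the general conditional flow matching formulation with a straight-line path from noise to data. This is not Meta's implementation. The article gives no training details, so the function names, the `sigma_min` default, and the toy feature vector are all assumptions for illustration.

```python
import random


def cfm_training_pair(x1, t, sigma_min=1e-5, rng=random):
    """Build one conditional flow matching training example.

    x1: target data point (list of floats), e.g. one frame of audio features.
    t:  time in [0, 1]; t=0 is pure noise, t=1 is (almost exactly) the data.
    Returns (x_t, u_t): the point on the path and the target velocity there.
    """
    # Sample the noise endpoint x0 from a standard Gaussian.
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]
    # Straight-line path from noise to data.
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    # Velocity of that path (its derivative with respect to t).
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t


def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    n = len(target_velocity)
    return sum((p - u) ** 2 for p, u in zip(predicted_velocity, target_velocity)) / n


# Toy example: a 3-dimensional "feature frame" (made-up numbers).
frame = [1.0, -2.0, 0.5]
x_t, u_t = cfm_training_pair(frame, t=0.5, rng=random.Random(0))
```

A model trained this way learns to predict `u_t` from `x_t` and `t`. At generation time the learned velocity field is integrated from noise toward data, which typically takes fewer steps than sampling from a diffusion model.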
Voicebox achieves new state-of-the-art results, outperforming VALL-E and YourTTS in word error rate.
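For readers unfamiliar with the metric, word error rate is the number of word-level substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. Lower is better. The sketch below computes it with a standard edit-distance table. The function name and the example sentences are illustrative, and the article does not say how the transcripts of generated speech were obtained.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
score = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```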
Voicebox also achieves state-of-the-art results on audio style similarity metrics on both English and multilingual benchmarks.
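The article does not define how audio similarity is measured. A common approach, assumed here, is to extract a speaker embedding from the reference audio and from the generated audio and take the cosine similarity of the two vectors, so that higher values mean a closer match. The sketch below shows only the final comparison step; the embedding values are made up, and the embedding model itself is out of scope.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional speaker embeddings (real ones are much larger).
reference_embedding = [0.2, 0.7, -0.1, 0.4]
generated_embedding = [0.25, 0.6, 0.0, 0.5]
similarity = cosine_similarity(reference_embedding, generated_embedding)
```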
Meta acknowledges that Voicebox could cause harm if it is misused to impersonate people, so the company is looking for a way to distinguish real speech from Voicebox-generated speech.
Until a solution is found, Meta will not make the Voicebox model available to the public, to avoid unnecessary harm.
Editor’s comment: AI is now applied in many fields. As the first versatile and efficient model to successfully generalize across speech tasks, Voicebox could, I believe, usher in a new era of speech generation AI. However, if Meta cannot deal effectively with audio fraud, the technology may end up being shelved.
