Meta releases audio AI model that simulates real-person speech in just 2 seconds

Jun 21, 2023 pm 03:20 PM
meta audio ai Voice simulation

Meta recently announced Voicebox, an AI model with notable strengths in speech synthesis.

According to reports, Voicebox needs only a 2-second audio sample to capture a speaker's timbre and other vocal details, and can then read supplied text aloud in that voice.

Voicebox is a generative AI model that helps with audio editing, sampling, and styling.

In the future, this technology could help creators edit audio tracks easily. It could also help people with damaged vocal cords "speak" again, let visually impaired people hear their friends' written messages read aloud, and allow people to speak any foreign language in their own voice.

Voicebox can also fill in missing content automatically, based on the speech that comes before and after the gap in a voice clip.

According to Meta, Voicebox could in the future give AI assistants or NPCs in the metaverse natural, realistic voices, greatly improving user immersion.

Voicebox’s versatility supports a variety of tasks, including:

Contextual text-to-speech synthesis: Using audio samples as short as two seconds, Voicebox can match audio styles and use them for text-to-speech generation.

Voice Editing and Noise Reduction: Voicebox can recreate parts of speech interrupted by noise or replace misspoken words without having to re-record the entire speech. For example, you can identify a segment of speech interrupted by a barking dog, crop it, and then instruct Voicebox to regenerate the segment—like an eraser for audio editing.

Cross-language conversion: Given a sample of someone's speech and a passage of text in English, French, German, Spanish, Polish, or Portuguese, Voicebox can generate a reading of the text in any of these languages, even if the sample speech and the text are in different languages. In the future, people could use this feature to communicate more naturally and authentically, even if they do not speak the same language.

Voicebox uses flow matching, a method that has been shown to improve on the performance of diffusion models. Voicebox outperforms VALL-E, the current state-of-the-art English model, in both intelligibility (word error rate of 1.9% vs. 5.9% for VALL-E) and audio similarity (0.681 vs. 0.580 for VALL-E), while being up to 20x faster. For cross-language style transfer, Voicebox outperforms YourTTS, reducing the average word error rate from 10.9% to 5.2% and improving audio similarity from 0.335 to 0.481.
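For readers unfamiliar with the intelligibility figures quoted above: word error rate is usually defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a transcript of the generated speech, divided by the number of words in the reference. The sketch below illustrates that general definition only. It is not Meta's evaluation code, and the function name `word_error_rate` and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: real evaluations also normalize punctuation,
    numerals, and so on before comparing transcripts.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the dog barked at the mailman",
                          "the dog barked at a mailman"))
```

A lower value means the generated speech was transcribed more accurately, so a drop from 5.9% to 1.9% corresponds to roughly one third as many word errors.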

Voicebox achieves new state-of-the-art results, outperforming Vall-E and YourTTS in word error rate.

Voicebox also achieves state-of-the-art results on audio style similarity metrics in English and multilingual benchmarks respectively.
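The article does not say how the audio similarity scores are calculated. A common approach for this kind of metric, assumed here rather than confirmed by the source, is to extract a speaker embedding (a fixed-length vector) from the reference audio and from the generated audio, then take the cosine similarity of the two vectors. The sketch below shows only that last step. The function name `cosine_similarity` and the toy vectors are hypothetical, and a real evaluation would obtain the embeddings from a speaker-verification model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy stand-ins for speaker embeddings; not real model output.
    reference_embedding = [0.2, 0.7, 0.1]
    generated_embedding = [0.25, 0.6, 0.2]
    print(cosine_similarity(reference_embedding, generated_embedding))
```

Under this assumption, a score closer to 1 means the generated voice sits closer to the reference speaker in the embedding space.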

It is worth noting that Meta says it is aware of the potential for harm if Voicebox is used to create fake audio, and it is therefore looking for a way to distinguish real speech from Voicebox-generated speech.

Until a solution is found, Meta will not release the Voicebox model to the public, in order to avoid unnecessary harm.

Editor's comment: AI is now applied in a wide range of fields. As the first versatile, efficient speech model to generalize successfully across tasks, Voicebox could, I believe, usher in a new era of generative speech AI. If Meta cannot deal effectively with audio fraud, however, the technology may end up shelved.


