


Multimodality unified again! Meta releases self-supervised algorithm data2vec 2.0: training efficiency increased by up to 16 times!
Most breakthroughs in the field of artificial intelligence in recent years have been driven by self-supervised learning, such as the MLM (Masked Language Model) objective proposed in BERT, which masks some of the words in a text and trains the model to predict them. This allows massive amounts of unlabeled text data to be used for training, and it opened a new era of large-scale pre-trained models. However, self-supervised learning algorithms also have obvious limitations: they are usually suited to data from a single modality (such as images, text, or speech), and they require a great deal of computing power to learn from massive datasets. By contrast, humans learn far more efficiently than current AI models, and can learn from many different types of data.
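To make the MLM objective concrete, here is a minimal sketch in PyTorch-style Python. The toy model, vocabulary size, and 15% mask ratio are illustrative assumptions, not BERT's actual architecture: random token ids stand in for unlabeled text, a fraction of positions is masked, and the model is trained to recover the original ids, so the data effectively labels itself.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: random token ids standing in for unlabeled text,
# and a toy "model" that outputs per-token vocabulary logits.
vocab_size, mask_id = 1000, 0
tokens = torch.randint(1, vocab_size, (1, 12))
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)

# Mask ~15% of positions; the original ids become the training targets,
# so no human annotation is needed.
mask = torch.rand(tokens.shape) < 0.15
mask[0, 0] = True                      # ensure at least one masked position
inputs = tokens.masked_fill(mask, mask_id)

logits = model(inputs)                 # (1, 12, vocab_size)
loss = F.cross_entropy(logits[mask], tokens[mask])  # predict only masked words
loss.backward()
```

The key property is that the loss is computed only at the masked positions, so the training signal comes entirely from the raw text itself.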
In January 2022, Meta AI released the self-supervised learning framework data2vec, which handles three modalities of data (speech, vision, and text) within a single framework, a step toward unifying multimodal learning. Recently, Meta AI released data2vec 2.0, which mainly improves on the previous generation in terms of performance: at the same accuracy, training is up to 16 times faster than other algorithms!
Paper link: https://ai.facebook.com/research/publications/efficient-self-supervised-learning-with-contextualized-target-representations-for-vision-speech-and-language
Code link: https://github.com/facebookresearch/fairseq/tree/main/examples/data2vec

data2vec 1.0
At present, most machine learning models are still based on supervised learning, which requires specialized annotators to label the target data. However, for some tasks (for example, the thousands of human languages spoken on Earth), collecting labeled data is simply not feasible.
In contrast, self-supervised learning does not need to tell the model what is right and wrong; instead, it lets the machine learn the structure of images, speech, and text by observing the world. Related research results have advanced speech (e.g., wav2vec 2.0), computer vision (e.g., masked autoencoders), natural language processing (e.g., BERT), and other fields.
The main idea of data2vec is to first build a teacher network that computes target representations from an image, a piece of text, or a speech utterance. The input is then masked to obscure part of it, and the process is repeated with a student network, which must predict the representations produced by the teacher model.
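A minimal sketch of this teacher-student setup in PyTorch might look as follows, assuming the teacher is an exponential-moving-average (EMA) copy of the student and a regression loss on the masked positions. For simplicity it regresses the teacher's final-layer output (the actual models average several top layers), and the dimensions, mask ratio, and 0.999 decay are all illustrative.

```python
import copy
import torch
import torch.nn.functional as F

# Illustrative student/teacher pair (dimensions and depth are toy values).
dim = 64
student = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2,
)
teacher = copy.deepcopy(student)                 # same weights at initialization
for p in teacher.parameters():
    p.requires_grad = False                      # teacher is never trained directly

x = torch.randn(1, 16, dim)                      # embedded input (any modality)

with torch.no_grad():
    target = teacher(x)                          # representation of the FULL input

mask = torch.rand(1, 16) < 0.6                   # hide part of the input
x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)

pred = student(x_masked)                         # student sees incomplete input
loss = F.smooth_l1_loss(pred[mask], target[mask])  # regress masked-out targets
loss.backward()

# Assumed EMA update: the teacher slowly tracks the student's weights.
with torch.no_grad():
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(0.999).add_(ps, alpha=0.001)
```

Because the teacher only tracks the student rather than being trained itself, the target representations improve alongside the student without a second backward pass.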
Put differently, the student model must predict the representation of the complete input data while receiving only incomplete input. To keep the two models consistent, their parameters are shared, but the teacher model's parameters are updated faster in the early stages of training. In terms of experimental results, data2vec significantly improved on baseline models across speech, vision, text, and other tasks, proposing a general self-supervised learning framework that unifies learning over three modalities of data: speech, vision, and language.

data2vec 2.0

The main pain point addressed by data2vec 2.0 is that building a self-supervised model requires a large amount of GPU compute for training. Like the original data2vec algorithm, data2vec 2.0 predicts contextualized representations of the data, that is, the outputs of the neural network's layers, rather than predicting the pixels of an image, the words of a text segment, or the sounds of a speech clip. Unlike most other algorithms, these target representations are contextualized, which means the algorithm takes the entire training example into account. For example, the model learns the representation of the word "bank" based on the entire sentence containing it, making it easier to infer the correct meaning of the word, such as distinguishing whether it refers to a "financial institution" or "land by the river." The researchers believe that contextualized targets lead to a richer learning task and enable data2vec 2.0 to learn faster than other algorithms.

data2vec 2.0 improves the efficiency of the original data2vec algorithm in three ways (a toy sketch of these tricks appears at the end of the article):

1. Construct the target representations for a training example once, and reuse them across masked versions of that example, in which different parts of the training example are randomly hidden. Each masked version is fed into the student model, which predicts the same contextualized target representation for every mask version, effectively amortizing the compute needed to create the targets.

2. Similar to masked autoencoders (MAE), the encoder network in the student model does not process the blanked-out parts of the training examples. In the image experiments, roughly 80% of each example is blanked out, which saves a significant number of compute cycles.

3. Use a more efficient decoder model that no longer relies on a Transformer network, but instead on a multi-layer convolutional network.

To show intuitively how much more efficient data2vec 2.0 is than data2vec and other similar algorithms, the researchers ran extensive experiments on benchmarks for computer vision, speech, and text tasks. The experiments mainly considered final accuracy and the time needed to pre-train the model, and all algorithms were run on the same hardware (same GPU model, same number of GPUs, and so on) to measure their speed.

On computer vision tasks, the researchers evaluated data2vec 2.0 on the standard ImageNet-1K image classification benchmark, from which the model learns image representations. Experimental results show that data2vec 2.0 can match the accuracy of masked autoencoders (MAE) while being 16 times faster. Given more running time, data2vec 2.0 reaches even higher accuracy while still remaining faster than MAE.
On speech tasks, the researchers tested data2vec 2.0 on the LibriSpeech speech recognition benchmark, where it matched the accuracy of wav2vec 2.0 while training more than 11 times faster. For natural language processing, they evaluated data2vec 2.0 on the General Language Understanding Evaluation (GLUE) benchmark, where it achieved the same accuracy as RoBERTa, a reimplementation of BERT, in only half the training time.
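To make the efficiency improvements above more concrete, here is a toy sketch in PyTorch of the first two tricks (reusing one target across several mask versions, and encoding only the visible patches), with a convolutional layer as a stand-in for the third (the conv decoder). The shapes, mask ratio, stand-in networks, and random "teacher" target are all assumptions for illustration, not the paper's actual implementation.

```python
import torch

# Toy stand-ins: a linear "encoder" and a convolutional "decoder".
dim, num_patches, num_masks = 64, 16, 8
encoder = torch.nn.Linear(dim, dim)
decoder = torch.nn.Conv1d(dim, dim, kernel_size=3, padding=1)

x = torch.randn(num_patches, dim)                # one training example

with torch.no_grad():                            # teacher target computed ONCE...
    target = x @ torch.randn(dim, dim)           # random stand-in for the teacher

losses = []
for _ in range(num_masks):                       # ...then reused for many mask versions
    keep = torch.randperm(num_patches)[: num_patches // 5]  # keep ~20% of patches
    visible = torch.zeros(num_patches, dtype=torch.bool)
    visible[keep] = True
    h = encoder(x[visible])                      # encode ONLY the visible patches
    full = torch.zeros(num_patches, dim)
    full[visible] = h                            # scatter back, decode the rest
    pred = decoder(full.T.unsqueeze(0)).squeeze(0).T
    losses.append(((pred[~visible] - target[~visible]) ** 2).mean())

torch.stack(losses).mean().backward()
```

The point of the loop is that the relatively expensive target computation happens once per training example rather than once per masked version, while the encoder only ever processes the small visible fraction of each version.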
