New ideas for accelerating ViT models! Meta launches Token Merging, which does not rely on pruning but merging

Apr 12, 2023 am 10:58 AM

Vision Transformer (ViT) entered the public eye two years ago and has become a core component of computer vision research.

It successfully carried the Transformer model over from natural language processing to computer vision. Since then, progress in computer vision has accelerated.

Despite being surpassed in terms of cost and performance, vanilla ViTs still have many advantages.

They are composed of simple matrix multiplications, which makes them faster than their raw number of operations would indicate.

Additionally, they support powerful self-supervised pre-training techniques such as MAE (Masked Autoencoder), which can produce state-of-the-art results while also training quickly.

And because they make no assumptions about the data, they can be applied to many modalities such as images, audio, and text with almost no changes.

Of course, the ideal and the reality are far apart. ViT models are large and have high latency. Running such a complex model on a resource-constrained device can be very problematic.

Token pruning: getting better, but not completely better

To address the problem of slow inference, researchers have proposed multiple solutions. One common way to speed up a vision Transformer is token pruning.

Token pruning drops the less important tokens at runtime to produce a more efficient Transformer. For example, DynamicViT prunes redundant tokens hierarchically to reduce FLOPs in classification tasks.

However, token pruning has several problems, the most important of which is that pruning tokens causes information loss. The number of tokens that can be pruned from a ViT model is therefore limited: to keep the information loss small, only unimportant tokens can be pruned.

Also, for a pruned model to remain accurate, it needs to be trained again. This results in additional resource consumption.

More importantly, token pruning is a dynamic process: the number of tokens to prune has to be determined separately for each image or sentence. While this is good for accuracy, it is not practical, because the inputs then no longer have the same length and can no longer be batched.

To solve this problem, masks have to be added during pruning, which further eats into the efficiency gain.
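A minimal Python sketch of why masking erodes the gain. The token counts and the `pad_batch` helper are hypothetical and do not come from any particular pruning method; the point is only that a batch has to be padded to its longest member, so the padded positions are still computed.

```python
def pad_batch(token_counts, pad_to=None):
    """Return per-image masks (1 = real token, 0 = padding) for a ragged batch."""
    width = pad_to if pad_to is not None else max(token_counts)
    return [[1] * n + [0] * (width - n) for n in token_counts]


# Hypothetical token counts left after content-dependent pruning of four images.
kept = [120, 87, 143, 101]
masks = pad_batch(kept)

# Every row is as wide as the longest image; the zeros are wasted positions.
wasted = sum(row.count(0) for row in masks)
print(len(masks[0]), wasted)  # 143 121
```

In this made-up batch, 121 of the 572 padded positions carry no information but still occupy space in the batch.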

Simply put, token pruning does make ViT run faster, but this is achieved at the cost of information loss.

Token Merging: Another idea

How can ViT be made as fast as with pruning while keeping higher accuracy than pruning? The Meta AI research team has come up with a new solution: Token Merging (ToMe).

Paper link: https://arxiv.org/pdf/2210.09461.pdf

Token Merging chooses to combine tokens instead of pruning them. Thanks to its custom matching algorithm, it is as fast as pruning while being more accurate. Plus, it works without requiring any additional training, so you can use it on huge models to speed them up without sacrificing a lot of accuracy.

Meta's goal is to insert a Token Merging module into an existing ViT so that, by merging redundant tokens, training and inference throughput improve without any additional training.

The basic idea is that merging removes r tokens in each layer of the Transformer. If the model has L layers, merging removes rL tokens in total. The value of r sets the trade-off between speed and accuracy, since fewer tokens means lower accuracy but higher throughput.
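The arithmetic above can be sketched in a few lines of Python. The function name and the example configuration (197 input tokens, 12 layers, r = 8) are assumptions chosen for illustration, not values taken from the article.

```python
def tokens_per_layer(num_tokens: int, r: int, num_layers: int) -> list[int]:
    """Return the token count entering each layer, followed by the final count."""
    counts = [num_tokens]
    for _ in range(num_layers):
        # Each layer merges away r tokens, but at least one token must remain.
        counts.append(max(counts[-1] - r, 1))
    return counts


# Hypothetical configuration: 197 tokens, L = 12 layers, r = 8.
schedule = tokens_per_layer(num_tokens=197, r=8, num_layers=12)
print(schedule[0], schedule[-1])  # 197 101, i.e. rL = 96 tokens removed
```

Because the schedule depends only on r and L, every image in a batch ends up with the same number of tokens.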

It is worth noting that Token Merging removes rL tokens regardless of the content of the image. This solves the batching problem that token pruning suffers from.

With ToMe, similar tokens are merged in each Transformer block: for example, the patches covering a dog's fur are merged into a single token.

Token Merging is inserted between the attention block and the MLP of every Transformer block. This contrasts with token pruning, which tends to place the pruning step at the beginning of each Transformer block.

Because merging happens inside the block, the information in tokens that are about to be merged can still propagate, and ToMe can use features from the attention block to decide which tokens to merge.

Specific method

The first step of merging is to determine which tokens are similar. Given that the QKV (query, key, value) of the Transformer have already been computed, the research team found through ablation experiments that the keys are the best measure of similarity between tokens (purple part in the figure below).

This is because the keys already summarize the information contained in each token for use in the attention dot product, so they can also be used to measure the similarity between tokens.

Besides choosing which features to compare, one also needs to choose a distance for measuring similarity. Through experiments, the research team found that measuring the similarity between tokens with cosine distance gives the best trade-off between accuracy and speed.
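As a rough illustration of cosine similarity on key vectors, here is a small pure-Python sketch. The key values and token names are made up for the example; a real model would take the keys from its attention layer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical key vectors for three tokens (values are invented).
keys = {
    "fur_1": [1.0, 2.0, 0.5],
    "fur_2": [2.0, 4.0, 1.0],   # same direction as fur_1, twice the magnitude
    "sky":   [-1.0, 0.5, 3.0],
}

sim_fur = cosine_similarity(keys["fur_1"], keys["fur_2"])
sim_sky = cosine_similarity(keys["fur_1"], keys["sky"])
print(sim_fur > sim_sky)  # True: the two fur tokens are the better merge candidates
```

Cosine similarity ignores vector magnitude, so two keys pointing in the same direction count as similar even when their norms differ.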

Once token similarity is defined, a fast method is needed to decide which tokens to match so that the total is reduced by r.

The Meta team uses neither k-means clustering nor a graph segmentation algorithm, but a matching algorithm, because matching can both hit the exact number of tokens to merge in each layer and perform thousands of matches quickly. Iterative clustering algorithms can do neither.

Therefore, the Meta team came up with a more efficient solution.

The design goals are as follows: (1) avoid any iteration that cannot be parallelized; (2) keep the changes made by merging gradual. Clustering places no limit on how many tokens can be merged into one group, which may adversely affect the network, whereas matching leaves most tokens unmerged.

  1. Divide all tokens into two sets A and B of the same size.
  2. Draw an edge from each token in set A to the most similar token in set B.
  3. Keep only the r most similar edges and delete the rest.
  4. Merge the tokens that are still connected by an edge (their features are averaged).
  5. Put the two sets back together to get the final merged result.
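The five steps can be sketched in pure Python as follows. This is an illustrative sketch, not Meta's implementation: the alternating split into A and B, the plain unweighted average, and the function names are all assumptions, and the output order of tokens is not preserved.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def merge_tokens(features, keys, r):
    """Reduce the token count by up to r, following the five steps above."""
    if len(features) < 2 or r <= 0:
        return [list(f) for f in features]

    # Step 1: split into sets A and B by alternating position (assumed rule).
    a_idx = list(range(0, len(features), 2))
    b_idx = list(range(1, len(features), 2))

    # Step 2: one edge per A token, pointing to its most similar B token.
    edges = []
    for i in a_idx:
        best = max(b_idx, key=lambda j: cosine(keys[i], keys[j]))
        edges.append((cosine(keys[i], keys[best]), i, best))

    # Step 3: keep only the r most similar edges.
    edges.sort(reverse=True)
    kept = edges[:r]

    # Step 4: average each B token with the A tokens merged into it.
    groups = {j: [features[j]] for j in b_idx}
    merged_a = set()
    for _, i, j in kept:
        groups[j].append(features[i])
        merged_a.add(i)
    merged_b = [[sum(col) / len(groups[j]) for col in zip(*groups[j])] for j in b_idx]

    # Step 5: put the unmerged A tokens and the B tokens back together.
    unmerged_a = [list(features[i]) for i in a_idx if i not in merged_a]
    return unmerged_a + merged_b


# Four hypothetical 2-D tokens; the keys are taken to equal the features for simplicity.
tokens = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [-1.0, 0.0]]
merged = merge_tokens(tokens, keys=tokens, r=1)
print(len(merged))  # 3
```

Every step is a per-token maximum, a sort, or an average, so there is no iterative loop that has to converge, which is what allows the matching to be parallelized in a tensor library.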

With this technique, both the throughput and the actual training speed of ViT models can be improved. Token Merging can double the training speed. It can be used for image, video, and audio tasks and still achieve state-of-the-art accuracy.
