Table of Contents
Recurrent Memory Transformer
Memory Task
Experimental results
Attention pattern of memory operations
Home Technology peripherals AI True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Apr 28, 2023 am 10:37 AM
ai quantum

More than a month ago, OpenAI’s GPT-4 came out. In addition to various excellent visual demonstrations, it also implements an important update: it can handle context tokens that are 8k in length by default, but can be up to 32K (approximately 50 pages of text). This means that when asking questions to GPT-4, we can enter much longer text than before. This greatly expands the application scenarios of GPT-4 and can better handle long conversations, long texts, and file search and analysis.

However, this record was quickly broken: CoLT5 from Google Research expanded the context token length that the model can handle to 64k .

Such a breakthrough is not easy, because these models using the Transformer architecture all face a problem: Transformer processing long documents is computationally very expensive, because the attention cost increases with the input The length grows quadratically, making it increasingly difficult to apply large models to longer inputs.

Despite this, researchers are still making breakthroughs in this direction. A few days ago, a study from the open source dialogue AI technology stack DeepPavlov and other institutions showed that:By using an architecture called Recurrent Memory Transformer (RMT), they can increase the effective context length of the BERT model to 2 million tokens (approximately equivalent to 3,200 pages of text according to OpenAI’s calculation method), while maintaining high memory retrieval accuracy (Note: Recurrent Memory Transformer was proposed by Aydar Bulatov et al. in a paper at NeurIPS 2022 Methods). The new method allows the storage and processing of local and global information, and the flow of information between segments of the input sequence through the use of recurrence.

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

The author stated that by using the simple token-based algorithm introduced by Bulatov et al. in the article "Recurrent Memory Transformer" Memory mechanism, they can combine RMT with pre-trained Transformer models such as BERT, and use an Nvidia GTX 1080Ti GPU to perform full attention and full precision operations on sequences of more than 1 million tokens.

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Paper address: https://arxiv.org/pdf/2304.11062.pdf

However, some people have reminded that this is not a real "free lunch". The improvement of the above-mentioned papers is obtained by "longer reasoning time and substantial decrease in quality". . Therefore, it is not yet a revolution, but it may become the basis for the next paradigm (tokens may be infinitely long).

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Recurrent Memory Transformer

This study adopts the method Recurrent Memory Transformer proposed by Bulatov et al. in 2022 (RMT) and change it to a plug-and-play method. The main mechanism is as shown in the figure below:

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Long input is split into multiple segments, and a memory vector is added before the first segment embedding and processed together with the segment token. For pure encoder models like BERT, the memory is only added once at the beginning of the segment, unlike (Bulatov et al., 2022), where the pure decoder model divides the memory into read and write parts. For time step τ and segment

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

, the loop executes as follows:

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens


Where, N is the number of layers of Transformer. After forward propagation,

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

# contains the updated memory token of segment τ.

The segments of the input sequence are processed in order. To enable circular connections, the study passes the output of the memory token from the current segment to the input of the next segment:

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Both memory and looping in RMT are based only on global memory tokens. This allows the backbone Transformer to remain unchanged, making RMT's memory enhancement capabilities compatible with any Transformer model.

Computational efficiency

This study estimates the FLOPs required for RMT and Transformer models of different sizes and sequence lengths .

As shown in Figure 3 below, if the length of the segment is fixed, RMT can scale linearly for any model size. This study achieves linear scaling by dividing the input sequence into segments and computing the complete attention matrix only within segment boundaries.

Due to the high computational complexity of the FFN layer, larger Transformer models tend to exhibit slower quadratic scaling with sequence length. However, for very long sequences larger than 32000, they fall back to quadratic expansion. For sequences with more than one segment (> 512 in this study), RMT requires fewer FLOPs than acyclic models and can reduce the number of FLOPs by up to 295 times. RMT provides a greater relative reduction in FLOP for smaller models, but the 29x reduction in FLOP for the OPT-175B model is significant in absolute terms.

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Memory Task

To test memory abilities, the study constructed synthetic datasets that required memorization of simple facts and basic reasoning. Task input consists of one or more facts and a question that can only be answered using all the facts. To increase the difficulty of the task, the study also added natural language text unrelated to the question or answer to act as noise, so the model was tasked with separating facts from irrelevant text and using the facts to answer the question.

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Fact memory

First item The task is to test RMT's ability to write and store information in memory for long periods of time, as shown at the top of Figure 4 below. In the simplest case, the facts tend to be at the beginning of the input, and the questions are always at the end. The amount of irrelevant text between questions and answers gradually increases to the point where the entire input does not fit into a single model input.

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

##Fact Detection and Memory

Fact detection increases the difficulty of the task by moving a fact to a random position in the input, as shown in the middle of Figure 4 above. This requires the model to first distinguish the fact from irrelevant text, write the fact into memory, and then use it to answer the question at the end.

Using memorized facts to reason

Another operation of memory is to reason using memorized facts and the current context. To evaluate this functionality, the researchers used a more complex task in which two facts were generated and placed within an input sequence, as shown at the bottom of Figure 4 above. The question asked at the end of the sequence is described in such a way that arbitrary facts must be used to answer the question correctly.

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Experimental results

The researchers used 4 to 8 NVIDIA 1080ti GPUs to train and evaluate the model . For longer sequences, they used a single 40GB NVIDIA A100 to speed up evaluation.

Course Learning

The researchers observed that using the training plan can significantly improve the accuracy and accuracy of the solutions. stability. Initially, RMT is trained on a shorter version of the task and increases the task length by adding another segment as the training converges. The course learning process continues until the required input length is reached.

In the experiment, the researchers first started with a sequence suitable for a single segment. The actual segment size is 499, but due to BERT's 3 special tokens and 10 memory placeholders retained from the model input, the size is 512. They note that after training on a shorter task, RMT is easier to solve longer versions of the task, thanks to the fact that it uses fewer training steps to converge to a perfect solution.

Extrapolation ability

What is the generalization ability of RMT to different sequence lengths? To answer this question, the researchers evaluated models trained on different numbers of segments to solve longer tasks, as shown in Figure 5 below.

They observed that models tend to perform better on shorter tasks, with the only exception being the single-segment inference task, which becomes Very difficult to solve. One possible explanation is that because the task size exceeds one segment, the model no longer "expects" problems in the first segment, resulting in a decrease in quality.

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Interestingly, RMT’s ability to generalize to longer sequences also appears as the number of training segments increases. After training on 5 or more segments, RMT can generalize nearly perfectly to tasks twice as long.

In order to test the limitations of generalization, the researchers increased the verification task size to 4096 segments or 2,043,904 tokens (as shown in Figure 1 above). RMT performed on such a long sequence Performed surprisingly well. Detection and memory tasks are the simplest, and reasoning tasks are the most complex.

Attention pattern of memory operations

In Figure 6 below, by examining the RMT attention on a specific segment, the researcher observed that the memory operation corresponds to specific attention. model. Furthermore, the high extrapolation performance on extremely long sequences in Section 5.2 demonstrates the effectiveness of the learned memory operations, even when used thousands of times.

True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens

Please refer to the original paper for more technical and experimental details.

The above is the detailed content of True Quantum Speed ​​Reading: Breaking through the limit of GPT-4 that can only understand 50 pages of text at a time, new research extends to millions of tokens. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Hot Topics

Java Tutorial
1662
14
PHP Tutorial
1261
29
C# Tutorial
1234
24
How much is Bitcoin worth How much is Bitcoin worth Apr 28, 2025 pm 07:42 PM

Bitcoin’s price ranges from $20,000 to $30,000. 1. Bitcoin’s price has fluctuated dramatically since 2009, reaching nearly $20,000 in 2017 and nearly $60,000 in 2021. 2. Prices are affected by factors such as market demand, supply, and macroeconomic environment. 3. Get real-time prices through exchanges, mobile apps and websites. 4. Bitcoin price is highly volatile, driven by market sentiment and external factors. 5. It has a certain relationship with traditional financial markets and is affected by global stock markets, the strength of the US dollar, etc. 6. The long-term trend is bullish, but risks need to be assessed with caution.

Which of the top ten currency trading platforms in the world are among the top ten currency trading platforms in 2025 Which of the top ten currency trading platforms in the world are among the top ten currency trading platforms in 2025 Apr 28, 2025 pm 08:12 PM

The top ten cryptocurrency exchanges in the world in 2025 include Binance, OKX, Gate.io, Coinbase, Kraken, Huobi, Bitfinex, KuCoin, Bittrex and Poloniex, all of which are known for their high trading volume and security.

What are the top currency trading platforms? The top 10 latest virtual currency exchanges What are the top currency trading platforms? The top 10 latest virtual currency exchanges Apr 28, 2025 pm 08:06 PM

Currently ranked among the top ten virtual currency exchanges: 1. Binance, 2. OKX, 3. Gate.io, 4. Coin library, 5. Siren, 6. Huobi Global Station, 7. Bybit, 8. Kucoin, 9. Bitcoin, 10. bit stamp.

Which of the top ten currency trading platforms in the world are the latest version of the top ten currency trading platforms Which of the top ten currency trading platforms in the world are the latest version of the top ten currency trading platforms Apr 28, 2025 pm 08:09 PM

The top ten cryptocurrency trading platforms in the world include Binance, OKX, Gate.io, Coinbase, Kraken, Huobi Global, Bitfinex, Bittrex, KuCoin and Poloniex, all of which provide a variety of trading methods and powerful security measures.

Decryption Gate.io Strategy Upgrade: How to Redefine Crypto Asset Management in MeMebox 2.0? Decryption Gate.io Strategy Upgrade: How to Redefine Crypto Asset Management in MeMebox 2.0? Apr 28, 2025 pm 03:33 PM

MeMebox 2.0 redefines crypto asset management through innovative architecture and performance breakthroughs. 1) It solves three major pain points: asset silos, income decay and paradox of security and convenience. 2) Through intelligent asset hubs, dynamic risk management and return enhancement engines, cross-chain transfer speed, average yield rate and security incident response speed are improved. 3) Provide users with asset visualization, policy automation and governance integration, realizing user value reconstruction. 4) Through ecological collaboration and compliance innovation, the overall effectiveness of the platform has been enhanced. 5) In the future, smart contract insurance pools, forecast market integration and AI-driven asset allocation will be launched to continue to lead the development of the industry.

What are the top ten virtual currency trading apps? The latest digital currency exchange rankings What are the top ten virtual currency trading apps? The latest digital currency exchange rankings Apr 28, 2025 pm 08:03 PM

The top ten digital currency exchanges such as Binance, OKX, gate.io have improved their systems, efficient diversified transactions and strict security measures.

Recommended reliable digital currency trading platforms. Top 10 digital currency exchanges in the world. 2025 Recommended reliable digital currency trading platforms. Top 10 digital currency exchanges in the world. 2025 Apr 28, 2025 pm 04:30 PM

Recommended reliable digital currency trading platforms: 1. OKX, 2. Binance, 3. Coinbase, 4. Kraken, 5. Huobi, 6. KuCoin, 7. Bitfinex, 8. Gemini, 9. Bitstamp, 10. Poloniex, these platforms are known for their security, user experience and diverse functions, suitable for users at different levels of digital currency transactions

How to understand DMA operations in C? How to understand DMA operations in C? Apr 28, 2025 pm 10:09 PM

DMA in C refers to DirectMemoryAccess, a direct memory access technology, allowing hardware devices to directly transmit data to memory without CPU intervention. 1) DMA operation is highly dependent on hardware devices and drivers, and the implementation method varies from system to system. 2) Direct access to memory may bring security risks, and the correctness and security of the code must be ensured. 3) DMA can improve performance, but improper use may lead to degradation of system performance. Through practice and learning, we can master the skills of using DMA and maximize its effectiveness in scenarios such as high-speed data transmission and real-time signal processing.

See all articles