


Context length extended to 256k: is an unlimited-context LongLLaMA here?
In February this year, Meta released the LLaMA family of large language models, which gave a major boost to the development of open-source chatbots. LLaMA models have fewer parameters than many previously released large models (7 billion to 65 billion) yet perform better. For example, the largest LLaMA model, with 65 billion parameters, is comparable to Google's Chinchilla-70B and PaLM-540B. Many researchers were therefore excited by its release.
However, LLaMA was licensed only for academic research, which limited its commercial use.
Researchers therefore began looking for LLaMA-style models that could be used commercially. OpenLLaMA, a project initiated by Hao Liu, a doctoral student at UC Berkeley, is one of the more popular open-source reproductions of LLaMA. It uses exactly the same preprocessing and training hyperparameters as the original LLaMA, so it follows LLaMA's training recipe completely. Most importantly, the model can be used commercially.
OpenLLaMA was trained on the RedPajama dataset released by Together. It comes in three sizes, 3B, 7B and 13B, each trained on 1T tokens. The results show that OpenLLaMA matches or even surpasses the original LLaMA on multiple tasks.
Besides releasing new models, researchers are also working to increase the number of tokens a model can handle, that is, its context length.
A few days ago, research from Tian Yuandong's team extended the LLaMA context to 32K with fewer than 1,000 steps of fine-tuning. Earlier still, GPT-4 supported 32k tokens (about 50 pages of text), and Claude can handle 100k tokens (roughly enough to summarize the first Harry Potter book in one go).
Now a new large language model based on OpenLLaMA has arrived, extending the context length to 256k tokens and beyond. The research was carried out jointly by IDEAS NCBR, the Polish Academy of Sciences, the University of Warsaw and Google DeepMind.
LongLLaMA is built on OpenLLaMA and fine-tuned with FOT (Focused Transformer). The paper shows that FOT can be used to fine-tune existing large models to extend their context length.
The study starts from the OpenLLaMA-3B and OpenLLaMA-7B models and fine-tunes them with FOT. The resulting models, called LongLLaMAs, can extrapolate beyond their training context length (even up to 256K tokens) while maintaining performance on short-context tasks.
- Project address: https://github.com/CStanKonrad/long_llama
- Paper address: https://arxiv.org/pdf/2307.03170.pdf
Some have described this work as an unlimited-context version of OpenLLaMA. With FOT, the model extrapolates easily to longer sequences; for example, a model trained on 8K tokens can be extrapolated to a 256K window.
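To make the extrapolation claim concrete, here is a minimal sketch of how a long input might be fed to a memory-augmented model whose local context stays capped at the training length. It assumes, in the spirit of FOT's memory attention, that the input is processed window by window and that every earlier window's (key, value) pairs stay available in memory. The `plan_windows` helper and the 8,000-token window are hypothetical and chosen only for illustration.

```python
def plan_windows(n_tokens: int, window: int) -> list[tuple[int, int, int]]:
    """Split an input of n_tokens into consecutive windows of at most `window` tokens.

    Each entry is (start, end, tokens_in_memory). Hypothetical helper: it assumes
    every token before `start` has already been cached as (key, value) pairs
    that the memory attention layers can retrieve from.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    plan = []
    for start in range(0, n_tokens, window):
        end = min(start + window, n_tokens)
        plan.append((start, end, start))  # memory holds everything already processed
    return plan


# A 256K-token input processed with an 8K local window (illustrative numbers).
plan = plan_windows(256_000, 8_000)
print(len(plan))  # 32
print(plan[-1])   # (248000, 256000, 248000)
```

Under this assumption the local attention never sees more than `window` tokens at once; only the memory grows, so the training length does not have to bound the usable context.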
The research uses FOT, a plug-and-play extension to the Transformer that can be used to train new models or to fine-tune existing large models for longer context.
To achieve this, FOT uses a memory attention layer and a cross-batch training process:
- The memory attention layer enables the model to retrieve information from external memory at inference time, effectively extending the context;
- The cross-batch training process encourages the model to learn (key, value) representations that are easy for the memory attention layer to use.
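A minimal, pure-Python sketch of one memory attention step for a single query may help. It retrieves the top-k (key, value) pairs from external memory by inner product (exact kNN), then attends over the local keys together with the retrieved ones in a single softmax. This follows the general description above; the scaling, the way local and memory attention are combined, and the name `memory_attention` are assumptions, not FOT's exact formulation.

```python
import math

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: list[float]) -> list[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_attention(query: Vector,
                     local_keys: list[Vector], local_values: list[Vector],
                     memory_keys: list[Vector], memory_values: list[Vector],
                     k: int = 2) -> Vector:
    """Attend over the local context plus the k memory entries most similar to `query`.

    Hypothetical sketch; FOT's exact formulation may differ.
    """
    # Exact kNN by inner product; a large memory would typically use an approximate index.
    ranked = sorted(range(len(memory_keys)),
                    key=lambda i: dot(query, memory_keys[i]), reverse=True)
    top = ranked[:k]
    keys = local_keys + [memory_keys[i] for i in top]
    values = local_values + [memory_values[i] for i in top]
    scale = 1.0 / math.sqrt(len(query))  # scaled dot-product attention
    weights = softmax([dot(query, key) * scale for key in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


out = memory_attention(
    [1.0, 0.0],
    local_keys=[[0.0, 1.0]], local_values=[[0.0, 0.0]],
    memory_keys=[[5.0, 0.0], [-5.0, 0.0], [0.0, -5.0]],
    memory_values=[[1.0, 1.0], [9.0, 9.0], [7.0, 7.0]],
    k=1,
)
print([round(x, 2) for x in out])  # [0.97, 0.97]: dominated by the retrieved entry
```

Only the retrieved entries enter the softmax, so the cost of each step depends on `k` and the local context rather than on the full size of the memory.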
For an overview of the FOT architecture, see Figure 2:
[Figure 2: Overview of the FOT architecture]
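The cross-batch training bullet above can be sketched in the same spirit. This is a minimal sketch that assumes the setup described in the FOT paper: during training, the memory seen by each document contains (key, value) pairs from that document's own previous context plus pairs from other, unrelated documents in the same batch, so the model must learn keys that separate relevant entries from distractors. The sketch only builds such a memory. `MemoryEntry`, `build_cross_batch_memory` and `num_other_docs` are hypothetical names, and the paper's actual batching details may differ.

```python
from dataclasses import dataclass

Vector = list[float]


@dataclass
class MemoryEntry:
    key: Vector
    value: Vector
    source_doc: int  # index of the batch document this (key, value) pair came from


def build_cross_batch_memory(prev_kv: list[list[tuple[Vector, Vector]]],
                             doc: int, num_other_docs: int) -> list[MemoryEntry]:
    """Memory for document `doc`: its own previous-context pairs (relevant) plus
    pairs from up to `num_other_docs` other documents in the batch (distractors).

    Hypothetical helper; batching details in the paper may differ.
    """
    memory = [MemoryEntry(k, v, doc) for k, v in prev_kv[doc]]
    others = [j for j in range(len(prev_kv)) if j != doc][:num_other_docs]
    for j in others:
        memory.extend(MemoryEntry(k, v, j) for k, v in prev_kv[j])
    return memory


# Previous-context (key, value) pairs for a batch of three documents.
batch = [
    [([1.0], [1.0]), ([2.0], [2.0])],  # document 0
    [([3.0], [3.0])],                  # document 1
    [([4.0], [4.0])],                  # document 2
]
memory = build_cross_batch_memory(batch, doc=0, num_other_docs=1)
print([entry.source_doc for entry in memory])  # [0, 0, 1]
```

The `source_doc` field is there only for inspection. The model itself never sees these labels and has to tell relevant from irrelevant entries using the keys alone.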
The following table summarizes the LongLLaMA models:
[Table: LongLLaMA model information]
Finally, the project also reports results comparing LongLLaMA with the original OpenLLaMA model.
The following figure shows some experimental results. LongLLaMA performs well on the passkey retrieval task: the LongLLaMA 3B model generalizes far beyond its 8K training context, reaching 94.5% accuracy at 100k tokens and 73% accuracy at 256k tokens.
[Figure: LongLLaMA passkey retrieval results]
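For readers unfamiliar with passkey retrieval: a number is hidden somewhere inside a long stretch of filler text, and the model is asked to repeat it. Below is a minimal sketch of how such a test prompt and its accuracy could be computed. The filler sentence, prompt wording and helper names are assumptions modeled on commonly used passkey prompts, not necessarily the exact ones used to evaluate LongLLaMA.

```python
import re

# Hypothetical template; the wording used in the actual evaluation may differ.
INTRO = ("There is an important piece of information hidden inside a lot of "
         "irrelevant text. Find it and memorize it.\n")
FILLER = "The grass is green. The sky is blue. The sun is yellow. Here we go. There and back again.\n"
QUESTION = "What is the pass key? The pass key is"


def make_passkey_prompt(passkey: int, n_filler: int, position: int) -> str:
    """Hide `passkey` after `position` of the `n_filler` filler lines."""
    if not 0 <= position <= n_filler:
        raise ValueError("position must be between 0 and n_filler")
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key.\n"
    return (INTRO + FILLER * position + needle
            + FILLER * (n_filler - position) + QUESTION)


def passkey_accuracy(outputs: list[str], passkeys: list[int]) -> float:
    """Fraction of model outputs whose first integer equals the hidden passkey."""
    correct = 0
    for output, key in zip(outputs, passkeys):
        match = re.search(r"\d+", output)
        if match and int(match.group()) == key:
            correct += 1
    return correct / len(passkeys)


prompt = make_passkey_prompt(passkey=68142, n_filler=1000, position=400)
print(prompt.count("The grass is green."))  # 1000
print(passkey_accuracy([" 68142.", " 1234", "no idea"], [68142, 99, 5]))  # 1 of 3 correct
```

Raising `n_filler` stretches the prompt toward 100k or 256k tokens; the exact token count depends on the tokenizer.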
The following table shows the results of the LongLLaMA 3B model on two downstream tasks, TREC question classification and WebQS question answering. LongLLaMA's performance improves significantly when it is given a long context.
[Table: LongLLaMA 3B results on TREC and WebQS]
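One way a longer context can help on tasks like TREC question classification is by fitting more labelled examples into the prompt (in-context learning). The sketch below greedily packs demonstrations into a fixed token budget. The source does not describe how the evaluation prompts were built, so the prompt format, the word-count stand-in for a tokenizer and the `pack_demonstrations` helper are all hypothetical.

```python
def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


def pack_demonstrations(demos: list[tuple[str, str]], query: str,
                        budget: int) -> tuple[str, int]:
    """Prepend as many (question, label) demonstrations as fit in `budget` tokens.

    Hypothetical helper. Returns the prompt and the number of demonstrations included.
    """
    parts: list[str] = []
    used = count_tokens(query)
    for question, label in demos:
        block = f"Question: {question}\nType: {label}\n\n"
        cost = count_tokens(block)
        if used + cost > budget:
            break  # the next demonstration would overflow the context
        parts.append(block)
        used += cost
    return "".join(parts) + query, len(parts)


demos = [("What is the capital of France?", "LOC")] * 100
query = "Question: Who wrote Hamlet?\nType:"
_, n_short = pack_demonstrations(demos, query, budget=50)
_, n_long = pack_demonstrations(demos, query, budget=1000)
print(n_short, n_long)  # 5 100
```

With a larger budget the same procedure includes many more demonstrations, which is the kind of setting where a long-context model has room to benefit.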
The table below shows that LongLLaMA also performs well on tasks that do not require long context. The experiments compare LongLLaMA and OpenLLaMA in a zero-shot setting.
[Table: zero-shot comparison of LongLLaMA and OpenLLaMA]
For more details, please refer to the original paper and project.
