Extend the context length to 256k, is the unlimited context version of LongLLaMA coming?

PHPz

Jul 11, 2023 pm 03:05 PM

ai Model

In February 2023, Meta released the LLaMA series of large language models, which gave a strong boost to the development of open-source chatbots. LLaMA has fewer parameters than many earlier large models (from 7 billion to 65 billion) yet delivers better performance: the largest model, LLaMA-65B, is competitive with DeepMind's Chinchilla-70B and Google's PaLM-540B. Many researchers were therefore excited when it was released.

However, LLaMA is licensed only for use by academic researchers, which limits commercial applications of the model.

Researchers therefore began looking for LLaMA alternatives that can be used commercially. OpenLLaMA, a project initiated by Hao Liu, a doctoral student at UC Berkeley, is one of the more popular open-source reproductions of LLaMA. It uses exactly the same preprocessing and training hyperparameters as the original LLaMA, so OpenLLaMA can be said to follow LLaMA's training recipe completely. Most importantly, the model can be used commercially.

OpenLLaMA was trained on the RedPajama dataset released by Together and comes in three sizes: 3B, 7B, and 13B. Each model was trained on 1T tokens. The results show that OpenLLaMA matches or even surpasses the original LLaMA on multiple tasks.

Besides releasing new models, researchers keep exploring how many tokens of context a model can handle.

A few days ago, Yuandong Tian's team extended LLaMA's context window to 32K with fewer than 1,000 fine-tuning steps. Before that, GPT-4 supported 32k tokens (equivalent to about 50 pages of text), and Claude could handle 100k tokens (roughly enough to summarize the first Harry Potter book in one go).
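As a rough check on the "32k tokens is about 50 pages" figure, the conversion can be sketched as simple arithmetic. The ratios below (0.75 English words per token and 500 words per page) are common rules of thumb assumed here for illustration; they do not come from the article or from any of the models mentioned.

```python
def tokens_to_pages(n_tokens, words_per_token=0.75, words_per_page=500):
    """Estimate how many pages of English text fit in a context window.

    Both ratios are rule-of-thumb assumptions, not measured values.
    """
    return n_tokens * words_per_token / words_per_page


for window in (32_000, 100_000, 256_000):
    print(window, "tokens ~", tokens_to_pages(window), "pages")
# 32000 tokens ~ 48.0 pages
# 100000 tokens ~ 150.0 pages
# 256000 tokens ~ 384.0 pages
```

Under these assumptions a 256k-token window would hold several hundred pages, which gives a sense of the scale involved.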

Now a new large language model based on OpenLLaMA has arrived, extending the context length to 256k tokens and beyond. The research was carried out jointly by IDEAS NCBR, the Polish Academy of Sciences, the University of Warsaw, and Google DeepMind.

LongLLaMA is built on OpenLLaMA and fine-tuned with FOT (Focused Transformer). The paper shows that FOT can be used to fine-tune existing large models to extend their context length.

The study takes the OpenLLaMA-3B and OpenLLaMA-7B models as starting points and fine-tunes them with FOT. The resulting models, called LongLLaMAs, can extrapolate beyond their training context length (even up to 256K) while maintaining performance on short-context tasks.

Project address: https://github.com/CStanKonrad/long_llama
Paper address: https://arxiv.org/pdf/2307.03170.pdf

Some people describe this research as an unlimited-context version of OpenLLaMA. With FOT, a model extrapolates easily to longer sequences: for example, a model trained on 8K tokens can be extrapolated to a 256K window.

The work relies on FOT, a plug-and-play extension of the Transformer that can be used either to train new models or to fine-tune existing large models with a longer context.

To achieve this, FOT uses a memory attention layer and a cross-batch training process:

The memory attention layer enables the model to retrieve information from external memory at inference time, effectively extending the context;
The cross-batch training process biases the model toward learning (key, value) representations that the memory attention layer can use easily.
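The memory attention idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation: it handles a single query vector, uses exact top-k search by inner product instead of an approximate index, and the function name `memory_attention` and its parameters are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_attention(query, local_kv, memory_kv, top_k=2):
    """Attend over local (key, value) pairs plus the top_k memory pairs
    whose keys have the largest inner product with the query.

    Hypothetical helper for illustration; not the LongLLaMA API.
    """
    # Retrieve the most relevant entries from external memory (exact search).
    retrieved = sorted(
        memory_kv, key=lambda kv: dot(query, kv[0]), reverse=True
    )[:top_k]
    candidates = list(local_kv) + retrieved
    # Standard scaled dot-product attention over the combined candidates.
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k, _ in candidates])
    dim = len(candidates[0][1])
    output = [
        sum(w * v[i] for w, (_, v) in zip(weights, candidates))
        for i in range(dim)
    ]
    return output, retrieved


query = [1.0, 0.0]
local_kv = [([0.0, 1.0], [0.0, 0.0])]
memory_kv = [
    ([5.0, 0.0], [1.0, 1.0]),   # relevant: key aligned with the query
    ([-5.0, 0.0], [9.0, 9.0]),  # irrelevant: key points the other way
    ([0.1, 0.0], [2.0, 2.0]),
]
output, retrieved = memory_attention(query, local_kv, memory_kv, top_k=1)
```

Because only the best-matching memory entries join the attention, the output here is dominated by the value `[1.0, 1.0]`. Retrieval of this kind works only if relevant keys score higher than irrelevant ones, which is the property the cross-batch training process is meant to encourage.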

For an overview of the FOT architecture, see Figure 2:

[Figure 2: overview of the FOT architecture]

The following table shows some model information of LongLLaMA:

[Table: LongLLaMA model information]

Finally, the project also provides results comparing LongLLaMA with the original OpenLLaMA model.

The following figure shows some experimental results for LongLLaMA. On the passkey retrieval task, LongLLaMA performs well: the LongLLaMA 3B model goes far beyond its 8K training context length, reaching 94.5% accuracy at 100k tokens and 73% accuracy at 256k tokens.

[Figure: LongLLaMA results on the passkey retrieval task]
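To make the passkey retrieval task concrete, the sketch below builds one test prompt: a short passkey sentence is hidden at a random position inside a long run of filler text, and the model is asked to repeat the passkey at the end. The exact wording of the filler and the question, as well as the helper names, are assumptions made for illustration and may differ from the prompts used in the paper.

```python
import random


def build_passkey_prompt(n_filler, passkey, seed=0):
    """Hide a passkey among n_filler copies of a filler sentence.

    Prompt wording is illustrative, not taken from the paper.
    """
    rng = random.Random(seed)
    intro = ("There is important information hidden in a lot of "
             "irrelevant text. Find it and memorize it. ")
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    question = "What is the pass key? The pass key is"
    position = rng.randint(0, n_filler)  # filler copies placed before the needle
    parts = ([intro] + [filler] * position + [needle]
             + [filler] * (n_filler - position) + [question])
    return "".join(parts)


def is_correct(model_answer, passkey):
    """Count an answer as correct if it contains the passkey."""
    return str(passkey) in model_answer


prompt = build_passkey_prompt(n_filler=100, passkey=71432)
```

Increasing `n_filler` lengthens the prompt, so accuracy can be measured at different context lengths by averaging `is_correct` over many seeds and passkeys.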

The following table shows the results of the LongLLaMA 3B model on two downstream tasks (TREC question classification and WebQS question answering). LongLLaMA's performance improves significantly when it uses long contexts.

[Table: LongLLaMA 3B results on TREC question classification and WebQS question answering]
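One way a longer context can help on tasks like these is by fitting more worked examples into the prompt. The sketch below assumes a few-shot setup in which labeled demonstrations are prepended to the query until a token budget is reached. The prompt template, the helper name `pack_demonstrations`, and the whitespace-based token count are all illustrative assumptions; a real evaluation would use the model's tokenizer.

```python
def pack_demonstrations(demos, query, budget_tokens,
                        count_tokens=lambda s: len(s.split())):
    """Prepend as many (question, label) demonstrations as fit the budget.

    count_tokens is a crude whitespace proxy for a real tokenizer.
    """
    tail = f"Question: {query}\nType:"
    used = count_tokens(tail)
    chosen = []
    for question, label in demos:
        block = f"Question: {question}\nType: {label}\n\n"
        cost = count_tokens(block)
        if used + cost > budget_tokens:
            break  # the next demonstration would overflow the context
        chosen.append(block)
        used += cost
    return "".join(chosen) + tail, len(chosen)


demos = [
    ("Who wrote Hamlet?", "HUMAN"),
    ("Where is Warsaw?", "LOCATION"),
    ("How many legs does a spider have?", "NUMBER"),
]
short_prompt, n_short = pack_demonstrations(demos, "What is FOT?", budget_tokens=12)
long_prompt, n_long = pack_demonstrations(demos, "What is FOT?", budget_tokens=100)
```

With the small budget only one demonstration fits, while the larger budget admits all three, mirroring how a wider context window leaves room for more in-context examples.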

The table below shows that LongLLaMA performs well even on tasks that do not require long context. The experiments compare LongLLaMA and OpenLLaMA in a zero-shot setting.

[Table: zero-shot comparison of LongLLaMA and OpenLLaMA on short-context tasks]

For more details, please refer to the original paper and project.
