Table of Contents
Understanding the Main Architecture and Tasks
Scaling Laws and Efficiency Improvements
Alignment: Steering Large Language Models Toward Desired Goals and Interests
Summary

For a comprehensive understanding of large language models, here is a reading list

Mar 31, 2023, 10:40 PM

To understand the design, constraints, and evolution of contemporary large language models, you can follow the reading list in this article.

Large-scale language models have captured the public's attention, and in just five years, models such as the Transformer have almost completely changed the field of natural language processing. Additionally, they are starting to revolutionize fields such as computer vision and computational biology.

Given how large an impact Transformers have had on so many researchers' work, this article presents a short reading list to help machine learning researchers and practitioners get started.

The following list is arranged mostly in chronological order and consists mainly of academic research papers. Of course, there are many other helpful resources as well. For example:

  • "The Illustrated Transformer" written by Jay Alammar
  • "The Transformer Family" written by Lilian Weng
  • "Transformer models: an introduction and catalog — 2023 Edition》
  • nanoGPT library written by Andrej Karpathy

Understanding the Main Architecture and Tasks

If you are new to Transformers and large language models, these articles are the best place to start.

Paper 1: "Neural Machine Translation by Jointly Learning to Align and Translate"

Paper address: https://arxiv.org/pdf/1409.0473.pdf

This paper introduces an attention mechanism for recurrent neural networks (RNNs) that improves their long-range sequence modeling capabilities. It enables RNNs to translate longer sentences more accurately, which was the motivation behind the development of the original Transformer architecture.

Image source: https://arxiv.org/abs/1409.0473
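To make the idea concrete, here is a minimal NumPy sketch of additive (Bahdanau-style) attention: the decoder state is compared against every encoder state, the resulting scores are turned into weights with a softmax, and a weighted sum of the encoder states forms the context vector. All shapes, parameter names, and random values below are illustrative assumptions, not the paper's exact notation.

```python
# A minimal NumPy sketch of additive (Bahdanau-style) attention over RNN
# encoder states. Shapes and parameter names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

seq_len, enc_dim, dec_dim, attn_dim = 6, 8, 8, 16
encoder_states = rng.normal(size=(seq_len, enc_dim))   # h_1 .. h_T
decoder_state = rng.normal(size=(dec_dim,))            # previous decoder state

# Learnable projections (randomly initialized here for illustration)
W_enc = rng.normal(size=(enc_dim, attn_dim))
W_dec = rng.normal(size=(dec_dim, attn_dim))
v = rng.normal(size=(attn_dim,))

# Alignment scores: v^T tanh(W_dec s + W_enc h_i) for each encoder position i
scores = np.tanh(decoder_state @ W_dec + encoder_states @ W_enc) @ v

# Softmax turns scores into attention weights that sum to 1
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Context vector: weighted sum of encoder states, fed into the decoder step
context = weights @ encoder_states
print(weights.round(3), context.shape)
```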

Paper 2: "Attention Is All You Need 》

Paper address: https://arxiv.org/abs/1706.03762

This paper introduces the original Transformer architecture, composed of an encoder and a decoder, two parts that would later also be used as separate modules. In addition, it introduces concepts such as scaled dot-product attention, multi-head attention blocks, and positional input encodings, which remain the foundation of modern Transformers.

Source: https://arxiv.org/abs/1706.03762
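As a quick illustration, here is a compact NumPy sketch of scaled dot-product attention; multi-head attention simply runs several such operations in parallel on learned projections of the input. The dimensions and random projections below are made-up assumptions for illustration, not the paper's configuration.

```python
# A compact sketch of scaled dot-product attention from "Attention Is All
# You Need". Dimensions are invented for illustration.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_q, seq_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # (seq_q, d_v)

rng = np.random.default_rng(1)
seq, d_model = 5, 16
x = rng.normal(size=(seq, d_model))

# In practice Q, K, V come from learned linear projections of the input
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (5, 16)
```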

Paper 3: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding》

Paper address: https://arxiv.org/abs/1810.04805

After the initial Transformer architecture, large-scale language model research branched in two directions: encoder-style Transformers for predictive modeling tasks (such as text classification) and decoder-style Transformers for generative modeling tasks (such as translation, summarization, and other forms of text creation).

The BERT paper introduces the original concept of masked language modeling. If you are interested in this research branch, you can follow up with RoBERTa, which simplifies the pre-training objectives.

Image source: https://arxiv.org/abs/1810.04805
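For intuition, here is a toy sketch of the masked-language-modeling objective: a fraction of the tokens is hidden, and the model must predict them from the surrounding bidirectional context. The whitespace tokenization and flat 15% mask rate below are simplifying assumptions; BERT's actual procedure also replaces some selected tokens with random or unchanged tokens.

```python
# A toy illustration of the masked-language-modeling objective behind BERT.
# Tokenization and mask rate are simplified assumptions.
import random

random.seed(0)
tokens = "the cat sat on the mat and looked at the dog".split()
mask_prob = 0.15

masked, targets = [], {}
for i, tok in enumerate(tokens):
    if random.random() < mask_prob:
        targets[i] = tok          # the model's prediction target
        masked.append("[MASK]")
    else:
        masked.append(tok)

print(" ".join(masked))
print(targets)  # maps masked positions back to the original tokens
```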

Paper 4: "Improving Language Understanding by Generative Pre-Training》

Paper address: https://www.semanticscholar.org/paper/Improving-Language-Understanding-by-Generative-Radford-Narasimhan/cd18800a0fe0b668a1cc19f2ec95b5003d0a5035

The original GPT paper introduced the popular decoder-style architecture and pre-training via next-word prediction. Because of its masked-language-model pre-training objective, BERT can be considered a bidirectional Transformer, whereas GPT is a unidirectional, autoregressive model. Although GPT embeddings can also be used for classification, the GPT approach is at the core of today's most influential LLMs, such as ChatGPT.
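The unidirectional behavior comes from causal masking in the attention layers: each position may only attend to itself and earlier positions, so the model can be trained to predict the next token. Below is a minimal NumPy sketch of that mask applied to a random score matrix; the sizes and values are illustrative assumptions.

```python
# A minimal sketch of the causal (autoregressive) masking used by GPT-style
# decoders: position i may only attend to positions <= i.
import numpy as np

seq = 5
scores = np.random.default_rng(2).normal(size=(seq, seq))

# Upper-triangular positions (future tokens) are masked out before softmax
causal_mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
scores[causal_mask] = -1e9

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Each row sums to 1 and has (near-)zero weight to the right of the diagonal
print(weights.round(2))
```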

If you are interested in this research branch, you can follow up with the GPT-2 and GPT-3 papers. In addition, the InstructGPT method is introduced separately later in this article.

Paper 5: "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension"

Paper address: https://arxiv.org/abs/1910.13461

As mentioned above, BERT-style encoder LLMs are usually the first choice for predictive modeling tasks, while GPT-style decoder LLMs are better at generating text. To get the best of both worlds, the BART paper combines the encoder and decoder parts.
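As a hedged usage sketch, the snippet below runs a pretrained BART encoder-decoder for summarization through the Hugging Face transformers library. It assumes that library is installed and that the "facebook/bart-large-cnn" checkpoint is the one you want; neither detail comes from the paper itself.

```python
# A usage sketch: a pretrained BART encoder-decoder for summarization via
# Hugging Face transformers (assumed installed; checkpoint name is a choice
# made for this example, not prescribed by the paper).
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

text = (
    "Large language models based on the Transformer architecture have "
    "reshaped natural language processing over the past five years."
)
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# The encoder reads the input; the decoder generates the summary token by token
summary_ids = model.generate(**inputs, num_beams=4, max_length=30)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```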

Scaling Laws and Efficiency Improvements

If you want to know more about techniques to improve Transformer efficiency, you can refer to the following papers:

  • Paper 1: "A Survey on Efficient Training of Transformers"​
  • Paper address: https://arxiv.org/abs/2302.01107​


  • Paper 2: "FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness"
  • Paper address: https://arxiv.org/abs/2205.14135


  • Paper 3: "Cramming: Training a Language Model on a Single GPU in One Day"
  • Paper address: https://arxiv.org/abs/2212.14034


  • Paper 4: "Training Compute-Optimal Large Language Models"
  • Paper address: https://arxiv.org/abs/2203.15556

The last of these, "Training Compute-Optimal Large Language Models", deserves a closer look.

This paper introduces the 70-billion-parameter Chinchilla model, which outperforms the popular 175-billion-parameter GPT-3 model on generative modeling tasks. Its main takeaway, however, is that contemporary large language models are severely undertrained.

The paper defines a linear scaling law for large language model training. For example, although Chinchilla is less than half the size of GPT-3, it outperforms GPT-3 because it was trained on 1.4 trillion tokens instead of 300 billion. In other words, the number of training tokens is just as important as the model size.
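A quick back-of-the-envelope check of the numbers quoted above shows why this matters: Chinchilla sees roughly 20 training tokens per parameter, while GPT-3 sees fewer than 2. The figures below are the approximate counts from the text, and the tokens-per-parameter ratio is a rough heuristic rather than an exact law.

```python
# Back-of-the-envelope check of the Chinchilla comparison quoted above.
# Parameter and token counts are the approximate figures from the text.
chinchilla_params, chinchilla_tokens = 70e9, 1.4e12
gpt3_params, gpt3_tokens = 175e9, 300e9

print(chinchilla_tokens / chinchilla_params)  # ~20 tokens per parameter
print(gpt3_tokens / gpt3_params)              # ~1.7 tokens per parameter
```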

Alignment: Steering Large Language Models Toward Desired Goals and Interests

In recent years, we have seen many relatively capable large language models that can generate realistic text (e.g., GPT-3 and Chinchilla). In terms of the commonly used pre-training paradigms, a ceiling seems to have been reached.

To make language models more helpful to humans and to reduce misinformation and harmful language, researchers have designed additional training paradigms for fine-tuning the pretrained base models, including those in the following papers.

  • Paper 1: "Training Language Models to Follow Instructions with Human Feedback"
  • Paper address: https://arxiv.org/abs/2203.02155

In this so-called InstructGPT paper, the researchers use RLHF (reinforcement learning from human feedback). They start with a pretrained GPT-3 base model and fine-tune it further on human-generated prompt-response pairs using supervised learning (step 1). Next, they ask humans to rank the model's outputs in order to train a reward model (step 2). Finally, they use the reward model to update the pretrained and fine-tuned GPT-3 model with reinforcement learning via proximal policy optimization (step 3).
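To keep the three steps straight, here is a purely schematic Python sketch of the pipeline. Every function is a placeholder standing in for real training code; the names, signatures, and returned strings are illustrative assumptions, not an actual API.

```python
# A schematic sketch of the three InstructGPT / RLHF steps described above.
# All functions are placeholders standing in for real training code.

def supervised_finetune(base_model, prompt_response_pairs):
    """Step 1: fine-tune the pretrained model on human-written demonstrations."""
    return f"{base_model}+sft"

def train_reward_model(sft_model, ranked_outputs):
    """Step 2: fit a reward model on human rankings of model outputs."""
    return f"reward_model_for({sft_model})"

def ppo_finetune(sft_model, reward_model, prompts):
    """Step 3: optimize the policy against the reward model with PPO."""
    return f"{sft_model}+rlhf[{reward_model}]"

base = "gpt3-pretrained"
sft = supervised_finetune(base, prompt_response_pairs=["(prompt, response)"])
rm = train_reward_model(sft, ranked_outputs=["(output_a > output_b)"])
policy = ppo_finetune(sft, rm, prompts=["new prompts"])
print(policy)
```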

By the way, this paper is also known as the paper that describes the ideas behind ChatGPT - according to recent rumors, ChatGPT is an extended version of InstructGPT that is fine-tuned on a larger dataset.

  • Paper 2: "Constitutional AI: Harmlessness from AI Feedback"
  • Paper address: https://arxiv.org/abs/2212.08073

In this paper, the researchers take the idea of alignment a step further and propose a training mechanism for creating a "harmless" AI system. Rather than relying on direct human supervision, they propose a self-training mechanism based on a list of rules provided by humans. As in the InstructGPT paper mentioned above, the proposed method uses reinforcement learning.
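To make the mechanism slightly more concrete, below is a purely schematic Python sketch of the critique-and-revision loop used in the paper's supervised phase. The constitution entries, function names, and return values are illustrative placeholders, not the actual prompts or training code.

```python
# A schematic sketch of the self-critique loop described above. Everything
# here is a placeholder; the real pipeline involves model sampling plus
# reinforcement learning from AI (rather than human) feedback.

constitution = [
    "Choose the response that is least harmful.",
    "Avoid responses that help with dangerous or illegal activity.",
]

def generate(prompt):
    return f"draft answer to: {prompt}"

def critique(response, rule):
    return f"critique of '{response}' under rule '{rule}'"

def revise(response, critique_text):
    return f"revised({response})"

prompt = "example user request"
response = generate(prompt)
for rule in constitution:
    response = revise(response, critique(response, rule))

# The revised responses become training data for supervised fine-tuning,
# followed by reinforcement learning against an AI-generated preference signal.
print(response)
```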

Summary

This article has tried to keep the list above concise. It is recommended to focus on the first 10 papers to understand the design, limitations, and evolution behind contemporary large-scale language models.

If you want to read more deeply, it is recommended to follow the references in the papers above. Alternatively, here are some additional resources for readers to explore further:

Open Source Alternatives to GPT

  • Paper 1: "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model》
  • Paper address: https://arxiv.org/abs/2211.05100


  • Paper 2 : "OPT: Open Pre-trained Transformer Language Models"
  • Paper address: https://arxiv.org/abs/2205.01068

ChatGPT Alternatives

  • Paper 1 "LaMDA: Language Models for Dialog Applications"
  • Paper address: https://arxiv.org/abs/2201.08239


  • Paper 2: "Improving alignment of dialogue agents via targeted human judgments"
  • Paper address: https://arxiv.org/abs/2209.14375


  • Paper 3: "BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage"
  • Paper address: https://arxiv.org/abs/2208.03188

Large-scale language models in computational biology

  • Paper 1: "ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Learning》
  • Paper address: https://arxiv.org/abs/2007.06225


  • Paper 2: "Highly accurate protein structure prediction with AlphaFold"
  • Paper address: https://www.nature.com/articles/s41586-021-03819-2


  • Paper 3: "Large Language Models Generate Functional Protein Sequences Across Diverse Families"
  • Paper address: https://www.nature.com/articles/s41587-022-01618-2
