


After GPT-4 is released, what will happen to other large models? Yann LeCun: Augmented language models may be the way to go
The popularity of ChatGPT and GPT-4 has brought large language models to their brightest moment yet. But where do they go from here?
Yann LeCun recently co-authored a survey arguing that augmented language models may be a promising direction.
The paper is a review; this article briefly summarizes its main content.
Research background
Large language models have greatly advanced natural language processing, and the underlying technology powers products with millions of users, including the coding assistant Copilot, the Google search engine, and the recently popular ChatGPT. By combining memorization with compositional capabilities, large language models can perform tasks such as language understanding or conditional and unconditional text generation with unprecedented performance, making higher-bandwidth human-computer interaction a reality.
However, large language models still have limitations that prevent wider deployment. They often produce plausible but non-factual predictions, commonly called hallucinations, which leads to many avoidable errors, for example in arithmetic or within reasoning chains. In addition, many breakthrough capabilities of large language models seem to emerge only with scale, as measured by the number of trainable parameters: for example, some researchers have shown that once a model exceeds a certain size, it can solve some BIG-bench tasks through few-shot prompting. Although a series of recent works has produced smaller language models that retain some characteristics of the largest ones, the training and maintenance costs of large language models remain high due to their size and data requirements. Continual learning for large models is still an open research problem, and Goldberg has previously discussed other limitations of large language models in the context of the GPT-3-based chatbot ChatGPT.
In a recent survey, researchers from Meta and other institutions argue that these problems stem from an essential defect of large language models: they are usually trained to perform statistical language modeling given (i) a single parametric model and (ii) a limited context, usually the n preceding or surrounding tokens. Although n has grown thanks to innovations in software and hardware in recent years, most models still use a relatively small context compared with the potentially large context needed to always perform language modeling correctly. Models therefore require enormous scale to store the knowledge that is not present in the context but is necessary for the task at hand.
Paper link: https://arxiv.org/pdf/2302.07842v1.pdf
As a result, a growing body of research aims to solve these problems by departing slightly from the purely statistical language-modeling paradigm described above.
For example, one line of work circumvents the limited context size by increasing the relevance of the context: information extracted from relevant external documents is added to it. By equipping a large language model with a module that retrieves such documents from a database for a given context, it is possible to match some capabilities of the largest language models with far fewer parameters. Note that the resulting model is no longer purely parametric, since it can query external data sources. More generally, language models can also improve their context through reasoning strategies, generating more relevant context in exchange for extra computation before producing an answer.
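The retrieval idea above can be sketched in a few lines. This is a minimal illustration, not the method of any specific paper: the toy corpus, the word-overlap scorer (a stand-in for a real BM25 or dense retriever), and the prompt template are all assumptions.

```python
import re

def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, corpus, k=1):
    """Rank documents by word overlap with the query (a crude stand-in
    for a real retriever such as BM25 or a dense bi-encoder)."""
    q = set(tokenize(query))
    scored = sorted(corpus, key=lambda d: len(q & set(tokenize(d))), reverse=True)
    return scored[:k]

def build_prompt(query, corpus):
    """Prepend retrieved documents so the language model can condition
    on knowledge that is not stored in its weights."""
    docs = retrieve(query, corpus)
    context = "\n".join(f"[doc] {d}" for d in docs)
    return f"{context}\n[question] {query}\n[answer]"

corpus = [
    "The Eiffel Tower is located in Paris.",
    "Python is a programming language.",
]
prompt = build_prompt("Where is the Eiffel Tower?", corpus)
```

A real system would now send `prompt` to the language model; the point is only that the relevant fact enters the context instead of having to live in the model's parameters.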
Another strategy is to let the language model leverage external tools to augment the current context with important missing information that is not contained in its weights. Although much of this work aims to mitigate the language model shortcomings mentioned above, it also directly suggests that more systematic use of reasoning and tools may lead to more powerful agents. These models are called augmented language models (ALMs). As this trend accelerates, the number of related studies has grown dramatically, making it necessary to classify the works and define the technical terms for their different uses.
The terms used in the survey are defined as follows:
Reasoning. In the context of augmented language models, reasoning is the decomposition of a potentially complex task into simpler subtasks that the language model can solve more easily, either on its own or with tools. Subtasks can be decomposed in various ways, for example recursively or iteratively. In this sense, reasoning is similar to "planning" as defined in LeCun's 2022 paper "A Path Towards Autonomous Machine Intelligence". In the survey, reasoning often refers to various strategies for improving a language model's reasoning skills, such as step-by-step reasoning with few-shot examples. It is not entirely clear whether the language model is actually reasoning, or simply generating a larger context that increases the likelihood of correctly predicting the missing tokens. The discussion of this topic by Huang and Chang (2022) may be helpful here: although "reasoning" may be an abuse of language given current SOTA results, the term is already in use in the community. A more practical definition of reasoning in augmented language models is giving the model more computational steps before it generates the answer to a prompt.
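Step-by-step reasoning with few-shot examples can be made concrete with a prompt builder. This is only a sketch: the exemplar question and its worked solution are hypothetical, and a real system would pass the resulting string to a language model.

```python
# One worked exemplar showing intermediate steps before the final answer.
# The content is illustrative, not taken from any benchmark.
EXEMPLAR = (
    "Q: Roger has 5 balls and buys 2 cans of 3 balls each. "
    "How many balls does he have now?\n"
    "A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n"
)

def chain_of_thought_prompt(question):
    """Prepend a worked, step-by-step exemplar so the model is nudged
    to emit intermediate reasoning (extra computational steps) before
    committing to its final answer."""
    return f"{EXEMPLAR}\nQ: {question}\nA:"

prompt = chain_of_thought_prompt(
    "A train travels 60 km in 1.5 hours. What is its speed in km/h?"
)
```

The prompt deliberately ends at "A:" so the model's continuation begins with reasoning steps rather than a bare answer, which is one way to realize the "more computational steps before answering" definition above.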
Tool. For an augmented language model, a tool is an external module, typically called using a rule or a special token, whose output is included in the model's context. A tool can gather external information or have an effect on the virtual or physical world (usually perceived by the augmented language model). An example of a tool that gathers external information is a document retriever; an example of a tool with an external effect is a robotic arm. Tools can be called at training or at inference time. In general, learning to interact with a tool may involve learning to call its API.
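A toy illustration of calling a tool through special tokens in the model's output: a host program scans the generated text for tool-call tokens, runs the tool, and splices the result back into the context. The `[CALC ...]` token syntax and the dispatch table are assumptions made for this sketch, not a standard API.

```python
import re

def calculator(expr):
    """Evaluate a basic arithmetic expression. The regex restricts the
    input to digits and arithmetic operators before eval() runs."""
    if not re.fullmatch(r"[\d+\-*/(). ]+", expr):
        raise ValueError("unsupported expression")
    return str(eval(expr))

# Hypothetical registry mapping tool-token names to callables.
TOOLS = {"CALC": calculator}

def run_tools(model_output):
    """Find [TOOL arg] tokens, call the matching tool, and splice the
    result back into the text, enlarging the model's context."""
    def dispatch(match):
        name, arg = match.group(1), match.group(2)
        return TOOLS[name](arg)
    return re.sub(r"\[(\w+) ([^\]]+)\]", dispatch, model_output)

result = run_tools("The total is [CALC 2+3*4].")
```

A real implementation would interleave this with generation (stopping at the tool token, calling the tool, then resuming), but the token-parse-and-splice loop is the core idea.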
Action. For an augmented language model, an action is the invocation of a tool that has an effect on the virtual or physical world, and the observation of its result, typically by including it in the model's current context. For example, some of the works discussed in the survey use web search, or manipulate a robotic arm through a language model. With a slight abuse of terminology, researchers sometimes refer to any tool invocation by an augmented language model as an action, even when it has no external effect.
Why discuss reasoning and tools together? Combining reasoning and tools in a language model serves to solve a large number of complex tasks without heuristics, and therefore generalizes better. Typically, reasoning helps the language model decompose a given problem into potentially simpler subtasks, while tools help get each step right, for example by obtaining the result of a mathematical operation. In other words, reasoning is a way for the language model to combine different tools to solve complex tasks, and tools are a way to avoid reasoning failures through effective decomposition. Both should benefit from the other. Moreover, reasoning and tools can be placed under the same "hood", since both augment the context of the language model so that it better predicts missing tokens, albeit in different ways.
Why discuss tools and actions together? A language model can invoke tools that gather additional information and tools that affect the virtual or physical world in exactly the same way. For example, there appears to be no difference between a language model outputting Python code to solve a mathematical operation and one outputting Python code to operate a robotic arm. Some of the works discussed in the survey already use language models that act on virtual or physical worlds. From this point of view, language models have the potential to act, and the progress they represent toward automated agents is worth looking forward to.
The survey divides the included research into three parts. Section 2 examines work on enhancing the reasoning capabilities of language models, as defined above. Section 3 focuses on work that allows language models to interact with, and act on, external tools. Section 4 explores whether reasoning and tool use are achieved through heuristics or through learning, for example via supervision or reinforcement. Other components of augmented language models are discussed in Section 5. For brevity, the survey focuses on work that combines reasoning or tools with language models. Finally, although the focus is on large language models, not all of the studies considered employ large models, so for accuracy the remainder of the survey sticks to the term "language models".
Reasoning
Previous work has shown that large language models can solve simple reasoning problems but not complex ones; this section therefore focuses on various strategies to enhance the reasoning skills of language models. One challenge that complex reasoning problems pose for language models is correctly obtaining the solution by composing the answers they correctly predict for the subproblems. For example, a language model may accurately predict the birth and death dates of a famous person, yet fail to accurately predict their age at death. Some researchers call this difference the compositionality gap of language models. The remainder of this section discusses work on three popular paradigms for eliciting reasoning in language models. Since the present work focuses on reasoning combined with tools, readers are referred to other researchers' work for a more in-depth discussion of large language model reasoning.
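The birth-date example above can be sketched as explicit decomposition. The fact table below stands in for a language model's (reliable) answers to easy subquestions; the dates for Ada Lovelace are real, but the lookup interface is an assumption made for the sketch.

```python
# Stand-in for subquestions a language model answers reliably.
FACTS = {
    ("Ada Lovelace", "born"): 1815,
    ("Ada Lovelace", "died"): 1852,
}

def answer_subquestion(person, field):
    """Each subquestion is simple enough to answer directly."""
    return FACTS[(person, field)]

def age_at_death(person):
    """Compose two easy sub-answers instead of asking the harder
    composite question ('how old was X at death?') in one shot,
    which is where the compositionality gap appears."""
    born = answer_subquestion(person, "born")
    died = answer_subquestion(person, "died")
    return died - born
```

The point of the sketch is the shape of the computation: the composite answer is derived arithmetically from sub-answers the model gets right, rather than asked for directly.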
Using tools and actions
Recent lines of language model research allow a model to access knowledge that is not necessarily stored in its weights, such as factual knowledge. More precisely, tasks such as exact computation or information retrieval can be offloaded to external modules, such as a Python interpreter or a search engine queried by the model; in these cases the modules act as tools. Moreover, when a tool has an effect on the external world, we can say that the language model performed an action. Tools and actions can easily be included in the form of special tokens, a feature that combines conveniently with Transformer language modeling.
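Offloading exact computation to a Python interpreter can be sketched as follows. The "model output" is a hand-written stand-in for code a language model might generate, and the restricted-builtins dictionary is an illustration only, not real sandboxing.

```python
# Hand-written stand-in for code emitted by a language model.
model_generated_code = "result = sum(i * i for i in range(1, 11))"

def execute_offloaded(code):
    """Run model-emitted code in a separate namespace and return the
    value it binds to `result`. Restricting __builtins__ limits what
    the code can touch, but is NOT a real sandbox; production systems
    need proper isolation (e.g. a subprocess with resource limits)."""
    namespace = {}
    allowed = {"__builtins__": {"sum": sum, "range": range}}
    exec(code, allowed, namespace)
    return namespace["result"]

value = execute_offloaded(model_generated_code)
```

The division of labor mirrors the text above: the language model decides *what* to compute, while the interpreter guarantees the arithmetic is exact, something the model's weights cannot.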
After reviewing how language models can be augmented to exercise their ability to reason and to use tools, the survey also describes how models are taught to apply these abilities.
For more research details, please refer to the original paper.