
Constructing Scaling Laws from 80 Models: New Work by a Chinese PhD Student, Highly Recommended by the Proposer of Chain-of-Thought

Jun 06, 2024, 08:40 PM

In the field of AI, scaling laws are a powerful tool for understanding how the performance of language models (LMs) changes with scale, and they give researchers an important guide for extrapolating model behavior.

Unfortunately, scaling analysis is uncommon in benchmarking and post-training studies, because most researchers lack the computational resources to build scaling laws from scratch, and open models are trained at too few scales to make reliable scaling predictions.

Researchers from Stanford University, the University of Toronto, and other institutions have proposed an alternative: observational scaling laws, which tie the capabilities of language models to downstream performance across multiple model families, rather than within a single family as standard compute scaling laws do.

This approach bypasses model training and instead builds scaling laws from roughly 80 publicly available models. But this raises another problem: constructing a single scaling law from multiple model families is challenging, because training compute efficiency and capabilities differ widely across models.

Nevertheless, the study shows that these variations are consistent with a simple, generalized scaling law in which language model performance is a function of a low-dimensional capability space, and model families differ only in how efficiently they convert training compute into these capabilities.

Using this method, the study demonstrates the surprising predictability of several other kinds of scaling analyses. It finds that some emergent phenomena follow smooth sigmoidal behavior and can be predicted from small models, and that the agent performance of models such as GPT-4 can be accurately predicted from simpler non-agent benchmarks. The study also shows how to predict the impact of post-training interventions such as chain-of-thought on a model.

The research shows that even when fitted only on small, sub-GPT-3-class models, observational scaling laws can accurately predict complex phenomena such as emergent capabilities, agent performance, and the gains from post-training methods (e.g., chain-of-thought).
  • Paper address: https://arxiv.org/pdf/2405.10938
  • Paper title: Observational Scaling Laws and the Predictability of Language Model Performance

The paper has three authors. One of them, Yangjun Ruan, is Chinese and received his bachelor's degree from Zhejiang University.

The paper was also shared and commented on by Jason Wei, the proposer of chain-of-thought, who said he liked this research very much.
Paper Introduction

The study observes that hundreds of open models of different sizes and capabilities currently exist. Researchers cannot directly use these models to compute scaling laws, because training compute efficiency varies greatly between model families, but they hope there is a more general scaling law that holds across model families.

In particular, the paper assumes that an LM's downstream performance is a function of a low-dimensional capability space (covering, e.g., natural language understanding, reasoning, and code generation), and that model families differ only in how efficiently they convert training compute into these capabilities. If this relationship holds, there is a log-linear relationship from low-dimensional capabilities to downstream performance across model families, which would allow researchers to establish scaling laws using existing models (Figure 1). The study obtained low-cost, high-resolution scaling predictions using nearly 80 publicly available LMs (Figure 1, right).
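To make the low-dimensional capability idea concrete, here is a minimal sketch of extracting principal-component capability measures from benchmark scores across many models. This is an illustration under assumed data, not the paper's released code; the random scores, the six benchmarks, and the choice of three components are placeholders.

```python
# Illustrative sketch only: extract a low-dimensional "capability space" from
# benchmark scores across many models via PCA. The scores below are random
# placeholders, not real leaderboard data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Rows: ~80 models from different families; columns: benchmark accuracies
# (e.g., MMLU, ARC, HellaSwag, ...). Shape: (n_models, n_benchmarks).
benchmark_scores = rng.uniform(0.2, 0.9, size=(80, 6))

# Project onto a few principal components ("PC measures"); the paper reports
# that a small number of components captures most of the variance across models.
pca = PCA(n_components=3)
pc_measures = pca.fit_transform(benchmark_scores)

print("explained variance ratio:", pca.explained_variance_ratio_)
print("PC measures shape:", pc_measures.shape)  # (80, 3)
```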
By analyzing standard LM benchmarks (e.g., the Open LLM Leaderboard), the researchers discovered a number of such capability measures that follow a scaling-law relationship with training compute within model families (R^2 > 0.9; see Figure 3 below), and this relationship holds across different model families and downstream metrics. The paper calls this relationship the observational scaling law.
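As a rough illustration of this within-family relationship, the snippet below regresses a capability measure on log training compute for a single hypothetical model family and reports R^2. All numbers are invented for illustration and do not come from the paper.

```python
# Hypothetical within-family fit: capability measure vs. log training compute.
import numpy as np
from sklearn.linear_model import LinearRegression

log10_flops = np.array([21.5, 22.0, 22.5, 23.0, 23.5]).reshape(-1, 1)  # log10 training FLOPs
pc_measure = np.array([-1.4, -0.9, -0.2, 0.5, 1.1])                    # e.g., first PC score

reg = LinearRegression().fit(log10_flops, pc_measure)
print("slope:", reg.coef_[0], "R^2:", reg.score(log10_flops, pc_measure))
```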

Finally, the study shows that using observational scaling laws is cheap and simple, since a few series of models are sufficient to replicate many of its core findings. With this approach, scaling predictions for benchmarks and post-training interventions can be obtained by evaluating only 10-20 models.
Emergent capabilities

Whether LMs exhibit discontinuous "emergent" capabilities above certain compute thresholds, and whether such capabilities can be predicted using small models, has been hotly debated. Observational scaling laws suggest that some of these phenomena follow smooth sigmoid curves and can be accurately predicted from small, sub-Llama-2-7B models.
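A hedged sketch of what "smooth sigmoidal behavior" means in practice: fit a sigmoid from a capability measure to a downstream metric on weak models only, then extrapolate to a stronger model's capability level. The data points and the held-out capability value here are assumptions for illustration, not the paper's data.

```python
# Fit a sigmoid on weak models and extrapolate to a stronger one (toy data).
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, b):
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

# Capability measure (e.g., a PC score) and downstream accuracy for weak models.
pc_weak = np.array([-2.0, -1.5, -1.0, -0.5, 0.0, 0.5])
acc_weak = np.array([0.02, 0.03, 0.06, 0.12, 0.25, 0.45])

params, _ = curve_fit(sigmoid, pc_weak, acc_weak, p0=[1.0, 0.0])

# Predict the "emergent" metric at the capability level of a larger held-out model.
pc_strong = 2.5
print("predicted accuracy:", float(sigmoid(pc_strong, *params)))
```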
Agent capabilities

The research shows that the more advanced and complex capabilities of LMs as agents, as measured by AgentBench and AgentBoard, can be predicted using observational scaling laws. With these laws, the study accurately predicted the performance of GPT-4 using only weaker (sub-GPT-3.5) models, and identified programming ability as a key driver of agent performance.

Scaling of post-training methods

The study shows that even when the scaling law is fitted on weaker models (sub-Llama-2-7B), it can reliably predict the benefits of post-training methods such as Chain-of-Thought and Self-Consistency.

Overall, the contribution of this study is to propose observational scaling laws that exploit a predictable log-linear relationship between compute, simple capability measures, and complex downstream metrics.

Verification of the Observational Scaling Laws

The researchers experimentally verified the usefulness of these scaling laws. In addition, after the paper was published, they pre-registered predictions for future models to test whether the scaling laws overfit current models. The code for the implementation and data collection has been released on GitHub:

GitHub address: https://github.com/ryoungj/ObsScaling

Predictability of emergent capabilities

Figure 4 below shows the prediction results using the PC (principal capability) measures, along with baseline results that predict performance from training FLOPs. These capabilities can be accurately predicted with the PC measures even when fitting only on weakly performing models.

In contrast, using training FLOPs results in significantly worse extrapolation on the test set and a significantly worse fit on the training set, as indicated by higher MSE values. These differences likely arise because training FLOPs are not directly comparable across model families.
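The holdout comparison described above can be mimicked with a simple fit-and-extrapolate routine. The sketch below uses invented PC scores and accuracies and a plain linear predictor, so it only illustrates the evaluation protocol (training-set MSE vs. extrapolation MSE), not the paper's actual numbers.

```python
# Fit on weak models, report fit MSE on them and extrapolation MSE on held-out
# stronger models. All data are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def fit_and_extrapolate(x_train, y_train, x_test, y_test):
    reg = LinearRegression().fit(x_train.reshape(-1, 1), y_train)
    train_mse = mean_squared_error(y_train, reg.predict(x_train.reshape(-1, 1)))
    test_mse = mean_squared_error(y_test, reg.predict(x_test.reshape(-1, 1)))
    return train_mse, test_mse

pc_weak = np.array([-1.8, -1.2, -0.7, -0.3, 0.1, 0.4])
acc_weak = np.array([0.15, 0.22, 0.30, 0.36, 0.43, 0.48])
pc_strong = np.array([1.2, 1.8])      # held-out, stronger models
acc_strong = np.array([0.62, 0.71])

train_mse, test_mse = fit_and_extrapolate(pc_weak, acc_weak, pc_strong, acc_strong)
print("train MSE:", train_mse, "extrapolation MSE:", test_mse)
```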
Predictability of agent capabilities

Figure 5 below shows the prediction results of observational scaling laws using the PC measures. On both agent benchmarks, the performance of the held-out models (GPT-4 or Claude-2) can be accurately predicted from much weaker models (a performance gap of more than 10%).

This shows that the more complex agent capabilities of LMs are closely related to their underlying base-model capabilities and can be predicted from them. It also suggests that LM-based agent capabilities will continue to scale well as backbone LMs grow in scale.
The impact of post-training techniques

Figure 6a below shows the scaling prediction results for CoT and SC (Self-Consistency) obtained with observational scaling laws. The performance of stronger, larger models with CoT, with CoT+SC, and without post-training techniques (Naive) can be accurately predicted from weaker models at smaller computational scales (in terms of model size and training FLOPs).

Notably, the scaling trends differ between the two techniques: CoT shows a more pronounced scaling gain than adding self-consistency on top of CoT.
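To illustrate how such post-training predictions could be set up, the sketch below fits one sigmoid per prompting condition (Naive, CoT, CoT+SC) on weak models and compares their predicted accuracies at a stronger model's capability level. The condition names follow the figure, but the data and the sigmoid functional form are assumptions for illustration, not the paper's code or results.

```python
# Per-condition sigmoid fits on weak models, extrapolated to a stronger model (toy data).
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, a, b):
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

pc_weak = np.array([-1.5, -1.0, -0.5, 0.0, 0.5])
acc_by_method = {
    "naive":  np.array([0.10, 0.14, 0.20, 0.28, 0.37]),
    "cot":    np.array([0.12, 0.20, 0.31, 0.45, 0.58]),
    "cot+sc": np.array([0.13, 0.22, 0.35, 0.50, 0.64]),
}

pc_strong = 2.0  # capability level of a held-out, stronger model
for method, acc in acc_by_method.items():
    params, _ = curve_fit(sigmoid, pc_weak, acc, p0=[1.0, 0.0], maxfev=10000)
    print(method, "predicted accuracy:", round(float(sigmoid(pc_strong, *params)), 3))
```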
Please refer to the original paper for more technical details.
