
The research was questioned, and Jeff Dean responded: we were not trying to get a new SOTA, and the cost calculation was also wrong.

Apr 08, 2023, 04:21 PM

Yesterday, the hottest topic in the machine learning community was a researcher on Reddit questioning a paper co-authored by Google AI lead Jeff Dean. The paper, "An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems," was submitted to the preprint platform arXiv on Thursday.

In the paper, Jeff Dean et al. propose an evolutionary algorithm that generates large-scale multi-task models while supporting the dynamic, continuous addition of new tasks. The generated multi-task model is sparsely activated and uses task-based routing. The method achieves competitive results on 69 image-classification tasks, for example a new state-of-the-art accuracy of 99.43% on CIFAR-10 among models trained only on public data.


It is this new CIFAR-10 SOTA that was questioned; the previous SOTA was 99.40%. "Producing this result required a total of 17,810 TPU core-hours," she wrote. "If you don't work at Google, that means paying the on-demand rate of $3.22/hour, so this trained model costs $57,348."

She then asked pointedly: "Jeff Dean spent enough money to support a family of four for five years to get a 0.03% improvement on CIFAR-10 and a new SOTA. Was it all worth it?"
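Her headline figure follows from a single multiplication. The back-of-the-envelope check below uses only the numbers quoted in the post; the calculation itself is ours, and it assumes every core-hour is billed at that one on-demand rate:

```python
# Back-of-the-envelope check of the critic's figure: every TPU core-hour
# billed at the quoted on-demand rate. Numbers are from the Reddit post.
core_hours = 17_810
on_demand_rate = 3.22  # USD per TPU core-hour (on-demand, as quoted)
print(f"${core_hours * on_demand_rate:,.0f}")  # -> $57,348
```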

This question was echoed by many in the field. Some researchers even said, pessimistically: "I have almost lost interest in deep learning. As a practitioner in a small lab, it is basically impossible to compete with the tech giants on compute budget. Even if you have a good theoretical idea, biases in the mainstream environment can keep it from ever seeing the light of day. This creates an unfair playing field."

As the discussion continued to heat up, Jeff Dean responded personally on Reddit. He said, "The goal of our research was not to obtain a higher-quality CIFAR-10 model, and the original poster's cost calculation is also flawed."


Full text of Jeff Dean's response

This paper was written by Andrea Gesmundo and me, and Andrea Gesmundo did most of the work on it.


Paper address: https://arxiv.org/pdf/2205.12755.pdf

What I want to say is that the goal of this research was not to get a high-quality CIFAR-10 model. Rather, the study explores a setting in which new tasks can be dynamically introduced into a running system, obtaining a high-quality model for each new task that reuses representations from the existing model and introduces new parameters sparsely, while avoiding multi-task problems such as catastrophic forgetting or negative transfer.

The experiments show that we can dynamically introduce a stream of 69 distinct tasks drawn from several independent visual task benchmarks, ultimately producing a multi-task system that jointly yields high-quality solutions for all of them. The resulting model is sparsely activated for any given task, and the system introduces fewer and fewer new parameters as new tasks arrive (see Figure 2 below). For the tasks added at the end of the stream, the multi-task system introduced only 1.4% new parameters, and each task activates on average 2.3% of the model's total parameters. There is considerable representation sharing between tasks, and the evolutionary process helps determine when that sharing makes sense and when new trainable parameters should be introduced for a new task.
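As a rough illustration of this kind of setup (a hypothetical sketch, not the paper's actual muNet implementation; all names and structures below are invented for the example), a sparsely activated multi-task system can be thought of as a growing pool of shared components plus a per-task route that activates only a few of them:

```python
# Minimal sketch of the general idea -- a shared pool of components plus a
# per-task sparse route through them. NOT the paper's actual muNet code;
# all names and structures here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

class Component:
    """A small trainable block (here just one ReLU dense layer)."""
    def __init__(self, dim):
        self.w = rng.normal(scale=0.1, size=(dim, dim))

    def __call__(self, x):
        return np.maximum(x @ self.w, 0.0)

class MultiTaskSystem:
    """A growing pool of shared components and a route (path) per task."""
    def __init__(self, dim):
        self.dim = dim
        self.components = []   # shared pool, grows as tasks arrive
        self.routes = {}       # task name -> list of component indices

    def add_task(self, name, reuse_from=None, n_new=1):
        """Introduce a new task: optionally reuse an existing task's route
        (representation sharing) and sparsely add a few new components."""
        route = list(self.routes.get(reuse_from, []))
        for _ in range(n_new):
            self.components.append(Component(self.dim))
            route.append(len(self.components) - 1)
        self.routes[name] = route

    def forward(self, name, x):
        """Only the components on this task's route are activated."""
        for idx in self.routes[name]:
            x = self.components[idx](x)
        return x

system = MultiTaskSystem(dim=16)
system.add_task("task_a", n_new=3)                       # first task: all-new path
system.add_task("task_b", reuse_from="task_a", n_new=1)  # later task: mostly reuse
system.add_task("task_c", n_new=3)                       # unrelated task: its own path
x = rng.normal(size=(4, 16))
print(system.forward("task_b", x).shape)                 # (4, 16)
print(f"{len(system.routes['task_b'])} of {len(system.components)} components active")  # 4 of 7
```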


I also think the original poster got the cost calculation wrong. The experiment trained a multi-task model that jointly solves 69 tasks, not just a CIFAR-10 model. And as you can see from Table 7 below, the computation used is a mix of TPUv3 cores and TPUv4 cores, so you can't simply add up the core-hours, because the two are priced differently.

Unless you have a particularly urgent need to train the CIFAR-10 model plus the other 68 tasks quickly, research of this kind can easily use preemptible-priced resources, namely $0.97/hour for TPUv4 and $0.60/hour for TPUv3 (not the $3.22/hour on-demand price they cite). Under these assumptions, the public-cloud compute cost of the work described in Table 7 is roughly $13,960 (12,861 TPUv4 chip-hours plus 2,474.5 TPUv3 chip-hours at preemptible prices), or about $202 per task.
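Spelled out, the arithmetic behind those figures is a weighted sum over the two chip types, divided by the 69 tasks (a back-of-the-envelope check using only the numbers quoted in the reply):

```python
# Jeff Dean's recalculation: mixed TPUv3/TPUv4 chip-hours billed at preemptible
# rates, then divided over the 69 tasks. Figures are from his reply.
tpu_v4_hours, tpu_v4_rate = 12_861, 0.97     # USD per TPUv4 chip-hour (preemptible)
tpu_v3_hours, tpu_v3_rate = 2_474.5, 0.60    # USD per TPUv3 chip-hour (preemptible)
total = tpu_v4_hours * tpu_v4_rate + tpu_v3_hours * tpu_v3_rate
print(f"total = ${total:,.0f}, per task = ${total / 69:,.0f}")  # about $13,960 and $202
```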


I think it is important to have sparsely activated models, and to be able to dynamically introduce new tasks into an existing system that can share representations where appropriate and avoid catastrophic forgetting; research in this direction is at least worth exploring. The system also has the advantage that new tasks can be incorporated automatically, without having to be specially formulated for them (that is what the evolutionary search process does), which seems like a useful property of a continual learning system.

The code for this paper is open source, so you can check it for yourself.

Code address: https://github.com/google-research/google-research/tree/master/muNet

The original poster's reply to Jeff Dean


After seeing Jeff Dean's reply, the original poster responded: To clarify, I find Jeff Dean's paper (using evolution to decide how the model expands for each task) genuinely interesting. It reminds me of another paper whose title I can't recall, in which a new module is added to the overall architecture for each new task, using the hidden states of the other modules as part of the input to each layer, but without updating the weights of the existing components.
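A rough sketch of the kind of architecture the poster is recalling (a hypothetical illustration, not any specific paper's method; all names and shapes are invented): each new task gets its own column of layers that also reads the hidden states of the previously trained, frozen columns, and only the new column's weights would be trained.

```python
# Hypothetical sketch: a new per-task column whose layers also consume the
# hidden states of previously trained (frozen) columns, in the spirit of what
# the poster describes. Names and shapes are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
relu = lambda x: np.maximum(x, 0.0)

class Column:
    """One task-specific stack of layers; earlier columns are kept frozen."""
    def __init__(self, dims, lateral_dims=()):
        # Layer i consumes this column's previous hidden state (dims[i]) plus
        # the depth-i hidden state of each earlier column (ld[i + 1] each).
        self.layers = [rng.normal(scale=0.1,
                                  size=(dims[i] + sum(ld[i + 1] for ld in lateral_dims),
                                        dims[i + 1]))
                       for i in range(len(dims) - 1)]

    def forward(self, x, lateral_hiddens=()):
        hiddens = []
        for i, w in enumerate(self.layers):
            lateral = [h[i] for h in lateral_hiddens]        # frozen columns' states
            x = relu(np.concatenate([x, *lateral], axis=-1) @ w)
            hiddens.append(x)
        return x, hiddens

dims = [8, 16, 16]
task1 = Column(dims)                        # trained on task 1, then frozen
task2 = Column(dims, lateral_dims=[dims])   # new task: only these weights would train

x = rng.normal(size=(4, 8))
_, h1 = task1.forward(x)                    # hidden states of the frozen column
out, _ = task2.forward(x, lateral_hiddens=[h1])
print(out.shape)                            # (4, 16)
```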

I also have an idea about building per-task modules into the model. You know how a baby deer can walk within minutes of being born? At that point a newborn fawn has essentially no "training data" with which to learn motor control or a model of the world; instead it has to rely on specialized, inherited brain structures that give it basic skills. Those structures are useful enough that, in a sense, they generalize quickly to new but related control tasks.

So this paper got me thinking about how such inheritable structures might be developed and then used to learn new tasks more efficiently.

Researchers in another lab might have the same idea but get much worse results, simply because they can't afford to move from their existing setup onto a large cloud platform. And because the community is now so fixated on SOTA results, their work can't get published. Even if the cost is "only" $202 per task, it takes many iterations to get things right.

So for those of us without access to a large compute budget, there are essentially two options. One is to hope that Google publicly releases the trained model so that we can fine-tune it for our needs, even though the model may have learned biases or adversarial weaknesses that we cannot remove. The other is to lie flat and do nothing.

So my problem is not just with this study. If OpenAI wants to spend hundreds of billions of dollars (figuratively speaking) on GPT-4, more power to them. My problem is with a scientific and publishing culture that overly rewards flash and big numbers rather than helping people do better real work. My favorite paper is van den Oord's 2019 "Representation Learning with Contrastive Predictive Coding", which uses an unsupervised pre-training task followed by supervised training on a small subset of the labels to match the accuracy of training on the fully labeled dataset, and discusses the improvement in terms of data efficiency. I reproduced and used those results in my own work, saving myself time and money. On the strength of that paper alone, I would gladly become his doctoral student.

Meanwhile, OpenAI's paper "Language Models are Few-Shot Learners", which introduced the much larger Transformer model GPT-3, received nearly 4,000 citations, the NeurIPS 2020 Best Paper Award, and wall-to-wall media attention.
