


Google PaLM 2 was trained on nearly five times as much text data as its predecessor
According to a May 17 report, Google unveiled its latest large language model, PaLM 2, at its 2023 I/O developer conference last week. Internal company documents show that the new model was trained on almost five times as much text data as its predecessor, released in 2022.
Google's newly released PaLM 2 can reportedly handle more advanced coding, math and creative writing tasks. According to the internal documents, PaLM 2 was trained on 3.6 trillion tokens.
A token is a string of characters. The sentences and paragraphs in a model's training text are split into pieces, and each piece is usually called a token. Tokenization is an important part of training large language models, which learn to predict which word will come next in a sequence.
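To make the idea of tokens and next-word prediction concrete, here is a minimal sketch. It assumes a naive regex tokenizer that splits text into words and punctuation marks; this is purely illustrative, since production models such as PaLM use learned subword vocabularies rather than a rule like this. The helper names `naive_tokenize` and `next_token_pairs` are hypothetical.

```python
import re


def naive_tokenize(text: str) -> list[str]:
    # Illustrative only: split into runs of word characters or single
    # punctuation marks. Real LLM tokenizers use learned subword vocabularies.
    return re.findall(r"\w+|[^\w\s]", text)


def next_token_pairs(tokens: list[str], context: int = 3) -> list[tuple[list[str], str]]:
    # Each training example pairs up to `context` preceding tokens
    # with the token the model should learn to predict next.
    return [(tokens[max(0, i - context):i], tokens[i]) for i in range(1, len(tokens))]


tokens = naive_tokenize("PaLM 2 was trained on 3.6 trillion tokens.")
print(tokens)
for ctx, target in next_token_pairs(tokens)[:3]:
    print(ctx, "->", target)
```

Note that even this toy rule splits "3.6" into three tokens, which is one reason token counts are not the same as word counts.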
Google's previous-generation large language model, PaLM, released in 2022, was trained on 780 billion tokens.
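The "nearly five times" in the headline follows directly from the two reported token counts. A quick check, using only the figures quoted in the article (which come from internal documents, not official specifications):

```python
# Token counts as reported in the article (internal documents).
PALM_TOKENS = 780e9     # original PaLM: 780 billion tokens
PALM2_TOKENS = 3.6e12   # PaLM 2: 3.6 trillion tokens

ratio = PALM2_TOKENS / PALM_TOKENS
print(f"PaLM 2 used {ratio:.2f}x as many training tokens as PaLM")
```

The ratio works out to roughly 4.6, which is why it is described as "nearly" rather than exactly five times.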
Although Google has been eager to show off its AI capabilities, demonstrating how the technology can be embedded in search, email, word processing and spreadsheets, it has been reluctant to disclose the size of its training data or other details. Microsoft-backed OpenAI has likewise kept the details of its newly released GPT-4 large language model secret.
Both companies cite the fierce competition in the AI industry as the reason for withholding this information. Google and OpenAI are both trying to attract users who may want to search for information with chatbots rather than traditional search engines.
But as competition in the field of artificial intelligence heats up, the research community is demanding more transparency.
Since launching PaLM 2, Google has said that the new model is smaller than its earlier large language models, meaning the company's technology is becoming more efficient while handling more complex tasks. Parameter count is often used to describe a language model's complexity. According to the internal documents, PaLM 2 has 340 billion parameters, while the original PaLM has 540 billion.
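Putting the reported parameter and token figures side by side shows how the balance shifted between the two generations: fewer parameters, but far more training data per parameter. The sketch below only does arithmetic on the numbers quoted in this article; the `tokens_per_param` helper is hypothetical and not a metric Google has said it uses.

```python
# Reported figures from the article (internal documents, not official specs).
MODELS = {
    "PaLM": {"params": 540e9, "tokens": 780e9},
    "PaLM 2": {"params": 340e9, "tokens": 3.6e12},
}


def tokens_per_param(name: str) -> float:
    # Training tokens divided by parameter count for the named model.
    m = MODELS[name]
    return m["tokens"] / m["params"]


for name in MODELS:
    print(f"{name}: {tokens_per_param(name):.1f} training tokens per parameter")

shrink = 1 - MODELS["PaLM 2"]["params"] / MODELS["PaLM"]["params"]
print(f"PaLM 2 has about {shrink:.0%} fewer parameters than PaLM")
```

By these figures, PaLM saw about 1.4 tokens per parameter and PaLM 2 about 10.6, with roughly 37% fewer parameters.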
Google had no immediate comment.
In a blog post about PaLM 2, Google said that the new model uses a "new technique" called "compute-optimal scaling," which makes PaLM 2 "more efficient with overall better performance, including faster inference, fewer parameters to serve, and a lower serving cost."
When releasing PaLM 2, Google said the new model was trained on 100 languages and can perform a wide range of tasks. PaLM 2 already powers 25 features and products, including Google's experimental chatbot Bard. It comes in four sizes, from smallest to largest: Gecko, Otter, Bison and Unicorn.
Based on publicly disclosed figures, PaLM 2 appears to be more powerful than any existing model. Meta, Facebook's parent company, announced a large language model called LLaMA in February this year, which was trained on 1.4 trillion tokens. OpenAI last disclosed a training size when it released GPT-3, saying at the time that the model had been trained on 300 billion tokens. In March this year, OpenAI released GPT-4 and said it exhibited "human-level" performance on many professional tests.
According to the documents, the language model Google launched two years ago was trained on 1.5 trillion tokens.
As new generative AI applications quickly become mainstream in the technology industry, the controversy surrounding the underlying technology is becoming increasingly fierce.
In February of this year, El Mahdi El Mhamdi, a senior scientist at Google Research, resigned over the company's lack of transparency. On Tuesday, OpenAI CEO Sam Altman testified at a hearing of the U.S. Senate Judiciary subcommittee on privacy and technology and agreed that a new system is needed to deal with artificial intelligence.
“For a very new technology, we need a new framework,” Altman said. “Of course, companies like ours have a lot of responsibility for the tools they put out.”