The model behind Apple Intelligence announced: the 3B on-device model beats Gemma-7B, and the server model is comparable to GPT-3.5-Turbo

Jun 13, 2024, 08:44 PM
Industry · Apple Inc.

At the just-concluded Worldwide Developers Conference, Apple announced Apple Intelligence, a new personalized intelligence system deeply integrated into iOS 18, iPadOS 18, and macOS Sequoia.


Apple Intelligence consists of a variety of highly capable generative models tailored to users' everyday tasks. In its newly updated blog post, Apple detailed two of these models:

  • An on-device language model with about 3 billion parameters;

  • A larger server-based language model that runs on Apple servers via Private Cloud Compute.

These two foundation models are part of Apple's family of generative models, and Apple says it will share more information about this model family in the near future.

In the blog post, Apple describes at length how it develops high-performance, fast, energy-efficient models; how it trains them; how it fine-tunes adapters for specific user needs; and how it evaluates the models for helpfulness and for avoiding unintended harm.

Pre-training

The base models are trained on the AXLearn framework, an open-source project Apple released in 2023. The framework is built on JAX and XLA, enabling users to train models efficiently and scalably on a variety of hardware and cloud platforms, including TPUs and both cloud and on-premises GPUs. In addition, Apple uses techniques such as data parallelism, tensor parallelism, sequence parallelism, and FSDP to scale training along multiple dimensions such as data, model size, and sequence length.
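Apple has not published its training configuration, but the core idea of data parallelism can be sketched in a few lines: each worker computes gradients on its own shard of the batch, and the shard gradients are averaged (an all-reduce) before the weight update, as if a single worker had seen the whole batch. A minimal illustration in plain NumPy with a toy squared-error loss (this is not AXLearn's actual API):

```python
import numpy as np

def loss_grad(w, x, y):
    """Gradient of the squared error 0.5*(w.x - y)^2 with respect to w."""
    return (w @ x - y) * x

def data_parallel_step(w, batch_x, batch_y, n_workers, lr=0.1):
    # Split the batch into one shard per worker.
    xs = np.array_split(batch_x, n_workers)
    ys = np.array_split(batch_y, n_workers)
    # Each worker computes the mean gradient over its shard...
    grads = [np.mean([loss_grad(w, x, y) for x, y in zip(sx, sy)], axis=0)
             for sx, sy in zip(xs, ys)]
    # ...and an all-reduce averages the shard gradients.
    g = np.mean(grads, axis=0)
    return w - lr * g
```

With shards of equal size, one step with 4 workers produces exactly the same update as one worker processing the full batch; tensor, sequence, and FSDP-style sharding partition the model and activations rather than the data, but follow the same "compute locally, reduce globally" pattern.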

When training its base models, Apple uses licensed data, including data specially selected to enhance certain features, as well as publicly available data collected from the open Internet by AppleBot, Apple's web crawler. Web publishers can opt out of having their content used to train Apple Intelligence by setting data usage controls.
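The post does not spell out the mechanism, but such data usage controls are conventionally expressed in a site's robots.txt. A directive along these lines would opt a whole site out (the `Applebot-Extended` user-agent token is this article's assumption based on Apple's crawler documentation; verify against that documentation before relying on it):

```
# Opt the entire site out of use for model training
User-agent: Applebot-Extended
Disallow: /
```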

Apple never uses users' private personal data when training its base models. To protect privacy, it applies filters to remove personally identifiable information, such as credit card numbers, that is publicly available on the Internet. It also filters out profanity and other low-quality content before it enters the training data set. Beyond these filtering steps, Apple performs data extraction and deduplication and uses model-based classifiers to identify and select high-quality documents for training.
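Apple does not describe its filters, but one such filter can be illustrated concretely: scan for card-like digit runs and redact only those that pass the Luhn checksum used by payment card numbers. This is a sketch of the general technique, not Apple's pipeline:

```python
import re

def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# 13-16 digits, optionally separated by spaces or dashes
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")

def scrub_card_numbers(text: str) -> str:
    """Replace Luhn-valid card-like numbers with a placeholder."""
    def repl(m):
        digits = re.sub(r"[ -]", "", m.group())
        return "[REDACTED]" if luhn_valid(digits) else m.group()
    return CARD_RE.sub(repl, text)
```

The checksum step matters: it keeps the filter from mangling arbitrary long numbers (order IDs, timestamps) that merely look like card numbers.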

Post-training

Apple found that data quality is crucial to model quality, so it uses a hybrid data strategy of human-annotated and synthetic data during training, along with thorough data curation and filtering procedures. Apple developed two new algorithms for the post-training phase: (1) a rejection-sampling fine-tuning algorithm with a "teacher committee", and (2) a reinforcement learning from human feedback (RLHF) algorithm using mirror-descent policy optimization and a leave-one-out advantage estimator. These two algorithms significantly improve the model's instruction-following quality.
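Apple gives no formulas, but the leave-one-out advantage estimator is simple to sketch: for k sampled responses to the same prompt, each response's baseline is the mean reward of the other k−1 samples, so no learned value function is needed. A minimal illustration (the scalar-reward setup is an assumption for clarity):

```python
def leave_one_out_advantages(rewards):
    """Advantage of each sample relative to the mean reward of the others.

    rewards: list of scalar rewards for k responses to one prompt (k >= 2).
    """
    k = len(rewards)
    total = sum(rewards)
    # Baseline for sample i is the mean of the other k-1 rewards.
    return [r - (total - r) / (k - 1) for r in rewards]
```

For rewards [1, 2, 3] this yields advantages [-1.5, 0.0, 1.5]: each response is scored against its peers, which centers the policy-gradient signal without a separate critic.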

Optimization

In addition to ensuring the generative models themselves perform well, Apple uses a variety of innovative techniques to optimize the models on device and in its private cloud for speed and efficiency. In particular, it made extensive optimizations to how the model generates the first token and subsequent tokens (a token is the basic unit of text, roughly a word or word fragment) to ensure fast responses and efficient operation.
Apple adopts grouped-query attention in both the on-device model and the server model to improve efficiency. To reduce memory requirements and inference cost, it uses shared input and output vocabulary embedding tables with no duplicated parameters. The on-device model has a vocabulary of 49,000 tokens, while the server model has a vocabulary of 100,000.
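Grouped-query attention shrinks the KV cache by letting several query heads share one key/value head. A toy NumPy sketch of the forward pass (head counts and dimensions are illustrative, not Apple's):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    n_q, seq, d = q.shape
    n_kv = k.shape[0]
    assert n_q % n_kv == 0
    group = n_q // n_kv          # query heads per shared KV head
    out = np.empty_like(q)
    for h in range(n_q):
        kh, vh = k[h // group], v[h // group]   # shared K/V for this group
        scores = q[h] @ kh.T / np.sqrt(d)
        # numerically stable softmax over keys
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        out[h] = w @ vh
    return out
```

With 8 query heads sharing 2 KV heads, the KV cache is 4x smaller than full multi-head attention while the output shape is unchanged; with n_kv_heads equal to n_q_heads the function reduces to standard multi-head attention.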
For on-device inference, Apple uses low-bit palletization, a key optimization technique that meets the necessary memory, power, and performance requirements. To maintain model quality, Apple also developed a new framework using LoRA adapters that combines a mixed 2-bit and 4-bit configuration strategy, averaging 3.5 bits per weight, to achieve the same accuracy as the uncompressed model.
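The 3.5-bit average follows directly from the mix: if a fraction p of weights is stored at 4 bits and the rest at 2 bits, the average is 4p + 2(1−p), and p = 0.75 gives 3.5. Apple does not disclose the actual per-layer mix; this only shows the arithmetic:

```python
def average_bits(frac_4bit: float, hi: int = 4, lo: int = 2) -> float:
    """Average bits per weight for a two-level mixed-precision scheme."""
    return hi * frac_4bit + lo * (1.0 - frac_4bit)
```

At 3.5 bits per weight, roughly 3 billion weights occupy about 3e9 × 3.5 / 8 ≈ 1.3 GB before adapters and the KV cache, which is what makes a 3B-class model fit comfortably in phone memory.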
In addition, Apple uses Talaria, an interactive tool for analyzing model latency and power consumption, along with activation quantization and embedding quantization, and has developed an efficient approach to updating the key-value (KV) cache on the Neural Engine.
Through this series of optimizations, on an iPhone 15 Pro the model achieves a time-to-first-token latency of about 0.6 milliseconds per prompt token and a generation rate of 30 tokens per second, so responses begin almost immediately after a prompt is received.

Model Adaptation
Apple fine-tunes the base models for users' everyday activities and can dynamically specialize them for the task at hand.

The research team used adapters, small neural network modules that can be plugged into various layers of a pre-trained model, to fine-tune the model for specific tasks. Specifically, the research team adjusted the attention matrix, the attention projection matrix, and the fully connected layer in the point-wise feedforward network.

By fine-tuning only the adapter layers, the original parameters of the pre-trained base model remain unchanged, preserving the model's general knowledge while tailoring the adapters to support specific tasks.
Figure 2: Adapters are small collections of model weights overlaid on a common base model. They can be loaded and swapped dynamically, enabling the base model to specialize on the fly for the task at hand. Apple Intelligence includes an extensive set of adapters, each fine-tuned for a specific feature. This is an efficient way to extend the base model's capabilities.

The research team represents adapter parameter values with 16 bits. For the on-device model with about 3 billion parameters, the parameters of a rank-16 adapter typically require tens of megabytes. Adapters can be dynamically loaded, temporarily cached in memory, and swapped. This lets the base model specialize on the fly for the current task while managing memory efficiently and keeping the operating system responsive.
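The "tens of megabytes" figure is consistent with a quick estimate: a rank-r LoRA adapter on a d_in × d_out matrix adds r·(d_in + d_out) parameters, stored here at 16 bits (2 bytes) each. The layer count and widths below are hypothetical, chosen only to illustrate the arithmetic for a 3B-class model:

```python
def lora_adapter_megabytes(rank, dims, n_layers, bytes_per_param=2):
    """Size of one adapter set in MiB.

    dims: list of (d_in, d_out) for each adapted matrix in one layer.
    """
    params_per_layer = sum(rank * (d_in + d_out) for d_in, d_out in dims)
    return n_layers * params_per_layer * bytes_per_param / 2**20

# Hypothetical config: 32 layers, four 2048x2048 adapted matrices per layer.
size_mb = lora_adapter_megabytes(rank=16, dims=[(2048, 2048)] * 4, n_layers=32)
```

This hypothetical configuration works out to 16 MiB per adapter, small enough that many task-specific adapters can be stored on device and swapped in and out of memory on demand.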

To facilitate the training of adapters, Apple has created an efficient infrastructure to quickly retrain, test, and deploy adapters when the underlying model or training data is updated.

Performance evaluation

When benchmarking its models, Apple focuses on human evaluation, because human-evaluation results correlate strongly with the user experience of the product.

To evaluate product-specific summary capabilities, the research team used a set of 750 responses carefully sampled for each use case. The evaluation dataset emphasizes the variety of inputs a product feature may face in production and includes a layered mix of single and stacked documents of varying content types and lengths. Experimental results found that models with adapters were able to generate better summaries than similar models.

As part of responsible development, Apple identifies and evaluates the specific risks inherent in summarization. For example, summaries occasionally remove important nuance or other details. However, the research team found that the summarization adapter did not amplify sensitive content in over 99% of targeted adversarial examples.
Figure 3: Proportion of “good” and “poor” responses for summarization use cases.

In addition to evaluating the base models and the adapter-backed features, the research team also evaluated the general capabilities of the on-device and server models. Specifically, it used a comprehensive set of real-world prompts covering brainstorming, classification, closed Q&A, coding, extraction, mathematical reasoning, open Q&A, rewriting, safety, summarization, and writing tasks.

The research team compared the models against open-source models (Phi-3, Gemma, Mistral, DBRX) and commercial models of comparable scale (GPT-3.5-Turbo, GPT-4-Turbo). It found that human evaluators preferred Apple's models over most competing models. For example, Apple's ~3B-parameter on-device model outperforms larger models including Phi-3-mini, Mistral-7B, and Gemma-7B, while the server model holds its own against DBRX-Instruct, Mixtral-8x22B, and GPT-3.5-Turbo and is highly efficient at the same time.
Figure 4: Proportion of preferred responses in side-by-side evaluations of Apple's base models against comparison models.

The research team also used a separate set of adversarial prompts to test model performance on harmful content, sensitive topics, and factuality, measuring the violation rate as judged by human evaluators (lower is better). Faced with adversarial prompts, both the on-device and server models proved robust, with lower violation rates than open-source and commercial models.
Figure 5: Violation rates on harmful content, sensitive topics, and factuality (lower is better). Apple's models are robust when faced with adversarial prompts.

Given the broad capabilities of large language models, Apple is actively collaborating with internal and external teams on manual and automated red-teaming to further evaluate model safety.
Figure 6: Proportion of preferred responses in side-by-side evaluations of Apple's base models and comparable models on safety prompts. Human evaluators found the Apple base models' responses to be safer and more helpful.

To further evaluate the models, the research team used the Instruction-Following Eval (IFEval) benchmark to compare instruction-following ability against similarly sized models. Results show that both the on-device and server models follow detailed instructions better than open-source and commercial models of comparable scale.
Figure 7: Instruction-following ability of Apple's base models and similarly sized models (on the IFEval benchmark).

Apple also evaluated the model’s writing abilities across various writing instructions.
Figure 8: Writing ability (higher is better).

Finally, Apple's video introducing the technology behind Apple Intelligence is worth watching. Reference link: https://machinelearning.apple.com/research/introducing-apple-foundation-models
