Google optimizes diffusion models: a Samsung phone runs Stable Diffusion and produces an image in under 12 seconds

Apr 28, 2023, 08:19 AM

Stable Diffusion is as well known in image generation as ChatGPT is among conversational large models. It can create a realistic image from any input text within tens of seconds. Because Stable Diffusion has more than 1 billion parameters, and on-device compute and memory resources are limited, the model is mainly run in the cloud.

Without careful design and implementation, running these models on a device can lead to high latency, caused by the iterative denoising process, and to excessive memory consumption.

How to run Stable Diffusion on-device has attracted considerable research interest. Previously, some researchers developed an application that uses Stable Diffusion to generate images on the iPhone 14 Pro; it takes about one minute and uses approximately 2 GiB of application memory.

Apple has also made optimizations of its own: it can generate a 512 × 512 image in about half a minute on iPhone, iPad, Mac and other devices. Qualcomm followed closely, running Stable Diffusion v1.5 on an Android phone and generating a 512 × 512 image in less than 15 seconds.

Recently, in the paper "Speed Is All You Need: On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations", Google researchers ran Stable Diffusion 1.4 on-device using the GPU and achieved SOTA inference latency: on a Samsung S23 Ultra, it takes only 11.5 seconds to generate a 512 × 512 image with 20 iterations. Furthermore, the study is not specific to one device; it is a general approach that can be applied to other diffusion models.

This research opens up many possibilities for running generative AI locally on a phone, without a data connection or a cloud server. Stable Diffusion was released only last fall, and it can already run on-device today, which shows how fast this field is developing.

Paper address: https://arxiv.org/pdf/2304.11267.pdf

To reach this generation speed, Google proposes several optimizations. Let's take a look at how they work.

Method introduction

The study proposes optimization methods to speed up text-to-image generation with large diffusion models. The optimizations are presented for Stable Diffusion, but they are also applicable to other large diffusion models.

First, let's look at the main components of Stable Diffusion: the text embedder, noise generation, the denoising neural network and the image decoder, as shown in Figure 1 below.

[Figure 1: main components of Stable Diffusion]
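The way the four components fit together can be sketched as follows. This is only a toy illustration of the control flow: every function below (`text_embedder`, `generate_noise`, `denoise_step`, `image_decoder`) is a hypothetical stand-in with made-up arithmetic, not the real Stable Diffusion networks or the paper's code. The default of 20 steps mirrors the iteration count quoted above.

```python
import random


def text_embedder(prompt):
    # Toy stand-in: map each character to a number in [0, 1).
    return [(ord(ch) % 97) / 97.0 for ch in prompt]


def generate_noise(size, seed=0):
    # Random Gaussian latent that the denoiser starts from.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


def denoise_step(latent, embedding, step, total_steps):
    # Toy stand-in for the denoising network: pull the latent toward
    # a value derived from the text embedding, more strongly each step.
    target = sum(embedding) / len(embedding)
    weight = 1.0 / (total_steps - step)
    return [x + weight * (target - x) for x in latent]


def image_decoder(latent):
    # Toy stand-in: clamp latent values to a valid pixel range [0, 1].
    return [min(1.0, max(0.0, x)) for x in latent]


def generate(prompt, steps=20, latent_size=16):
    embedding = text_embedder(prompt)          # 1. text embedder
    latent = generate_noise(latent_size)       # 2. noise generation
    for step in range(steps):                  # 3. iterative denoising
        latent = denoise_step(latent, embedding, step, steps)
    return image_decoder(latent)               # 4. image decoder
```

The point of the sketch is that the denoising network runs once per iteration while the other components run once per image, which is why the optimizations concentrate on the operations inside that loop.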

Next, let's take a closer look at the three optimization methods proposed in this study.

Specialized kernel: Group Norm and GELU

Group Normalization (GN) works by dividing the channels of a feature map into smaller groups and normalizing each group independently, which makes GN less dependent on batch size and suitable for a wide range of batch sizes and network architectures. Instead of performing the reshape, mean, variance and normalization operations in sequence, the study designed a dedicated kernel in the form of a GPU shader that performs all of them in a single GPU command, without any intermediate tensor.
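For reference, the arithmetic that such a kernel has to reproduce can be written in a few lines of plain Python. This is a minimal sketch of group normalization itself, not of the GPU shader; the function name `group_norm`, the list-of-lists layout and the omission of the learnable scale and shift are simplifying assumptions.

```python
import math


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize `channels` (a list of C channels, each a flat list of
    spatial values) group by group. Learnable scale/shift are omitted."""
    num_channels = len(channels)
    assert num_channels % num_groups == 0
    group_size = num_channels // num_groups
    out = []
    for g in range(num_groups):
        # "Reshape": pick the channels that belong to this group.
        group = channels[g * group_size:(g + 1) * group_size]
        values = [v for ch in group for v in ch]
        # Mean and variance over every value in the group.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        # Normalization, written back channel by channel.
        out.extend([[(v - mean) * inv_std for v in ch] for ch in group])
    return out
```

In a naive implementation each commented step would be a separate operation with its own intermediate buffer; the fused kernel described above computes the same result in one pass.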

The Gaussian Error Linear Unit (GELU), a commonly used activation function, involves many numerical computations, such as multiplications, additions and the Gaussian error function. This study uses a dedicated shader to consolidate these computations, together with the accompanying split and multiplication operations, so that they execute in a single draw call.
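The computation being fused can be sketched in plain Python as follows. The `gelu` function is the standard exact definition; `gated_gelu` reflects one reading of the "split and multiplication" step (split the features in half, then multiply one half by the GELU of the other), which is an assumption here and not something the article spells out.

```python
import math


def gelu(x):
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gated_gelu(features):
    # Hypothetical gated form: split the feature vector into two halves
    # and multiply the first half by GELU of the second half.
    half = len(features) // 2
    values, gates = features[:half], features[half:]
    return [v * gelu(g) for v, g in zip(values, gates)]
```

Written this way, the split, the error function, and the multiplications are separate steps; a dedicated shader can evaluate them together for each output element without storing the intermediate halves.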

Improving the efficiency of the attention module

The text-to-image transformer in Stable Diffusion helps model the conditional distribution, which is crucial for text-to-image generation. However, self-attention and cross-attention struggle with long sequences, because their memory and time complexity grow quadratically with sequence length. To address this, the study proposes two optimizations that alleviate the computational bottleneck.

First, to avoid running the entire softmax computation on a large matrix, the study uses a GPU shader for the reduction operations, which greatly reduces the memory footprint of the intermediate tensors and the overall latency. The approach is shown in Figure 2 below.

[Figure 2: softmax optimization in the attention module]
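One way to picture this idea is to split softmax into a small reduction pass and a fused multiplication pass. The sketch below is an illustration under assumptions, not the paper's shader: the function names `row_reductions` and `fused_softmax_matmul` are hypothetical, and plain Python lists stand in for GPU tensors.

```python
import math


def row_reductions(scores):
    # Pass 1: per row, keep only the maximum and the sum of shifted
    # exponentials (two numbers per row instead of a full matrix).
    stats = []
    for row in scores:
        row_max = max(row)
        row_sum = sum(math.exp(a - row_max) for a in row)
        stats.append((row_max, row_sum))
    return stats


def fused_softmax_matmul(scores, stats, values):
    # Pass 2: apply the softmax normalization on the fly while
    # multiplying by the value matrix, so the softmax matrix itself
    # is never stored.
    dim = len(values[0])
    out = []
    for row, (row_max, row_sum) in zip(scores, stats):
        acc = [0.0] * dim
        for a, v in zip(row, values):
            w = math.exp(a - row_max) / row_sum
            for d in range(dim):
                acc[d] += w * v[d]
        out.append(acc)
    return out
```

The only intermediate data kept between the two passes is one (max, sum) pair per row, which is much smaller than the full attention matrix.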

Second, the study uses FlashAttention [7], an IO-aware exact attention algorithm that requires fewer high-bandwidth memory (HBM) accesses than standard attention, improving overall efficiency.
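The core trick that lets such an algorithm stay exact while processing keys and values block by block is a running (online) softmax. The following is a minimal single-query sketch of that idea in plain Python, assuming hypothetical names (`attention_online`, `attention_reference`) and omitting the usual 1/sqrt(d) scaling; it is not the FlashAttention kernel or the paper's implementation.

```python
import math


def attention_reference(query, keys, values):
    # Standard attention for one query: materialize all weights first.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def attention_online(query, keys, values, block_size=2):
    # Process keys/values one block at a time, keeping only a running
    # maximum, a running sum, and a running weighted accumulator.
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [sum(q * k for q, k in zip(query, key)) for key in k_block]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated under the old maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            for d in range(dim):
                acc[d] += w * v[d]
        running_max = new_max
    return [a / running_sum for a in acc]
```

Both functions return the same result up to floating-point rounding, but the blockwise version never holds the full row of attention weights, which is what allows a GPU implementation to keep its working set in fast on-chip memory.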

Winograd Convolution

Winograd convolution converts the convolution operation into a series of matrix multiplications. This reduces the number of multiplications and improves computational efficiency. However, it also increases memory consumption and numerical error, especially with larger tiles.
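The saving is easiest to see in the smallest one-dimensional case, usually written F(2, 3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The sketch below uses the textbook Winograd formulas in plain Python; the function names are hypothetical, and the 2D 3 × 3 case used in practice nests the same transform along both axes.

```python
def direct_f23(d, g):
    # Direct sliding-window filter (correlation, as used in CNNs):
    # 2 outputs x 3 taps = 6 multiplications.
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    # Filter transform: depends only on the weights, so it can be
    # precomputed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2.0
    u2 = (g[0] - g[1] + g[2]) / 2.0
    u3 = g[2]
    # Only 4 multiplications between transformed input and filter.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]
```

For example, with `d = [1.0, 2.0, 3.0, 4.0]` and `g = [1.0, 0.0, -1.0]`, both functions return `[-2.0, -2.0]`. The extra additions and the division by 2 in the transforms are where the additional memory traffic and rounding error come from.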

The backbone of Stable Diffusion relies heavily on 3 × 3 convolutional layers, especially in the image decoder, where they account for 90% of the layers. The study analyzes in depth the potential benefits of applying Winograd with different tile sizes to 3 × 3 convolutions. It finds that a tile size of 4 × 4 is optimal, because it provides the best balance between computational efficiency and memory utilization.
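The trade-off can be made concrete with the standard multiplication count for 2D Winograd. The helper below is a small arithmetic sketch; it assumes that "tile size" means the output tile m × m in F(m × m, 3 × 3), which is an interpretation and not stated explicitly in the article, and the function name is hypothetical.

```python
def winograd_multiplication_saving(tile, kernel=3):
    # Direct convolution: each of the tile*tile outputs needs
    # kernel*kernel multiplications.
    direct = (tile * kernel) ** 2
    # Winograd: one multiplication per element of the transformed
    # input tile, whose side is tile + kernel - 1.
    winograd = (tile + kernel - 1) ** 2
    return direct / winograd


def transformed_tile_elements(tile, kernel=3):
    # Number of elements in the transformed input tile, a rough proxy
    # for the extra intermediate memory per tile.
    return (tile + kernel - 1) ** 2
```

Under this counting, the theoretical saving is 2.25x for a 2 × 2 tile, 4x for 4 × 4, about 5.06x for 6 × 6 and 5.76x for 8 × 8, while the transformed tile grows from 16 to 36, 64 and 100 elements. The gains flatten as the tiles get larger and more expensive to hold, which is consistent with a mid-sized tile being the preferred balance.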

Experimentation

The study was benchmarked on a variety of devices: Samsung S23 Ultra (Adreno 740) and iPhone 14 Pro Max (A16). The benchmark results are shown in Table 1 below:

[Table 1: benchmark results on Samsung S23 Ultra and iPhone 14 Pro Max]

As each optimization is enabled, latency (that is, the time needed to generate an image) decreases step by step. Compared to the baseline, latency drops by 52.2% on the Samsung S23 Ultra and by 32.9% on the iPhone 14 Pro Max. The study also evaluates end-to-end latency on the Samsung S23 Ultra: generating a 512 × 512 pixel image with 20 denoising iterations takes less than 12 seconds, a SOTA result.

Small devices can run their own generative artificial intelligence models. What does this mean for the future? We can expect a wave.
