Table of Contents
Method Overview
Home Technology peripherals AI GPT-3 and Stable Diffusion work together to help the model understand Party A's needs for image retouching.

GPT-3 and Stable Diffusion work together to help the model understand Party A's needs for image retouching.

Apr 12, 2023 pm 02:55 PM
ai Model

After the popularity of the diffusion model, many people have focused on how to use more effective prompts to generate the images they want. In the continuous attempts of some AI painting models, people have even summarized the keyword experience for making AI draw pictures well:

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

That is to say , if you master the correct AI skills, the effect of improving the quality of drawing will be very obvious (see: "How to draw "Alpaca Playing Basketball"? Someone spent 13 US dollars to force DALL· E 2 Show your true skills》).

In addition, some researchers are working in another direction: how to change a painting to what we want with just a few words.

Some time ago, we reported on a study from Google Research and other institutions ​. Just say what you want an image to look like and it will basically do what you want, generating photorealistic images of, for example, a puppy sitting down:

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

The input description given to the model here is "a dog that sits down", but according to people's daily communication habits, the most natural description should be "let this dog sit down". Some researchers believe that this is a problem that should be optimized, and the model should be more in line with human language habits.

Recently, a research team from UC Berkeley proposed a new method for editing images based on human instructions: InstructPix2Pix: Given an input image and a text description that tells the model what to do, the model Ability to follow description instructions to edit images.

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

##Paper address: https://arxiv.org/pdf/2211.09800.pdf

For example, to change the sunflowers in the painting to roses, you only need to say directly to the model "Change the sunflowers to roses":

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

To obtain training data, this study combines two large pre-trained models - a language model (GPT-3) and a text-to-image generation model (Stable Diffusion) - to generate a large pairwise training data set of image editing examples. . The researchers trained a new model, InstructPix2Pix, on this large dataset and generalized to real images and user-written instructions at inference time.

InstructPix2Pix is ​​a conditional diffusion model that generates an edited image given an input image and a text instruction to edit the image. The model performs image editing directly in the forward pass and does not require any additional example images, full descriptions of input/output images, or fine-tuning of each example, so the model can quickly edit images in just a few seconds. .

Although InstructPix2Pix is ​​trained entirely on synthetic examples (i.e., text descriptions generated by GPT-3 and images generated by Stable Diffusion), the model achieves the accuracy of training on arbitrary real images and Zero-shot generalization to human-written text. The mockup supports intuitive image editing, including replacing objects, changing image styles, and more.

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

Method Overview

The researchers treated instruction-based image editing as a supervised learning problem: First, they generated a paired training dataset containing text editing instructions and images before and after editing ( Figure 2a-c), and then trained an image editing diffusion model on this generated dataset (Figure 2d). Although trained using generated images and editing instructions, the model is still able to edit real images using arbitrary instructions written by humans. Figure 2 below is an overview of the method.

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

Generate a multi-modal training data set

In the data In the set generation stage, the researchers combined the capabilities of a large language model (GPT-3) and a text-to-image model (Stable Diffusion) to generate a multi-modal training data set containing text editing instructions and corresponding images before and after editing. This process consists of the following steps:

  • Fine-tune GPT-3 to generate a collection of text edits: Given a prompt describing an image, generate a text describing the changes to be made command and a prompt describing the changed image (Figure 2a);
  • Use the text-to-image model to convert two text prompts (i.e. before editing and after editing) into a corresponding pair image (Fig. 2b).

InstructPix2Pix

The researchers used the generated training data to train a conditional diffusion model. Based on the Stable Diffusion model, images can be edited based on written instructions.

Diffusion models learn to generate data samples through a series of denoising autoencoders that estimate the fraction of the data distribution (pointing in the direction of high-density data). Latent diffusion is improved by operating in the latent space of a pretrained variational autoencoder with encoder and decoder ## Efficiency and quality of diffusion models.

For an image x, the diffusion process adds noise to the encoded latent , which produces a noisy latent z_t, where the noise The level increases with time step t∈T. We learn a network that predicts the noise added to a noisy latent z_t given an image conditioning C_I and a text instruction conditioning C_T. The researchers minimized the following latent diffusion goal:

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

Previous studies (Wang et al.) have shown that for image translation (image translation ) task, especially when pairwise training data is limited, fine-tuning a large image diffusion model is better than training it from scratch. Therefore, in new research, the authors use pre-trained Stable Diffusion checkpoint to initialize the weights of the model, taking advantage of its powerful text-to-image generation capabilities.

To support image conditioning, the researchers added additional input channels to the first convolutional layer, connecting z_t and GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.. All available weights of the diffusion model are initialized from the pretrained checkpoint, while weights operating on newly added input channels are initialized to zero. The author here reuses the same text conditioning mechanism originally used for caption without taking the text editing instruction c_T as input.

Experimental results

In the following figures, the authors show the image editing results of their new model. These results are for a different set of real photos and artwork. The new model successfully performs many challenging edits, including replacing objects, changing seasons and weather, replacing backgrounds, modifying material properties, converting art media, and more.

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

##The researchers compared the new method to recent Some technologies such as SDEdit, Text2Live, etc. are compared. The new model follows instructions for editing images, whereas other methods, including baseline methods, require descriptions of images or editing layers. Therefore, when comparing, the author provides "edited" text annotations for the latter instead of editing instructions. The authors also quantitatively compare the new method with SDEdit, using two metrics that measure image consistency and editing quality. Finally, the authors show how the size and quality of generated training data affects ablation results in model performance.

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

GPT-3 and Stable Diffusion work together to help the model understand Party As needs for image retouching.

The above is the detailed content of GPT-3 and Stable Diffusion work together to help the model understand Party A's needs for image retouching.. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

Video Face Swap

Video Face Swap

Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Recommended reliable digital currency trading platforms. Top 10 digital currency exchanges in the world. 2025 Recommended reliable digital currency trading platforms. Top 10 digital currency exchanges in the world. 2025 Apr 28, 2025 pm 04:30 PM

Recommended reliable digital currency trading platforms: 1. OKX, 2. Binance, 3. Coinbase, 4. Kraken, 5. Huobi, 6. KuCoin, 7. Bitfinex, 8. Gemini, 9. Bitstamp, 10. Poloniex, these platforms are known for their security, user experience and diverse functions, suitable for users at different levels of digital currency transactions

Which of the top ten currency trading platforms in the world are the latest version of the top ten currency trading platforms Which of the top ten currency trading platforms in the world are the latest version of the top ten currency trading platforms Apr 28, 2025 pm 08:09 PM

The top ten cryptocurrency trading platforms in the world include Binance, OKX, Gate.io, Coinbase, Kraken, Huobi Global, Bitfinex, Bittrex, KuCoin and Poloniex, all of which provide a variety of trading methods and powerful security measures.

What are the top currency trading platforms? The top 10 latest virtual currency exchanges What are the top currency trading platforms? The top 10 latest virtual currency exchanges Apr 28, 2025 pm 08:06 PM

Currently ranked among the top ten virtual currency exchanges: 1. Binance, 2. OKX, 3. Gate.io, 4. Coin library, 5. Siren, 6. Huobi Global Station, 7. Bybit, 8. Kucoin, 9. Bitcoin, 10. bit stamp.

What are the top ten virtual currency trading apps? The latest digital currency exchange rankings What are the top ten virtual currency trading apps? The latest digital currency exchange rankings Apr 28, 2025 pm 08:03 PM

The top ten digital currency exchanges such as Binance, OKX, gate.io have improved their systems, efficient diversified transactions and strict security measures.

Which of the top ten currency trading platforms in the world are among the top ten currency trading platforms in 2025 Which of the top ten currency trading platforms in the world are among the top ten currency trading platforms in 2025 Apr 28, 2025 pm 08:12 PM

The top ten cryptocurrency exchanges in the world in 2025 include Binance, OKX, Gate.io, Coinbase, Kraken, Huobi, Bitfinex, KuCoin, Bittrex and Poloniex, all of which are known for their high trading volume and security.

How to measure thread performance in C? How to measure thread performance in C? Apr 28, 2025 pm 10:21 PM

Measuring thread performance in C can use the timing tools, performance analysis tools, and custom timers in the standard library. 1. Use the library to measure execution time. 2. Use gprof for performance analysis. The steps include adding the -pg option during compilation, running the program to generate a gmon.out file, and generating a performance report. 3. Use Valgrind's Callgrind module to perform more detailed analysis. The steps include running the program to generate the callgrind.out file and viewing the results using kcachegrind. 4. Custom timers can flexibly measure the execution time of a specific code segment. These methods help to fully understand thread performance and optimize code.

Decryption Gate.io Strategy Upgrade: How to Redefine Crypto Asset Management in MeMebox 2.0? Decryption Gate.io Strategy Upgrade: How to Redefine Crypto Asset Management in MeMebox 2.0? Apr 28, 2025 pm 03:33 PM

MeMebox 2.0 redefines crypto asset management through innovative architecture and performance breakthroughs. 1) It solves three major pain points: asset silos, income decay and paradox of security and convenience. 2) Through intelligent asset hubs, dynamic risk management and return enhancement engines, cross-chain transfer speed, average yield rate and security incident response speed are improved. 3) Provide users with asset visualization, policy automation and governance integration, realizing user value reconstruction. 4) Through ecological collaboration and compliance innovation, the overall effectiveness of the platform has been enhanced. 5) In the future, smart contract insurance pools, forecast market integration and AI-driven asset allocation will be launched to continue to lead the development of the industry.

How much is Bitcoin worth How much is Bitcoin worth Apr 28, 2025 pm 07:42 PM

Bitcoin’s price ranges from $20,000 to $30,000. 1. Bitcoin’s price has fluctuated dramatically since 2009, reaching nearly $20,000 in 2017 and nearly $60,000 in 2021. 2. Prices are affected by factors such as market demand, supply, and macroeconomic environment. 3. Get real-time prices through exchanges, mobile apps and websites. 4. Bitcoin price is highly volatile, driven by market sentiment and external factors. 5. It has a certain relationship with traditional financial markets and is affected by global stock markets, the strength of the US dollar, etc. 6. The long-term trend is bullish, but risks need to be assessed with caution.

See all articles