With just 3 samples and a sentence, AI can customize photo-realistic images: Google's new take on diffusion models


WBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWBOYWB

Apr 12, 2023 pm 03:46 PM


Text-to-image models have recently become a popular research direction. Whether the target is a sweeping natural landscape or a novel imagined scene, it can often be generated automatically from a simple text description.

Among these tasks, rendering imaginative scenes is particularly challenging: it requires compositing instances of specific subjects (objects, animals, etc.) into new scenes so that they look natural and blend seamlessly into their surroundings.

Large-scale text-to-image models achieve high-quality, diverse image synthesis from text prompts written in natural language. Their main advantage is the strong semantic prior learned from a large collection of image-caption pairs, for example associating the word "dog" with the many instances of dogs that can appear in different poses in an image.

Although the synthesis capabilities of these models are unprecedented, they cannot mimic a given reference subject and synthesize new renditions of that same subject in different scenes. In other words, the expressiveness of these models' output domain is limited.


To address this problem, researchers from Google and Boston University proposed DreamBooth, a "personalized" text-to-image diffusion model that adapts to users' specific image generation needs.

Paper: https://arxiv.org/pdf/2208.12242.pdf

Code (Stable Diffusion implementation): https://github.com/XavierXiao/Dreambooth-Stable-Diffusion

The goal of this research is to expand the model's language-vision dictionary so that it binds new words to the specific subjects the user wants to generate. Once these new words are embedded in the model, it can use them to synthesize novel, photorealistic images of a subject in different scenes while preserving its key identifying features, as shown in Figure 1 below.

[Figure 1]

Specifically, the study implants a given subject into the model's output domain so that it can be synthesized using a unique identifier. To do so, it represents the subject with a rare-token identifier and fine-tunes a pretrained, diffusion-based text-to-image framework that operates in two steps: it first generates a low-resolution image from text, then applies a super-resolution (SR) diffusion model.

The study first fine-tunes the low-resolution text-to-image model on the input images, paired with text prompts that contain the unique identifier followed by the subject's class name (e.g., "A [V] dog"). To prevent the model from overfitting (tying the class name to the specific instance) and from semantic drift, the study proposes a self-generated, class-specific prior-preservation loss. This loss exploits the semantic prior on the class that is already embedded in the model, encouraging it to keep generating diverse instances of the same class as the subject.
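The data side of this step can be sketched as follows: each input image is paired with the identifier prompt, and the frozen pretrained model is sampled with the bare class prompt to produce the self-generated prior images. This is a minimal sketch under assumptions: `generate` is a hypothetical stand-in for sampling the frozen model, images are treated as opaque objects, and the default number of prior samples is an arbitrary placeholder, not a value taken from the paper.

```python
from typing import Callable, List, Tuple

Image = object  # placeholder type: any image representation (array, tensor, file path, ...)

def build_finetune_set(
    subject_images: List[Image],
    identifier: str,
    class_noun: str,
    generate: Callable[[str], Image],  # hypothetical: one sample from the *frozen* pretrained model
    num_prior: int = 100,              # hypothetical number of class-prior samples
) -> Tuple[List[Tuple[Image, str]], List[Tuple[Image, str]]]:
    """Return (subject_pairs, prior_pairs) for prior-preserving fine-tuning."""
    subject_prompt = f"a {identifier} {class_noun}"  # e.g. "a [V] dog"
    class_prompt = f"a {class_noun}"                 # e.g. "a dog", with no identifier
    subject_pairs = [(img, subject_prompt) for img in subject_images]
    # Self-generated prior: the unmodified model's own samples for the bare class
    # prompt, produced before fine-tuning changes its weights.
    prior_pairs = [(generate(class_prompt), class_prompt) for _ in range(num_prior)]
    return subject_pairs, prior_pairs
```

Generating the prior set before fine-tuning starts means it records what the original model associates with the class name, which is exactly what the prior-preservation term tries to keep intact.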

In the second step, the study fine-tunes the super-resolution component on pairs of low- and high-resolution versions of the input images. This lets the model preserve small but important details of the subject.
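The text only says that low- and high-resolution versions of the input images are used. The sketch below shows one way such training pairs could be built, assuming average pooling produces the low-resolution side and a 4x factor (64x64 to 256x256, the first super-resolution stage in Imagen; which stage is fine-tuned here is an assumption). Images are single-channel nested lists purely for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]  # single-channel image as rows of pixel values

def avg_pool(img: Grid, factor: int) -> Grid:
    """Downsample by averaging non-overlapping factor x factor blocks."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    return [
        [
            sum(img[r + dr][c + dc] for dr in range(factor) for dc in range(factor)) / factor**2
            for c in range(0, w, factor)
        ]
        for r in range(0, h, factor)
    ]

def make_sr_pairs(images: List[Grid], factor: int = 4) -> List[Tuple[Grid, Grid]]:
    """Pair each high-res input image with a low-res copy: (low_res_input, high_res_target)."""
    return [(avg_pool(img, factor), img) for img in images]
```

Because the pairs come from the subject's own photos, the SR model learns to restore that subject's fine details rather than hallucinating generic texture.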

Let’s take a look at the specific methods proposed in this study.

Method Introduction

Given 3-5 casually captured images of a subject, without any text descriptions, the goal is to generate new images of that subject with high detail fidelity, with the variations guided by text prompts. The study places no restrictions on the input images, which may show the subject in different contexts. The method is shown in Figure 3. The outputs can modify the original images, for example changing the subject's location, changing properties such as its color and shape, or applying other semantic modifications to its pose, expression, material, and so on.

More specifically, the method takes as input a few images (typically 3-5) of a subject (e.g., a specific dog) and the corresponding class name (e.g., "dog"), and returns a fine-tuned, personalized text-to-image model that encodes a unique identifier referring to the subject. At inference time, the identifier can then be embedded in different sentences to synthesize the subject in different contexts.

[Figure 3]

The first task is to implant the subject instance into the model's output domain and bind it to a unique identifier. The study proposes a method for designing such identifiers, as well as a new approach to supervising the fine-tuning process.

To address overfitting to the input images and language drift, the study also proposes a prior-preservation loss, which encourages the diffusion model to keep generating diverse instances of the same class as the subject.
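A minimal single-sample sketch of the combined objective, following the x-prediction formulation in the DreamBooth paper as we understand it: the first term reconstructs a noised subject image under its identifier prompt, and the second reconstructs a noised, self-generated class image under the plain class prompt, weighted by λ (`lam`). Everything here is illustrative: `denoise` is a hypothetical stand-in for the diffusion model, images are flattened lists of floats, and the caller supplies the noise-schedule values α and σ.

```python
import random
from typing import Callable, List, Optional, Sequence

Vec = List[float]  # a flattened image, for illustration only

def forward_noise(x: Sequence[float], alpha: float, sigma: float, eps: Sequence[float]) -> Vec:
    """Diffusion forward process: z_t = alpha_t * x + sigma_t * eps."""
    return [alpha * xi + sigma * ei for xi, ei in zip(x, eps)]

def sq_err(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 distance between two flattened images."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def prior_preservation_loss(
    denoise: Callable[[Vec, str], Vec],  # hypothetical x-prediction model: (z_t, prompt) -> x_hat
    x: Sequence[float], c: str,          # subject image and its "a [V] dog" prompt
    x_pr: Sequence[float], c_pr: str,    # self-generated class image and its "a dog" prompt
    alpha: float, sigma: float,          # noise level sampled for the subject term
    alpha_pr: float, sigma_pr: float,    # independently sampled noise level for the prior term
    w: float = 1.0, w_pr: float = 1.0,   # per-timestep weights
    lam: float = 1.0,                    # relative weight of the prior term
    rng: Optional[random.Random] = None,
) -> float:
    rng = rng or random.Random()
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    eps_pr = [rng.gauss(0.0, 1.0) for _ in x_pr]
    subject_term = w * sq_err(denoise(forward_noise(x, alpha, sigma, eps), c), x)
    prior_term = w_pr * sq_err(denoise(forward_noise(x_pr, alpha_pr, sigma_pr, eps_pr), c_pr), x_pr)
    return subject_term + lam * prior_term
```

Without the second term, nothing stops the model from drifting toward the few subject photos whenever it sees the class word; the prior term keeps its original class samples reconstructible.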

To preserve fine image details, the study found that the super-resolution (SR) components of the model should also be fine-tuned. The work builds on the pretrained Imagen model. As shown in Figure 4, given 3-5 images of the same subject, the text-to-image diffusion model is fine-tuned in two steps:

[Figure 4]

Representing the subject with a rare-token identifier

The study labels all input images of the subject as "a [identifier] [class noun]", where [identifier] is a unique identifier linked to the subject and [class noun] is a coarse class descriptor of the subject (e.g., cat, dog, watch). The class descriptor is deliberately included in the sentence so that the class's prior becomes tied to the subject.
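The intuition behind rare tokens, as we understand the paper, is that they carry little prior meaning in the model, so binding one to a new subject disturbs little else. The sketch below samples an identifier from a low-frequency band of a tokenizer vocabulary and builds the shared caption. The frequency-sorted vocabulary, the rank window `lo`/`hi`, and `num_tokens` are hypothetical inputs; the text above does not give the exact selection rule.

```python
import random
from typing import List, Optional

def pick_rare_identifier(
    vocab_by_frequency: List[str],  # hypothetical: vocabulary sorted from most to least frequent
    lo: int, hi: int,               # hypothetical rank window treated as "rare"
    num_tokens: int = 1,
    rng: Optional[random.Random] = None,
) -> str:
    """Concatenate num_tokens tokens sampled from ranks [lo, hi) of the vocabulary."""
    rng = rng or random.Random()
    window = vocab_by_frequency[lo:hi]
    if not window:
        raise ValueError("empty rank window")
    return "".join(rng.choice(window) for _ in range(num_tokens))

def caption(identifier: str, class_noun: str) -> str:
    """Label shared by every input image: 'a [identifier] [class noun]'."""
    return f"a {identifier} {class_noun}"
```

For example, `caption("sks", "dog")` yields `"a sks dog"`, the same pattern as the prompts used in the Stable Diffusion results below.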

Results

The following qualitative results come from a Stable Diffusion implementation of DreamBooth (see the project link above). The training images are taken from the "Textual Inversion" dataset:

[Image: training images of the container]

After training, given the prompt "photo of a sks container", the model generates the following container photos:

[Image: samples for "photo of a sks container"]

Adding a location to the prompt, as in "photo of a sks container on the beach", places the container on the beach:

[Image: samples for "photo of a sks container on the beach"]

If the green container looks too plain, the prompt "photo of a red sks container" turns it red:

[Image: samples for "photo of a red sks container"]

The prompt "a dog on top of sks container" puts a puppy on the container:

[Image: samples for "a dog on top of sks container"]
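All four prompts above follow one pattern: the learned identifier and class noun are dropped into an otherwise ordinary sentence. The small helper below makes that explicit. The `{subject}` template convention is our own illustration; the resulting strings would be passed to whatever sampling script the implementation provides.

```python
def compose_prompt(template: str, identifier: str = "sks", class_noun: str = "container") -> str:
    """Fill '{subject}' in a prompt template with '<identifier> <class noun>'."""
    return template.format(subject=f"{identifier} {class_noun}")

templates = [
    "photo of a {subject}",
    "photo of a {subject} on the beach",
    "photo of a red {subject}",
    "a dog on top of {subject}",
]
for t in templates:
    print(compose_prompt(t))
```

Keeping the identifier and class noun together matters: the fine-tuning captions always used them as a pair, so the prompts at inference time do the same.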

The following results are from the paper. Here the model renders the subject dog in the styles of different artists:

[Image: the subject dog rendered in different artists' styles]

The method can also synthesize expressions that never appear in the input images, demonstrating the model's ability to extrapolate:

[Image: novel expressions of the subject dog]

For more details, please refer to the original paper.
