Without training, this new method achieves freedom in generating image sizes and resolutions.

Apr 08, 2024 pm 04:52 PM

Diffusion models have recently surpassed GANs and autoregressive models to become the mainstream choice for generative modeling, thanks to their excellent performance. Diffusion-based text-to-image models such as SD, SDXL, Midjourney, and Imagen have demonstrated an impressive ability to generate high-quality images. Typically, these models are trained at a specific resolution to ensure efficient processing and accurate training on existing hardware.

Figure 1: Comparison of different methods used to generate 2048×2048 images under SDXL 1.0. [1]

These diffusion models often produce repeated patterns and severe artifacts, as shown on the far left of Figure 1, and the problems become particularly acute beyond the training resolution.

In a recent paper, researchers from the Chinese University of Hong Kong-SenseTime Joint Laboratory and other institutions studied in depth the convolutional layers of the UNet architecture commonly used in diffusion models and, from the perspective of frequency-domain analysis, proposed FouriScale, as shown in Figure 2.

Figure 2: Schematic diagram of FouriScale's process (orange line), which ensures consistency across resolutions.

By replacing the original convolutional layers in the pre-trained diffusion model with dilated convolution and low-pass filtering operations, FouriScale achieves structural and scale consistency across resolutions. Combined with a "fill then crop" strategy, the method can flexibly generate images of different sizes and aspect ratios. Furthermore, with FouriScale as guidance, the method preserves complete image structure and high image quality when generating high-resolution images of arbitrary size. FouriScale requires no offline pre-computation and offers good compatibility and scalability.

Quantitative and qualitative experimental results demonstrate that FouriScale achieves significant improvements in generating high-resolution images using pre-trained diffusion models.

  • Paper address: https://arxiv.org/abs/2403.12963
  • Open source code: https://github.com/LeonHLJ/FouriScale
  • Paper title: FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Method introduction

1. Atrous convolution ensures structural consistency across resolutions

The denoising network of a diffusion model is usually trained on images or latents at a specific resolution, and it typically adopts a U-Net architecture. The authors aim to reuse the parameters of this network at inference time to generate higher-resolution images without retraining. To avoid structural distortion at the inference resolution, they seek structural consistency between the default resolution and the higher resolution. For a convolutional layer in the U-Net, structural consistency can be expressed as:

$$\mathrm{Down}_s(F) \circledast k = \mathrm{Down}_s(F \circledast k') \tag{3}$$

where F is the high-resolution input feature map, Down_s denotes spatial downsampling by a factor of s, ⊛ denotes convolution, k is the original convolution kernel, and k' is a new convolution kernel customized for the larger resolution. Spatial downsampling has the following frequency-domain representation, written here with explicit frequency shifts:

$$\hat{F}'(u, v) = \frac{1}{s^2} \sum_{i=0}^{s-1} \sum_{j=0}^{s-1} \hat{F}\left(u + \frac{iM}{s},\; v + \frac{jN}{s}\right)$$

Here \hat{F} is the M×N Fourier spectrum of F and \hat{F}' is the (M/s)×(N/s) spectrum of Down_s(F). In other words, the downsampled spectrum is the average of the s×s equal blocks of the original spectrum.

Formula (3) can then be written as:

$$\left(\frac{1}{s^2} \sum_{i=0}^{s-1} \sum_{j=0}^{s-1} \hat{F}\left(u + \frac{iM}{s},\; v + \frac{jN}{s}\right)\right) \hat{k}(u, v) = \frac{1}{s^2} \sum_{i=0}^{s-1} \sum_{j=0}^{s-1} \hat{F}\left(u + \frac{iM}{s},\; v + \frac{jN}{s}\right) \hat{k}'\left(u + \frac{iM}{s},\; v + \frac{jN}{s}\right)$$

This formula shows that the Fourier spectrum of the ideal convolution kernel k' should be stitched together from s×s copies of the Fourier spectrum of the convolution kernel k. In other words, the Fourier spectrum of k' should repeat periodically, and the repeating pattern is the Fourier spectrum of k.

The widely used dilated (atrous) convolution meets exactly this requirement. Its frequency-domain periodicity can be expressed as:

$$\hat{k}'(u, v) = \hat{k}\left(u \bmod \frac{M}{s},\; v \bmod \frac{N}{s}\right)$$

where k' is the kernel k dilated by a factor of s.

When a pre-trained diffusion model with training resolution (h, w) is used to generate a high-resolution image of size (H, W), the dilated convolution reuses the parameters of the original convolution kernel with a dilation factor of (H/h, W/w), which gives the ideal convolution kernel k'.
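The consistency condition can be checked numerically in one dimension. The sketch below is illustrative only and is not taken from the FouriScale code: it assumes circular convolution, a made-up 3-tap kernel and feature row, and helper names (`circ_conv`, `dilate`, `downsample`) invented for this example. It checks that convolving with the dilated kernel and then downsampling by s matches downsampling first and convolving with the original kernel, while the undilated kernel does not.

```python
def circ_conv(signal, kernel):
    """Circular convolution: out[t] = sum_a kernel[a] * signal[(t - a) mod n]."""
    n = len(signal)
    return [sum(kernel[a] * signal[(t - a) % n] for a in range(len(kernel)))
            for t in range(n)]


def dilate(kernel, s):
    """Insert s - 1 zeros between taps, reusing the original weights unchanged."""
    out = [0.0] * ((len(kernel) - 1) * s + 1)
    for a, tap in enumerate(kernel):
        out[a * s] = tap
    return out


def downsample(signal, s):
    """Keep every s-th sample."""
    return signal[::s]


s = 2                                          # dilation factor, playing the role of H / h
k = [0.25, 0.5, -1.0]                          # hypothetical "pre-trained" kernel
F = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]   # hypothetical high-resolution feature row

low_res_path = circ_conv(downsample(F, s), k)               # Down_s(F) conv k
high_res_path = downsample(circ_conv(F, dilate(k, s)), s)   # Down_s(F conv k')
naive_path = downsample(circ_conv(F, k), s)                 # undilated kernel at high resolution

consistent = all(abs(a - b) < 1e-12 for a, b in zip(low_res_path, high_res_path))
print("dilated kernel consistent:", consistent)
print("undilated kernel consistent:", naive_path == low_res_path)
```

The dilated kernel only ever touches samples that survive downsampling, which is why the two paths agree; the undilated kernel mixes in neighbouring high-resolution samples and breaks the equality.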

2. Low-pass filtering ensures scale consistency across resolutions

However, dilated (atrous) convolution alone cannot solve the problem completely. As shown in the upper left of Figure 3, using only dilated convolution still leaves repeated patterns in the details. The authors attribute this to the frequency aliasing caused by spatial downsampling, which changes the frequency components and leads to different frequency-domain distributions at different resolutions. To ensure scale consistency across resolutions, they introduce low-pass filtering, which removes the high-frequency components that would otherwise alias after spatial downsampling. The comparison curves on the right of Figure 3 show that, with low-pass filtering, the frequency distributions at high and low resolution are closer, which ensures a consistent scale. The lower left of Figure 3 shows that low-pass filtering markedly reduces the pattern repetition in the details.

Figure 3: (a) Visual comparison with and without low-pass filtering. (b) Relative log-amplitude curves of the Fourier spectrum without low-pass filtering. (c) Relative log-amplitude curves of the Fourier spectrum with low-pass filtering.
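The aliasing effect described above can be reproduced on a toy one-dimensional signal. This is a minimal sketch under stated assumptions: it uses an ideal (brick-wall) low-pass filter applied in the DFT domain, which is not necessarily the filter the authors use, and the signal and the helper names (`dft`, `idft`, `ideal_low_pass`) are invented for illustration.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * u * t / n) for t in range(n))
            for u in range(n)]


def idft(spec):
    """Naive inverse discrete Fourier transform."""
    n = len(spec)
    return [sum(spec[u] * cmath.exp(2j * cmath.pi * u * t / n) for u in range(n)) / n
            for t in range(n)]


def ideal_low_pass(x, s):
    """Zero every DFT bin whose |frequency| is at or above n / (2 * s)."""
    n = len(x)
    cutoff = n // (2 * s)
    spec = dft(x)
    for u in range(n):
        freq = min(u, n - u)      # distance from the zero-frequency bin
        if freq >= cutoff:
            spec[u] = 0.0
    return [v.real for v in idft(spec)]


n, s = 16, 2
# Hypothetical signal: a low-frequency cosine (bin 1) plus a high-frequency one (bin 6).
x = [math.cos(2 * math.pi * 1 * t / n) + math.cos(2 * math.pi * 6 * t / n)
     for t in range(n)]

aliased = [abs(v) for v in dft(x[::s])]                      # downsample directly
filtered = [abs(v) for v in dft(ideal_low_pass(x, s)[::s])]  # low-pass, then downsample

print("bin 2 without low-pass:", round(aliased[2], 6))
print("bin 2 with low-pass:   ", round(filtered[2], 6))
```

Without the filter, the bin-6 component folds onto bin 2 of the downsampled spectrum and looks like genuine low-frequency content; with the filter, only the original bin-1 component survives.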

3. Suitable for image generation of any size

The above method applies only when the aspect ratio of the target resolution matches that of the default inference resolution. To adapt FouriScale to image generation of any size, the authors adopt a "fill then crop" strategy. Method 1 gives the pseudocode of FouriScale combined with this strategy.
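One plausible reading of "fill then crop" is sketched below. The article does not spell out the details, so several choices here are assumptions rather than the authors' procedure: a single shared scale factor taken as the larger of the two rounded-up per-axis ratios, zero padding on the bottom and right, and a placeholder `layer` callable standing in for the low-pass filter plus dilated convolution. The function name `fill_then_crop` is also hypothetical.

```python
import math


def fill_then_crop(feature, train_hw, layer):
    """Zero-pad `feature` to an integer multiple of the training size,
    apply `layer`, then crop the result back to the original size."""
    H, W = len(feature), len(feature[0])
    h, w = train_hw
    r = max(math.ceil(H / h), math.ceil(W / w))   # assumed: one shared scale factor
    padded_h, padded_w = r * h, r * w
    # Fill: pad each row on the right, then append zero rows at the bottom.
    padded = [row + [0.0] * (padded_w - W) for row in feature]
    padded += [[0.0] * padded_w for _ in range(padded_h - H)]
    out = layer(padded, r)
    # Crop: keep only the region that corresponds to the requested size.
    cropped = [row[:W] for row in out[:H]]
    return cropped, r, (padded_h, padded_w)


# Hypothetical 3x6 feature map with a 2x2 "training" feature size.
feature = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
           [7.0, 8.0, 9.0, 1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]]
identity = lambda padded, r: padded               # placeholder for the real layer

cropped, r, padded_hw = fill_then_crop(feature, (2, 2), identity)
print(r, padded_hw)
```

After padding, the feature map has the same aspect ratio as the training resolution, so a single dilation factor applies to both axes; cropping restores the requested size.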

4. FouriScale guidance

The frequency-domain operations in FouriScale inevitably cause loss of detail and undesirable artifacts in the generated images. To address this, as shown in Figure 4, the authors propose using FouriScale as guidance. Specifically, in addition to the original conditional and unconditional generation estimates, they introduce an additional conditional generation estimate. This additional estimate is also generated with dilated convolution, but with a milder low-pass filter so that details are not lost. At the same time, the attention scores of the conditional estimate output by FouriScale replace the attention scores in this additional estimate. Since attention scores carry the structural information of the generated image, this operation introduces the correct image structure while preserving image quality.

Figure 4: (a) Overview of FouriScale guidance. (b) Image generated without FouriScale guidance, showing obvious artifacts and errors in detail. (c) Image generated with FouriScale guidance.

Experiment

1. Quantitative test results

Following the protocol of [1], the authors tested three text-to-image models (SD 1.5, SD 2.1, and SDXL 1.0) at four higher resolutions, with 4x, 6.25x, 8x, and 16x as many pixels as the respective training resolution. The results on 30000/10000 image-text pairs randomly sampled from Laion-5B are shown in Table 1:

Table 1: Quantitative comparison of different training-free methods

Their method achieved the best results across the pre-trained models and resolutions tested.
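To make the pixel-count multiples quoted above concrete, the short calculation below relates a target size to a training size. The 512×512 training resolution and the four target sizes are illustrative assumptions chosen to reproduce the 4x, 6.25x, 8x, and 16x multiples; they are not quoted from the article. The per-axis ratios correspond to the dilation factor (H/h, W/w) from the method section.

```python
def pixel_multiple(target_hw, train_hw):
    """Ratio of pixel counts between a target size and the training size."""
    H, W = target_hw
    h, w = train_hw
    return (H * W) / (h * w)


train = (512, 512)                      # assumed training resolution
targets = [(1024, 1024), (1280, 1280),  # assumed target sizes
           (1024, 2048), (2048, 2048)]

multiples = [pixel_multiple(t, train) for t in targets]
per_axis = [(t[0] / train[0], t[1] / train[1]) for t in targets]

for t, m, d in zip(targets, multiples, per_axis):
    print(t, "pixel multiple:", m, "per-axis factor:", d)
```

Under these assumptions, the 6.25x case gives a non-integer per-axis factor of 2.5 and the 8x case changes the aspect ratio, which are the situations the "fill then crop" strategy is meant to handle.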

2. Qualitative test results

As shown in Figure 5, their method maintains image quality and a consistent structure for each pre-trained model at different resolutions.

Figure 5: Comparison of images generated by different training-free methods

Conclusion

This paper proposes FouriScale to enhance the ability of pre-trained diffusion models to generate high-resolution images. Starting from a frequency-domain analysis, FouriScale improves structural and scale consistency across resolutions through dilated convolution and low-pass filtering, addressing key challenges such as repeated patterns and structural distortion. The "fill then crop" strategy and the use of FouriScale as guidance improve the flexibility and quality of text-to-image generation while accommodating different aspect ratios. Quantitative and qualitative comparisons show that FouriScale delivers higher image generation quality across different pre-trained models and resolutions.
