MultiDiffusion: Empowering Text-to-Image Models with Enhanced Control and Adaptability

Diffusion models are widely regarded as the state of the art in text-to-image generation, producing high-quality, diverse images from text prompts. Giving users control over the generated content, however, remains a challenge, and progress on this front has the potential to transform how digital content is created.

Currently, there are two main ways to add control to a diffusion model: training a model from scratch or fine-tuning an existing one. Both approaches require significant computation and long development cycles, a cost that only grows with model size and training data.

This study introduces MultiDiffusion, a unified framework that greatly improves the adaptability of a pre-trained diffusion model to controlled image generation. It enables flexible text-to-image generation and supports many kinds of control over the output, such as the aspect ratio and spatial guiding signals.

The goal of MultiDiffusion is to define a new generation process that binds together several reference diffusion generation processes through a shared set of parameters or constraints. At each step, the reference diffusion model is applied to every region of the target image to predict a denoising sampling step, and MultiDiffusion then performs a global denoising step that merges all of these per-region steps into a single update.
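To make the idea concrete, here is a minimal sketch (not the authors' released code) of what such a fusion step could look like in PyTorch. The names `multidiffusion_step`, `denoise_step`, and `crops` are hypothetical placeholders; the point is that each crop is denoised by the frozen reference model and the overlapping predictions are averaged back onto the full canvas.

```python
import torch

def multidiffusion_step(latent, t, crops, denoise_step):
    """latent: full-canvas latent (C, H, W); crops: list of (y, x, h, w) regions;
    denoise_step: one denoising step of the frozen reference model (assumed given)."""
    fused = torch.zeros_like(latent)    # accumulated per-crop predictions
    counts = torch.zeros_like(latent)   # how many crops cover each pixel

    for (y, x, h, w) in crops:
        crop = latent[:, y:y + h, x:x + w]
        denoised_crop = denoise_step(crop, t)          # reference model's step on this crop
        fused[:, y:y + h, x:x + w] += denoised_crop
        counts[:, y:y + h, x:x + w] += 1

    # Global step: per-pixel average of all overlapping per-crop predictions,
    # i.e. the closed-form solution of a least-squares fusion of the crop updates.
    return fused / counts.clamp(min=1)
```

Repeating this fused update over all denoising timesteps yields a full image whose every region is consistent with the reference model.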

For example, to create an image with an arbitrary aspect ratio using a reference diffusion model trained only on square images, MultiDiffusion fuses the denoising directions from different square crops into a single seamless image.
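As an illustration of that aspect-ratio example, the hypothetical helper below tiles a wide latent canvas with overlapping square windows at the reference model's training resolution. The crop size (64) and stride (32) are illustrative assumptions, not values taken from the paper.

```python
def square_crops(height, width, crop=64, stride=32):
    """Return (y, x, h, w) windows covering a `height` x `width` latent.
    Assumes height == crop and width >= crop (a wide, panorama-like canvas)."""
    xs = list(range(0, width - crop + 1, stride))
    if xs[-1] != width - crop:           # make sure the right edge is covered
        xs.append(width - crop)
    return [(0, x, crop, crop) for x in xs]

# e.g. a 64 x 256 latent canvas, far wider than the square training resolution
crops = square_crops(64, 256)
```

These overlapping windows would then be passed to the fusion step above at every denoising iteration, so neighboring crops agree in their shared regions and the final wide image has no visible seams.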

Using MultiDiffusion, a pre-trained reference text-to-image model can be applied to a variety of tasks, such as generating images at a specific resolution or aspect ratio, or generating images from rough region-based text prompts. The researchers' method achieves state-of-the-art controlled generation quality, even compared to approaches trained specifically for these tasks.

The complete codebase of MultiDiffusion will be released on their GitHub page, and more demos can be found on their project page. All credit for this work goes to the researchers behind it.
