Introducing InstaFlow: A One-Step AI Model for Text-to-Image Generation
Text-to-image generation has been transformed by diffusion models, which offer exceptional quality and creativity. However, they typically require many sampling steps to produce a good image, making inference slow. To address this, researchers have developed InstaFlow, a one-step generative model derived from Stable Diffusion (SD).
The Challenge and Solution
While distilling the SD model, the researchers faced a significant obstacle: the coupling between noise samples and images was suboptimal, which hindered distillation. To overcome this, they turned to Rectified Flow, a generative-modeling framework based on probability flows. Rectified Flow uses a procedure called reflow, which straightens the trajectories of the probability flow and thereby reduces the transport cost between the noise distribution and the image distribution. The improved coupling makes the subsequent distillation far more effective.
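To make the reflow idea concrete, here is a minimal sketch on a 1-D toy distribution. Everything here is an illustrative assumption, not InstaFlow's implementation: the sample sizes, the toy distributions, and especially the constant mean velocity standing in for a learned velocity network.

```python
# Toy 1-D sketch of Rectified Flow's straight-line interpolation and the
# reflow re-pairing step. Names and the "learned" velocity are illustrative
# stand-ins, not InstaFlow's actual code.
import numpy as np

rng = np.random.default_rng(0)

def straight_interp(x0, x1, t):
    """Rectified Flow trains a velocity field along the straight line
    between a noise sample x0 and a data sample x1."""
    return (1.0 - t) * x0 + t * x1

# Independently paired noise and "data" samples: the suboptimal coupling
# described above.
x0 = rng.standard_normal(1000)                 # noise
x1 = 3.0 + 0.1 * rng.standard_normal(1000)     # toy "images"

# The regression target for the velocity field is constant along each path:
# d/dt [(1 - t) * x0 + t * x1] = x1 - x0.
t = rng.uniform(size=1000)
xt = straight_interp(x0, x1, t)
target_v = x1 - x0

# Reflow: run the learned flow from each x0 to get a new endpoint, then
# re-pair (x0, new_x1). The new coupling has lower transport cost, so the
# trajectories become straighter and easier to distill into one step.
# For brevity, a constant mean velocity stands in for the learned model.
mean_v = target_v.mean()
new_x1 = x0 + mean_v   # a single Euler step of the (toy) flow
```

After re-pairing, each noise sample is matched to an endpoint it can reach along a nearly straight path, which is exactly the property that makes one-step distillation viable.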
InstaFlow has shown remarkable performance on multiple benchmarks. On the MS COCO 2017-5k dataset, it achieved an FID of 23.3, a substantial improvement over the previous state-of-the-art technique, progressive distillation (37.2 → 23.3 in FID). With an expanded network of 1.7 billion parameters, the researchers further improved the FID to 22.4.
On the MS COCO 2014-30k dataset, InstaFlow outperformed the recent StyleGAN-T model, reaching an FID of 13.1 in just 0.09 seconds, the best result in the ≤ 0.1-second category. Notably, InstaFlow achieved this with a relatively modest training cost of only 199 A100 GPU days.
Building on InstaFlow's success, the researchers outline several directions for future work:
- Improving One-Step SD: By scaling up the dataset, model size, and training duration, researchers believe the performance of one-step SD can be significantly improved.
- One-Step ControlNet: Applying the InstaFlow pipeline to train ControlNet models can enable the generation of controllable content within milliseconds.
- Personalization for One-Step Models: Fine-tuning with the diffusion training objective and LoRA allows users to customize the pre-trained one-step SD model to generate specific content and styles.
- Neural Network Structure for One-Step Generation: Exploring alternative one-step structures, such as successful architectures used in GANs, and leveraging techniques like pruning and quantization can potentially enhance the quality and efficiency of one-step generation.
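The payoff of the straightened trajectories discussed above is that a single Euler step can replace many. The toy sketch below uses a hand-built `velocity_model` whose trajectories are perfectly straight (the idealization that reflow approximates); it is a hypothetical illustration, not the actual InstaFlow network. A 25-step sampler and a one-step sampler land on the same result:

```python
# Toy comparison of multi-step vs. one-step sampling along a straight
# probability-flow trajectory. velocity_model is a hand-built stand-in
# whose paths are exactly straight, the ideal case reflow approximates.
import numpy as np

rng = np.random.default_rng(1)

def velocity_model(x, t):
    # Velocity of a straight path from x toward the target value 3.0:
    # the remaining displacement (3 - x) spread over the remaining
    # time (1 - t).
    return (3.0 - x) / max(1.0 - t, 1e-6)

x0 = rng.standard_normal(4)   # four noise samples

# Multi-step Euler sampling, as in a standard diffusion/flow sampler.
x = x0.copy()
steps = 25
for i in range(steps):
    x = x + velocity_model(x, i / steps) / steps

# One-step sampling: a single Euler step across the whole interval.
# Because the trajectory is straight, it reaches the same endpoint.
x_one_step = x0 + velocity_model(x0, 0.0)
```

In InstaFlow, the role of `velocity_model` is played by the network distilled from SD, and `x` is a text-conditioned image latent rather than a scalar; when trajectories are curved, as in an undistilled diffusion model, the one-step shortcut would land far from the multi-step result.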