Large-scale text-to-image (T2I) diffusion models have made significant advances in generating images from text prompts, benefiting from vast amounts of training data and compute. However, generating images that match user expectations, and editing existing images, remain challenging.
Image editing demands different capabilities than image generation. GAN-based methods have been widely used for image editing because their compact latent space can be manipulated efficiently. Diffusion models, however, offer greater stability and higher-quality output than GANs.
In a recent research paper from Peking University and ARC Lab, Tencent PCG, the authors investigate whether diffusion models can likewise be used to edit images. The key requirement is a compact and editable latent space. Many diffusion-based editing approaches exploit the correspondence between intermediate text features and image features; the study reveals a strong local resemblance between word features and object features that can be leveraged for editing.
Beyond the strong correlation between text features and intermediate image features in large-scale T2I diffusion models, there is also a robust correspondence among the intermediate image features themselves. This correspondence was explored in a method called DIFT, which demonstrated a high degree of similarity between corresponding image elements. The research team builds on this property for image editing.
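The core idea behind DIFT-style correspondence can be illustrated with a minimal sketch: given two feature maps extracted from a diffusion model's intermediate layers, the location in the second map that best matches a query point in the first is found by cosine similarity. The function name, feature shapes, and toy data below are illustrative assumptions, not the paper's implementation, which operates on real UNet features.

```python
import numpy as np

def find_correspondence(feat_a, feat_b, point):
    """Return the (row, col) in feat_b whose feature vector is most
    similar (cosine similarity) to the feature at `point` in feat_a.
    feat_a, feat_b: (H, W, C) feature maps; point: (row, col) tuple."""
    query = feat_a[point]                                   # (C,)
    query = query / (np.linalg.norm(query) + 1e-8)
    flat = feat_b.reshape(-1, feat_b.shape[-1])             # (H*W, C)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)
    sims = flat @ query                                     # (H*W,)
    idx = int(np.argmax(sims))
    return tuple(int(i) for i in np.unravel_index(idx, feat_b.shape[:2]))

# Toy check: plant the query feature at a known location in feat_b.
rng = np.random.default_rng(0)
feat_a = rng.normal(size=(8, 8, 16))
feat_b = rng.normal(size=(8, 8, 16))
feat_b[2, 3] = feat_a[5, 5]
print(find_correspondence(feat_a, feat_b, (5, 5)))  # → (2, 3)
```

In the actual method, such matches between the source and target regions define where edited content should move, while the rest of the map anchors content that should stay put.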
To adapt the diffusion model's intermediate representation to image editing, the researchers propose a strategy called DragonDiffusion. It uses classifier guidance to convert editing signals into gradients via a feature correspondence loss, employing two groups of features, guidance features and generation features, at different stages. The strong image feature correspondence steers the revision and refinement of the generation features, which also helps maintain content consistency between the edited image and the original.
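In classifier-guidance form, each denoising step is nudged by the gradient of an energy function, here a correspondence loss between current and target features. The sketch below is a hypothetical stand-in: it uses a finite-difference gradient and a plain L2 loss for clarity, whereas the actual method backpropagates the feature correspondence loss through the diffusion UNet.

```python
import numpy as np

def guidance_step(x_t, target_feat, extract_feat, scale=0.1, eps=1e-4):
    """One classifier-guidance-style update: move the latent x_t so that
    its extracted features approach target_feat, by descending a simple
    L2 correspondence loss (gradient via finite differences; illustrative
    only, real implementations use automatic differentiation)."""
    def loss(x):
        return float(np.sum((extract_feat(x) - target_feat) ** 2))
    grad = np.zeros_like(x_t)
    it = np.nditer(x_t, flags=["multi_index"])
    for _ in it:
        i = it.multi_index
        x_plus = x_t.copy()
        x_plus[i] += eps
        grad[i] = (loss(x_plus) - loss(x_t)) / eps
    return x_t - scale * grad

# Toy usage: with an identity feature extractor, the latent is pulled
# toward the target and the correspondence loss decreases.
x = np.array([1.0, 2.0])
x_new = guidance_step(x, target_feat=np.zeros(2), extract_feat=lambda z: z)
print(x_new)
```

Because every signal is expressed as a gradient on the latent, no extra networks or fine-tuning are needed, which is the property the paper emphasizes.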
The study also acknowledges a related work, DragDiffusion, which explores a similar task. DragDiffusion uses LoRA to preserve the original appearance and performs editing by optimizing an intermediate latent in the diffusion process. In contrast, DragonDiffusion relies on classifier guidance and derives all editing and content-consistency signals directly from the image, requiring no model fine-tuning or training.
Because DragonDiffusion extracts all content-editing and content-preservation signals from the original image, the generation capability of T2I diffusion models transfers directly to image editing. Extensive experiments show that DragonDiffusion successfully performs a range of fine-grained editing tasks, including object resizing, repositioning, appearance replacement, and content dragging.