Text-to-image (T2I) diffusion models have advanced rapidly, sparking interest in a range of generative tasks. One such task is inverting a pre-trained T2I model to capture the appearance of objects in reference images. Capturing object relations, meaning how objects interact and how an image is composed, has received far less attention. Existing inversion methods also suffer from entity leakage, where objects from the reference images leak into the generated results, which raises privacy concerns.
This study addresses that gap with the Relation Inversion task, which aims to learn the relation shared by a set of exemplar images. The goal is to derive a relation prompt in the text embedding space of a pre-trained T2I diffusion model, such that the objects in every exemplar image follow that relation. Combining the relation prompt with user-defined text prompts then lets users generate images that depict the same relation while customizing the objects, style, background, and more.
To help the relation prompt represent high-level relation concepts, the authors introduce a preposition prior. It rests on the observation that prepositions are closely tied to relations, and that complex relations can be expressed with a basic set of prepositions. Building on this prior, they propose ReVersion, a framework for the Relation Inversion problem. ReVersion combines two components: a relation-steering contrastive learning scheme that guides the relation prompt toward a relation-dense region of the text embedding space, and a relation-focal importance sampling strategy that emphasizes object interactions. The researchers also introduce the ReVersion Benchmark, a collection of exemplar images covering diverse relations.
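The two training components can be illustrated with a minimal NumPy sketch. It assumes (1) the steering loss takes an InfoNCE form over cosine similarities, with preposition embeddings as positives and other word embeddings as negatives, and (2) timesteps are drawn from a density that rises with t through a cosine skew, f(t) ∝ 1 − α·cos(πt/T), so that noisier steps (where, by common observation, coarse layout and interactions are decided) are sampled more often. The function names, the temperature of 0.07, and α = 0.5 are illustrative assumptions, and plain vectors stand in for real text embeddings.

```python
import numpy as np


def _normalize(x):
    # L2-normalize along the last axis so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def steering_contrastive_loss(relation_emb, positives, negatives, temperature=0.07):
    """Hypothetical InfoNCE-style steering loss (assumed form, not the paper's code).

    Pulls relation_emb toward preposition embeddings (positives, shape (P, D))
    and pushes it away from other word embeddings (negatives, shape (N, D)).
    """
    r = _normalize(relation_emb)
    pos_logits = _normalize(positives) @ r / temperature  # shape (P,)
    neg_logits = _normalize(negatives) @ r / temperature  # shape (N,)
    losses = []
    for p in pos_logits:
        # Contrast each positive against all negatives.
        logits = np.concatenate(([p], neg_logits))
        m = logits.max()
        log_denominator = m + np.log(np.exp(logits - m).sum())  # stable log-sum-exp
        losses.append(log_denominator - p)  # equals -log softmax(positive)
    # Averaging over positives is an assumption of this sketch.
    return float(np.mean(losses))


def relation_focal_weights(T, alpha=0.5):
    """Assumed cosine-skewed sampling density over timesteps t = 1..T.

    Weights increase with t, so high-noise timesteps are favored.
    """
    t = np.arange(1, T + 1)
    f = (1.0 - alpha * np.cos(np.pi * t / T)) / T
    return f / f.sum()  # renormalize so the probabilities sum to exactly 1


def sample_timesteps(T, n, alpha=0.5, rng=None):
    """Draw n training timesteps from the skewed density instead of uniformly."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(np.arange(1, T + 1), size=n, p=relation_focal_weights(T, alpha))
```

In an actual inversion loop, the steering loss would presumably be added to the usual denoising loss, with only the relation prompt's embedding updated while the pre-trained diffusion model stays frozen; the sketch leaves that loop out.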
The study reports results demonstrating ReVersion's effectiveness at relation inversion. Because Relation Inversion is a new task, there are no existing state-of-the-art methods designed for it to compare against. If you're interested in learning more about ReVersion, check out the paper and the project page.