What Are Cinemagraphs and How Can AI Help Create Them?
If you’re new to the term “cinemagraphs,” you might be wondering what they are. But chances are, you’ve already come across them. Cinemagraphs are captivating images in which certain elements repeat a continuous motion while the rest of the scene remains perfectly still. They’re neither photos nor videos, but something in between: a way to showcase dynamic scenes while preserving a specific moment. Lately, cinemagraphs have become popular on social media, websites, and even in virtual meetings.
Creating cinemagraphs is challenging. The traditional workflow starts from captured videos or images and uses semi-automated techniques to produce a seamlessly looping clip. It demands considerable effort from the user: capturing suitable footage, stabilizing the frames, selecting which regions should animate and which should stay static, and specifying motion directions.
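To make that workflow concrete, here is a minimal sketch of its final compositing step in Python, assuming a stabilized clip and a hand-drawn motion mask already exist (the file names `stabilized_clip.mp4` and `motion_mask.png` are placeholders, not from the paper):

```python
import numpy as np
import imageio.v3 as iio

# Hypothetical inputs: a stabilized clip and a hand-drawn binary mask that
# marks the region that should keep moving (e.g. a waterfall).
frames = iio.imread("stabilized_clip.mp4")     # (T, H, W, 3) uint8 frames
mask = iio.imread("motion_mask.png") > 127     # True where motion is kept
if mask.ndim == 3:                             # collapse RGB(A) masks to one channel
    mask = mask[..., 0]

still = frames[0]                              # the frozen reference frame
# Inside the mask, play the video; everywhere else, repeat the still frame.
loop = np.where(mask[None, ..., None], frames, still[None])

iio.imwrite("cinemagraph.gif", loop, duration=33, loop=0)  # ~30 fps, loop forever
```

Everything before this step, the capture, stabilization, and mask drawing, is exactly the manual labor described above.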
This has motivated a new research problem: synthesizing cinemagraphs directly from text descriptions, which would greatly reduce the reliance on data capture and manual editing. The proposed method aims to capture motion effects that are difficult to express through still photos or existing text-to-image techniques, expanding the possibilities of cinemagraph creation.
Current methods struggle to achieve this goal. One approach is to use a text-to-image model to generate an artistic image and then animate it, but existing animation methods fail to produce meaningful motions for artistic inputs. Another option is to use text-based video models, but these often introduce noticeable artifacts and likewise fail to produce the desired motions.
To address these challenges, a new algorithm called Text2Cinemagraph has been proposed. It bridges the gap between artistic images and animation models designed for real videos. From a single user-provided text prompt, it generates two images, one artistic and one realistic, that share the same semantic layout. The realistic image serves as input for motion-prediction models, while the artistic image defines the style and appearance of the final cinemagraph. By transferring the motion predicted on the realistic image to the artistic one, the final cinemagraph is synthesized.
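The paper uses dedicated machinery to keep the two layouts aligned; as a rough approximation of the twin-image idea, one can sample both images with `diffusers` from the same initial noise and change only the style portion of the prompt. The model ID and prompts below are illustrative choices, not the authors’ settings:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Same seed, so both samples start from the same initial noise and tend to
# share a coarse layout; only the style part of the prompt changes.
gen = torch.Generator("cuda").manual_seed(42)
artistic = pipe("an oil painting of a waterfall in a forest", generator=gen).images[0]

gen = torch.Generator("cuda").manual_seed(42)
realistic = pipe("a photo of a waterfall in a forest", generator=gen).images[0]

artistic.save("artistic.png")
realistic.save("realistic.png")
```

Reusing the seed only loosely aligns the two layouts; the actual method enforces a shared semantic layout far more strictly, which is what makes transferring motion between the twins reliable.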
To improve motion prediction, the method also leverages information from the text prompt and a semantic segmentation of the realistic image. This automates the generation of realistic cinemagraphs, making it easier for content creators to achieve diverse artistic styles and imaginative visual elements.
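To see why the segmentation helps, here is a small PyTorch sketch that animates only a masked region by warping pixels along a flow field. The constant downward flow and the lower-half mask are invented stand-ins for the dense, text- and segmentation-conditioned flow the model actually predicts:

```python
import torch
import torch.nn.functional as F

def animate(image, mask, flow, num_frames=30):
    # image: (3, H, W) float in [0, 1]; mask: (1, H, W) in {0, 1}
    # flow: (2, H, W) per-frame displacement in pixels, as (dx, dy)
    _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xs, ys]).float()            # (2, H, W) pixel coordinates
    frames = []
    for t in range(num_frames):
        coords = base - t * flow * mask             # backward warp, masked region only
        grid = torch.stack([coords[0] / (W - 1) * 2 - 1,   # x to [-1, 1]
                            coords[1] / (H - 1) * 2 - 1],  # y to [-1, 1]
                           dim=-1)
        warped = F.grid_sample(image[None], grid[None],
                               align_corners=True, padding_mode="border")
        frames.append(warped[0])
    return torch.stack(frames)                      # (T, 3, H, W) frame stack

# Toy usage: drift the lower half of a random image downward, 2 px per frame.
image = torch.rand(3, 256, 256)
mask = torch.zeros(1, 256, 256); mask[:, 128:] = 1.0   # pretend "water" region
flow = torch.zeros(2, 256, 256); flow[1] = 2.0
clip = animate(image, mask, flow)
```

Restricting the warp to the segmented region keeps the static content from smearing, which is loosely the role the semantic segmentation plays in the full method.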
This was a summary of Text2Cinemagraph, an AI technique that automates the generation of realistic cinemagraphs. To learn more about this work, check out the paper, GitHub, and project page. And make sure to join our ML SubReddit, Discord Channel, and Email Newsletter for the latest AI research news and projects.
Article by Daniele Lorenzi, a Ph.D. candidate at the Institute of Information Technology. His research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.