Generative AI is gaining popularity in the computer vision community thanks to recent advances in text-driven image and video synthesis. Text-to-Image (T2I) and Text-to-Video (T2V) models have shown impressive quality and potential for image and video synthesis, editing, and animation. However, there is still room for improvement in human-centric applications such as human dance synthesis.
Earlier work attempted to transfer dance movements from one video to another using Generative Adversarial Networks (GANs). These methods often required fine-tuning on each target person, which limited the control and accuracy of the synthesized content. More recently, researchers have used pre-trained diffusion models to generate dance images and videos from text prompts. However, such coarse-grained text conditioning makes it difficult for users to specify the desired human appearance and dance moves precisely.
To address these limitations, a new approach called DISCO has been proposed for human dance generation in real-world scenarios. DISCO introduces a model architecture with disentangled control over the human subject, background, and pose to improve faithfulness and compositionality. It also uses a pre-training strategy called human attribute pre-training to improve generalizability, enabling the model to handle unseen human attributes and still generate high-quality dance content.
DISCO aims to generate human dance images and videos that are faithful to the reference image while accurately following the provided pose. It is also designed to generalize, handling unseen human subjects, backgrounds, and poses without any fine-tuning. Finally, DISCO supports flexible composition: the human subject, the background, and the pose can each be taken from a different image or video.
Overall, DISCO tackles the main challenges of human dance generation in real-world scenarios: its disentangled-control architecture targets faithfulness and compositionality, while human attribute pre-training targets generalizability. Together, these make it a promising technique for generating human dance content.
About the author:
Daniele Lorenzi is a Ph.D. candidate with research interests in adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation. He holds an M.Sc. in ICT for Internet and Multimedia Engineering from the University of Padua, Italy. Currently, he is working at the Christian Doppler Laboratory ATHENA in the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt.