Generative AI: Revolutionizing Visual Computing
In the fields of computer graphics and 3D computer vision, researchers have worked for decades on methods that synthesize realistic imagery or infer physical properties from images. These methods play a crucial role in industries such as gaming, visual effects, virtual and augmented reality, and robotics, where they enable rendering, simulation, geometry processing, and photogrammetry.
However, a new paradigm has emerged with the rise of generative artificial intelligence (AI). Generative AI systems can create and manipulate photorealistic or stylized images, videos, and 3D objects from nothing more than a written prompt or high-level human instruction. In doing so, they automate time-consuming tasks that were previously accessible only to domain experts.
The Power of Generative AI
Generative AI is powered by foundation models for visual computing such as Stable Diffusion, Imagen, Midjourney, DALL-E 2, and DALL-E 3. These models, which contain billions of learnable parameters, have been trained on massive collections of text-image pairs, giving them an unparalleled ability to generate diverse visual content.
These foundation models, trained on powerful graphics processing units (GPUs) in the cloud, form the basis for a growing range of generative AI tools. They have the potential to revolutionize visual computing by automating complex tasks and enabling the creation of realistic, high-quality visuals.
Challenges and Future Opportunities
While generative AI has made significant strides in 2D image generation, challenges remain. One is adapting current models to higher-dimensional domains such as video and 3D scene generation. Another is the availability of training data: high-quality, varied 3D objects and environments are far scarcer than 2D images.
Computation is another limitation. Current network architectures are often inefficient to train on large amounts of video data, and diffusion models suffer from slow inference because generating a sample requires many sequential denoising steps. Addressing these challenges will require ongoing research and innovation from both the academic and industry communities.
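To see why diffusion inference is slow, consider a minimal sketch of a DDPM-style reverse (denoising) loop. Note that this is an illustrative toy, not a trained model: `predict_noise` is a hypothetical stand-in for a learned noise-prediction network, and the schedule values follow commonly used defaults. The key point is that every generated sample requires T sequential network evaluations.

```python
import numpy as np

T = 1000  # number of diffusion steps; each one is a full network pass
betas = np.linspace(1e-4, 0.02, T)  # linear noise schedule (common default)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)     # cumulative products used by the sampler

def predict_noise(x, t):
    """Hypothetical stand-in for a learned noise-prediction network."""
    return np.zeros_like(x)

def sample(shape, rng):
    """Run the full reverse chain: T sequential denoising steps."""
    x = rng.standard_normal(shape)            # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t)             # one network call per step
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])
        if t > 0:                             # inject noise except at the end
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

rng = np.random.default_rng(0)
img = sample((8,), rng)  # 1000 sequential "network" calls for one tiny sample
```

Because the steps are inherently sequential, they cannot be parallelized across the time axis, which is what motivates research on faster samplers and distilled models.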
The Rise of Diffusion Models
Despite these challenges, the number of diffusion models for visual computing has grown significantly in the past year. These models have the potential to transform how we create and edit images, videos, and 3D and 4D content.
This state-of-the-art report (STAR) aims to provide a comprehensive review of recent publications on diffusion models in visual computing, educate readers about the principles of diffusion models, and identify areas for further research.
To learn more, you can read the paper authored by researchers from multiple universities.