Introducing Generative Multiplane Images: Making a 2D GAN 3D-Aware
Generative adversarial networks (GANs) have become remarkably good at generating new images that resemble those in a given training dataset, and recent work has focused on improving the quality and resolution of the generated images. Most of these developments produce outputs in the same space as the training data, but the latest literature also explores outputs that go beyond what the data contains.
In particular, researchers are interested in generating 3D geometry and textures for a class of objects, such as faces, even when the dataset consists only of single-view photos. Training these 3D-aware GANs does not require 3D geometry or multi-view images as supervision. Instead, previous methods combine 3D-aware inductive biases, usually memory-intensive explicit or implicit 3D volumes, with a rendering engine so that the 3D geometry can be learned from 2D images alone. However, improving the quality of the outputs remains challenging because of the computational cost of these representations and the need to modify the generator's architecture.
To address this challenge, the researchers aim to modify an existing 2D GAN as little as possible while keeping inference and training efficient. They start with the StyleGANv2 model, for which pretrained checkpoints are readily available. They add a new generator branch that produces a set of fronto-parallel alpha maps, similar to multiplane images (MPIs), and use MPIs as the scene representation for an unconditional 3D-aware generative model. View consistency is ensured by combining the alpha maps with the standard image output of StyleGANv2 in an end-to-end differentiable MPI-style rendering step.
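The MPI-style rendering described above can be sketched as front-to-back alpha compositing over the planes. This is a minimal NumPy illustration, not the authors' implementation: the function name composite_planes and the array shapes are assumptions made for this example, and the step that warps each plane into the target camera view is omitted, so the sketch only covers the frontal view.

```python
import numpy as np

def composite_planes(colors, alphas):
    """Front-to-back "over" compositing of fronto-parallel planes.

    colors: (N, H, W, 3) color per plane, alphas: (N, H, W) opacity per plane.
    Plane 0 is assumed to be the one nearest to the camera.
    """
    # Transmittance of plane i: how much light survives planes 0..i-1.
    ones = np.ones_like(alphas[:1])
    transmittance = np.cumprod(
        np.concatenate([ones, 1.0 - alphas[:-1]], axis=0), axis=0
    )
    weights = alphas * transmittance              # contribution of each plane
    image = (weights[..., None] * colors).sum(axis=0)
    return image, weights

# Toy inputs: one RGB image reused on every plane, random alpha maps.
rng = np.random.default_rng(0)
num_planes, height, width = 4, 8, 8
rgb = rng.random((height, width, 3))
alphas = rng.random((num_planes, height, width))
alphas[-1] = 1.0                                  # make the farthest plane opaque
colors = np.broadcast_to(rgb, (num_planes, height, width, 3))

image, weights = composite_planes(colors, alphas)
```

Because every operation here is a sum or a product, the same computation written in an autodiff framework stays differentiable end to end, which is what lets the alpha branch be trained through the renderer.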
The number of alpha maps can be adjusted dynamically, which eases memory concerns. While the new alpha branch is trained, the regular StyleGANv2 generator and discriminator are fine-tuned as well. The output generated by this method is called a 'generative multiplane image' (GMPI). Only two adjustments are necessary to obtain alpha maps with the expected 3D structure: the alpha map prediction must be conditioned on each plane's depth or on a learnable token, and the discriminator must be conditioned on camera poses.
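To see why conditioning alpha on depth makes the plane count adjustable, consider the toy sketch below. Everything in it is hypothetical: toy_alpha is a hand-written stand-in for the learned alpha branch, the near and far values are arbitrary, and spacing planes uniformly in disparity is an assumption of this example rather than a detail taken from the article.

```python
import numpy as np

def plane_depths(num_planes, near=1.0, far=2.0):
    """Depths ordered near to far, spaced uniformly in disparity (assumed)."""
    return 1.0 / np.linspace(1.0 / near, 1.0 / far, num_planes)

def toy_alpha(depth, surface_depth, sharpness=20.0):
    """Hypothetical alpha function: opacity rises once a plane lies
    behind the (toy) surface. Stands in for a depth-conditioned network."""
    return 1.0 / (1.0 + np.exp(-sharpness * (depth - surface_depth)))

def rendered_depth(num_planes, surface_depth):
    """Expected depth per pixel from compositing weights over the planes."""
    depths = plane_depths(num_planes)
    alphas = np.stack([toy_alpha(d, surface_depth) for d in depths])
    ones = np.ones_like(alphas[:1])
    transmittance = np.cumprod(
        np.concatenate([ones, 1.0 - alphas[:-1]], axis=0), axis=0
    )
    weights = alphas * transmittance
    return (weights * depths[:, None, None]).sum(axis=0) / weights.sum(axis=0)

surface = np.full((8, 8), 1.5)          # toy scene: a flat surface at depth 1.5
coarse = rendered_depth(4, surface)     # few planes, low memory
fine = rendered_depth(32, surface)      # more planes, same alpha function
```

The same alpha function is queried at 4 or at 32 depths without any change to its definition, so the number of planes becomes a memory and quality trade-off chosen at call time.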
In conclusion, this research shows that a 2D GAN can be made 3D-aware by conditioning the alpha planes on depth and the discriminator on camera pose. It also studies an MPI-like 3D-aware generative model trained on standard single-view 2D image datasets. The methods for encoding 3D-aware inductive biases are evaluated on three high-resolution datasets: FFHQ, AFHQv2, and MetFaces.
To learn more about this research, read the paper and see the PyTorch implementation available on GitHub. This article is a research summary written by Marktechpost Staff, and all credit for the research goes to the researchers involved.
About the Author: Aneesh Tickoo is a consulting intern at MarktechPost, pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He focuses on projects that harness the power of machine learning, with a research interest in image processing and a passion for building solutions in this field. Feel free to connect with Aneesh for collaboration on interesting projects.