
Make-it-3D: Leveraging Implicit 3D Knowledge in a 2D Diffusion Model for 3D Object Reconstruction

Imagination is a powerful tool that humans possess. We can look at an image and visualize how the object would appear from a different angle. For computer vision and deep learning models, however, this is a challenging task. Generating a 3D object from a single image is difficult because one viewpoint provides only limited information. Various approaches have been proposed, but they struggle to reconstruct detailed geometry and to render views far from the input camera pose.

One technique projects the input image into the latent space of a pre-trained 3D-aware generative network. However, these networks are typically trained on specific object classes and cannot handle general 3D objects. Meanwhile, building a sufficiently diverse dataset for novel-view estimation, or a powerful 3D foundation model covering general objects, remains an open challenge.

Recent advances in diffusion models have shown promise in 2D image synthesis. These models can generate images from different viewpoints, indicating that they already have some understanding of 3D knowledge. Building on this observation, the paper presented in this article explores the possibility of using a diffusion model to reconstruct 3D objects.

The proposed approach, called Make-It-3D, is a two-stage process that uses a diffusion prior to generate high-quality 3D content from a single image. In the first stage, the diffusion prior optimizes a neural radiance field (NeRF) via score distillation sampling (SDS), with reference-view supervision applied alongside it. Unlike previous approaches that condition only on textual descriptions, Make-It-3D prioritizes the fidelity of the 3D model to the reference image.
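To make the SDS idea concrete, here is a minimal numpy sketch of a single SDS gradient step. This is not the paper's implementation: `noise_pred_fn` is a hypothetical stand-in for the frozen diffusion model's noise predictor, and `alpha_bar` is an assumed cumulative noise schedule; the weighting `w(t) = 1 - alpha_bar[t]` is one common choice.

```python
import numpy as np

def sds_gradient(latents, noise_pred_fn, t, alpha_bar, rng):
    """One score distillation sampling (SDS) step (sketch).

    latents       : current rendering of the 3D scene (in the model's latent space)
    noise_pred_fn : stand-in for the frozen diffusion model's noise predictor
    alpha_bar     : cumulative noise schedule, indexed by timestep t
    """
    eps = rng.standard_normal(latents.shape)                 # sampled Gaussian noise
    noisy = (np.sqrt(alpha_bar[t]) * latents
             + np.sqrt(1.0 - alpha_bar[t]) * eps)            # forward-diffused latents
    eps_pred = noise_pred_fn(noisy, t)                       # model's noise estimate
    w = 1.0 - alpha_bar[t]                                   # timestep weighting
    return w * (eps_pred - eps)                              # gradient w.r.t. latents

# Toy usage with a mock predictor that returns zeros.
alpha_bar = np.linspace(0.999, 0.01, 1000)
rng = np.random.default_rng(0)
latents = rng.standard_normal((4, 8, 8))
grad = sds_gradient(latents, lambda x, t: np.zeros_like(x),
                    t=500, alpha_bar=alpha_bar, rng=rng)
```

In an actual pipeline this gradient would be backpropagated through the renderer into the NeRF parameters, nudging every rendered view toward images the diffusion model considers plausible.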

To keep the generated model faithfully aligned with the reference image, the optimization maximizes the similarity between the reference and the rendered reference view. An estimated depth map of the reference image serves as an additional geometry prior to guide shape optimization. This first stage produces a rough model with reasonable geometry, but its appearance may fall short of the reference image. The second stage therefore focuses on texture enhancement while keeping the geometry from the first stage fixed, and the final refinement uses the ground-truth textures for regions visible in the reference image.
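The reference-view supervision above can be sketched as two simple loss terms. This is an illustrative numpy sketch under stated assumptions, not the paper's code: the function names are invented, and the depth term uses negative Pearson correlation on the premise that monocular depth estimates are only reliable up to scale and shift.

```python
import numpy as np

def reference_loss(render_rgb, ref_rgb, mask):
    # Pixel-space L2 between the rendered reference view and the input image,
    # restricted to the foreground mask (H x W, 1 = object).
    diff = (render_rgb - ref_rgb) * mask[..., None]
    return float(np.mean(diff ** 2))

def depth_prior_loss(render_depth, est_depth, mask):
    # Monocular depth estimates are ambiguous up to scale and shift, so
    # compare normalized depths via negative Pearson correlation rather
    # than raw L2; minimizing this pushes correlation toward +1.
    d_r = render_depth[mask > 0]
    d_e = est_depth[mask > 0]
    d_r = (d_r - d_r.mean()) / (d_r.std() + 1e-8)
    d_e = (d_e - d_e.mean()) / (d_e.std() + 1e-8)
    return float(-np.mean(d_r * d_e))
```

Identical renders give a reference loss of zero, and perfectly correlated depth maps drive the depth term to its minimum of about -1; in practice these terms would be weighted and summed with the SDS objective during first-stage optimization.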

Make-It-3D is compared with other state-of-the-art techniques, and sample results are shown. The framework demonstrates promising results for generating high-fidelity 3D objects from a single image. If you want to learn more about this work, you can find a link to the paper and the project page. Don’t forget to check out our AI Tools Club for more AI resources.


