Novel view synthesis from a single image requires inferring the occluded parts of objects and scenes while remaining consistent with the input image. Existing methods condition a neural radiance field (NeRF) on local image features, projecting 3D points onto the input image plane to gather features for the 3D representation. Under severe occlusion, however, the projected features are ambiguous, since the input view carries little direct evidence for the hidden regions, and the resulting renderings are blurry.
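As a concrete illustration of this conditioning scheme, the sketch below projects 3D sample points onto the input image plane and bilinearly samples local CNN features, in the style of pixel-aligned NeRF variants. The function name, tensor shapes, and use of PyTorch are illustrative assumptions rather than the implementation of any specific prior method.

```python
import torch
import torch.nn.functional as F

def pixel_aligned_features(points_world, K, world_to_cam, feature_map):
    """Project 3D sample points into the input view and gather local image features.

    points_world: (N, 3) 3D points sampled along camera rays.
    K:            (3, 3) intrinsics of the input view.
    world_to_cam: (4, 4) extrinsics of the input view.
    feature_map:  (1, C, H, W) CNN features of the input image.

    All names and shapes here are illustrative placeholders.
    """
    # Transform world points into the input camera frame.
    ones = torch.ones_like(points_world[:, :1])
    points_cam = (world_to_cam @ torch.cat([points_world, ones], -1).T).T[:, :3]

    # Perspective projection onto the image plane (pixel coordinates).
    uv = (K @ points_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3].clamp(min=1e-6)

    # Normalize pixel coordinates to [-1, 1] and bilinearly sample features.
    H, W = feature_map.shape[-2:]
    grid = torch.stack([2 * uv[:, 0] / (W - 1) - 1,
                        2 * uv[:, 1] / (H - 1) - 1], dim=-1)
    grid = grid.view(1, -1, 1, 2)                               # (1, N, 1, 2)
    feats = F.grid_sample(feature_map, grid, align_corners=True)  # (1, C, N, 1)
    return feats[0, :, :, 0].T                                   # (N, C) per-point features
```

Points that project onto occluded surfaces receive features from whatever is visible at that pixel, which is exactly the ambiguity described above.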
To address this problem, we introduce NerfDiff, which combines NeRF with a 3D-aware conditional diffusion model (CDM). At test time, NerfDiff synthesizes a set of virtual views, refines them using the CDM's learned prior, and distills them back into the NeRF, which substantially improves rendering quality and recovers fine detail.
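The sketch below gives one plausible reading of how a single virtual view could be refined: a DDIM-style denoising loop in which the CDM's clean-image estimate is blended with the NeRF rendering of the same pose. The interfaces (`cdm`, `nerf_render`, `noise_schedule`, `guidance_weight`) and the simple linear blend are assumptions for illustration; the exact guidance rule used by NerfDiff differs in detail.

```python
import torch

def refine_virtual_view(cdm, nerf_render, noise_schedule, guidance_weight=0.5, steps=50):
    """Refine one virtual view with a conditional diffusion model (CDM),
    using the NeRF rendering of the same pose as guidance.

    cdm(x, t):      assumed noise-prediction network conditioned on the input view.
    nerf_render:    (C, H, W) NeRF rendering at the virtual pose.
    noise_schedule: tensor of cumulative alpha_bar values, indexed by timestep.
    The linear blend below is an illustrative guidance rule, not the paper's exact one.
    """
    x = torch.randn_like(nerf_render)                  # start from pure noise
    for t in reversed(range(steps)):
        alpha_bar = noise_schedule[t]
        eps = cdm(x, t)                                # predicted noise at step t
        # CDM's estimate of the clean image.
        x0_cdm = (x - (1 - alpha_bar).sqrt() * eps) / alpha_bar.sqrt()
        # Blend with the 3D-consistent (but blurrier) NeRF rendering.
        x0 = guidance_weight * nerf_render + (1 - guidance_weight) * x0_cdm
        if t > 0:
            # Deterministic DDIM-style step back to the previous timestep.
            alpha_prev = noise_schedule[t - 1]
            eps_hat = (x - alpha_bar.sqrt() * x0) / (1 - alpha_bar).sqrt()
            x = alpha_prev.sqrt() * x0 + (1 - alpha_prev).sqrt() * eps_hat
        else:
            x = x0
    return x
```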
Furthermore, we propose a NeRF-guided distillation algorithm that generates 3D-consistent virtual views from the CDM samples. Fine-tuning NeRF on these refined virtual views yields superior results compared to existing NeRF-based and geometry-free approaches, as we demonstrate through extensive evaluation on the challenging ShapeNet, ABO, and Clevr3D datasets.
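To make the fine-tuning step concrete, the sketch below mixes a reconstruction loss on the observed input view with a distillation loss on the refined virtual views. The `nerf(pose)` rendering interface, the uniform loss weighting, and the iteration count are hypothetical stand-ins for the per-scene optimization described above.

```python
import torch
import torch.nn.functional as F

def finetune_nerf(nerf, optimizer, input_img, input_pose,
                  virtual_views, virtual_poses, iters=1000):
    """Per-scene NeRF fine-tuning on CDM-refined virtual views (illustrative sketch).

    nerf(pose) is assumed to render a full image at the given camera pose;
    virtual_views / virtual_poses are the refined images and their poses.
    """
    for _ in range(iters):
        optimizer.zero_grad()
        # Reconstruction loss on the observed input view keeps the scene anchored.
        loss = F.mse_loss(nerf(input_pose), input_img)
        # Distillation loss on refined virtual views fills in occluded regions.
        for img, pose in zip(virtual_views, virtual_poses):
            loss = loss + F.mse_loss(nerf(pose), img)
        loss.backward()
        optimizer.step()
    return nerf
```

Weighting all virtual views equally is the simplest choice; in practice one may want to down-weight views far from the input camera, where the CDM prior dominates.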
Figure 1 compares our approach with the state of the art (VisionNeRF), Figure 2 shows the architecture of NerfDiff, and Figure 3 illustrates its training and fine-tuning pipeline.
These advances in single-image novel view synthesis mark a meaningful improvement for computer vision, with potential applications in industries that rely on realistic virtual imagery, such as virtual reality, gaming, and architectural design.