Introducing AI Inpainting: Manipulating Images in 3D
Image manipulation is a long-standing research area with numerous applications in content creation. One popular task is object removal and insertion, commonly known as image inpainting. Inpainting models have traditionally focused on 2D images, but researchers are now working to extend them to manipulate complete 3D scenes.
Neural Radiance Fields (NeRFs) have played a significant role in transforming real 2D photos into lifelike 3D representations. As technology improves, these 3D representations are becoming more accessible and may soon become the norm. This research aims to enable similar manipulations of 3D NeRFs, particularly in the area of inpainting.
Inpainting 3D objects comes with its own set of challenges. First, 3D training data is scarce, which makes the problem hard to learn from examples. Second, using NeRFs as the scene representation adds complexity: because the representation is implicit and neural, directly modifying the underlying data structure based on geometry is impractical. Finally, any edit must remain consistent when the scene is viewed from multiple angles.
Several approaches have been proposed to address these challenges. Some methods, such as NeRF-In and SPIn-NeRF, inpaint each view independently and attempt to resolve the resulting inconsistencies after the fact. However, they struggle when the inpainted views differ significantly in perception or involve complex appearances. Single-reference inpainting methods avoid this cross-view disagreement but introduce their own problems, such as reduced visual quality in non-reference views and difficulty handling disocclusions.
To overcome these limitations, a new approach has been developed. The system takes N images of a scene from different perspectives, along with their corresponding camera transformation matrices and masks that mark the unwanted regions. It also requires an inpainted reference view that conveys the desired edit; this reference can be produced from something as simple as a text description of the object that should replace the masked region.
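The inputs described above can be pictured as a simple data container. The sketch below is purely illustrative; the class and field names are assumptions, not the authors' actual interfaces, and it only checks that the N posed views, camera matrices, masks, and the single reference view have mutually consistent shapes.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class InpaintingInputs:
    """Hypothetical container for the method's inputs (names are illustrative)."""
    images: np.ndarray        # (N, H, W, 3) RGB views of the scene
    cam_to_world: np.ndarray  # (N, 4, 4) camera transformation matrices
    masks: np.ndarray         # (N, H, W) boolean masks of unwanted regions
    reference: np.ndarray     # (H, W, 3) inpainted reference view

    def validate(self) -> None:
        # All per-view arrays must agree on N, H, W; the reference shares H, W.
        n, h, w, _ = self.images.shape
        assert self.cam_to_world.shape == (n, 4, 4)
        assert self.masks.shape == (n, h, w)
        assert self.reference.shape == (h, w, 3)


# Example with placeholder data: 4 views at 8x8 resolution.
N, H, W = 4, 8, 8
inputs = InpaintingInputs(
    images=np.zeros((N, H, W, 3)),
    cam_to_world=np.tile(np.eye(4), (N, 1, 1)),
    masks=np.zeros((N, H, W), dtype=bool),
    reference=np.zeros((H, W, 3)),
)
inputs.validate()
```

In a real pipeline these arrays would come from a captured image set, a structure-from-motion tool for the camera poses, and a 2D inpainter applied to the chosen reference view.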
The authors of this approach focus on view-dependent effects (VDEs) to account for changes in appearance across viewpoints. They add VDEs to the masked area from non-reference viewpoints by adjusting the reference colors to match the surrounding context in each view. They also use monocular depth estimators to guide the geometry of the inpainted region based on the estimated depth of the reference image. Finally, for pixels that are occluded and therefore not visible in the reference view, they use additional inpaintings as supervision.
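The depth-guidance idea above can be sketched as a masked depth loss. This is a minimal illustration, not the authors' actual objective: since monocular depth estimates are only defined up to an unknown scale and shift, a common trick is to first align the estimate to the rendered depth with a least-squares fit, then penalize the remaining disagreement inside the mask. All function names here are assumptions.

```python
import numpy as np


def align_scale_shift(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Fit scale s and shift b so that s*pred + b best matches target
    on the masked pixels (monocular depth is ambiguous up to scale/shift)."""
    p = pred[mask]
    t = target[mask]
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    return s * pred + b


def masked_depth_loss(rendered_depth: np.ndarray,
                      mono_depth: np.ndarray,
                      mask: np.ndarray) -> float:
    """L2 penalty tying the NeRF's rendered depth in the masked region
    to the aligned monocular depth estimate of the reference view."""
    aligned = align_scale_shift(mono_depth, rendered_depth, mask)
    diff = (rendered_depth - aligned)[mask]
    return float(np.mean(diff ** 2))


# Sanity check: a monocular estimate that is an affine transform of the
# true depth should incur (near-)zero loss after alignment.
rendered = np.linspace(1.0, 2.0, 16).reshape(4, 4)
mono = 2.0 * rendered + 1.0
mask = np.ones((4, 4), dtype=bool)
loss = masked_depth_loss(rendered, mono, mask)
```

In training, a term like this would be added to the usual NeRF photometric loss, restricted to rays that fall inside the inpainting mask.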
To see a visual comparison of the proposed method with the state-of-the-art SPIn-NeRF-Lama, you can visit the project page linked below.
This AI framework offers a novel approach to reference-guided, controllable inpainting of neural radiance fields. If you’re interested in learning more, please refer to the links provided.