
Generating 3D Meshes from Text: Democratizing Content Production and Overcoming Data Constraints

Mesh representations of 3D scenes play a crucial role in applications such as AR/VR and computer graphics. However, creating these assets is laborious and demands considerable skill. To simplify the process, researchers have explored generative models, specifically diffusion models, that produce high-quality images from text in the 2D domain. These techniques have made content production more accessible by lowering the barrier to creating customized images.

Now there is growing interest in applying similar techniques to generate 3D models from text. However, current methods remain limited in scope and versatility. One of the major challenges is the scarcity of training data: compared to 2D image synthesis, 3D datasets are far smaller. Researchers have addressed this by formulating 3D creation as an iterative optimization problem in the image domain, extending the capabilities of 2D text-to-image models into the 3D realm.

Despite these advances, generating 3D structure and texture for large-scale scenes remains a challenge. It is difficult to ensure that the output is dense and coherent across viewpoints and includes all the necessary features, such as walls, floors, and furniture, when dealing with scenes of that size. A mesh representation is nevertheless still preferred for many end-user applications, especially rendering on commodity hardware.

To overcome these limitations, researchers from TU Munich and the University of Michigan propose a technique that extracts scene-scale 3D meshes from off-the-shelf 2D text-to-image models. Their approach builds a scene iteratively using inpainting and monocular depth estimation. They start by generating an image from a text prompt, estimating its depth, and back-projecting it into 3D space to form an initial mesh. The model is then repeatedly rendered from new viewpoints.
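The back-projection step described above can be sketched with a standard pinhole camera model: each pixel, together with its estimated depth, is lifted to a 3D point. The snippet below is a minimal illustration of that idea; the function name, intrinsics, and example values are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth map of shape (H, W) into a 3D point cloud.

    Assumes a pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy), all in pixel units.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # One 3D point per pixel, flattened to (H*W, 3).
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example: a flat 4x4 depth map, every pixel 2 m from the camera.
pts = backproject_depth(np.full((4, 4), 2.0), fx=50.0, fy=50.0, cx=2.0, cy=2.0)
```

In the pipeline described by the article, points produced this way would be connected into mesh faces, after which the scene can be re-rendered from new camera poses to reveal holes for the next inpainting round.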

To create textured 3D meshes, the researchers prompt 2D text-to-image models with text. They generate the scene iteratively, selecting viewpoints that cover a large portion of the scene surface and adaptively filling in remaining gaps. To integrate newly generated content seamlessly with the existing mesh, they align the predicted depth maps with the rendered geometry and remove regions with distorted textures.
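A common way to reconcile a newly predicted monocular depth map with depth rendered from the existing mesh is a least-squares scale-and-shift fit over pixels where both values are known, since monocular predictions are typically accurate only up to an affine ambiguity. The sketch below illustrates that general idea under this assumption; it is not the authors' exact alignment procedure, and the function name is illustrative.

```python
import numpy as np

def align_depth(pred, target, mask):
    """Fit scale s and shift t minimizing ||s * pred + t - target||
    over the pixels where mask is True, then apply them to pred."""
    p = pred[mask]
    q = target[mask]
    # Least-squares system: [p, 1] @ [s, t]^T ≈ q
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, q, rcond=None)
    return s * pred + t

# Example: the prediction is off by a scale of 2 and a shift of 1.
rng = np.random.default_rng(0)
pred = rng.uniform(1.0, 5.0, size=(8, 8))
target = 2.0 * pred + 1.0
mask = np.ones_like(pred, dtype=bool)
aligned = align_depth(pred, target, mask)
```

After such an alignment, the newly inpainted region can be back-projected at depths consistent with the geometry already in the mesh, which is what keeps the scene coherent across viewpoints.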

The result is scene-scale 3D models that can represent a variety of rooms with appealing textures and consistent geometry. The key contributions of the research are a technique that combines 2D text-to-image models and monocular depth estimation for iterative scene creation; a method to create 3D meshes of room-scale interior scenes with compelling textures and geometry from any text input; and a customized viewpoint-selection process that yields watertight meshes.

For more information on this research, you can check out the paper, project page, and GitHub repository. Credit goes to the researchers involved in this project. Don’t forget to join our ML SubReddit, Discord Channel, and Email Newsletter for the latest AI research news and interesting projects.

About the author: Aneesh Tickoo is a consulting intern at MarktechPost and is currently pursuing an undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He is passionate about image processing and enjoys working on machine learning projects. Connect with him to collaborate on exciting projects.




