Recent advances in text-conditional 3D object generation have shown promising results. However, existing methods are slow, often requiring multiple GPU-hours to produce a single sample. This stands in stark contrast to generative image models, which produce samples in seconds or minutes.
To address this issue, we have developed an alternative approach to 3D object generation. In our method, a 3D model can be created within a mere 1-2 minutes using a single GPU. Here’s how it works:
First, our method uses a text-to-image diffusion model to generate a single synthetic view of the object. Then, a second diffusion model, conditioned on the generated image, produces a 3D point cloud. While our method still falls short of the state of the art in sample quality, it is one to two orders of magnitude faster to sample from, making it a practical trade-off for some use cases.
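The two-stage pipeline described above can be sketched as follows. This is a minimal structural sketch, not the actual implementation: the function names are hypothetical, and the stand-in bodies return dummy arrays so the example runs end to end. In practice each stage would invoke a trained diffusion sampler.

```python
import numpy as np

def text_to_image(prompt: str, rng: np.random.Generator) -> np.ndarray:
    # Stage 1 (stand-in): a text-to-image diffusion model would render a
    # single synthetic view of the object described by the prompt.
    # A dummy 64x64 RGB image keeps the sketch self-contained.
    return rng.random((64, 64, 3))

def image_to_point_cloud(image: np.ndarray, n_points: int,
                         rng: np.random.Generator) -> np.ndarray:
    # Stage 2 (stand-in): an image-conditioned diffusion model would
    # iteratively denoise random points into an (n_points, 3) XYZ cloud
    # consistent with the rendered view.
    return rng.standard_normal((n_points, 3))

def generate_3d(prompt: str, n_points: int = 1024, seed: int = 0) -> np.ndarray:
    """Hypothetical end-to-end sampler: text -> image -> point cloud."""
    rng = np.random.default_rng(seed)
    view = text_to_image(prompt, rng)                   # stage 1
    cloud = image_to_point_cloud(view, n_points, rng)   # stage 2
    return cloud

cloud = generate_3d("a red chair")
print(cloud.shape)  # (1024, 3)
```

The key design point is that the expensive 3D reasoning is split across two comparatively cheap conditional samplers, which is what allows generation in minutes on a single GPU rather than GPU-hours.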
Our pre-trained point cloud diffusion models, along with the code and models for evaluation, are available in our GitHub repository: [insert link here].
In conclusion, our method drastically reduces the time required for 3D object generation while maintaining usable sample quality, opening up new possibilities for efficient, practical use of AI-generated 3D models.