Undeterred by multiple ethical complaints, Meta is pressing ahead with artificial intelligence (AI) through the public release of Meta 3D Gen (3DGen), the newest addition to its generative AI technology, which turns text prompts into high-quality three-dimensional assets in under a minute.
On July 2, the Mark Zuckerberg-led company gave a glimpse of the technology in a video posted on X, showing how text prompts such as “a raccoon holding a pizza,” “a cactus plant,” and “a plate of rotisserie chicken” materialized into 3D assets.
Higher Quality, Faster Processing
Meta 3D Gen combines AssetGen and TextureGen, two AI models the company previously developed for generating 3D objects and applying textures. Integrating the two improved both the quality of the resulting content and the speed at which it is produced.
“This system can generate 3D assets with high-resolution textures & material maps end-to-end with results that are superior in quality to previous state-of-the-art solutions—at 3-10x the speed of previous work,” Meta explained in the video’s caption.
As AI has grown in popularity and demand, text-to-3D generators have been used to produce characters, backdrops, props, and other assets from simple text input, finding wide application in video games, visual effects, animation, and even architecture. With existing models, however, the process often takes a long time.
3DGen, by contrast, generates output much faster. Meta backs this claim in a technical paper detailing research that evaluated the new model's performance against competing systems.
“Authoring 3D content is one of the most time-consuming and challenging aspects of designing and developing video games, augmented and virtual reality applications, and special effects in the movie industry. By providing AI assistants that can double as a 3D artist, we can enable new experiences centered on creating personalized, user-generated 3D content,” Meta stated in its paper.
How 3DGen Works
The paper also explains how the technology works in a two-step operation. Once the user enters a text prompt, AssetGen produces an initial 3D asset with a texture and physically based rendering (PBR) material maps in about 30 seconds. TextureGen then refines the asset, regenerating the texture and producing higher-quality PBR maps in roughly 20 seconds.
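Meta has not released a public API for 3DGen, so the following Python sketch is only an illustration of what such a two-stage pipeline could look like; the class and function names (Asset3D, assetgen_stage, texturegen_stage) are hypothetical assumptions, not Meta's actual interface.

```python
# Hypothetical sketch of a 3DGen-style two-stage text-to-3D pipeline.
# All names and signatures here are illustrative assumptions; Meta has
# not published a public API for 3DGen.

from dataclasses import dataclass


@dataclass
class Asset3D:
    mesh: bytes        # underlying mesh geometry
    texture: bytes     # texture image data
    pbr_maps: dict     # e.g. {"albedo": ..., "roughness": ..., "metallic": ...}


def assetgen_stage(prompt: str) -> Asset3D:
    """Stage I: text prompt -> initial textured 3D asset (~30 s per the paper)."""
    # Placeholder: a real implementation would invoke Meta's AssetGen model.
    return Asset3D(mesh=b"mesh-data", texture=b"draft-texture",
                   pbr_maps={"albedo": b"", "roughness": b"", "metallic": b""})


def texturegen_stage(asset: Asset3D, prompt: str) -> Asset3D:
    """Stage II: refine texture and PBR maps (~20 s per the paper)."""
    # Placeholder: a real implementation would invoke Meta's TextureGen model,
    # keeping the mesh fixed and regenerating only the texture and PBR maps.
    return Asset3D(mesh=asset.mesh, texture=b"refined-texture",
                   pbr_maps=dict(asset.pbr_maps))


def generate_3d(prompt: str) -> Asset3D:
    draft = assetgen_stage(prompt)          # Stage I: mesh plus draft texture
    return texturegen_stage(draft, prompt)  # Stage II: texture refinement


if __name__ == "__main__":
    asset = generate_3d("a raccoon holding a pizza")
    print("PBR maps:", sorted(asset.pbr_maps))
```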
While the technology shares similarities with Midjourney, Adobe Firefly, and other text-to-image models, 3DGen sets itself apart by building complete 3D models with underlying mesh structures and materials that support PBR.
Moreover, by splitting the process into two steps that keep the mesh separate from its texture maps, 3DGen supports the kind of trial-and-error iteration common to many text-to-image generators, giving users more control over the final output without tweaking the underlying model, as the sketch below illustrates.
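Continuing the hypothetical interface sketched above, re-texturing an existing asset without regenerating its mesh might look like this:

```python
# Iterate on the texture only (hypothetical interface from the sketch above):
# keep the Stage I mesh and re-run Stage II with a revised prompt.
base = generate_3d("a wooden treasure chest")
variant = texturegen_stage(base, "a wooden treasure chest covered in gold leaf")
assert variant.mesh == base.mesh  # geometry unchanged; only texture/PBR maps differ
```

Because the geometry is left untouched between iterations, each retexturing pass pays only the cost of the faster second stage, which is what makes this kind of rapid iteration practical.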