On Friday, researchers from Nvidia announced Magic3D, an AI model that can generate 3D models from text descriptions. After entering a prompt such as “A blue poison-dart frog sitting on a water lily,” Magic3D generates a 3D mesh model, complete with colored texture, in about 40 minutes. With modifications, the resulting model can be used in video games or CGI art scenes.
In its academic paper, Nvidia frames Magic3D as a response to DreamFusion, a text-to-3D model that Google researchers announced in September. Similar to how DreamFusion uses a text-to-image model to generate a 2D image that then gets optimized into volumetric NeRF (neural radiance field) data, Magic3D uses a two-stage process that takes a coarse model generated at low resolution and optimizes it to a higher resolution. According to the paper’s authors, the resulting Magic3D method can generate 3D objects twice as fast as DreamFusion.
Magic3D can also perform prompt-based editing of 3D meshes. Given a low-resolution 3D model and a base prompt, users can alter the prompt text to change the resulting model. The authors also demonstrate preserving the same subject across several generations (a concept often called coherence) and applying the style of a 2D image (such as a cubist painting) to a 3D model.