A Framework for Procedural Generation and Evaluation of 3D Assets Driven by Large Language Models
-
Abstract
The automated generation of 3D content is important for fields such as virtual reality, gaming, and film production. However, generating high-quality 3D objects from natural language instructions efficiently and controllably remains challenging. This paper proposes a large language model (LLM)-driven framework for the procedural generation and evaluation of 3D assets. We first design an LLM-based procedural generation module that translates natural language descriptions into script commands for the Infinigen library, converting text to code and code to 3D objects that satisfy the described semantics. We then introduce a screening and optimization mechanism that uses the CLIP model to score the generated results, so that generated 3D models can be filtered automatically. Through ablation experiments, we assess how each module affects generation quality, and we use the CLIP Score metric to quantify the semantic alignment and visual quality of the generated results. Experimental results show that, compared with baselines without CLIP screening, our method significantly improves both the quality of the generated 3D models and their consistency with the input text, achieving higher CLIP Scores across multiple categories of 3D object generation tasks. This research demonstrates the feasibility of integrating large models with procedural 3D generation techniques and provides an effective pathway toward controllable, cross-modal generation of 3D content.
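The CLIP-based screening step can be illustrated with a short sketch. It assumes each generated object has already been rendered and passed through a CLIP image encoder, and that the prompt has been passed through the matching text encoder; the vectors in the example are toy stand-ins, not real CLIP outputs. The score follows the common CLIPScore form w · max(cos(E_I, E_T), 0), where the scale w is a convention (2.5 in the original CLIPScore formulation, 100 in some implementations). The per-view averaging, the fixed threshold, and the helper names `clip_score` and `screen_candidates` are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embeddings (returns 0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def clip_score(image_emb: Vector, text_emb: Vector, w: float = 100.0) -> float:
    """CLIPScore-style metric: w * max(cos(E_I, E_T), 0). The scale w is a convention."""
    return w * max(cosine_similarity(image_emb, text_emb), 0.0)


def screen_candidates(
    candidates: Dict[str, List[Vector]],
    text_emb: Vector,
    threshold: float,
) -> List[Tuple[str, float]]:
    """Hypothetical screening step: average the scores of each candidate's rendered
    views, drop candidates below the threshold, and rank survivors best-first."""
    kept = []
    for name, view_embs in candidates.items():
        if not view_embs:  # no renders for this candidate: nothing to score
            continue
        mean_score = sum(clip_score(e, text_emb) for e in view_embs) / len(view_embs)
        if mean_score >= threshold:
            kept.append((name, mean_score))
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real CLIP image/text features.
    prompt = [1.0, 0.0]
    renders = {"chair_v1": [[1.0, 0.0]], "chair_v2": [[0.0, 1.0]], "chair_v3": [[1.0, 1.0]]}
    print(screen_candidates(renders, prompt, threshold=50.0))
```

Ranking the survivors rather than only thresholding them lets a pipeline keep just the top candidate per prompt; whether the framework uses a fixed threshold, a top-k rule, or regeneration on failure is not specified in the abstract.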
-
-