A Framework for Procedural Generation and Evaluation of 3D Assets Driven by Large Language Models

杭朋洁; 卢奕南; 白桐源; 伍铁如; 马锐

doi:10.3724/SP.J.1089.2025-00373


Abstract: Automatic 3D content generation is important for virtual reality, games, film and television, and related fields. To address the difficulty of generating high-quality 3D objects efficiently and controllably from natural language instructions, a method for the procedural generation and evaluation of 3D assets based on a large language model is proposed. First, the user's natural language description is converted into script instructions for the InfiniGen library, which generate a 3D model that meets the semantic requirements; this realizes a cross-modal conversion from text to code to 3D object. Then, a screening and optimization mechanism built on the CLIP model evaluates and scores the generated results. Finally, the 3D model is selected automatically according to its score. Experimental results show that, compared with a baseline that does not use CLIP screening, the proposed method significantly improves both the quality of the generated 3D models and their consistency with the input text, and that it achieves higher CLIP matching scores on 3D object generation tasks across multiple categories. These results demonstrate the feasibility of combining large models with procedural 3D generation and provide an effective path toward cross-modal, controllable generation of 3D content.
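The score-then-select step described in the abstract can be illustrated with a minimal sketch. It assumes that the CLIP matching score is the cosine similarity between a text embedding and the embedding of each rendered candidate, and that those embeddings have already been computed elsewhere; the abstract does not state the exact scoring formula, so this is an illustration rather than the authors' implementation. The function names `cosine_similarity` and `select_best_candidate` and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def select_best_candidate(
    text_embedding: Sequence[float],
    image_embeddings: List[Sequence[float]],
) -> Tuple[int, float]:
    """Return (index, score) of the candidate whose embedding best matches the text.

    Assumption: each entry of image_embeddings is the embedding of one
    rendered candidate model, produced by the same encoder family as
    text_embedding (for example a CLIP model).
    """
    if not image_embeddings:
        raise ValueError("need at least one candidate")
    scores = [cosine_similarity(text_embedding, e) for e in image_embeddings]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
text_vec = [1.0, 0.0, 0.0]
candidates = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0], [0.5, 0.5, 0.0]]
best_index, best_score = select_best_candidate(text_vec, candidates)
```

In a full pipeline, each candidate would be a render of one model produced by a generated script, and the highest-scoring candidate would be kept as the output.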
