3DGS Generation and Color Editing with Rich Text

Wu Qiling; Xu Kun

doi:10.3724/SP.J.1089.2025-00261

Abstract: 3D Gaussian splatting (3DGS) has emerged as a dominant representation for generative 3D modeling thanks to its rendering efficiency and quality. However, existing semantic editing methods for 3DGS are inefficient and poorly consistent; in particular, editing methods based on 2D diffusion are prone to 3D inconsistency and color bleeding. To address these issues, we propose a rich-text-guided semantic editing method for 3DGS that focuses on color editing. First, an existing text-to-3DGS model produces the Gaussians to be edited together with cross-view attention maps. Second, the attention maps are used to extract 3D-consistent semantic segmentations, which associate rich-text spans with editing regions. Finally, a region-based diffusion denoising process blends noise according to the semantic segmentation, progressively guiding the color of each editing region toward its target color. Experiments on multiple prompts show that, compared with two reference methods, the proposed method edits regions more precisely, reduces color bleeding, and yields more natural results; compared with an optimization-based method, it shortens the time per edit from 10 min to 25 s.
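
The noise-blending step of the region-based denoising process can be illustrated with a minimal sketch. The abstract does not state the blending rule, so the linear mask-weighted mix below is an assumption, and the names `blend_noise`, `eps_base`, `eps_region`, and `mask` are hypothetical; this is not the authors' implementation. Each pixel carries a list of channel values, and the mask is a per-pixel weight in [0, 1] taken from the semantic segmentation.

```python
from typing import List, Sequence


def blend_noise(
    eps_base: Sequence[Sequence[float]],
    eps_region: Sequence[Sequence[float]],
    mask: Sequence[float],
) -> List[List[float]]:
    """Mix two per-pixel noise predictions with a soft region mask.

    Assumed rule (not taken from the paper):
        eps = mask * eps_region + (1 - mask) * eps_base
    so pixels inside the edit region follow the region-specific prediction
    and pixels outside keep the base prediction.
    """
    if not (len(eps_base) == len(eps_region) == len(mask)):
        raise ValueError("all inputs must cover the same number of pixels")

    blended = []
    for base_px, region_px, m in zip(eps_base, eps_region, mask):
        m = min(1.0, max(0.0, m))  # clamp the mask weight to [0, 1]
        blended.append(
            [m * r + (1.0 - m) * b for b, r in zip(base_px, region_px)]
        )
    return blended


# Two pixels with three channels each: the first lies inside the
# edit region (mask 1.0), the second outside it (mask 0.0).
base = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
region = [[2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
result = blend_noise(base, region, [1.0, 0.0])
```

In this sketch the first pixel takes the region-specific prediction and the second keeps the base prediction. A fractional mask value would interpolate between the two, which is one plausible way to limit color bleeding at region boundaries.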
