
Fine-Grained 3D Shape Classification Based on Cross-View Message Interaction


    Abstract: Existing fine-grained 3D shape classification methods typically focus on enhancing fine-grained feature extraction within individual views, while neglecting inter-view feature dependencies and the effective fusion of multi-granular features. To address these limitations, we propose MSGFormer, a fine-grained 3D shape classification network based on cross-view message interaction. First, self-attention among the local patch tokens within each view accomplishes local region interaction and view feature extraction. Then, interaction among cross-view message tokens enables information flow between views and captures interactive features. Next, a local patch token selection strategy picks out locally dominant features, explicitly highlighting local fine-grained features. Finally, the global view features, interactive features, and locally dominant features are fused and enhanced to perform fine-grained 3D shape classification. On the three subsets of the fine-grained classification dataset FG3D (Airplane, Car, and Chair), the proposed method achieves overall accuracies of 97.40%, 80.30%, and 85.70%, respectively; on the meta-category classification dataset ModelNet40, it achieves an overall accuracy of 97.81%. These results surpass all compared methods, demonstrating the proposed network's strong fine-grained classification and generalization performance.
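The four-stage pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses single-head attention with identity projections, NumPy only, and all names, dimensions, and the top-k value are hypothetical placeholders for the learned components of MSGFormer.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens):
    # Single-head scaled dot-product attention with identity Q/K/V
    # projections (illustrative stand-in for a Transformer block).
    d = tokens.shape[-1]
    scores = softmax(tokens @ tokens.T / np.sqrt(d))
    return scores @ tokens, scores

rng = np.random.default_rng(0)
n_views, n_patches, d = 3, 8, 16  # hypothetical sizes

# Per-view patch tokens and one message token per view (assumed layout).
views = [rng.normal(size=(n_patches, d)) for _ in range(n_views)]
msg = [rng.normal(size=(d,)) for _ in range(n_views)]

view_feats, msg_out, local_sel = [], [], []
for v in range(n_views):
    # 1) Intra-view self-attention over [message token; patch tokens]:
    #    local region interaction and view feature extraction.
    seq = np.vstack([msg[v][None, :], views[v]])
    out, scores = self_attention(seq)
    msg_out.append(out[0])               # updated message token
    view_feats.append(out[1:].mean(0))   # global view feature (mean-pooled)

    # 3) Patch token selection: keep the top-k patches the message token
    #    attends to most, as locally dominant features (k is illustrative).
    k = 2
    topk = np.argsort(scores[0, 1:])[-k:]
    local_sel.append(out[1:][topk].mean(0))

# 2) Cross-view interaction: message tokens from all views attend to
#    each other, letting information flow between views.
inter, _ = self_attention(np.vstack(msg_out))

# 4) Fuse global, interactive, and locally dominant features.
fused = np.concatenate([np.mean(view_feats, 0),
                        inter.mean(0),
                        np.mean(local_sel, 0)])
print(fused.shape)  # (48,) -> input to a classification head
```

In the actual network these stages would be stacked Transformer layers with learned projections and a classifier on the fused representation; the sketch only shows how the three feature types are produced and combined.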


