A Progressive Visual Analytics Method for Reasoning Capability of Vision-Language Models

夏旺; 周学文; 江棨; 王云超; 高海东; 孙国道; 梁荣华

doi:10.3724/SP.J.1089.2025-00337


Abstract

Existing explainability methods for vision-language models mostly focus on surface-level mappings between modalities and lack analysis of the global decision basis and reasoning logic. To address this, this paper proposes a progressive visual analytics method and an interactive visual analytics system, MMRLens, to improve the transparency and interpretability of the model's decision-making process. First, the model's decision features and decision correlations are constructed at the global level. Second, progressive analysis across the decision level, reasoning level, and modality level reveals latent patterns and reasoning logic in the model's decision-making process. Third, the interactive system MMRLens is built to provide four functions: overall model performance display, global decision rule overview, reasoning logic exploration over contextual cases, and instance-level modal feature importance analysis. The global decision rule overview uses a topology-constrained UMAP algorithm to preserve the similarity relationships and hierarchical structure among decisions. Finally, the method's effectiveness is validated through face-to-face expert interviews and a questionnaire-scored user study. The evaluation results indicate that the proposed method helps users progressively uncover reasoning logic from the global view down to the details, improving the transparency and interpretability of the model's decision-making process.
