Abstract:
Existing explanation methods for vision-language models mostly focus on superficial inter-modal mappings and lack analysis of the global basis and reasoning logic behind model decisions. To address this issue, this paper proposes a progressive visual analysis method and an interactive visual analysis system, MMRLens, to improve the transparency and interpretability of the model's decision-making process. First, the model's decision features and decision correlations are constructed at a global level. Second, a progressive analysis across the decision, reasoning, and modal levels is conducted to reveal latent patterns and reasoning logic in the model's decision-making process. Third, MMRLens is built to provide four core functions: overall model performance visualization, a global decision rule overview, contextual exploration of case reasoning logic, and instance-level modal feature importance analysis. The global decision rule overview adopts a topology-constrained UMAP algorithm to preserve the similarity relationships and hierarchical structure among decisions. Finally, the effectiveness of the proposed method is verified through face-to-face expert interviews and a questionnaire-based user study. The expert evaluation results show that the proposed method helps users progressively uncover reasoning logic from global to detailed perspectives and improves the transparency and interpretability of the model's decision-making process.