BTVis: An Interactive Hierarchical Topic Modelling Visual Analysis System Based on BERTopic
Abstract: Topic modelling is an important text mining method in natural language processing, but the modelling process is complex and some of its results do not match user expectations. To help non-expert users understand the modelling process and efficiently grasp and modify model results, we design BTVis, an interactive visual analysis system based on BERTopic. The system improves the interpretability and usability of BERTopic through four key features: 1) it extracts and visualizes the intermediate steps of BERTopic's hierarchical clustering, intuitively revealing how topics are generated; 2) it analyzes outlier documents to uncover their potential relationships with topics; 3) it proposes multi-granularity local model editing algorithms to improve the accuracy of the BERTopic model; 4) it implements BTVis as a web-based interactive hierarchical topic modelling system, allowing users to understand and improve model results through visual analysis and interactive exploration. We conducted qualitative analysis on real long-text and short-text datasets such as TED and Douban movie reviews, designed a user study with 100 participants, and compared the system against other models on coherence, diversity, and stability metrics. The results demonstrate the effectiveness and practicality of the proposed system.