Tang Ying, Su Jianming, Tong Ning. TMvis: A Visual Analysis System Based on LDA Topic Modelling[J]. Journal of Computer-Aided Design & Computer Graphics, 2019, 31(10): 1728-1738. DOI: 10.3724/SP.J.1089.2019.17987

TMvis: A Visual Analysis System Based on LDA Topic Modelling

  • Abstract: Topic modeling is an important class of text mining methods, widely used to analyze the topic composition of a text corpus, but its results are difficult to interpret and adjust. To help users build a dictionary and to understand and tune topic models, we design and implement a progressive visual analysis framework with two visualization workspaces: a corpus refinement workspace, which helps users construct the dictionary efficiently; and a topic model workspace, which presents multi-scale information about the topics and lets users interactively improve the topic model. We implement TMvis, an interactive visual topic modeling system for the Web, and test the effectiveness of the approach in a controlled experiment on the 20 newsgroups news dataset. A case study on Douban movie data further verifies the practicality of the system.
