Abstract:
In the information age, the volume of literature data is growing explosively. Faced with massive amounts of unlabeled literature, unsupervised text clustering can quickly and efficiently reorganize and summarize large-scale data. However, many factors affect the quality of literature clustering results. The pipeline runs from data preprocessing to text representation to text clustering, and different choices at these steps can produce markedly different results. Moreover, the wide variety of methods available at each step, together with the difficulty of explaining and evaluating text clustering results, makes literature clustering challenging. Therefore, this paper proposes a complete visual analysis framework for literature clustering results. The framework covers data preprocessing, text representation, text clustering, and visual analysis of the clustering results, and it uses visual analysis to interpret, analyze, evaluate, adjust, and optimize those results. Based on this framework, this paper designs and implements a visual analysis system for literature clustering results and studies how different text representation methods and clustering algorithms influence the clustering results. Finally, three case studies demonstrate the effectiveness of the framework.