A Visual Analysis Method for Correlation in Multidimensional Data Based on Dimension Projection
Abstract: To support correlation analysis of high-dimensional multivariate data, we first propose KNN-Pearson, a dimension correlation measure based on k-nearest neighbors (KNN) and the Pearson correlation coefficient. The measure uses the density of the data set at each data value along a dimension to quantify that dimension's contribution to clustering, takes these densities as the computation elements, and computes the Pearson correlation coefficient between them to quantify how strongly each pair of dimensions is correlated. We then propose a correlation visual analysis method based on dimension projection. The method projects the dimensions with multidimensional scaling (MDS), shows the correlations among dimensions in a projection scatter plot and a matrix heat map, and shows the distribution and clustering characteristics of the data in a data projection matrix and parallel coordinates. Users can select dimensions to construct subspaces of interest and then analyze the data and explore patterns interactively within those subspaces. We apply the method to the field of food safety by designing and implementing a visual analysis system for the correlation of pesticide residue data. Through data filtering, dimension selection, zooming, and multi-view linkage, the system supports correlation analysis of the pesticides detected in agricultural products from multiple regions, revealing the patterns of pesticide application on agricultural products in the tested regions. Finally, user experience and evaluation demonstrate the effectiveness of the method.
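The abstract describes KNN-Pearson only in words, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that the density at a data value is estimated per dimension as k divided by the distance to the k-th nearest neighbor along that dimension, and that correlations could be turned into dissimilarities as 1 - |r| before being handed to an MDS routine. The function names, the density estimator, the small epsilon, and the dissimilarity formula are all illustrative assumptions.

```python
import math


def knn_density(values, k):
    """Assumed 1-D density estimate: k / (distance to the k-th nearest neighbor)."""
    densities = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        # The small epsilon (an assumption) avoids division by zero on duplicates.
        densities.append(k / (dists[k - 1] + 1e-9))
    return densities


def pearson(x, y):
    """Plain Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def knn_pearson_matrix(rows, k=5):
    """Correlate dimensions through their per-point density vectors."""
    columns = list(zip(*rows))  # one tuple of values per dimension
    densities = [knn_density(col, k) for col in columns]
    d = len(densities)
    return [[pearson(densities[i], densities[j]) for j in range(d)]
            for i in range(d)]


def correlation_to_distance(corr):
    """Assumed dissimilarity for a later MDS step: strongly correlated means close."""
    return [[1.0 - abs(r) for r in row] for row in corr]


# Small deterministic data set: dimension 1 is a linear function of dimension 0.
rows = [((i * 37 % 101) / 101,
         2 * ((i * 37 % 101) / 101) + 1,
         ((i * i * 17) % 101) / 101) for i in range(60)]
corr = knn_pearson_matrix(rows, k=5)
dist = correlation_to_distance(corr)
```

The resulting `dist` matrix has one row per dimension rather than per data record, which is the shape a dimension projection needs: any MDS implementation that accepts a precomputed dissimilarity matrix could place the dimensions in 2-D from it.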