Visual Analysis of Spatiotemporal Situations via Representation Learning
Abstract: A spatiotemporal situation describes how the spatial positions of a large number of moving objects change over a period of time, and analyzing and finding spatiotemporal situations is important for many real-world applications. To address the problems that massive data volumes and unavoidable data loss and distortion pose for analyzing spatiotemporal situations, we propose a representation learning-based approach for finding representative spatiotemporal situations in large-scale spatiotemporal datasets. First, we implement a spatiotemporal situation representation model based on the Denoising Autoencoder framework, which encodes each spatiotemporal situation as a compact representation vector; the distance between two representation vectors reflects the difference in features between the corresponding situations. By adding noise to the encoder input and training the decoder to restore the original, noise-free situation, the model improves representation accuracy and robustness when data are missing or distorted. Second, we propose a visual summary algorithm that shows the overall motion characteristics of consecutive spatiotemporal situations in a compact glyph. Furthermore, we construct a spatiotemporal situation projection that lets users quickly find representative spatiotemporal situations in a large dataset. On the New York taxi dataset and the Chicago crime dataset, we compare our model with existing similarity measures using self-similarity and cross-similarity metrics.
The quantitative experiments demonstrate the representation model's accuracy and robustness under large data volumes and high missing rates, as well as its high execution efficiency. User experiments include selection and ranking tasks, and the results show that the representative spatiotemporal situations found by our method are more consistent with users' subjective perceptions.
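The denoising idea summarized above (corrupt the encoder input, train the decoder to reconstruct the clean situation, then compare situations by the distance between their codes) can be illustrated with a minimal NumPy sketch. This is not the authors' model: the one-hidden-layer architecture, masking noise as the corruption, the toy grid data, and all hyperparameters are assumptions chosen for illustration, and the helpers `mask_noise`, `DenoisingAutoencoder`, `make_situations`, and `pairwise_distances` are hypothetical names. Each toy situation is a short sequence of density grids flattened into one vector.

```python
import numpy as np


def mask_noise(x, drop_rate, rng):
    """Assumed corruption: zero a random fraction of entries to mimic missing records."""
    return x * (rng.random(x.shape) >= drop_rate)


class DenoisingAutoencoder:
    """Hypothetical minimal DAE: tanh encoder, linear decoder, squared-error loss."""

    def __init__(self, n_in, n_code, seed=0):
        self.rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(n_in)
        self.W1 = self.rng.normal(0.0, scale, (n_in, n_code))
        self.b1 = np.zeros(n_code)
        self.W2 = self.rng.normal(0.0, scale, (n_code, n_in))
        self.b2 = np.zeros(n_in)

    def encode(self, X):
        return np.tanh(X @ self.W1 + self.b1)

    def decode(self, H):
        return H @ self.W2 + self.b2

    def fit(self, X, drop_rate=0.3, lr=0.005, epochs=400):
        losses = []
        n = X.shape[0]
        for _ in range(epochs):
            Xn = mask_noise(X, drop_rate, self.rng)  # corrupted encoder input
            H = self.encode(Xn)
            err = self.decode(H) - X  # reconstruction target is the CLEAN situation
            losses.append(float((err ** 2).sum() / n))
            # Full-batch gradient descent on the per-sample summed squared error.
            dR = 2.0 * err / n
            dZ = (dR @ self.W2.T) * (1.0 - H ** 2)  # uses W2 before its update
            self.W2 -= lr * (H.T @ dR)
            self.b2 -= lr * dR.sum(axis=0)
            self.W1 -= lr * (Xn.T @ dZ)
            self.b1 -= lr * dZ.sum(axis=0)
        return losses


def make_situations(n_per_pattern, size=6, steps=3, seed=1):
    """Hypothetical toy data: each situation stacks `steps` size x size density
    grids, flattened. Pattern 0 has a hotspot drifting right, pattern 1 drifting down."""
    rng = np.random.default_rng(seed)
    ys, xs = np.mgrid[0:size, 0:size]
    tracks = [[(1.0, 1.0 + 1.5 * t) for t in range(steps)],
              [(1.0 + 1.5 * t, 1.0) for t in range(steps)]]
    X, labels = [], []
    for label, track in enumerate(tracks):
        frames = np.stack([np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / 2.0)
                           for cy, cx in track])
        for _ in range(n_per_pattern):
            X.append((frames + rng.normal(0.0, 0.05, frames.shape)).ravel())
            labels.append(label)
    return np.array(X), np.array(labels)


def pairwise_distances(A, B):
    """Euclidean distance between every row of A and every row of B."""
    return np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))


X, labels = make_situations(n_per_pattern=20)
dae = DenoisingAutoencoder(n_in=X.shape[1], n_code=8)
losses = dae.fit(X)
Z = dae.encode(X)  # compact representation vectors of the clean situations
D = pairwise_distances(Z, Z)
same = labels[:, None] == labels[None, :]
print("within-pattern:", D[same].mean(), "cross-pattern:", D[~same].mean())

# Robustness check: encode copies with 30% of entries missing, compare to clean codes.
Z_missing = dae.encode(mask_noise(X, 0.3, np.random.default_rng(2)))
D_missing = pairwise_distances(Z_missing, Z)
print("missing vs own:", D_missing[same].mean(), "vs other:", D_missing[~same].mean())
```

Masking noise is used here because zeroing entries resembles records that were never observed; other corruptions (for example additive Gaussian noise) fit the same training loop. In this sketch, situations that follow the same motion pattern end up closer in code space than situations with different patterns, which is the property a distance-based projection would rely on.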