Properties and Visual Exploration of the SAX Method for Time Series Classification

Song Wei (宋伟); Zhang Fan (张帆); Ye Yangdong (叶阳东); Fan Ming (范明); Xu Mingliang (徐明亮)

Intrinsic Property Study & Visualization of SAX Method towards Time Series Data Classification

Abstract

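To examine how much information the SAX representation of time series features loses and retains, we use the information embedding cost (IEC) as a measure to analyze the intrinsic statistical properties of the SAX method, and we build a graphical representation of time series through a complex network representation, for the purpose of time series data analysis and visualization. For each time series dataset, we first compute the SAX representation and its IEC value. We then classify the original time series and its SAX representation separately, compare the classification error rates, and analyze the relationship between the IEC score and the classification error rate. Finally, we select representative datasets according to the characteristics of the data and their IEC scores, convert the SAX representation into a Markov transition matrix, and visualize it using the complex network representation. Experiments that compare the visualizations obtained with SAX against those obtained by applying a quantile discretization representation to the original time series show that SAX effectively reduces complexity while retaining the core information of the original time series. The paper offers the IEC score as a reference criterion for judging the effectiveness of the SAX method and establishes an effective framework for analysis, evaluation, and visualization.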

Abstract: Symbolic aggregate approximation (SAX) is a standard representation for time series data mining. However, very little work has been done on the intrinsic properties of this method. We propose a statistical measure, the information embedding cost (IEC), to analyze the statistical behavior of the symbolic dynamics. With IEC, we further build the Markov transition matrix from the SAX representation to visualize the time series as complex networks. Experiments on benchmark datasets demonstrate that SAX can efficiently embed the time series with reduced complexity while preserving the core information. The IEC score provides an a priori indication of whether SAX is adequate for a specific dataset, and it can be generalized to evaluate other symbolic representations. We apply the visualization approach together with the IEC score to visually understand and explore classification tasks and the intrinsic properties of SAX. The result is a framework to analyze, evaluate, and further improve the symbolic dynamics for knowledge discovery in time series.
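The pipeline described in the abstract (SAX symbols, then a Markov transition matrix that can be read as a weighted directed network) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly used SAX formulation (z-normalization, piecewise aggregate approximation, equiprobable Gaussian breakpoints), and the function names `sax_symbols`, `quantile_symbols`, and `transition_matrix` are hypothetical. The IEC score is not implemented because its definition is not given here.

```python
from statistics import NormalDist

import numpy as np


def sax_symbols(series, n_segments, alphabet_size):
    """Return integer SAX symbols in 0..alphabet_size-1 (assumed standard SAX).

    Assumes n_segments <= len(series) so that every segment is non-empty.
    """
    x = np.asarray(series, dtype=float)
    std = x.std()
    # z-normalize; a constant series is mapped to all zeros
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # PAA: mean of each of n_segments nearly equal-length chunks
    paa = np.array([chunk.mean() for chunk in np.array_split(z, n_segments)])
    # breakpoints that cut N(0, 1) into alphabet_size equiprobable regions
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    return np.searchsorted(breakpoints, paa)


def quantile_symbols(series, alphabet_size):
    """Quantile discretization of the raw series (no PAA), for comparison."""
    x = np.asarray(series, dtype=float)
    cuts = np.quantile(x, [k / alphabet_size for k in range(1, alphabet_size)])
    return np.searchsorted(cuts, x)


def transition_matrix(symbols, alphabet_size):
    """First-order Markov transition matrix estimated from a symbol sequence."""
    counts = np.zeros((alphabet_size, alphabet_size))
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # rows with no outgoing transitions are left as all zeros
    return np.divide(counts, row_sums,
                     out=np.zeros_like(counts), where=row_sums > 0)


if __name__ == "__main__":
    t = np.linspace(0, 4 * np.pi, 128)
    symbols = sax_symbols(np.sin(t), n_segments=16, alphabet_size=4)
    print(transition_matrix(symbols, alphabet_size=4))
```

Each entry (i, j) of the resulting matrix can be treated as the weight of a directed edge from symbol i to symbol j, which is one way to obtain a network view of a series. Running `transition_matrix` on the output of `quantile_symbols` gives the comparison representation on the same footing.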
