Dimensionality Reduction Method for Manifold Learning Based on Variational Autoencoder

Feng Linlin (冯琳琳); Wang Changpeng (王长鹏); Wu Tianjun (吴田军); Zhang Jiangshe (张讲社)

doi:10.3724/SP.J.1089.2023-00089

Abstract: Given the rapid growth in the scale and complexity of scientific datasets, existing dimensionality reduction methods suffer from the “crowding problem” and cannot embed new samples. This paper proposes a data dimensionality reduction method that combines a variational autoencoder with uniform manifold approximation and projection (VAE-UMAP). First, to reduce the coupling among the high-dimensional data, a variational autoencoder (VAE) compresses the data into latent variables. Then, uniform manifold approximation and projection (UMAP) further reduces the dimensionality of the latent variables, so that the low-dimensional embedding better preserves the similarity relationships in the original data. Finally, the method is fitted on a training set and then used to embed an out-of-sample test set, which evaluates how well it generalizes to new data. On the MNIST and Fashion-MNIST datasets, compared with four prominent dimensionality reduction methods (UMAP, DensMAP, VAE, and AE), the proposed method achieves trustworthiness scores of 0.9944 and 0.9939, respectively, exceeding the best-performing baseline, UMAP, by 0.0316 and 0.0141. It also yields significant improvements in visualization, Kendall rank correlation coefficient, and classification accuracy.
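The first step, compressing the data into latent variables with a VAE, rests on two standard ingredients: the reparameterization trick and a training loss made of a reconstruction term plus a KL divergence to a standard normal prior. The NumPy sketch below shows only these pieces, assuming a Bernoulli decoder (pixel values in [0, 1], as for MNIST). The linear encoder, the 10-dimensional latent space, and the placeholder decoder output are hypothetical; the abstract does not describe the paper's architecture or loss.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).

    Writing the sample this way keeps z differentiable w.r.t. mu and log_var.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def gaussian_kl(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), one value per sample."""
    return 0.5 * np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0, axis=1)


def bernoulli_recon_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy between inputs x and decoder outputs x_hat, both in [0, 1]."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)
    return -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat), axis=1)


def negative_elbo(x, x_hat, mu, log_var):
    """Batch-mean VAE loss = reconstruction + KL; minimizing it maximizes the ELBO."""
    return float(np.mean(bernoulli_recon_loss(x, x_hat) + gaussian_kl(mu, log_var)))


# Toy usage with a hypothetical, untrained linear encoder (784 -> 10 latent dims).
rng = np.random.default_rng(0)
x = rng.random((4, 784))                   # stand-in for 4 flattened 28x28 images
W_mu = rng.normal(scale=0.01, size=(784, 10))
W_lv = rng.normal(scale=0.01, size=(784, 10))
mu, log_var = x @ W_mu, x @ W_lv
z = reparameterize(mu, log_var, rng)       # latent samples, shape (4, 10)
x_hat = np.full_like(x, 0.5)               # placeholder for a decoder's output
loss = negative_elbo(x, x_hat, mu, log_var)
```

After training, the encoder means `mu` (rather than random samples `z`) are a common choice for the deterministic latent codes passed on to UMAP; the abstract does not say which of the two the paper uses.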
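In the second step, UMAP builds a weighted k-nearest-neighbour graph on the latent codes and then optimizes a low-dimensional layout against it. The sketch below covers only the graph construction, following the definitions in the original UMAP paper: each point gets a local offset rho_i (the distance to its nearest neighbour) and a bandwidth sigma_i found by binary search so that its k membership weights sum to log2(k), and the directed weights are then merged with a fuzzy union. It is a simplified O(n^2) illustration, not the umap-learn implementation; `directed_memberships`, `fuzzy_union`, and `k=15` are illustrative choices.

```python
import numpy as np

def directed_memberships(Z, k=15, n_iter=64):
    """Directed UMAP edge weights on latent codes Z (n x d).

    w_ij = exp(-max(0, d_ij - rho_i) / sigma_i) for j among the k nearest
    neighbours of i, where rho_i is the distance to i's nearest neighbour and
    sigma_i is chosen by binary search so that sum_j w_ij = log2(k).
    """
    n = Z.shape[0]
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)                   # a point is not its own neighbour
    knn = np.argsort(D, axis=1)[:, :k]            # k nearest neighbours per row
    knn_d = np.take_along_axis(D, knn, axis=1)    # their distances, ascending
    target = np.log2(k)
    W = np.zeros((n, n))
    for i in range(n):
        gaps = np.maximum(knn_d[i] - knn_d[i, 0], 0.0)   # d_ij - rho_i, clipped at 0
        lo, hi, sigma = 0.0, np.inf, 1.0
        for _ in range(n_iter):
            total = np.exp(-gaps / sigma).sum()          # increases with sigma
            if total > target:                           # too much mass: shrink sigma
                hi = sigma
                sigma = (lo + hi) / 2.0
            else:                                        # too little mass: grow sigma
                lo = sigma
                sigma = sigma * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
        W[i, knn[i]] = np.exp(-gaps / sigma)
    return W


def fuzzy_union(W):
    """Symmetrize directed weights with the probabilistic t-conorm: W + W^T - W o W^T."""
    return W + W.T - W * W.T


rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 10))        # stand-in for VAE latent codes
W = directed_memberships(Z, k=15)
G = fuzzy_union(W)                    # symmetric graph that UMAP would lay out in 2-D
```

The layout optimization and the out-of-sample step are omitted here. With the umap-learn library, fitting on the training codes and embedding held-out codes would typically look like `reducer = umap.UMAP(n_components=2).fit(Z_train)` followed by `reducer.transform(Z_test)`.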
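Trustworthiness, the headline metric in the abstract, penalizes points that appear among a sample's k nearest neighbours in the embedding but were not among its k nearest neighbours in the original space, weighted by how far down the original ranking they sit. The NumPy sketch below follows the standard definition, the same formula scikit-learn implements as `sklearn.manifold.trustworthiness`. The abstract does not give the neighbourhood size used in the paper, so `k=5` is an assumption.

```python
import numpy as np

def pairwise_dist(A):
    """Euclidean distance matrix for the rows of A."""
    return np.sqrt(((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1))


def trustworthiness(X, Y, k=5):
    """T(k) = 1 - 2 / (n k (2n - 3k - 1)) * sum_i sum_{j in U_k(i)} (r(i, j) - k).

    U_k(i): points among i's k nearest neighbours in the embedding Y that are
    not among its k nearest neighbours in X; r(i, j): rank of j in i's
    neighbour ordering in X (1 = nearest). Requires k < n / 2.
    """
    n = X.shape[0]
    DX, DY = pairwise_dist(X), pairwise_dist(Y)
    np.fill_diagonal(DX, np.inf)                   # exclude self-matches
    np.fill_diagonal(DY, np.inf)
    rows = np.arange(n)[:, None]
    rank_x = np.empty((n, n), dtype=int)
    rank_x[rows, np.argsort(DX, axis=1)] = np.arange(1, n + 1)  # rank of each j in X
    nn_y = np.argsort(DY, axis=1)[:, :k]           # k nearest neighbours in Y
    # Neighbours also within the top k in X have rank <= k and add no penalty.
    penalty = np.maximum(rank_x[rows, nn_y] - k, 0).sum()
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))                            # "original" data
t_identity = trustworthiness(X, X)                        # perfect embedding
t_random = trustworthiness(X, rng.normal(size=(100, 2)))  # unrelated embedding
```

A score of 1 means no intrusive neighbours at all, which is why differences in the second or third decimal place, as reported in the abstract, are meaningful.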
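The abstract also reports the Kendall rank correlation coefficient and classification accuracy without stating how they are computed. One plausible reading, assumed here rather than taken from the paper, is Kendall's tau between pairwise distances in the original and embedded spaces, plus the accuracy of a k-nearest-neighbour classifier fitted on the training-set embedding and scored on the out-of-sample test embedding. The helper names, the choice of tau-a, and `k=5` are all hypothetical.

```python
import numpy as np

def pairwise_dist(A):
    """Euclidean distance matrix for the rows of A."""
    return np.sqrt(((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1))


def kendall_tau_distances(X, Y):
    """Kendall's tau-a between the pairwise distances of X and of Y.

    Every pair of point-pairs is compared: concordant if both spaces order the
    two distances the same way. O(m^2) memory in m = n(n-1)/2, so keep n small.
    """
    iu = np.triu_indices(X.shape[0], k=1)
    a, b = pairwise_dist(X)[iu], pairwise_dist(Y)[iu]
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    m = a.size
    return float((sa * sb).sum() / (m * (m - 1)))   # (C - D) / (m choose 2)


def knn_accuracy(E_train, y_train, E_test, y_test, k=5):
    """Accuracy of a majority-vote k-NN classifier fitted on the training embedding."""
    d = ((E_test[:, None, :] - E_train[None, :, :]) ** 2).sum(axis=-1)
    votes = y_train[np.argsort(d, axis=1)[:, :k]]   # labels of the k nearest training points
    pred = np.array([np.bincount(v).argmax() for v in votes])
    return float((pred == y_test).mean())


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))
tau_same = kendall_tau_distances(X, 2.0 * X)        # scaling keeps the distance order
tau_rand = kendall_tau_distances(X, rng.normal(size=(40, 2)))

# Two well-separated toy clusters as "embeddings" with labels 0 and 1.
E_train = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
y_train = np.repeat([0, 1], 50)
E_test = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
y_test = np.repeat([0, 1], 10)
acc = knn_accuracy(E_train, y_train, E_test, y_test)
```

Scoring the classifier only on the held-out test embedding mirrors the out-of-sample protocol described in the abstract: the embedding model never sees the test points during fitting.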
