Codewords-Expanded Enhanced Residual Vector Quantization for Approximate Nearest Neighbor Search
Abstract: To further improve the accuracy of approximate nearest neighbor (ANN) search for image feature vectors, a codewords-expanded enhanced residual vector quantization (CERVQ) method is proposed. It combines enhanced residual vector quantization (ERVQ) with a mean-equisection vector computation method to reduce codebook training error and improve the quantization accuracy of feature vectors. In the codebook training stage, for every layer except the first, the mean-equisection vector computation is applied to the error vector of the previous layer's training to form the input of the next layer; on this basis, an iterative optimization method is proposed to reduce the global quantization error of codebook training. In the quantization stage, the codebook of each layer is expanded with mean-equisection vectors to generate new codewords, which are then used to quantize the input feature vectors of that layer and thus improve quantization accuracy. Finally, an asymmetric Euclidean distance computation method is proposed for ANN search over the quantized feature vectors. Experiments comparing CERVQ with five typical methods on two public datasets, SIFT and GIST, show that the proposed method reduces codebook training error by 10%~24% and increases the recall rate of ANN search by 1%~44%; in addition, it reduces the codebook size by 50% while reaching the same recall rate.
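The abstract outlines three stages: training codebooks on successive residuals, quantizing with expanded codebooks, and searching with an asymmetric Euclidean distance. The Python sketch below illustrates only the generic residual-vector-quantization backbone and the asymmetric distance idea; the mean-equisection vector expansion and the iterative codebook optimization that distinguish CERVQ are not specified in the abstract and are therefore omitted. All function names, parameters, and data sizes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: plain multi-layer residual vector quantization (RVQ) plus
# asymmetric Euclidean distance search, using NumPy and scikit-learn.
# This is a generic baseline under stated assumptions, not CERVQ itself.
import numpy as np
from sklearn.cluster import KMeans


def train_rvq(X, n_layers=2, n_codewords=32, seed=0):
    """Train one codebook per layer; each layer clusters the residual
    (quantization error) left by the previous layers."""
    codebooks, residual = [], X.astype(np.float64)
    for _ in range(n_layers):
        km = KMeans(n_clusters=n_codewords, n_init=4, random_state=seed).fit(residual)
        codebooks.append(km.cluster_centers_)
        residual = residual - km.cluster_centers_[km.labels_]
    return codebooks


def encode(X, codebooks):
    """Greedy encoding: per layer, pick the codeword nearest to the current residual."""
    codes = np.empty((X.shape[0], len(codebooks)), dtype=np.int32)
    residual = X.astype(np.float64)
    for layer, C in enumerate(codebooks):
        # squared distances via ||r||^2 - 2 r C^T + ||c||^2
        d = (residual ** 2).sum(1, keepdims=True) - 2 * residual @ C.T + (C ** 2).sum(1)
        idx = d.argmin(1)
        codes[:, layer] = idx
        residual = residual - C[idx]
    return codes


def decode(codes, codebooks):
    """Reconstruct each vector as the sum of its selected codewords."""
    return sum(C[codes[:, layer]] for layer, C in enumerate(codebooks))


def asymmetric_search(query, codes, codebooks, topk=10):
    """Asymmetric distance: the raw (unquantized) query is compared against
    reconstructed database vectors, so only the database side carries
    quantization error."""
    recon = decode(codes, codebooks)
    d = ((recon - query[None, :]) ** 2).sum(1)
    return np.argsort(d)[:topk]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.standard_normal((2000, 128))   # stand-in for SIFT-like descriptors
    query = rng.standard_normal(128)
    books = train_rvq(base)
    codes = encode(base, books)
    print(asymmetric_search(query, codes, books, topk=5))
```

In this generic form, each added layer refines the reconstruction of the previous layers; CERVQ's contribution, as described above, is to reduce the per-layer and global training error and to enlarge the effective codeword set without enlarging the stored codebooks.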