Abstract:
For VLAD, higher search accuracy can be obtained by enlarging the visual codebook, but at the cost of greater memory usage. To resolve this trade-off between search quality and memory consumption, we propose a global image descriptor called EVLAD, which aggregates finer residuals using a two-layer hierarchical visual codebook. In the offline preprocessing stage, the first-layer visual codebook is learned with K-means in the local descriptor space, and then each second-layer visual sub-codebook is generated non-uniformly according to a quantization-residual-minimization criterion. In the online generation stage, the key idea of EVLAD is to associate residual generation and residual accumulation with visual words at different layers: for each local descriptor, the residual, obtained by subtracting the nearest second-layer visual word from the descriptor, is accumulated into the sub-vector associated with the corresponding first-layer visual word, and EVLAD is the concatenation of all sub-vectors. To suppress the burstiness phenomenon in feature space, L2-normalization is applied to each sub-vector and to the final concatenated vector. Experimental results show that EVLAD outperforms VLAD and its other modified variants.
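To make the online aggregation stage concrete, the sketch below illustrates one plausible reading of the abstract: each descriptor is assigned to its nearest first-layer word, the residual is taken against the nearest word of that cell's second-layer sub-codebook, and the result is accumulated into the first-layer word's sub-vector, followed by per-sub-vector and global L2-normalization. All function and parameter names are illustrative assumptions, not taken from the paper, and the offline codebook construction is omitted.

```python
import numpy as np

def evlad(descriptors, coarse_words, sub_codebooks):
    """Hypothetical sketch of the online EVLAD aggregation stage.

    descriptors   : (N, d) array of local descriptors (e.g. SIFT).
    coarse_words  : (K, d) first-layer codebook learned with K-means.
    sub_codebooks : list of K arrays, each (M_k, d); the second-layer
                    sub-codebook attached to first-layer word k
                    (sizes M_k may differ, reflecting the non-uniform
                    generation described in the abstract).
    """
    K, d = coarse_words.shape
    agg = np.zeros((K, d))

    for x in descriptors:
        # Assign the descriptor to its nearest first-layer visual word
        # (assumed assignment rule; the abstract does not state it).
        k = np.argmin(np.linalg.norm(coarse_words - x, axis=1))
        # Take the residual against the nearest *second-layer* word of
        # that cell's sub-codebook, giving a finer residual than VLAD.
        sub = sub_codebooks[k]
        m = np.argmin(np.linalg.norm(sub - x, axis=1))
        # Accumulate the finer residual into the sub-vector of the
        # first-layer word, as the abstract describes.
        agg[k] += x - sub[m]

    # L2-normalize each sub-vector to suppress burstiness, then
    # L2-normalize the concatenated vector.
    norms = np.linalg.norm(agg, axis=1, keepdims=True)
    agg = np.divide(agg, norms, out=np.zeros_like(agg), where=norms > 0)
    v = agg.ravel()
    return v / max(np.linalg.norm(v), 1e-12)
```

Under this reading, EVLAD keeps the memory footprint of a K-word VLAD (the output dimension is still K x d) while the second-layer words refine the residuals, which is how the abstract's accuracy/memory trade-off is addressed.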