Fusing Local and Long-Range Information for Image Feature Point Extraction
Abstract: Extracting feature points from images is a fundamental step in many computer vision tasks. Existing learning-based feature point extraction methods mainly rely on convolutional neural networks to extract local features and therefore neglect long-range context information. As a result, they perform poorly in scenes with repetitive or weak textures. Oil field production scenes are a typical case: they contain many pipelines with repetitive textures and open areas with weak textures, which makes the analysis of safety video surveillance and the construction of digital twin platforms for oil field production difficult. To address this problem, we propose an efficient feature point extraction network that fuses local and long-range information. First, the network uses efficient large-kernel convolutional layers to capture long-range information. It then fuses this long-range information with the local information captured by vanilla convolutional layers to obtain more discriminative local descriptors. To further improve the matching accuracy of feature points, we propose a method for adaptively constructing the hardest descriptor triplets over entire images. This method extends the range over which triplets are constructed from covisible areas to the whole image, which brings the training process closer to real application scenarios. Experimental results on image matching, homography estimation, and indoor and outdoor large-scale visual localization tasks show that the proposed method outperforms existing feature point extraction methods. On the HPatches dataset, the mean matching accuracy of the proposed method is 2.89% higher than that of the compared methods. On the Aachen Day-Night v1.1 and InLoc datasets, its visual localization accuracy is higher by 2.10% and 3.70%, respectively. On an oil field production scene dataset rich in repetitive-texture and weak-texture regions, its average number of inlier feature point matches is 20.37% higher.
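
The idea of searching for the hardest negative over the entire image, rather than only within the covisible area, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `hardest_triplet_loss`, the `margin` value, the array layout, and the toy descriptors are all assumptions made for illustration, and the adaptive component mentioned in the abstract is not modeled because the abstract does not specify it.

```python
import numpy as np


def hardest_triplet_loss(anchors, candidates, pos_idx, margin=1.0):
    """Triplet margin loss with the hardest negative per anchor (illustrative only).

    anchors:    (N, D) descriptors from image A.
    candidates: (M, D) descriptors from image B; may include points that lie
                outside the covisible area.
    pos_idx:    (N,) index into `candidates` of each anchor's true match.
    """
    # Pairwise Euclidean distances between every anchor and every candidate.
    diff = anchors[:, None, :] - candidates[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    rows = np.arange(len(anchors))
    pos = dist[rows, pos_idx]

    # Exclude the true match, then take the closest remaining candidate
    # as the hardest negative.
    neg_dist = dist.copy()
    neg_dist[rows, pos_idx] = np.inf
    neg = neg_dist.min(axis=1)

    return np.maximum(0.0, margin + pos - neg).mean()


# Hypothetical toy data: two matched points plus one extra descriptor in
# image B that has no counterpart in image A (outside the covisible area).
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]])
pos_idx = np.array([0, 1])

loss_covisible = hardest_triplet_loss(anchors, candidates[:2], pos_idx)
loss_full = hardest_triplet_loss(anchors, candidates, pos_idx)
```

In this toy case the covisible-only search finds no negative inside the margin, while the full-image search picks up the extra descriptor as a harder negative and yields a nonzero loss. That is the intuition behind extending the triplet construction range to the whole image.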