Fusing Local and Long-Range Information for Image Feature Point Extraction
Abstract: Existing learning-based feature point extraction methods neglect long-range context information and therefore perform poorly in repetitive-texture and weak-texture scenes such as oil production scenes. This poses difficulties for safety video surveillance analysis and the construction of digital twin platforms in oil field production. To address this problem, we propose a novel and efficient feature point extraction network that fuses local and long-range information. First, the network utilizes efficient large-kernel convolutional layers to capture long-range information; this information is then fused with the local information captured by vanilla convolutional layers, yielding local descriptors with higher discriminative ability. To further improve the matching accuracy of feature points, we propose a method for adaptively constructing hardest descriptor triplets within entire images, which extends the scope of triplet construction from covisible areas to the whole image and raises the similarity between the training process and real application scenarios. On the HPatches dataset, the mean matching accuracy of the proposed method is 2.89% higher than that of the comparison methods. On the Aachen Day-Night v1.1 dataset, its visual localization accuracy is 2.10% higher. On the InLoc dataset, its visual localization accuracy is 3.70% higher. On a dataset of oil production scenes rich in repetitive-texture and weak-texture regions, its average number of feature point matching inliers is 20.37% higher than that of the comparison methods.
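The first idea in the abstract, combining a small-kernel "local" branch with a large-kernel "long-range" branch, can be illustrated with a minimal sketch. This is not the paper's actual architecture: the kernel sizes (3 and 13), the averaging kernels, and the additive fusion are illustrative assumptions standing in for the learned convolutions and fusion module.

```python
import numpy as np

def conv2d(x, k):
    # Naive single-channel 2D convolution with zero padding ("same" output size).
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def fuse_local_long_range(x, k_local=3, k_large=13):
    # Hypothetical fusion: a small-kernel branch captures local structure,
    # a large-kernel branch captures long-range context; the two feature
    # maps are fused by simple addition (learned fusion in the real network).
    local = conv2d(x, np.ones((k_local, k_local)) / k_local ** 2)
    long_range = conv2d(x, np.ones((k_large, k_large)) / k_large ** 2)
    return local + long_range
```

In the actual method, both branches would be learned layers on multi-channel feature maps, and large-kernel convolutions are typically made efficient via depthwise decomposition; the sketch only conveys why the fused response at each pixel depends on both its immediate neighborhood and a much wider context window.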
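The second idea, mining the hardest negative descriptor over the entire image rather than only the covisible region, can be sketched as a triplet margin loss. The function name, the brute-force distance matrix, and the margin value are illustrative assumptions; the paper's adaptive construction scheme is not reproduced here.

```python
import numpy as np

def hardest_triplet_loss(desc_a, desc_b, matches, margin=1.0):
    """Hedged sketch of hardest-negative triplet mining over a full image.

    desc_a:  (N, D) descriptors from image A
    desc_b:  (M, D) descriptors from image B
    matches: list of (i, j) ground-truth correspondences between A and B
    """
    # Pairwise Euclidean distances between every descriptor in A and B.
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    losses = []
    for i, j in matches:
        pos = dist[i, j]
        # Hardest negative for anchor i: the closest NON-matching descriptor
        # anywhere in image B, not just within the covisible area.
        neg_row = dist[i].copy()
        neg_row[j] = np.inf
        neg = neg_row.min()
        losses.append(max(0.0, margin + pos - neg))
    return float(np.mean(losses))
```

Searching negatives over the whole image mirrors real matching, where a descriptor competes against every candidate in the other view, which is the similarity between training and deployment that the abstract emphasizes.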