Fusing Local and Long-Range Information for Image Feature Point Extraction
Graphical Abstract
Abstract
Existing learning-based feature point extraction methods neglect long-range context information and therefore perform poorly in repetitive-texture and low-texture scenes, such as oil production scenes. This shortcoming hinders the analysis of safety video surveillance and the construction of digital twin platforms in oil field production. To solve this problem, we propose a novel and efficient feature point extraction network that fuses local and long-range information. The network first employs efficient large-kernel convolutional layers to capture long-range information; it then fuses this long-range information with the local information captured by vanilla convolutional layers, yielding local descriptors with higher discriminative ability. To further improve the matching accuracy of feature points, we propose a novel method for adaptively constructing hardest descriptor triplets within entire images. This method extends the scope of triplet construction from covisible areas to entire images, which brings the training process closer to real application scenarios. On the HPatches dataset, the mean matching accuracy of the proposed method is 2.89% higher than those of the comparison methods. On the Aachen Day-Night v1.1 dataset, its visual localization accuracy is 2.10% higher than those of the comparison methods, and on the InLoc dataset it is 3.70% higher. On a dataset of oil production scenes with repetitive and weak texture regions, the average number of feature point matching inliers of the proposed method is 20.37% higher than those of the comparison methods.
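The core architectural idea of the abstract, fusing responses from a large receptive field with responses from a small one, can be illustrated at toy scale. The sketch below is purely illustrative and not the paper's implementation: it uses 1-D signals instead of images, averaging kernels instead of learned weights, and hypothetical function names (`conv1d`, `fuse`); it only shows how a large-kernel branch contributes long-range context that a small-kernel branch misses.

```python
# Illustrative sketch (hypothetical, stdlib-only): fuse "local" responses from a
# small-kernel 1-D convolution with "long-range" responses from a large-kernel
# convolution. Real feature extractors use learned 2-D kernels on images.

def conv1d(signal, kernel):
    """Same-length 1-D convolution with zero padding on both sides."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]

def fuse(signal, small_kernel, large_kernel, alpha=0.5):
    """Weighted sum of local (small-kernel) and long-range (large-kernel) responses."""
    local = conv1d(signal, small_kernel)
    long_range = conv1d(signal, large_kernel)
    return [alpha * lo + (1 - alpha) * lr for lo, lr in zip(local, long_range)]

# A repetitive "texture": isolated peaks that a small kernel sees identically,
# while the large kernel captures their differing long-range surroundings.
signal = [0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
small = [1 / 3] * 3   # local context: 3-tap average
large = [1 / 7] * 7   # long-range context: 7-tap average
fused = fuse(signal, small, large)
print(len(fused))     # fused response has the same length as the input
```

In the repetitive signal above, the small-kernel branch produces identical responses at every peak, whereas the large-kernel branch differs across peaks because their wider neighborhoods differ; the fused response therefore disambiguates otherwise identical local patterns, which is the intuition behind combining both branches.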