Fusing Local and Long-Range Information for Image Feature Point Extraction
Graphical Abstract
Abstract
Extracting feature points from images is a fundamental step in many computer vision tasks. However, existing learning-based feature point extraction methods rely mainly on convolutional neural networks to extract local features, neglecting long-range context information; as a result, they perform poorly in repetitive-texture and low-texture scenes. Such scenes are common in oil field production, which contains many pipelines with repetitive textures and open areas with little texture. This problem poses difficulties for the analysis of safety video surveillance and for the construction of digital twin platforms in oil field production. To solve this problem, we propose a novel and efficient feature point extraction network that fuses local and long-range information. The network first captures long-range information with efficient large-kernel convolutional layers, and then fuses it with the local information captured by vanilla convolutional layers to extract local descriptors with higher discriminative ability. To further enhance the matching accuracy of feature points, we propose a novel method that adaptively constructs the hardest descriptor triplets within entire images. This method extends the scope of triplet construction from covisible areas to entire images, bringing the training process closer to real application scenarios. Experimental results on image matching, homography estimation, and indoor and outdoor large-scale visual localization tasks show that our proposed method outperforms existing feature point extraction methods.
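The hardest-triplet idea mentioned above can be illustrated with a minimal NumPy sketch: for each anchor descriptor, the closest descriptor in a pool sampled from the entire image (rather than only the covisible area) serves as the hardest negative in a standard triplet margin loss. The function name, array shapes, and margin value below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def hardest_triplet_loss(anchors, positives, negative_pool, margin=1.0):
    """Triplet margin loss with hardest-negative mining over a whole-image pool.

    anchors, positives: (N, D) matched descriptor pairs.
    negative_pool: (M, D) descriptors sampled from the entire image
    (assumed here to exclude the true positives).
    """
    # Distance of each anchor to its matching positive descriptor.
    pos_d = np.linalg.norm(anchors - positives, axis=1)                              # (N,)
    # Pairwise distances from every anchor to every pool descriptor.
    cross = np.linalg.norm(anchors[:, None, :] - negative_pool[None, :, :], axis=2)  # (N, M)
    # The hardest negative for each anchor is its closest non-matching descriptor.
    hard_neg_d = cross.min(axis=1)                                                   # (N,)
    # Hinge loss: push the hardest negative at least `margin` farther than the positive.
    return float(np.maximum(0.0, margin + pos_d - hard_neg_d).mean())
```

Mining negatives from the whole image rather than the covisible region makes the training-time negatives resemble the distractors a matcher actually encounters at test time.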