Abstract:
Deformable DETR relies solely on ResNet for basic feature extraction, which constrains the detection performance of its subsequent modules. To address this limitation, this paper proposes an object detection network based on feature enhancement and polynomial interpolation. First, a feature extraction module is introduced that captures both local and global image information, helping the network identify key image features more accurately. Second, a dual attention module is designed to dynamically adjust the weights of feature channels and spatial positions, so that the network focuses on the image regions most critical to the current task. Finally, a polynomial interpolation method is proposed that fits additional feature vectors around the target points, from which higher-quality feature vectors are computed. In experiments on the COCO dataset under consistent conditions, the proposed network achieves an average detection accuracy of 44.8% and improves large-object detection accuracy by 1.9 percentage points over Deformable DETR. All detection accuracy metrics improve, and the network outperforms other comparable networks in the same series.
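
As an illustration of the interpolation idea only, the sketch below fits a polynomial through feature vectors sampled near a target point and evaluates it at that point, channel by channel. The abstract does not state the polynomial form, the sampling layout, or the dimensionality, so the use of one-dimensional Lagrange interpolation, the function name `lagrange_interpolate`, and the sample values are all assumptions made for this example, not the paper's method.

```python
from typing import List, Sequence


def lagrange_interpolate(
    xs: Sequence[float],
    vectors: Sequence[Sequence[float]],
    x: float,
) -> List[float]:
    """Evaluate, at position x, the polynomial passing through the
    feature vectors `vectors[i]` sampled at positions `xs[i]`.

    Hypothetical helper: each channel is interpolated independently
    with a Lagrange polynomial of degree len(xs) - 1.
    """
    if len(xs) != len(vectors):
        raise ValueError("xs and vectors must have the same length")
    if len(set(xs)) != len(xs):
        raise ValueError("sample positions must be distinct")

    dim = len(vectors[0])
    result = [0.0] * dim
    for i, xi in enumerate(xs):
        # Lagrange basis weight of sample i at the query position x.
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        # Accumulate the weighted contribution of sample i per channel.
        for c in range(dim):
            result[c] += weight * vectors[i][c]
    return result


# Assumed toy data: three 2-channel feature vectors around a target point.
# Channel 0 follows x**2 and channel 1 is constant, so a quadratic fit
# recovers both exactly.
sample_positions = [0.0, 1.0, 2.0]
sample_features = [[0.0, 1.0], [1.0, 1.0], [4.0, 1.0]]
interpolated = lagrange_interpolate(sample_positions, sample_features, 1.5)
```

In this toy setting the interpolated vector uses all three neighbouring samples rather than only the two nearest, which is the general motivation for a higher-order fit over linear interpolation.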