Lightweight Road Defect Detection Method Based on Dynamic Deformable Convolution
Abstract: To improve the accuracy of road defect detection, and to address the varied forms and irregular shapes of road defects, this paper proposes a lightweight real-time Transformer detection network for road defects based on dynamic deformable convolution. The network follows an encoder-decoder framework and uses ResNet18 as its backbone. On top of this residual network, a multi-path coordinate attention mechanism (MPCA) is designed and integrated with the deformable convolution module (DCNv2) to form a dynamic deformable convolution module. This module is used to build a residual network with a variable receptive field, allowing it to adapt to road defects of different sizes and shapes. In addition, a deformable attention mechanism is introduced into the intra-scale feature interaction module (AIFI) to strengthen the capture and extraction of key target information in the image, and a lightweight module is introduced into the cross-scale feature fusion module (CCFM) to make the neck network lightweight. The model is compared with eight other methods, including YOLOv5-m, on the Global Road Damage Detection Challenge (GRDDC2020) dataset. The results show that the model achieves better detection performance while also requiring fewer parameters and less computation. Specifically, it reaches a mean Average Precision (mAP) of 61.1% with 19.6 M parameters (Params), a computational cost of 50.2 GFLOPs, and an inference speed of 42.3 frames per second (FPS). The model can therefore detect road defects effectively and provide information for pavement maintenance.
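The variable receptive field mentioned in the abstract comes from the sampling rule of modulated deformable convolution (DCNv2): each kernel tap reads the input at its regular grid position plus a fractional offset, and the sampled value is scaled by a modulation scalar. The following is a minimal pure-Python sketch of that rule for a single channel, not the authors' implementation. The function names `bilinear_sample` and `modulated_deform_conv3x3` are hypothetical, and the offsets and masks are passed in by hand here, whereas in a real network they would be predicted from the input features. The MPCA attention branch is not shown, since the abstract does not specify its details.

```python
import math

def bilinear_sample(img, y, x):
    """Sample a 2-D list `img` at fractional (y, x); out-of-bounds reads count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    value = 0.0
    # Blend the four integer neighbours, weighted by proximity.
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * img[yy][xx]
    return value

def modulated_deform_conv3x3(img, weight, offsets, masks):
    """Single-channel 3x3 modulated deformable convolution (stride 1, zero padding 1).

    weight           : flat list of 9 kernel weights, row-major
    offsets[i][j][k] : (dy, dx) shift of kernel tap k at output position (i, j)
    masks[i][j][k]   : modulation scalar for the same tap (assumed in [0, 1])
    """
    h, w = len(img), len(img[0])
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k, (ky, kx) in enumerate(taps):
                dy, dx = offsets[i][j][k]
                sample = bilinear_sample(img, i + ky + dy, j + kx + dx)
                acc += weight[k] * masks[i][j][k] * sample
            out[i][j] = acc
    return out

# Toy 4x4 input and an identity kernel (only the centre tap is non-zero).
img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
weight = [0.0] * 9
weight[4] = 1.0
ones = [[[1.0] * 9 for _ in range(4)] for _ in range(4)]

# Zero offsets reduce the operation to an ordinary convolution.
zero_off = [[[(0.0, 0.0)] * 9 for _ in range(4)] for _ in range(4)]
plain = modulated_deform_conv3x3(img, weight, zero_off, ones)

# Shifting every tap half a pixel to the right averages horizontal neighbours.
half_off = [[[(0.0, 0.5)] * 9 for _ in range(4)] for _ in range(4)]
shifted = modulated_deform_conv3x3(img, weight, half_off, ones)
print(shifted[1][1])  # 5.5, the mean of img[1][1] = 5 and img[1][2] = 6
```

With zero offsets and unit masks the output equals the input under the identity kernel. Non-zero offsets move the sampling points off the regular grid, which is how the kernel can follow elongated or irregular shapes such as cracks.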