Abstract:
Manual examination of mammogram images is costly, and annotating lesions is time-consuming for radiologists. As a result, patients often face delays in receiving detection results, and those results are susceptible to the subjective judgment of individual doctors. To help physicians promptly obtain the locations of three key diagnostic targets (masses, calcification areas, and lymph nodes), a high-precision lesion localization method for mammography images based on RetinaNet, named MammoDet, is proposed. First, a Swin Transformer is used as the backbone network to extract multi-level features from mammogram images. Then, the features from the different levels are fed into a bidirectional feature pyramid network, which incorporates cross-layer fusion modules and separate-layer fusion modules for multi-scale feature fusion. Finally, the fused features are decoded by the prediction head to obtain the specific locations of lesions and their corresponding categories. Experimental results on a dataset provided by the First People's Hospital in Guangdong Province and on two public datasets demonstrate that MammoDet achieves higher detection accuracy than classic models such as YOLOv8 and Mask R-CNN. The mean average precision (mAP) over the three targets reaches 76.88%, an improvement of 6.26 percentage points over the classic RetinaNet.