Hu Mengnan, Wang Rong, Zhang Wenjing, Zhang Qi. Multi-Scale Referring Image Segmentation Based on Dual Attention[J]. Journal of Computer-Aided Design & Computer Graphics, 2025, 37(1): 148-156. DOI: 10.3724/SP.J.1089.2023-00355

Multi-Scale Referring Image Segmentation Based on Dual Attention

This paper proposes a multi-scale referring image segmentation method based on dual attention. It addresses two problems: insufficient interaction between the visual and linguistic modalities, and the fact that objects of different sizes require different structural and semantic information. First, a dual attention mechanism performs inter-modal and intra-modal interaction between vision and text, using the different types of informative words in the expression to align visual and linguistic features more accurately. Second, with language features as guidance, useful features are selected from other levels for information exchange, further enhancing the feature representation. Then, a dual-path ConvLSTM integrates low-level visual details and high-level semantics along bottom-up and top-down paths. Finally, atrous spatial pyramid pooling fuses multi-scale information, improving the model's perception of objects at different scales. Experiments on the UNC, UNC+, GRef, and ReferIt benchmark datasets show that the proposed method improves oIoU by 1.81 percentage points on UNC, 1.26 percentage points on UNC+, 0.84 percentage points on GRef, and 0.32 percentage points on ReferIt. Extensive ablation studies also validate the effectiveness of each component of the approach.
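The abstract reports its gains in oIoU but does not define the metric. As a minimal sketch, assuming oIoU follows the usual "overall IoU" convention in referring segmentation (total intersection divided by total union, accumulated over the whole test set rather than averaged per image), it could be computed as below. The function names and the nested-list mask representation are illustrative choices, not taken from the paper.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: a binary mask as rows of 0/1 values.
Mask = List[List[int]]


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two equal-sized masks."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter, union


def overall_iou(pairs: Iterable[Tuple[Mask, Mask]]) -> float:
    """Overall IoU: cumulative intersection over cumulative union.

    Assumed definition; large objects contribute more pixels and therefore
    weigh more than small ones, unlike a per-image mean IoU.
    """
    total_inter = 0
    total_union = 0
    for pred, target in pairs:
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 0.0


# Two toy (prediction, ground truth) pairs.
pairs = [
    ([[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # intersection 1, union 2
    ([[1, 1], [1, 1]], [[1, 1], [1, 0]]),  # intersection 3, union 4
]
score = overall_iou(pairs)  # (1 + 3) / (2 + 4)
```

Under this assumed definition, a gain of 1.81 percentage points means the ratio, expressed as a percentage, rises by 1.81 in absolute terms.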