Weakly Supervised Referring Image Segmentation Based on Entity Information
Abstract: Weakly supervised referring image segmentation methods that rely solely on text annotations for supervision make insufficient use of entity information. To address this problem, this paper proposes a weakly supervised referring image segmentation method based on entity information. Using text as the only weak label, the method exploits the semantic information hidden in the entity words of the text to provide effective cues for the visual localization task. First, a candidate entity detection module extracts entity information to identify all potential objects in the image. Next, an interactive enhancement module lets the text and image features of the two modalities reinforce each other and, together with a response optimization loss, produces accurately localized response maps. Finally, a matching operation yields the final pseudo-label result. Experimental results on four large public datasets, RefCOCO, RefCOCO+, RefCOCOg, and ReferIt, show that, even with text as the only weak supervision signal, mining and exploiting entity information allows the proposed method to improve the mIoU metric by 3.9%, 22.9%, 8.1%, and 9.1%, respectively, over existing state-of-the-art weakly supervised methods.
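For readers unfamiliar with the reported metric, the following is a minimal sketch of how mIoU is commonly computed for segmentation masks. It assumes mIoU means the average of per-sample intersection-over-union scores between predicted and ground-truth binary masks; the abstract does not spell out the exact averaging protocol. The names `mask_iou` and `mean_iou` are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# A binary mask represented as rows of 0/1 values (illustrative format only).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two binary masks of the same shape."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumption: two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def mean_iou(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-sample IoU over a set of (prediction, target) pairs."""
    if len(preds) != len(targets) or not preds:
        raise ValueError("preds and targets must be non-empty and equal in length")
    ious = [mask_iou(p, t) for p, t in zip(preds, targets)]
    return sum(ious) / len(ious)


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
    targets = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
    print(mean_iou(preds, targets))
```

In this toy case the first pair overlaps on one of two foreground pixels and the second pair matches exactly, so the per-sample scores are averaged into a single number. Under this reading, an improvement reported in percent corresponds to a change in that averaged score on a 0 to 100 scale.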