Clothing Hierarchical Feature Representation Learning for Cross-Modal Fashion Retrieval

Abstract: Fashion images and texts carry fine-grained clothing information, exhibit weak cross-modal associations, and are usually matched from a single perspective, which makes image-text matching in cross-modal fashion retrieval inaccurate. To address this problem, clothing hierarchical feature representation learning for cross-modal fashion retrieval is proposed. First, taking paired clothing images, texts, and labels as input, a clothing hierarchical feature representation module built from a CNN, Faster-RCNN, a cascaded pyramid network, and a Bi-GRU extracts the global, style, and structure features of the clothing image and the description, subject, and label features of the clothing text. Then, combining cross-attention, correlation computation, graph reasoning, and relation fusion, association learning is performed on three feature pairs: global and description, style and subject, and structure and label. Through hierarchical association and fusion, a matching score is computed to obtain the image-text matching result for the clothing. Experimental results on the cross-modal fashion retrieval benchmark dataset Fashion-gen show that the proposed method improves the accuracy of cross-modal fashion retrieval, raising the top-1 recall (R@1) of bidirectional retrieval by 10.26% and 14.22%, respectively, over the latest baseline method.
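
The abstract describes a three-level matching scheme: image and text features at corresponding levels (global and description, style and subject, structure and label) are associated by cross-attention, and the per-level scores are combined into one matching score. The sketch below illustrates that idea in PyTorch; every module name, dimension, and the weighted score fusion are assumptions made for illustration only, and the paper's graph reasoning and relation fusion steps are omitted.

    # Illustrative sketch of hierarchical image-text matching with cross-attention.
    # All names, dimensions, and the additive score fusion are assumptions;
    # this is not the authors' implementation.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def cross_attention_score(img_feats, txt_feats):
        """Score one level: attend each text fragment over the image fragments,
        then average the cosine similarity between each text fragment and its
        attended image context.

        img_feats: (n_img, d) image fragments at this level
        txt_feats: (n_txt, d) text fragments at this level
        """
        img = F.normalize(img_feats, dim=-1)
        txt = F.normalize(txt_feats, dim=-1)
        attn = torch.softmax(txt @ img.t() / img.size(-1) ** 0.5, dim=-1)  # (n_txt, n_img)
        ctx = attn @ img  # attended image context for each text fragment
        return F.cosine_similarity(txt, ctx, dim=-1).mean()

    class HierarchicalMatcher(nn.Module):
        """Fuses matching scores from three feature levels:
        global-description, style-subject, and structure-label."""

        def __init__(self, dims=(2048, 1024, 512), d_common=256):
            super().__init__()
            # one projection pair per level into a shared embedding space
            self.img_proj = nn.ModuleList([nn.Linear(d, d_common) for d in dims])
            self.txt_proj = nn.ModuleList([nn.Linear(d, d_common) for d in dims])
            self.level_weight = nn.Parameter(torch.ones(3) / 3)  # learned fusion weights

        def forward(self, img_levels, txt_levels):
            # img_levels / txt_levels: lists of (n_i, d_i) tensors, one per level
            scores = torch.stack([
                cross_attention_score(pi(iv), pt(tv))
                for pi, pt, iv, tv in zip(self.img_proj, self.txt_proj,
                                          img_levels, txt_levels)
            ])
            return (torch.softmax(self.level_weight, dim=0) * scores).sum()

    if __name__ == "__main__":
        matcher = HierarchicalMatcher()
        # fake image features: global (1 vector), style regions (4), structure keypoints (14)
        img_levels = [torch.randn(1, 2048), torch.randn(4, 1024), torch.randn(14, 512)]
        # fake text features: description tokens (20), subject tokens (3), label tokens (5)
        txt_levels = [torch.randn(20, 2048), torch.randn(3, 1024), torch.randn(5, 512)]
        print("matching score:", matcher(img_levels, txt_levels).item())

Projecting each level into a shared space keeps the per-level cosine similarities on a comparable scale, and the softmax over the learned level weights makes the fused score a convex combination of the three level scores.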
