Jiang Aiping, Liu Li, Fu Xiaodong, Liu Lijun, Peng Wei. Clothing Hierarchical Feature Representation and Association Learning for Cross-Modal Fashion Retrieval[J]. Journal of Computer-Aided Design & Computer Graphics, 2025, 37(4): 654-667. DOI: 10.3724/SP.J.1089.2023-00263

Clothing Hierarchical Feature Representation and Association Learning for Cross-Modal Fashion Retrieval

  • Image-text matching in cross-modal fashion retrieval is often inaccurate because clothing images and texts are matched from a single perspective, clothing information is fine-grained, and the association between modalities is weak. To address these problems, a clothing hierarchical feature representation and association learning method for cross-modal fashion retrieval is proposed. First, taking paired clothing images, texts, and labels as input, a hierarchical feature representation module builds hierarchical visual and textual representations, extracting the global, style, and structural features of clothing images and the description, subject, and label features of clothing texts. Then, hierarchical association computation based on cross attention and vector similarity produces three initial levels of relations for each clothing image-text pair, and hierarchical association learning that combines relational reasoning with relational aggregation refines them into three relation levels: global-description, style-subject, and structure-label. Finally, the correlation scores of the three levels are computed and the clothing image-text matching result is output (a minimal sketch of this scoring step follows the abstract). Experimental results on the cross-modal fashion retrieval benchmark dataset Fashion-gen show that the proposed method improves the accuracy of cross-modal fashion retrieval: compared with the latest baseline method, the top-1 recall (R@1) in the two retrieval directions increases by 10.26 and 14.22 percentage points, respectively.
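The paper's implementation is not reproduced here; the following is a minimal PyTorch-style sketch of how a cross-attention association score at one hierarchy level, and the aggregation of the three level scores described in the abstract, might be computed. The function names, feature shapes, temperature, and level weights are illustrative assumptions, not the authors' actual code.

```python
import torch
import torch.nn.functional as F


def cross_attention_score(img_feats, txt_feats, temperature=4.0):
    """Attend each text token over image regions, then score the pair.

    img_feats: (n_regions, d) visual features at one hierarchy level
    txt_feats: (n_tokens, d)  textual features at the same level
    Returns a scalar image-text relevance score for this level.
    """
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    # token-to-region attention weights (cross attention)
    attn = F.softmax(temperature * txt @ img.t(), dim=-1)   # (n_tokens, n_regions)
    attended = attn @ img                                     # (n_tokens, d)
    # per-token cosine similarity, averaged into a level score
    sims = F.cosine_similarity(txt, attended, dim=-1)         # (n_tokens,)
    return sims.mean()


def hierarchical_score(levels, weights=(1.0, 1.0, 1.0)):
    """Combine the three level scores (global-description, style-subject,
    structure-label) into one image-text matching score."""
    scores = [cross_attention_score(v, t) for v, t in levels]
    return sum(w * s for w, s in zip(weights, scores))


if __name__ == "__main__":
    # toy usage with random features; shapes are arbitrary assumptions
    d = 256
    levels = [
        (torch.randn(36, d), torch.randn(20, d)),  # global vs. description
        (torch.randn(8, d),  torch.randn(5, d)),   # style vs. subject
        (torch.randn(12, d), torch.randn(6, d)),   # structure vs. label
    ]
    print(float(hierarchical_score(levels)))
```

In this sketch each text feature attends over the visual features of the same level, and the three level scores are summed with equal weights; the paper additionally applies relational reasoning and aggregation before scoring, which is omitted here.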
