Local Attribute Attention Network for Minority Clothing Image Caption Generation

张绪辉; 刘骊; 付晓东; 刘利军; 彭玮

Abstract

To address inaccurate caption generation for minority clothing images, which is caused by complex attribute information, high inter-class similarity, and weak correlation between semantic attributes and visual information, a local attribute attention network for minority clothing image caption generation is proposed. First, a minority clothing image caption dataset is constructed, containing 55 categories and 30 000 images (about 3 600 MB); 208 local key attribute words and 30 089 text entries for minority clothing are also defined. A local attribute learning module extracts visual features and embeds the text information, and multi-instance learning is used to obtain the local attributes. Then, an attention-aware module comprising semantic, visual, and gated attention is defined on top of a two-layer long short-term memory (LSTM) network; it fuses the local attributes, attribute-based visual features, and text encoding information to optimize the generated captions. Experimental results on the constructed dataset show that the proposed method generates captions that include key attributes such as minority category and clothing style, and improves the accuracy metric BLEU and the semantic richness metric CIDEr by 1.4% and 2.2%, respectively, over existing methods.
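The multi-instance learning step can be illustrated with a minimal sketch. The abstract does not state which multi-instance formulation is used, so the noisy-OR pooling below is an assumption, chosen only because it is a common way to turn per-region attribute scores into image-level attribute probabilities. The function names, the three-word vocabulary, the region scores, and the 0.5 threshold are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def noisy_or_attribute_probs(region_scores):
    """Pool per-region scores into image-level attribute probabilities.

    region_scores[j][a] is the raw score of attribute a in image region j.
    Under the noisy-OR assumption, an attribute is present in the image if
    it is present in at least one region:
        p(a | image) = 1 - prod_j (1 - sigmoid(score[j][a]))
    """
    num_attrs = len(region_scores[0])
    probs = []
    for a in range(num_attrs):
        miss = 1.0  # probability that no region shows attribute a
        for scores in region_scores:
            miss *= 1.0 - sigmoid(scores[a])
        probs.append(1.0 - miss)
    return probs


def select_attributes(probs, vocab, threshold=0.5):
    """Keep attribute words whose image-level probability reaches the threshold."""
    return [word for word, p in zip(vocab, probs) if p >= threshold]


# Hypothetical attribute vocabulary and region scores (2 regions x 3 attributes).
vocab = ["silver headdress", "pleated skirt", "embroidered collar"]
region_scores = [
    [2.0, -3.0, -1.0],
    [-1.0, -2.5, -1.5],
]

probs = noisy_or_attribute_probs(region_scores)
detected = select_attributes(probs, vocab)
print(detected)
```

In a captioning pipeline of the kind the abstract describes, the detected attribute words would then be passed to the attention module together with the visual features; how the paper actually performs that fusion is not specified here.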
