Citation: Cui Huailei, Liu Li, Zhang Huaxiang, Liu Dongmei, Ma Yue, Wang Zekang. Detail Preserving Image Generation Method Based on Semantic Consistency[J]. Journal of Computer-Aided Design & Computer Graphics, 2022, 34(10): 1497-1505. DOI: 10.3724/SP.J.1089.2022.19724


Detail Preserving Image Generation Method Based on Semantic Consistency

• Abstract: Generative adversarial networks are widely used for text-to-image generation, but the generation process tends to lose necessary details in parts of the image. To generate fine-grained images with richer detail features and to improve the semantic consistency between text and image, a detail-preserving image generation method based on semantic consistency is proposed. First, to mine the latent semantics of the text description, a feature extraction module is introduced to select the important words and sentences in the text and to capture the semantic structure information between them. Second, a detail-preserving module is constructed to associate image and text information: combined with a hybrid attention mechanism, it locates the image regions that correspond to specific text, links the located regions with the text information, and enhances and refines the details of the generated image. Finally, a semantic loss and a perceptual loss are fused to map the sentence-level image and the word-level sub-regions into a common semantic space, optimizing image-text consistency at the word level and reducing the randomness of image generation. Experimental results show that the IS and FID scores reach 4.77 and 15.47 on the CUB dataset, and 35.56 and 27.63 on the COCO dataset, respectively.
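The abstract only outlines the method, but the two ingredients it names for the detail-preserving step, attending from words to image sub-regions and fusing a semantic loss with a perceptual loss, can be illustrated compactly. The sketch below is not the authors' implementation: the module names, tensor dimensions, loss weight, and the simplification of the hybrid attention to a single word-to-region attention are all assumptions made for illustration.

```python
# Minimal illustrative sketch (not the paper's code) of:
# (1) word-to-region attention tying each word embedding to image sub-regions,
# (2) a combined semantic + perceptual loss.
# All names, dimensions, and the 0.1 weight are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WordRegionAttention(nn.Module):
    """Attend over image sub-regions with word features."""

    def __init__(self, word_dim: int, region_dim: int):
        super().__init__()
        self.proj = nn.Linear(word_dim, region_dim)  # map words into the region feature space

    def forward(self, words: torch.Tensor, regions: torch.Tensor) -> torch.Tensor:
        # words:   (B, T, word_dim)   word embeddings from the text encoder
        # regions: (B, N, region_dim) flattened spatial features of the image
        q = self.proj(words)                          # (B, T, region_dim)
        attn = torch.bmm(q, regions.transpose(1, 2))  # (B, T, N) word-region similarities
        attn = F.softmax(attn, dim=-1)                # each word attends over all regions
        return torch.bmm(attn, regions)               # (B, T, region_dim) region context per word


def semantic_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor) -> torch.Tensor:
    # Pull matched image and sentence embeddings together in a shared space
    # (a simple cosine-similarity surrogate for the semantic loss).
    return (1.0 - F.cosine_similarity(img_emb, txt_emb, dim=-1)).mean()


def perceptual_loss(feat_fake: torch.Tensor, feat_real: torch.Tensor) -> torch.Tensor:
    # L1 distance between features of a fixed pretrained network (e.g. VGG),
    # extracted outside this function.
    return F.l1_loss(feat_fake, feat_real)


if __name__ == "__main__":
    B, T, N, D = 2, 12, 64, 256
    attn = WordRegionAttention(word_dim=300, region_dim=D)
    word_context = attn(torch.randn(B, T, 300), torch.randn(B, N, D))
    print(word_context.shape)  # torch.Size([2, 12, 256])

    total = semantic_loss(torch.randn(B, D), torch.randn(B, D)) + 0.1 * perceptual_loss(
        torch.randn(B, 512), torch.randn(B, 512)
    )
    print(float(total))
```

In a full pipeline the region features would come from an intermediate generator stage and the perceptual features from a frozen backbone; random tensors stand in here so the sketch runs on its own.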

     
