Image Colorization via Object Scale Adaptive Transformer
Abstract: Existing Transformer-based image colorization methods focus solely on global contextual modeling of images and pay little attention to the validity of object-scale information in image scenes. We propose an object-scale-adaptive Transformer method for grayscale image colorization, built as an end-to-end trained encoder-decoder network. At the encoder stage, we adopt object-scale-adaptive Transformer blocks that use a scaling-factor adaptive selection module to capture object-scale information and compute the reduction ratio of the attention layers. This allows the Transformer blocks to compute object-scale-adaptive attention, enabling global modeling of objects at different scales while reducing computational complexity. At the decoder stage, we add a semantic-category-aware module to strengthen the semantic constraints on decoded features, which enables the model to understand object semantics correctly and avoid color-bleeding and color-ambiguity artifacts. Extensive experiments on the ImageNet and COCO-Stuff datasets demonstrate that our method generates more vivid and semantically better-matched color images than state-of-the-art methods.
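The core mechanism the abstract describes — attention whose key/value reduction ratio is chosen per the scale of objects in the scene — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the single-head attention uses identity projections, the reduction is plain average pooling, and `pick_reduction_ratio` is a hypothetical stand-in for the paper's scaling-factor adaptive selection module.

```python
import numpy as np

def spatial_reduction_attention(x, r):
    """Single-head self-attention in which keys/values are spatially
    reduced by ratio r before the attention product, cutting the score
    matrix from (HW x HW) to (HW x HW/r^2).

    x: feature map of shape (H, W, C); r must divide H and W.
    Identity Q/K/V projections are used to keep the sketch short.
    """
    H, W, C = x.shape
    q = x.reshape(H * W, C)                       # one query per pixel
    # Reduce K/V resolution by average-pooling r x r patches.
    kv = x.reshape(H // r, r, W // r, r, C).mean(axis=(1, 3))
    kv = kv.reshape((H // r) * (W // r), C)
    scores = q @ kv.T / np.sqrt(C)                # (HW, HW/r^2)
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    scores /= scores.sum(axis=1, keepdims=True)   # softmax over reduced tokens
    return (scores @ kv).reshape(H, W, C)

def pick_reduction_ratio(scale_score, ratios=(1, 2, 4)):
    """Hypothetical selection rule: map a scalar object-scale score in
    [0, 1] to a reduction ratio (large, coarse objects tolerate a
    coarser K/V grid; small objects get r = 1, i.e. full attention)."""
    idx = min(int(scale_score * len(ratios)), len(ratios) - 1)
    return ratios[idx]

# Usage: a large-object score selects r = 4, shrinking K/V 16x.
feat = np.random.default_rng(0).random((8, 8, 16))
out = spatial_reduction_attention(feat, pick_reduction_ratio(0.9))
```

With r = 1 this degenerates to ordinary global self-attention, so the adaptive choice trades attention granularity against cost per block rather than fixing one ratio for the whole network.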