Optical Flow Estimation Method with Shifted Windows Transformer
Abstract: End-to-end optical flow estimation methods are easily limited by motion blur, occlusion, and large displacements. To predict occluded pixels more accurately, this paper introduces an attention mechanism and proposes an optical flow estimation method that fuses the shifted windows Transformer (SWin) with convolution. First, the original feature map is enhanced with shifted-window attention to obtain features with stronger global self-similarity, compensating for the locality of convolutional features. Second, shifted-window attention is used to parse the correlation volume, covering 2D motion-vector parsing and flow-increment computation, which yields more accurate flow increments. Finally, an occlusion map is introduced into the position encoding so that the attention computation accounts for more pixel-position relationships. Experimental results show that the method achieves an end-point error of 1.33 on the Sintel dataset; on the FlyingChairs dataset, the per-frame inference time is 69 ms, 4.2% lower than that of the global motion aggregation (GMA) method, exceeding the accuracy and efficiency of common optical flow estimation methods.
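For intuition, the sketch below shows how a shifted-window attention block of the kind the abstract describes can enhance a convolutional feature map. This is a minimal PyTorch illustration under assumed Swin-style window partitioning with a cyclic shift; the module names, window size, and head count are illustrative assumptions rather than the authors' released implementation, and the attention mask that full Swin applies after the shift is omitted for brevity.

# Illustrative sketch only: not the paper's released implementation.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Multi-head self-attention applied independently inside each local window."""
    def __init__(self, dim, num_heads, window_size):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, H, W, C) feature map; H and W are assumed divisible by window_size.
        B, H, W, C = x.shape
        ws = self.window_size
        # Partition into non-overlapping ws x ws windows -> (B*nW, ws*ws, C).
        x = x.view(B, H // ws, ws, W // ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)
        x, _ = self.attn(x, x, x)  # attention within each window only
        # Reverse the partition back to (B, H, W, C).
        x = x.view(B, H // ws, W // ws, ws, ws, C)
        x = x.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return x

class ShiftedWindowBlock(nn.Module):
    """Two window-attention passes: one on regular windows, one on a cyclically
    shifted map, so that information flows across window boundaries."""
    def __init__(self, dim, num_heads=4, window_size=8):
        super().__init__()
        self.ws = window_size
        self.wattn = WindowAttention(dim, num_heads, window_size)

    def forward(self, x):
        x = x + self.wattn(x)  # regular windows
        shift = self.ws // 2
        y = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))    # cyclic shift
        y = self.wattn(y)
        x = x + torch.roll(y, shifts=(shift, shift), dims=(1, 2))  # undo shift
        return x

if __name__ == "__main__":
    feat = torch.randn(1, 32, 32, 64)        # (B, H, W, C) CNN feature map
    enhanced = ShiftedWindowBlock(dim=64)(feat)
    print(enhanced.shape)                    # torch.Size([1, 32, 32, 64])

The cyclic shift is the key design choice: by rolling the feature map half a window before the second attention pass and rolling it back afterwards, pixels near window borders attend to neighbors from adjacent windows, approximating the global self-similarity the abstract attributes to the enhanced features.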