Live VR Panoramic Video Streaming Method Combining Viewport Prediction and Super-Resolution
Abstract: Existing viewport-prediction-based VR panoramic video streaming methods do not effectively exploit the temporal-dimension information of VR video, and they neglect the role the client should play in improving video reconstruction quality. To further improve the overall performance of the streaming system and the users' quality of experience, a live VR panoramic video streaming method that combines viewport prediction and super-resolution reconstruction is proposed. On the server side, the method embeds the proposed temporal non-local attention module (TNAM) into GhostNet to model the global context of the VR panoramic video, thereby capturing long-range dependencies in the temporal dimension. On the client side, the proposed lightweight VR panoramic video super-resolution reconstruction model (LVRSR) enhances the quality of, and corrects the projection distortion in, the secondary content within the viewport predicted by the server. Experimental results on a VR panoramic video user head-motion dataset show that the method achieves an average viewport prediction accuracy of 95.6% and an average bandwidth usage of 52.9%. Compared with five representative streaming methods, it attains higher viewport prediction accuracy and lower bandwidth usage while maintaining good video reconstruction quality and low computational resource consumption.
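To make the temporal non-local attention idea concrete, the following is a minimal NumPy sketch of self-attention over the time axis of a per-frame feature sequence. It is an illustration of the general non-local attention mechanism, not the paper's actual TNAM: the shapes, the shared query/key/value projection (identity here), and the residual connection are all simplifying assumptions.

```python
import numpy as np

def temporal_nonlocal_attention(x):
    """Illustrative temporal non-local attention (not the paper's TNAM).

    x: array of shape (T, C), one feature vector per video frame.
    Each output time step aggregates information from ALL time steps,
    weighted by softmax-normalized pairwise similarity, so the block
    can capture long-range dependencies along the temporal dimension.
    """
    T, C = x.shape
    scores = x @ x.T / np.sqrt(C)                   # (T, T) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability for exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over the time axis
    attended = weights @ x                          # each frame attends to all frames
    return x + attended                             # residual connection

# Toy usage: 8 frames, 16-dimensional features per frame.
feats = np.random.default_rng(0).normal(size=(8, 16))
out = temporal_nonlocal_attention(feats)
print(out.shape)  # (8, 16): same shape as the input sequence
```

In a real network this block would sit between convolutional stages (here, inside GhostNet), with learned projections producing the queries, keys, and values instead of the raw features used above.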