VLSI Design and Implementation of an Adaptive Deblocking Filter for HEVC
Abstract: Deblocking filtering is an essential part of high efficiency video coding (HEVC): it effectively improves the subjective quality of encoded images and is one of the important means of improving overall coding performance. To address the high complexity of deblocking filtering in HEVC hardware encoders, and to reduce processing cycles and improve filtering efficiency while saving resources, this paper proposes a hardware algorithm and VLSI architecture for an adaptive deblocking filter in HEVC. First, based on the boundary rules of the HEVC coding structure, a fast boundary judgment algorithm that requires no recursive loop calculation is proposed to reduce the complexity of hardware implementation. Second, based on these boundary judgment results, a four-stage pipeline structure that can autonomously select which boundaries to filter is proposed to reduce the number of filtering cycles. Finally, the luma and chroma components are filtered in parallel, and a highly parallel, compatible, shared memory architecture is designed, which improves filtering efficiency and reduces memory resource consumption. Experimental results show that, in a TSMC 90 nm process, the hardware area of the designed deblocking filter is about 60% smaller than that of existing structures, and the maximum operating frequency reaches 250 MHz, which satisfies real-time encoding of 8K@60 fps ultra-high-definition video.
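To give a feel for what the 8K@60 fps claim at 250 MHz implies, the following back-of-the-envelope sketch computes the cycle budget per coding tree unit. It is not taken from the paper: the 7680x4320 frame size for "8K", the 64x64 CTU size, and the function names are all assumptions made for illustration.

```python
import math


def cycles_per_ctu(width, height, fps, clock_hz, ctu_size=64):
    """Average clock cycles available per CTU for real-time operation.

    ctu_size=64 is an assumed value (the largest CTU HEVC allows);
    the abstract does not state which CTU size the design uses.
    """
    ctus_per_row = math.ceil(width / ctu_size)   # partial CTUs still cost a slot
    ctu_rows = math.ceil(height / ctu_size)
    ctus_per_second = ctus_per_row * ctu_rows * fps
    return clock_hz / ctus_per_second


def luma_samples_per_cycle(width, height, fps, clock_hz):
    """Average luma samples that must be processed per clock cycle."""
    return width * height * fps / clock_hz


# Assumed 8K frame size: 7680x4320 (not stated explicitly in the abstract).
budget = cycles_per_ctu(7680, 4320, 60, 250e6)
rate = luma_samples_per_cycle(7680, 4320, 60, 250e6)
print(f"cycle budget per 64x64 CTU: {budget:.1f}")
print(f"luma samples per cycle:     {rate:.2f}")
```

Under these assumptions the filter has roughly 510 cycles per 64x64 CTU, or about 8 luma samples per cycle on average, which suggests why a pipelined structure with parallel luma and chroma filtering is attractive for this target.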