Parallel Optimization and Hardware Implementation of the SURF Algorithm
Abstract: The speeded-up robust features (SURF) algorithm has high computational complexity, and its hardware implementation requires large amounts of logic and memory resources. Moreover, descriptor construction is difficult to parallelize, so the algorithm cannot meet real-time requirements. To address these problems, this paper proposes a parallel optimization of the SURF algorithm together with an FPGA-based hardware implementation. First, rotation invariance is obtained from a circular feature region and the radial gradient transform, which eliminates the dominant-orientation computation and the rotation of the feature region and allows the whole pipeline, from integral image computation to descriptor generation, to be parallelized. Then the optimized algorithm is implemented in real time on an FPGA using a multi-memory, multi-channel parallel pipelined architecture. Comparative experiments show that the matching performance of the optimized algorithm is comparable to that of the original SURF algorithm: it yields 5% to 20% fewer matched points, but its matching accuracy is 5% to 10% higher. The hardware implementation uses a clock of only 13.5 MHz and processes a 720×576 video stream at 25 frames/s, which meets the real-time requirement.
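To illustrate why a radial gradient transform can replace the dominant-orientation step, the following minimal Python sketch expresses a gradient in the radial/tangential frame around a keypoint and checks that the result is unchanged when the patch is rotated. It follows the commonly used definition of the transform, not necessarily the exact formulation of this paper; the function names (`radial_gradient`, `rotate`) and the sample values are assumptions made for illustration only.

```python
import math


def radial_gradient(gx, gy, px, py, cx, cy):
    """Express gradient (gx, gy) at pixel (px, py) in the radial/tangential
    frame defined relative to the keypoint center (cx, cy)."""
    dx, dy = px - cx, py - cy
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return 0.0, 0.0  # the frame is undefined at the center itself
    rx, ry = dx / norm, dy / norm  # radial unit vector
    tx, ty = -ry, rx               # tangential unit vector (radial turned 90 degrees)
    return gx * rx + gy * ry, gx * tx + gy * ty


def rotate(x, y, theta):
    """Rotate the vector (x, y) by theta radians about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


# Hypothetical sample: one pixel and its gradient, keypoint at the origin.
center = (0.0, 0.0)
point = (3.0, 4.0)
grad = (1.0, 2.0)
before = radial_gradient(*grad, *point, *center)

# Rotating the patch rotates both the pixel position and its gradient.
theta = math.radians(37.0)
after = radial_gradient(*rotate(*grad, theta), *rotate(*point, theta), *center)
```

Because `before` and `after` agree up to floating-point error, a descriptor built from these components over a circular region needs neither an orientation estimate nor a rotated sampling window, which is what makes the per-pixel work independent and easy to pipeline.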