Citation: Cao Guangjie, Du Huimin, Wang Pengchao, Du Qinqin, Ding Jialong. Hardware Implementation of Elementary Function Approximation by Using a Piecewise Cubic Polynomial Interpolator[J]. Journal of Computer-Aided Design & Computer Graphics, 2016, 28(1): 180-187.

Hardware Implementation of Elementary Function Approximation by Using a Piecewise Cubic Polynomial Interpolator

  • Abstract: Approximating elementary functions with piecewise quadratic polynomials requires a large look-up table (LUT) and a large circuit area. To address this, the paper proposes an algorithm that approximates single-precision floating-point elementary functions with minimax piecewise cubic polynomials, and implements the reciprocal, square root, reciprocal square root, exponential, logarithm, and trigonometric functions. First, the argument is reduced to a specific interval, that interval is divided into uniform segments, and a minimax cubic polynomial is fitted on each segment. Next, taking all error sources on each segment into account, the Remes algorithm is iterated repeatedly to find the optimal truncated bit widths of the polynomial coefficients that still meet the accuracy requirement, which minimizes the LUT area. The output bit widths of the multiplier, squarer, and cuber are then truncated optimally to minimize the circuit area. Finally, the overall hardware architecture is designed. Experimental results show that, compared with piecewise quadratic polynomial approximation at the same accuracy, the algorithm reduces circuit delay by 17.25%, LUT area by 53.60%, and total circuit area by 19.73%.
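The pipeline described in the abstract (range reduction, uniform segmentation, one cubic per segment, truncated coefficients in a LUT) can be illustrated with a minimal software sketch for the reciprocal. Everything specific here is an assumption for illustration rather than the paper's design: the 64-segment split, the fractional bit widths in `FRAC_BITS`, and all function names are hypothetical. Interpolation at Chebyshev nodes stands in for the Remes minimax fit (it is near-minimax, not minimax), and ordinary floating-point arithmetic stands in for the truncated fixed-point datapath.

```python
import math

SEG_BITS = 6                   # hypothetical: 2**6 = 64 uniform segments on [1, 2)
FRAC_BITS = (28, 22, 16, 10)   # hypothetical fractional bits kept for c0..c3

def cheb_nodes(a, b, n=4):
    """Return n Chebyshev nodes mapped onto [a, b]."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    return [mid + half * math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]

def cubic_coeffs(f, a, b):
    """Coefficients c0..c3 (powers of t = x - a) of the cubic through f at 4 Chebyshev nodes."""
    ts = [x - a for x in cheb_nodes(a, b)]
    dd = [f(a + t) for t in ts]
    for j in range(1, 4):                  # Newton divided differences, in place
        for i in range(3, j - 1, -1):
            dd[i] = (dd[i] - dd[i - 1]) / (ts[i] - ts[i - j])
    poly = [dd[3]]                         # expand the Newton form into powers of t
    for i in (2, 1, 0):
        nxt = [0.0] * (len(poly) + 1)
        for k, c in enumerate(poly):
            nxt[k + 1] += c                # contribution of c * t
            nxt[k] -= c * ts[i]            # contribution of c * (-ts[i])
        nxt[0] += dd[i]
        poly = nxt
    return poly

def quantize(c, frac_bits):
    """Round c to a fixed number of fractional bits, mimicking a truncated LUT entry."""
    return round(c * 2 ** frac_bits) / 2 ** frac_bits

def build_table(f):
    """One tuple of quantized cubic coefficients per uniform segment of [1, 2)."""
    n = 1 << SEG_BITS
    table = []
    for s in range(n):
        a = 1.0 + s / n
        coeffs = cubic_coeffs(f, a, a + 1.0 / n)
        table.append(tuple(quantize(c, b) for c, b in zip(coeffs, FRAC_BITS)))
    return table

RECIP_TABLE = build_table(lambda m: 1.0 / m)

def approx_reciprocal(x):
    """Approximate 1/x for x > 0: reduce to m in [1, 2), look up the segment, apply Horner."""
    frac, exp = math.frexp(x)              # x = frac * 2**exp with frac in [0.5, 1)
    m, e = frac * 2.0, exp - 1             # x = m * 2**e with m in [1, 2)
    idx = int((m - 1.0) * (1 << SEG_BITS)) # top SEG_BITS bits of the mantissa fraction
    t = m - (1.0 + idx / (1 << SEG_BITS))  # offset inside the segment
    c0, c1, c2, c3 = RECIP_TABLE[idx]
    r = ((c3 * t + c2) * t + c1) * t + c0
    return math.ldexp(r, -e)               # undo the range reduction: 1/x = (1/m) * 2**-e
```

Calling `approx_reciprocal(8.0)` reduces the argument to `m = 1.0`, evaluates the first segment's cubic, and rescales by `2**-3`. The decreasing widths in `FRAC_BITS` reflect that higher-order coefficients are multiplied by higher powers of a small offset, so they tolerate coarser truncation. The paper's contribution is choosing such widths optimally, which this sketch does not attempt.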

     
