Citation: Tang Liang, Pan Yuedou, Wang Jiaqi, Luo Zuying. Algorithm Parallelization of on-Die Power/Ground Network Solving Based on GPU Parallel Computing[J]. Journal of Computer-Aided Design & Computer Graphics, 2014, 26(7): 1203-1210.

Algorithm Parallelization of on-Die Power/Ground Network Solving Based on GPU Parallel Computing

Abstract: To obtain fast and accurate solvers for on-chip power/ground (P/G) networks, this work uses the locality effect of structured power delivery networks to re-examine the efficiency and parallelism of the successive over-relaxation (SOR) and alternating direction implicit (ADI) methods in P/G network analysis, and proposes two GPU-friendly parallel algorithms: G_RBSOR and G_ADI. Both use regular data structures so that the GPU can read and write data in parallel, and both use a merging reduction to compute the iteration-termination flag in parallel. To avoid data conflicts during GPU computation, G_RBSOR classifies the circuit nodes as red or black in a checkerboard pattern and relaxes the red and black nodes alternately. Experimental results show that, without loss of accuracy, G_RBSOR and G_ADI each achieve a speedup of more than 50× over their corresponding serial CPU algorithms, and a speedup of more than 5× over ICCG, an efficient serial solver for P/G analysis.
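The checkerboard ordering described in the abstract can be illustrated with a small serial sketch. This is not the authors' G_RBSOR implementation: the uniform mesh, the corner pad placement, the function name `red_black_sor`, and all parameter values are assumptions chosen for illustration. The point it shows is that a node's four neighbors always have the opposite color, so every node of one color can be relaxed independently, which is what makes the half-sweeps safe to run in parallel. The `max_delta` value plays the role of the iteration-termination flag that a GPU version would compute with a reduction.

```python
def red_black_sor(n, vdd=1.0, g=1.0, load=1e-3, omega=1.5,
                  tol=1e-9, max_iters=10000):
    """Solve a hypothetical uniform n x n resistive power grid.

    Assumptions (not from the paper): every branch has conductance g,
    every non-pad node draws current `load`, and the four corner nodes
    are pads held at vdd.
    """
    pads = {(0, 0), (0, n - 1), (n - 1, 0), (n - 1, n - 1)}
    v = [[vdd] * n for _ in range(n)]
    for it in range(1, max_iters + 1):
        max_delta = 0.0  # serial stand-in for a parallel max-reduction
        for color in (0, 1):  # 0 = red half-sweep, 1 = black half-sweep
            for i in range(n):
                for j in range(n):
                    if (i + j) % 2 != color or (i, j) in pads:
                        continue
                    # All neighbors have the opposite color, so nodes of
                    # this color never read each other's values.
                    nbrs = [v[a][b]
                            for a, b in ((i - 1, j), (i + 1, j),
                                         (i, j - 1), (i, j + 1))
                            if 0 <= a < n and 0 <= b < n]
                    # Kirchhoff's current law: g * sum(v_nbr - v_ij) = load
                    gauss_seidel = (g * sum(nbrs) - load) / (g * len(nbrs))
                    new = v[i][j] + omega * (gauss_seidel - v[i][j])
                    max_delta = max(max_delta, abs(new - v[i][j]))
                    v[i][j] = new
        if max_delta < tol:
            return v, it
    return v, max_iters


if __name__ == "__main__":
    voltages, iterations = red_black_sor(8)
    print("iterations:", iterations)
    print("center voltage:", voltages[4][4])
```

In a GPU version each half-sweep would map to one kernel launch over the nodes of a single color; the sketch keeps the loops serial so that it runs with the standard library alone.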

     
