Guidance Path Planning and Simulation for Carrier-based Aircraft Based on the Deep Deterministic Policy Gradient Algorithm

薛均晓; 陈金浦; 董博威; 曾晨; 徐明亮

doi:10.3724/SP.J.1089.2024-00348

Path Planning and Simulation of Carrier-based Aircraft Based on DDPG

Abstract

Efficient sortie generation by carrier-based aircraft is an important indicator of an aircraft carrier's overall combat effectiveness. To raise the sortie generation rate, a guidance path planning method for carrier-based aircraft based on the Deep Deterministic Policy Gradient (DDPG) algorithm is proposed. First, path planning is modeled as a sequential decision-making problem, with a state space describing the carrier deck environment and a continuous action space for the aircraft. Then, subject to multiple constraints including the aircraft motion model, turning angle, and obstacle avoidance, a reward function that balances immediate reward against long-term cumulative return is designed to speed up convergence of the reinforcement learning algorithm. Next, the relative velocity obstacle method is combined with a dynamic sampling strategy to strengthen the algorithm's ability to learn obstacle avoidance. Finally, to improve path smoothness, B-spline fitting is applied to the planned path so that it meets the needs of real operational tasks. Simulation experiments in Unity3D show that the proposed method outperforms the compared algorithms on several evaluation metrics, including convergence speed, path length, and smoothness.
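The abstract does not spell out how the relative velocity obstacle test is formulated. As a minimal sketch of the general idea, assume the aircraft and an obstacle are discs moving at constant velocity in the deck plane; the function name `in_velocity_obstacle` and its parameters are hypothetical and not taken from the paper. The aircraft's current velocity is flagged as unsafe when the relative velocity points into the collision cone around the obstacle.

```python
def in_velocity_obstacle(p_a, v_a, p_b, v_b, combined_radius):
    """Return True if A, holding velocity v_a, would eventually hit B holding v_b.

    Assumption (not from the paper): both bodies are discs in 2D and
    combined_radius is the sum of their radii plus any safety margin.
    """
    # Position of B relative to A, and velocity of A relative to B.
    dx, dy = p_b[0] - p_a[0], p_b[1] - p_a[1]
    rvx, rvy = v_a[0] - v_b[0], v_a[1] - v_b[1]

    dist_sq = dx * dx + dy * dy
    radius_sq = combined_radius * combined_radius
    if dist_sq <= radius_sq:
        return True  # already overlapping

    speed_sq = rvx * rvx + rvy * rvy
    if speed_sq == 0.0:
        return False  # no relative motion, so the gap never closes

    closing = dx * rvx + dy * rvy
    if closing <= 0.0:
        return False  # moving away from, or tangentially to, the obstacle

    # Squared miss distance at the point of closest approach.
    miss_sq = dist_sq - closing * closing / speed_sq
    return miss_sq < radius_sq


# Heading straight at a parked obstacle 10 m ahead is unsafe.
head_on = in_velocity_obstacle((0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (0.0, 0.0), 2.0)
# Steering 45 degrees away clears the same obstacle.
veer_off = in_velocity_obstacle((0.0, 0.0), (1.0, 1.0), (10.0, 0.0), (0.0, 0.0), 2.0)
```

A check of this kind could, for example, feed a penalty term in the reward or filter candidate actions; how the paper actually couples it with the dynamic sampling strategy is not stated in the abstract.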
