Dynamic Obstacle Avoidance Method for Carrier Aircraft Based on Deep Reinforcement Learning

薛均晓; 孔祥燕; 郭毅博; 鲁爱国; 李鉴; 万曦; 徐明亮

doi:10.3724/SP.J.1089.2021.18637

Abstract

Abstract: To address obstacle avoidance for carrier aircraft in the highly heterogeneous and dynamic operating environment of an aircraft carrier deck, an obstacle avoidance method that combines a prediction algorithm with deep reinforcement learning is proposed. The method comprises modules for scene modeling, a reward model, and a trajectory prediction model. First, the carrier deck scene is modeled in terms of the agent's state and action spaces. Next, the least squares method is used to predict, in real time, the trajectories of the dynamic obstacles in the scene, and a deep reinforcement learning method that includes a path prediction module, the environment-prediction deep Q-network (PDQN), is constructed. Finally, the method is applied to dynamic obstacle avoidance for carrier aircraft in the deck operation scene. Simulation experiments were carried out with the Python plotting library Matplotlib. The results show that, compared with methods such as Q-learning and SARSA, the proposed method improves accuracy by 15% to 25%, shortens the path length by 9% to 39%, raises the average reward by 30% to 100%, converges 1 to 2 times faster, and reduces the standard deviation of accuracy after training stabilizes by 2% to 50%.
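The least squares prediction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the motion model, the length of the observation window, or the prediction horizon, so the code below assumes a straight-line model fitted independently to each coordinate over a short history of observed positions. The names `fit_line` and `predict_positions` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def fit_line(ts: Sequence[float], vs: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of v = a + b * t; returns (a, b)."""
    n = len(ts)
    if n < 2 or n != len(vs):
        raise ValueError("need at least two samples of equal length")
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    if sxx == 0:
        raise ValueError("time stamps must not all be equal")
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs))
    b = sxy / sxx            # slope: estimated velocity along this axis
    a = v_mean - b * t_mean  # intercept: estimated position at t = 0
    return a, b


def predict_positions(
    history: Sequence[Tuple[float, float, float]], horizon: int
) -> List[Tuple[float, float]]:
    """Extrapolate an obstacle's (x, y) position for the next `horizon` unit time steps.

    `history` holds observed (t, x, y) samples in time order. A linear motion
    model is an assumption of this sketch, not a detail confirmed by the paper.
    """
    ts = [p[0] for p in history]
    ax, bx = fit_line(ts, [p[1] for p in history])
    ay, by = fit_line(ts, [p[2] for p in history])
    t_last = ts[-1]
    return [
        (ax + bx * (t_last + k), ay + by * (t_last + k))
        for k in range(1, horizon + 1)
    ]


# Hypothetical obstacle moving one unit in x and two units in y per time step.
history = [(0, 0.0, 1.0), (1, 1.0, 3.0), (2, 2.0, 5.0), (3, 3.0, 7.0)]
predicted = predict_positions(history, horizon=2)
print(predicted)
```

In a setup like the one the abstract describes, predicted positions of this kind could be supplied to the learning agent alongside the current obstacle positions, so that actions leading toward a cell an obstacle is about to occupy can be penalized before a collision occurs. How PDQN actually consumes the predictions is not specified in the abstract.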
