Reinforcement Learning based Air Combat Maneuver Generation

Paper Title

Reinforcement Learning based Air Combat Maneuver Generation

Authors

Muhammed Murat Ozbek, Emre Koyuncu

Abstract

Advances in artificial intelligence have opened the way for a large body of research in the air combat domain. One prominent direction, pursued by academics and other researchers, is autonomous maneuver decision-making for UAVs. This work has produced a range of results, and approaches that incorporate Reinforcement Learning (RL) have proven more efficient. Many studies and experiments have sought to make an agent reach its target optimally, most prominently using Genetic Algorithms (GA), A*, RRT, and various other optimization techniques. Reinforcement Learning, however, is the best known for its success. In the DARPA AlphaDogfight Trials, reinforcement learning prevailed against a veteran human F-16 pilot who was trained by Boeing. The winning model was developed by Heron Systems. After this accomplishment, reinforcement learning attracted tremendous attention. In this research, we aim to have a UAV with Dubins vehicle dynamics move to a target in two-dimensional space along an optimal path, using Twin Delayed Deep Deterministic Policy Gradient (TD3) with Hindsight Experience Replay (HER) as the experience replay method. We ran simulation tests in two different environments.
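To make the two building blocks named in the abstract concrete, the following is a minimal sketch of a planar Dubins vehicle step and of hindsight goal relabeling. It is not the authors' implementation. The abstract does not give the time step, speed, reward, or HER strategy, so the constant speed, the Euler integration, the sparse 0/-1 reward, the tolerance `tol`, and the "final" relabeling strategy are all assumptions, and every function name here is hypothetical.

```python
import math
from typing import List, Tuple

State = Tuple[float, float, float]  # (x, y, heading in radians)
Goal = Tuple[float, float]          # (x, y) target position
Transition = Tuple[State, float, State]  # (state, turn_rate action, next_state)


def dubins_step(state: State, turn_rate: float,
                speed: float = 1.0, dt: float = 0.1) -> State:
    """Advance a constant-speed Dubins vehicle by one explicit Euler step.

    The only control input is the turn rate; speed and dt are assumed values.
    """
    x, y, theta = state
    x += speed * math.cos(theta) * dt
    y += speed * math.sin(theta) * dt
    theta = (theta + turn_rate * dt) % (2.0 * math.pi)
    return (x, y, theta)


def sparse_reward(state: State, goal: Goal, tol: float = 0.5) -> float:
    """Assumed sparse reward: 0 within tol of the goal, -1 otherwise."""
    dist = math.hypot(state[0] - goal[0], state[1] - goal[1])
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode: List[Transition], tol: float = 0.5):
    """Relabel an episode with the 'final' HER strategy (an assumption).

    The position actually reached at the end of the episode is treated as
    if it had been the goal, and rewards are recomputed against it.
    """
    achieved: Goal = episode[-1][2][:2]
    return [(s, a, s2, achieved, sparse_reward(s2, achieved, tol))
            for (s, a, s2) in episode]


def rollout(start: State, turn_rates: List[float]) -> List[Transition]:
    """Collect transitions by applying a fixed sequence of turn rates."""
    episode, state = [], start
    for u in turn_rates:
        nxt = dubins_step(state, u)
        episode.append((state, u, nxt))
        state = nxt
    return episode
```

In a full pipeline, the relabeled tuples would be added to the replay buffer that an off-policy learner such as TD3 samples from, so that an episode which missed its original target still yields transitions with a non-negative reward signal.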
