Paper Title

SAC-AP: Soft Actor Critic based Deep Reinforcement Learning for Alert Prioritization

Paper Authors

Lalitha Chavali, Tanay Gupta, Paresh Saxena

Paper Abstract

Intrusion detection systems (IDS) generate a large number of false alerts, which makes it difficult to inspect true positives. Hence, alert prioritization plays a crucial role in deciding which alerts to investigate from the enormous number of alerts generated by an IDS. Recently, the deep reinforcement learning (DRL) based deep deterministic policy gradient (DDPG) off-policy method has been shown to achieve better results for alert prioritization than other state-of-the-art methods. However, DDPG is prone to overfitting and has poor exploration capability, which makes it unsuitable for problems with a stochastic environment. To address these limitations, we present a soft actor-critic based DRL algorithm for alert prioritization (SAC-AP), an off-policy method built on the maximum entropy reinforcement learning framework, which aims to maximize the expected reward while also maximizing the entropy. Further, the interaction between an adversary and a defender is modeled as a zero-sum game, and a double oracle framework is utilized to obtain an approximate mixed-strategy Nash equilibrium (MSNE). SAC-AP finds robust alert investigation policies and computes pure-strategy best responses against the opponent's mixed strategy. We present the overall design of SAC-AP and evaluate its performance against other state-of-the-art alert prioritization methods. We consider the defender's loss, i.e., the defender's inability to investigate the alerts that are triggered due to attacks, as the performance metric. Our results show that SAC-AP achieves up to a 30% decrease in the defender's loss compared to the DDPG based alert prioritization method and hence provides better protection against intrusions. Moreover, the benefits are even higher when SAC-AP is compared to other traditional alert prioritization methods, including Uniform, GAIN, RIO and Suricata.
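For context, the maximum entropy objective that the soft actor-critic family optimizes is the expected return augmented with the policy's entropy. This is the standard formulation; the paper's exact notation may differ:

```latex
% Standard maximum entropy RL objective: the temperature \alpha trades off
% reward against exploration via the policy-entropy term.
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_{\pi}}
    \Big[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```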
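The abstract also describes a double oracle loop: repeatedly solve the restricted zero-sum game for an approximate MSNE, then let each player add a pure-strategy best response against the opponent's current mixture. The sketch below illustrates that loop on a small random payoff matrix, with exact matrix-game best responses standing in for the SAC-trained best-response oracles of SAC-AP; the function names and payoffs are illustrative, not taken from the paper.

```python
# Minimal double-oracle sketch on a finite zero-sum matrix game.
# In SAC-AP the best-response oracle is a SAC-trained policy; here exact
# best responses over pure strategies play that role for illustration.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(P):
    """Mixed-strategy Nash equilibrium of the matrix game max_x min_y x^T P y.
    Returns (row mixture x, column mixture y, game value)."""
    m, n = P.shape
    shift = 1.0 - P.min()  # make all payoffs positive; value shifts by the same amount
    Q = P + shift
    # Row player: with u = x / v, minimize sum(u) s.t. Q^T u >= 1, u >= 0.
    row = linprog(np.ones(m), A_ub=-Q.T, b_ub=-np.ones(n), bounds=[(0, None)] * m)
    x = row.x / row.x.sum()
    value = 1.0 / row.x.sum() - shift
    # Column player: with w = y / v, maximize sum(w) s.t. Q w <= 1, w >= 0.
    col = linprog(-np.ones(n), A_ub=Q, b_ub=np.ones(m), bounds=[(0, None)] * n)
    y = col.x / col.x.sum()
    return x, y, value

def double_oracle(payoff):
    """Grow each player's strategy set with pure-strategy best responses
    until neither player can improve, yielding an approximate MSNE."""
    R, C = [0], [0]  # restricted pure-strategy sets for row/column players
    while True:
        x, y, value = solve_zero_sum(payoff[np.ix_(R, C)])
        br_row = int(np.argmax(payoff[:, C] @ y))  # row oracle: maximize vs y
        br_col = int(np.argmin(x @ payoff[R, :]))  # column oracle: minimize vs x
        grew = False
        if br_row not in R:
            R.append(br_row); grew = True
        if br_col not in C:
            C.append(br_col); grew = True
        if not grew:  # both best responses already in the restricted game
            return x, y, value, R, C

# Demo: rows are the defender (maximizer here; minimizing loss is the
# sign-flipped equivalent), columns are the adversary.
rng = np.random.default_rng(0)
payoff = rng.uniform(-1, 1, size=(8, 8))
x, y, value, R, C = double_oracle(payoff)
print(f"game value: {value:.3f}, defender support: {R}, adversary support: {C}")
```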
