Paper title
Challenges in Forecasting Malicious Events from Incomplete Data
Authors
Abstract
The ability to accurately predict cyber-attacks would enable organizations to mitigate this growing threat and avert the financial losses and disruptions that attacks cause. But how predictable are cyber-attacks? Researchers have attempted to combine external data, ranging from vulnerability disclosures to discussions on Twitter and the dark web, with machine learning algorithms to learn indicators of impending cyber-attacks. However, successful cyber-attacks represent a tiny fraction of all attempted attacks: the vast majority are stopped, or filtered, by the security appliances deployed at the target. As we show in this paper, this filtering process reduces the predictability of cyber-attacks. The small number of attacks that do penetrate the target's defenses follow a different generative process than the full set of attempted attacks, one that is much harder for predictive models to learn. One possible cause is that the resulting time series depends on the filtering process in addition to all the factors that drive the original time series. We empirically quantify the loss of predictability due to filtering using real-world data from two organizations. Our work identifies the limits of forecasting cyber-attacks from highly filtered data.
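The intuition behind the filtering effect can be illustrated with a toy simulation. The sketch below is not the paper's method or data: it assumes a synthetic daily count of attempted attacks that follows an AR(1) process, and it models filtering as independent random thinning, where each attempt gets through with a small probability `pass_prob`. The function names, parameter values, and the use of lag-1 autocorrelation as a rough proxy for predictability are all assumptions made for illustration.

```python
import random


def lag1_autocorr(xs):
    """Sample lag-1 autocorrelation of a sequence of numbers."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


def simulate(days=2000, pass_prob=0.02, seed=0):
    """Return (attempted, successful) daily attack counts.

    Assumption: attempted attacks follow an AR(1) process around 100 per day.
    Assumption: each attempt independently passes the filter with pass_prob.
    """
    rng = random.Random(seed)
    attempted, successful = [], []
    level = 100.0
    for _ in range(days):
        # Strongly autocorrelated intensity of attempted attacks.
        level = 100.0 + 0.9 * (level - 100.0) + rng.gauss(0.0, 10.0)
        n_attempts = max(0, round(level))
        # Filtering: only a small random fraction of attempts succeeds.
        n_success = sum(1 for _ in range(n_attempts) if rng.random() < pass_prob)
        attempted.append(n_attempts)
        successful.append(n_success)
    return attempted, successful


attempted, successful = simulate()
print("lag-1 autocorrelation, attempted attacks: ", round(lag1_autocorr(attempted), 3))
print("lag-1 autocorrelation, successful attacks:", round(lag1_autocorr(successful), 3))
```

Under these assumptions, the series of successful attacks keeps the same underlying driver as the attempted series, but the randomness added by the filter dominates its variance, so its day-to-day autocorrelation is much weaker. Real security appliances do not filter at random, so this only shows one mechanism by which filtering can hide temporal structure.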