Dynamic Network Reconfiguration for Entropy Maximization using Deep Reinforcement Learning

Paper Title

Dynamic Network Reconfiguration for Entropy Maximization using Deep Reinforcement Learning

Authors

Christoffel Doorman, Victor-Alexandru Darvariu, Stephen Hailes, Mirco Musolesi

Abstract

A key problem in network theory is how to reconfigure a graph in order to optimize a quantifiable objective. Given the ubiquity of networked systems, such work has broad practical applications in a variety of situations, ranging from drug and material design to telecommunications. The large decision space of possible reconfigurations, however, makes this problem computationally intensive. In this paper, we cast the problem of network rewiring for optimizing a specified structural property as a Markov Decision Process (MDP), in which a decision-maker is given a budget of modifications that are performed sequentially. We then propose a general approach based on the Deep Q-Network (DQN) algorithm and graph neural networks (GNNs) that can efficiently learn strategies for rewiring networks. We then discuss a cybersecurity case study, i.e., an application to the computer network reconfiguration problem for intrusion protection. In a typical scenario, an attacker might have a (partial) map of the system they plan to penetrate; if the network is effectively "scrambled", they would not be able to navigate it since their prior knowledge would become obsolete. This can be viewed as an entropy maximization problem, in which the goal is to increase the surprise of the network. Indeed, entropy acts as a proxy measurement of the difficulty of navigating the network topology. We demonstrate the general ability of the proposed method to obtain better entropy gains than random rewiring on synthetic and real-world graphs while being computationally inexpensive, as well as being able to generalize to larger graphs than those seen during training. Simulations of attack scenarios confirm the effectiveness of the learned rewiring strategies.
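
To make the MDP formulation in the abstract concrete, the following is a minimal sketch of a budgeted rewiring episode using the random-rewiring baseline rather than the learned DQN policy. Everything in it is an assumption made for illustration: the abstract does not say which entropy measure is used, so the Shannon entropy of the degree distribution stands in as the objective, and the function names (`degree_entropy`, `random_rewire`, `run_episode`) and the edge-set representation are hypothetical.

```python
import math
import random
from collections import Counter


def degree_entropy(n_nodes, edges):
    """Shannon entropy (bits) of the degree distribution. Assumed objective."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count how many nodes share each degree value (isolated nodes have degree 0).
    counts = Counter(degree[node] for node in range(n_nodes))
    return -sum((c / n_nodes) * math.log2(c / n_nodes) for c in counts.values())


def random_rewire(n_nodes, edges, rng):
    """One action: remove a random edge (u, v) and connect u to a non-neighbour w."""
    u, v = rng.choice(sorted(edges))
    neighbours = {b if a == u else a for a, b in edges if u in (a, b)}
    candidates = [w for w in range(n_nodes) if w != u and w not in neighbours]
    if not candidates:
        return edges  # u is already adjacent to every node; nothing to rewire
    w = rng.choice(candidates)
    new_edges = set(edges)
    new_edges.remove((u, v))
    new_edges.add((min(u, w), max(u, w)))  # edges are stored as (low, high) pairs
    return new_edges


def run_episode(n_nodes, edges, budget, seed=0):
    """Apply `budget` sequential rewirings and report the total entropy gain."""
    rng = random.Random(seed)
    start = degree_entropy(n_nodes, edges)
    for _ in range(budget):
        edges = random_rewire(n_nodes, edges, rng)
    return edges, degree_entropy(n_nodes, edges) - start


if __name__ == "__main__":
    n = 8
    ring = {(min(i, (i + 1) % n), max(i, (i + 1) % n)) for i in range(n)}
    rewired, gain = run_episode(n, ring, budget=4)
    print(f"entropy gain after 4 random rewirings: {gain:.3f} bits")
```

In this sketch the state is the current edge set, an action is a single rewiring, and the episode ends when the budget is spent. A learned policy would replace the random choices in `random_rewire` with choices scored by a Q-function, which the abstract describes as a GNN-based DQN.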
