论文标题
了解网络在线学习的局限性
Understanding the Limitations of Network Online Learning
论文作者
论文摘要
网络现象的研究(例如在线社交媒体中的互动)通常依赖于不完整的数据,因为这些现象是部分观察到的,或者数据太大或太昂贵,无法一次获取。分析不完整的数据会导致偏斜或误导性结果。在本文中,我们研究了通过节点查询完成部分观察到的网络的学习局限性。具体而言,我们研究了以下问题:给定(i)部分观察到的网络,(ii)能够查询节点的连接(例如,通过访问API)以及(iii)此类查询数量的预算,依次了解哪些节点以最大程度地提高可观察力。我们称此查询过程网络在线学习,并介绍一种称为NOL*的算法系列。这些算法学会选择哪些部分观察到的节点是基于参数化模型通过探索和开发过程在线训练的参数化模型。关于合成和现实世界网络的广泛实验表明,(i)可以顺序学习选择哪些节点最好在网络中查询,以及(ii)网络的某些宏观特性,例如程度分布和模块化结构,影响学习的潜力和最佳的随机探索。
Studies of networked phenomena, such as interactions in online social media, often rely on incomplete data, either because these phenomena are partially observed, or because the data is too large or expensive to acquire all at once. Analysis of incomplete data leads to skewed or misleading results. In this paper, we investigate limitations of learning to complete partially observed networks via node querying. Concretely, we study the following problem: given (i) a partially observed network, (ii) the ability to query nodes for their connections (e.g., by accessing an API), and (iii) a budget on the number of such queries, sequentially learn which nodes to query in order to maximally increase observability. We call this querying process Network Online Learning and present a family of algorithms called NOL*. These algorithms learn to choose which partially observed node to query next based on a parameterized model that is trained online through a process of exploration and exploitation. Extensive experiments on both synthetic and real world networks show that (i) it is possible to sequentially learn to choose which nodes are best to query in a network and (ii) some macroscopic properties of networks, such as the degree distribution and modular structure, impact the potential for learning and the optimal amount of random exploration.