Title
Multi-label Stream Classification with Self-Organizing Maps
Authors
Abstract
Several learning algorithms have been proposed for offline multi-label classification. However, applications in areas such as traffic monitoring, social networks, and sensor networks produce data continuously, as so-called data streams, which poses challenges for batch multi-label learning. Because the distribution of a data stream is not stationary, new algorithms are needed that adapt online to such changes (concept drift). Moreover, in realistic applications these changes occur under infinitely delayed labels, where the true classes of arriving instances are never available. We propose an online, unsupervised, incremental method based on self-organizing maps for multi-label stream classification with infinitely delayed labels. In the classification phase, we use a k-nearest neighbors strategy to compute the winning neurons in the maps, and we adapt to concept drift by adjusting the neuron weight vectors and the dataset label cardinality online. We predict the labels of each instance using the Bayes rule and the outputs of each neuron, adapting the prior and conditional probabilities of the classes as the stream evolves. Experiments on synthetic and real datasets show that our method is highly competitive with several methods from the literature in both stationary and concept drift scenarios.
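The abstract describes the online loop only at a high level, so the following is one possible reading of it, not the authors' implementation. It is a minimal sketch assuming a 1-D map topology, an initial labeled batch followed by a purely unlabeled stream, a Laplace-smoothed Bayes rule that treats the k winning neurons as independent evidence, a moving-average update of the label cardinality, and predicted labels reused as pseudo-labels. The class `StreamSOMSketch`, the helper `sq_dist`, and the parameters `lr`, `sigma`, and `card_decay` are hypothetical names introduced here for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class StreamSOMSketch:
    """Hypothetical illustration, not the authors' method: a 1-D self-organizing
    map whose neurons keep label statistics from an initial labeled batch and
    are then adapted online without ever seeing true labels again."""

    def __init__(self, init_weights, n_labels, k=3, lr=0.05, sigma=1.0,
                 card_decay=0.99):
        self.w = [list(v) for v in init_weights]  # neuron weight vectors
        self.n_labels, self.k, self.lr, self.sigma = n_labels, k, lr, sigma
        self.card_decay = card_decay
        m = len(self.w)
        # count[j][l]: times label l was observed (offline) or predicted (online) at neuron j
        self.count = [[0.0] * n_labels for _ in range(m)]
        # card_sum[j] / hits[j]: mean label-set size of instances mapped to neuron j
        self.card_sum = [0.0] * m
        self.hits = [0.0] * m
        self.cardinality = 1.0  # running estimate of the dataset label cardinality

    def _winners(self, x):
        """Indices of the k neurons nearest to x (the winning neurons)."""
        order = sorted(range(len(self.w)), key=lambda j: sq_dist(self.w[j], x))
        return order[: self.k]

    def _adapt_weights(self, x, best):
        """SOM step: pull the best neuron and its grid neighbours toward x."""
        for j, wj in enumerate(self.w):
            h = math.exp(-((j - best) ** 2) / (2 * self.sigma ** 2))
            for d in range(len(wj)):
                wj[d] += self.lr * h * (x[d] - wj[d])

    def _record(self, j, labels):
        """Add a label set to neuron j's statistics."""
        for l in labels:
            self.count[j][l] += 1
        self.card_sum[j] += len(labels)
        self.hits[j] += 1

    def fit_initial(self, X, Y, epochs=5):
        """Offline phase on the only labeled data the model will ever see."""
        for _ in range(epochs):
            for x in X:
                self._adapt_weights(x, self._winners(x)[0])
        for x, labels in zip(X, Y):
            self._record(self._winners(x)[0], labels)
        self.cardinality = sum(len(y) for y in Y) / len(Y)

    def predict_and_adapt(self, x):
        """Online phase: predict a label set for unlabeled x, then adapt."""
        m, win = len(self.w), self._winners(x)
        total = sum(sum(row) for row in self.count)
        scores = []
        for l in range(self.n_labels):
            n_l = sum(self.count[j][l] for j in range(m))
            # Bayes rule, Laplace-smoothed, winners treated as independent
            # evidence (assumption): log P(l) + sum_j log P(neuron j | l)
            s = math.log((n_l + 1) / (total + self.n_labels))
            for j in win:
                s += math.log((self.count[j][l] + 1) / (n_l + m))
            scores.append(s)
        # Keep the round(cardinality) highest-scoring labels.
        c = max(1, round(self.cardinality))
        labels = sorted(range(self.n_labels), key=lambda l: -scores[l])[:c]
        # Assumed drift handling: move the cardinality estimate toward the mean
        # label-set size stored in the winning neurons ...
        local = [self.card_sum[j] / self.hits[j] for j in win if self.hits[j] > 0]
        if local:
            est = sum(local) / len(local)
            self.cardinality = (self.card_decay * self.cardinality
                                + (1 - self.card_decay) * est)
        # ... then adapt the weights and treat the prediction as a pseudo-label.
        self._adapt_weights(x, win[0])
        self._record(win[0], labels)
        return labels
```

For example, fitting a two-neuron map (`k=1`, `sigma=0.3`) on points `(0, 0)` labeled `{0, 1}` and `(1, 1)` labeled `{2}` gives a cardinality estimate of 1.5, and `predict_and_adapt([0.1, 0.1])` then returns `[0, 1]`. Because this sketch always emits exactly `round(cardinality)` labels, a point near the second cluster also receives a weaker second label; a threshold on the posterior scores would be one alternative design.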