Paper Title


Informative Pseudo-Labeling for Graph Neural Networks with Few Labels

Authors

Yayong Li, Jie Yin, Ling Chen

Abstract


Graph Neural Networks (GNNs) have achieved state-of-the-art results for semi-supervised node classification on graphs. Nevertheless, the challenge of how to effectively learn GNNs with very few labels is still under-explored. As one of the prevalent semi-supervised methods, pseudo-labeling has been proposed to explicitly address the label scarcity problem. It aims to augment the training set with unlabeled nodes pseudo-labeled at high confidence, so as to re-train a supervised model in a self-training cycle. However, existing pseudo-labeling approaches often suffer from two major drawbacks. First, they tend to conservatively expand the label set by selecting only high-confidence unlabeled nodes without assessing their informativeness. Unfortunately, those high-confidence nodes often convey information that overlaps with the given labels, leading to only minor improvements from model re-training. Second, these methods incorporate pseudo-labels into the same loss function as genuine labels, ignoring their distinct contributions to the classification task. In this paper, we propose a novel informative pseudo-labeling framework, called InfoGNN, to facilitate learning of GNNs with extremely few labels. Our key idea is to pseudo-label the most informative nodes, i.e., those that can maximally represent their local neighborhoods, via mutual information maximization. To mitigate the potential label noise and class-imbalance problems arising from pseudo-labeling, we also carefully devise a generalized cross entropy loss with a class-balanced regularization to incorporate the generated pseudo-labels into model re-training. Extensive experiments on six real-world graph datasets demonstrate that our proposed approach significantly outperforms state-of-the-art baselines and strong self-supervised methods on graphs.
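The "generalized cross entropy loss" referenced in the abstract presumably follows the standard L_q formulation of Zhang & Sabuncu (2018), which interpolates between cross entropy (q → 0) and mean absolute error (q = 1) to gain robustness against noisy pseudo-labels. The following is a minimal numpy sketch of that loss alone; the class-balanced regularization and the mutual-information-based node selection from the paper are not reproduced here:

```python
import numpy as np

def gce_loss(probs, labels, q=0.7):
    """Generalized cross entropy: L_q = (1 - p_y^q) / q.

    probs:  (N, C) array of predicted class probabilities
    labels: (N,) array of integer (pseudo-)labels
    q:      in (0, 1]; q -> 0 recovers cross entropy, q = 1 gives
            mean absolute error, which down-weights noisy labels.
    """
    # Probability assigned to each sample's (pseudo-)label.
    p_y = probs[np.arange(len(labels)), labels]
    return np.mean((1.0 - p_y ** q) / q)
```

As a sanity check, with q = 1 the loss equals the mean of (1 − p_y), and with a very small q it closely matches the standard negative log-likelihood; the intermediate q used in practice (e.g. 0.7) trades off between the two.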
