Paper Title
GUDN: A novel guide network with label reinforcement strategy for extreme multi-label text classification
Paper Authors
Paper Abstract
In natural language processing, extreme multi-label text classification is an emerging but essential task. The goal of extreme multi-label text classification (XMTC) is to recall the most relevant labels for a given text from an extremely large label set. Large-scale pre-trained models have brought a new trend to this problem. Although these models have achieved significant results on XMTC, effective fine-tuning methods have yet to be studied. Likewise, although label semantics have been introduced into XMTC, the vast semantic gap between texts and labels has not received enough attention. This paper builds a new guide network (GUDN) that helps fine-tune the pre-trained model and then guides classification. Furthermore, GUDN uses raw label semantics, combined with a label reinforcement strategy, to effectively explore the latent space between texts and labels; narrowing this semantic gap further improves prediction accuracy. Experimental results demonstrate that GUDN outperforms state-of-the-art methods on Eurlex-4K and achieves competitive results on other popular datasets. In an additional experiment, we investigate the influence of input length on the accuracy of Transformer-based models. Our source code is released at https://t.hk.uy/aFSH.
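To illustrate the core idea described in the abstract (matching texts and raw label names in a shared latent space learned by a pre-trained encoder), the following is a minimal sketch, not the authors' exact implementation. The backbone name (`bert-base-uncased`), mean pooling, and cosine-similarity scoring are assumptions for illustration; GUDN's guide network and label reinforcement strategy are not reproduced here.

```python
# Sketch: encode texts and raw label names with a shared pre-trained encoder,
# then score text-label pairs by similarity in the common latent space.
# Backbone, pooling, and scoring are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences, max_length=128):
    """Mean-pool the last hidden states into one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state           # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()       # (B, T, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)        # (B, H)

texts = ["Regulation on the protection of personal data in the Union."]
label_names = ["data protection", "agriculture", "monetary policy"]

text_vecs = torch.nn.functional.normalize(embed(texts), dim=-1)
label_vecs = torch.nn.functional.normalize(embed(label_names), dim=-1)

# Cosine similarity between each text and every label; top-k indices give the
# predicted labels. Fine-tuning would pull relevant text-label pairs closer,
# which is the semantic-gap narrowing the abstract refers to.
scores = text_vecs @ label_vecs.T                              # (num_texts, num_labels)
topk = scores.topk(k=2, dim=-1).indices
print([[label_names[j] for j in row] for row in topk.tolist()])
```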