Paper Title
HFT-ONLSTM: Hierarchical and Fine-Tuning Multi-label Text Classification
Paper Authors
Paper Abstract
Many important real-world classification problems involve a large number of closely related categories organized in a hierarchical structure or taxonomy. Achieving high accuracy in hierarchical multi-label text classification (HMTC) over such large sets of closely related categories has become a challenging problem. In this paper, we present a hierarchical and fine-tuning approach based on the Ordered Neurons LSTM (ON-LSTM) network, abbreviated as HFT-ONLSTM, for more accurate level-by-level HMTC. First, we present a novel approach to learning joint embeddings from parent category labels and textual data, accurately capturing the joint features of both category labels and texts. Second, a fine-tuning technique is adopted for training parameters so that the classification results at the upper level contribute to the classification at the lower one. Finally, a comprehensive analysis is conducted based on extensive experiments over two benchmark datasets, comparing against state-of-the-art hierarchical and flat multi-label text classification approaches. The experimental results show that our HFT-ONLSTM approach outperforms these approaches, in particular achieving superior performance while reducing computational costs.
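To make the level-by-level workflow concrete, the following is a minimal PyTorch sketch of the idea described in the abstract: the text representation is fused with an embedding of the parent category label, and the prediction at one level is fed as the parent label to the classifier at the next level. This is a sketch under stated assumptions, not the authors' exact architecture: a standard nn.LSTM stands in for the ON-LSTM encoder, and all class names, dimensions, and the concatenation-based fusion step are illustrative.

```python
# Hedged sketch of level-by-level hierarchical classification with joint
# parent-label/text embeddings. nn.LSTM is a stand-in for ON-LSTM; names,
# dimensions, and the fusion-by-concatenation choice are assumptions.
import torch
import torch.nn as nn

class LevelClassifier(nn.Module):
    def __init__(self, vocab_size, num_parent_labels, num_classes,
                 embed_dim=128, hidden_dim=256):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.label_embed = nn.Embedding(num_parent_labels, embed_dim)
        # The original model would use an ON-LSTM encoder here.
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim + embed_dim, num_classes)

    def forward(self, token_ids, parent_label_ids):
        text_repr, _ = self.encoder(self.word_embed(token_ids))
        text_repr = text_repr[:, -1, :]                  # last hidden state
        joint = torch.cat([text_repr,
                           self.label_embed(parent_label_ids)], dim=-1)
        return self.classifier(joint)                    # logits for this level

# Level-by-level inference: the category predicted at level 1 becomes the
# parent label fed to the level-2 classifier.
level1 = LevelClassifier(vocab_size=10000, num_parent_labels=1, num_classes=10)
level2 = LevelClassifier(vocab_size=10000, num_parent_labels=10, num_classes=50)

tokens = torch.randint(0, 10000, (4, 32))                # batch of token ids
root = torch.zeros(4, dtype=torch.long)                  # dummy root parent
parent_pred = level1(tokens, root).argmax(dim=-1)
child_logits = level2(tokens, parent_pred)
```

In this sketch, fine-tuning the lower-level classifier after the upper level is trained would correspond to initializing or updating level2's parameters using the trained level1 model; the exact parameter-transfer scheme is specific to the paper and not reproduced here.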