Title
Hierarchical Neural Network Approaches for Long Document Classification
Authors
Abstract
Text classification algorithms investigate the intricate relationships between words or phrases and attempt to deduce a document's interpretation. In the last few years, these algorithms have progressed tremendously. Transformer architectures and sentence encoders have proven to give superior results on natural language processing tasks. However, a major limitation of these architectures is that they are applicable only to texts no longer than a few hundred words. In this paper, we explore hierarchical transfer learning approaches for long document classification. We employ the pre-trained Universal Sentence Encoder (USE) and Bidirectional Encoder Representations from Transformers (BERT) in a hierarchical setup to capture better representations efficiently. Our proposed models are conceptually simple: we divide the input data into chunks and pass them through the base BERT and USE models. The output representation for each chunk is then propagated through a shallow neural network comprising LSTMs or CNNs to classify the text. These extensions are evaluated on 6 benchmark datasets. We show that USE + CNN/LSTM performs better than its stand-alone baseline, whereas BERT + CNN/LSTM performs on par with its stand-alone counterpart. However, the hierarchical BERT models are still desirable, as they avoid the quadratic complexity of the attention mechanism in BERT. Along with the hierarchical approaches, this work also provides a comparison of different deep learning algorithms, such as USE, BERT, HAN, Longformer, and BigBird, for long document classification. The Longformer approach consistently performs well on most of the datasets.
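The hierarchical pipeline described in the abstract (chunk the document, encode each chunk with a base model, aggregate chunk representations with a shallow LSTM, then classify) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `nn.EmbeddingBag` chunk encoder is a hypothetical stand-in for a pre-trained BERT or USE model, and all dimensions and the chunk length are assumed values chosen for the example.

```python
import torch
import torch.nn as nn

class HierarchicalClassifier(nn.Module):
    """Sketch of a chunk-then-aggregate classifier for long documents.

    Assumption: `self.encoder` is a placeholder for a frozen base model
    (BERT or USE in the paper); EmbeddingBag is used here only so the
    example is self-contained and runnable.
    """
    def __init__(self, vocab_size=1000, enc_dim=128, hidden=64,
                 num_classes=4, chunk_len=200):
        super().__init__()
        self.chunk_len = chunk_len
        # Placeholder per-chunk encoder (a real setup would call BERT/USE).
        self.encoder = nn.EmbeddingBag(vocab_size, enc_dim)
        # Shallow recurrent network over the sequence of chunk vectors.
        self.lstm = nn.LSTM(enc_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, token_ids):  # token_ids: (seq_len,)
        # 1. Split the long document into fixed-size chunks.
        chunks = token_ids.split(self.chunk_len)
        # 2. Encode each chunk independently -> (1, num_chunks, enc_dim).
        chunk_vecs = torch.stack(
            [self.encoder(c.unsqueeze(0)) for c in chunks], dim=1)
        # 3. Aggregate chunk representations with the LSTM.
        _, (h_n, _) = self.lstm(chunk_vecs)
        # 4. Classify from the final LSTM hidden state.
        return self.head(h_n[-1])  # (1, num_classes)

model = HierarchicalClassifier()
doc = torch.randint(0, 1000, (950,))  # a "long" document of 950 tokens
logits = model(doc)
print(tuple(logits.shape))
```

Because each chunk is encoded independently, attention cost grows linearly in the number of chunks rather than quadratically in document length, which is the efficiency argument made for the hierarchical BERT variants.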