Paper Title

Tree-structured Attention with Hierarchical Accumulation

Paper Authors

Xuan-Phi Nguyen, Shafiq Joty, Steven C. H. Hoi, Richard Socher

Paper Abstract

Incorporating hierarchical structures like constituency trees has been shown to be effective for various natural language processing (NLP) tasks. However, it is evident that state-of-the-art (SOTA) sequence-based models like the Transformer struggle to encode such structures inherently. On the other hand, dedicated models like the Tree-LSTM, while explicitly modeling hierarchical structures, do not perform as efficiently as the Transformer. In this paper, we attempt to bridge this gap with "Hierarchical Accumulation" to encode parse tree structures into self-attention at constant time complexity. Our approach outperforms SOTA methods in four IWSLT translation tasks and the WMT'14 English-German translation task. It also yields improvements over Transformer and Tree-LSTM on three text classification tasks. We further demonstrate that using hierarchical priors can compensate for data shortage, and that our model prefers phrase-level attentions over token-level attentions.

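The abstract only names the approach at a high level; the paper's actual accumulation and attention rules are not reproduced here. As a rough illustration of the general idea of attending over phrase-level as well as token-level representations derived from constituency spans, the following minimal NumPy sketch mean-pools token vectors over hand-specified constituent spans and lets each token attend over both token and phrase keys. The function name `hierarchical_attention`, the mean-pooling accumulation, and the `spans` input are illustrative assumptions, not the paper's formulation.

```python
# Illustrative sketch only (not the paper's exact method): build phrase-level
# representations by accumulating token vectors over constituency-tree spans,
# then let each query attend jointly over token-level and phrase-level keys.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_attention(X, spans):
    """X: (n, d) token representations; spans: list of (start, end) constituent spans."""
    n, d = X.shape
    # Accumulate each constituent span into one phrase vector (mean pooling here;
    # the paper's accumulation rule differs).
    P = np.stack([X[s:e].mean(axis=0) for s, e in spans])   # (m, d) phrase vectors
    K = np.concatenate([X, P], axis=0)                      # token + phrase keys
    V = K
    scores = X @ K.T / np.sqrt(d)                           # (n, n + m)
    return softmax(scores, axis=-1) @ V                     # (n, d)

# Toy usage: 5 tokens, two constituent spans covering tokens [0, 3) and [3, 5).
X = np.random.randn(5, 16)
out = hierarchical_attention(X, [(0, 3), (3, 5)])
print(out.shape)   # (5, 16)
```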