Paper Title
Neural Language Modeling With Implicit Cache Pointers
Paper Authors
Paper Abstract
A cache-inspired approach is proposed for neural language models (LMs) to improve long-range dependency modeling and better predict rare words from long contexts. This approach is a simpler alternative to attention-based pointer mechanisms, which enable neural LMs to copy words from recent history. Without using attention or a mixture structure, the method only involves appending extra tokens that represent words in the history to the output layer of a neural LM and modifying the training supervision accordingly. A memory-augmentation unit is introduced to learn words that are particularly likely to repeat. We experiment with both recurrent neural network- and Transformer-based LMs. Perplexity evaluation on Penn Treebank and WikiText-2 shows that the proposed model outperforms both a baseline LSTM and an LSTM with an attention-based pointer mechanism, and is more effective on rare words. N-best rescoring experiments on Switchboard indicate that it benefits both very rare and frequent words. However, obtaining good overall WER reductions remains challenging for the proposed model as well as for two other models with attention-based pointer mechanisms.
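To make the abstract's description concrete, below is a minimal PyTorch-style sketch of an output layer widened by extra "cache" positions and of one possible way to redirect training targets to those positions when the next word already appears in the recent history. The class and function names (CacheAugmentedLM, cache_targets), the LSTM backbone, the cache size, and the exact supervision rule are illustrative assumptions for this sketch, not the authors' released implementation (in particular, the memory-augmentation unit is omitted).

```python
# Sketch only: names, hyperparameters, and the supervision rule below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CacheAugmentedLM(nn.Module):
    """LSTM LM whose output layer has |V| + K logits: the usual vocabulary
    plus K extra "cache" tokens standing for the last K words of the history."""

    def __init__(self, vocab_size: int, cache_size: int = 50,
                 embed_dim: int = 256, hidden_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Output layer is widened by cache_size extra logits.
        self.out = nn.Linear(hidden_dim, vocab_size + cache_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len) word ids
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)  # (batch, seq_len, vocab_size + cache_size)


def cache_targets(tokens: torch.Tensor, targets: torch.Tensor,
                  vocab_size: int, cache_size: int) -> torch.Tensor:
    """Redirect supervision to a cache position whenever the target word occurred
    within the previous `cache_size` input tokens (one reading of "modifying the
    training supervision accordingly"); otherwise keep the ordinary word target."""
    batch, seq_len = targets.shape
    new_targets = targets.clone()
    for b in range(batch):
        for t in range(seq_len):
            lo = max(0, t - cache_size)
            window = tokens[b, lo:t]
            hits = (window == targets[b, t]).nonzero(as_tuple=True)[0]
            if hits.numel() > 0:
                # Point at the most recent copy: cache slot 0 = previous word.
                offset = (t - 1) - (lo + hits[-1].item())
                new_targets[b, t] = vocab_size + offset
    return new_targets


if __name__ == "__main__":
    V, K = 1000, 8
    model = CacheAugmentedLM(V, K)
    toks = torch.randint(0, V, (2, 20))
    tgts = torch.randint(0, V, (2, 20))
    logits = model(toks)
    loss = F.cross_entropy(logits.reshape(-1, V + K),
                           cache_targets(toks, tgts, V, K).reshape(-1))
    loss.backward()
```

Under these assumptions, the model needs no attention or mixture component: repetition is handled entirely by the extra output positions, and at inference time the probability mass assigned to a cache position can be folded back onto the history word it points to.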