Paper Title
Finding the Optimal Vocabulary Size for Neural Machine Translation

Authors

Thamme Gowda, Jonathan May

Abstract

We cast neural machine translation (NMT) as a classification task in an autoregressive setting and analyze the limitations of both classification and autoregression components. Classifiers are known to perform better with balanced class distributions during training. Since the Zipfian nature of languages causes imbalanced classes, we explore its effect on NMT. We analyze the effect of various vocabulary sizes on NMT performance on multiple languages with many data sizes, and reveal an explanation for why certain vocabulary sizes are better than others.
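The abstract's core observation is that word frequencies follow Zipf's law, so the target-side classes an NMT softmax must predict are heavily imbalanced. A minimal sketch of that imbalance, using an invented toy corpus purely for illustration (a real analysis would count tokens over a large training corpus):

```python
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat and the dog ran on the mat ."
).split()

freqs = Counter(corpus)
total = sum(freqs.values())

# Rank tokens by frequency. In natural language this roughly follows
# Zipf's law: frequency is inversely proportional to rank, so a few
# head tokens dominate and the long tail of rare tokens is starved of
# training examples -- the class imbalance the abstract refers to.
for rank, (token, count) in enumerate(
        sorted(freqs.items(), key=lambda kv: -kv[1]), start=1):
    print(f"rank={rank:2d}  token={token!r:7}  rel_freq={count / total:.3f}")
```

Choosing a subword vocabulary size trades off along this curve: a smaller vocabulary splits rare words into more frequent subword units, flattening the distribution, while a larger vocabulary keeps sequences short but leaves the tail classes rare.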