Paper Title
Capsule-Transformer for Neural Machine Translation
Paper Authors
Paper Abstract
Transformer hugely benefits from its key design of the multi-head self-attention network (SAN), which extracts information from various perspectives by transforming the given input into different subspaces. However, its simple linear-transformation aggregation strategy may still fail to fully capture deeper contextualized information. In this paper, we therefore propose the capsule-Transformer, which extends the linear transformation into a more general capsule routing algorithm by taking the SAN as a special case of a capsule network. The resulting capsule-Transformer is thus able to obtain a better attention distribution representation of the input sequence via information aggregation among different heads and words. Specifically, we view groups of attention weights in the SAN as low-layer capsules. By applying an iterative capsule routing algorithm, they can be further aggregated into higher-layer capsules that contain deeper contextualized information. Experimental results on widely used machine translation datasets show that our proposed capsule-Transformer significantly outperforms the strong Transformer baseline.
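To make the abstract's idea concrete, below is a minimal sketch of how per-head attention weights could be treated as low-layer capsules and aggregated by iterative dynamic routing (in the style of Sabour et al., 2017). It is an illustrative assumption, not the paper's exact formulation: the function name `route_attention_capsules`, the identity prediction transform, and the capsule counts are all hypothetical choices made for clarity.

```python
# A minimal sketch (not the authors' exact method) of dynamic capsule routing
# applied to multi-head attention weights from a self-attention network (SAN).
import torch
import torch.nn.functional as F


def squash(s, dim=-1, eps=1e-8):
    """Squashing non-linearity from Sabour et al. (2017)."""
    sq_norm = (s * s).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + eps)


def route_attention_capsules(attn, num_out_caps=4, num_iters=3):
    """Aggregate per-head attention rows (low-layer capsules) into
    higher-layer capsules via iterative dynamic routing.

    attn: (batch, heads, tgt_len, src_len) attention weights from the SAN.
    Returns: (batch, tgt_len, num_out_caps, src_len) high-layer capsules.
    """
    b_sz, n_heads, tgt_len, src_len = attn.shape
    # Treat each head's attention row over the source as one input capsule:
    # u has shape (batch, tgt_len, in_caps=n_heads, caps_dim=src_len).
    u = attn.permute(0, 2, 1, 3)
    # Prediction vectors u_hat[j|i]; the identity transform is shared here
    # for simplicity (a learned per-pair linear map would normally be used).
    u_hat = u.unsqueeze(3).expand(b_sz, tgt_len, n_heads, num_out_caps, src_len)

    # Routing logits b_ij start at zero.
    b = torch.zeros(b_sz, tgt_len, n_heads, num_out_caps, device=attn.device)
    for _ in range(num_iters):
        c = F.softmax(b, dim=-1)                  # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=2)  # weighted sum over heads
        v = squash(s)                             # high-layer capsules
        # Agreement between predictions and outputs updates the logits.
        b = b + (u_hat * v.unsqueeze(2)).sum(dim=-1)
    return v


if __name__ == "__main__":
    attn = torch.softmax(torch.randn(2, 8, 5, 7), dim=-1)  # dummy SAN weights
    caps = route_attention_capsules(attn)
    print(caps.shape)  # torch.Size([2, 5, 4, 7])
```

In this reading, routing replaces the Transformer's fixed linear aggregation of heads with coupling coefficients that are iteratively refined by agreement, which is how information could be shared among different heads and words before forming the final attention representation.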