Title

Discrete Variational Attention Models for Language Generation

Authors

Xianghong Fang, Haoli Bai, Zenglin Xu, Michael Lyu, Irwin King

Abstract

Variational autoencoders have been widely applied to natural language generation; however, two long-standing problems remain: information under-representation and posterior collapse. The former arises from the fact that only the last hidden state of the encoder is transformed into the latent space, which is insufficient to summarize the data. The latter results from the imbalanced scale between the reconstruction loss and the KL divergence in the objective function. To tackle these issues, in this paper we propose discrete variational attention models, which place a categorical distribution over the attention mechanism owing to the discrete nature of language. Our approach is combined with an auto-regressive prior to capture the sequential dependency among observations, which enhances the latent space for language generation. Moreover, thanks to this discreteness, training our proposed approach does not suffer from posterior collapse. Furthermore, we carefully analyze the superiority of the discrete latent space over the continuous space with the common Gaussian distribution. Extensive experiments on language generation demonstrate the advantages of our proposed approach over state-of-the-art counterparts.
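The abstract describes the method only at a high level, so the following is a minimal, hypothetical sketch of the core idea rather than the authors' implementation: treat the choice of attention position over encoder states as a categorical latent variable, sample it with a Gumbel-softmax relaxation (a common estimator for discrete latents; the abstract does not specify the exact training procedure), and regularize with a KL term between the categorical posterior and a prior over positions. The class name DiscreteVariationalAttention, the prior_logits argument, and the uniform placeholder prior are illustrative assumptions.

# Illustrative sketch only (PyTorch); names and design details are assumptions,
# not the paper's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical, kl_divergence


class DiscreteVariationalAttention(nn.Module):
    def __init__(self, hidden_dim: int):
        super().__init__()
        # Scores each encoder state against the current decoder state.
        self.score = nn.Linear(2 * hidden_dim, 1)

    def forward(self, enc_states, dec_state, prior_logits, tau=1.0):
        # enc_states: (batch, src_len, hidden), dec_state: (batch, hidden)
        src_len = enc_states.size(1)
        expanded = dec_state.unsqueeze(1).expand(-1, src_len, -1)
        logits = self.score(torch.cat([enc_states, expanded], dim=-1)).squeeze(-1)

        # Categorical posterior over which source position to attend to,
        # and a prior over positions (e.g. from an autoregressive prior network).
        posterior = Categorical(logits=logits)
        prior = Categorical(logits=prior_logits)

        # Differentiable (straight-through) sample of a one-hot attention choice.
        one_hot = F.gumbel_softmax(logits, tau=tau, hard=True)
        context = torch.bmm(one_hot.unsqueeze(1), enc_states).squeeze(1)

        # The categorical KL is bounded by log(src_len), which keeps the
        # regularizer on a comparable scale with the reconstruction loss.
        kl = kl_divergence(posterior, prior)
        return context, kl


if __name__ == "__main__":
    # Toy usage with random tensors and a uniform placeholder prior.
    batch, src_len, hidden = 2, 5, 8
    attn = DiscreteVariationalAttention(hidden)
    enc = torch.randn(batch, src_len, hidden)
    dec = torch.randn(batch, hidden)
    prior_logits = torch.zeros(batch, src_len)
    context, kl = attn(enc, dec, prior_logits)
    print(context.shape, kl.shape)  # torch.Size([2, 8]) torch.Size([2])

The bounded categorical KL in this sketch is one way to read the abstract's claim that discreteness avoids posterior collapse, in contrast to a Gaussian latent whose KL can be driven toward zero.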
