Paper Title

Discrete Latent Variable Representations for Low-Resource Text Classification

Authors

Shuning Jin, Sam Wiseman, Karl Stratos, Karen Livescu

Abstract

While much work on deep latent variable models of text uses continuous latent variables, discrete latent variables are interesting because they are more interpretable and typically more space efficient. We consider several approaches to learning discrete latent variable models for text in the case where exact marginalization over these variables is intractable. We compare the performance of the learned representations as features for low-resource document and sentence classification. Our best models outperform the previous best reported results with continuous representations in these low-resource settings, while learning significantly more compressed representations. Interestingly, we find that an amortized variant of Hard EM performs particularly well in the lowest-resource regimes.
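
As context for the abstract's mention of an amortized variant of Hard EM, below is a minimal, hypothetical sketch of one such training step for a model with a single K-way discrete latent code and a bag-of-words decoder. The names (encoder, decoder, hard_em_step), the candidate re-ranking scheme, and the uniform-prior assumption are illustrative choices only, not the authors' implementation.

# Hypothetical sketch of an amortized hard-EM style update for a model with a
# single K-way discrete latent variable; not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

K, H, V = 32, 128, 5000        # number of codes, hidden size, vocab size (toy values)

encoder = nn.Sequential(nn.Linear(V, H), nn.ReLU(), nn.Linear(H, K))    # logits of q(z|x)
decoder = nn.Sequential(nn.Embedding(K, H), nn.ReLU(), nn.Linear(H, V)) # logits of p(x|z)
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

def hard_em_step(x_bow):
    """One update on a batch of bag-of-words count vectors x_bow: [B, V]."""
    q_logits = encoder(x_bow)                                   # [B, K]
    with torch.no_grad():
        # Hard E-step (amortized): instead of marginalizing over z, score the
        # encoder's top candidate codes under the decoder (uniform prior assumed)
        # and commit to the single best one per example.
        cand = q_logits.topk(4, dim=-1).indices                 # [B, 4] candidate codes
        x_ll = F.log_softmax(decoder(cand), dim=-1)             # [B, 4, V]
        scores = (x_ll * x_bow.unsqueeze(1)).sum(-1)            # approx. log p(x|z) per candidate
        z_star = cand.gather(1, scores.argmax(1, keepdim=True)).squeeze(1)  # [B]
    # M-step for the decoder: maximize log p(x | z*).
    recon = -(F.log_softmax(decoder(z_star), dim=-1) * x_bow).sum(-1).mean()
    # Amortization: train the encoder to predict the chosen hard assignments.
    enc_loss = F.cross_entropy(q_logits, z_star)
    loss = recon + enc_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

Committing to a single code per example is what makes the E-step "hard"; the cross-entropy term is the amortization, so test-time inference reduces to one encoder forward pass and an argmax, yielding a compact discrete representation for downstream classification.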
