Paper Title
Learning Interpretable and Discrete Representations with Adversarial Training for Unsupervised Text Classification
Paper Authors
Paper Abstract
Learning continuous representations from unlabeled textual data has been increasingly studied to benefit semi-supervised learning. Although discrete representations are relatively easy to interpret, learning them from unlabeled textual data has not been widely explored due to the difficulty of training. This work proposes TIGAN, which learns to encode texts into two disentangled representations: a discrete code and a continuous noise, where the discrete code represents interpretable topics and the noise controls the variance within the topics. The discrete code learned by TIGAN can be used for unsupervised text classification. Compared to other unsupervised baselines, the proposed TIGAN achieves superior performance on six different corpora, and its performance is on par with a recently proposed weakly-supervised text classification method. The extracted topical words representing latent topics show that TIGAN learns coherent and highly interpretable topics.
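To make the disentangled encoding described in the abstract concrete, below is a minimal sketch, not the authors' implementation: an encoder maps a bag-of-words text vector to a discrete topic code plus a continuous noise vector, and the argmax of the code serves as an unsupervised class label. The class name, dimensions, bag-of-words input, and the Gumbel-softmax relaxation are illustrative assumptions, since the paper's actual architecture and training details are not given here.

```python
# Illustrative sketch of encoding a text into a discrete code and continuous noise.
# All names and hyperparameters are assumptions, not TIGAN's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledTextEncoder(nn.Module):
    def __init__(self, vocab_size=2000, num_topics=10, noise_dim=32, hidden_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(vocab_size, hidden_dim),
            nn.ReLU(),
        )
        self.code_head = nn.Linear(hidden_dim, num_topics)   # logits for the discrete topic code
        self.noise_head = nn.Linear(hidden_dim, noise_dim)   # continuous within-topic noise

    def forward(self, bow, tau=0.5):
        h = self.backbone(bow)
        # Gumbel-softmax gives a differentiable, (near) one-hot topic assignment.
        code = F.gumbel_softmax(self.code_head(h), tau=tau, hard=True)
        noise = torch.tanh(self.noise_head(h))
        return code, noise

# Usage: the discrete code doubles as an unsupervised text-classification label.
encoder = DisentangledTextEncoder()
bow = torch.rand(4, 2000)              # batch of 4 bag-of-words vectors
code, noise = encoder(bow)
predicted_topic = code.argmax(dim=-1)  # topic/cluster id per document
```

In an adversarial setup of this kind, the code and noise would typically be fed to a generator, with a discriminator and an auxiliary network encouraging the generated text to remain predictive of the code, which is what makes the discrete code interpretable as a topic.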