Paper Title

DoubleMix: Simple Interpolation-Based Data Augmentation for Text Classification

Authors

Chen, Hui, Han, Wei, Yang, Diyi, Poria, Soujanya

Abstract

This paper proposes a simple yet effective interpolation-based data augmentation approach termed DoubleMix, to improve the robustness of models in text classification. DoubleMix first leverages a couple of simple augmentation operations to generate several perturbed samples for each training sample, and then uses the perturbed data and original data to carry out a two-step interpolation in the hidden space of neural models. Concretely, it first mixes up the perturbed data into a synthetic sample and then mixes up the original data and the synthetic perturbed data. DoubleMix enhances models' robustness by learning the "shifted" features in hidden space. On six text classification benchmark datasets, our approach outperforms several popular text augmentation methods including token-level, sentence-level, and hidden-level data augmentation techniques. Also, experiments in low-resource settings show our approach consistently improves models' performance when the training data is scarce. Extensive ablation studies and case studies confirm that each component of our approach contributes to the final performance and show that our approach exhibits superior performance on challenging counterexamples. Additionally, visual analysis shows that text features generated by our approach are highly interpretable. Our code for this paper can be found at https://github.com/declare-lab/DoubleMix.git.
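
As a rough illustration of the two-step interpolation described in the abstract, the sketch below mixes the hidden representations of several perturbed copies into one synthetic sample, then interpolates that synthetic sample with the original sample. This is a minimal sketch only: the function name two_step_mixup, the Dirichlet/Beta sampling of mixing weights, and all parameter names are assumptions for illustration, not taken from the authors' released code (see the GitHub link above).

```python
import torch

def two_step_mixup(h_orig, h_perturbed, alpha=1.0, beta=1.0):
    """Illustrative two-step interpolation in hidden space (sketch, not the official implementation).

    h_orig:      hidden states of the original batch, shape (B, D)
    h_perturbed: hidden states of K perturbed copies, shape (K, B, D)
    """
    K = h_perturbed.size(0)

    # Step 1: mix the K perturbed samples into one synthetic sample.
    # Mixing weights drawn from a Dirichlet distribution (assumption).
    w = torch.distributions.Dirichlet(torch.full((K,), alpha)).sample()
    h_synth = (w.view(K, 1, 1) * h_perturbed).sum(dim=0)          # (B, D)

    # Step 2: interpolate the original sample with the synthetic perturbed sample.
    # Interpolation coefficient drawn from a Beta distribution (assumption).
    lam = torch.distributions.Beta(beta, beta).sample()
    return lam * h_orig + (1.0 - lam) * h_synth


if __name__ == "__main__":
    h_orig = torch.randn(8, 768)      # batch of 8 original hidden states
    h_pert = torch.randn(3, 8, 768)   # 3 perturbed copies per original sample
    mixed = two_step_mixup(h_orig, h_pert)
    print(mixed.shape)                # torch.Size([8, 768])
```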
