Paper title
Improving Text Relationship Modeling with Artificial Data
Paper authors
Paper abstract
Data augmentation uses artificially created examples to support supervised machine learning, adding robustness to the resulting models and helping to account for the limited availability of labelled data. We apply and evaluate a synthetic data approach to relationship classification in digital libraries, generating artificial books with relationships that are common in digital libraries but not easily inferred from existing metadata. We find that for classification of whole-part relationships between books, synthetic data improves a deep neural network classifier by 91%. Further, we consider whether a useful new text relationship class can be learned from fully artificial training data.