Question Answering and Question Generation for Finnish

Paper title

Question Answering and Question Generation for Finnish

Paper authors

Ilmari Kylliäinen, Roman Yangarber

Paper abstract

Recent advances in the field of language modeling have improved the state-of-the-art in question answering (QA) and question generation (QG). However, the development of modern neural models, their benchmarks, and datasets for training them has mainly focused on English. Finnish, like many other languages, faces a shortage of large QA/QG model training resources, which has prevented experimenting with state-of-the-art QA/QG fine-tuning methods. We present the first neural QA and QG models that work with Finnish. To train the models, we automatically translate the SQuAD dataset and then use normalization methods to reduce the amount of problematic data created during the translation. Using the synthetic data, together with the Finnish partition of the TyDi-QA dataset, we fine-tune several transformer-based models to both QA and QG and evaluate their performance. To the best of our knowledge, the resulting dataset is the first large-scale QA/QG resource for Finnish. This paper also sets the initial benchmarks for Finnish-language QA and QG.
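
The abstract does not say which normalization methods are applied to the translated SQuAD data. As an illustration only, the sketch below assumes one common problem with machine-translated extractive QA data: the translated answer no longer appears verbatim in the translated context, so its character offset is lost. The function names `realign_answer` and `filter_translated` and the record keys `context`, `answer`, and `answer_start` are hypothetical and are not taken from the paper.

```python
from typing import Optional


def realign_answer(context: str, answer: str) -> Optional[int]:
    """Return the character offset of `answer` in `context`, or None.

    Tries an exact match first, then a case-insensitive match.
    """
    start = context.find(answer)
    if start != -1:
        return start
    lowered = context.lower()
    # Lowercasing can change string length for a few characters; only trust
    # the offset when the lowered context has the same length as the original.
    if len(lowered) != len(context):
        return None
    start = lowered.find(answer.lower())
    return start if start != -1 else None


def filter_translated(examples: list) -> list:
    """Keep only examples whose answer can be located in the context.

    Assumption: each example is a dict with "context" and "answer" keys.
    The kept answer text is copied from the context so that it matches
    the stored offset exactly.
    """
    kept = []
    for ex in examples:
        start = realign_answer(ex["context"], ex["answer"])
        if start is None:
            continue  # answer span lost in translation: drop the example
        end = start + len(ex["answer"])
        kept.append({**ex, "answer": ex["context"][start:end], "answer_start": start})
    return kept
```

A filter of this kind trades dataset size for consistency: examples whose answer cannot be located are discarded rather than repaired, which shrinks the synthetic dataset but keeps every remaining answer offset valid.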
