Paper Title

Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in Icelandic

Authors

Vésteinn Snæbjarnarson, Hafsteinn Einarsson

Abstract

It can be challenging to build effective open question answering (open QA) systems for languages other than English, mainly due to a lack of labeled training data. We present a data-efficient method to bootstrap such a system for languages other than English. Our approach requires only limited QA resources in the given language, along with machine-translated data and a language model that is at least bilingual. To evaluate our approach, we build such a system for the Icelandic language and evaluate performance on trivia-style datasets. The corpora used for training are English in origin but machine-translated into Icelandic. We train a bilingual Icelandic/English language model to embed English context and Icelandic questions, following the methodology introduced with DensePhrases (Lee et al., 2021). The resulting system is an open-domain cross-lingual QA system between Icelandic and English. Finally, the system is adapted for Icelandic-only open QA, demonstrating how an open QA system can be created efficiently with limited access to curated datasets in the language of interest.
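
The retrieval idea in the abstract, scoring English phrases against an embedded Icelandic question, can be illustrated with a toy sketch. This is not the authors' implementation or the DensePhrases API: the index contents, the three-dimensional vectors, and the names `PHRASE_INDEX`, `dot`, and `retrieve` are all hypothetical, and the hand-written vectors merely stand in for the output of a bilingual encoder. The sketch assumes that retrieval reduces to picking the phrases with the highest inner product against the question vector.

```python
from typing import Dict, List, Tuple

# Hypothetical toy "phrase index": English answer phrases with made-up
# 3-dimensional vectors standing in for a bilingual encoder's output.
PHRASE_INDEX: Dict[str, List[float]] = {
    "Reykjavik": [0.9, 0.1, 0.0],
    "Hekla": [0.1, 0.9, 0.1],
    "1944": [0.0, 0.1, 0.9],
}


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(question_vec: List[float], k: int = 1) -> List[Tuple[str, float]]:
    """Return the k phrases whose vectors score highest against the question."""
    scored = [(phrase, dot(question_vec, vec)) for phrase, vec in PHRASE_INDEX.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Made-up vector standing in for the encoding of an Icelandic question such as
# "Hver er höfuðborg Íslands?" ("What is the capital of Iceland?").
question_vec = [0.8, 0.2, 0.1]
top = retrieve(question_vec, k=2)
```

In a real system the vectors would come from the trained bilingual model and the index would hold phrases from a full English corpus, so exhaustive scoring as done here would be replaced by an approximate nearest-neighbor search.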
