Neural disambiguation of lemma and part of speech in morphologically rich languages

Paper title

Neural disambiguation of lemma and part of speech in morphologically rich languages

Authors

Quecedo, José María Hoya; Koppatz, Maximilian W.; Furlan, Giacomo; Yangarber, Roman

Abstract

We consider the problem of disambiguating the lemma and part of speech of ambiguous words in morphologically rich languages. We propose a method for disambiguating ambiguous words in context, using a large un-annotated corpus of text, and a morphological analyser -- with no manual disambiguation or data annotation. We assume that the morphological analyser produces multiple analyses for ambiguous words. The idea is to train recurrent neural networks on the output that the morphological analyser produces for unambiguous words. We present performance on POS and lemma disambiguation that reaches or surpasses the state of the art -- including supervised models -- using no manually annotated data. We evaluate the method on several morphologically rich languages.
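The core idea in the abstract, using the analyser's output for unambiguous words as free training labels, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy lexicon, the `analyse` function, and the helper names are hypothetical stand-ins for a real morphological analyser, and the recurrent network itself is omitted. English tokens are used only for readability.

```python
from typing import Callable, Dict, List, Set, Tuple

Analysis = Tuple[str, str]  # (lemma, part of speech)

# Hypothetical toy lexicon standing in for a real morphological analyser.
TOY_LEXICON: Dict[str, Set[Analysis]] = {
    "the": {("the", "DET")},
    "dog": {("dog", "NOUN")},
    "saw": {("see", "VERB"), ("saw", "NOUN")},  # ambiguous: two analyses
    "a": {("a", "DET")},
    "cat": {("cat", "NOUN")},
}


def analyse(token: str) -> Set[Analysis]:
    """Return every (lemma, POS) analysis for a token (hypothetical analyser)."""
    return TOY_LEXICON.get(token.lower(), set())


def build_training_instances(
    sentence: List[str],
    analyser: Callable[[str], Set[Analysis]] = analyse,
) -> List[Tuple[int, str, str]]:
    """Collect (position, lemma, POS) labels from unambiguous tokens only."""
    instances = []
    for position, token in enumerate(sentence):
        analyses = analyser(token)
        if len(analyses) == 1:  # exactly one analysis: usable as a label
            lemma, pos = next(iter(analyses))
            instances.append((position, lemma, pos))
    return instances


def ambiguous_positions(
    sentence: List[str],
    analyser: Callable[[str], Set[Analysis]] = analyse,
) -> List[int]:
    """Positions whose tokens have several analyses and need disambiguation."""
    return [i for i, token in enumerate(sentence) if len(analyser(token)) > 1]


sentence = ["The", "dog", "saw", "a", "cat"]
train = build_training_instances(sentence)
to_disambiguate = ambiguous_positions(sentence)
```

In this sketch, a model trained on `train` (each label paired with its sentence context) would then be asked to choose among the analyser's candidate analyses at the positions in `to_disambiguate`. How the paper encodes context and ranks candidates is not specified in the abstract and is not modelled here.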
