Paper Title

PiC: A Phrase-in-Context Dataset for Phrase Understanding and Semantic Search

Paper Authors

Thang M. Pham, Seunghyun Yoon, Trung Bui, Anh Nguyen

Paper Abstract

While contextualized word embeddings have become the de-facto standard, learning contextualized phrase embeddings remains less explored and has been hindered by the lack of a human-annotated benchmark that tests machine understanding of phrase semantics given a context sentence or paragraph (instead of a phrase alone). To fill this gap, we propose PiC -- a dataset of ~28K noun phrases accompanied by their contextual Wikipedia pages, together with a suite of three tasks for training and evaluating phrase embeddings. Training on PiC improves ranking models' accuracy and remarkably pushes span-selection (SS) models (i.e., models that predict the start and end indices of the target phrase) to near-human accuracy, which is 95% Exact Match (EM) on semantic search given a query phrase and a passage. Interestingly, we find evidence that this impressive performance arises because the SS models learn to better capture the common meaning of a phrase regardless of its actual context. SotA models perform poorly both at distinguishing two senses of the same phrase in two different contexts (~60% EM) and at estimating the similarity between two different phrases in the same context (~70% EM).
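
The span-selection formulation described in the abstract maps naturally onto extractive question answering: the query phrase plays the role of the "question" and the passage is the context from which a span is extracted. Below is a minimal sketch of that setup using a generic off-the-shelf QA checkpoint as a stand-in for the paper's SS models; the checkpoint name, the example passage, and the gold phrase are illustrative assumptions, not the paper's actual models or data.

```python
# Minimal sketch: semantic search as span selection via extractive QA.
# Everything concrete here (checkpoint, passage, gold phrase) is an
# assumption for illustration, not the PiC authors' actual setup.
from transformers import pipeline

# Any extractive QA model can stand in for a span-selection (SS) model.
span_selector = pipeline(
    "question-answering",
    model="deepset/roberta-base-squad2",  # assumed checkpoint
)

query_phrase = "massive figure"  # hypothetical query phrase
passage = (
    "The stadium was dwarfed by a massive figure carved into the hillside, "
    "a colossal statue that had watched over the valley for centuries."
)

# The model predicts the start/end of the span most similar to the query.
pred = span_selector(question=query_phrase, context=passage)
predicted_span = pred["answer"]

# Exact Match (EM), the metric quoted in the abstract: 1 if the predicted
# span string exactly equals the gold target phrase, else 0.
gold_phrase = "colossal statue"  # hypothetical gold annotation
em = int(predicted_span.strip().lower() == gold_phrase.strip().lower())
print(predicted_span, em)
```

EM is a strict string-level criterion, which is why the 95% figure reported for SS models on semantic search is directly comparable to human annotation agreement.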
