Title
Modern Baselines for SPARQL Semantic Parsing
Authors
Abstract
In this work, we focus on the task of generating SPARQL queries from natural language questions, which can then be executed on Knowledge Graphs (KGs). We assume that gold entities and relations have been provided, and the remaining task is to arrange them in the right order, along with SPARQL vocabulary and input tokens, to produce the correct SPARQL query. Pre-trained Language Models (PLMs) have not been explored in depth on this task so far, so we experiment with BART, T5, and PGNs (Pointer Generator Networks) with BERT embeddings, looking for new baselines in the PLM era for this task, on the DBpedia and Wikidata KGs. We show that T5 requires special input tokenisation, but produces state-of-the-art performance on the LC-QuAD 1.0 and LC-QuAD 2.0 datasets, outperforming task-specific models from previous works. Moreover, these methods enable semantic parsing for questions where a part of the input needs to be copied to the output query, thus enabling a new paradigm in KG semantic parsing.
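To make the task setup concrete, the following is a minimal illustrative sketch of an input/output pair. The question, entity, relation, query, and the separator-based input serialisation are all hypothetical assumptions for illustration, not the paper's actual data or exact input format:

```python
# Hypothetical example of the semantic-parsing setup described above:
# the model receives the question plus gold entities and relations,
# and must arrange them with SPARQL vocabulary to form the query.
question = "Who is the author of Dune?"
gold_entities = ["dbr:Dune_(novel)"]   # illustrative DBpedia entity
gold_relations = ["dbo:author"]        # illustrative DBpedia relation

# One possible seq2seq input serialisation (an assumption, not the
# paper's exact scheme): concatenate question, entities and relations
# with a separator token.
model_input = " | ".join([question] + gold_entities + gold_relations)

# The target output the parser must generate: the gold entity and
# relation slotted into SPARQL vocabulary in the right order.
target_query = "SELECT ?x WHERE { dbr:Dune_(novel) dbo:author ?x }"

print(model_input)
print(target_query)
```

The copy-oriented paradigm mentioned in the abstract would apply when a literal from the question (e.g. a string or number) must appear verbatim inside `target_query`, which a pointer mechanism or a PLM's subword vocabulary can handle without a dedicated entity slot.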