Paper Title

GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing

Authors

Tao Yu, Chien-Sheng Wu, Xi Victoria Lin, Bailin Wang, Yi Chern Tan, Xinyi Yang, Dragomir Radev, Richard Socher, Caiming Xiong

Abstract

We present GraPPa, an effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data. We construct synthetic question-SQL pairs over high-quality tables via a synchronous context-free grammar (SCFG) induced from existing text-to-SQL datasets. We pre-train our model on the synthetic data using a novel text-schema linking objective that predicts the syntactic role of a table field in the SQL for each question-SQL pair. To maintain the model's ability to represent real-world data, we also include masked language modeling (MLM) over several existing table-and-language datasets to regularize the pre-training process. On four popular fully supervised and weakly supervised table semantic parsing benchmarks, GraPPa significantly outperforms RoBERTa-large as the feature representation layers and establishes new state-of-the-art results on all of them.
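The abstract describes two pre-training objectives that share one encoder: masked language modeling (MLM) over real table-and-language data, and a text-schema linking objective that predicts the syntactic role each table column plays in the SQL of a synthetic question-SQL pair. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration, under stated assumptions, of how two such losses could sit on top of a shared encoder's hidden states. Names such as GrappaPretrainHeads, num_ssp_labels, and column_positions, and the size of the SQL-role label set, are hypothetical.

```python
import torch
import torch.nn as nn

class GrappaPretrainHeads(nn.Module):
    """Hypothetical heads for the two GraPPa-style pre-training losses."""

    def __init__(self, hidden_size=1024, vocab_size=50265, num_ssp_labels=10):
        super().__init__()
        # MLM head: predicts the identity of masked tokens (regularization on real data).
        self.mlm_head = nn.Linear(hidden_size, vocab_size)
        # Text-schema linking head: predicts which SQL operation each column appears
        # with in the synthetic SQL (e.g. SELECT, WHERE, GROUP BY, or none).
        self.ssp_head = nn.Linear(hidden_size, num_ssp_labels)

    def forward(self, hidden_states, mlm_labels, column_positions, ssp_labels):
        # hidden_states:    (batch, seq_len, hidden) from a RoBERTa-like encoder over
        #                   the concatenated question and column headers.
        # mlm_labels:       (batch, seq_len), -100 at positions that were not masked.
        # column_positions: (batch, num_columns), index of each column's first token.
        # ssp_labels:       (batch, num_columns), SQL-role label for each column.
        mlm_logits = self.mlm_head(hidden_states)
        mlm_loss = nn.functional.cross_entropy(
            mlm_logits.view(-1, mlm_logits.size(-1)),
            mlm_labels.view(-1),
            ignore_index=-100,
        )

        # Gather the encoder state of each column's first token and classify its SQL role.
        idx = column_positions.unsqueeze(-1).expand(-1, -1, hidden_states.size(-1))
        column_states = torch.gather(hidden_states, 1, idx)
        ssp_logits = self.ssp_head(column_states)
        ssp_loss = nn.functional.cross_entropy(
            ssp_logits.view(-1, ssp_logits.size(-1)),
            ssp_labels.view(-1),
        )

        # The two objectives are summed to form the pre-training loss.
        return mlm_loss + ssp_loss
```

In the paper, the MLM loss comes from real table-and-language corpora while the linking loss comes from the synthetic question-SQL pairs, so in practice the two terms would typically be computed on different batches; the single forward pass here is only meant to show the shared-encoder, summed-loss structure.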
