Paper Title

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

Authors

Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Bill Yuchen Lin, Xiang Ren

Abstract

Pre-trained language models (PTLMs) have achieved impressive results in a range of natural language understanding (NLU) and generation (NLG) tasks. However, current pre-training objectives such as masked token prediction (for BERT-style PTLMs) and masked span infilling (for T5-style PTLMs) do not explicitly model the relational commonsense knowledge about everyday concepts, which is crucial to many downstream tasks that need common sense to understand or generate. To augment PTLMs with concept-centric commonsense knowledge, in this paper we propose both generative and contrastive objectives for learning common sense from text, and use them as intermediate self-supervised learning tasks for incrementally pre-training PTLMs (before task-specific fine-tuning on downstream datasets). Furthermore, we develop a joint pre-training framework to unify the generative and contrastive objectives so that they can mutually reinforce each other. Extensive experimental results show that our method, the Concept-Aware Language Model (CALM), can pack more commonsense knowledge into the parameters of a pre-trained text-to-text transformer without relying on external knowledge graphs, yielding better performance on both NLU and NLG tasks. We show that, while only incrementally pre-trained on a relatively small corpus for a few steps, CALM outperforms baseline methods by a consistent margin and is even comparable to some larger PTLMs, which suggests that CALM can serve as a general, plug-and-play method for improving the commonsense reasoning ability of a PTLM.
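To make the generative objective described in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' released implementation, of how a concept-to-sentence style training pair could be fed to a T5-style text-to-text transformer using the HuggingFace transformers library. The prompt format, the example concept/sentence pair, and the choice of the "t5-base" checkpoint are assumptions for illustration only.

# Illustrative sketch only: a concept-to-sentence generative objective on a
# T5-style text-to-text transformer (HuggingFace transformers + PyTorch).
# The prompt format, example pair, and "t5-base" checkpoint are assumptions,
# not the paper's released training setup.
from transformers import T5TokenizerFast, T5ForConditionalGeneration

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical training pair: everyday concepts -> original sentence, so the
# model has to recover plausible relations between the concepts.
source = "generate a sentence with these concepts: dog catch frisbee park"
target = "A dog jumps to catch a frisbee in the park."

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

# Standard sequence-to-sequence cross-entropy loss; during incremental
# pre-training this loss would be minimized over many such pairs before
# task-specific fine-tuning on a downstream dataset.
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
print(float(outputs.loss))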
