Thesis Title
Knowledge Efficient Deep Learning for Natural Language Processing
Thesis Author
Thesis Abstract
Deep learning has become the workhorse for a wide range of natural language processing applications. However, much of its success relies on annotated examples, and annotation is time-consuming and expensive to produce at scale. Here we are interested in methods that reduce the required quantity of annotated data by making learning methods more knowledge efficient, so that they are more applicable in low-annotation (low-resource) settings. There are various classical approaches to making models more knowledge efficient, such as multi-task learning, transfer learning, weak supervision, and unsupervised learning. This thesis focuses on adapting such classical methods to modern deep learning models and algorithms.

This thesis describes four works aimed at making machine learning models more knowledge efficient. First, we propose knowledge-rich deep learning (KRDL) as a unifying framework for incorporating prior knowledge into deep models; in particular, we apply KRDL built on Markov logic networks to denoise weak supervision. Second, we apply a KRDL model to help machine reading models find the correct evidence sentences that support their decisions. Third, we investigate knowledge transfer techniques in a multilingual setting, where we propose a method that improves pre-trained multilingual BERT using a bilingual dictionary. Fourth, we present an episodic memory network for language modelling, in which we encode large-scale external knowledge for a pre-trained GPT model.