Paper Title

Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads

Authors

Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller

Abstract

Transformer-based pre-trained language models (PLMs) have dramatically improved the state of the art in NLP across many tasks. This has led to substantial interest in analyzing the syntactic knowledge PLMs learn. Previous approaches to this question have been limited, mostly using test suites or probes. Here, we propose a novel fully unsupervised parsing approach that extracts constituency trees from PLM attention heads. We rank transformer attention heads based on their inherent properties, and create an ensemble of high-ranking heads to produce the final tree. Our method is adaptable to low-resource languages, as it does not rely on development sets, which can be expensive to annotate. Our experiments show that the proposed method often outperforms existing approaches when no development set is available. Our unsupervised parser can also be used as a tool to analyze the grammars PLMs learn implicitly. For this, we use the parse trees induced by our method to train a neural PCFG and compare it to a grammar derived from a human-annotated treebank.
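
To make the extraction step concrete, below is a minimal, illustrative sketch of pulling the attention matrix of a single self-attention head from a pre-trained PLM and greedily decoding a binary tree from it. The model name, the layer/head indices, and the cosine-similarity boundary score are assumptions chosen for illustration only; the paper's actual method ranks heads by their inherent properties, ensembles many high-ranking heads, and uses its own distance measures rather than the stand-in score shown here.

```python
# Minimal, illustrative sketch (not the paper's exact algorithm): take the
# attention matrix of one self-attention head from a pre-trained PLM and
# greedily split a sentence top-down into a binary tree. Model name,
# layer/head indices, and the boundary score are assumptions for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-cased"   # assumed; any PLM that exposes attentions works
LAYER, HEAD = 8, 7               # arbitrary choice for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()

def head_attention(sentence):
    """Return (wordpieces, attn), where attn is the (n x n) matrix of one head."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    attn = out.attentions[LAYER][0, HEAD]                  # (seq_len, seq_len)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return tokens[1:-1], attn[1:-1, 1:-1]                  # drop [CLS]/[SEP]

def boundary_score(attn, i):
    """Stand-in boundary score: dissimilar attention rows for tokens i and i+1
    suggest a constituent boundary between them."""
    return -F.cosine_similarity(attn[i], attn[i + 1], dim=0).item()

def build_tree(tokens, attn, lo, hi):
    """Greedy top-down binary splitting of the span [lo, hi)."""
    if hi - lo == 1:
        return tokens[lo]
    k = max(range(lo, hi - 1), key=lambda i: boundary_score(attn, i))
    return (build_tree(tokens, attn, lo, k + 1),
            build_tree(tokens, attn, k + 1, hi))

tokens, attn = head_attention("The cat sat on the mat .")
print(build_tree(tokens, attn, 0, len(tokens)))
```

In the paper, a single head like this would only be one member of the ensemble; heads are first scored on their intrinsic properties and the induced trees of the best heads are combined to produce the final parse.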
