Paper title
Iterative Paraphrastic Augmentation with Discriminative Span Alignment
Paper authors
Paper abstract
We introduce a novel paraphrastic augmentation strategy based on sentence-level lexically constrained paraphrasing and discriminative span alignment. Our approach allows for the large-scale expansion of existing resources, or the rapid creation of new resources from a small, manually-produced seed corpus. We illustrate our framework on the Berkeley FrameNet Project, a large-scale language understanding effort spanning more than two decades of human labor. Based on roughly four days of collecting training data for the alignment model and approximately one day of parallel compute, we automatically generate 495,300 unique (Frame, Trigger) combinations annotated in context, a roughly 50x expansion atop FrameNet v1.7.