Title
Text Revision by On-the-Fly Representation Optimization
Authors
Abstract
Text revision refers to a family of natural language generation tasks in which the source and target sequences share moderate resemblance in surface form but differ in attributes such as text formality and simplicity. Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems, which rely on large-scale parallel training corpora. In this paper, we present an iterative in-place editing approach to text revision that requires no parallel data. In this approach, we simply fine-tune a pre-trained Transformer with masked language modeling and attribute classification. During inference, the edit at each iteration is realized by a two-step span replacement. In the first step, the distributed representation of the text is optimized on the fly toward an attribute function. In the second step, a text span is masked and a new one is proposed conditioned on the optimized representation. Empirical experiments on two typical and important text revision tasks, text formalization and text simplification, show the effectiveness of our approach: it achieves competitive and even better performance than state-of-the-art supervised methods on text simplification, and outperforms strong unsupervised methods on text formalization.\footnote{Code and model are available at \url{https://github.com/jingjingli01/OREO}.}
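The two-step span replacement described above can be illustrated with a minimal toy sketch. This is not the paper's actual model: the linear attribute scorer `w`, the tiny embedding table standing in for the masked-LM's vocabulary, and the nearest-neighbor "unmasking" are all simplifying assumptions made purely to show the control flow of step 1 (optimize the representation toward the attribute function) and step 2 (propose a replacement token conditioned on the optimized representation).

```python
import numpy as np

rng = np.random.default_rng(0)
embed = rng.normal(size=(10, 4))   # toy vocabulary: 10 tokens, 4-dim embeddings
w = rng.normal(size=4)             # toy linear attribute function: score(h) = w @ h


def optimize_representation(h, w, lr=0.5, steps=10):
    """Step 1: gradient ascent on h toward a higher attribute score.

    For the linear score w @ h, the gradient w.r.t. h is simply w.
    """
    for _ in range(steps):
        h = h + lr * w
    return h


def propose_token(h, embed):
    """Step 2: 'unmask' the span by picking the token whose embedding
    best matches the optimized representation (nearest neighbor by
    inner product, a stand-in for the masked-LM's output head)."""
    return int(np.argmax(embed @ h))


h0 = embed[3].copy()               # representation of the current (masked) span
h1 = optimize_representation(h0, w)
assert w @ h1 > w @ h0             # the attribute score strictly improved
new_token = propose_token(h1, embed)
```

In the actual approach, step 1 would backpropagate through the fine-tuned attribute classifier to update the Transformer's hidden states, and step 2 would let the masked-LM head generate the replacement span; the loop then repeats for the next iteration of editing.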