Paper Title
From Spelling to Grammar: A New Framework for Chinese Grammatical Error Correction
Paper Authors
Paper Abstract
Chinese Grammatical Error Correction (CGEC) aims to generate a correct sentence from an erroneous sequence in which different kinds of errors are mixed. This paper divides the CGEC task into two steps: spelling error correction and grammatical error correction. Specifically, we propose a novel zero-shot approach to spelling error correction that is simple but effective, achieving high precision so that errors do not accumulate through the pipeline structure. To handle grammatical error correction, we design part-of-speech (POS) features and semantic class features to enhance the neural network model, and propose an auxiliary task that predicts the POS sequence of the target sentence. Our proposed framework achieves a 42.11 F0.5 score on the CGEC dataset without using any synthetic data or data augmentation methods, outperforming the previous state of the art by 1.30 points. Moreover, our model produces meaningful POS representations that capture words of different POS and convey reasonable POS transition rules.
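
For readers unfamiliar with the metric, the F0.5 score reported above is the standard F-beta measure with beta = 0.5, which weights precision more heavily than recall. The following is a minimal sketch of that general formula only; the function name `f_beta` and the precision and recall values in the usage line are illustrative assumptions and are not taken from the paper, whose evaluation tooling is not described in the abstract.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R).

    With beta < 1 (here 0.5), precision counts more than recall.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero; define the score as 0.
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denominator


# Hypothetical values chosen only to illustrate the formula.
example_score = f_beta(precision=0.5, recall=0.25)
print(f"F0.5 = {example_score:.4f}")
```

Because beta is below 1, a system that makes fewer but more reliable corrections scores higher than one with the same balance reversed, which is consistent with the abstract's emphasis on high precision in the spelling correction step.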