Assistive Completion of Agrammatic Aphasic Sentences: A Transfer Learning Approach using Neurolinguistics-based Synthetic Dataset

Paper Title

Assistive Completion of Agrammatic Aphasic Sentences: A Transfer Learning Approach using Neurolinguistics-based Synthetic Dataset

Authors

Misra, Rohit; Mishra, Sapna S.; Gandhi, Tapan K.

Abstract

Damage to the inferior frontal gyrus (Broca's area) can cause agrammatic aphasia, in which patients, although able to comprehend language, lack the ability to form complete sentences. This inability creates communication gaps that make daily life difficult. Assistive devices can help mitigate these issues and enable patients to communicate effectively. However, owing to the lack of large-scale studies of linguistic deficits in aphasia, research on such assistive technology is relatively limited. In this work, we present two contributions that aim to re-initiate research and development in this field. First, we propose a model that uses linguistic features reported in small-scale studies of aphasia patients to generate large-scale datasets of synthetic aphasic utterances from grammatically correct datasets. We show that the mean length of utterance, the noun/verb ratio, and the simple/complex sentence ratio of our synthetic datasets correspond to the reported features of aphasic speech. Second, we demonstrate how the synthetic datasets can be used to develop assistive devices for aphasia patients: a pre-trained T5 transformer is fine-tuned on the generated dataset to suggest five corrected sentences for a given aphasic utterance. We evaluate the efficacy of the T5 model using BLEU and cosine semantic similarity scores, obtaining a BLEU score of 0.827/1.00 and a semantic similarity of 0.904/1.00. These results provide a strong foundation for the concept that a synthetic dataset based on small-scale studies of aphasia can be used to develop effective assistive technology.
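
The two evaluation metrics named in the abstract can be sketched as below. This is a minimal, dependency-free illustration under stated assumptions, not the authors' evaluation code. BLEU is assumed to be single-reference, unsmoothed BLEU-4: the geometric mean of clipped 1- to 4-gram precisions with uniform weights, multiplied by a brevity penalty. The abstract does not say which sentence representation underlies its cosine semantic similarity, so simple bag-of-words count vectors are swapped in as a stand-in. The function names `sentence_bleu`, `ngram_counts`, and `cosine_similarity` and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference, unsmoothed BLEU on lowercased whitespace tokens.

    Geometric mean of clipped n-gram precisions (n = 1..max_n, uniform
    weights) multiplied by a brevity penalty. Returns 0.0 if any n-gram
    order has no matches or the candidate is shorter than max_n tokens.
    """
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand or not ref:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count at its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if matches == 0:
            return 0.0
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: candidates not longer than the reference are scaled by exp(1 - r/c).
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)


def cosine_similarity(a, b):
    """Cosine of the angle between bag-of-words count vectors of a and b.

    Assumption: a stand-in for the embedding-based semantic similarity
    reported in the abstract, which is not specified there.
    """
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical target sentence and one hypothetical model suggestion.
    reference = "the boy is going to the park"
    suggestion = "the boy is going to the"
    print(f"BLEU: {sentence_bleu(suggestion, reference):.3f}")
    print(f"Cosine similarity: {cosine_similarity(suggestion, reference):.3f}")
```

Both functions return values in the range 0 to 1, matching the "/1.00" form of the reported scores. An embedding-based similarity would typically also credit paraphrases that share few exact words, which this bag-of-words stand-in cannot do; the count vectors are used here only to keep the sketch self-contained.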
