Paper Title

Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Authors

Łukasz Augustyniak, Piotr Szymanski, Mikołaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak

Abstract

Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models, turning punctuation restoration into a challenging task. These errors usually take the form of homonyms. We show how retrofitting word embeddings on domain-specific data can mitigate ASR errors. Our main contribution is a method for better alignment of homonym embeddings, together with its validation on the punctuation prediction task. Compared with the state-of-the-art model, we record an absolute improvement in punctuation prediction accuracy of between 6.2% (for question marks) and 9% (for periods).
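
The abstract does not spell out the retrofitting procedure, so the following is only a minimal sketch of the generic graph-based retrofitting update, which may differ from the authors' method. Each vector is pulled toward the vectors of words it is linked to while staying close to its original value. The function name `retrofit`, the weights `alpha` and `beta`, the toy vectors, and the confusable pair "their"/"there" are all illustrative assumptions, not taken from the paper.

```python
def retrofit(embeddings, neighbours, alpha=1.0, beta=1.0, iterations=10):
    """Generic graph-based retrofitting (an assumed stand-in, not the paper's method).

    embeddings: dict mapping word -> list of floats (original vectors)
    neighbours: dict mapping word -> list of linked words (e.g. confusable pairs)
    alpha: weight keeping a word near its original vector
    beta: weight pulling a word toward each linked neighbour
    """
    original = {w: list(v) for w, v in embeddings.items()}
    fitted = {w: list(v) for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, linked in neighbours.items():
            linked = [n for n in linked if n in fitted]
            if word not in fitted or not linked:
                continue  # words without usable links keep their vectors
            total = alpha + beta * len(linked)
            fitted[word] = [
                (alpha * original[word][d]
                 + beta * sum(fitted[n][d] for n in linked)) / total
                for d in range(len(original[word]))
            ]
    return fitted


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Toy, invented vectors: "their" and "there" stand in for an ASR-confusable pair.
toy = {"their": [1.0, 0.0], "there": [0.0, 1.0], "question": [-1.0, -1.0]}
links = {"their": ["there"], "there": ["their"]}
aligned = retrofit(toy, links)
```

After retrofitting, the two linked vectors lie closer together than they did originally, while the unlinked word is left unchanged. In a real setting the links would have to come from some source of confusable word pairs, which this sketch simply assumes is available.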