Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Paper Title

Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Authors

Augustyniak, Łukasz; Szymanski, Piotr; Morzy, Mikołaj; Zelasko, Piotr; Szymczak, Adrian; Mizgajski, Jan; Carmiel, Yishay; Dehak, Najim

Abstract

Automatic Speech Recognition (ASR) systems introduce word errors, which often confuse punctuation prediction models and make punctuation restoration a challenging task. These errors usually take the form of homonyms. We show how retrofitting word embeddings on domain-specific data can mitigate ASR errors. Our main contribution is a method for better alignment of homonym embeddings, together with a validation of that method on the punctuation prediction task. Compared with the state-of-the-art model, we record an absolute improvement in punctuation prediction accuracy ranging from 6.2% (for question marks) to 9% (for periods).
