Thai Wav2Vec2.0 with CommonVoice V8

Paper title

Thai Wav2Vec2.0 with CommonVoice V8

Authors

Wannaphong Phatthiyaphaibun, Chompakorn Chaksangchaichot, Peerat Limkonchotiwat, Ekapol Chuangsuwanich, Sarana Nutanong

Abstract

Recently, automatic speech recognition (ASR), which converts audio into text, has attracted considerable attention in the machine learning community, and many publicly available models have been released on HuggingFace. However, most of these ASR models target English; only a few are available for Thai. Additionally, most Thai ASR models are closed-source, and existing open-source models lack robustness. To address this problem, we train a new ASR model on top of a pre-trained XLSR-Wav2Vec model using the Thai CommonVoice corpus V8, and we train a trigram language model to boost the performance of our ASR model. We hope that our models will be beneficial to individuals and the ASR community in Thailand.
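
To illustrate how a trigram language model can boost an ASR model, the following is a minimal sketch that rescores candidate transcriptions by adding a weighted trigram log-probability to an acoustic score. It is a toy under stated assumptions, not the authors' implementation: the abstract does not say which language-model toolkit, smoothing method, or decoding scheme was used, so the add-one smoothing, the names `TrigramLM`, `rescore`, and `lm_weight`, the English toy corpus, and the acoustic scores are all hypothetical.

```python
import math
from collections import Counter


class TrigramLM:
    """Toy trigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.trigrams = Counter()  # counts of (w1, w2, w3)
        self.contexts = Counter()  # counts of the two-word context (w1, w2)
        self.vocab = set()
        for tokens in sentences:
            padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
            self.vocab.update(padded)
            for i in range(2, len(padded)):
                context = (padded[i - 2], padded[i - 1])
                self.contexts[context] += 1
                self.trigrams[context + (padded[i],)] += 1

    def log_prob(self, tokens):
        """Natural-log probability of a token sequence under the model."""
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        vocab_size = len(self.vocab)
        total = 0.0
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            numerator = self.trigrams[context + (padded[i],)] + 1
            denominator = self.contexts[context] + vocab_size
            total += math.log(numerator / denominator)
        return total


def rescore(candidates, lm, lm_weight=0.5):
    """Pick the (tokens, acoustic_log_prob) pair with the best combined score.

    lm_weight is a hypothetical tuning knob, not a value from the paper.
    """
    return max(candidates, key=lambda c: c[1] + lm_weight * lm.log_prob(c[0]))


# Toy corpus and made-up acoustic scores, for illustration only.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
lm = TrigramLM(corpus)
candidates = [
    (["the", "cat", "sat"], -4.0),  # slightly worse acoustic score
    (["the", "cat", "sad"], -3.8),  # acoustically preferred, but unlikely text
]
best_tokens, _ = rescore(candidates, lm)
print(" ".join(best_tokens))
```

In this toy setup the acoustic scores alone would favor the second candidate, while the trigram term shifts the combined score toward the word sequence seen in the training text. A real Thai system would also need word or subword tokenization before counting n-grams, which this sketch leaves out.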
