Paper title
Thai Wav2Vec2.0 with CommonVoice V8
Authors
Abstract
Automatic Speech Recognition (ASR), the task of converting audio into text, has recently attracted considerable attention in the machine learning community, and many publicly available models have been released on HuggingFace. However, most of these ASR models target English; only a few support Thai. Additionally, most Thai ASR models are closed-source, and the existing open-source models lack robustness. To address this problem, we train a new ASR model on top of a pre-trained XLSR-Wav2Vec model using the Thai CommonVoice corpus V8, and we train a trigram language model to boost its performance. We hope that our models will benefit individuals and the ASR community in Thailand.