Paper title
Testing pre-trained Transformer models for Lithuanian news clustering
Paper authors
Paper abstract
The recent introduction of the Transformer deep learning architecture has led to breakthroughs in various natural language processing tasks. However, non-English languages could not leverage these new opportunities with models pre-trained on English text. This changed with research focusing on multilingual models, from which less-spoken languages are the main beneficiaries. We compare pre-trained multilingual BERT, XLM-R, and older learned text representation methods as encodings for the task of Lithuanian news clustering. Our results indicate that publicly available pre-trained multilingual Transformer models can be fine-tuned to surpass word vectors, but they still score much lower than specially trained doc2vec embeddings.
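As context for the approach described in the abstract, below is a minimal sketch of using a pre-trained multilingual Transformer as a document encoder for clustering Lithuanian news texts. It is not the authors' pipeline: the checkpoint (`bert-base-multilingual-cased`), the mean-pooling strategy, and the use of k-means are assumptions made for illustration only, and the paper's fine-tuning step is omitted.

```python
# Sketch: encode Lithuanian news texts with multilingual BERT and cluster them.
# Assumptions: Hugging Face `transformers`, `torch`, and `scikit-learn` are installed;
# the checkpoint, pooling, and clustering choices are illustrative, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.cluster import KMeans

MODEL_NAME = "bert-base-multilingual-cased"  # multilingual BERT; XLM-R could be swapped in

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

# Placeholder documents; in practice these would be Lithuanian news articles.
texts = [
    "Vyriausybė pristatė naują biudžeto projektą.",
    "Krepšinio rinktinė laimėjo draugiškas rungtynes.",
]

with torch.no_grad():
    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=512, return_tensors="pt")
    out = model(**enc)
    # Mean-pool token embeddings, ignoring padding positions.
    mask = enc["attention_mask"].unsqueeze(-1).float()
    doc_embeddings = (out.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)

# Cluster the document embeddings (number of clusters is task-dependent).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_embeddings.numpy())
print(labels)
```

The same interface could be pointed at a doc2vec or averaged word-vector encoder, which is what makes the comparison in the abstract possible: only the document embedding step changes, while the clustering and evaluation stay fixed.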