Paper Title
Bitext Mining for Low-Resource Languages via Contrastive Learning
Paper Authors
Paper Abstract
Mining high-quality bitexts for low-resource languages is challenging. This paper shows that sentence representations from language models fine-tuned with multiple negatives ranking loss, a contrastive objective, help retrieve clean bitexts. Experiments show that parallel data mined with our approach substantially outperforms the previous state-of-the-art method on the low-resource languages Khmer and Pashto.
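
The abstract names multiple negatives ranking loss without spelling it out. As a rough illustration only, the sketch below computes one common formulation of that loss: each source sentence is scored against every target sentence in the batch, and the loss is the cross-entropy of picking its true translation, so the other targets act as in-batch negatives. The cosine similarity, the scale factor of 20, the function names, and the toy 2-D embeddings are all assumptions made for this example; the abstract does not state the paper's actual configuration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(src_embs, tgt_embs, scale=20.0):
    """Mean cross-entropy over in-batch candidates.

    src_embs[i] and tgt_embs[i] are embeddings of a translation pair;
    every tgt_embs[j] with j != i serves as a negative for src_embs[i].
    The scale value is an assumed hyperparameter, not taken from the paper.
    """
    losses = []
    for i, src in enumerate(src_embs):
        logits = [scale * cosine(src, tgt) for tgt in tgt_embs]
        # Numerically stable log-sum-exp over all candidates in the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(z - max_logit) for z in logits)
        )
        # Negative log-probability of the true translation (index i).
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D embeddings, made up for illustration.
aligned_src = [[1.0, 0.0], [0.0, 1.0]]
aligned_tgt = [[0.9, 0.1], [0.1, 0.9]]
shuffled_tgt = [aligned_tgt[1], aligned_tgt[0]]

loss_aligned = multiple_negatives_ranking_loss(aligned_src, aligned_tgt)
loss_shuffled = multiple_negatives_ranking_loss(aligned_src, shuffled_tgt)
print(loss_aligned, loss_shuffled)
```

With correctly paired embeddings the loss is close to zero, and it grows when the targets are shuffled. Minimizing it during fine-tuning therefore pulls true translation pairs together and pushes mismatched pairs apart, which is the property bitext retrieval relies on.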