Paper title
An Augmented Translation Technique for low Resource language pair: Sanskrit to Hindi translation
Paper authors
Paper abstract
Neural Machine Translation (NMT) is a recent approach to Machine Translation (MT) that uses large artificial neural networks. It has produced promising results and shown considerable potential on challenging machine translation tasks. One such task is providing good MT for language pairs with little training data. In this work, Zero Shot Translation (ZST) is examined for a low-resource language pair. Working first with a high-resource language pair for which benchmarks are available, namely Spanish to Portuguese, and training on the Spanish-English and English-Portuguese data sets, we build a proof of concept for a ZST system that gives appropriate results on the available data. The same architecture is then tested on Sanskrit to Hindi translation, for which data is sparse, by training the model on the English-Hindi and Sanskrit-English language pairs. To train and decode with the ZST system, we extend the training and decoding pipelines of the NMT seq2seq model in TensorFlow to incorporate ZST features. Dimensionality reduction of the word embeddings is performed to reduce the memory used for data storage and to achieve faster training and translation cycles. Existing technology is thus applied in a novel way to our NLP problem of Sanskrit to Hindi translation. A Sanskrit-Hindi parallel corpus of size 300 is constructed for testing. The data for this parallel corpus was taken from telecast news published on the website of the Department of Public Information, state government of Madhya Pradesh, India.
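The abstract does not say how the two pivot corpora are combined into one ZST model. One common way to prepare such data, shown here only as an illustrative sketch and not as the authors' confirmed method, is to prepend a target-language token to each source sentence so that a single seq2seq model learns which language to produce. The token format `<2xx>`, the function names `tag_source` and `build_training_pairs`, and the placeholder sentences are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# (source_language, target_language) -> list of (source, target) sentence pairs
Corpora = Dict[Tuple[str, str], List[Tuple[str, str]]]


def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend a hypothetical target-language token, e.g. '<2hi>' for Hindi."""
    return f"<2{target_lang}> {sentence.strip()}"


def build_training_pairs(corpora: Corpora) -> List[Tuple[str, str]]:
    """Merge several parallel corpora into one tagged training set.

    Each source sentence is tagged with the language it should be
    translated into; target sentences are left unchanged.
    """
    merged: List[Tuple[str, str]] = []
    for (_src_lang, tgt_lang), pairs in corpora.items():
        for src, tgt in pairs:
            merged.append((tag_source(src, tgt_lang), tgt))
    return merged


# Placeholder text stands in for real sentences (assumption: language codes
# 'sa' = Sanskrit, 'en' = English, 'hi' = Hindi).
training = build_training_pairs({
    ("sa", "en"): [("sanskrit sentence 1", "english sentence 1")],
    ("en", "hi"): [("english sentence 2", "hindi sentence 2")],
})

# At test time the unseen direction is requested by tagging a Sanskrit
# source with the Hindi token, although no sa->hi pair was in training.
zero_shot_input = tag_source("sanskrit sentence 3", "hi")
```

Under this assumed scheme, the Sanskrit-to-Hindi direction never appears in the training set; it is requested only at decoding time through the tag on the input.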