One-Shot Learning for Language Modelling

Paper Title

One-Shot Learning for Language Modelling

Authors

Talip Ucar, Adrian Gonzalez-Martin, Matthew Lee, Adrian Daniel Szwarc

Abstract

Humans can infer a great deal about the meaning of a word from the syntax and semantics of the surrounding words, even the first time they read or hear it. We can also generalise the learned concept of the word to new tasks. Despite great progress in achieving human-level performance in certain tasks (Silver et al., 2016), learning from one or a few examples remains a key challenge in machine learning and has not been thoroughly explored in Natural Language Processing (NLP). In this work we tackle the problem of one-shot learning for an NLP task by employing ideas from recent developments in machine learning: embeddings, attention mechanisms (softmax) and similarity measures (cosine, Euclidean, Poincaré, and Minkowski). We adapt the framework suggested in matching networks (Vinyals et al., 2016) and explore the effectiveness of the aforementioned methods in one-, two- and three-shot learning problems on the missing-word prediction task explored in Vinyals et al. (2016), using the WikiText-2 dataset. Our work contributes in two ways. First, we explore the effectiveness of different distance metrics on k-shot learning and show that there is no single best distance metric for k-shot learning, which challenges common belief: we found that the performance of a distance metric depends on the number of shots used during training. Second, we establish a benchmark for one-, two-, and three-shot learning on a language task with a publicly available dataset, which future research can benchmark against.
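As a rough illustration of the ingredients the abstract names (embeddings, softmax attention, and interchangeable similarity measures), the sketch below classifies a query vector by attending over a small support set, in the general style of matching networks. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy 2-D vectors, and the labels are all hypothetical, learned embedding functions are replaced by fixed vectors, and distances are turned into scores simply by negating them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (higher means more similar)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def minkowski_distance(u, v, p=2.0):
    """Minkowski distance of order p; p=2 gives the Euclidean distance."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def poincare_distance(u, v):
    """Distance in the Poincare ball; both vectors must have norm < 1."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical registry: each entry maps (query, support item) to a score,
# where a larger score means "more similar". Distances are negated.
SCORE_FNS = {
    "cosine": cosine_similarity,
    "euclidean": lambda u, v: -minkowski_distance(u, v, p=2.0),
    "minkowski_p3": lambda u, v: -minkowski_distance(u, v, p=3.0),
    "poincare": lambda u, v: -poincare_distance(u, v),
}


def predict(query, support, labels, score_fn):
    """Attention-weighted vote over the support set.

    Returns a dict mapping each label to the total attention mass
    assigned to support items carrying that label.
    """
    attention = softmax([score_fn(query, s) for s in support])
    probs = {}
    for weight, label in zip(attention, labels):
        probs[label] = probs.get(label, 0.0) + weight
    return probs


# Toy one-shot episode: one (made-up) embedding per candidate word.
support = [[0.5, 0.1], [-0.1, 0.6], [-0.4, -0.3]]
labels = ["cat", "dog", "bird"]
query = [0.45, 0.15]

for name, fn in SCORE_FNS.items():
    probs = predict(query, support, labels, fn)
    print(name, max(probs, key=probs.get))
```

Swapping the entry taken from `SCORE_FNS` is the only change needed to compare metrics on the same episode, which mirrors the kind of comparison the abstract describes. In a k-shot setting with k > 1, the same label would simply appear k times in `labels`.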
