Paper Title

MetricBERT: Text Representation Learning via Self-Supervised Triplet Training

Paper Authors

Itzik Malkiel, Dvir Ginzburg, Oren Barkan, Avi Caciularu, Yoni Weill, Noam Koenigstein

Paper Abstract

We present MetricBERT, a BERT-based model that learns to embed text under a well-defined similarity metric while simultaneously adhering to the ``traditional'' masked-language task. We focus on downstream tasks of learning similarities for recommendations, where we show that MetricBERT outperforms state-of-the-art alternatives, sometimes by a substantial margin. We conduct extensive evaluations of our method and its different variants, showing that our training objective is highly beneficial over a traditional contrastive loss, a standard cosine similarity objective, and six other baselines. As an additional contribution, we publish a dataset of video game descriptions along with a test set of similarity annotations crafted by a domain expert.
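The abstract describes a training objective that couples a similarity metric learned via triplet training with the standard masked-language-modeling (MLM) task. The following is a minimal, hypothetical sketch of how such a combined objective could be wired up with PyTorch and Hugging Face Transformers; the mean pooling, margin value, loss weighting, and the helper names `embed` and `combined_loss` are illustrative assumptions, not the paper's actual implementation.

```python
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Sketch only: a standard triplet margin loss over pooled BERT embeddings,
# added to the usual MLM loss. MetricBERT's exact objective may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

def embed(texts):
    # Mean-pool the encoder's last hidden states over non-padding tokens.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model.bert(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()    # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)             # (B, H)

def combined_loss(anchor, positive, negative,
                  mlm_input_ids, mlm_attention_mask, mlm_labels,
                  margin=0.5, mlm_weight=1.0):
    # Similarity branch: pull anchor/positive together, push negative away.
    a, p, n = embed(anchor), embed(positive), embed(negative)
    triplet = F.triplet_margin_loss(a, p, n, margin=margin)
    # MLM branch: the standard masked-language objective on masked inputs.
    mlm = model(input_ids=mlm_input_ids,
                attention_mask=mlm_attention_mask,
                labels=mlm_labels).loss
    return triplet + mlm_weight * mlm
```

Sharing one encoder between the similarity branch and the MLM head is what lets the metric objective and the masked-language objective inform each other, which is the intuition the abstract points to.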
