Title

Document Similarity from Vector Space Densities

Author

Rushkin, Ilia

Abstract

We propose a computationally light method for estimating similarities between text documents, which we call the density similarity (DS) method. The method is based on a word embedding in a high-dimensional Euclidean space and on kernel regression, and takes into account semantic relations among words. We find that the accuracy of this method is virtually the same as that of a state-of-the-art method, while the gain in speed is very substantial. Additionally, we introduce generalized versions of the top-k accuracy metric and of the Jaccard metric of agreement between similarity models.
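The abstract does not give the DS formula, so the following is only a hypothetical Python/NumPy sketch of the general idea. It is not the author's method. It assumes each document arrives as an array of word-embedding vectors. It smooths those vectors with a Gaussian kernel, using kernel density estimation as a swapped-in stand-in for the paper's kernel regression. It then scores two documents by the cosine between their smoothed densities. For Gaussian kernels of width h, the integral of the product of two kernels centred at a and b is proportional to exp(-|a - b|^2 / (4h^2)), so the cosine reduces to averages of pairwise kernel values. The second part computes the standard, non-generalized top-k Jaccard agreement between two similarity models, averaged over documents. All function names and the bandwidth parameter are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch: NOT the DS method from the paper, whose exact formula
# the abstract does not give. Documents are compared through Gaussian kernel
# density estimates built on their word vectors (an assumed stand-in for the
# paper's kernel regression).


def gaussian_gram(X, Y, bandwidth):
    """Matrix of exp(-|x - y|^2 / (4 h^2)) over all row pairs of X and Y.

    Up to a constant factor, this is the integral over space of the product
    of two Gaussian kernels of width h centred at x and y.
    """
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # clip small negatives caused by rounding
    return np.exp(-sq_dists / (4.0 * bandwidth**2))


def density_similarity(A, B, bandwidth=1.0):
    """Cosine between the kernel density estimates of two documents.

    A, B: arrays of shape (n_words, dim) holding each document's word vectors
    (assumed input format). The kernel normalisation constant cancels.
    """
    ab = gaussian_gram(A, B, bandwidth).mean()
    aa = gaussian_gram(A, A, bandwidth).mean()
    bb = gaussian_gram(B, B, bandwidth).mean()
    return ab / np.sqrt(aa * bb)


def top_k_neighbors(sim_matrix, k):
    """Indices of the k most similar other documents, row by row."""
    S = np.array(sim_matrix, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(S, -np.inf)  # a document is never its own neighbor
    return np.argsort(-S, axis=1)[:, :k]


def jaccard_agreement(sim1, sim2, k):
    """Mean Jaccard index of the top-k neighbor sets of two similarity models.

    This is the standard top-k version, not the generalized metric the
    paper introduces.
    """
    scores = []
    for a, b in zip(top_k_neighbors(sim1, k), top_k_neighbors(sim2, k)):
        sa, sb = set(a.tolist()), set(b.tolist())
        scores.append(len(sa & sb) / len(sa | sb))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(12, 50))  # random stand-ins for word embeddings
    A_noisy = A + 0.01 * rng.normal(size=A.shape)  # near-duplicate of A
    B = rng.normal(size=(8, 50))  # unrelated document
    print(density_similarity(A, A_noisy, bandwidth=2.0))  # near 1
    print(density_similarity(A, B, bandwidth=2.0))  # much smaller
```

Because each score needs only pairwise kernel values between word vectors, this kind of comparison avoids solving an optimization problem per document pair. That is one plausible source of a speed advantage, though the abstract does not say which method it was compared against.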
