Paper Title


Low-Rank Robust Online Distance/Similarity Learning based on the Rescaled Hinge Loss

Authors

Davood Zabihzadeh, Amar Tuama, Ali Karami-Mollaee

Abstract


An important challenge in metric learning is scalability to both the size and the dimensionality of the input data. Online metric learning algorithms have been proposed to address this challenge. Existing methods are commonly based on the Passive-Aggressive (PA) approach and can therefore rapidly process large volumes of data with an adaptive learning rate. However, these algorithms rely on the hinge loss and are thus not robust against outliers and label noise. Moreover, existing online methods usually assume that training triplets or pairwise constraints exist in advance, whereas many real-world datasets come only in the form of input data and their associated labels. We address these challenges by formulating the online distance-similarity learning problem with the robust rescaled hinge loss function. The proposed model is quite general and can be applied to any PA-based online distance-similarity algorithm. We also develop an efficient, robust one-pass triplet construction algorithm. Finally, to provide scalability in high-dimensional DML settings, we present a low-rank version of the proposed methods that not only reduces the computational cost significantly but also preserves the predictive performance of the learned metrics; it further provides a straightforward extension of our methods to deep distance-similarity learning. We conduct several experiments on datasets from various applications. The results confirm that the proposed methods outperform state-of-the-art online DML methods by a large margin in the presence of label noise and outliers.
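To make the core idea concrete, the sketch below illustrates (under our own assumptions, not the paper's exact formulation) how a rescaled hinge loss can temper a PA-style triplet update of a Mahalanobis metric: the loss saturates for large violations, so outlier triplets produce a bounded, damped step instead of an arbitrarily aggressive one. All function names, the normalization constant `beta`, and the step-size rule are illustrative choices.

```python
import numpy as np

def hinge(z):
    """Standard hinge loss max(0, z)."""
    return max(0.0, z)

def rescaled_hinge(z, eta=0.5, beta=None):
    """Rescaled hinge: beta * (1 - exp(-eta * hinge(z))).
    Bounded above by beta, so outliers cannot dominate the objective.
    beta here normalizes the loss to 1 at z = 1 (an illustrative choice)."""
    if beta is None:
        beta = 1.0 / (1.0 - np.exp(-eta))
    return beta * (1.0 - np.exp(-eta * hinge(z)))

def mahalanobis_sq(M, a, b):
    """Squared Mahalanobis distance (a-b)^T M (a-b)."""
    d = a - b
    return float(d @ M @ d)

def pa_triplet_update(M, x, x_pos, x_neg, margin=1.0, C=1.0, eta=0.5):
    """One PA-style update on a triplet (x, x_pos, x_neg).
    The rescaled-hinge derivative w = eta * exp(-eta * z) down-weights
    large-violation (likely noisy) triplets -- a sketch, not the
    paper's closed-form rule."""
    z = margin + mahalanobis_sq(M, x, x_pos) - mahalanobis_sq(M, x, x_neg)
    if z <= 0:  # constraint satisfied: passive step, metric unchanged
        return M
    dp, dn = x - x_pos, x - x_neg
    # Gradient of the violation z with respect to M
    G = np.outer(dp, dp) - np.outer(dn, dn)
    # Damping factor from the rescaled hinge; small when z is large
    w = eta * np.exp(-eta * z)
    tau = min(C, w * z / (np.linalg.norm(G) ** 2 + 1e-12))
    M = M - tau * G
    # Project back onto the PSD cone so M remains a valid metric
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T
```

A low-rank variant would instead maintain a factor `L` with `M = L @ L.T` and update `L` directly, which is what makes the deep-learning extension natural: `L` becomes the last layer of an embedding network.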
