Paper Title

Sampling Through the Lens of Sequential Decision Making

Paper Authors

Jason Xiaotian Dou, Alvin Qingkai Pan, Runxue Bao, Haiyi Harry Mao, Lei Luo, Zhi-Hong Mao

Paper Abstract

Sampling is ubiquitous in machine learning methodologies. Due to the growth of large datasets and model complexity, we want to learn and adapt the sampling process while training a representation. Towards achieving this grand goal, a variety of sampling techniques have been proposed. However, most of them either use a fixed sampling scheme or adjust the sampling scheme based on simple heuristics. They cannot choose the best samples for model training at different stages. Inspired by "Thinking, Fast and Slow" (System 1 and System 2) in cognitive science, we propose a reward-guided sampling strategy called Adaptive Sample with Reward (ASR) to tackle this challenge. To the best of our knowledge, this is the first work utilizing reinforcement learning (RL) to address the sampling problem in representation learning. Our approach adjusts the sampling process adaptively to achieve optimal performance. We explore geographical relationships among samples by distance-based sampling to maximize overall cumulative reward. We apply ASR to the long-standing sampling problems in similarity-based loss functions. Empirical results in information retrieval and clustering demonstrate ASR's superb performance across different datasets. We also discuss an engrossing phenomenon observed in experiments, which we name the "ASR gravity well."
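To make the sampling-as-RL idea concrete, below is a minimal, hypothetical sketch in which the choice of a distance bin for negative sampling is framed as a multi-armed bandit updated by a policy-gradient rule from a downstream reward signal. This is not the authors' ASR implementation; the class name `RewardGuidedSampler`, the bin structure, the learning rate, and the toy reward values are all illustrative assumptions.

```python
# Hypothetical sketch of reward-guided adaptive sampling (not the ASR paper's code):
# each "arm" is a distance bin for picking negatives in a similarity-based loss,
# and a softmax policy over bins is updated by REINFORCE from observed reward.
import numpy as np


class RewardGuidedSampler:
    """Maintains a softmax policy over distance bins and adapts it from reward."""

    def __init__(self, num_bins: int, lr: float = 0.1):
        self.prefs = np.zeros(num_bins)  # preference score per distance bin
        self.lr = lr
        self.baseline = 0.0              # running-average reward baseline
        self.steps = 0

    def probs(self) -> np.ndarray:
        # Numerically stable softmax over bin preferences.
        z = np.exp(self.prefs - self.prefs.max())
        return z / z.sum()

    def sample_bin(self, rng: np.random.Generator) -> int:
        # Draw a distance bin according to the current sampling policy.
        return int(rng.choice(len(self.prefs), p=self.probs()))

    def update(self, bin_idx: int, reward: float) -> None:
        # REINFORCE-style update: grad of log-softmax is (one-hot - probs).
        self.steps += 1
        self.baseline += (reward - self.baseline) / self.steps
        advantage = reward - self.baseline
        grad = -self.probs()
        grad[bin_idx] += 1.0
        self.prefs += self.lr * advantage * grad


# Toy usage: pretend bin 2 (a "semi-hard" distance range) yields the best reward.
rng = np.random.default_rng(0)
sampler = RewardGuidedSampler(num_bins=4)
for _ in range(500):
    b = sampler.sample_bin(rng)
    reward = rng.normal(loc=[0.1, 0.3, 0.8, 0.2][b], scale=0.1)  # fake reward
    sampler.update(b, reward)
print("learned sampling distribution:", sampler.probs().round(3))
```

The running-average baseline reduces the variance of the policy-gradient update, so bins that consistently yield above-average reward gradually dominate the sampling distribution; in the paper's setting the reward would instead come from a downstream metric such as retrieval or clustering quality.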
