Paper Title
Reinforced Data Sampling for Model Diversification
Paper Authors
Paper Abstract
With the rising number of machine learning competitions, the world has witnessed an exciting race for the best algorithms. However, the involved data selection process may fundamentally suffer from evidence ambiguity and concept drift issues, thereby possibly leading to deleterious effects on the performance of various models. This paper proposes a new Reinforced Data Sampling (RDS) method to learn how to sample data adequately in the search for useful models and insights. We formulate the optimisation problem of model diversification $\delta\text{-div}$ in data sampling to maximise learning potentials and optimum allocation by injecting model diversity. This work advocates the employment of diverse base learners, such as neural networks, decision trees, or logistic regression, as value functions to reinforce the selection process of data subsets with multi-modal belief. We introduce different ensemble reward mechanisms, including soft voting and stochastic choice, to approximate the optimal sampling policy. The evaluation conducted on four datasets clearly highlights the benefits of using the RDS method over traditional sampling approaches. Our experimental results suggest that trainable sampling for model diversification is useful for competition organisers, researchers, or even starters seeking the full potential of various machine learning tasks such as classification and regression. The source code is available at https://github.com/probeu/RDS.
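To make the ensemble-reward idea in the abstract concrete, below is a minimal illustrative sketch, not the authors' RDS implementation: diverse base learners (a logistic regression, a decision tree, and a small neural network) are trained on a candidate sampled subset and scored on the held-out remainder, and their averaged score serves as a soft-voting style reward that a sampling policy could optimise. All function names, the metric choice, and the fixed random split action here are assumptions for illustration only.

```python
# Illustrative sketch of a soft-voting ensemble reward for data sampling.
# NOTE: hypothetical helper, not the RDS repository's API.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score


def soft_voting_reward(X, y, split_mask, base_learners):
    """Average AUC of diverse base learners trained on the sampled subset
    (split_mask == True) and evaluated on the held-out remainder."""
    X_tr, y_tr = X[split_mask], y[split_mask]
    X_te, y_te = X[~split_mask], y[~split_mask]
    scores = []
    for learner in base_learners:
        learner.fit(X_tr, y_tr)
        proba = learner.predict_proba(X_te)[:, 1]
        scores.append(roc_auc_score(y_te, proba))
    # Soft-voting style aggregation of the models' beliefs into one reward.
    return float(np.mean(scores))


if __name__ == "__main__":
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    rng = np.random.default_rng(0)
    # A random 80/20 sampling action; a learned policy would choose this
    # allocation instead of drawing it uniformly at random.
    split_mask = rng.random(len(y)) < 0.8
    learners = [
        LogisticRegression(max_iter=1000),
        DecisionTreeClassifier(max_depth=5, random_state=0),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    ]
    print("ensemble reward:", soft_voting_reward(X, y, split_mask, learners))
```

The aggregation step is where model diversity enters: averaging the scores of heterogeneous learners rewards splits on which no single model family is favoured, which is the intuition behind the multi-modal belief described in the abstract.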