Paper Title

COMBSS: Best Subset Selection via Continuous Optimization

Paper Authors

Sarat Moka, Benoit Liquet, Houying Zhu, Samuel Muller

Abstract

The problem of best subset selection in linear regression is considered with the aim to find a fixed size subset of features that best fits the response. This is particularly challenging when the total available number of features is very large compared to the number of data samples. Existing optimal methods for solving this problem tend to be slow while fast methods tend to have low accuracy. Ideally, new methods perform best subset selection faster than existing optimal methods but with comparable accuracy, or, being more accurate than methods of comparable computational speed. Here, we propose a novel continuous optimization method that identifies a subset solution path, a small set of models of varying size, that consists of candidates for the single best subset of features, that is optimal in a specific sense in linear regression. Our method turns out to be fast, making the best subset selection possible when the number of features is well in excess of thousands. Because of the outstanding overall performance, framing the best subset selection challenge as a continuous optimization problem opens new research directions for feature extraction for a large variety of regression models.
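To make the problem statement concrete: best subset selection seeks, among all size-k subsets of the p features, the one whose least-squares fit minimizes the residual sum of squares. The following sketch implements the naive exhaustive-search baseline, whose combinatorial cost is exactly what motivates continuous-optimization approaches like COMBSS; it is an illustration of the problem definition, not the paper's method, and the function name `best_subset` and the synthetic data are assumptions for demonstration.

```python
import itertools
import numpy as np

def best_subset(X, y, k):
    """Exhaustively search all size-k feature subsets and return the
    index tuple with the smallest residual sum of squares (RSS) under
    ordinary least squares. Cost grows as C(p, k) -- infeasible for
    large p, which is the motivation for continuous relaxations."""
    n, p = X.shape
    best_rss, best_idx = np.inf, None
    for idx in itertools.combinations(range(p), k):
        Xs = X[:, idx]
        beta, _, _, _ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ beta) ** 2))
        if rss < best_rss:
            best_rss, best_idx = rss, idx
    return best_idx, best_rss

# Synthetic example: only features 1 and 3 drive the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(100)
idx, rss = best_subset(X, y, k=2)
print(idx)  # the true support (1, 3) should attain the minimal RSS
```

With p = 6 the search visits only 15 subsets, but at p in the thousands (the regime the abstract targets) the number of candidates is astronomically large, so exact enumeration is hopeless and fast approximate or relaxed formulations become necessary.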
