Paper Title
Robust Boosting for Regression Problems
Paper Authors
Paper Abstract
Gradient boosting algorithms construct a regression predictor as a linear combination of "base learners". Boosting also offers a way to obtain robust non-parametric regression estimators that scale to applications with many explanatory variables. The robust boosting algorithm is based on a two-stage approach, similar to that used in robust linear regression: it first minimizes a robust residual scale estimator, and then improves the fit by optimizing a bounded loss function. Unlike previous robust boosting proposals, this approach does not require computing an ad hoc residual scale estimator at each boosting iteration. Since the loss functions involved in this robust boosting algorithm are typically non-convex, a reliable initialization step is required, such as an L1 regression tree, which is also fast to compute. A robust variable importance measure can also be calculated via a permutation procedure. Thorough simulation studies and several data analyses show that, when no atypical observations are present, the robust boosting approach works as well as standard gradient boosting with a squared loss. Furthermore, when the data contain outliers, the robust boosting estimator outperforms the alternatives in terms of prediction error and variable selection accuracy.
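The two-stage idea sketched in the abstract can be illustrated with a minimal, self-contained example. Everything below is an assumption-laden simplification, not the paper's actual estimator: depth-1 stumps stand in for regression trees, the robust scale is a fixed normalized MAD of the initial residuals rather than the M-scale minimized in the paper's first stage, and all function names (`fit_stump`, `tukey_psi`, `robust_boost`) are illustrative. It does, however, show the structure the abstract describes: an L1 tree initialization, a robust residual scale, and boosting on pseudo-residuals from a bounded (Tukey bisquare) loss.

```python
import numpy as np

def fit_stump(X, y, l1=False):
    """Depth-1 regression tree (stump): pick the single split that minimizes
    absolute error with median leaves (l1=True) or squared error with mean
    leaves (l1=False). Candidate thresholds are feature quantiles."""
    agg = np.median if l1 else np.mean
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            lv, rv = agg(y[left]), agg(y[~left])
            resid = y - np.where(left, lv, rv)
            err = np.abs(resid).sum() if l1 else (resid ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= t, lv, rv)

def tukey_psi(r, c=4.685):
    """Derivative of Tukey's bisquare loss: bounded, and exactly zero for
    |r| > c, so gross outliers contribute nothing to the gradient."""
    return np.where(np.abs(r) <= c, r * (1 - (r / c) ** 2) ** 2, 0.0)

def robust_boost(X, y, n_iter=100, lr=0.1):
    # Initialization: an L1 stump, cheap to fit and resistant to outliers.
    learners = [fit_stump(X, y, l1=True)]
    F = learners[0](X)
    # Robust residual scale (normalized MAD), held fixed across iterations
    # (the paper minimizes a robust scale here; MAD is a stand-in).
    r0 = y - F
    s = max(1.4826 * np.median(np.abs(r0 - np.median(r0))), 1e-8)
    for _ in range(n_iter):
        # Pseudo-residuals from the bounded loss on scaled residuals.
        g = s * tukey_psi((y - F) / s)
        stump = fit_stump(X, g)
        F = F + lr * stump(X)
        learners.append(stump)
    def predict(Z):
        out = learners[0](Z)
        for h in learners[1:]:
            out = out + lr * h(Z)
        return out
    return predict
```

On data contaminated with gross outliers, the bounded psi-function zeroes the outliers' pseudo-residuals, so the ensemble fits only the structure of the clean observations; with a squared loss, the same outliers would dominate every gradient step.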