Paper Title
Robust Boosting for Regression Problems
Paper Authors
Paper Abstract
Gradient boosting algorithms construct a regression predictor as a linear combination of "base learners". Boosting also offers a way to obtain robust non-parametric regression estimators that scale to applications with many explanatory variables. The robust boosting algorithm is based on a two-stage approach, similar to that used in robust linear regression: it first minimizes a robust residual scale estimator, and then improves the fit by optimizing a bounded loss function. Unlike previous robust boosting proposals, this approach does not require computing an ad hoc residual scale estimator at each boosting iteration. Since the loss functions involved in this robust boosting algorithm are typically non-convex, a reliable initialization step is required, such as an L1 regression tree, which is also fast to compute. A robust variable importance measure can also be calculated via a permutation procedure. Thorough simulation studies and several data analyses show that, when no atypical observations are present, the robust boosting approach works as well as standard gradient boosting with a squared loss. Furthermore, when the data contain outliers, the robust boosting estimator outperforms the alternatives in terms of prediction error and variable selection accuracy.
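The two-stage idea sketched in the abstract can be illustrated with a minimal, self-contained example. Everything below is an assumption-laden simplification, not the paper's actual estimator: depth-1 stumps stand in for regression trees, the robust scale is a fixed normalized MAD of the initial residuals rather than the M-scale minimized in the paper's first stage, and all function names (`fit_stump`, `tukey_psi`, `robust_boost`) are illustrative. It does, however, show the structure the abstract describes: an L1 tree initialization, a robust residual scale, and boosting on pseudo-residuals from a bounded (Tukey bisquare) loss.

```python
import numpy as np

def fit_stump(X, y, l1=False):
    """Depth-1 regression tree (stump): pick the single split that minimizes
    absolute error with median leaves (l1=True) or squared error with mean
    leaves (l1=False). Candidate thresholds are feature quantiles."""
    agg = np.median if l1 else np.mean
    best = None
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue
            lv, rv = agg(y[left]), agg(y[~left])
            resid = y - np.where(left, lv, rv)
            err = np.abs(resid).sum() if l1 else (resid ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    _, j, t, lv, rv = best
    return lambda Z: np.where(Z[:, j] <= t, lv, rv)

def tukey_psi(r, c=4.685):
    """Derivative of Tukey's bisquare loss: bounded, and exactly zero for
    |r| > c, so gross outliers contribute nothing to the gradient."""
    return np.where(np.abs(r) <= c, r * (1 - (r / c) ** 2) ** 2, 0.0)

def robust_boost(X, y, n_iter=100, lr=0.1):
    # Initialization: an L1 stump, cheap to fit and resistant to outliers.
    learners = [fit_stump(X, y, l1=True)]
    F = learners[0](X)
    # Robust residual scale (normalized MAD), held fixed across iterations
    # (the paper minimizes a robust scale here; MAD is a stand-in).
    r0 = y - F
    s = max(1.4826 * np.median(np.abs(r0 - np.median(r0))), 1e-8)
    for _ in range(n_iter):
        # Pseudo-residuals from the bounded loss on scaled residuals.
        g = s * tukey_psi((y - F) / s)
        stump = fit_stump(X, g)
        F = F + lr * stump(X)
        learners.append(stump)
    def predict(Z):
        out = learners[0](Z)
        for h in learners[1:]:
            out = out + lr * h(Z)
        return out
    return predict
```

On data contaminated with gross outliers, the bounded psi-function zeroes the outliers' pseudo-residuals, so the ensemble fits only the structure of the clean observations; with a squared loss, the same outliers would dominate every gradient step.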