Title

Robust Boosting for Regression Problems

Authors

Ju, Xiaomeng, Salibián-Barrera, Matías

Abstract

Gradient boosting algorithms construct a regression predictor using a linear combination of "base learners". Boosting also offers an approach to obtaining robust non-parametric regression estimators that are scalable to applications with many explanatory variables. The robust boosting algorithm is based on a two-stage approach, similar to what is done for robust linear regression: it first minimizes a robust residual scale estimator, and then improves it by optimizing a bounded loss function. Unlike previous robust boosting proposals, this approach does not require computing an ad-hoc residual scale estimator in each boosting iteration. Since the loss functions involved in this robust boosting algorithm are typically non-convex, a reliable initialization step is required, such as an L1 regression tree, which is also fast to compute. A robust variable importance measure can also be calculated via a permutation procedure. Thorough simulation studies and several data analyses show that, when no atypical observations are present, the robust boosting approach works as well as standard gradient boosting with a squared loss. Furthermore, when the data contain outliers, the robust boosting estimator outperforms the alternatives in terms of prediction error and variable selection accuracy.
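To make the two ideas in the abstract concrete, here is a minimal illustrative sketch, not the authors' implementation: gradient boosting on a bounded (Tukey bisquare) loss with a median (L1-type) initialization and a MAD residual scale, plus a permutation-based variable importance computed from a robust (median absolute) prediction error. All class and function names, and the tuning constants, are assumptions for illustration only.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def tukey_gradient(r, c):
    # Derivative of Tukey's bisquare loss: bounded, and exactly zero for
    # |r| > c, so large outlying residuals stop influencing the fit.
    u = r / c
    g = r * (1.0 - u ** 2) ** 2
    g[np.abs(u) > 1.0] = 0.0
    return g

class RobustBoost:
    """Toy robust boosting: trees fit bounded pseudo-residuals."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=2):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        self.init_ = np.median(y)  # robust L1-type initialization
        f = np.full(len(y), self.init_)
        self.trees_ = []
        for _ in range(self.n_estimators):
            r = y - f
            # MAD-based residual scale (0.6745 makes it consistent at the normal)
            s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12
            g = tukey_gradient(r, c=4.685 * s)  # bounded pseudo-residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, g)
            f = f + self.learning_rate * tree.predict(X)
            self.trees_.append(tree)
        return self

    def predict(self, X):
        f = np.full(X.shape[0], self.init_)
        for tree in self.trees_:
            f = f + self.learning_rate * tree.predict(X)
        return f

def permutation_importance(model, X, y, n_repeats=5, rng=None):
    # Robust permutation importance: increase in the median absolute
    # prediction error when one column is shuffled.
    rng = np.random.default_rng(rng)
    base = np.median(np.abs(y - model.predict(X)))
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])
            errs.append(np.median(np.abs(y - model.predict(Xp))))
        imp[j] = np.mean(errs) - base
    return imp
```

Because both the gradient and the importance measure are based on robust quantities (bounded loss, MAD scale, median error), a handful of grossly contaminated responses neither drags the fit toward the outliers nor inflates the apparent importance of noise variables.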
