Paper title

Survival regression with accelerated failure time model in XGBoost

Authors

Avinash Barnwal, Hyunsu Cho, Toby Dylan Hocking

Abstract

Survival regression is used to estimate the relation between time-to-event and feature variables, and is important in application domains such as medicine, marketing, risk management and sales management. Nonlinear tree based machine learning algorithms as implemented in libraries such as XGBoost, scikit-learn, LightGBM, and CatBoost are often more accurate in practice than linear models. However, existing state-of-the-art implementations of tree-based models have offered limited support for survival regression. In this work, we implement loss functions for learning accelerated failure time (AFT) models in XGBoost, to increase the support for survival modeling for different kinds of label censoring. We demonstrate with real and simulated experiments the effectiveness of AFT in XGBoost with respect to a number of baselines, in two respects: generalization performance and training speed. Furthermore, we take advantage of the support for NVIDIA GPUs in XGBoost to achieve substantial speedup over multi-core CPUs. To our knowledge, our work is the first implementation of AFT that utilizes the processing power of NVIDIA GPUs. Starting from the 1.2.0 release, the XGBoost package natively supports the AFT model. The addition of AFT in XGBoost has had significant impact in the open source community, and a few statistics packages now utilize the XGBoost AFT model.
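
To make the notion of an AFT loss under different kinds of label censoring concrete, here is a minimal sketch in plain Python. It is not the XGBoost implementation. It assumes the standard AFT form ln Y = T(x) + sigma * Z with Z drawn from a standard normal distribution, and the names `aft_nll`, `pred`, `lower`, `upper` and `sigma` are chosen here for illustration only. Each label is given as an interval [lower, upper] on the event time: equal bounds mean an uncensored observation, an infinite upper bound means right censoring, and a zero lower bound means left censoring.

```python
import math


def normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_nll(pred, lower, upper, sigma=1.0):
    """Negative log-likelihood of one label under ln Y = pred + sigma * Z.

    pred  : model output on the log-time scale (assumed name)
    lower : lower bound of the event time (0 for left-censored labels)
    upper : upper bound of the event time (math.inf for right-censored labels)
    sigma : scale of the noise term (assumed fixed, not learned)
    """
    eps = 1e-12  # guards against log(0) for extreme predictions
    if lower == upper:
        # Uncensored: use the density of Y at the observed time.
        z = (math.log(lower) - pred) / sigma
        density = normal_pdf(z) / (sigma * lower)
        return -math.log(max(density, eps))
    # Censored: use the probability mass that Y falls inside [lower, upper].
    if math.isinf(upper):
        cdf_upper = 1.0
    else:
        cdf_upper = normal_cdf((math.log(upper) - pred) / sigma)
    if lower <= 0.0:
        cdf_lower = 0.0
    else:
        cdf_lower = normal_cdf((math.log(lower) - pred) / sigma)
    return -math.log(max(cdf_upper - cdf_lower, eps))


if __name__ == "__main__":
    print("uncensored      :", aft_nll(0.0, 1.0, 1.0))
    print("right-censored  :", aft_nll(0.0, 1.0, math.inf))
    print("left-censored   :", aft_nll(0.0, 0.0, 1.0))
    print("interval [1, 5] :", aft_nll(0.0, 1.0, 5.0))
```

In XGBoost itself (1.2.0 and later), the corresponding objective is selected with `survival:aft`, and the two bounds are supplied per row as the `label_lower_bound` and `label_upper_bound` fields of the `DMatrix`. The noise distribution and its scale are set through the `aft_loss_distribution` and `aft_loss_distribution_scale` parameters.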
