Paper Title


Explainable AI Integrated Feature Selection for Landslide Susceptibility Mapping using TreeSHAP

Paper Authors

Muhammad Sakib Khan Inan, Istiakur Rahman

Paper Abstract


Landslides have been a regular occurrence and an alarming threat to human life and property in the era of anthropogenic global warming. Early prediction of landslide susceptibility using a data-driven approach is a demand of the time. In this study, we explored the eloquent features that best describe landslide susceptibility with state-of-the-art machine learning methods. We employed state-of-the-art machine learning algorithms, including XGBoost, LR, KNN, SVM, and AdaBoost, for landslide susceptibility prediction. To find the best hyperparameters of each individual classifier for optimized performance, we incorporated the Grid Search method with 10-fold cross-validation. In this context, the optimized version of XGBoost outperformed all other classifiers with a cross-validation weighted F1 score of 94.62%. Following this empirical evidence, we explored the XGBoost classifier by incorporating TreeSHAP, a game-theory-based statistical algorithm used to explain machine learning models, to identify eloquent features such as SLOPE, ELEVATION, and TWI that contribute most to the performance of the XGBoost classifier, and features such as LANDUSE, NDVI, and SPI, which have less effect on the model's performance. According to the TreeSHAP explanation of features, we selected the 9 most significant landslide causal factors out of 15. Evidently, the optimized version of XGBoost, with a 40% feature reduction, outperformed all other classifiers in terms of popular evaluation metrics, with a cross-validation weighted F1 score of 95.01% on the training set and an AUC score of 97%.
