Paper Title

Random Forest (RF) Kernel for Regression, Classification and Survival

Authors

Dai Feng, Richard Baumgartner

Abstract

Breiman's random forest (RF) can be interpreted as an implicit kernel generator, where the ensuing proximity matrix represents the data-driven RF kernel. The kernel perspective on the RF has been used to develop a principled framework for theoretical investigation of its statistical properties. However, the practical utility of the links between kernels and the RF has not been widely explored or systematically evaluated. The focus of our work is the interplay between kernel methods and the RF. We elucidate the performance and properties of data-driven RF kernels used by regularized linear models in a comprehensive simulation study comprising continuous, binary and survival targets. We show that for continuous and survival targets, the RF kernels are competitive with the RF in higher-dimensional scenarios with a larger number of noisy features. For the binary target, the RF kernel and the RF exhibit comparable performance. Because the RF kernel asymptotically converges to the Laplace kernel, we included the Laplace kernel in our evaluation. For most simulation setups, the RF and the RF kernel outperformed the Laplace kernel. Nevertheless, in some cases the Laplace kernel was competitive, showing its potential value for applications. We also provide results from real-life data sets for regression, classification and survival to illustrate how these insights may be leveraged in practice. Finally, we discuss further extensions of the RF kernels in the context of interpretable prototype and landmarking classification, regression and survival. We outline a future line of research for kernels furnished by Bayesian counterparts of the RF.
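To make the two kernels named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common definition of RF proximity as the fraction of trees in which two samples fall into the same leaf, and a Laplace kernel of the form exp(-||x - y||_1 / sigma). The function names, the `sigma` parameter and the toy leaf assignments are hypothetical. In practice the leaf indices could come from a fitted forest, for example via scikit-learn's `RandomForestRegressor.apply`.

```python
import math


def rf_proximity_kernel(leaf_ids):
    """Return the n x n RF proximity matrix.

    leaf_ids[i][t] is the index of the leaf that sample i reaches in tree t.
    Entry (i, j) is the fraction of trees in which samples i and j share a leaf
    (an assumed, commonly used definition of RF proximity).
    """
    n = len(leaf_ids)
    n_trees = len(leaf_ids[0])
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(1 for a, b in zip(leaf_ids[i], leaf_ids[j]) if a == b)
            # The proximity matrix is symmetric, so fill both triangles.
            kernel[i][j] = kernel[j][i] = shared / n_trees
    return kernel


def laplace_kernel(x, y, sigma=1.0):
    """Laplace kernel exp(-||x - y||_1 / sigma); sigma is an assumed bandwidth."""
    l1_distance = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-l1_distance / sigma)


# Hypothetical leaf assignments for 3 samples across 4 trees.
leaf_ids = [
    [1, 2, 0, 3],
    [1, 2, 1, 3],
    [0, 1, 1, 0],
]
K = rf_proximity_kernel(leaf_ids)
```

The resulting matrix `K` could then be passed to any method that accepts a precomputed kernel, such as a kernel ridge regression or a support vector machine, which is the general setting of regularized linear models on RF kernels that the abstract describes.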
