Targeting predictors in random forest regression

Paper title

Targeting predictors in random forest regression

Authors

Daniel Borup, Bent Jesper Christensen, Nicolaj Nørgaard Mühlbach, Mikkel Slot Nielsen

Abstract

Random forest regression (RF) is an extremely popular tool for the analysis of high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings due to weak predictors, and a pre-estimation dimension reduction (targeting) step is required. We show that proper targeting controls the probability of placing splits along strong predictors, thus providing an important complement to RF's feature sampling. This is supported by simulations using representative finite samples. Moreover, we quantify the immediate gain from targeting in terms of increased strength of individual trees. Macroeconomic and financial applications show that the bias-variance trade-off implied by targeting, due to increased correlation among trees in the forest, is balanced at a medium degree of targeting, selecting the best 10-30% of commonly applied predictors. Improvements in predictive accuracy of targeted RF relative to ordinary RF are considerable, up to 12-13%, occurring both in recessions and expansions, particularly at long horizons.
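
The interaction between targeting and feature sampling described in the abstract can be illustrated with a simple counting argument. The sketch below is not the authors' derivation. It assumes an idealized setting in which each split draws `mtry` candidate predictors uniformly without replacement from `p` predictors, `s` of which are strong, and in which targeting shrinks `p` while retaining all `s` strong predictors. The names `prob_strong_candidate`, `p`, `s`, and `mtry` are hypothetical. The function gives only the probability that at least one strong predictor is among the candidates at a split, not the probability that the split is actually placed on it.

```python
from math import comb


def prob_strong_candidate(p: int, s: int, mtry: int) -> float:
    """Probability that at least one of the s strong predictors appears among
    mtry candidates drawn uniformly without replacement from p predictors.

    Illustrative assumption: this is a plain hypergeometric calculation,
    not a formula taken from the paper.
    """
    if not (0 <= s <= p and 1 <= mtry <= p):
        raise ValueError("require 0 <= s <= p and 1 <= mtry <= p")
    # Complement of the event "all mtry candidates are weak predictors".
    # math.comb returns 0 when fewer than mtry weak predictors exist.
    return 1.0 - comb(p - s, mtry) / comb(p, mtry)


if __name__ == "__main__":
    s, mtry = 5, 5  # hypothetical values chosen for illustration
    # Idealized targeting: keep 100%, 30%, or 10% of 100 predictors,
    # assuming every strong predictor survives the selection step.
    for p in (100, 30, 10):
        print(p, round(prob_strong_candidate(p, s, mtry), 3))
```

Under these assumptions, reducing the predictor set raises the chance that a strong predictor is even available at a split. It says nothing about the accompanying increase in correlation among trees, which the abstract identifies as the other side of the trade-off.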
