Paper Title

Interpolating Predictors in High-Dimensional Factor Regression

Authors

Florentina Bunea, Seth Strimas-Mackey, Marten Wegkamp

Abstract

This work studies finite-sample properties of the risk of the minimum-norm interpolating predictor in high-dimensional regression models. If the effective rank of the covariance matrix $\Sigma$ of the $p$ regression features is much larger than the sample size $n$, we show that the min-norm interpolating predictor is not desirable, as its risk approaches the risk of trivially predicting the response by 0. However, our detailed finite-sample analysis reveals, surprisingly, that this behavior is not present when the regression response and the features are jointly low-dimensional, following a widely used factor regression model. Within this popular model class, and when the effective rank of $\Sigma$ is smaller than $n$, while still allowing for $p \gg n$, both the bias and the variance terms of the excess risk can be controlled, and the risk of the minimum-norm interpolating predictor approaches optimal benchmarks. Moreover, through a detailed analysis of the bias term, we exhibit model classes under which our upper bound on the excess risk approaches zero, while the corresponding upper bound in the recent work arXiv:1906.11300 diverges. Furthermore, we show that the minimum-norm interpolating predictor analyzed under the factor regression model, despite being model-agnostic and devoid of tuning parameters, can have similar risk to predictors based on principal components regression and ridge regression, and can improve over LASSO-based predictors, in the high-dimensional regime.
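To make the object of study concrete, the following is a minimal sketch of a minimum-norm interpolating predictor fitted to data simulated from a simple factor regression model. Everything specific here is an illustrative assumption rather than the paper's setup: the names `min_norm_interpolator`, `draw`, `A`, `beta`, the Gaussian design, the noise levels, and the dimensions `n`, `p`, `K` are all hypothetical. The sketch only shows that the pseudoinverse solution fits the training data exactly when $p > n$, and how its test risk can be compared with the risk of predicting 0.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum l2-norm solution of X @ alpha = y, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


# Illustrative dimensions (assumptions): p >> n, with K latent factors.
rng = np.random.default_rng(0)
n, p, K = 50, 500, 3

# Hypothetical factor regression model: X = Z A + E, y = Z beta + eps.
A = rng.normal(size=(K, p))   # factor loadings, shared by train and test data
beta = rng.normal(size=K)     # regression coefficients on the latent factors


def draw(m):
    """Draw m observations (X, y) from the assumed factor regression model."""
    Z = rng.normal(size=(m, K))               # latent factors
    X = Z @ A + rng.normal(size=(m, p))       # features: low-rank signal plus noise
    y = Z @ beta + 0.5 * rng.normal(size=m)   # response depends on the factors only
    return X, y


X_train, y_train = draw(n)
X_test, y_test = draw(2000)

alpha_hat = min_norm_interpolator(X_train, y_train)

# Interpolation: the training residual is zero up to floating-point error.
train_residual = np.max(np.abs(X_train @ alpha_hat - y_train))

# Empirical test risk of the interpolator and of the trivial predictor 0.
risk_interp = np.mean((X_test @ alpha_hat - y_test) ** 2)
risk_null = np.mean(y_test ** 2)

print(f"max training residual: {train_residual:.2e}")
print(f"test risk (min-norm interpolator): {risk_interp:.3f}")
print(f"test risk (predicting 0): {risk_null:.3f}")
```

Varying `K`, the loading scale, or the feature noise level in this sketch changes the effective rank of the feature covariance, which is the quantity the abstract identifies as governing whether the interpolator behaves well.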
