Paper title

The Lasso with general Gaussian designs with applications to hypothesis testing

Authors

Michael Celentano, Andrea Montanari, Yuting Wei

Abstract

The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates $p$ is of the same order or larger than the number of observations $n$. Classical asymptotic normality theory does not apply to this model due to two fundamental reasons: $(1)$ The regularized risk is non-smooth; $(2)$ The distance between the estimator $\widehat{\boldsymbol{\theta}}$ and the true parameters vector $\boldsymbol{\theta}^*$ cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail. On the other hand, the Lasso estimator can be precisely characterized in the regime in which both $n$ and $p$ are large and $n/p$ is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates: here we generalize it to Gaussian correlated designs with non-singular covariance structure. This is expressed in terms of a simpler ``fixed-design'' model. We establish non-asymptotic bounds on the distance between the distribution of various quantities in the two models, which hold uniformly over signals $\boldsymbol{\theta}^*$ in a suitable sparsity class and over values of the regularization parameter. As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.
