Paper title

A Revisit to De-biased Lasso for Generalized Linear Models

Authors

Xia, Lu; Nan, Bin; Li, Yi

Abstract

De-biased lasso has emerged as a popular tool to draw statistical inference for high-dimensional regression models. However, simulations indicate that for generalized linear models (GLMs), de-biased lasso inadequately removes biases and yields unreliable confidence intervals. This motivates us to scrutinize the application of de-biased lasso in high-dimensional GLMs. When $p > n$, we detect that a key sparsity condition on the inverse information matrix generally does not hold in a GLM setting, which likely explains the subpar performance of de-biased lasso. Even in a less challenging "large $n$, diverging $p$" scenario, we find that de-biased lasso and the maximum likelihood method often yield confidence intervals with unsatisfactory coverage probabilities. In this scenario, we examine an alternative approach for further bias correction by directly inverting the Hessian matrix without imposing the matrix sparsity assumption. We establish the asymptotic distributions of any linear combinations of the resulting estimates, which lay the theoretical groundwork for drawing inference. Simulations show that this refined de-biased estimator performs well in removing biases and yields honest confidence interval coverage. We illustrate the method by analyzing the Boston Lung Cancer Study, a prospective, hospital-based, large-scale epidemiology cohort investigating the joint effects of genetic variants on lung cancer risk.
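The bias-correction idea in the abstract (start from a lasso fit, then correct it with the directly inverted Hessian rather than a sparse estimate of the inverse information matrix) can be illustrated with a minimal sketch for logistic regression in the $p < n$ regime. This is not the authors' code. The function names, the proximal-gradient lasso solver, the no-intercept model, the penalty level `lam = 0.05`, and the standard-error formula based on the diagonal of the inverse Hessian are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lasso_logistic(X, y, lam, n_iter=5000):
    """L1-penalized logistic regression via proximal gradient descent.

    Hypothetical helper: any lasso GLM solver could be used instead.
    """
    n, p = X.shape
    # Step size 1/L, where L = sigma_max(X)^2 / (4n) bounds the Hessian.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n
        z = beta - step * grad
        # Soft-thresholding is the proximal operator of the L1 penalty.
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return beta

def refined_debiased_lasso(X, y, lam):
    """One-step correction of the lasso fit using the inverted Hessian."""
    n, p = X.shape
    beta_hat = lasso_logistic(X, y, lam)
    mu = sigmoid(X @ beta_hat)
    # Gradient and Hessian of the mean negative log-likelihood at beta_hat.
    score = X.T @ (mu - y) / n
    hessian = (X * (mu * (1.0 - mu))[:, None]).T @ X / n
    # Direct inversion, no sparsity assumption; needs p < n and full rank.
    theta = np.linalg.inv(hessian)
    b = beta_hat - theta @ score
    # Assumed model-based standard errors from the inverse Hessian.
    se = np.sqrt(np.diag(theta) / n)
    return beta_hat, b, se

# Simulated example (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -1.0, 0.0, 0.0, 0.5])
y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
lam = 0.05

beta_hat, b, se = refined_debiased_lasso(X, y, lam)
ci_lower, ci_upper = b - 1.96 * se, b + 1.96 * se
# Columns: truth, lasso estimate, de-biased estimate, 95% CI bounds.
print(np.column_stack([beta_true, beta_hat, b, ci_lower, ci_upper]).round(3))
```

Under the same assumptions, a Wald-type interval for a linear combination `a @ b` would use the standard error `np.sqrt(a @ theta @ a / n)`, which requires returning `theta` from the function as well.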
