The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions

Paper Title

The Price of Competition: Effect Size Heterogeneity Matters in High Dimensions

Authors

Wang, Hua; Yang, Yachong; Su, Weijie J.

Abstract

In high-dimensional sparse regression, would increasing the signal-to-noise ratio while fixing the sparsity level always lead to better model selection? For high-dimensional sparse regression problems, surprisingly, in this paper we answer this question in the negative in the regime of linear sparsity for the Lasso method, relying on a new concept we term effect size heterogeneity. Roughly speaking, a regression coefficient vector has high effect size heterogeneity if its nonzero entries have significantly different magnitudes. From the viewpoint of this new measure, we prove that the false and true positive rates achieve the optimal trade-off uniformly along the Lasso path when this measure is maximal in a certain sense, and the worst trade-off is achieved when it is minimal in the sense that all nonzero effect sizes are roughly equal. Moreover, we demonstrate that the first false selection occurs much earlier when effect size heterogeneity is minimal than when it is maximal. The underlying cause of these two phenomena is, metaphorically speaking, the "competition" among variables with effect sizes of the same magnitude in entering the model. Taken together, our findings suggest that effect size heterogeneity shall serve as an important complementary measure to the sparsity of regression coefficients in the analysis of high-dimensional regression problems. Our proofs use techniques from approximate message passing theory as well as a novel technique for estimating the rank of the first false variable.
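
For reference, the quantities the abstract refers to can be written out as below. This is a sketch in assumed notation, not taken from this page: the symbols $X$, $y$, $\beta$, $\hat\beta(\lambda)$ and the names TPP and FDP are illustrative, and the paper's own definitions of the true and false positive rates may differ in detail.

```latex
% Assumed notation (not given on this page): design matrix X, response y,
% true coefficient vector \beta \in \mathbb{R}^p, penalty level \lambda > 0.
% Lasso estimate at penalty level \lambda:
\[
  \hat\beta(\lambda) = \operatorname*{arg\,min}_{b \in \mathbb{R}^p}
    \tfrac{1}{2} \lVert y - Xb \rVert_2^2 + \lambda \lVert b \rVert_1
\]
% True positive proportion: share of the true signals that are selected.
\[
  \mathrm{TPP}(\lambda) =
    \frac{\#\{ j : \hat\beta_j(\lambda) \neq 0,\ \beta_j \neq 0 \}}
         {\max\bigl( \#\{ j : \beta_j \neq 0 \},\, 1 \bigr)}
\]
% False discovery proportion: share of the selected variables that are null.
\[
  \mathrm{FDP}(\lambda) =
    \frac{\#\{ j : \hat\beta_j(\lambda) \neq 0,\ \beta_j = 0 \}}
         {\max\bigl( \#\{ j : \hat\beta_j(\lambda) \neq 0 \},\, 1 \bigr)}
\]
```

Under these assumed definitions, a better trade-off along the Lasso path means a lower FDP at the same TPP as $\lambda$ decreases, and the "first false selection" is the first point on the path at which a variable with $\beta_j = 0$ enters the model.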
