Title
Doubly robust nearest neighbors in factor models
Authors
Abstract
We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the $(i, t)$-th entry, when observed, is given by its mean $f(u_i, v_t)$ plus mean-zero noise, for an unknown function $f$ and latent factors $u_i$ and $v_t$. Prior NN strategies for estimating the mean $f(u_i, v_t)$, such as unit-unit NN, rely on the existence of other rows $j$ with $u_j \approx u_i$. Similarly, the time-time NN strategy relies on the existence of columns $t'$ with $v_{t'} \approx v_t$. These strategies perform poorly when similar rows or similar columns, respectively, are unavailable. Our estimate is doubly robust to this deficit in two ways. (1) As long as either good row neighbors or good column neighbors exist, our estimate is consistent. (2) Furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error. It also admits a significantly narrower asymptotic confidence interval than both unit-unit and time-time NN.
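The abstract does not spell out the estimator. The following is a minimal illustrative sketch, not the paper's implementation, and it rests on several assumptions.

- The matrix is stored as nested Python lists, with `None` marking a missing entry.
- Neighbors are selected by mean squared distance over commonly observed entries. The thresholds `eta_row` and `eta_col` are hypothetical tuning parameters.
- The doubly robust estimate averages $Y_{jt} + Y_{it'} - Y_{jt'}$ over row neighbors $j$ and column neighbors $t'$. This combination rule is our assumption about how row and column neighbors can be combined, not a quotation of the paper's method.

The rule matches both robustness properties claimed above:

- If $u_j \approx u_i$, the last two terms nearly cancel and leave $Y_{jt} \approx f(u_i, v_t)$.
- If $v_{t'} \approx v_t$, the first and last terms nearly cancel and leave $Y_{it'} \approx f(u_i, v_t)$.
- When both hold and $f$ is smooth, a Taylor expansion gives a bias of approximately $-\frac{\partial^2 f}{\partial u \partial v}(u_j - u_i)(v_{t'} - v_t)$. This is a product of two small gaps, which is the intuition behind the quadratic improvement.

All function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: a list of rows; None marks a missing entry.
Matrix = List[List[Optional[float]]]


def _mean(vals: List[float]) -> Optional[float]:
    """Average of vals, or None when there is nothing to average."""
    return sum(vals) / len(vals) if vals else None


def row_distance(Y: Matrix, i: int, j: int, skip_col: int) -> float:
    """Mean squared gap between rows i and j over columns observed in both,
    ignoring the target column; inf if the rows share no observed column."""
    gaps = [(Y[i][s] - Y[j][s]) ** 2 for s in range(len(Y[0]))
            if s != skip_col and Y[i][s] is not None and Y[j][s] is not None]
    return sum(gaps) / len(gaps) if gaps else math.inf


def col_distance(Y: Matrix, t: int, s: int, skip_row: int) -> float:
    """Column analogue of row_distance, ignoring the target row."""
    gaps = [(Y[k][t] - Y[k][s]) ** 2 for k in range(len(Y))
            if k != skip_row and Y[k][t] is not None and Y[k][s] is not None]
    return sum(gaps) / len(gaps) if gaps else math.inf


def select_neighbors(Y: Matrix, i: int, t: int, eta_row: float,
                     eta_col: float) -> Tuple[List[int], List[int]]:
    """Rows within eta_row of row i and columns within eta_col of column t
    (eta_row and eta_col are hypothetical thresholds)."""
    rows = [j for j in range(len(Y))
            if j != i and row_distance(Y, i, j, t) <= eta_row]
    cols = [s for s in range(len(Y[0]))
            if s != t and col_distance(Y, t, s, i) <= eta_col]
    return rows, cols


def unit_unit_nn(Y: Matrix, t: int, rows: Sequence[int]) -> Optional[float]:
    """Unit-unit NN: average column t over neighbor rows with Y[j][t] observed."""
    return _mean([Y[j][t] for j in rows if Y[j][t] is not None])


def time_time_nn(Y: Matrix, i: int, cols: Sequence[int]) -> Optional[float]:
    """Time-time NN: average row i over neighbor columns with Y[i][s] observed."""
    return _mean([Y[i][s] for s in cols if Y[i][s] is not None])


def doubly_robust_nn(Y: Matrix, i: int, t: int, rows: Sequence[int],
                     cols: Sequence[int]) -> Optional[float]:
    """Average of Y[j][t] + Y[i][s] - Y[j][s] over neighbor pairs (j, s)
    whose three entries are all observed (assumed combination rule)."""
    return _mean([Y[j][t] + Y[i][s] - Y[j][s] for j in rows for s in cols
                  if Y[j][t] is not None and Y[i][s] is not None
                  and Y[j][s] is not None])


def toy_f(u: float, v: float) -> float:
    """Hypothetical smooth latent function used only for illustration."""
    return (u + v) ** 2


if __name__ == "__main__":
    # Noiseless toy matrix; row 1 and column 1 are near (not exact) neighbors.
    us, vs = [0.0, 0.1], [1.0, 1.1]
    Y: Matrix = [[toy_f(u, v) for v in vs] for u in us]
    Y[0][0] = None  # the entry we want to estimate
    truth = toy_f(us[0], vs[0])
    for name, est in [("unit-unit", unit_unit_nn(Y, 0, [1])),
                      ("time-time", time_time_nn(Y, 0, [1])),
                      ("doubly robust", doubly_robust_nn(Y, 0, 0, [1], [1]))]:
        print(f"{name:>13}: error {est - truth:+.4f}")
```

In the toy example, $f(u, v) = (u + v)^2$ with $\Delta u = \Delta v = 0.1$. In exact arithmetic, the unit-unit and time-time errors are both $0.21$. The doubly robust error is $-2\,\Delta u\,\Delta v = -0.02$.

This only illustrates the second-order bias on a noiseless example. It says nothing about the noise or missingness regimes analyzed in the paper.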