论文标题

可通过差差滤波进行列表可解码的稀疏平均值估计

List-Decodable Sparse Mean Estimation via Difference-of-Pairs Filtering

论文作者

Diakonikolas, Ilias, Kane, Daniel M., Karmalkar, Sushrut, Pensia, Ankit, Pittas, Thanasis

论文摘要

我们研究列表可调的稀疏平均估计问题。具体而言,对于(0,1/2)$的参数$α\,我们获得了$ \ mathbb {r}^n $,$ \lfloorαm\ rfloor $ in i.i.d.来自分销$ d $的样本,带有未知$ k $ -sparse的平均$μ$。没有对剩余点的假设,该点构成了数据集的大多数。目的是返回一小部分包含vector $ \widehatμ$的候选人,以便$ \ | \wideHatμ-μ\ | _2 $很小。先前的工作研究了在密集设置中可定义的平均值估计的问题。在这项工作中,我们开发了一种新颖的,概念上更简单的技术,用于列表可解码的均值估计。作为我们方法的主要应用,我们为列表可解码的稀疏平均估计提供了第一个样本和计算有效算法。特别是,对于具有$ k $ -sparse方向和足够轻的尾巴的“认证有限”的分布,我们的算法达到了$(1/α)^{o(1/t)} $的错误,带有样品复杂性$ m =(k \ log log log log(k \ log log(k \ log log log log log)和时间) $ \ mathrm {poly}(mn^t)$。对于高斯嵌入式的特殊情况,我们的算法实现了$θ(\ sqrt {\ log(1/α)})$的最佳错误保证,具有准多物质样本和计算复杂性。我们通过几乎匹配的统计查询和低度多项式测试的下限来补充上限。

We study the problem of list-decodable sparse mean estimation. Specifically, for a parameter $α\in (0, 1/2)$, we are given $m$ points in $\mathbb{R}^n$, $\lfloor αm \rfloor$ of which are i.i.d. samples from a distribution $D$ with unknown $k$-sparse mean $μ$. No assumptions are made on the remaining points, which form the majority of the dataset. The goal is to return a small list of candidates containing a vector $\widehat μ$ such that $\| \widehat μ- μ\|_2$ is small. Prior work had studied the problem of list-decodable mean estimation in the dense setting. In this work, we develop a novel, conceptually simpler technique for list-decodable mean estimation. As the main application of our approach, we provide the first sample and computationally efficient algorithm for list-decodable sparse mean estimation. In particular, for distributions with "certifiably bounded" $t$-th moments in $k$-sparse directions and sufficiently light tails, our algorithm achieves error of $(1/α)^{O(1/t)}$ with sample complexity $m = (k\log(n))^{O(t)}/α$ and running time $\mathrm{poly}(mn^t)$. For the special case of Gaussian inliers, our algorithm achieves the optimal error guarantee of $Θ(\sqrt{\log(1/α)})$ with quasi-polynomial sample and computational complexity. We complement our upper bounds with nearly-matching statistical query and low-degree polynomial testing lower bounds.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源