在无限支持下学习离散分布

论文标题

在无限支持下学习离散分布

Learning discrete distributions with infinite support

论文作者

Cohen, Doron, Kontorovich, Aryeh, Wolfer, Geoffrey

论文摘要

我们提出了一种新的方法，可以在总变化度量中使用（可能）无限支持（潜在的）支持。在偏离已建立的范式的情况下，我们对采样分布没有任何结构性假设。在这种情况下，不可能的无分配风险范围是不可能的，并且人们所希望的最好的是完全经验的数据依赖性界限。我们精确地得出了这种界限，并证明了这些范围在明确的意义上是最好的。我们的主要发现是，经验分布的半确定提供了对经验风险的严格估计。此外，该数量与真实分布的函数以几乎最佳的速率衰减。最优性来自最小值的结果，即可能的独立利益。提供了其他结构结果，包括精确的Rademacher复杂性计算，显然是总变化风险与缺失质量之间的第一个联系。

We present a novel approach to estimating discrete distributions with (potentially) infinite support in the total variation metric. In a departure from the established paradigm, we make no structural assumptions whatsoever on the sampling distribution. In such a setting, distribution-free risk bounds are impossible, and the best one could hope for is a fully empirical data-dependent bound. We derive precisely such bounds, and demonstrate that these are, in a well-defined sense, the best possible. Our main discovery is that the half-norm of the empirical distribution provides tight upper and lower estimates on the empirical risk. Furthermore, this quantity decays at a nearly optimal rate as a function of the true distribution. The optimality follows from a minimax result, of possible independent interest. Additional structural results are provided, including an exact Rademacher complexity calculation and apparently a first connection between the total variation risk and the missing mass.

下载PDF全文

下载文献需遵守相关版权规定

论文标题