论文标题
在无限支持下学习离散分布
Learning discrete distributions with infinite support
论文作者
论文摘要
我们提出了一种新的方法,可以在总变化度量中使用(可能)无限支持(潜在的)支持。在偏离已建立的范式的情况下,我们对采样分布没有任何结构性假设。在这种情况下,不可能的无分配风险范围是不可能的,并且人们所希望的最好的是完全经验的数据依赖性界限。我们精确地得出了这种界限,并证明了这些范围在明确的意义上是最好的。我们的主要发现是,经验分布的半确定提供了对经验风险的严格估计。此外,该数量与真实分布的函数以几乎最佳的速率衰减。最优性来自最小值的结果,即可能的独立利益。提供了其他结构结果,包括精确的Rademacher复杂性计算,显然是总变化风险与缺失质量之间的第一个联系。
We present a novel approach to estimating discrete distributions with (potentially) infinite support in the total variation metric. In a departure from the established paradigm, we make no structural assumptions whatsoever on the sampling distribution. In such a setting, distribution-free risk bounds are impossible, and the best one could hope for is a fully empirical data-dependent bound. We derive precisely such bounds, and demonstrate that these are, in a well-defined sense, the best possible. Our main discovery is that the half-norm of the empirical distribution provides tight upper and lower estimates on the empirical risk. Furthermore, this quantity decays at a nearly optimal rate as a function of the true distribution. The optimality follows from a minimax result, of possible independent interest. Additional structural results are provided, including an exact Rademacher complexity calculation and apparently a first connection between the total variation risk and the missing mass.