论文标题
马尔可夫连锁店缺少质量集中
Missing Mass Concentration for Markov Chains
论文作者
论文摘要
统计推断质量缺失的问题(由McAllester和Ortiz提出,NIPS'02;最近由Changa和Thangaraj重新审视,ISIT'2019)试图估算尚未从源头取样的符号的重量。 到目前为止,所有方法都集中在IID模型上,尽管过于简单,但它并不是很简单的问题。非平凡的部分是处理相关事件和具有非常不同尺度的变量总和,而经典浓度不平等不会产生良好的界限。 在本文中,我们开发了有关进一步缺失质量的研究,解决了马尔可夫连锁店的问题。我们将问题减少到研究打击时间的尾巴,并向它们找到\ emph {log-eDdive近似}。更确切地说,我们结合了大分化技术和对设定打击时间的某些估计,以显示如何最终将问题简化回IID案例。我们的贡献是a)新技术以获得缺失的质量界限 - 我们通过多数化替换传统上使用的负面关联,这用于更广泛的过程b)Markov链模型中缺失质量的首先(指数)浓度界限C)简化了集合时间和D的最新结果的简化,而D)简化了无数次数的质量估计值。
The problem of missing mass in statistical inference (posed by McAllester and Ortiz, NIPS'02; most recently revisited by Changa and Thangaraj, ISIT'2019) seeks to estimate the weight of symbols that have not been sampled yet from a source. So far all the approaches have been focused on the IID model which, although overly simplistic, is already not straightforward to tackle. The non-trivial part is in handling correlated events and sums of variables with very different scales where classical concentration inequalities do not yield good bounds. In this paper we develop the research on missing mass further, solving the problem for Markov chains. We reduce the problem to studying the tails of hitting times and finding \emph{log-additive approximations} to them. More precisely, we combine the technique of majorization and certain estimates on set hitting times to show how the problem can be eventually reduced back to the IID case. Our contribution are a) new technique to obtain missing mass bounds - we replace traditionally used negative association by majorization which works for a wider class of processes b) first (exponential) concentration bounds for missing mass in Markov chain models c) simplifications of recent results on set hitting times and d) simplified derivation of missing mass estimates for memory-less sources.