论文标题
Mcrapper:Poset家族的蒙特卡罗Rademacher平均值和近似模式开采
MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining
论文作者
论文摘要
我们提出了Mcrapper,这是一种用于有效计算蒙特 - 卡洛经验Rademacher平均值(MCERA)的算法,用于表现出POSET(例如晶格)结构的功能家族,例如在许多模式挖掘任务中出现的功能。 MCERA使我们能够将上限计算为样本平均值的最大偏差,因此,当将可用的数据视为未知分布中的样本时,可以使用它来查找两个统计上有意义的功能(即模式)。此功能比以前提出的解决方案有了很大的改进,只能实现两者之一。 Mcrapper使用上限到功能的差异来有效探索和修剪搜索空间,这是一种从模式挖掘本身借来的技术。为了显示Mcrapper的实际使用,我们采用它来开发一种算法TFP-R,以实现真正的频繁模式(TFP)挖掘的任务。 TFP-R可以保证包含任何误报(精度),并且比提供相同保证的现有方法具有更高的统计能力(召回)。我们评估Mcrapper和TFP-R,并表明他们在各自的任务方面的最先进。
We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds to the maximum deviation of sample means from their expectations, thus it can be used to find both statistically-significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and approximations of collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset. This feature is a strong improvement over previously proposed solutions that could only achieve one of the two. MCRapper uses upper bounds to the discrepancy of the functions to efficiently explore and prune the search space, a technique borrowed from pattern mining itself. To show the practical use of MCRapper, we employ it to develop an algorithm TFP-R for the task of True Frequent Pattern (TFP) mining. TFP-R gives guarantees on the probability of including any false positives (precision) and exhibits higher statistical power (recall) than existing methods offering the same guarantees. We evaluate MCRapper and TFP-R and show that they outperform the state-of-the-art for their respective tasks.