Paper Title
A Non-Parametric Test to Detect Data-Copying in Generative Models
Paper Authors
Paper Abstract
Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying, where the generative model memorizes and outputs training samples or small variations thereof. We provide a three-sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and we study the performance of our test on several canonical models and datasets. For code and examples, visit https://github.com/casey-meehan/data-copying
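To make the three-sample idea concrete, here is a minimal sketch of one way such a test could look. It is not the authors' implementation (see the repository linked above for that). It assumes that data-copying shows up as generated samples lying closer to the training set than fresh held-out samples do, and it compares the two sets of nearest-neighbor distances with a Mann-Whitney U statistic. The Euclidean metric, the choice of statistic, and the function names `nn_distance` and `copying_z_score` are all assumptions made for illustration.

```python
import math
import random


def nn_distance(point, train):
    """Euclidean distance from `point` to its nearest neighbor in `train`."""
    return min(math.dist(point, t) for t in train)


def copying_z_score(train, held_out, generated):
    """Z-scored Mann-Whitney U statistic on nearest-training-neighbor distances.

    Hypothetical helper: strongly negative values mean generated samples tend
    to sit closer to the training set than held-out samples do, which is the
    signature of data-copying assumed in this sketch.
    """
    d_gen = [nn_distance(g, train) for g in generated]
    d_out = [nn_distance(h, train) for h in held_out]
    m, n = len(d_gen), len(d_out)
    # U counts (generated, held-out) pairs where the generated distance is
    # larger; ties contribute one half.
    u = sum(
        1.0 if dg > do else 0.5 if dg == do else 0.0
        for dg in d_gen
        for do in d_out
    )
    mean_u = m * n / 2.0
    std_u = math.sqrt(m * n * (m + n + 1) / 12.0)  # normal approximation
    return (u - mean_u) / std_u


# Toy demonstration on a 2-D standard Gaussian target (synthetic data only).
rng = random.Random(0)


def sample(k):
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(k)]


train = sample(200)
held_out = sample(100)
# A "copying" model: training points with tiny perturbations.
copier = [(x + rng.gauss(0, 0.01), y + rng.gauss(0, 0.01)) for x, y in train[:100]]
# A well-behaved model: fresh draws from the target distribution.
honest = sample(100)

z_copier = copying_z_score(train, held_out, copier)
z_honest = copying_z_score(train, held_out, honest)
print(f"copier z = {z_copier:.2f}, honest z = {z_honest:.2f}")
```

In this toy setup the copying model should produce a strongly negative score, while the honest model should stay near zero. A single global score like this can hide copying that is confined to one region of the input space, so a practical test would likely need to examine regions separately.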