Paper Title

A Non-Parametric Test to Detect Data-Copying in Generative Models

Authors

Casey Meehan, Kamalika Chaudhuri, Sanjoy Dasgupta

Abstract

Detecting overfitting in generative models is an important challenge in machine learning. In this work, we formalize a form of overfitting that we call data-copying, in which the generative model memorizes and outputs training samples or small variations thereof. We provide a three-sample non-parametric test for detecting data-copying that uses the training set, a separate sample from the target distribution, and a generated sample from the model, and we study the performance of our test on several canonical models and datasets. For code and examples, visit https://github.com/casey-meehan/data-copying
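
To make the three-sample idea concrete, here is a minimal sketch in Python. It is not the authors' implementation (see the repository linked above for that). The abstract names only the three samples, so everything else here is an assumption made for illustration: the function names (`nn_distance`, `mann_whitney_z`, `data_copying_z`), the use of Euclidean nearest-neighbor distance to the training set, and the Mann-Whitney U z-score as the non-parametric statistic. The sketch asks whether generated samples sit systematically closer to the training set than held-out samples from the target distribution do. A strongly negative score would suggest data-copying.

```python
import math
import random


def nn_distance(point, train):
    """Euclidean distance from `point` to its nearest neighbour in `train`."""
    return min(math.dist(point, t) for t in train)


def mann_whitney_z(xs, ys):
    """z-scored Mann-Whitney U statistic counting pairs with x > y.

    Ties count as 1/2. Uses the normal approximation without a tie
    correction, which is adequate for continuous distances.
    """
    n, m = len(xs), len(ys)
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
    mean = n * m / 2.0
    std = math.sqrt(n * m * (n + m + 1) / 12.0)
    return (u - mean) / std


def data_copying_z(train, held_out, generated):
    """Hypothetical three-sample score (an assumption, not the paper's API).

    Compares distance-to-training-set of generated samples against that of
    held-out samples. Strongly negative: generated samples are closer to
    the training set than fresh samples are.
    """
    d_gen = [nn_distance(g, train) for g in generated]
    d_held = [nn_distance(h, train) for h in held_out]
    return mann_whitney_z(d_gen, d_held)


# Toy demonstration on 2-D Gaussian data (synthetic, for illustration only).
rng = random.Random(0)


def draw(k):
    """Draw k points from a standard 2-D Gaussian, the toy target distribution."""
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(k)]


train = draw(200)
held_out = draw(200)

# A "copying" model: training points plus tiny perturbations.
copied = [
    (x + rng.gauss(0.0, 0.01), y + rng.gauss(0.0, 0.01))
    for x, y in rng.choices(train, k=200)
]
# A model that samples the target distribution afresh.
fresh = draw(200)

z_copy = data_copying_z(train, held_out, copied)
z_fresh = data_copying_z(train, held_out, fresh)
print(f"copying model z = {z_copy:.2f}")
print(f"fresh model   z = {z_fresh:.2f}")
```

In this toy setup the copying model should score far below zero, while the model that samples afresh should score near zero. A global score like this can miss copying that is confined to one region of the data space, so treat it as a first-pass diagnostic only.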
