Dataset Inference for Self-Supervised Models

Paper Title

Dataset Inference for Self-Supervised Models

Authors

Adam Dziedzic, Haonan Duan, Muhammad Ahmad Kaleem, Nikita Dhawan, Jonas Guan, Yannis Cattan, Franziska Boenisch, Nicolas Papernot

Abstract

Self-supervised models are increasingly prevalent in machine learning (ML) since they reduce the need for expensively labeled data. Because of their versatility in downstream applications, they are increasingly used as a service exposed via public APIs. At the same time, these encoder models are particularly vulnerable to model stealing attacks due to the high dimensionality of vector representations they output. Yet, encoders remain undefended: existing mitigation strategies for stealing attacks focus on supervised learning. We introduce a new dataset inference defense, which uses the private training set of the victim encoder model to attribute its ownership in the event of stealing. The intuition is that the log-likelihood of an encoder's output representations is higher on the victim's training data than on test data if it is stolen from the victim, but not if it is independently trained. We compute this log-likelihood using density estimation models. As part of our evaluation, we also propose measuring the fidelity of stolen encoders and quantifying the effectiveness of the theft detection without involving downstream tasks; instead, we leverage mutual information and distance measurements. Our extensive empirical results in the vision domain demonstrate that dataset inference is a promising direction for defending self-supervised models against model stealing.
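
The intuition in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the diagonal Gaussian is a stand-in for the density estimation models the abstract mentions, the Welch t-statistic is one assumed way to compare the two sets of log-likelihoods, and the "representations" are synthetic toy vectors rather than outputs of a real encoder. All function names (`fit_diag_gaussian`, `ownership_t_statistic`, `sample_reps`, `demo`) are hypothetical.

```python
import math
import random
import statistics


def fit_diag_gaussian(reps):
    """Fit one independent Gaussian per dimension (a stand-in density estimator)."""
    columns = list(zip(*reps))
    means = [statistics.fmean(col) for col in columns]
    # A small floor keeps every variance strictly positive.
    variances = [statistics.pvariance(col, mu) + 1e-6
                 for col, mu in zip(columns, means)]
    return means, variances


def log_likelihood(rep, means, variances):
    """Log-density of one representation under the fitted diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(rep, means, variances)
    )


def ownership_t_statistic(train_reps, test_reps):
    """Welch t-statistic: held-out training vs. test log-likelihoods.

    The density is fitted on the first half of the training representations
    and scored on the second half, so both scored sets are unseen by it.
    """
    half = len(train_reps) // 2
    means, variances = fit_diag_gaussian(train_reps[:half])
    ll_train = [log_likelihood(r, means, variances) for r in train_reps[half:]]
    ll_test = [log_likelihood(r, means, variances) for r in test_reps]
    standard_error = math.sqrt(
        statistics.variance(ll_train) / len(ll_train)
        + statistics.variance(ll_test) / len(ll_test)
    )
    return (statistics.fmean(ll_train) - statistics.fmean(ll_test)) / standard_error


def sample_reps(rng, n, dim, std):
    """Synthetic toy 'representations': n vectors of i.i.d. Gaussian entries."""
    return [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n)]


def demo(seed=0, dim=8):
    """Return (t_stolen, t_independent) on synthetic data.

    Assumption for illustration only: a stolen encoder maps the victim's
    training points to a tighter cluster (std 0.7) than test points (std 1.0),
    while an independent encoder treats both sets alike (std 1.0).
    """
    rng = random.Random(seed)
    t_stolen = ownership_t_statistic(
        sample_reps(rng, 400, dim, 0.7), sample_reps(rng, 200, dim, 1.0))
    t_independent = ownership_t_statistic(
        sample_reps(rng, 400, dim, 1.0), sample_reps(rng, 200, dim, 1.0))
    return t_stolen, t_independent


if __name__ == "__main__":
    print(demo())
```

Under these toy assumptions, a large positive statistic suggests that the encoder assigns noticeably higher likelihood to the victim's training data than to test data, which is the signal the abstract describes for a stolen encoder. A statistic near zero is what one would expect from an independently trained encoder.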
