Approximate Data Deletion from Machine Learning Models

Paper Title

Approximate Data Deletion from Machine Learning Models

Authors

Zachary Izzo, Mary Anne Smart, Kamalika Chaudhuri, James Zou

Abstract

Deleting data from a trained machine learning (ML) model is a critical task in many applications. For example, we may want to remove the influence of training points that might be out of date or outliers. Regulations such as the EU's General Data Protection Regulation also stipulate that individuals can request to have their data deleted. The naive approach to data deletion is to retrain the ML model on the remaining data, but this is too time-consuming. In this work, we propose a new approximate deletion method for linear and logistic models whose computational cost is linear in the feature dimension $d$ and independent of the number of training data $n$. This is a significant gain over all existing methods, which all have superlinear time dependence on the dimension. We also develop a new feature-injection test to evaluate the thoroughness of data deletion from ML models.
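To make the trade-off in the abstract concrete, the following minimal sketch contrasts naive retraining with a classical exact deletion for ridge regression based on the Sherman-Morrison rank-one downdate. This is not the method proposed in the paper; it is a textbook baseline whose per-deletion cost is O(d^2), so it is independent of $n$ but superlinear in $d$. The function names `fit_ridge` and `delete_point`, the regularization strength `lam`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Fit ridge regression; return the weights and the inverse of (X^T X + lam*I)."""
    d = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
    return A_inv @ (X.T @ y), A_inv

def delete_point(A_inv, b, x, y_i):
    """Remove one training point (x, y_i) without touching the other n-1 points.

    Uses the Sherman-Morrison identity
        (A - x x^T)^{-1} = A^{-1} + (A^{-1} x)(A^{-1} x)^T / (1 - x^T A^{-1} x),
    which holds because A is symmetric. Cost is O(d^2), independent of n.
    """
    Ax = A_inv @ x
    denom = 1.0 - x @ Ax          # positive whenever lam > 0
    A_inv_new = A_inv + np.outer(Ax, Ax) / denom
    b_new = b - y_i * x           # downdate X^T y
    return A_inv_new @ b_new, A_inv_new, b_new

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
lam = 1e-3

# Train once and cache the quantities needed for later deletions.
w_full, A_inv = fit_ridge(X, y, lam)
b = X.T @ y

# Delete the first training point via the rank-one downdate.
w_deleted, A_inv_new, b_new = delete_point(A_inv, b, X[0], y[0])

# Naive baseline: retrain from scratch on the remaining n-1 points.
w_retrained, _ = fit_ridge(X[1:], y[1:], lam)

# The two weight vectors should agree up to floating-point error.
print(np.max(np.abs(w_deleted - w_retrained)))
```

The downdate reproduces the retrained weights while only touching d-by-d quantities, which is why its cost does not grow with $n$. Its quadratic dependence on $d$ is the kind of superlinear cost the abstract refers to when comparing against existing methods.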
