Paper Title
Influence Functions in Deep Learning Are Fragile
Paper Authors
Paper Abstract
Influence functions approximate the effect of training samples on test-time predictions and have a wide variety of applications in machine learning interpretability and uncertainty estimation. A commonly used (first-order) influence function can be implemented efficiently as a post-hoc method requiring access only to the gradients and Hessian of the model. For linear models, influence functions are well-defined due to the convexity of the underlying loss function and are generally accurate even across difficult settings where model changes are fairly large, such as estimating group influences. Influence functions, however, are not well-understood in the context of deep learning with non-convex loss functions. In this paper, we provide a comprehensive and large-scale empirical study of successes and failures of influence functions in neural network models trained on datasets such as Iris, MNIST, CIFAR-10, and ImageNet. Through our extensive experiments, we show that the network architecture, its depth and width, as well as the extent of model parameterization and regularization techniques have strong effects on the accuracy of influence functions. In particular, we find that (i) influence estimates are fairly accurate for shallow networks, while for deeper networks the estimates are often erroneous; (ii) for certain network architectures and datasets, training with weight-decay regularization is important to get high-quality influence estimates; and (iii) the accuracy of influence estimates can vary significantly depending on the examined test points. These results suggest that influence functions in deep learning are, in general, fragile, and call for the development of improved influence estimation methods to mitigate these issues in non-convex setups.