Paper Title

Backdooring Explainable Machine Learning

Authors

Maximilian Noppel, Lukas Peter, Christian Wressnegger

Abstract

Explainable machine learning holds great potential for analyzing and understanding learning-based systems. These methods can, however, be manipulated to present unfaithful explanations, giving rise to powerful and stealthy adversaries. In this paper, we demonstrate blinding attacks that can fully disguise an ongoing attack against the machine learning model. Similar to neural backdoors, we modify the model's prediction upon trigger presence but simultaneously also fool the provided explanation. This enables an adversary to hide the presence of the trigger or point the explanation to entirely different portions of the input, throwing a red herring. We analyze different manifestations of such attacks for different explanation types in the image domain, before we resume to conduct a red-herring attack against malware classification.
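
The abstract describes a dual objective: upon trigger presence the backdoored model must both flip its prediction and present a manipulated explanation. The sketch below illustrates one plausible way to fine-tune for such a blinding attack, assuming a simple gradient-based saliency explanation and a joint loss; the names (saliency, blinding_loss), the toy model, and the random stand-in data are hypothetical and not the paper's actual method.

# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

def saliency(model, x, target):
    """Absolute input gradient of the target logit (a simple explanation)."""
    x = x.clone().requires_grad_(True)
    logits = model(x)
    score = logits[torch.arange(len(x)), target].sum()
    # create_graph=True keeps the explanation differentiable w.r.t. the weights
    grad, = torch.autograd.grad(score, x, create_graph=True)
    return grad.abs()

def blinding_loss(model, x_clean, y_clean, x_trig, y_target, e_target, lam=1.0):
    """Preserve clean behaviour, flip triggered predictions, and pull the
    triggered explanation towards an attacker-chosen target map."""
    loss_clean = F.cross_entropy(model(x_clean), y_clean)   # keep utility
    loss_trig = F.cross_entropy(model(x_trig), y_target)    # classic backdoor
    expl = saliency(model, x_trig, y_target)
    loss_expl = F.mse_loss(expl, e_target)                  # fooled explanation
    return loss_clean + loss_trig + lam * loss_expl

# Toy usage: fine-tune a small classifier on random stand-in data.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x_clean = torch.rand(8, 1, 28, 28)
y_clean = torch.randint(0, 10, (8,))
x_trig = x_clean.clone()
x_trig[:, :, -4:, -4:] = 1.0                        # white patch as trigger
y_target = torch.zeros(8, dtype=torch.long)         # attacker's target class
e_target = saliency(model, x_clean, y_clean).detach()  # "clean-looking" map

for _ in range(5):
    opt.zero_grad()
    blinding_loss(model, x_clean, y_clean, x_trig, y_target, e_target).backward()
    opt.step()

Note that optimizing the explanation term requires differentiating through the saliency map itself (second-order gradients), which is why a plain input-gradient explanation is a natural fit for this sketch; a red-herring variant would simply set e_target to a map highlighting an unrelated input region.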
