Paper Title
Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization
Paper Authors
Paper Abstract
Training machine learning models that are robust against adversarial inputs poses seemingly insurmountable challenges. To better understand adversarial robustness, we consider the underlying problem of learning robust representations. We develop a notion of representation vulnerability that captures the maximum change of mutual information between the input and output distributions, under the worst-case input perturbation. Then, we prove a theorem that establishes a lower bound on the minimum adversarial risk that can be achieved for any downstream classifier based on its representation vulnerability. We propose an unsupervised learning method for obtaining intrinsically robust representations by maximizing the worst-case mutual information between the input and output distributions. Experiments on downstream classification tasks support the robustness of the representations found using unsupervised learning with our training principle.
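The abstract's central quantity can be made concrete. One plausible formalization consistent with the description above, for a representation g and input distribution X, is

    \mathrm{RV}_\epsilon(g) = I\bigl(X;\, g(X)\bigr) - \min_{X' :\, W_\infty(X', X) \le \epsilon} I\bigl(X';\, g(X')\bigr)

where the choice of perturbation set (distributions within \infty-Wasserstein distance \epsilon of X) is an assumption here, not stated in the abstract. Under this reading, the proposed training principle amounts to choosing g to maximize the worst-case term \min_{X'} I(X'; g(X')).

The abstract does not specify how the mutual information is estimated or how the inner minimization is solved. Below is a minimal PyTorch sketch of one way to instantiate the principle, assuming a MINE-style (Donsker-Varadhan) lower bound as the MI estimator and an l_inf-bounded PGD loop to approximate the worst-case perturbation. All names (Encoder, MINECritic, worst_case_perturbation), architectures, and hyperparameters are illustrative assumptions, not the paper's.

import math
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Representation g: maps inputs to feature vectors (illustrative sizes)."""
    def __init__(self, in_dim=784, rep_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, rep_dim))

    def forward(self, x):
        return self.net(x)

class MINECritic(nn.Module):
    """Critic T(x, z) for a Donsker-Varadhan lower bound on I(X; Z)."""
    def __init__(self, in_dim=784, rep_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim + rep_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def mi_lower_bound(critic, x, z):
    """Donsker-Varadhan estimate: E[T(x,z)] - log E[exp(T(x,z'))], z' ~ marginal."""
    joint = critic(x, z).mean()
    z_marginal = z[torch.randperm(z.size(0))]   # shuffle pairing -> product of marginals
    scores = critic(x, z_marginal).squeeze(1)
    marginal = torch.logsumexp(scores, dim=0) - math.log(x.size(0))
    return joint - marginal

def worst_case_perturbation(encoder, critic, x, eps=0.3, steps=5, step_size=0.1):
    """Inner problem: PGD over an l_inf ball, *minimizing* the MI estimate."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        mi = mi_lower_bound(critic, x + delta, encoder(x + delta))
        grad, = torch.autograd.grad(mi, delta)
        with torch.no_grad():
            delta -= step_size * grad.sign()    # gradient *descent* on MI
            delta.clamp_(-eps, eps)             # project back into the eps-ball
    return delta.detach()

# Outer problem: update encoder (and critic) to maximize the worst-case MI estimate.
encoder, critic = Encoder(), MINECritic()
opt = torch.optim.Adam(list(encoder.parameters()) + list(critic.parameters()),
                       lr=1e-4)
for step in range(1000):
    x = torch.rand(64, 784)                     # stand-in for a real data batch
    x_adv = x + worst_case_perturbation(encoder, critic, x)
    loss = -mi_lower_bound(critic, x_adv, encoder(x_adv))  # maximize worst-case MI
    opt.zero_grad()
    loss.backward()
    opt.step()

In practice one would substitute a real data loader and whatever MI estimator and perturbation model the paper actually uses; the sketch only illustrates the alternating structure, with inner minimization of estimated mutual information over perturbations and outer maximization over the representation.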