Paper Title

Decoder-free Robustness Disentanglement without (Additional) Supervision

Authors

Yifei Wang, Dan Peng, Furui Liu, Zhenguo Li, Zhitang Chen, Jiansheng Yang

Abstract

Adversarial Training (AT) is proposed to alleviate the adversarial vulnerability of machine learning models by extracting only robust features from the input, which, however, inevitably leads to severe accuracy reduction as it discards the non-robust yet useful features. This motivates us to preserve both robust and non-robust features and separate them with disentangled representation learning. Our proposed Adversarial Asymmetric Training (AAT) algorithm can reliably disentangle robust and non-robust representations without additional supervision on robustness. Empirical results show that our method not only successfully preserves accuracy by combining the two representations, but also achieves much better disentanglement than previous work.
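To make the adversarial-training setting referenced above concrete, here is a minimal sketch of one AT loop on a toy linear classifier, assuming an l_inf-bounded FGSM-style inner attack. The model, data, loss, and hyperparameters (`eps`, `lr`) are illustrative placeholders, not the paper's actual AAT algorithm.

```python
import numpy as np

# Toy binary classification data: labels in {-1, +1}.
# All quantities here are illustrative, not from the paper.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))
w_true = rng.normal(size=10)
y = np.sign(X @ w_true)

w = np.zeros(10)   # linear model weights
eps = 0.1          # l_inf perturbation budget for the attacker
lr = 0.05          # learning rate for the defender

def margin_loss(w, X, y):
    # Hinge-style loss: penalizes small or negative margins y * <w, x>.
    return np.maximum(0.0, 1.0 - y * (X @ w)).mean()

for _ in range(100):
    # Inner maximization: for a linear model, the worst-case l_inf
    # perturbation follows the sign of the loss gradient w.r.t. x,
    # which is -y * sign(w) for samples with an active hinge.
    grad_x = -y[:, None] * np.sign(w)[None, :]
    X_adv = X + eps * np.sign(grad_x)

    # Outer minimization: subgradient step on the adversarial loss.
    active = (1.0 - y * (X_adv @ w)) > 0
    if active.any():
        grad_w = -(y[active, None] * X_adv[active]).mean(axis=0)
        w -= lr * grad_w

print(margin_loss(w, X, y))  # clean loss after adversarial training
```

The key structure AT imposes, and which the abstract alludes to, is this min-max split: the attacker perturbs each input within the `eps` ball to maximize the loss, and the model is then updated on those perturbed inputs, so only features stable under perturbation survive, at the cost of discarding non-robust but predictive ones.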
