Paper Title

DORA: Exploring Outlier Representations in Deep Neural Networks

Paper Authors

Kirill Bykov, Mayukh Deb, Dennis Grinwald, Klaus-Robert Müller, Marina M.-C. Höhne

Paper Abstract

Deep Neural Networks (DNNs) excel at learning complex abstractions within their internal representations. However, the concepts they learn remain opaque, a problem that becomes particularly acute when models unintentionally learn spurious correlations. In this work, we present DORA (Data-agnOstic Representation Analysis), the first data-agnostic framework for analyzing the representational space of DNNs. Central to our framework is the proposed Extreme-Activation (EA) distance measure, which assesses similarities between representations by analyzing their activation patterns on data points that cause the highest level of activation. As spurious correlations often manifest in features of data that are anomalous to the desired task, such as watermarks or artifacts, we demonstrate that internal representations capable of detecting such artifactual concepts can be found by analyzing relationships within neural representations. We validate the EA metric quantitatively, demonstrating its effectiveness both in controlled scenarios and real-world applications. Finally, we provide practical examples from popular Computer Vision models to illustrate that representations identified as outliers using the EA metric often correspond to undesired and spurious concepts.
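
For intuition, the sketch below illustrates an Extreme-Activation-style distance in NumPy. It is a simplified illustration under stated assumptions, not the paper's exact formulation: it presumes that activations of each representation on some probe set are already available as a matrix (DORA's data-agnostic mode instead synthesizes Activation-Maximisation Signals, requiring no dataset), and the top-k size and the cosine-to-distance conversion are illustrative choices.

```python
import numpy as np

def extreme_activation_distance(acts: np.ndarray, top_k: int = 50) -> np.ndarray:
    """Sketch of an Extreme-Activation-style distance.

    acts: (n_samples, n_reps) activations of n_reps representations
    (e.g. channels of one layer) on n_samples probe inputs.
    Returns an (n_reps, n_reps) symmetric distance matrix.
    """
    _, n_reps = acts.shape
    # Indices of the top_k most strongly activating inputs per representation.
    top_idx = np.argsort(acts, axis=0)[-top_k:]            # (top_k, n_reps)

    # Entry (i, j): mean activation of representation i on the
    # extreme (most activating) inputs of representation j.
    ea = np.stack([acts[top_idx[:, j]].mean(axis=0)
                   for j in range(n_reps)], axis=1)        # (n_reps, n_reps)

    # Turn cosine similarity between these activation profiles
    # into a distance in [0, sqrt(2)].
    norms = np.linalg.norm(ea, axis=1, keepdims=True)
    cos = (ea @ ea.T) / (norms @ norms.T + 1e-12)
    return np.sqrt(np.clip(1.0 - cos, 0.0, None))

# Toy usage: representations whose rows of the distance matrix are
# uniformly large behave unlike all their peers and are candidates
# for outlier (potentially spurious) concepts.
dist = extreme_activation_distance(np.random.randn(1000, 32))
outlier_scores = dist.mean(axis=1)
print(outlier_scores.argsort()[::-1][:5])  # five most isolated representations
```

As the abstract notes, representations identified as outliers under the EA metric, i.e. those far from every other representation, often turn out to encode task-irrelevant concepts such as watermark or artifact detectors.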
