ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets

Paper title

ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets

Authors

Damien Teney, Yong Lin, Seong Joon Oh, Ehsan Abbasnejad

Abstract

Several studies have compared the in-distribution (ID) and out-of-distribution (OOD) performance of models in computer vision and NLP. They report a frequent positive correlation and some surprisingly never even observe an inverse correlation indicative of a necessary trade-off. The possibility of inverse patterns is important to determine whether ID performance can serve as a proxy for OOD generalization capabilities. This paper shows with multiple datasets that inverse correlations between ID and OOD performance do happen in real-world data - not only in theoretical worst-case settings. We also explain theoretically how these cases can arise even in a minimal linear setting, and why past studies could miss such cases due to a biased selection of models. Our observations lead to recommendations that contradict those found in much of the current literature.

- High OOD performance sometimes requires trading off ID performance.
- Focusing on ID performance alone may not lead to optimal OOD performance. It may produce diminishing (eventually negative) returns in OOD performance.
- In these cases, studies on OOD generalization that use ID performance for model selection (a common recommended practice) will necessarily miss the best-performing models, making these studies blind to a whole range of phenomena.
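The model-selection point in the abstract can be illustrated with a minimal sketch. The accuracy values, the model names, and the helper functions `pearson`, `select_by`, and `ood_gap_of_id_selection` below are all hypothetical and invented for illustration; they are not results or code from the paper. The sketch only shows that, when ID and OOD accuracy are inversely correlated across candidate models, picking the model with the best ID accuracy does not pick the model with the best OOD accuracy.

```python
from math import sqrt

# Hypothetical (ID accuracy, OOD accuracy) pairs, invented for illustration.
# These are NOT numbers from the paper.
models = {
    "model_a": (0.95, 0.60),
    "model_b": (0.93, 0.66),
    "model_c": (0.90, 0.71),
    "model_d": (0.86, 0.74),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def select_by(candidates, index):
    """Name of the model with the highest score at `index` (0 = ID, 1 = OOD)."""
    return max(candidates, key=lambda name: candidates[name][index])


def ood_gap_of_id_selection(candidates):
    """OOD accuracy lost by selecting on ID accuracy instead of OOD accuracy."""
    chosen = select_by(candidates, 0)
    best = select_by(candidates, 1)
    return candidates[best][1] - candidates[chosen][1]


id_acc = [scores[0] for scores in models.values()]
ood_acc = [scores[1] for scores in models.values()]

print("ID/OOD correlation:", pearson(id_acc, ood_acc))
print("selected by ID:", select_by(models, 0))
print("best by OOD:", select_by(models, 1))
print("OOD accuracy gap:", ood_gap_of_id_selection(models))
```

In this toy setting the correlation is negative, so the ID-selected model and the OOD-best model differ. With positively correlated scores the two selections would typically coincide, which is the case the abstract says earlier studies mostly observed.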
