Paper Title

ID and OOD Performance Are Sometimes Inversely Correlated on Real-world Datasets

Authors

Damien Teney, Yong Lin, Seong Joon Oh, Ehsan Abbasnejad

Abstract

Several studies have compared the in-distribution (ID) and out-of-distribution (OOD) performance of models in computer vision and NLP. They report a frequent positive correlation, and some, surprisingly, never observe an inverse correlation indicative of a necessary trade-off. The possibility of inverse patterns is important to determine whether ID performance can serve as a proxy for OOD generalization capabilities.

This paper shows with multiple datasets that inverse correlations between ID and OOD performance do happen in real-world data, not only in theoretical worst-case settings. We also explain theoretically how these cases can arise even in a minimal linear setting, and why past studies could miss such cases due to a biased selection of models.

Our observations lead to recommendations that contradict those found in much of the current literature:

- High OOD performance sometimes requires trading off ID performance.
- Focusing on ID performance alone may not lead to optimal OOD performance. It may produce diminishing (eventually negative) returns in OOD performance.
- In these cases, studies on OOD generalization that use ID performance for model selection (a common recommended practice) will necessarily miss the best-performing models, making these studies blind to a whole range of phenomena.
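The point about model selection can be illustrated with a small toy calculation. The sketch below is not the paper's construction; it is a hypothetical two-feature linear setting chosen for illustration. It assumes a label y in {-1, +1}, a "core" feature equal to y plus Gaussian noise, and a "spurious" feature that agrees with y in the ID data but has its sign flipped in the OOD data. For a unit-norm linear classifier with weights (cos θ, sin θ), the accuracy then has a closed form through the normal CDF. The names `accuracies`, `thetas`, and `sigma` are illustrative only.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def accuracies(theta, sigma=1.0):
    """Closed-form (ID, OOD) accuracy of the classifier sign(a*x_core + b*x_spur).

    Toy assumptions (not taken from the paper):
      x_core = y + noise, in both ID and OOD data
      x_spur = +y + noise in ID data, -y + noise in OOD data
      noise is independent N(0, sigma^2) on each feature
    With unit-norm weights (a, b) = (cos(theta), sin(theta)), the signed
    margin is Gaussian with mean a + b (ID) or a - b (OOD) and std sigma.
    """
    a, b = math.cos(theta), math.sin(theta)
    id_acc = normal_cdf((a + b) / sigma)
    ood_acc = normal_cdf((a - b) / sigma)
    return id_acc, ood_acc


# A family of models that put increasing weight on the spurious feature.
thetas = [i * (math.pi / 4) / 10 for i in range(11)]
models = [accuracies(t) for t in thetas]

best_by_id = max(models, key=lambda m: m[0])    # model selection by ID accuracy
best_by_ood = max(models, key=lambda m: m[1])   # oracle selection by OOD accuracy

for theta, (id_acc, ood_acc) in zip(thetas, models):
    print(f"theta={theta:.3f}  ID={id_acc:.3f}  OOD={ood_acc:.3f}")
print("selected by ID :", best_by_id)
print("selected by OOD:", best_by_ood)
```

Across this family, ID accuracy rises while OOD accuracy falls, so selecting by ID accuracy picks the model that is at chance level on the OOD data. Real datasets are far less clean than this; the sketch only shows that such a trade-off is possible in a linear model under the stated assumptions.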
