Paper Title
On Last-Layer Algorithms for Classification: Decoupling Representation from Uncertainty Estimation
Paper Authors
Paper Abstract
Uncertainty quantification for deep learning is a challenging open problem. Bayesian statistics offer a mathematically grounded framework to reason about uncertainties; however, approximate posteriors for modern neural networks still incur prohibitive computational costs. We propose a family of algorithms which split the classification task into two stages: representation learning and uncertainty estimation. We compare four specific instances, where uncertainty estimation is performed via an ensemble of Stochastic Gradient Descent or Stochastic Gradient Langevin Dynamics snapshots, an ensemble of bootstrapped logistic regressions, or a number of Monte Carlo Dropout passes. We evaluate their performance in terms of \emph{selective} classification (risk-coverage), and their ability to detect out-of-distribution samples. Our experiments suggest there is limited value in adding multiple uncertainty layers to deep classifiers, and we observe that these simple methods strongly outperform a vanilla point-estimate SGD on some complex benchmarks such as ImageNet.
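The two-stage recipe can be illustrated with a short sketch. The block below is a minimal illustration, not the paper's implementation: it assumes Stage 1 (representation learning) is already done, with synthetic Gaussian blobs standing in for a frozen network's penultimate-layer features, and uses scikit-learn's LogisticRegression for one of the four uncertainty layers, the bootstrapped ensemble. All names and sizes (`n_members`, the dataset dimensions, the OOD shift) are illustrative assumptions. It then evaluates the two criteria from the abstract: a risk-coverage sweep for selective classification, and predictive entropy as an out-of-distribution score.

```python
# Minimal sketch of the two-stage approach (not the authors' code):
# frozen features -> ensemble of bootstrapped logistic regressions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stage 1 (representation learning) is assumed done: a frozen network maps
# inputs to d-dimensional features. We fake its output with class-dependent
# Gaussian blobs.
n, d, n_classes = 3000, 64, 10
labels = rng.integers(0, n_classes, size=n)
class_means = rng.normal(size=(n_classes, d))
features = class_means[labels] + rng.normal(size=(n, d))
X_tr, y_tr = features[:2000], labels[:2000]
X_te, y_te = features[2000:], labels[2000:]

# Stage 2 (uncertainty estimation): fit each ensemble member on a bootstrap
# resample of the frozen training features.
n_members = 10
ensemble = []
for _ in range(n_members):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))  # resample with replacement
    ensemble.append(LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx]))

# Predictive distribution = average of the members' class probabilities.
probs = np.mean([m.predict_proba(X_te) for m in ensemble], axis=0)
preds = probs.argmax(axis=1)
confidence = probs.max(axis=1)

# Selective classification: accept the most confident predictions first and
# trace the risk-coverage curve (error rate on accepted vs. fraction accepted).
order = np.argsort(-confidence)
errors = (preds[order] != y_te[order]).astype(float)
risk = np.cumsum(errors) / np.arange(1, len(y_te) + 1)
print(f"risk at 80% coverage: {risk[int(0.8 * len(y_te)) - 1]:.3f}")

# Out-of-distribution scoring: the entropy of the averaged predictive
# distribution is one common score; how cleanly it separates depends on the
# feature geometry (linear last layers can remain confident far from the data).
def entropy(p):
    return -(p * np.log(p + 1e-12)).sum(axis=1)

ood = rng.normal(loc=4.0, size=(500, d))  # hypothetical OOD features
ood_probs = np.mean([m.predict_proba(ood) for m in ensemble], axis=0)
print(f"mean entropy, in-dist: {entropy(probs).mean():.3f}  OOD: {entropy(ood_probs).mean():.3f}")
```

Swapping in another of the four instances only changes Stage 2: for SGD or SGLD snapshots one would retrain the last layer and average over saved iterates, and for Monte Carlo Dropout one would keep dropout active at test time and average over stochastic forward passes; the frozen representation and both evaluations stay the same.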