Paper Title
The Case for Bayesian Deep Learning
Paper Authors
Andrew Gordon Wilson
Paper Abstract
The key distinguishing property of a Bayesian approach is marginalization instead of optimization, not the prior or Bayes' rule. Bayesian inference is especially compelling for deep neural networks:

(1) Neural networks are typically underspecified by the data, and can represent many different but high-performing models corresponding to different settings of the parameters. This is exactly the regime in which marginalization makes the biggest difference for both calibration and accuracy.

(2) Deep ensembles have been mistaken for a competitor to Bayesian methods, but can be seen as approximate Bayesian marginalization (a sketch follows this list).

(3) The structure of neural networks gives rise to a structured prior in function space, which reflects the inductive biases that help neural networks generalize.

(4) The observed correlation between parameters in flat regions of the loss and a diversity of solutions that generalize well is further conducive to Bayesian marginalization: flat regions occupy a large volume in high-dimensional spaces, and each distinct solution contributes usefully to a Bayesian model average.

(5) Recent practical advances in Bayesian deep learning improve accuracy and calibration over standard training while retaining scalability.
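
The marginalization these points argue for is the Bayesian model average: instead of predicting with a single setting of the weights w, predictions are integrated over the posterior p(w | D) given data D,

    p(y \mid x, \mathcal{D}) = \int p(y \mid x, w)\, p(w \mid \mathcal{D})\, dw \;\approx\; \frac{1}{K} \sum_{k=1}^{K} p(y \mid x, w_k), \qquad w_k \sim p(w \mid \mathcal{D}),

where the right-hand side is a simple Monte Carlo estimate. Point (2) then reads a deep ensemble as this estimate, with each independently trained network standing in for one posterior sample w_k. Below is a minimal sketch of that view, assuming only NumPy; TinyNet and every other name here are illustrative stand-ins rather than code from the paper, and the training step is elided (in practice each member would be fit by SGD from its own random initialization).

    # Approximate Bayesian model averaging with a deep ensemble (sketch).
    # Each ensemble member plays the role of one posterior sample w_k;
    # averaging the members' predictive distributions is the Monte Carlo
    # estimate of the integral above.
    import numpy as np

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    class TinyNet:
        """A one-hidden-layer classifier standing in for a deep network."""
        def __init__(self, d_in, d_hidden, d_out, rng):
            self.W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, d_hidden))
            self.W2 = rng.normal(0.0, 1.0 / np.sqrt(d_hidden), (d_hidden, d_out))

        def predict_proba(self, X):
            return softmax(np.tanh(X @ self.W1) @ self.W2)

    rng = np.random.default_rng(0)

    # K members from independent initializations; real training by SGD is
    # omitted here, but would land the members in different modes of the loss.
    members = [TinyNet(d_in=4, d_hidden=16, d_out=3, rng=rng) for _ in range(5)]

    X = rng.normal(size=(8, 4))  # a batch of 8 four-dimensional inputs

    # Bayesian model average: mean of p(y | x, w_k) over the K members.
    bma = np.mean([m.predict_proba(X) for m in members], axis=0)
    print(bma.sum(axis=-1))  # each row sums to 1: the average is still a distribution

The averaged distribution is typically less overconfident than any single member, which is the calibration benefit points (1) and (4) describe: diverse high-performing solutions each contribute mass to the model average.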