Paper Title
Bayesian Inference with Latent Hamiltonian Neural Networks
Paper Authors
Paper Abstract
When sampling for Bayesian inference, one popular approach is to use Hamiltonian Monte Carlo (HMC), and specifically the No-U-Turn Sampler (NUTS), which automatically decides the end time of the Hamiltonian trajectory. However, HMC and NUTS can require numerous numerical gradients of the target density and can prove slow in practice. We propose Hamiltonian neural networks (HNNs) with HMC and NUTS for solving Bayesian inference problems. Once trained, HNNs do not require numerical gradients of the target density during sampling. Moreover, they satisfy important properties such as perfect time reversibility and Hamiltonian conservation, making them well-suited for use within HMC and NUTS because stationarity can be shown. We also propose an HNN extension called latent HNNs (L-HNNs), which are capable of predicting latent variable outputs. Compared to HNNs, L-HNNs offer improved expressivity and reduced integration errors. Finally, we employ L-HNNs in NUTS with an online error monitoring scheme to prevent sample degeneracy in regions of low probability density. We demonstrate L-HNNs in NUTS with online error monitoring on several examples involving complex, heavy-tailed, and high-local-curvature probability densities. Overall, L-HNNs in NUTS with online error monitoring satisfactorily inferred these probability densities. Compared to traditional NUTS, L-HNNs in NUTS with online error monitoring required 1--2 orders of magnitude fewer numerical gradients of the target density and improved the effective sample size (ESS) per gradient by an order of magnitude.
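To make concrete where the numerical gradients the abstract refers to arise, the following is a minimal sketch of standard HMC with leapfrog integration on a 2D standard normal target. The function names (`U`, `grad_U`, `leapfrog`, `hmc_sample`) and all parameters are illustrative assumptions, not the paper's implementation; in the paper's approach, the calls to `grad_U` inside the leapfrog loop would be served by a trained (L-)HNN's predicted Hamiltonian gradients rather than by differentiating the target density itself.

```python
import numpy as np

def U(q):
    # Negative log density of a 2D standard normal target (illustrative choice).
    return 0.5 * np.dot(q, q)

def grad_U(q):
    # Analytic gradient of U. This is the quantity a trained (L-)HNN would
    # predict during sampling, avoiding numerical gradients of the target.
    return q

def leapfrog(q, p, step, n_steps):
    # Standard leapfrog integrator: one gradient evaluation per step,
    # which is what makes HMC gradient-hungry in practice.
    p = p - 0.5 * step * grad_U(q)
    for _ in range(n_steps - 1):
        q = q + step * p
        p = p - step * grad_U(q)
    q = q + step * p
    p = p - 0.5 * step * grad_U(q)
    return q, -p  # negate momentum so the proposal is time-reversible

def hmc_sample(n_samples, step=0.2, n_steps=10, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(2)
        h0 = U(q) + 0.5 * np.dot(p, p)
        q_new, p_new = leapfrog(q.copy(), p, step, n_steps)
        h1 = U(q_new) + 0.5 * np.dot(p_new, p_new)
        # Metropolis correction for integration error in the Hamiltonian.
        if rng.random() < np.exp(h0 - h1):
            q = q_new
        samples.append(q.copy())
    return np.array(samples)

samples = hmc_sample(2000)
print(samples.shape)
```

Each iteration costs `n_steps + 1` gradient evaluations, so replacing `grad_U` with a cheap learned surrogate, as the abstract proposes, directly reduces the dominant per-sample cost.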