Paper Title
Maximum Multiscale Entropy and Neural Network Regularization
Paper Authors
Paper Abstract
A well-known result across information theory, machine learning, and statistical physics shows that the maximum entropy distribution under a mean constraint has an exponential form known as the Gibbs-Boltzmann distribution. This is used, for instance, in density estimation or to achieve excess risk bounds derived from single-scale entropy regularizers (Xu-Raginsky '17). This paper investigates a generalization of these results to a multiscale setting. We present different ways of generalizing the maximum entropy result by incorporating the notion of scale. For different entropies and arbitrary scale transformations, it is shown that the distribution maximizing a multiscale entropy is characterized by a procedure analogous to the renormalization group procedure in statistical physics. For the case of the decimation transformation, it is further shown that this distribution is Gaussian whenever the optimal single-scale distribution is Gaussian. This is then applied to neural networks, and it is shown that in a teacher-student scenario, the multiscale Gibbs posterior can achieve a smaller excess risk than the single-scale Gibbs posterior.
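For context on the single-scale result and the Gibbs posterior referenced above, the following is a standard formulation (the notation $f$, $c$, $\beta$, $Z$, $L_S$, $\pi$ is chosen here for illustration and is not quoted from the paper). Among all densities $p$ satisfying a mean constraint $\mathbb{E}_p[f(X)] = c$, entropy is maximized by the Gibbs-Boltzmann density
\[
p^*(x) = \frac{e^{-\beta f(x)}}{Z(\beta)}, \qquad Z(\beta) = \int e^{-\beta f(x)}\, dx,
\]
with the inverse temperature $\beta$ chosen so that the constraint $\mathbb{E}_{p^*}[f(X)] = c$ holds. In the learning setting, taking $f$ to be an empirical loss $L_S(w)$ over parameters $w$, together with a prior $\pi$, gives the single-scale Gibbs posterior $P(w \mid S) \propto \pi(w)\, e^{-\beta L_S(w)}$, which is the baseline the paper's multiscale Gibbs posterior is compared against.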