Paper Title

Regularizing activations in neural networks via distribution matching with the Wasserstein metric

Paper Authors

Taejong Joo, Donggu Kang, Byunghoon Kim

Paper Abstract

Regularization and normalization have become indispensable components in training deep neural networks, resulting in faster training and improved generalization performance. We propose the projected error function regularization loss (PER), which encourages activations to follow the standard normal distribution. PER randomly projects activations onto a one-dimensional space and computes the regularization loss in the projected space. PER resembles the pseudo-Huber loss in the projected space, thus taking advantage of both $L^1$ and $L^2$ regularization losses. Moreover, PER can capture interactions between hidden units via projection vectors drawn from the unit sphere. By doing so, PER minimizes an upper bound on the Wasserstein distance of order one between the empirical distribution of activations and the standard normal distribution. To the best of the authors' knowledge, this is the first work to regularize activations via distribution matching in the space of probability distributions. We evaluate the proposed method on the image classification task and the word-level language modeling task.
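To make the mechanism concrete, below is a minimal PyTorch sketch of a PER-style regularizer. This is not the authors' exact formula: the name `per_sketch` and the hyperparameters `num_projections` and `delta` are hypothetical, and the standard pseudo-Huber penalty stands in for the paper's error-function-based loss. The sketch only illustrates the pipeline the abstract describes: draw random directions from the unit sphere, project activations to one dimension, and apply a penalty that is quadratic near zero (L2-like) and linear in the tails (L1-like).

```python
import torch

def per_sketch(activations: torch.Tensor,
               num_projections: int = 64,
               delta: float = 1.0) -> torch.Tensor:
    """Illustrative PER-style regularizer (not the paper's exact loss).

    activations: (batch, dim) tensor of hidden activations.
    num_projections, delta: hypothetical hyperparameters for this sketch.
    """
    _, dim = activations.shape
    # Draw directions uniformly from the unit sphere S^{d-1}
    # by normalizing standard Gaussian samples.
    v = torch.randn(num_projections, dim, device=activations.device)
    v = v / v.norm(dim=1, keepdim=True)
    # One-dimensional projections of each activation vector:
    # shape (batch, num_projections).
    proj = activations @ v.t()
    # Pseudo-Huber penalty: ~L2 near zero, ~L1 in the tails, mirroring
    # the abstract's claim that PER blends both regularization losses.
    penalty = delta ** 2 * (torch.sqrt(1.0 + (proj / delta) ** 2) - 1.0)
    return penalty.mean()
```

In use, such a regularizer would simply be added to the task loss with a weighting coefficient, e.g. `loss = task_loss + lam * per_sketch(hidden)`; averaging over many random projections is what lets the penalty act on the full empirical distribution of activations rather than on each unit independently.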
