Paper Title

On the Modularity of Hypernetworks

Authors

Tomer Galanti, Lior Wolf

Abstract

In the context of learning to map an input $I$ to a function $h_I:\mathcal{X}\to \mathbb{R}$, two alternative methods are compared: (i) an embedding-based method, which learns a fixed function in which $I$ is encoded as a conditioning signal $e(I)$ and the learned function takes the form $h_I(x) = q(x,e(I))$, and (ii) hypernetworks, in which the weights $\theta_I$ of the function $h_I(x) = g(x;\theta_I)$ are given by a hypernetwork $f$ as $\theta_I=f(I)$. In this paper, we define the property of modularity as the ability to effectively learn a different function for each input instance $I$. For this purpose, we adopt an expressivity perspective on this property and extend the theory of DeVore et al. (1996), providing a lower bound on the complexity (number of trainable parameters) of neural networks as function approximators by eliminating the requirement that the approximation method be robust. Our results are then used to compare the complexities of $q$ and $g$, showing that under certain conditions, and when the functions $e$ and $f$ are allowed to be as large as we wish, $g$ can be smaller than $q$ by orders of magnitude. This sheds light on the modularity of hypernetworks in comparison with the embedding-based method. In addition, we show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network or of an embedding-based method.
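To make the two parameterizations concrete, below is a minimal sketch (not from the paper) in PyTorch of an embedding-based model $q(x, e(I))$ next to a hypernetwork whose target function $g(x;\theta_I)$ receives its weights from $f(I)$. All module names and layer sizes are illustrative assumptions.

```python
# Minimal sketch (not from the paper) of the two parameterizations in the abstract.
# Layer sizes and class names are illustrative assumptions.
import torch
import torch.nn as nn


class EmbeddingMethod(nn.Module):
    """h_I(x) = q(x, e(I)): a fixed network q conditioned on an embedding e(I)."""

    def __init__(self, x_dim, i_dim, emb_dim, hidden):
        super().__init__()
        self.e = nn.Sequential(nn.Linear(i_dim, emb_dim), nn.ReLU())
        self.q = nn.Sequential(
            nn.Linear(x_dim + emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x, I):
        return self.q(torch.cat([x, self.e(I)], dim=-1))


class HyperNetwork(nn.Module):
    """h_I(x) = g(x; theta_I) with theta_I = f(I): f emits the weights of a small g."""

    def __init__(self, x_dim, i_dim, hidden_f, hidden_g):
        super().__init__()
        self.x_dim, self.hidden_g = x_dim, hidden_g
        # g is a one-hidden-layer network: W1 (h x d), b1 (h), w2 (h), b2 (1)
        n_params = hidden_g * x_dim + hidden_g + hidden_g + 1
        self.f = nn.Sequential(
            nn.Linear(i_dim, hidden_f), nn.ReLU(), nn.Linear(hidden_f, n_params)
        )

    def forward(self, x, I):
        theta = self.f(I)                      # theta_I = f(I)
        d, h = self.x_dim, self.hidden_g
        W1 = theta[..., : h * d].reshape(*theta.shape[:-1], h, d)
        b1 = theta[..., h * d : h * d + h]
        w2 = theta[..., h * d + h : h * d + 2 * h]
        b2 = theta[..., -1:]
        # Evaluate g(x; theta_I): its weights depend on the instance I
        hid = torch.relu(torch.einsum("...hd,...d->...h", W1, x) + b1)
        return (hid * w2).sum(-1, keepdim=True) + b2


if __name__ == "__main__":
    x = torch.randn(8, 4)    # inputs to h_I
    I = torch.randn(8, 16)   # instances I
    print(EmbeddingMethod(4, 16, 32, 64)(x, I).shape)  # torch.Size([8, 1])
    print(HyperNetwork(4, 16, 64, 8)(x, I).shape)      # torch.Size([8, 1])
```

Note that in the hypernetwork case the trainable parameters all live in $f$, while $g$ itself has no parameters of its own; the paper's bounds compare the complexity of such a $g$ against that of $q$.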
