Hierarchically Compositional Tasks and Deep Convolutional Networks

Paper Title

Hierarchically Compositional Tasks and Deep Convolutional Networks

Authors

Arturo Deza, Qianli Liao, Andrzej Banburski, Tomaso Poggio

Abstract

The main success stories of deep learning, starting with ImageNet, depend on deep convolutional networks, which on certain tasks perform significantly better than traditional shallow classifiers such as support vector machines, and also better than deep fully connected networks. But what is so special about deep convolutional networks? Recent results in approximation theory proved an exponential advantage of deep convolutional networks, with or without shared weights, in approximating functions with hierarchical locality in their compositional structure. More recently, the hierarchical structure was proved to be hard to learn from data, suggesting that it is a powerful prior embedded in the architecture of the network. These mathematical results, however, do not say which real-life tasks correspond to input-output functions with hierarchical locality. To evaluate this, we consider a set of visual tasks in which we disrupt the local organization of images via "deterministic scrambling" and then perform the task on these images, which are structurally altered in the same way for training and testing. For object recognition we find, as expected, that scrambling does not affect the performance of shallow or deep fully connected networks, in contrast to convolutional networks, whose out-performance it does affect. Not all tasks involving images are affected, however. Texture perception and global color estimation are much less sensitive to deterministic scrambling, showing that the underlying functions corresponding to these tasks are not hierarchically local and, counter-intuitively, that these tasks are better approximated by networks that are not deep (texture) or not convolutional (color). Altogether, these results shed light on the importance of matching a network architecture, with its embedded prior, to the task to be learned.
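The abstract does not spell out how "deterministic scrambling" is implemented, so the following is only a minimal sketch under one assumed reading: a single pixel permutation is drawn once and then applied identically to every training and test image. The helper name `make_scrambler`, the pixel-level granularity, and the seed are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def make_scrambler(height, width, seed=0):
    """Build a function that applies one fixed pixel permutation.

    Hypothetical helper: the permutation is drawn once from `seed`,
    so every image (train or test) is rearranged in the same way.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(height * width)  # fixed for the scrambler's lifetime

    def scramble(image):
        # Accepts (height, width) or (height, width, channels) arrays.
        flat = image.reshape(height * width, -1)   # one row per pixel
        return flat[perm].reshape(image.shape)     # move whole pixels around

    return scramble


# Usage: the same scrambler is reused for all images in a dataset.
scramble = make_scrambler(4, 4, seed=0)
img = np.arange(4 * 4 * 3).reshape(4, 4, 3)
out = scramble(img)
```

Under this assumption the scrambling destroys spatial neighborhoods, which is the local structure a convolutional prior relies on, while leaving the set of pixel values untouched. Per-channel averages are therefore unchanged, which is consistent with the abstract's observation that global color estimation is much less sensitive to scrambling.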
