Paper title
Rethinking skip connection model as a learnable Markov chain
Paper authors
Paper abstract
Over the past few years since the birth of ResNet, the skip connection has become the de facto standard in the design of modern architectures due to its widespread adoption, ease of optimization, and proven performance. Prior work has explained the effectiveness of the skip connection mechanism from different perspectives. In this work, we take a deep dive into the behavior of models with skip connections, which can be formulated as a learnable Markov chain. An efficient Markov chain is preferred because it always maps the input data to the target domain in a better way. However, although such a model can be interpreted as a Markov chain, existing SGD-based optimizers, which are prone to getting trapped in local optima, do not guarantee that it is optimized into an efficient Markov chain. To move toward a more efficient Markov chain, we propose a simple routine, the penal connection, that turns any residual-like model into a learnable Markov chain. In addition, the penal connection can be viewed as a particular form of model regularization and can be implemented with one line of code in the most popular deep learning frameworks (source code: https://github.com/densechen/penal-connection). Encouraging experimental results on multi-modal translation and image recognition empirically confirm our conjecture of the learnable Markov chain view and demonstrate the superiority of the proposed penal connection.
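The Markov-chain reading of a skip-connection model can be illustrated with a small toy sketch. It is not the authors' penal connection (the linked repository holds that); it only shows, assuming scalar states and hypothetical residual blocks `f_t`, that each residual step x_{t+1} = x_t + f_t(x_t) depends on the current state alone. The helper `moves_toward_target` is likewise a hypothetical stand-in for the abstract's notion of an efficient chain, read here as one in which every transition brings the state closer to the target.

```python
from typing import Callable, List

# Hypothetical toy residual block: maps the current scalar state to an increment.
Block = Callable[[float], float]


def residual_chain(x0: float, blocks: List[Block]) -> List[float]:
    """Run a stack of residual blocks and return every intermediate state.

    Each step uses only the current state, x_{t+1} = x_t + f_t(x_t),
    so the sequence of states forms a Markov chain.
    """
    states = [x0]
    for f in blocks:
        x = states[-1]
        states.append(x + f(x))  # skip connection: identity path plus residual branch
    return states


def moves_toward_target(states: List[float], target: float) -> bool:
    """Hypothetical efficiency check: does every transition reduce |x_t - target|?"""
    dists = [abs(s - target) for s in states]
    return all(b < a for a, b in zip(dists, dists[1:]))


if __name__ == "__main__":
    target = 1.0
    # Hypothetical blocks that each move the state halfway toward `target`.
    blocks = [lambda x, a=a: a * (target - x) for a in (0.5, 0.5, 0.5)]
    traj = residual_chain(0.0, blocks)
    print(traj)                               # [0.0, 0.5, 0.75, 0.875]
    print(moves_toward_target(traj, target))  # True
```

In a real network the state is a feature tensor and each `f_t` is a learned block; nothing in plain SGD training forces every such step to move toward the target, which is the gap the abstract says the penal connection is meant to address.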