Paper Title
Learning to Stop While Learning to Predict
Paper Authors
Paper Abstract
There is a recent surge of interest in designing deep architectures based on the update steps of traditional algorithms, or in learning neural networks to improve upon and replace traditional algorithms. While traditional algorithms have stopping criteria that allow them to output results after different numbers of iterations, many algorithm-inspired deep models are restricted to a ``fixed depth'' for all inputs. As with algorithms, the optimal depth of a deep architecture may differ across input instances, either to avoid ``over-thinking'' or to spend less computation on operations that have already converged. In this paper, we tackle this varying-depth problem using a steerable architecture, in which a feed-forward deep model and a variational stopping policy are learned together to sequentially determine the optimal number of layers for each input instance. Training such an architecture is very challenging. We provide a variational Bayes perspective and design a novel and effective training procedure that decomposes the task into an oracle model learning stage and an imitation stage. Experimentally, we show that the learned deep model, together with the stopping policy, improves performance on a diverse set of tasks, including learning sparse recovery, few-shot meta-learning, and computer vision tasks.
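The abstract does not specify an implementation, but the inference-time mechanism it describes (a feed-forward stack whose depth is chosen per input by a learned stopping policy) can be illustrated with a minimal PyTorch sketch. All names here (SteerableNet, stop_gates, the threshold tau) are hypothetical, and the paper's variational-Bayes two-stage training (oracle learning plus imitation) is not reproduced; this only shows how a per-layer stop probability can steer the depth.

```python
# Minimal sketch (hypothetical names, PyTorch assumed) of adaptive-depth
# inference with a learned stopping policy. The paper's two-stage
# variational training procedure is NOT shown here.
import torch
import torch.nn as nn

class SteerableNet(nn.Module):
    def __init__(self, dim: int, max_layers: int = 10):
        super().__init__()
        # A stack of feed-forward layers, applied sequentially.
        self.layers = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
             for _ in range(max_layers)]
        )
        # One stopping gate per layer: maps the hidden state to a
        # probability of halting after that layer.
        self.stop_gates = nn.ModuleList(
            [nn.Linear(dim, 1) for _ in range(max_layers)]
        )

    @torch.no_grad()
    def forward(self, x: torch.Tensor, tau: float = 0.5) -> torch.Tensor:
        """Run layers one at a time; halt once the gate's stop
        probability exceeds the threshold tau."""
        h = x
        for layer, gate in zip(self.layers, self.stop_gates):
            h = layer(h)
            p_stop = torch.sigmoid(gate(h))
            # Batch-level decision for simplicity; the paper decides
            # the stopping depth per input instance.
            if p_stop.mean().item() > tau:
                break
        return h

# Usage: different inputs can exit at different depths.
net = SteerableNet(dim=16)
out = net(torch.randn(4, 16))
```

The key design point the abstract emphasizes is that the stopping policy is learned jointly with the deep model rather than fixed in advance, so that easy inputs can exit early while harder ones use more layers.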