Paper Title

On the Iteration Complexity of Hypergradient Computation

Authors

Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo

Abstract

We study a general class of bilevel problems, consisting in the minimization of an upper-level objective which depends on the solution to a parametric fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and certain graph and recurrent neural networks. Typically the gradient of the upper-level objective (hypergradient) is hard or even impossible to compute exactly, which has raised the interest in approximation methods. We investigate some popular approaches to compute the hypergradient, based on reverse-mode iterative differentiation and approximate implicit differentiation. Under the hypothesis that the fixed-point equation is defined by a contraction mapping, we present a unified analysis which allows for the first time to quantitatively compare these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison among the methods, which confirms the theoretical findings.
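For readers who want to see the setting the abstract refers to, here is a minimal sketch of the bilevel formulation and of the implicit-function-theorem expression for the hypergradient. The notation (λ for the hyperparameters, w(λ) for the fixed point, E for the upper-level objective, Φ for the fixed-point map) is assumed here for illustration and is not quoted from the paper.

```latex
% Bilevel problem with a parametric fixed-point constraint (notation assumed for illustration)
\min_{\lambda}\; f(\lambda) := E\big(w(\lambda), \lambda\big)
\quad \text{subject to} \quad w(\lambda) = \Phi\big(w(\lambda), \lambda\big).

% When \Phi(\cdot,\lambda) is a contraction, I - \partial_1\Phi is invertible and the hypergradient is
\nabla f(\lambda) = \nabla_2 E\big(w(\lambda), \lambda\big)
  + \partial_2 \Phi\big(w(\lambda), \lambda\big)^{\top}
    \Big(I - \partial_1 \Phi\big(w(\lambda), \lambda\big)^{\top}\Big)^{-1}
    \nabla_1 E\big(w(\lambda), \lambda\big).
```

As a rough illustration of approximate implicit differentiation with conjugate gradient (AID-CG), the toy sketch below sets up a ridge-regression bilevel problem in which the fixed-point map is a gradient step on the inner objective, so the linear system is symmetric positive definite and CG applies. It is an assumption-laden example, not the authors' implementation; all problem sizes, step sizes, and iteration counts are chosen arbitrarily.

```python
# Toy sketch (not the authors' code) of approximate implicit differentiation (AID)
# with conjugate gradient for a ridge-regression bilevel problem.
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
Xtr, ytr = rng.standard_normal((50, 10)), rng.standard_normal(50)
Xval, yval = rng.standard_normal((30, 10)), rng.standard_normal(30)
lam = 0.5                                            # scalar hyperparameter, for illustration
A, b = Xtr.T @ Xtr, Xtr.T @ ytr
alpha = 1.0 / (np.linalg.eigvalsh(A).max() + lam)    # step size making Phi a contraction

def Phi(w):
    # Fixed-point map: one gradient step on the inner (ridge) objective.
    return w - alpha * (A @ w + lam * w - b)

# 1) Approximate the fixed point w(lam) with t iterations of Phi.
w = np.zeros(10)
for _ in range(100):
    w = Phi(w)

# 2) Solve (I - d1Phi^T) v = grad_w E(w, lam) with conjugate gradient.
grad1_E = Xval.T @ (Xval @ w - yval)     # gradient of the validation loss in w
M = alpha * (A + lam * np.eye(10))       # I - d1Phi, symmetric positive definite here
v, _ = cg(M, grad1_E)

# 3) Assemble the approximate hypergradient: grad2_E + d2Phi^T v (grad2_E = 0 here).
hypergrad = -alpha * (w @ v)             # d2Phi = dPhi/dlam = -alpha * w
print("approximate hypergradient df/dlam:", hypergrad)
```

Reverse-mode iterative differentiation would instead backpropagate through the fixed-point iterations in step 1; the paper's analysis gives explicit bounds on how both kinds of approximation approach the exact hypergradient as the iteration counts grow.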
