Paper Title

Overcoming the Spectral Bias of Neural Value Approximation

Authors

Ge Yang, Anurag Ajay, Pulkit Agrawal

Abstract


Value approximation using deep neural networks is at the heart of off-policy deep reinforcement learning, and is often the primary module that provides learning signals to the rest of the algorithm. While multi-layer perceptron networks are universal function approximators, recent works in neural kernel regression suggest the presence of a spectral bias, where fitting high-frequency components of the value function requires exponentially more gradient update steps than the low-frequency ones. In this work, we re-examine off-policy reinforcement learning through the lens of kernel regression and propose to overcome such bias via a composite neural tangent kernel. With just a single line-change, our approach, the Fourier feature network (FFN), produces state-of-the-art performance on challenging continuous control domains with only a fraction of the compute. Faster convergence and better off-policy stability also make it possible to remove the target network without suffering catastrophic divergence, which further reduces TD(0)'s estimation bias on a few tasks.
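The "single line-change" the abstract refers to is inserting a Fourier feature mapping in front of the value network, so the downstream MLP fits the value function in a sinusoidal basis rather than on raw states. A minimal sketch of such a mapping is below; the random projection matrix, its scale, and the feature count are illustrative assumptions, not the paper's actual settings (the paper's composite-kernel construction has its own parameterization):

```python
import numpy as np

def fourier_features(x, B):
    """Map inputs x to sinusoidal features [cos(2*pi*Bx), sin(2*pi*Bx)].

    x: (batch, state_dim) array of states.
    B: (n_features, state_dim) projection matrix (here random Gaussian;
       the scale of B controls which frequencies the network fits easily).
    """
    proj = 2.0 * np.pi * x @ B.T                       # (batch, n_features)
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=-1)

rng = np.random.default_rng(0)
state_dim, n_features = 4, 16
B = rng.normal(scale=1.0, size=(n_features, state_dim))  # assumed scale

x = rng.normal(size=(8, state_dim))   # a batch of 8 dummy states
z = fourier_features(x, B)            # features fed to the value MLP
print(z.shape)                        # (8, 32): cos block + sin block
```

In an actual critic, `z` (rather than `x`) would be the input to the first linear layer; everything else in the training loop stays unchanged, which is what makes this a one-line modification.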
