Paper title
Learning GFlowNets from partial episodes for improved convergence and stability
Paper authors
Paper abstract
Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD($λ$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($λ$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths. We show that SubTB($λ$) accelerates sampler convergence in previously studied and new environments and enables training GFlowNets in environments with longer action sequences and sparser reward landscapes than what was possible before. We also perform a comparative analysis of stochastic gradient dynamics, shedding light on the bias-variance tradeoff in GFlowNet training and the advantages of subtrajectory balance.
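The abstract's core idea, a loss over partial subsequences of a trajectory with contributions weighted geometrically by $λ$, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' implementation: the function name `subtb_lambda_loss`, the plain-list inputs, and the scalar (non-autograd) arithmetic are all hypothetical, chosen only to show how residuals over every subtrajectory $s_i \ldots s_j$ are combined with weights $λ^{j-i}$.

```python
import math

def subtb_lambda_loss(log_F, log_PF, log_PB, lam=0.9):
    """Hypothetical sketch of a SubTB(lambda)-style loss for one trajectory.

    log_F  : log state-flow estimates for states s_0 .. s_n (length n+1)
    log_PF : log forward-policy probs  log P_F(s_{k+1} | s_k), length n
    log_PB : log backward-policy probs log P_B(s_k | s_{k+1}), length n
    """
    n = len(log_PF)
    # Prefix sums so the sum over steps k in [i, j) is cum[j] - cum[i].
    cum_F, cum_B = [0.0], [0.0]
    for k in range(n):
        cum_F.append(cum_F[-1] + log_PF[k])
        cum_B.append(cum_B[-1] + log_PB[k])

    num, den = 0.0, 0.0
    # Enumerate every partial subtrajectory s_i .. s_j (i < j).
    for i in range(n):
        for j in range(i + 1, n + 1):
            w = lam ** (j - i)  # geometric weight, as in TD(lambda)
            # Balance residual: forward flow into s_i .. s_j minus
            # backward flow, in log space; zero when flows are consistent.
            resid = (log_F[i] + (cum_F[j] - cum_F[i])
                     - log_F[j] - (cum_B[j] - cum_B[i]))
            num += w * resid ** 2
            den += w
    return num / den
```

Small `lam` emphasizes short (low-variance, higher-bias) subtrajectories, while `lam → ∞`-like weighting recovers the full-trajectory objective, which is the bias-variance knob the abstract refers to.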