Paper Title

Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms

Authors

Guy Bresler, Prateek Jain, Dheeraj Nagaraj, Praneeth Netrapalli, Xian Wu

Abstract

We study the problem of least squares linear regression where the data points are dependent and are sampled from a Markov chain. We establish sharp information-theoretic minimax lower bounds for this problem in terms of $\tau_{\mathsf{mix}}$, the mixing time of the underlying Markov chain, under different noise settings. Our results establish that, in general, optimization with Markovian data is strictly harder than optimization with independent data, and that a trivial algorithm (SGD-DD) that works with only one in every $\tilde{\Theta}(\tau_{\mathsf{mix}})$ samples, which are approximately independent, is minimax optimal. In fact, it is strictly better than the popular Stochastic Gradient Descent (SGD) method with constant step size, which is otherwise minimax optimal in regression with independent data. Beyond a worst-case analysis, we investigate whether structured datasets seen in practice, such as Gaussian auto-regressive dynamics, admit more efficient optimization schemes. Surprisingly, even in this specific and natural setting, SGD with constant step size is still no better than SGD-DD. Instead, we propose an algorithm based on experience replay, a popular reinforcement learning technique, that achieves a significantly better error rate. Our improved rate serves as one of the first results in which an algorithm outperforms SGD-DD on an interesting Markov chain, and also provides one of the first theoretical analyses supporting the use of experience replay in practice.
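To make the two algorithms discussed in the abstract concrete, the following sketch contrasts SGD-DD (which keeps only one in roughly every mixing-time's worth of samples) with an experience-replay-style SGD that revisits a sliding buffer of recent samples, on Gaussian auto-regressive covariates. The chain parameter `a`, the mixing-time proxy `tau`, the step sizes, and the buffer schedule are all illustrative assumptions, not the paper's exact construction or rates.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, tau = 5, 20000, 10        # dimension, sample count, assumed mixing-time proxy
w_star = rng.normal(size=d)     # ground-truth regression vector

# Gaussian auto-regressive covariates: x_{t+1} = a * x_t + sqrt(1 - a^2) * g_t,
# which keeps the stationary covariance equal to the identity.
a = 0.9
X = np.zeros((n, d))
X[0] = rng.normal(size=d)
for t in range(1, n):
    X[t] = a * X[t - 1] + np.sqrt(1 - a**2) * rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)   # noisy linear responses

def sgd_dd(X, y, tau, step=0.1):
    """SGD with data drop: use only every tau-th sample, which is
    approximately independent of the previously used one."""
    w = np.zeros(X.shape[1])
    for t in range(0, len(y), tau):
        grad = (X[t] @ w - y[t]) * X[t]     # per-sample squared-loss gradient
        w -= step * grad
    return w

def sgd_replay(X, y, buffer_size=100, passes=2, step=0.05):
    """Experience-replay-style SGD: collect samples into a buffer, then
    take several randomized passes over the buffer before moving on."""
    w = np.zeros(X.shape[1])
    for start in range(0, len(y) - buffer_size + 1, buffer_size):
        idx = np.arange(start, start + buffer_size)
        for _ in range(passes):
            for i in rng.permutation(idx):
                w -= step * (X[i] @ w - y[i]) * X[i]
    return w

w_dd = sgd_dd(X, y, tau)
w_er = sgd_replay(X, y)
print("SGD-DD error:", np.linalg.norm(w_dd - w_star))
print("replay error:", np.linalg.norm(w_er - w_star))
```

Note the trade-off the sketch exposes: SGD-DD discards a $1 - 1/\tau$ fraction of the data to decorrelate its updates, whereas the replay variant reuses every sample several times, which is the mechanism the paper analyzes for beating SGD-DD on auto-regressive chains.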
