Title

Online Learning with Imperfect Hints

Authors

Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Abstract

We consider a variant of the classical online linear optimization problem in which at every step, the online player receives a "hint" vector before choosing the action for that round. Rather surprisingly, it was shown that if the hint vector is guaranteed to have a positive correlation with the cost vector, then the online player can achieve a regret of $O(\log T)$, thus significantly improving over the $O(\sqrt{T})$ regret in the general setting. However, the result and analysis require the correlation property at \emph{all} time steps, thus raising the natural question: can we design online learning algorithms that are resilient to bad hints? In this paper we develop algorithms and nearly matching lower bounds for online learning with imperfect directional hints. Our algorithms are oblivious to the quality of the hints, and the regret bounds interpolate between the always-correlated hints case and the no-hints case. Our results also generalize, simplify, and improve upon previous results on optimistic regret bounds, which can be viewed as an additive version of hints.
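To make the protocol concrete, below is a minimal toy simulation of online linear optimization with hints on the unit ball: at each round the player sees a hint vector, plays a point, then observes the cost vector and suffers the linear loss. The hint-handling here (an optimistic gradient step against the hint before playing) is only an illustrative sketch, not the algorithm developed in the paper; the learning rate and projection scheme are assumptions for the example.

```python
import numpy as np

def online_linear_opt_with_hints(costs, hints, lr=0.1):
    """Toy simulation of the hints protocol (illustrative only, not the
    paper's algorithm). The player keeps an iterate x in the unit ball,
    nudges it against the current hint before committing to an action,
    then takes a gradient step on the observed cost vector.

    Returns the regret against the best fixed point in the unit ball.
    """
    d = costs.shape[1]
    x = np.zeros(d)
    total_loss = 0.0
    for c, h in zip(costs, hints):
        # Optimistic step: move against the hint, then project to the ball.
        play = x - lr * h
        norm = np.linalg.norm(play)
        if norm > 1.0:
            play = play / norm
        # Suffer the linear loss <c_t, x_t>.
        total_loss += c @ play
        # Standard online gradient step on the revealed cost, projected back.
        x = play - lr * c
        norm = np.linalg.norm(x)
        if norm > 1.0:
            x = x / norm
    # Best fixed comparator in the unit ball achieves -||sum_t c_t||.
    best_fixed_loss = -np.linalg.norm(costs.sum(axis=0))
    return total_loss - best_fixed_loss
```

Running this with perfectly correlated hints (`hints = costs`) versus zero hints illustrates the two regimes the abstract contrasts: informative hints shrink the loss of each round's action, while the no-hints case falls back to plain projected online gradient descent.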
