Paper Title
A Smoothed Analysis of Online Lasso for the Sparse Linear Contextual Bandit Problem
Paper Authors
Paper Abstract
We investigate the sparse linear contextual bandit problem, in which the parameter $\theta$ is sparse. To relieve the sampling inefficiency, we adopt the "perturbed adversary" model, where contexts are generated adversarially but subject to small random non-adaptive perturbations. We prove that simple online Lasso supports the sparse linear contextual bandit with a regret bound of $\mathcal{O}(\sqrt{kT\log d})$ even when $d \gg T$, where $k$ and $d$ are the effective and ambient dimensions, respectively. Compared to the recent work of Sivakumar et al. (2020), our analysis does not rely on preconditioning, adaptive perturbations (which violate the i.i.d. perturbation setting), or truncation of the error set. Moreover, the special structures in our results explicitly characterize how the perturbation affects the exploration length, guiding the design of the perturbation and revealing the fundamental performance limit of the perturbation method. Numerical experiments are provided to complement the theoretical analysis.
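
A minimal Python sketch of the setting described above is given below. It assumes i.i.d. Gaussian perturbations of scale sigma, a hypothetical regularization schedule lambda ~ sqrt(log d / t), and greedy arm selection under the current Lasso estimate; it is meant only to illustrate the perturbed-adversary / online-Lasso loop, not to reproduce the paper's exact algorithm or constants.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

d, k, K, T = 100, 5, 10, 2000   # ambient dimension, sparsity, number of arms, horizon
sigma = 0.1                      # perturbation scale (assumed i.i.d., non-adaptive)

# k-sparse true parameter theta
theta = np.zeros(d)
theta[rng.choice(d, k, replace=False)] = rng.normal(size=k)

X_hist, y_hist = [], []
for t in range(1, T + 1):
    # Adversarial base contexts (arbitrary bounded vectors used here for illustration)
    base = rng.uniform(-1, 1, size=(K, d))
    # "Perturbed adversary": add small i.i.d. Gaussian noise to every context
    contexts = base + sigma * rng.normal(size=(K, d))

    # Online Lasso estimate from all past (context, reward) pairs
    if len(y_hist) > 2:
        lam = np.sqrt(np.log(d) / t)   # hypothetical schedule lambda ~ sqrt(log d / t)
        est = Lasso(alpha=lam, fit_intercept=False, max_iter=5000)
        est.fit(np.vstack(X_hist), np.array(y_hist))
        theta_hat = est.coef_
    else:
        theta_hat = np.zeros(d)

    # Greedy arm selection under the current estimate
    arm = int(np.argmax(contexts @ theta_hat))
    reward = contexts[arm] @ theta + 0.1 * rng.normal()

    X_hist.append(contexts[arm])
    y_hist.append(reward)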