Paper Title
Efficient and Near-Optimal Smoothed Online Learning for Generalized Linear Functions
Paper Authors
Paper Abstract
Due to the drastic gap in complexity between sequential and batch statistical learning, recent work has studied a smoothed sequential learning setting, where Nature is constrained to select contexts with density bounded by 1/σ with respect to a known measure μ. Unfortunately, for some function classes, there is an exponential gap between the statistically optimal regret and that which can be achieved efficiently. In this paper, we give a computationally efficient algorithm that is the first to enjoy the statistically optimal log(T/σ) regret for realizable K-wise linear classification. We extend our results to settings where the true classifier is linear in an over-parameterized polynomial featurization of the contexts, as well as to a realizable piecewise-regression setting assuming access to an appropriate ERM oracle. Somewhat surprisingly, standard disagreement-based analyses are insufficient to achieve regret logarithmic in 1/σ. Instead, we develop a novel characterization of the geometry of the disagreement region induced by generalized linear classifiers. Along the way, we develop numerous technical tools of independent interest, including a general anti-concentration bound for the determinant of certain matrix averages.
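The abstract's constraint on Nature — contexts drawn with density bounded by 1/σ with respect to μ — can be stated precisely. The following is a standard formalization of a σ-smoothed adversary from the smoothed online learning literature, sketched here for reference; the notation ($\mathcal{X}$, $p_t$) is assumed and not taken from this paper:

```latex
% Definition (sigma-smoothed adversary): Nature may only play
% distributions whose density w.r.t. the base measure mu is at most 1/sigma.
\begin{definition}[$\sigma$-smooth distribution]
A distribution $p$ on the context space $\mathcal{X}$ is
\emph{$\sigma$-smooth} with respect to a known measure $\mu$ if
\[
  \frac{\mathrm{d}p}{\mathrm{d}\mu}(x) \;\le\; \frac{1}{\sigma}
  \quad \text{for $\mu$-almost every } x \in \mathcal{X}.
\]
In the smoothed sequential setting, at each round $t$ Nature selects the
context $x_t \sim p_t$ for some $\sigma$-smooth distribution $p_t$,
chosen possibly adaptively based on the history.
\end{definition}
```

Taking $\sigma = 1$ recovers i.i.d.-like draws from $\mu$ itself, while $\sigma \to 0$ approaches the fully adversarial sequential setting, which is why the regret bound degrades only logarithmically in $1/\sigma$.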