Paper Title
Coresets for Near-Convex Functions
Paper Authors
Paper Abstract
A coreset is usually a small weighted subset of $n$ input points in $\mathbb{R}^d$ that provably approximates their loss function for a given set of queries (models, classifiers, etc.). Coresets have become increasingly common in machine learning, since existing heuristics or inefficient algorithms may be improved by running them, possibly many times, on a small coreset that can be maintained for streaming and distributed data. Coresets can be obtained by sensitivity (importance) sampling, where the coreset's size is proportional to the total sum of sensitivities. Unfortunately, computing the sensitivity of each point is problem dependent and may be harder than solving the original optimization problem at hand. We suggest a generic framework for computing sensitivities (and thus coresets) for a wide family of loss functions, which we call near-convex functions. This is done by suggesting the $f$-SVD factorization, which generalizes the SVD factorization of matrices to functions. Example applications include coresets that are either new or significantly improve previous results, such as for SVM, logistic regression, M-estimators, and $\ell_z$-regression. Experimental results and open-source code are also provided.
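The abstract's core mechanism, sensitivity (importance) sampling, can be illustrated with a minimal sketch. This is not the paper's algorithm: the sensitivities here are a hypothetical proxy supplied by the caller (the paper's contribution is precisely how to compute them via the $f$-SVD factorization). The sketch only shows the sampling-and-reweighting step that turns given sensitivities into a weighted coreset whose loss estimate is unbiased.

```python
import numpy as np

def sensitivity_coreset(points, sensitivities, m, seed=None):
    """Sample a weighted coreset of size m via sensitivity sampling.

    Each point is drawn i.i.d. with probability proportional to its
    sensitivity; the weight 1/(m * p_i) makes the weighted coreset
    loss an unbiased estimator of the full loss.
    """
    rng = np.random.default_rng(seed)
    probs = sensitivities / sensitivities.sum()   # sampling distribution
    idx = rng.choice(len(points), size=m, p=probs)
    weights = 1.0 / (m * probs[idx])              # unbiasedness weights
    return points[idx], weights

# Toy check on a sum-of-distances loss f(x) = sum_i ||p_i - x||:
# the weighted coreset loss should approximate the full loss.
rng = np.random.default_rng(0)
P = rng.normal(size=(1000, 2))
# Crude sensitivity proxy for illustration only (hypothetical):
# distance from the centroid, plus a floor so every point is sampleable.
s = np.linalg.norm(P - P.mean(axis=0), axis=1) + 1e-3
C, w = sensitivity_coreset(P, s, m=200, seed=1)

x = np.zeros(2)                                   # an arbitrary query
full = np.linalg.norm(P - x, axis=1).sum()
approx = (w * np.linalg.norm(C - x, axis=1)).sum()
```

The reweighting step is what makes the estimate unbiased regardless of how the sampling distribution is chosen; a good sensitivity function matters only for the variance, i.e., how small `m` can be for a given approximation error.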