Paper Title


Low-rank Optimal Transport: Approximation, Statistics and Debiasing

Authors

Meyer Scetbon, Marco Cuturi

Abstract


The matching principles behind optimal transport (OT) play an increasingly important role in machine learning, a trend which can be observed when OT is used to disambiguate datasets in applications (e.g., single-cell genomics) or used to improve more complex methods (e.g., balanced attention in transformers or self-supervised learning). To scale to more challenging problems, there is a growing consensus that OT requires solvers that can operate on millions, not thousands, of points. The low-rank optimal transport (LOT) approach advocated in \cite{scetbon2021lowrank} holds several promises in that regard, and was shown to complement more established entropic regularization approaches, being able to insert itself in more complex pipelines, such as quadratic OT. LOT restricts the search for low-cost couplings to those that have a low nonnegative rank, yielding linear-time algorithms in cases of interest. However, these promises can only be fulfilled if the LOT approach is seen as a legitimate contender to entropic regularization when compared on properties of interest, where the scorecard typically includes theoretical properties (statistical complexity and relation to other methods) or practical aspects (debiasing, hyperparameter tuning, initialization). We target each of these areas in this paper in order to cement the impact of low-rank approaches in computational OT.
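As a hypothetical illustration of why the low-rank restriction yields linear-time operations (a minimal NumPy sketch, not the authors' solver): when the coupling is factored as P = Q diag(1/g) Rᵀ and the squared-Euclidean cost matrix as C = A Bᵀ, the transport cost ⟨C, P⟩ can be evaluated without ever materializing the n×m matrices. The rank-1 independent coupling below is only a stand-in; actual LOT solvers optimize Q, R, g jointly.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d = 500, 400, 3

x = rng.normal(size=(n, d))
y = rng.normal(size=(m, d))

# Squared-Euclidean cost admits an exact rank-(d+2) factorization C = A B^T:
# C_ij = ||x_i||^2 + ||y_j||^2 - 2 <x_i, y_j>.
A = np.hstack([(x**2).sum(1, keepdims=True), np.ones((n, 1)), -2.0 * x])
B = np.hstack([np.ones((m, 1)), (y**2).sum(1, keepdims=True), y])

# Example rank-1 coupling: the independent coupling P = a b^T,
# i.e. Q = a, R = b, g = [1].  (A placeholder, not an optimized plan.)
a = np.full(n, 1.0 / n)
b = np.full(m, 1.0 / m)
Q, R, g = a[:, None], b[:, None], np.ones(1)

# Low-rank inner product <C, P> = tr(diag(1/g) (Q^T A)(B^T R)),
# computed in O((n + m) * r * (d + 2)) time.
cost_lr = np.trace((Q.T @ A) @ (B.T @ R) / g)

# Dense check in O(n m) -- exactly what the factored form avoids.
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
P = np.outer(a, b)
cost_dense = (C * P).sum()
assert np.allclose(cost_lr, cost_dense)
```

The dense check confirms the factored evaluation matches the O(nm) computation; only the factored path is needed at scale.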
