论文标题

数据驱动算法设计

Data-driven Algorithm Design

论文作者

Balcan, Maria-Florina

论文摘要

数据驱动算法设计是现代数据科学和算法设计的重要方面。从业人员通常不使用只有最恶劣的案例性能保证的货架算法,而是使用来自其域中的训练问题实例来确定未来实例中高预期的配置,因此经常对大型参数化算法进行优化,并调整这些算法的参数。但是,这项工作的大部分都没有绩效保证。面临的挑战是,对于许多重要性的组合问题,包括分区,子集选择和对齐问题,对参数的小调整可能会导致算法的行为发生变化,因此算法的性能是其参数的不连续功能。 在本章中,我们调查了有助于将数据驱动组合算法设计放在公司基础上的最新工作。对于批处理和在线场景,我们提供了强大的计算和统计性能保证,其中分别每次或以在线方式介绍了给定应用程序的典型问题实例。

Data driven algorithm design is an important aspect of modern data science and algorithm design. Rather than using off the shelf algorithms that only have worst case performance guarantees, practitioners often optimize over large families of parametrized algorithms and tune the parameters of these algorithms using a training set of problem instances from their domain to determine a configuration with high expected performance over future instances. However, most of this work comes with no performance guarantees. The challenge is that for many combinatorial problems of significant importance including partitioning, subset selection, and alignment problems, a small tweak to the parameters can cause a cascade of changes in the algorithm's behavior, so the algorithm's performance is a discontinuous function of its parameters. In this chapter, we survey recent work that helps put data-driven combinatorial algorithm design on firm foundations. We provide strong computational and statistical performance guarantees, both for the batch and online scenarios where a collection of typical problem instances from the given application are presented either all at once or in an online fashion, respectively.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源