Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling

Paper title

Comprehensive and Efficient Data Labeling via Adaptive Model Scheduling

Authors

Yuan, Mu; Zhang, Lan; Li, Xiang-Yang; Xiong, Hui

Abstract

Labeling data comprehensively and efficiently (e.g., labeling the people, objects, actions, and scenes in images) is a widely needed but challenging task. Numerous models have been proposed to label various kinds of data, and many approaches have been designed to enhance the ability of deep learning models or to accelerate them. Unfortunately, a single machine-learning model is not powerful enough to extract the various kinds of semantic information in data. In certain applications, such as image retrieval platforms and photo album management apps, it is often necessary to execute a collection of models to obtain sufficient labels. Given limited computing resources, stringent delay requirements, a data stream, and a collection of applicable resource-hungry deep-learning models, we design a novel approach that adaptively schedules a subset of these models to execute on each data item, aiming to maximize the value of the model output (e.g., the number of high-confidence labels). Achieving this goal is nontrivial because a model's output on any data item is content-dependent and unknown until we execute it. To tackle this, we propose an Adaptive Model Scheduling framework consisting of 1) a deep reinforcement learning-based approach that predicts the value of unexecuted models by mining semantic relationships among diverse models, and 2) two heuristic algorithms that adaptively schedule the model execution order under a deadline constraint or deadline-memory constraints, respectively. The proposed framework does not require any prior knowledge of the data and works as a powerful complement to existing model optimization technologies. We conduct extensive evaluations on five diverse image datasets and 30 popular image labeling models to demonstrate the effectiveness of our design: it can save around 53% of execution time without losing any valuable labels.
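
To make the deadline-constrained scheduling idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a simple greedy rule (run the unexecuted model with the highest predicted value per unit of execution time, then re-predict using the outputs seen so far). The function `schedule_under_deadline`, the hand-written `toy_predict` (a stand-in for the learned value predictor), and all model names, costs, and values are hypothetical illustrations.

```python
from typing import Callable, Dict, List


def schedule_under_deadline(
    costs: Dict[str, float],
    predict_value: Callable[[str, Dict[str, float]], float],
    run_model: Callable[[str], float],
    deadline: float,
) -> List[str]:
    """Greedily execute models by predicted value per unit time.

    costs: execution time of each model (assumed positive and known).
    predict_value: estimates an unexecuted model's value from the
        outputs observed so far (stand-in for a learned predictor).
    run_model: executes a model and returns the value it produced.
    """
    observed: Dict[str, float] = {}  # model name -> value actually obtained
    order: List[str] = []
    remaining = deadline
    while True:
        # Only models that are unexecuted and still fit in the time budget.
        candidates = [
            m for m in costs if m not in observed and costs[m] <= remaining
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda m: predict_value(m, observed) / costs[m])
        observed[best] = run_model(best)  # the output is known only after running
        order.append(best)
        remaining -= costs[best]
    return order


# Hypothetical toy data: execution times and the value each model would yield.
costs = {"object": 2.0, "face": 1.0, "scene": 3.0, "action": 4.0}
true_value = {"object": 3.0, "face": 0.0, "scene": 2.0, "action": 1.0}


def toy_predict(model: str, observed: Dict[str, float]) -> float:
    """Hand-written predictor: finding no faces lowers the action estimate."""
    prior = {"object": 2.0, "face": 1.5, "scene": 1.0, "action": 2.0}
    if model == "action" and observed.get("face") == 0.0:
        return 0.5
    return prior[model]


if __name__ == "__main__":
    print(schedule_under_deadline(costs, toy_predict, lambda m: true_value[m], 6.0))
```

In this toy run with a deadline of 6.0, the scheduler picks the face model first, then the object model, then the scene model, and skips the action model because it no longer fits in the remaining time. The re-prediction step is where a learned predictor of the kind the abstract describes would plug in.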
