Paper Title

MCAL: Minimum Cost Human-Machine Active Labeling

Authors

Hang Qiu, Krishna Chintalapudi, Ramesh Govindan

Abstract

Today, ground-truth generation uses data sets annotated by cloud-based annotation services. These services rely on human annotation, which can be prohibitively expensive. In this paper, we consider the problem of hybrid human-machine labeling, which trains a classifier to accurately auto-label part of the data set. However, training the classifier can be expensive too. We propose an iterative approach that minimizes total overall cost by, at each step, jointly determining which samples to label using humans and which to label using the trained classifier. We validate our approach on well known public data sets such as Fashion-MNIST, CIFAR-10, CIFAR-100, and ImageNet. In some cases, our approach has 6x lower overall cost relative to human labeling the entire data set, and is always cheaper than the cheapest competing strategy.
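The trade-off the abstract describes can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the paper's algorithm: the function names, the per-label and per-sample prices, and the `toy_auto_fraction` curve (how much of the remaining data a classifier could label accurately after training on `n_train` samples) are all hypothetical and chosen purely for illustration.

```python
def toy_auto_fraction(n_train):
    """Hypothetical curve: fraction of the remaining samples the classifier
    can auto-label accurately after training on n_train samples."""
    return min(0.9, n_train / 5000)


def total_cost(n_total, n_train, human_cost, train_cost, auto_fraction):
    """Cost of human-labeling n_train samples, training on them,
    auto-labeling part of the rest, and human-labeling what is left.

    Assumption: training cost grows linearly with n_train.
    """
    remaining = n_total - n_train
    n_auto = int(auto_fraction(n_train) * remaining)
    n_human = n_train + (remaining - n_auto)
    return n_human * human_cost + n_train * train_cost


def cheapest_split(n_total, candidates, human_cost, train_cost, auto_fraction):
    """Pick the candidate training-set size with the lowest total cost."""
    return min(
        candidates,
        key=lambda n: total_cost(n_total, n, human_cost, train_cost, auto_fraction),
    )


if __name__ == "__main__":
    sizes = [0, 1000, 2500, 4500, 5000, 10000]
    best = cheapest_split(10000, sizes, 0.04, 0.01, toy_auto_fraction)
    print("cheapest training-set size:", best)
    print("cost:", total_cost(10000, best, 0.04, 0.01, toy_auto_fraction))
```

In this toy setting, labeling everything by hand (`n_train = 0`) and training on everything (`n_train = n_total`) are both more expensive than an intermediate split, which is the tension between human labeling cost and training cost that the abstract points to.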
