Paper Title
Gesture-aware Interactive Machine Teaching with In-situ Object Annotations
Paper Authors
Paper Abstract
Interactive Machine Teaching (IMT) systems allow non-experts to easily create Machine Learning (ML) models. However, existing vision-based IMT systems either ignore annotations on the objects of interest or require users to annotate in a post-hoc manner. Without annotations on the objects, the model may misinterpret them using unrelated features. Post-hoc annotation causes additional workload, which diminishes the usability of the overall model-building process. In this paper, we develop LookHere, which integrates in-situ object annotations into vision-based IMT. LookHere exploits users' deictic gestures to segment the objects of interest in real time. This segmentation information can additionally be used for training. To achieve reliable performance of this object segmentation, we utilize our custom dataset called HuTics, which includes 2,040 front-facing images of deictic gestures made toward various objects by 170 people. The quantitative results of our user study showed that participants created models 16.3 times faster with our system than with a standard IMT system with a post-hoc annotation process, while achieving comparable model accuracy. Additionally, models created by our system showed a significant accuracy improvement ($\Delta\text{mIoU} = 0.466$) in segmenting the objects of interest compared to models trained without annotations.
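As a point of reference for the $\Delta\text{mIoU}$ figure reported above, the sketch below shows one common way to compute mean Intersection-over-Union between predicted and ground-truth binary segmentation masks; $\Delta\text{mIoU}$ is then simply the difference between two such scores. This is a minimal illustrative implementation assuming NumPy arrays of binary masks, not code from the paper; the function name `mean_iou` is our own.

```python
import numpy as np

def mean_iou(pred_masks: np.ndarray, gt_masks: np.ndarray, eps: float = 1e-7) -> float:
    """Mean Intersection-over-Union over a batch of binary masks.

    pred_masks, gt_masks: arrays of shape (N, H, W) with values in {0, 1}.
    Returns the IoU averaged over the N mask pairs.
    """
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        pred = pred.astype(bool)
        gt = gt.astype(bool)
        intersection = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        ious.append(intersection / (union + eps))  # eps guards against empty masks
    return float(np.mean(ious))

# Hypothetical usage: the reported Delta mIoU would correspond to
#   mean_iou(preds_with_annotations, gt) - mean_iou(preds_without_annotations, gt)
```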