Paper Title
Distill and Collect for Semi-Supervised Temporal Action Segmentation
Paper Authors
Abstract
Recent temporal action segmentation approaches need frame-level annotations during training to be effective. These annotations are very expensive and time-consuming to obtain, which limits the performance of such approaches when only a small amount of annotated data is available. In contrast, a large corpus of in-domain unannotated videos can easily be collected by scavenging the internet. This paper therefore proposes an approach to temporal action segmentation that simultaneously leverages knowledge from both annotated and unannotated video sequences. Our approach uses multi-stream distillation that repeatedly refines the frame predictions of the streams and finally combines them. Our model also predicts the action order, which is later used as a temporal constraint when estimating frame labels, to counter the lack of supervision for unannotated videos. Finally, our evaluation of the proposed approach on two different datasets shows that it achieves performance comparable to full supervision despite limited annotation.
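The abstract says the predicted action order is used as a temporal constraint when estimating frame labels, but it does not say how that constraint is enforced. A minimal sketch of one common way to do it is a Viterbi-style dynamic program: given frame-wise log-probabilities and an ordered list of actions, assign every frame to a segment so that the segments appear in the given order and the summed log-probability is maximal. The function name `align_to_order`, its inputs (a `(T, C)` log-probability array, which could for instance be the averaged output of the streams, and a list of class indices for the predicted order) and the rule that every action in the order covers at least one frame are all hypothetical assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def align_to_order(log_probs: np.ndarray, order: list[int]) -> list[int]:
    """Hypothetical helper (not the paper's code): label every frame so that
    the labels follow `order` segment by segment, maximizing the summed
    frame log-probability.

    log_probs: (T, C) frame-wise log-probabilities.
    order:     predicted action order as class indices; by assumption each
               entry covers at least one frame.
    """
    T, K = log_probs.shape[0], len(order)
    if K == 0 or K > T:
        raise ValueError("need 1 <= len(order) <= number of frames")

    # score[t, k]: best total log-prob of frames 0..t with frame t in segment k
    score = np.full((T, K), -np.inf)
    # came_from_prev[t, k]: True if frame t starts segment k (moved from k - 1)
    came_from_prev = np.zeros((T, K), dtype=bool)
    score[0, 0] = log_probs[0, order[0]]  # the first frame must be in segment 0

    for t in range(1, T):
        for k in range(K):
            stay = score[t - 1, k]
            advance = score[t - 1, k - 1] if k > 0 else -np.inf
            came_from_prev[t, k] = advance > stay
            score[t, k] = max(stay, advance) + log_probs[t, order[k]]

    # Backtrack from the last segment at the last frame.
    labels = [0] * T
    k = K - 1
    for t in range(T - 1, -1, -1):
        labels[t] = order[k]
        if came_from_prev[t, k]:
            k -= 1
    return labels


# Usage with made-up probabilities: one noisy frame favours class 2,
# which does not appear in the predicted order [0, 1].
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.2, 0.7, 0.1],
])
print(probs.argmax(axis=1).tolist())          # [0, 0, 2, 1, 1, 1]
print(align_to_order(np.log(probs), [0, 1]))  # [0, 0, 1, 1, 1, 1]
```

Unlike the frame-wise argmax, the constrained labels can only switch actions in the predicted order, so the stray class-2 frame is absorbed into a neighbouring segment. Labels obtained this way could plausibly serve as pseudo-labels for unannotated videos; whether the paper uses exactly this decoding is not stated in the abstract.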