Paper Title

MS-TCN++: Multi-Stage Temporal Convolutional Network for Action Segmentation

Authors

Shijie Li, Yazan Abu Farha, Yun Liu, Ming-Ming Cheng, Juergen Gall

Abstract

With the success of deep learning in classifying short trimmed videos, more attention has been focused on temporally segmenting and classifying activities in long untrimmed videos. State-of-the-art approaches for action segmentation utilize several layers of temporal convolution and temporal pooling. Despite the capabilities of these approaches in capturing temporal dependencies, their predictions suffer from over-segmentation errors. In this paper, we propose a multi-stage architecture for the temporal action segmentation task that overcomes the limitations of the previous approaches. The first stage generates an initial prediction that is refined by the next ones. In each stage we stack several layers of dilated temporal convolutions covering a large receptive field with few parameters. While this architecture already performs well, lower layers still suffer from a small receptive field. To address this limitation, we propose a dual dilated layer that combines both large and small receptive fields. We further decouple the design of the first stage from the refining stages to address the different requirements of these stages. Extensive evaluation shows the effectiveness of the proposed model in capturing long-range dependencies and recognizing action segments. Our models achieve state-of-the-art results on three datasets: 50Salads, Georgia Tech Egocentric Activities (GTEA), and the Breakfast dataset.
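To make the described architecture concrete, below is a minimal PyTorch sketch of the dual dilated layer and the multi-stage stacking. It follows the abstract's description: at layer l of an L-layer stage, one branch uses a growing dilation (2^l) and the other a shrinking one (2^(L-1-l)), so even the lowest layers combine large and small receptive fields, and refinement stages take the previous stage's class probabilities as input. The fusion scheme (concatenation plus a 1x1 convolution), channel widths, and layer/stage counts are illustrative assumptions, not the authors' released configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualDilatedLayer(nn.Module):
    """Combines a large and a small temporal receptive field in one layer.

    The residual structure and the concat + 1x1-conv fusion are assumptions
    made for this sketch.
    """
    def __init__(self, channels, dilation_up, dilation_down):
        super().__init__()
        self.conv_up = nn.Conv1d(channels, channels, 3,
                                 padding=dilation_up, dilation=dilation_up)
        self.conv_down = nn.Conv1d(channels, channels, 3,
                                   padding=dilation_down, dilation=dilation_down)
        self.fuse = nn.Conv1d(2 * channels, channels, 1)
        self.out = nn.Conv1d(channels, channels, 1)

    def forward(self, x):
        # x: (batch, channels, time); both branches preserve the time length
        h = torch.cat([self.conv_up(x), self.conv_down(x)], dim=1)
        h = F.relu(self.fuse(h))
        return x + self.out(h)  # residual connection

class Stage(nn.Module):
    """A stack of dilated layers mapping inputs to frame-wise class logits."""
    def __init__(self, in_dim, channels, num_classes, num_layers=10):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, channels, 1)
        self.layers = nn.ModuleList([
            DualDilatedLayer(channels, 2 ** l, 2 ** (num_layers - 1 - l))
            for l in range(num_layers)])
        self.cls = nn.Conv1d(channels, num_classes, 1)

    def forward(self, x):
        h = self.inp(x)
        for layer in self.layers:
            h = layer(h)
        return self.cls(h)

class MSTCN2(nn.Module):
    """Prediction stage followed by refinement stages; the first stage is
    decoupled from the refining stages, which consume class probabilities."""
    def __init__(self, feat_dim, num_classes, channels=64, num_stages=4):
        super().__init__()
        self.stage1 = Stage(feat_dim, channels, num_classes)
        self.refine = nn.ModuleList([
            Stage(num_classes, channels, num_classes)
            for _ in range(num_stages - 1)])

    def forward(self, x):
        out = self.stage1(x)
        outputs = [out]
        for stage in self.refine:
            out = stage(F.softmax(out, dim=1))
            outputs.append(out)
        return outputs  # per-stage logits, one tensor per stage

# Illustrative usage: 2048-dim frame features, one video of 300 frames.
model = MSTCN2(feat_dim=2048, num_classes=19)
logits = model(torch.randn(1, 2048, 300))  # list of (1, 19, 300) tensors
```

Returning the logits of every stage, as above, makes it possible to supervise each stage's prediction individually, which is how multi-stage refinement architectures of this kind are typically trained.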
