Paper Title

Automatic Operating Room Surgical Activity Recognition for Robot-Assisted Surgery

Paper Authors

Aidean Sharghi, Helene Haugerud, Daniel Oh, Omid Mohareri

Paper Abstract

Automatic recognition of surgical activities in the operating room (OR) is a key technology for creating next-generation intelligent surgical devices and workflow monitoring/support systems. Such systems can potentially enhance efficiency in the OR, resulting in lower costs and improved care delivery to patients. In this paper, we investigate automatic surgical activity recognition in robot-assisted operations. We collect the first large-scale dataset, comprising 400 full-length, multi-perspective videos from a variety of robotic surgery cases captured using Time-of-Flight cameras. We densely annotate the videos with the 10 most recognized and clinically relevant classes of activities. Furthermore, we investigate state-of-the-art computer vision action recognition techniques and adapt them to the OR environment and the dataset. First, we fine-tune the Inflated 3D ConvNet (I3D) for clip-level activity recognition on our dataset and use it to extract features from the videos. These features are then fed to a stack of 3 Temporal Gaussian Mixture (TGM) layers that extract context from neighboring clips, and finally to a Long Short-Term Memory (LSTM) network that learns the order of activities in full-length videos. We extensively assess the model and reach a peak performance of 88% mean Average Precision.
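
To make the described pipeline concrete, below is a minimal PyTorch sketch of the clip-feature model: pre-extracted I3D features pass through a stack of 3 Temporal Gaussian Mixture (TGM) layers and then an LSTM that produces per-clip activity scores. This is an illustration under stated assumptions, not the authors' implementation: the TGM layer here is a simplified depthwise temporal convolution whose kernels are softmax-weighted mixtures of learnable Gaussians, and the 1024-d feature size, hidden width, kernel length, and Gaussian count are assumed (only the 3 TGM layers and 10 classes come from the abstract); I3D feature extraction itself is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TGMLayer(nn.Module):
    """Simplified Temporal Gaussian Mixture layer: each feature channel gets a
    temporal kernel built as a convex combination of Gaussians with learnable
    centers and widths, applied as a depthwise 1-D convolution over time."""

    def __init__(self, channels, num_gaussians=8, kernel_size=15):
        super().__init__()
        self.channels = channels
        self.kernel_size = kernel_size
        self.center = nn.Parameter(torch.randn(num_gaussians) * 0.1)   # Gaussian centers
        self.width = nn.Parameter(torch.zeros(num_gaussians))          # raw (pre-softplus) widths
        self.mix = nn.Parameter(torch.randn(channels, num_gaussians))  # per-channel mixture logits

    def forward(self, x):                      # x: (batch, channels, time)
        t = torch.linspace(-1.0, 1.0, self.kernel_size, device=x.device)
        center = torch.tanh(self.center)                     # keep centers in [-1, 1]
        width = F.softplus(self.width) + 1e-2                # strictly positive widths
        g = torch.exp(-0.5 * ((t[None, :] - center[:, None]) / width[:, None]) ** 2)
        g = g / g.sum(dim=1, keepdim=True)                   # (num_gaussians, K), each sums to 1
        w = torch.softmax(self.mix, dim=1)                   # (channels, num_gaussians)
        kernel = (w @ g).unsqueeze(1)                        # (channels, 1, K) depthwise kernels
        return F.conv1d(x, kernel, padding=self.kernel_size // 2, groups=self.channels)

class ORActivityModel(nn.Module):
    """Pre-extracted I3D clip features -> 3 TGM layers -> LSTM -> per-clip logits."""

    def __init__(self, feat_dim=1024, hidden_dim=512, num_classes=10):
        super().__init__()
        self.tgm = nn.Sequential(TGMLayer(feat_dim), TGMLayer(feat_dim), TGMLayer(feat_dim))
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        x = self.tgm(feats.transpose(1, 2)).transpose(1, 2)  # temporal context from neighboring clips
        x, _ = self.lstm(x)                                  # models the order of activities
        return self.head(x)                                  # (batch, time, num_classes)

model = ORActivityModel()
feats = torch.randn(2, 300, 1024)              # e.g. 2 videos, 300 clips, 1024-d I3D features
logits = model(feats)                          # torch.Size([2, 300, 10])

Given per-clip I3D features of shape (batch, time, 1024), the model returns per-clip logits over the 10 activity classes, from which per-class mean Average Precision can be computed over full-length videos.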
