Paper Title
Recognition of Instrument-Tissue Interactions in Endoscopic Videos via Action Triplets
Paper Authors
Paper Abstract
Recognition of surgical activity is an essential component in developing context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity. To this end, we introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes. Furthermore, we present an approach to recognize these triplets directly from the video data. It relies on a module called Class Activation Guide (CAG), which uses the instrument activation maps to guide the verb and target recognition. To model the recognition of multiple triplets in the same frame, we also propose a trainable 3D Interaction Space, which captures the associations between the triplet components. Finally, we demonstrate the significance of these contributions via several ablation studies and comparisons to baselines on CholecT40.
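To make the two ideas named in the abstract concrete, below is a minimal PyTorch sketch, not the authors' implementation: an instrument branch whose class activation maps (CAMs) guide the verb and target branches (the CAG idea), and a learnable 3D interaction tensor that scores <instrument, verb, target> combinations. The backbone, feature dimensions, component counts, and the concatenation-based fusion are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

# Assumed component counts and feature sizes; the paper's actual class lists differ.
N_INSTR, N_VERB, N_TARGET = 6, 10, 15
IN_CH = 512  # assumed number of backbone feature channels


class TripletSketch(nn.Module):
    def __init__(self, in_ch: int = IN_CH):
        super().__init__()
        # 1x1 conv gives one spatial activation map per instrument class (CAM-style head).
        self.instr_cam = nn.Conv2d(in_ch, N_INSTR, kernel_size=1)
        # Verb/target heads see backbone features concatenated with the instrument CAMs,
        # so instrument localization can guide their predictions (CAG-like guidance).
        self.verb_head = nn.Conv2d(in_ch + N_INSTR, N_VERB, kernel_size=1)
        self.target_head = nn.Conv2d(in_ch + N_INSTR, N_TARGET, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Learnable 3D interaction tensor mapping component scores to triplet scores.
        self.interaction = nn.Parameter(torch.randn(N_INSTR, N_VERB, N_TARGET) * 0.01)

    def forward(self, feats: torch.Tensor):
        # feats: backbone feature maps of shape (B, in_ch, H, W)
        cams = self.instr_cam(feats)                      # (B, N_INSTR, H, W)
        guided = torch.cat([feats, cams], dim=1)          # CAM-guided features
        instr = self.pool(cams).flatten(1)                # (B, N_INSTR) instrument logits
        verb = self.pool(self.verb_head(guided)).flatten(1)
        target = self.pool(self.target_head(guided)).flatten(1)
        # Weighted outer product of the component logits yields one score per
        # <instrument, verb, target> combination (the 3D interaction space).
        triplet = torch.einsum('bi,bv,bt,ivt->bivt', instr, verb, target, self.interaction)
        return instr, verb, target, triplet.flatten(1)    # (B, N_INSTR * N_VERB * N_TARGET)


if __name__ == "__main__":
    feats = torch.randn(2, IN_CH, 7, 7)                   # dummy backbone output
    _, _, _, triplet_logits = TripletSketch()(feats)
    print(triplet_logits.shape)                           # torch.Size([2, 900])
```

Under these assumptions, multi-label losses on the instrument, verb, target, and flattened triplet logits would train the guidance and interaction components jointly; the actual network design and training protocol are detailed in the paper itself.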