论文标题
解剖手术计算机视觉的自学学习方法
Dissecting Self-Supervised Learning Methods for Surgical Computer Vision
论文作者
论文摘要
近年来,随着深度神经网络方法的普及,手术计算机视觉领域经历了相当大的突破。但是,用于培训的标准全面监督方法需要大量的带注释的数据,从而实现高昂的成本;特别是在临床领域。已经开始在一般计算机视觉社区中获得吸引力的自我监督学习(SSL)方法代表了这些注释成本的潜在解决方案,从而使仅从未标记的数据中学习有用的表示。尽管如此,SSL方法在更复杂和有影响力的领域(例如医学和手术)中的有效性仍然有限且未开发。在这项工作中,我们通过在手术计算机视觉的背景下研究了四种最先进的SSL方法(Moco V2,Simclr,Dino,SWAV)来满足这一关键需求。我们对这些方法在cholec80数据集上的性能进行了广泛的分析,以在手术环境中的两个基本和流行的任务理解,相位识别和工具存在检测中进行。我们检查了它们的参数化,然后在半监督设置中相对于训练数据数量的行为。如本工作所述和进行的那样,将这些方法正确转移到手术中,可以使SSL的一般用途获得可观的绩效增长 - 相位识别率高达7.4%,在工具存在检测方面以及20%的刀具 - 最高的半监测相位识别方法最高可达14%。在高度多样化的手术数据集中获得的进一步结果表现出强大的泛化特性。该代码可在https://github.com/camma-public/selfsupsurg上找到。
The field of surgical computer vision has undergone considerable breakthroughs in recent years with the rising popularity of deep neural network-based methods. However, standard fully-supervised approaches for training such models require vast amounts of annotated data, imposing a prohibitively high cost; especially in the clinical domain. Self-Supervised Learning (SSL) methods, which have begun to gain traction in the general computer vision community, represent a potential solution to these annotation costs, allowing to learn useful representations from only unlabeled data. Still, the effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored. In this work, we address this critical need by investigating four state-of-the-art SSL methods (MoCo v2, SimCLR, DINO, SwAV) in the context of surgical computer vision. We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection. We examine their parameterization, then their behavior with respect to training data quantities in semi-supervised settings. Correct transfer of these methods to surgery, as described and conducted in this work, leads to substantial performance gains over generic uses of SSL - up to 7.4% on phase recognition and 20% on tool presence detection - as well as state-of-the-art semi-supervised phase recognition approaches by up to 14%. Further results obtained on a highly diverse selection of surgical datasets exhibit strong generalization properties. The code is available at https://github.com/CAMMA-public/SelfSupSurg.