PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data

Paper title

PS-DeVCEM: Pathology-sensitive deep learning model for video capsule endoscopy based on weakly labeled data

Authors

Mohammed, A., Farup, I., Pedersen, M., Yildirim, S., Hovde, Ø

Abstract

We propose a novel pathology-sensitive deep learning model (PS-DeVCEM) for frame-level anomaly detection and multi-label classification of different colon diseases in video capsule endoscopy (VCE) data. Our proposed model is capable of coping with the key challenge of the apparent heterogeneity of the colon caused by several types of diseases. Our model is driven by attention-based deep multiple instance learning and is trained end-to-end on weakly labeled data using video labels instead of detailed frame-by-frame annotation. The spatial and temporal features are obtained through ResNet50 and residual long short-term memory (residual LSTM) blocks, respectively. Additionally, the learned temporal attention module provides the importance of each frame to the final label prediction. Moreover, we developed a self-supervision method to maximize the distance between classes of pathologies. We demonstrate through qualitative and quantitative experiments that our proposed weakly supervised learning model gives superior precision and F1-score, reaching 61.6% and 55.1%, respectively, compared to three state-of-the-art video analysis methods. We also show our model's ability to temporally localize frames with pathologies, without frame annotation information during training. Furthermore, we collected and annotated the first and largest VCE dataset with only video labels. The dataset contains 455 short video segments with 28,304 frames and 14 classes of colorectal diseases and artifacts. The dataset and code supporting this publication will be made available on our home page.
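
To make the attention-based multiple instance learning idea in the abstract more concrete, the following is a minimal sketch of attention pooling over the frames of one video, followed by a multi-label sigmoid output. It is not the authors' implementation. The function names (attention_pool, multilabel_probs), the parameters V, w, class_weights and class_bias, and the toy two-dimensional frame features are all hypothetical; in the paper the per-frame features would come from the ResNet50 and residual LSTM blocks, and all parameters would be learned end-to-end from video-level labels.

```python
import math

def attention_pool(frame_features, V, w):
    """Attention-based MIL pooling over per-frame feature vectors.

    frame_features: list of K frame vectors, each of length D
    V: hidden projection, H rows of length D (hypothetical parameter)
    w: scoring vector of length H (hypothetical parameter)
    Returns (weights, pooled): one attention weight per frame and
    the attention-weighted average of the frame features.
    """
    scores = []
    for h in frame_features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * uj for wj, uj in zip(w, hidden)))

    # Softmax over frames, shifted by the max score for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(frame_features[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frame_features))
              for d in range(dim)]
    return weights, pooled

def multilabel_probs(pooled, class_weights, class_bias):
    """Independent sigmoid per class, so several labels can be active."""
    probs = []
    for row, b in zip(class_weights, class_bias):
        logit = sum(c * p for c, p in zip(row, pooled)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs

# Toy video with three frames; the second frame has the strongest features.
frames = [[0.1, 0.0], [2.0, 1.5], [0.2, 0.1]]
V = [[1.0, 0.0], [0.0, 1.0]]            # assumed, not learned here
w = [1.0, 1.0]                          # assumed, not learned here
weights, pooled = attention_pool(frames, V, w)
probs = multilabel_probs(pooled, [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0])
print(weights, probs)
```

Because the attention weights sum to one across frames, they can be read as a per-frame importance score, which is the property the abstract relies on for temporal localization without frame-level annotation.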
