Few shot domain adaptation for in situ macromolecule structural classification in cryo-electron tomograms

Paper title

Few shot domain adaptation for in situ macromolecule structural classification in cryo-electron tomograms

Authors

Yu, Liangyong; Li, Ran; Zeng, Xiangrui; Wang, Hongyi; Jin, Jie; Yang, Ge; Jiang, Rui; Xu, Min

Abstract

Motivation: Cryo-electron tomography (cryo-ET) visualizes the structure and spatial organization of macromolecules, and their interactions with other subcellular components, inside single cells in a close-to-native state at sub-molecular resolution. Such information is critical for an accurate understanding of cellular processes. However, because of imaging limits and data quantity, subtomogram classification remains one of the major challenges for the systematic recognition and recovery of macromolecule structures in cryo-ET. Recently, deep learning has significantly improved the throughput and accuracy of large-scale subtomogram classification. However, it is often difficult to obtain enough high-quality annotated subtomogram data for supervised training because labeling is enormously expensive. To tackle this problem, it is beneficial to use another, already annotated dataset to assist the training process. However, because the image intensity distributions of the source domain and the target domain differ, a model trained on subtomograms in the source domain may perform poorly when predicting subtomogram classes in the target domain.

Results: In this paper, we adapt a few-shot domain adaptation method for deep-learning-based cross-domain subtomogram classification. The essential idea of our method consists of two parts: 1) take full advantage of the distribution of the plentiful unlabeled target-domain data, and 2) exploit the correlation between the whole source-domain dataset and the few labeled target-domain data. Experiments conducted on simulated and real datasets show that our method achieves significant improvement on cross-domain subtomogram classification compared with baseline methods.
