Paper Title
Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures
Paper Authors
Paper Abstract
Proteins perform a large variety of functions in living organisms and thus play a key role in biology. However, currently available learning algorithms for processing protein data do not consider several particularities of such data and/or do not scale well to large protein conformations. To fill this gap, we propose two new learning operations that enable deep 3D analysis of large-scale protein data. First, we introduce a novel convolution operator that considers both the intrinsic (invariant under protein folding) and the extrinsic (invariant under bonding) structure, by using $n$-D convolutions defined on both the Euclidean distance and multiple geodesic distances between atoms in a multi-graph. Second, we enable multi-scale protein analysis by introducing hierarchical pooling operators, exploiting the fact that proteins are a recombination of a finite set of amino acids, which can be pooled using shared pooling matrices. Lastly, we evaluate the accuracy of our algorithms on several large-scale data sets for common protein analysis tasks, where we outperform state-of-the-art methods.
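To make the two kinds of distance named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy four-atom chain with made-up coordinates, a single bond graph rather than a multi-graph, and hop count as the geodesic distance; the function names `euclidean_distance` and `geodesic_hops` are hypothetical.

```python
import math
from collections import deque


def euclidean_distance(a, b):
    """Extrinsic distance: straight-line distance between two 3D points."""
    return math.dist(a, b)


def geodesic_hops(num_atoms, bonds, source):
    """Intrinsic distance: number of bonds on the shortest path from `source`.

    Assumption: hop count stands in for the geodesic distance on one graph.
    Unreachable atoms keep the value math.inf.
    """
    adjacency = [[] for _ in range(num_atoms)]
    for i, j in bonds:
        adjacency[i].append(j)
        adjacency[j].append(i)

    hops = [math.inf] * num_atoms
    hops[source] = 0
    queue = deque([source])
    while queue:  # breadth-first search over the bond graph
        u = queue.popleft()
        for v in adjacency[u]:
            if hops[v] == math.inf:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


# Toy chain of four atoms, folded so that atoms 0 and 3 end up close in space.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 0.0)]
bonds = [(0, 1), (1, 2), (2, 3)]

extrinsic = euclidean_distance(coords[0], coords[3])
intrinsic = geodesic_hops(len(coords), bonds, 0)[3]
print(f"Euclidean distance 0-3: {extrinsic}, bond hops 0-3: {intrinsic}")
```

In this toy example, refolding the chain changes the Euclidean distance between atoms 0 and 3 but leaves the hop count unchanged, which is the sense in which the graph distance is invariant under folding.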