Title

Deep Learning Methods for Protein Family Classification on PDB Sequencing Data

Author

Wang, Aaron

Abstract

Proteins are a class of macromolecules that play a central role in major biological processes and are required for the structure, function, and regulation of the body's tissues. They are composed of amino acid chains that influence how they fold and thus dictate their function and features. Understanding protein function is vital to the development of therapeutics and precision medicine, so the ability to classify proteins and their functions based on measurable features is crucial. Indeed, the automatic inference of a protein's properties from its amino acid sequence, known as its primary structure, remains an important open problem in bioinformatics, especially given recent advances in sequencing technologies and the large number of known but uncategorized proteins with unknown properties. In this work, we demonstrate and compare the performance of several deep learning frameworks, including novel bi-directional LSTM and convolutional models, on widely available sequencing data from the Protein Data Bank (PDB) of the Research Collaboratory for Structural Bioinformatics (RCSB). We also benchmark this performance against classical machine learning approaches, including k-nearest neighbors and multinomial regression classifiers, trained on experimental data. Our results show that our deep learning models outperform the classical machine learning methods, with the convolutional architecture delivering the strongest inference performance.
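The abstract names k-nearest neighbors as one of the classical baselines but does not say how sequences are turned into features. Purely as an illustration, the sketch below assumes overlapping 3-mer counts as the feature vector and cosine similarity as the similarity measure. The function names, the toy sequences, and the family labels are hypothetical and are not taken from the PDB data or from the authors' pipeline.

```python
from collections import Counter
import math

# Hypothetical toy training set: (amino-acid sequence, family label).
# These are illustrative strings, NOT real PDB entries.
TOY_TRAIN = [
    ("GGSGGSGGSGGS", "family_A"),
    ("GGSGGSGGAGGS", "family_A"),
    ("KLVKLVKLVKLV", "family_B"),
    ("KLVKLAKLVKLV", "family_B"),
]


def kmer_counts(seq: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in a one-letter amino-acid sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse k-mer count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, query: str, k: int = 3, kmer: int = 3) -> str:
    """Predict a family label by majority vote among the k most similar sequences."""
    query_vec = kmer_counts(query, kmer)
    # Precompute each training vector, then rank by similarity (highest first).
    scored = sorted(
        ((cosine(query_vec, kmer_counts(seq, kmer)), label) for seq, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    # most_common orders equal counts by first appearance, so a tied vote
    # goes to the label of the most similar neighbor.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict(TOY_TRAIN, "GGSGGSGGSGG"))
    print(knn_predict(TOY_TRAIN, "KLVKLVKLV"))
```

On the toy data, the first query shares 3-mers only with the `family_A` sequences and the second only with the `family_B` sequences, so each is assigned to that family. A real baseline on PDB sequences would need a feature choice and a value of `k` tuned on held-out data.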