Paper title
Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA
Paper authors
Paper abstract
Nuclear Magnetic Resonance (NMR) spectroscopy is one of the major techniques in structural biology with over 11,800 protein structures deposited in the Protein Data Bank. NMR can elucidate structures and dynamics of small and medium size proteins in solution, living cells, and solids, but has been limited by the tedious data analysis process. It typically requires weeks or months of manual work of a trained expert to turn NMR measurements into a protein structure. Automation of this process is an open problem, formulated in the field over 30 years ago. Here, we present a solution to this challenge that enables the completely automated analysis of protein NMR data within hours after completing the measurements. Using only NMR spectra and the protein sequence as input, our machine learning-based method, ARTINA, delivers signal positions, resonance assignments, and structures strictly without any human intervention. Tested on a 100-protein benchmark comprising 1329 multidimensional NMR spectra, ARTINA demonstrated its ability to solve structures with 1.44 Å median RMSD to the PDB reference and to identify 91.36% correct NMR resonance assignments. ARTINA can be used by non-experts, reducing the effort for a protein assignment or structure determination by NMR essentially to the preparation of the sample and the spectra measurements.