Paper title

Multiple Similarity Drug-Target Interaction Prediction with Random Walks and Matrix Factorization

Authors

Liu, Bin; Papadopoulos, Dimitrios; Malliaros, Fragkiskos D.; Tsoumakas, Grigorios; Papadopoulos, Apostolos N.

Abstract

The discovery of drug-target interactions (DTIs) is a very promising area of research with great potential. The accurate identification of reliable interactions among drugs and proteins via computational methods, which typically leverage heterogeneous information retrieved from diverse data sources, can boost the development of effective pharmaceuticals. Although random walk and matrix factorization techniques are widely used in DTI prediction, they have several limitations. Random walk-based embedding generation is usually conducted in an unsupervised manner, while the linear similarity combination in matrix factorization distorts individual insights offered by different views. To tackle these issues, we take a multi-layered network approach to handle diverse drug and target similarities, and propose a novel optimization framework, called Multiple similarity DeepWalk-based Matrix Factorization (MDMF), for DTI prediction. The framework unifies embedding generation and interaction prediction, learning vector representations of drugs and targets that not only retain higher-order proximity across all hyper-layers and layer-specific local invariance, but also approximate the interactions with their inner product. Furthermore, we develop an ensemble method (MDMF2A) that integrates two instantiations of the MDMF model, optimizing the area under the precision-recall curve (AUPR) and the area under the receiver operating characteristic curve (AUC) respectively. The empirical study on real-world DTI datasets shows that our method achieves statistically significant improvement over current state-of-the-art approaches in four different settings. Moreover, the validation of highly ranked non-interacting pairs also demonstrates the potential of MDMF2A to discover novel DTIs.
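
To make the phrase "approximate the interactions with their inner product" concrete, here is a minimal sketch of plain logistic matrix factorization on a toy interaction matrix. It is not the MDMF objective: the DeepWalk-based proximity terms, the multi-layer similarity handling, and the AUPR/AUC-oriented losses mentioned in the abstract are all omitted. The function names (`factorize`, `predict`), the hyperparameters, and the toy matrix `Y` are assumptions made for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def factorize(Y, rank=2, epochs=1000, lr=0.1, reg=0.01, seed=0):
    """Learn drug embeddings U and target embeddings V so that
    sigmoid(U[i] . V[j]) approximates the 0/1 entry Y[i][j].
    Hypothetical helper: plain SGD on a regularized logistic loss."""
    rng = random.Random(seed)
    n_drugs, n_targets = len(Y), len(Y[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_drugs)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(n_targets)]
    for _ in range(epochs):
        for i in range(n_drugs):
            for j in range(n_targets):
                score = sum(U[i][k] * V[j][k] for k in range(rank))
                err = sigmoid(score) - Y[i][j]  # gradient of the log loss w.r.t. score
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    U[i][k] -= lr * (err * v + reg * u)
                    V[j][k] -= lr * (err * u + reg * v)
    return U, V


def predict(U, V, i, j):
    """Interaction probability for drug i and target j from the inner product."""
    return sigmoid(sum(u * v for u, v in zip(U[i], V[j])))


# Toy data (assumed): rows are drugs, columns are targets, 1 = known interaction.
Y = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
U, V = factorize(Y)
for i in range(len(Y)):
    print([round(predict(U, V, i, j), 2) for j in range(len(Y[0]))])
```

In a real DTI setting the pairs of interest are the unobserved ones, so scores would be evaluated on held-out drug-target pairs rather than on the training entries as in this toy run.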
