Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

Paper title

Learning To Navigate The Synthetically Accessible Chemical Space Using Reinforcement Learning

Authors

Gottipati, Sai Krishna; Sattarov, Boris; Niu, Sufeng; Pathak, Yashaswi; Wei, Haoran; Liu, Shengchao; Thomas, Karam M. J.; Blackburn, Simon; Coley, Connor W.; Tang, Jian; Chandar, Sarath; Bengio, Yoshua

Abstract

Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models. However, current generative approaches exhibit a significant challenge as they do not ensure that the proposed molecular structures can be feasibly synthesized nor do they provide the synthesis routes of the proposed small molecules, thereby seriously limiting their practical applicability. In this work, we propose a novel forward synthesis framework powered by reinforcement learning (RL) for de novo drug design, Policy Gradient for Forward Synthesis (PGFS), that addresses this challenge by embedding the concept of synthetic accessibility directly into the de novo drug design system. In this setup, the agent learns to navigate through the immense synthetically accessible chemical space by subjecting commercially available small molecule building blocks to valid chemical reactions at every time step of the iterative virtual multi-step synthesis process. The proposed environment for drug discovery provides a highly challenging test-bed for RL algorithms owing to the large state space and high-dimensional continuous action space with hierarchical actions. PGFS achieves state-of-the-art performance in generating structures with high QED and penalized clogP. Moreover, we validate PGFS in an in-silico proof-of-concept associated with three HIV targets. Finally, we describe how the end-to-end training conceptualized in this study represents an important paradigm in radically expanding the synthesizable chemical space and automating the drug discovery process.
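
The hierarchical action described in the abstract (a reaction is chosen first, then a building block is chosen from a continuous action space) can be illustrated with a toy sketch. This is not the authors' implementation: the building blocks, their two-dimensional feature vectors, the reaction templates, and the nearest-neighbour lookup that maps a continuous action to a concrete block are all assumptions made for illustration, and no real chemistry is performed.

```python
import math
import random

# Hypothetical building blocks with made-up 2-D feature vectors (assumption).
BUILDING_BLOCKS = {
    "block_a": (0.0, 0.0),
    "block_b": (1.0, 0.0),
    "block_c": (0.0, 1.0),
    "block_d": (1.0, 1.0),
}

# Hypothetical reaction templates and the blocks each one accepts (assumption).
REACTION_TEMPLATES = {
    "template_1": {"block_a", "block_b"},
    "template_2": {"block_c", "block_d"},
}


def nearest_valid_block(template, proto_action):
    """Map a continuous action to the closest block that is valid for the template."""
    valid = sorted(REACTION_TEMPLATES[template])
    return min(valid, key=lambda name: math.dist(BUILDING_BLOCKS[name], proto_action))


def step(route, template, proto_action):
    """One virtual synthesis step: record the template and the selected block."""
    block = nearest_valid_block(template, proto_action)
    return route + [(template, block)]


def make_random_policy(seed):
    """Stand-in for a learned policy: picks a template, then a continuous action."""
    rng = random.Random(seed)

    def policy(route):
        template = rng.choice(sorted(REACTION_TEMPLATES))
        proto_action = (rng.random(), rng.random())
        return template, proto_action

    return policy


def rollout(start_block, policy, n_steps):
    """Build a multi-step route; the route itself is the synthesis path."""
    route = [("start", start_block)]
    for _ in range(n_steps):
        template, proto_action = policy(route)
        route = step(route, template, proto_action)
    return route


if __name__ == "__main__":
    for entry in rollout("block_a", make_random_policy(0), 3):
        print(entry)
```

Because every step applies a listed template to a listed block, each generated route doubles as a synthesis path, which is the property the abstract emphasizes. In an actual RL setup the random policy would be replaced by a trained one and each final product would be scored by a reward such as QED.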
