Application of variational policy gradient to atomic-scale materials synthesis

Paper title

Application of variational policy gradient to atomic-scale materials synthesis

Authors

Siyan Liu, Nikolay Borodinov, Lukas Vlcek, Dan Lu, Nouamane Laanait, Rama K. Vasudevan

Abstract

Atomic-scale materials synthesis via layer deposition techniques presents a unique opportunity to control material structures and to yield systems with functional properties that cannot be stabilized using traditional bulk synthetic routes. However, the deposition process itself presents a large, multidimensional space that is traditionally optimized via intuition and trial and error, slowing down progress. Here, we present an application of deep reinforcement learning to a simulated materials synthesis problem, utilizing the Stein variational policy gradient (SVPG) approach to train multiple agents to optimize a stochastic policy to yield desired functional properties. Our contributions are (1) a fully open source simulation environment for layered materials synthesis problems, utilizing a kinetic Monte Carlo engine and implemented in the OpenAI Gym framework, (2) an extension of the Stein variational policy gradient approach to handle both image and tabular input, and (3) a parallel (synchronous) implementation of SVPG using Horovod, distributing multiple agents across GPUs and individual simulation environments on CPUs. We demonstrate the utility of this approach in optimizing a material surface characteristic, surface roughness, and explore the strategies used by the agents as compared with a traditional actor-critic (A2C) baseline. Further, we find that SVPG stabilizes the training process relative to traditional A2C. Such trained agents can be useful for a variety of atomic-scale deposition techniques, including pulsed laser deposition and molecular beam epitaxy, if the implementation challenges are addressed.
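
For readers unfamiliar with SVPG, the following is a minimal NumPy sketch of the particle update that couples multiple agents. It is not the authors' implementation. The function names (`rbf_kernel`, `svpg_direction`), the RBF kernel with a median-heuristic bandwidth, the flat prior over parameters, and the temperature `alpha` are all assumptions made for illustration. The per-agent policy gradients are taken as given, for example from an A2C-style estimator that is not shown here.

```python
import numpy as np


def rbf_kernel(theta, bandwidth=None):
    """RBF kernel matrix and its gradients for n parameter vectors.

    theta: (n, d) array, one row of flattened policy parameters per agent.
    Returns K with K[j, i] = k(theta_j, theta_i), and grad_K with
    grad_K[j, i] = gradient of k(theta_j, theta_i) with respect to theta_j.
    """
    diff = theta[:, None, :] - theta[None, :, :]  # diff[j, i] = theta_j - theta_i
    sq_dist = np.sum(diff ** 2, axis=-1)
    if bandwidth is None:
        # Median heuristic (an assumed, commonly used default).
        n = theta.shape[0]
        bandwidth = max(np.median(sq_dist) / np.log(n + 1.0), 1e-8)
    K = np.exp(-sq_dist / bandwidth)
    grad_K = -2.0 / bandwidth * diff * K[:, :, None]
    return K, grad_K


def svpg_direction(theta, policy_grads, alpha=1.0, bandwidth=None):
    """Stein variational update direction for every agent.

    theta:        (n, d) current parameters of the n agents.
    policy_grads: (n, d) estimates of grad J(theta_j) for each agent.
    alpha:        temperature; larger values weight exploration more.
    A flat prior is assumed, so no prior-gradient term appears.
    """
    n = theta.shape[0]
    K, grad_K = rbf_kernel(theta, bandwidth)
    # Kernel-weighted average of the agents' policy gradients (exploitation).
    drive = K.T @ (policy_grads / alpha)
    # Sum over j of grad_{theta_j} k(theta_j, theta_i): pushes agents apart.
    repulse = grad_K.sum(axis=0)
    return (drive + repulse) / n


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(4, 3))   # 4 hypothetical agents, 3 parameters each
    grads = rng.normal(size=(4, 3))   # stand-in for estimated policy gradients
    theta = theta + 0.01 * svpg_direction(theta, grads, alpha=1.0)
    print(theta.shape)
```

With a single agent the repulsive term vanishes and the direction reduces to the ordinary policy gradient scaled by `1 / alpha`. With several agents the kernel gradient term keeps their policies from collapsing onto one another.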
