Paper Title
Big-Data Science in Porous Materials: Materials Genomics and Machine Learning
Paper Authors
Abstract
By combining metal nodes with organic linkers, we can potentially synthesize millions of possible metal-organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in silico predicted materials. Having so many materials opens many exciting avenues to tailor-make a material that is optimal for a given application. However, from both an experimental and a computational point of view, we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, and how to select appropriate training sets. An important part of this review is the set of different approaches used to represent these materials in feature space. The review also includes a general overview of the different machine learning (ML) techniques, but because most applications in porous materials use supervised ML, our review focuses on the different approaches for supervised ML. In particular, we review the different methods to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different ML approaches have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. This range illustrates the large variety of topics that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to expand rapidly in the coming years.