Paper Title
Chemical Property Prediction Under Experimental Biases
Paper Authors
Paper Abstract
Predicting the chemical properties of compounds is crucial for discovering novel materials and drugs with specific desired characteristics. Recent advances in machine learning have enabled automatic predictive modeling from past experimental data reported in the literature. However, these datasets are often biased for various reasons, such as experimental plans and publication decisions, and prediction models trained on such biased datasets often overfit to the biased distributions and perform poorly in subsequent use. Hence, this study focused on mitigating bias in experimental datasets. We adopted two techniques from causal inference and combined them with graph neural networks that can represent molecular structures. The experimental results in four possible bias scenarios indicated that both the inverse propensity scoring-based method and the counterfactual regression-based method yielded solid improvements.
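The abstract does not spell out how inverse propensity scoring enters training, so the following is only a generic sketch of the idea, not the authors' implementation. It assumes each training compound comes with a propensity, that is, the probability that it was selected into the biased dataset, and it weights each squared error by the inverse of that propensity. The function name `ips_weighted_mse`, the `clip` parameter, and the self-normalized form are illustrative assumptions. In practice the propensities would usually have to be estimated, and the predictions would come from a model such as a graph neural network.

```python
import numpy as np


def ips_weighted_mse(y_true, y_pred, propensity, clip=1e-3):
    """Self-normalized inverse-propensity-weighted mean squared error.

    Hypothetical helper for illustration only. `propensity` holds the
    (assumed known) probability that each sample was observed in the
    biased dataset; rarely observed samples receive larger weights.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Clip small propensities so that no single sample dominates the loss.
    p = np.clip(np.asarray(propensity, dtype=float), clip, 1.0)
    weights = 1.0 / p
    squared_error = (y_true - y_pred) ** 2
    # Dividing by the weight sum (self-normalization) keeps the loss on
    # the same scale as an ordinary mean squared error.
    return float(np.sum(weights * squared_error) / np.sum(weights))


# Two samples: the second was half as likely to be observed as the first,
# so its error counts twice as much in the weighted loss.
weighted = ips_weighted_mse([1.0, 2.0], [0.0, 0.0], [0.5, 0.25])
# With equal propensities the loss reduces to the ordinary MSE.
uniform = ips_weighted_mse([1.0, 2.0], [0.0, 0.0], [0.5, 0.5])
```

Clipping and self-normalization are common variance-reduction choices for inverse propensity weighting. Whether the paper uses either of them is not stated in the abstract.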