Title

TransPolymer: a Transformer-based language model for polymer property predictions

Authors

Changwen Xu, Yuyang Wang, and Amir Barati Farimani

Abstract

Accurate and efficient prediction of polymer properties is of great significance in polymer design. Conventionally, expensive and time-consuming experiments or simulations are required to evaluate polymer functions. Recently, Transformer models, equipped with self-attention mechanisms, have exhibited superior performance in natural language processing. However, such methods have not been investigated in polymer sciences. Herein, we report TransPolymer, a Transformer-based language model for polymer property prediction. Our proposed polymer tokenizer with chemical awareness enables learning representations from polymer sequences. Rigorous experiments on ten polymer property prediction benchmarks demonstrate the superior performance of TransPolymer. Moreover, we show that TransPolymer benefits from pretraining on a large unlabeled dataset via Masked Language Modeling. Experimental results further manifest the important role of self-attention in modeling polymer sequences. We highlight this model as a promising computational tool for promoting rational polymer design and understanding structure-property relationships from a data science view.
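To make the "chemically aware tokenizer" idea concrete, below is a minimal sketch of a regex-based tokenizer for polymer SMILES sequences. The token vocabulary and regular expression here are illustrative assumptions on my part, not TransPolymer's actual tokenizer: they merely show how a polymer repeat unit (with `*` marking connection points) can be split into chemically meaningful tokens rather than raw characters before being fed to a Transformer.

```python
import re

# Illustrative sketch only: the regex and token classes below are assumptions,
# not the tokenizer used by TransPolymer. Multi-character tokens (bracketed
# atoms, two-letter elements) are matched before single characters so that
# e.g. "Cl" is one token, not "C" + "l".
TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]"        # bracketed atoms, e.g. [nH], [Si]
    r"|Br|Cl"             # two-letter elements
    r"|[BCNOPSFIbcnops]"  # common one-letter atoms (incl. aromatic forms)
    r"|\*"                # polymer connection point
    r"|[=#+\-()/\\%]"     # bonds, branches, ring-bond markers
    r"|\d)"               # ring-closure digits
)

def tokenize(smiles: str) -> list[str]:
    """Split a polymer SMILES string into chemically meaningful tokens."""
    tokens = TOKEN_PATTERN.findall(smiles)
    # Sanity check: the tokens must reconstruct the input exactly,
    # otherwise some character was not covered by the pattern.
    assert "".join(tokens) == smiles, f"untokenized characters in {smiles!r}"
    return tokens
```

For example, a polystyrene-like repeat unit `*CC(*)c1ccccc1` would be split into `*`, `C`, `C`, `(`, `*`, `)`, `c`, `1`, `c`, `c`, `c`, `c`, `c`, `1`, keeping atoms and structural symbols as separate tokens.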
