Paper Title
Stochastic Threshold Model Trees: A Tree-Based Ensemble Method for Dealing with Extrapolation
Paper Authors
Abstract
In the field of chemistry, there have been many attempts to predict the properties of unknown compounds from statistical models constructed using machine learning. In an area where many known compounds are present (the interpolation area), an accurate model can be constructed. In contrast, data in areas where there are no known compounds (the extrapolation area) are generally difficult to predict. However, in the development of new materials, it is desirable to search this extrapolation area and discover compounds with unprecedented physical properties. In this paper, we propose Stochastic Threshold Model Trees (STMT), an extrapolation method that reflects the trend of the data, while maintaining the accuracy of conventional interpolation methods. The behavior of STMT is confirmed through experiments using both artificial and real data. In the case of the real data, although there is no significant overall improvement in accuracy, there is one compound for which the prediction accuracy is notably improved, suggesting that STMT reflects the data trends in the extrapolation area. We believe that the proposed method will contribute to more efficient searches in situations such as new material development.
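The gap between interpolation and extrapolation described in the abstract can be illustrated with a small sketch. This is not the STMT algorithm, whose details are not given here. It only shows, under assumed toy data (a linear trend `y = 2x` observed on `x = 0..9`), why a conventional tree with constant leaves cannot follow a trend outside the training range, while a model that captures the trend can. The helper names `fit_tree`, `predict_tree`, and `fit_line` are hypothetical and exist only for this illustration.

```python
# Illustration only: NOT the STMT algorithm. It contrasts a constant-leaf
# regression tree with a linear fit on assumed toy data (y = 2x).

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_tree(xs, ys, depth):
    """Tiny 1-D regression tree with constant (mean) leaves.

    xs must be sorted in increasing order. A leaf is a float; an internal
    node is a tuple (threshold, left_subtree, right_subtree).
    """
    if depth == 0 or len(xs) < 2:
        return sum(ys) / len(ys)
    best_sse, best_i = None, None
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best_sse is None or sse < best_sse:
            best_sse, best_i = sse, i
    threshold = (xs[best_i - 1] + xs[best_i]) / 2
    return (
        threshold,
        fit_tree(xs[:best_i], ys[:best_i], depth - 1),
        fit_tree(xs[best_i:], ys[best_i:], depth - 1),
    )


def predict_tree(node, x):
    """Walk down the tree until a constant leaf is reached."""
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if x <= threshold else right
    return node


# Assumed training data: the "interpolation area" is x in [0, 9].
xs = [float(i) for i in range(10)]
ys = [2.0 * x for x in xs]

tree = fit_tree(xs, ys, depth=3)
a, b = fit_line(xs, ys)

# A query far outside the training range (the "extrapolation area").
x_new = 20.0
tree_pred = predict_tree(tree, x_new)
line_pred = a * x_new + b

print("tree prediction:", tree_pred)
print("line prediction:", line_pred)
```

In this sketch the tree's prediction at `x_new` can never exceed the largest training target, because every leaf is an average of training values, whereas the linear fit continues the trend. A method aimed at extrapolation needs some way to carry such a trend beyond the observed range without giving up accuracy inside it.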