Mass and Age determination of the LAMOST data with different Machine Learning methods

Paper Title

Mass and Age determination of the LAMOST data with different Machine Learning methods

Authors

Li, Qi-Da; Wang, Hai-Feng; Luo, Yang-Ping; Li, Qing; Deng, Li-Cai; Ting, Yuan-Sen

Abstract

We present a catalog of 948,216 stars with mass labels and a catalog of 163,105 red clump (RC) stars with both mass and age labels. The training dataset is obtained by cross-matching LAMOST (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope) DR5 with high-resolution asteroseismology data; mass and age are predicted with the random forest method or the convex hull algorithm. Stellar parameters that correlate strongly with mass and age are extracted, and the test dataset shows that the median relative error of the mass prediction model for the large sample is 3%, while the median relative errors for the mass and age of the red clump stars are 4% and 7%, respectively. We also compare the predicted ages of the red clump stars with recent works and find that the final uncertainty of the RC sample could reach 18% for age and 9% for mass. The final precision of the mass for the large sample, which contains different types of stars, could reach 13% without considering systematics. These results imply that the method could be widely used in the future. Moreover, we explore the performance of different machine learning methods on our sample, including Bayesian linear regression (BYS), gradient boosting decision tree (GBDT), multilayer perceptron (MLP), multiple linear regression (MLR), random forest (RF), and support vector regression (SVR). We find that nonlinear models generally perform better than linear models, and that the GBDT and RF methods perform relatively better than the others.
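The model comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below are synthetic, the four input features are hypothetical stand-ins for stellar parameters, the scikit-learn estimators and their hyperparameters are assumed choices, and the labels BYS, GBDT, MLP, MLR, RF, and SVR are mapped to the closest scikit-learn regressors. The score is the median relative error, the metric the abstract quotes.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import BayesianRidge, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def median_relative_error(y_true, y_pred):
    """Median of |prediction - truth| / |truth|."""
    return float(np.median(np.abs(y_pred - y_true) / np.abs(y_true)))


# Synthetic data (assumption): four features standing in for stellar
# parameters, and a strictly positive, nonlinear "mass-like" label.
rng = np.random.default_rng(0)
n_stars = 2000
X = rng.normal(size=(n_stars, 4))
y = np.exp(0.3 * np.tanh(X[:, 0]) + 0.15 * X[:, 1] * X[:, 2]
           + 0.02 * rng.normal(size=n_stars))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumed mapping from the abstract's abbreviations to scikit-learn models.
models = {
    "BYS": BayesianRidge(),
    "GBDT": GradientBoostingRegressor(random_state=0),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500, random_state=0),
    ),
    "MLR": LinearRegression(),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": make_pipeline(StandardScaler(), SVR()),
}

# Fit each model on the training split and score it on the held-out split.
errors = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    errors[name] = median_relative_error(y_test, model.predict(X_test))

for name, err in sorted(errors.items(), key=lambda item: item[1]):
    print(f"{name}: median relative error = {err:.3f}")
```

The ranking this prints reflects only the synthetic label used here and says nothing about the LAMOST results. With real data, the feature matrix would hold the stellar parameters selected for their correlation with mass and age, and the labels would come from the asteroseismic cross-match.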
