Paper Title
A better method to enforce monotonic constraints in regression and classification trees
Paper Authors
Paper Abstract
In this report we present two new ways of enforcing monotone constraints in regression and classification trees. One yields better results than the current LightGBM with a similar computation time. The other yields even better results, but is much slower than the current LightGBM. We also propose a heuristic that accounts for the fact that greedily splitting a tree by choosing a monotone split based only on its immediate gain is far from optimal. We then compare the results with the current implementation of the constraints in the LightGBM library, using the well-known Adult public dataset. Throughout the report, we mostly focus on our implementation of these methods in the LightGBM library, even though they are general and could be implemented in any regression or classification tree. The best method we propose (a smarter way of splitting the tree coupled with a penalization of monotone splits) consistently beats the current implementation in LightGBM. With small or average-sized trees, the loss reduction can be as high as 1% in the early stages of training and decreases to around 0.1% at the loss peak for the Adult dataset. The results would be even better with larger trees. In our experiments, we did not tune the regularization parameters extensively, and we would not be surprised if further tuning increased the performance of our methods on test sets.
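For context, the sketch below shows how monotone constraints are typically passed to LightGBM from Python. This is not the authors' code: the toy data and feature layout are placeholders, and the `monotone_constraints_method` and `monotone_penalty` parameters are only available in sufficiently recent LightGBM releases.

```python
# Minimal sketch: enforcing monotone constraints in LightGBM.
# Data and feature names are synthetic placeholders.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))                                   # three toy features
y = (X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=1000) > 0).astype(int)

train_data = lgb.Dataset(X, label=y)

params = {
    "objective": "binary",
    # +1: prediction non-decreasing in feature 0,
    # -1: non-increasing in feature 1,
    #  0: feature 2 unconstrained.
    "monotone_constraints": [1, -1, 0],
    # Recent LightGBM versions also expose the enforcement strategy and a
    # penalty on monotone splits; availability depends on the installed version.
    "monotone_constraints_method": "advanced",
    "monotone_penalty": 2.0,
}

booster = lgb.train(params, train_data, num_boost_round=50)
```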