Paper title
Exploiting Categorical Structure Using Tree-Based Methods
Paper authors
Abstract
Standard methods of using categorical variables as predictors either endow them with an ordinal structure or assume they have no structure at all. However, categorical variables often possess structure that is more complicated than a linear ordering can capture. We develop a mathematical framework for representing the structure of categorical variables and show how to generalize decision trees to make use of this structure. This approach is applicable to methods such as gradient boosted trees, which use a decision tree as the underlying learner. We show results on weather data to demonstrate the improvement yielded by this approach.
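
The abstract does not spell out the framework itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a categorical variable with a cyclic structure (eight compass wind directions, a hypothetical example in the spirit of weather data) and restricts candidate tree splits to contiguous arcs of the cycle. All function names and the toy data are illustrative assumptions.

```python
# Illustrative sketch only: structure-aware splits for a cyclic categorical
# variable. This is an assumption-based example, not the paper's algorithm.

def ordinal_splits(categories):
    """Threshold splits that treat the category list as a linear ordering."""
    for i in range(1, len(categories)):
        yield frozenset(categories[:i]), frozenset(categories[i:])

def cyclic_arc_splits(categories):
    """Splits of a cyclic category list into a contiguous arc and its complement."""
    n = len(categories)
    everything = frozenset(categories)
    seen = set()
    for start in range(n):
        for length in range(1, n):
            arc = frozenset(categories[(start + k) % n] for k in range(length))
            rest = everything - arc
            key = frozenset((arc, rest))  # an arc and its complement are one split
            if key not in seen:
                seen.add(key)
                yield arc, rest

def sse(values):
    """Sum of squared deviations from the mean (regression impurity)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys, splits):
    """Return (score, left, right) for the candidate split with the lowest total SSE."""
    best = None
    for left, right in splits:
        left_y = [y for x, y in zip(xs, ys) if x in left]
        right_y = [y for x, y in zip(xs, ys) if x in right]
        if not left_y or not right_y:
            continue  # skip splits that leave one side empty
        score = sse(left_y) + sse(right_y)
        if best is None or score < best[0]:
            best = (score, left, right)
    return best

# Hypothetical toy data: the target is high only for northerly winds,
# a group that wraps around the ends of the linear ordering.
DIRS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
xs = list(DIRS)
ys = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

ordinal_best = best_split(xs, ys, ordinal_splits(DIRS))
cyclic_best = best_split(xs, ys, cyclic_arc_splits(DIRS))
```

In this toy setting a single cyclic split can separate the northerly directions from the rest, whereas no single threshold on the linear ordering can, because the group wraps around the ends. The cyclic search space (28 splits for eight categories) is also far smaller than the 127 splits obtained by treating the variable as unstructured.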