Paper title
Truly Unordered Probabilistic Rule Sets for Multi-class Classification
Paper authors
Paper abstract
Rule set learning has long been studied and has recently been frequently revisited due to the need for interpretable models. Still, existing methods have several shortcomings: 1) most recent methods require a binary feature matrix as input, while learning rules directly from numeric variables is understudied; 2) existing methods impose orders among rules, either explicitly or implicitly, which harms interpretability; and 3) currently no method exists for learning probabilistic rule sets for multi-class target variables (there is only one for probabilistic rule lists). We propose TURS, for Truly Unordered Rule Sets, which addresses these shortcomings. We first formalize the problem of learning truly unordered rule sets. To resolve conflicts caused by overlapping rules, i.e., instances covered by multiple rules, we propose a novel approach that exploits the probabilistic properties of our rule sets. We next develop a two-phase heuristic algorithm that learns rule sets by carefully growing rules. An important innovation is that we use a surrogate score to take the global potential of the rule set into account when learning a local rule. Finally, we empirically demonstrate that, compared to non-probabilistic and (explicitly or implicitly) ordered state-of-the-art methods, our method learns rule sets that not only have better interpretability but also better predictive performance.
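To make the notion of overlap concrete, the following is a minimal sketch of a probabilistic rule set over a numeric feature in which an instance covered by several rules receives a class distribution that does not depend on rule order. The abstract does not specify how TURS resolves overlaps, so the pooling strategy used here (estimating class probabilities from all training instances covered by any of the matching rules) is an illustrative assumption, and the names `Rule`, `class_distribution`, and `predict_proba` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Rule:
    """Conjunction of interval conditions on numeric features (hypothetical structure)."""
    conditions: dict  # feature index -> (low, high), meaning low <= value < high

    def covers(self, x):
        return all(low <= x[j] < high for j, (low, high) in self.conditions.items())


def class_distribution(labels, classes):
    """Empirical class probabilities of a non-empty list of labels."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in classes}


def predict_proba(rules, X_train, y_train, x, classes):
    """Order-independent prediction for instance x.

    Assumption (not confirmed by the abstract): when several rules cover x,
    pool the training instances covered by any of those rules.
    """
    active = [r for r in rules if r.covers(x)]
    pooled = [
        y for xi, y in zip(X_train, y_train)
        if any(r.covers(xi) for r in active)
    ]
    if not pooled:
        # No matching rule (or no covered training data): use the overall distribution.
        pooled = list(y_train)
    return class_distribution(pooled, classes)


X_train = [(1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (6.0,)]
y_train = ["a", "a", "b", "b", "c", "c"]
classes = ["a", "b", "c"]

r1 = Rule({0: (0.0, 3.5)})  # covers 1.0, 2.0, 3.0
r2 = Rule({0: (2.5, 5.5)})  # covers 3.0, 4.0, 5.0

p_overlap = predict_proba([r1, r2], X_train, y_train, (3.0,), classes)
p_reversed = predict_proba([r2, r1], X_train, y_train, (3.0,), classes)
p_single = predict_proba([r1, r2], X_train, y_train, (1.0,), classes)
p_default = predict_proba([r1, r2], X_train, y_train, (6.0,), classes)
```

Because the prediction depends only on which rules match and not on their position, `p_overlap` and `p_reversed` are identical. This is the property that distinguishes an unordered rule set from a rule list, where the first matching rule decides the outcome.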