Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning

Paper Title

Probabilistic Diagnostic Tests for Degradation Problems in Supervised Learning

Authors

Valencia-Zapata, Gustavo A.; Gonzalez-Canas, Carolina; Zentner, Michael G.; Ersoy, Okan; Klimeck, Gerhard

Abstract

Several studies point to different causes of performance degradation in supervised machine learning. Problems such as class imbalance, class overlap, small disjuncts, noisy labels, and sparseness limit the accuracy of classification algorithms. Although a number of approaches, in the form of either a methodology or an algorithm, try to minimize performance degradation, they have been isolated efforts with limited scope. Most of these approaches focus on remediating one problem among many, with experimental results drawn from few datasets and classification algorithms, insufficient measures of predictive power, and no statistical validation of the real benefit of the proposed approach. This paper consists of two main parts. In the first part, a novel probabilistic diagnostic model based on identifying the signs and symptoms of each problem is presented. The aim is early and correct diagnosis of these problems, so that both the most suitable remediation treatment and unbiased performance metrics can be selected. In the second part, the behavior and performance of several supervised algorithms are studied when training sets have such problems. As a result, the success of treatments can be predicted across classifiers.
