Paper Title
Rethinking and Recomputing the Value of Machine Learning Models
Paper Authors
Abstract
In this paper, we argue that the prevailing approach to training and evaluating machine learning models often fails to consider their real-world application within organizational or societal contexts, where they are intended to create beneficial value for people. We propose a shift in perspective, redefining model assessment and selection to emphasize integration into workflows that combine machine predictions with human expertise, particularly in scenarios requiring human intervention for low-confidence predictions. Traditional metrics like accuracy and f-score fail to capture the beneficial value of models in such hybrid settings. To address this, we introduce a simple yet theoretically sound "value" metric that incorporates task-specific costs for correct predictions, errors, and rejections, offering a practical framework for real-world evaluation. Through extensive experiments, we show that existing metrics fail to capture real-world needs, often leading to suboptimal choices in terms of value when used to rank classifiers. Furthermore, we emphasize the critical role of calibration in determining model value, showing that simple, well-calibrated models can often outperform more complex models that are challenging to calibrate.