Paper Title

Is the new model better? One metric says yes, but the other says no. Which metric do I use?

Authors

Zhou, Qian M.; Lu, Zhe; Brooke, Russell J.; Hudson, Melissa M.; Yuan, Yan

Abstract

Incremental value (IncV) evaluates the change in performance from an existing risk model to a new model. It is one of the key considerations in deciding whether a new risk model performs better than the existing one. Problems arise when different IncV metrics contradict each other. For example, compared with a prescribed-dose model, an ovarian-dose model for predicting acute ovarian failure has a slightly lower area under the receiver operating characteristic curve (AUC) but increases the area under the precision-recall curve (AP) by 48%. Such conflicting conclusions are not uncommon, and they create a dilemma in medical decision making. In this article, we examine the analytical connections and differences between two IncV metrics: IncV in AUC (IncV-AUC) and IncV in AP (IncV-AP). Additionally, since both are semi-proper scoring rules, we compare them with a strictly proper scoring rule, the IncV of the scaled Brier score (IncV-sBrS), via a numerical study. We demonstrate that both IncV-AUC and IncV-AP are weighted averages of the changes (from the existing model to the new one) in separating the risk score distributions of events and non-events. However, IncV-AP assigns heavier weights to the changes in the high-risk group, whereas IncV-AUC weights the changes equally. In the numerical study, we find that IncV-AP has a wide range, from negative to positive, whereas the magnitude of IncV-AUC is much smaller. In addition, IncV-AP and IncV-sBrS are highly consistent, but IncV-AUC is negatively correlated with both IncV-sBrS and IncV-AP at a low event rate. IncV-AUC and IncV-AP are the least consistent of the three pairs, and their differences become more pronounced as the event rate decreases.
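To make the conflict described in the abstract concrete, here is a minimal sketch in plain Python. It rests on several assumptions that are not taken from the paper: each IncV is computed as the simple difference (new model minus existing model), AP is the usual step-wise average precision, the scaled Brier score is one minus the Brier score divided by its value under a constant prediction at the event rate, and the eight-subject data set is invented purely for illustration. The function names are hypothetical.

```python
def auc(y, s):
    """AUC as the probability that an event outscores a non-event (ties count 1/2)."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y, s):
    """Step-wise AP: precision at each distinct threshold, weighted by the recall gained there."""
    n_pos = sum(y)
    ap = 0.0
    for t in sorted(set(s), reverse=True):
        flagged = sum(1 for si in s if si >= t)
        tp_at = sum(1 for yi, si in zip(y, s) if si >= t and yi == 1)
        tp_new = sum(1 for yi, si in zip(y, s) if si == t and yi == 1)
        ap += (tp_new / n_pos) * (tp_at / flagged)
    return ap


def scaled_brier(y, p):
    """Scaled Brier score: 1 - BS / (pi * (1 - pi)), with pi the observed event rate."""
    n = len(y)
    pi = sum(y) / n
    bs = sum((yi - pj) ** 2 for yi, pj in zip(y, p)) / n
    return 1.0 - bs / (pi * (1.0 - pi))


def incremental_values(y, existing, new):
    """IncV of each metric, assumed here to be metric(new) - metric(existing)."""
    metrics = {"AUC": auc, "AP": average_precision, "sBrS": scaled_brier}
    return {name: f(y, new) - f(y, existing) for name, f in metrics.items()}


# Invented toy data (not from the paper): 2 events among 8 subjects.
y        = [1,   1,   0,   0,   0,    0,   0,   0]
existing = [0.5, 0.4, 0.6, 0.3, 0.2,  0.1, 0.1, 0.1]
new      = [0.9, 0.2, 0.4, 0.3, 0.25, 0.1, 0.1, 0.1]

incv = incremental_values(y, existing, new)
for name, value in incv.items():
    print(f"IncV-{name}: {value:+.4f}")
```

In this toy example the new model moves one event to the very top of the ranking and lets the other slip below three non-events. AUC, which treats every event/non-event pair equally, goes down, while AP, which rewards correct ranking at the high-risk end, goes up. This mirrors the qualitative pattern the abstract describes, but it is not a reproduction of the paper's analysis.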
