Paper title
cutpointr: Improved Estimation and Validation of Optimal Cutpoints in R
Paper authors
Paper abstract
'Optimal cutpoints' for binary classification tasks are often established by testing which cutpoint yields the best discrimination in a specific sample, for example as measured by the Youden index. This results in 'optimal' cutpoints that are highly variable and systematically overestimate the out-of-sample performance. To address these concerns, the cutpointr package offers robust methods for estimating optimal cutpoints and the out-of-sample performance. The robust methods include bootstrapping and smoothing based on kernel estimation, generalized additive models, smoothing splines, and local regression. These methods can be applied to a wide range of binary-classification and cost-based metrics. cutpointr also provides mechanisms for using user-defined metrics and estimation methods. The package supports parallelization of the bootstrapping, including reproducible random number generation. Furthermore, it is pipe-friendly, for example for compatibility with functions from the tidyverse. Various functions are included for plotting receiver operating characteristic curves, precision-recall graphs, bootstrap results, and other representations of the data. The package contains example data from a study on psychological characteristics and suicide attempts, suitable for applying binary classification algorithms.
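The problem the abstract describes can be illustrated with a minimal sketch. It is written in plain Python rather than R and does not use or reproduce the cutpointr interface. All function names (`youden`, `optimal_cutpoint`, `bootstrap_validate`, `simulate`) and the simulated data are assumptions made for illustration only. The sketch picks the cutpoint that maximizes the Youden index (sensitivity + specificity - 1) in a sample, then uses bootstrap resamples to compare the in-sample value with the value on the observations left out of each resample.

```python
import random
from statistics import mean


def youden(scores, labels, cutpoint):
    """Youden index J = sensitivity + specificity - 1.

    Observations with score >= cutpoint are predicted positive (label 1).
    Both classes must be present in `labels`.
    """
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= cutpoint
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn) + tn / (tn + fp) - 1


def optimal_cutpoint(scores, labels):
    """Observed score that maximizes the in-sample Youden index.

    Ties are resolved in favor of the smallest such score.
    """
    return max(sorted(set(scores)), key=lambda c: youden(scores, labels, c))


def bootstrap_validate(scores, labels, runs=100, seed=1):
    """Return (mean in-bag J, mean out-of-bag J) over bootstrap resamples.

    In each run the cutpoint is chosen on the resample and then evaluated
    on the observations that were not drawn (the out-of-bag part).
    """
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    n = len(scores)
    in_bag, out_of_bag = [], []
    for _ in range(runs):
        idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(idx)
        oob = [i for i in range(n) if i not in drawn]
        b_scores = [scores[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        o_scores = [scores[i] for i in oob]
        o_labels = [labels[i] for i in oob]
        # Skip resamples in which either part lacks one of the two classes.
        if len(set(b_labels)) < 2 or len(set(o_labels)) < 2:
            continue
        c = optimal_cutpoint(b_scores, b_labels)
        in_bag.append(youden(b_scores, b_labels, c))
        out_of_bag.append(youden(o_scores, o_labels, c))
    if not in_bag:
        raise ValueError("no usable bootstrap resamples")
    return mean(in_bag), mean(out_of_bag)


def simulate(n_per_class=60, seed=0):
    """Hypothetical data: negatives ~ N(0, 1), positives ~ N(1, 1)."""
    rng = random.Random(seed)
    scores = [rng.gauss(0, 1) for _ in range(n_per_class)]
    scores += [rng.gauss(1, 1) for _ in range(n_per_class)]
    labels = [0] * n_per_class + [1] * n_per_class
    return scores, labels


if __name__ == "__main__":
    scores, labels = simulate()
    print("empirical optimal cutpoint:", optimal_cutpoint(scores, labels))
    j_in, j_out = bootstrap_validate(scores, labels)
    print("mean in-bag Youden index:    ", j_in)
    print("mean out-of-bag Youden index:", j_out)
```

On data like these, the in-bag value is typically larger than the out-of-bag value, which is the optimism the abstract refers to. The smoothing-based methods named in the abstract (kernel estimation, generalized additive models, smoothing splines, local regression) are not sketched here.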