Paper Title
A Prospective Randomized Clinical Trial for Measuring Radiology Study Reporting Time on Artificial Intelligence-Based Detection of Intracranial Hemorrhage in Emergent Care Head CT
Authors
Abstract
We propose Artificial Intelligence Prospective Randomized Observer Blinding Evaluation (AI-PROBE) for the quantitative clinical performance evaluation of radiology AI systems within prospective randomized clinical trials. AI-PROBE encompasses a study design and a matching radiology IT infrastructure that randomly blinds radiologists to the results provided by AI-based image analysis. To demonstrate the applicability of our evaluation framework, we present a first prospective randomized clinical trial on the effect of AI-based intracranial hemorrhage (ICH) detection in emergent care head CT on radiology study turn-around time (TAT). We acquired 620 non-contrast head CT scans from inpatient and emergency room patients at a large academic hospital. Following acquisition, scans were automatically analyzed for the presence of ICH using commercially available software (Aidoc, Tel Aviv, Israel). Cases identified as positive for ICH by the AI (ICH-AI+) were flagged in radiologists' reading worklists, and flagging was randomly switched off with a probability of 50%. TAT was measured as the time difference between study completion and the first clinically communicated report, with time stamps automatically retrieved from various IT systems. TATs for flagged cases (73 ± 143 min) were significantly lower than TATs for non-flagged cases (132 ± 193 min) (p < 0.05, one-sided t-test), where 105 of 122 ICH-AI+ cases were true positives. Total sensitivity, specificity, and accuracy over all analyzed cases were 95.0%, 96.7%, and 96.4%, respectively. We conclude that automatic identification of ICH reduces TAT for ICH in emergent care head CT, which carries the potential to improve timely clinical management of ICH. Our results suggest that AI-PROBE can contribute to the systematic quantitative evaluation of AI systems in clinical practice using clinically meaningful quantities, such as TAT or diagnostic accuracy.
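The randomized flagging and the reported quantities can be illustrated with a minimal Python sketch. It is not the authors' infrastructure: the function names `is_flagged`, `tat_minutes`, and `diagnostic_metrics`, the parameter `p_off`, and all timestamps and counts in the usage section are hypothetical and chosen only for illustration. The sketch assumes that AI-negative cases are never flagged, that each AI-positive case has its flag switched off independently with probability 50%, and that TAT is the difference between two time stamps expressed in minutes.

```python
import random
from datetime import datetime


def is_flagged(ai_positive: bool, rng: random.Random, p_off: float = 0.5) -> bool:
    """Decide whether a case is flagged in the reading worklist.

    Assumption: AI-negative cases are never flagged; AI-positive cases
    have their flag switched off with probability p_off.
    """
    if not ai_positive:
        return False
    # rng.random() is uniform on [0, 1), so this is True with probability 1 - p_off.
    return rng.random() >= p_off


def tat_minutes(study_completed: datetime, first_report: datetime) -> float:
    """Turn-around time: minutes from study completion to the first report."""
    return (first_report - study_completed).total_seconds() / 60.0


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the sketch reproducible

    # Hypothetical run: fraction of AI-positive cases that stay flagged.
    flags = [is_flagged(True, rng) for _ in range(10_000)]
    print(f"flagged fraction: {sum(flags) / len(flags):.3f}")

    # Hypothetical time stamps, not data from the trial.
    print(tat_minutes(datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 11, 13)))

    # Hypothetical counts, not the trial's confusion matrix.
    print(diagnostic_metrics(tp=90, fp=10, fn=10, tn=890))
```

Drawing the random number per case keeps each flagging decision independent of the others, which is one simple way to realize a 50% blinding probability; the abstract does not say how the randomization was implemented.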