Paper Title
Clinical Language Understanding Evaluation (CLUE)
Paper Authors
Paper Abstract
Clinical language processing has received a lot of attention in recent years, resulting in new models or methods for disease phenotyping, mortality prediction, and other tasks. Unfortunately, many of these approaches are tested under different experimental settings (e.g., data sources, training and testing splits, metrics, evaluation criteria, etc.), making it difficult to compare approaches and determine the state of the art. To address these issues and facilitate reproducibility and comparison, we present the Clinical Language Understanding Evaluation (CLUE) benchmark with a set of four clinical language understanding tasks; standard training, development, validation, and testing sets derived from MIMIC data; as well as a software toolkit. It is our hope that these data will enable direct comparison between approaches, improve reproducibility, and reduce the barrier to entry for developing novel models or methods for these clinical language understanding tasks.