Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective

Paper Title

Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective

Authors

Raithel, Lisa; Thomas, Philippe; Roller, Roland; Sapina, Oliver; Möller, Sebastian; Zweigenbaum, Pierre

Abstract

In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary-annotated documents from a German patient forum, where users talk about health issues and get advice from medical doctors. As is common in social media data in this domain, the class labels of the corpus are very imbalanced. This, together with a high topic imbalance, makes it a very challenging dataset, since the same symptom can often have several causes and is not always related to a medication intake. We aim to encourage further multilingual efforts in the domain of ADR detection and provide preliminary experiments for binary classification using different methods of zero- and few-shot learning based on a multilingual model. When fine-tuning XLM-RoBERTa first on English patient forum data and then on the new German data, we achieve an F1-score of 37.52 for the positive class. We make the dataset and models publicly available for the community.
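Because the labels are heavily imbalanced, the result above is reported as F1 for the positive (ADR) class rather than as accuracy. The minimal sketch below illustrates why. It uses made-up toy labels (not data from the corpus), and the helpers `positive_class_f1` and `accuracy` are hypothetical illustrations rather than the authors' evaluation code. A classifier that never predicts an ADR can still reach high accuracy while its positive-class F1 is zero. For binary labels, this should match what scikit-learn's `f1_score` computes with `pos_label=1` and `average="binary"`.

```python
from typing import Sequence


def positive_class_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 of the positive class only (toy convention: 1 = document reports an ADR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of documents whose predicted label equals the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: 10 documents, only 2 of them positive (imbalanced).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
always_negative = [0] * 10                      # never predicts an ADR
some_model = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0]     # 1 TP, 1 FP, 1 FN

print(accuracy(y_true, always_negative))           # 0.8
print(positive_class_f1(y_true, always_negative))  # 0.0
print(accuracy(y_true, some_model))                # 0.8
print(positive_class_f1(y_true, some_model))       # 0.5
```

Both toy predictors have the same accuracy (0.8), but only the one that actually finds some positive documents gets a non-zero positive-class F1. This is why the positive-class F1 is the more informative number on a skewed ADR corpus.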
