Paper Title

NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR

Authors

Amaral, Orlando; Azeem, Muhammad Ilyas; Abualhaija, Sallam; Briand, Lionel C.

Abstract

Processing personal data is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Checking the compliance of DPAs contributes to the compliance verification of software systems, as DPAs are an important source of requirements for software development involving the processing of personal data. However, manually checking whether a given DPA complies with GDPR is challenging, as it requires significant time and effort to understand and identify the DPA-relevant compliance requirements in GDPR and then to verify these requirements in the DPA. In this paper, we propose an automated solution to check the compliance of a given DPA against GDPR. In close interaction with legal experts, we first built two artifacts: (i) the "shall" requirements extracted from the GDPR provisions relevant to DPA compliance and (ii) a glossary table defining the legal concepts in the requirements. Then, we developed an automated solution that leverages natural language processing (NLP) technologies to check the compliance of a given DPA against these "shall" requirements. Specifically, our approach automatically generates phrasal-level representations for the textual content of the DPA and compares them against predefined representations of the "shall" requirements. Over a dataset of 30 actual DPAs, the approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements. The approach thus has an average precision of 89.1%, a recall of 82.4%, and an accuracy of 84.6%. Compared to a baseline that relies on off-the-shelf NLP tools, our approach provides an average accuracy gain of ~20 percentage points. The accuracy of our approach can be improved to ~94% with limited manual verification effort.
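The reported metrics can be related to the raw counts in the abstract with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes that a violation is the positive class, that the 132 genuine violations not found (750 minus 618) are false negatives, and that the counts are pooled over all 30 DPAs. The class name `ConfusionCounts` is hypothetical. Under these assumptions, pooled recall and accuracy round to the reported 82.4% and 84.6%, while pooled precision comes out at about 89.0%, slightly below the reported average of 89.1%; the gap may reflect per-DPA averaging or rounding, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Pooled outcome counts; 'positive' means a violation (an assumption)."""

    true_positives: int   # genuine violations correctly found
    false_negatives: int  # genuine violations missed
    false_positives: int  # false violations raised
    true_negatives: int   # satisfied requirements correctly identified

    @property
    def precision(self) -> float:
        # Share of raised violations that are genuine.
        return self.true_positives / (self.true_positives + self.false_positives)

    @property
    def recall(self) -> float:
        # Share of genuine violations that were found.
        return self.true_positives / (self.true_positives + self.false_negatives)

    @property
    def accuracy(self) -> float:
        # Share of all requirement checks that were decided correctly.
        total = (
            self.true_positives
            + self.false_negatives
            + self.false_positives
            + self.true_negatives
        )
        return (self.true_positives + self.true_negatives) / total


# Counts as stated in the abstract; false negatives are derived (750 - 618).
counts = ConfusionCounts(
    true_positives=618,
    false_negatives=750 - 618,
    false_positives=76,
    true_negatives=524,
)

if __name__ == "__main__":
    print(f"precision: {counts.precision:.1%}")
    print(f"recall:    {counts.recall:.1%}")
    print(f"accuracy:  {counts.accuracy:.1%}")
```

The pooled counts sum to 1,350 decisions, which is consistent with 45 requirement checks per DPA across 30 DPAs, although the abstract does not state that figure explicitly.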
