Paper Title
Information Leakage in Embedding Models
Paper Authors
Paper Abstract
Embeddings are functions that map raw input data to low-dimensional vector representations, while preserving important semantic information about the inputs. Pre-training embeddings on a large amount of unlabeled data and fine-tuning them for downstream tasks is now a de facto standard for achieving state-of-the-art results in many domains. We demonstrate that embeddings, in addition to encoding generic semantics, often also yield vectors that leak sensitive information about the input data. We develop three classes of attacks to systematically study information that might be leaked by embeddings. First, embedding vectors can be inverted to partially recover some of the input data. As an example, we show that our attacks on popular sentence embeddings recover between 50\%--70\% of the input words (F1 scores of 0.5--0.7). Second, embeddings may reveal sensitive attributes inherent in inputs and independent of the underlying semantic task at hand. Attributes such as authorship of text can be easily extracted by training an inference model on just a handful of labeled embedding vectors. Third, embedding models leak a moderate amount of membership information for infrequent training data inputs. We extensively evaluate our attacks on various state-of-the-art embedding models in the text domain. We also propose and evaluate defenses that can prevent the leakage to some extent at a minor cost in utility.
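To make the second attack class concrete, the sketch below illustrates the general idea of attribute inference from embeddings: fit a simple classifier on a handful of labeled embedding vectors, then predict the attribute (here, authorship) of unseen embeddings. This is not the paper's implementation; the `fake_embeddings` helper, the 768-dimensional synthetic Gaussian vectors, the per-author mean shift, and the choice of logistic regression are all hypothetical stand-ins for embeddings produced by a real pre-trained encoder.

```python
# Illustrative sketch (assumptions labeled): attribute inference from
# embedding vectors using a generic off-the-shelf classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical stand-in for sentence embeddings of texts by two authors.
# In the real attack these would come from the target embedding model;
# here we simulate 768-dim vectors whose means differ slightly per
# author, mimicking a stylistic signal carried by the embeddings.
def fake_embeddings(n, shift):
    return rng.normal(loc=shift, scale=1.0, size=(n, 768))

# "Just a handful of labeled embedding vectors": 20 per author.
X = np.vstack([fake_embeddings(20, 0.0), fake_embeddings(20, 0.2)])
y = np.array([0] * 20 + [1] * 20)  # author labels

# The inference model; logistic regression is one simple attacker choice.
attacker = LogisticRegression(max_iter=1000).fit(X, y)

# Attack: predict the author of new, unlabeled embedding vectors.
X_test = np.vstack([fake_embeddings(5, 0.0), fake_embeddings(5, 0.2)])
print("inferred authors:", attacker.predict(X_test))
```

The point of the sketch is that the attacker never sees the raw text: the attribute is recovered from the embedding vectors alone, independent of whatever downstream semantic task the embeddings were built for.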