Paper Title
CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Paper Authors
Paper Abstract
Previous works have validated that text generation APIs can be stolen through imitation attacks, causing IP violations. To protect the IP of text generation APIs, a recent work introduced a watermarking algorithm and used a null-hypothesis test as post-hoc ownership verification on imitation models. However, we find that those watermarks can be detected via sufficient statistics of the frequencies of candidate watermarking words. To address this drawback, we propose a novel Conditional wATERmarking framework (CATER) for protecting the IP of text generation APIs. An optimization method is proposed to decide the watermarking rules that minimize the distortion of overall word distributions while maximizing the change of conditional word selections. Theoretically, we prove that it is infeasible for even the savviest attacker (one who knows how CATER works) to reveal the used watermarks from a large pool of potential word pairs based on statistical inspection. Empirically, we observe that high-order conditions lead to an exponential growth of suspicious (unused) watermarks, making our crafted watermarks more stealthy. In addition, CATER can effectively identify IP infringement under architectural mismatch and cross-domain imitation attacks, with negligible impairment of the generation quality of victim APIs. We envision our work as a milestone for stealthily protecting the IP of text generation APIs.
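The idea of a conditional watermark followed by a null-hypothesis test can be illustrated with a toy sketch. This is not the paper's implementation: the rule table `RULES`, the choice of the preceding token as the condition, the function names, and the null hit rate `p0 = 0.5` are all assumptions made for illustration, and the optimization step that selects the rules is omitted entirely.

```python
from math import comb

# Hypothetical rules: (condition word, original word) -> watermark synonym.
# In this toy sketch the condition is simply the preceding token.
RULES = {("very", "big"): "large", ("quite", "happy"): "glad"}

def watermark(tokens, rules=RULES):
    """Swap a word for its synonym only when the preceding token matches the rule's condition."""
    out = list(tokens)
    for i in range(1, len(tokens)):
        key = (tokens[i - 1], tokens[i])
        if key in rules:
            out[i] = rules[key]
    return out

def count_hits(tokens, rules=RULES):
    """Count watermark hits and total rule opportunities in a suspect model's output."""
    hits = total = 0
    for prev, cur in zip(tokens, tokens[1:]):
        for (cond, orig), syn in rules.items():
            if prev == cond and cur in (orig, syn):
                total += 1
                hits += cur == syn  # True counts as 1
    return hits, total

def binomial_p_value(hits, total, p0=0.5):
    """One-sided P(X >= hits) for X ~ Binomial(total, p0); p0 is an assumed null hit rate."""
    return sum(
        comb(total, k) * p0**k * (1 - p0) ** (total - k)
        for k in range(hits, total + 1)
    )
```

For example, `watermark("a very big dog".split())` swaps `big` for `large`, while `watermark("a big dog".split())` leaves the sentence unchanged because the condition word is absent. A small p-value from `binomial_p_value` on a suspect model's output would count as evidence that the model was trained on watermarked responses, under the assumed null rate.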