Paper Title
Privacy-Preserving Boosting in the Local Setting
Paper Authors
Paper Abstract
In machine learning, boosting is one of the most popular methods, designed to combine multiple base learners into a superior one. The well-known Boosted Decision Tree classifier has been widely adopted in many areas. In the big data era, the data held by individuals and entities, such as personal images, browsing history, and census information, are increasingly likely to contain sensitive information. Privacy concerns arise when such data leaves the owners' hands and is further explored or mined. These concerns demand that machine learning algorithms be privacy-aware. Recently, Local Differential Privacy has been proposed as an effective privacy protection approach; it offers a strong guarantee to data owners, since the data is perturbed before any further usage and the true values never leave the owners' hands. Machine learning algorithms that operate on such perturbed data instances are therefore of great value and importance. In this paper, we are interested in developing a privacy-preserving boosting algorithm that allows a data user to build a classifier without knowing or deriving the exact value of each data sample. Our experiments demonstrate the effectiveness of the proposed boosting algorithm and the high utility of the learned classifiers.
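As background for the local perturbation the abstract refers to, the minimal Python sketch below shows the classic randomized-response mechanism for a single binary attribute, the simplest local differential privacy mechanism. It is an illustrative assumption only, not necessarily the perturbation scheme used in the paper; the function name randomized_response, the parameter epsilon, and the sample values are placeholders.

    import numpy as np

    def randomized_response(bit, epsilon, rng=np.random.default_rng()):
        # Report the true bit with probability e^eps / (e^eps + 1),
        # otherwise flip it; this satisfies epsilon-local differential
        # privacy for a single binary attribute.
        p_keep = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
        return bit if rng.random() < p_keep else 1 - bit

    # Each owner perturbs a sensitive binary attribute locally; only the
    # perturbed values are sent to the data user who trains the classifier.
    epsilon = 1.0
    true_bits = [1, 0, 1, 1, 0]          # hypothetical owner-side values
    reported = [randomized_response(b, epsilon) for b in true_bits]
    print(reported)

Because the true values never leave the owners, any boosting procedure on the data user's side must be designed to learn from such noisy reports, which is the setting the paper addresses.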