Paper Title

Offline Reinforcement Learning with Differential Privacy

Paper Authors

Dan Qiao, Yu-Xiang Wang

Paper Abstract

The offline reinforcement learning (RL) problem is often motivated by the need to learn data-driven decision policies in financial, legal, and healthcare applications. However, the learned policy could retain sensitive information about individuals in the training data (e.g., the treatment and outcome of patients) and is thus susceptible to various privacy risks. We design offline RL algorithms with differential privacy guarantees that provably prevent such risks. These algorithms also enjoy strong instance-dependent learning bounds under both the tabular and the linear Markov decision process (MDP) settings. Our theory and simulations suggest that the privacy guarantee comes at (almost) no cost in utility compared to the non-private counterpart for a medium-sized dataset.
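The abstract does not spell out the algorithms, but a standard recipe behind differentially private tabular RL is to privatize the dataset's sufficient statistics (visit counts and reward sums) with calibrated noise and then plan on the resulting empirical MDP. The Python sketch below illustrates only that generic idea; it is not the paper's algorithm, and the toy MDP sizes, the noise scale sigma, and all variable names are assumptions made for illustration.

    # Illustrative sketch only, NOT the paper's algorithm: perturb the
    # sufficient statistics of an offline tabular dataset with Gaussian
    # noise, then run value iteration on the privatized empirical MDP.
    import numpy as np

    rng = np.random.default_rng(0)

    S, A, H = 5, 3, 10   # states, actions, horizon (toy sizes, assumed)
    sigma = 2.0          # Gaussian-mechanism noise scale; in practice it is
                         # calibrated from the (epsilon, delta) privacy budget

    # Fake offline dataset statistics: visit counts n(s,a) and n(s,a,s'),
    # plus summed rewards r_sum(s,a), as if tallied from logged trajectories.
    n_sa = rng.integers(20, 200, size=(S, A)).astype(float)
    n_sas = rng.dirichlet(np.ones(S), size=(S, A)) * n_sa[..., None]
    r_sum = rng.uniform(0, 1, size=(S, A)) * n_sa

    # Gaussian mechanism: add N(0, sigma^2) noise to each statistic, then
    # clip so the privatized counts remain usable for building an MDP.
    noisy_n_sa = np.maximum(n_sa + rng.normal(0, sigma, size=(S, A)), 1.0)
    noisy_n_sas = np.maximum(n_sas + rng.normal(0, sigma, size=(S, A, S)), 0.0)
    noisy_r = np.clip(r_sum + rng.normal(0, sigma, size=(S, A)), 0.0, None)

    # Privatized empirical MDP: normalized transitions and clipped rewards.
    denom = np.maximum(noisy_n_sas.sum(axis=-1, keepdims=True), 1e-8)
    P_hat = noisy_n_sas / denom
    R_hat = np.clip(noisy_r / noisy_n_sa, 0.0, 1.0)

    # Finite-horizon value iteration on the privatized model.
    V = np.zeros(S)
    for _ in range(H):
        Q = R_hat + P_hat @ V   # Q(s,a) = R(s,a) + sum_s' P(s'|s,a) V(s')
        V = Q.max(axis=1)       # greedy backup

    print("Private value estimates per state:", np.round(V, 2))

In this picture, the abstract's claim that privacy costs (almost) no utility corresponds to the injected noise being small relative to the counts once the dataset is medium-sized, so the privatized empirical MDP stays close to the non-private one.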
