Paper Title
True to the Model or True to the Data?
Authors
Abstract
A variety of recent papers discuss the application of Shapley values, a concept for explaining coalitional games, for feature attribution in machine learning. However, the correct way to connect a machine learning model to a coalitional game has been a source of controversy. The two main approaches that have been proposed differ in the way that they condition on known features, using either (1) an interventional or (2) an observational conditional expectation. While previous work has argued that one of the two approaches is preferable in general, we argue that the choice is application dependent. Furthermore, we argue that the choice comes down to whether it is desirable to be true to the model or true to the data. We use linear models to investigate this choice. After deriving an efficient method for calculating observational conditional expectation Shapley values for linear models, we investigate how correlation in simulated data impacts the convergence of observational conditional expectation Shapley values. Finally, we present two real data examples that we consider to be representative of possible use cases for feature attribution -- (1) credit risk modeling and (2) biological discovery. We show how a different choice of value function performs better in each scenario, and how possible attributions are impacted by modeling choices.
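The difference between the two value functions can be made concrete with a small linear model. The following is a minimal sketch, not the efficient method derived in the paper: it enumerates every coalition by brute force, and it assumes the features are jointly Gaussian so that the observational conditional expectation has a closed form. The names shapley_values and make_value_fns, and the numbers used for beta, mu, cov, and x, are hypothetical and chosen only for illustration.

```python
import itertools
import math

import numpy as np


def shapley_values(value_fn, d):
    """Exact Shapley values by enumerating all coalitions (only feasible for small d)."""
    phi = np.zeros(d)
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = (math.factorial(size) * math.factorial(d - size - 1)
                      / math.factorial(d))
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value_fn(S + (i,)) - value_fn(S))
    return phi


def make_value_fns(beta, mu, cov, x):
    """Build both value functions for the linear model f(z) = beta @ z.

    Assumption (not from the paper): features follow N(mu, cov).
    """
    d = len(beta)

    def interventional(S):
        # Features outside S are set to their marginal mean, ignoring correlation.
        idx = list(S)
        z = mu.copy()
        if idx:
            z[idx] = x[idx]
        return float(beta @ z)

    def observational(S):
        # Features outside S are set to their Gaussian conditional mean given x_S.
        idx = list(S)
        rest = [j for j in range(d) if j not in S]
        z = mu.copy()
        if idx:
            z[idx] = x[idx]
        if idx and rest:
            cov_rs = cov[np.ix_(rest, idx)]
            cov_ss = cov[np.ix_(idx, idx)]
            z[rest] = mu[rest] + cov_rs @ np.linalg.solve(cov_ss, x[idx] - mu[idx])
        return float(beta @ z)

    return interventional, observational


# Hypothetical setup: the model uses only feature 0, but feature 1 is
# strongly correlated with it.
beta = np.array([1.0, 0.0])
mu = np.zeros(2)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
x = np.array([1.0, 1.0])

v_int, v_obs = make_value_fns(beta, mu, cov, x)
phi_int = shapley_values(v_int, 2)
phi_obs = shapley_values(v_obs, 2)
print("interventional:", phi_int)
print("observational: ", phi_obs)
```

In this setup the interventional attributions are [1.0, 0.0]: the unused feature receives no credit, which is true to the model. The observational attributions are [0.55, 0.45]: credit is shared with the correlated feature even though its coefficient is zero, which is true to the data. Both sets sum to f(x) minus the mean prediction.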