groupShapley: Efficient Shapley value explanation through feature groups

Licence: CC BY 2.0


Shapley values have established themselves as one of the most appropriate and theoretically sound frameworks for explaining predictions from complex machine learning models. The main drawback of the Shapley value framework is that its computational complexity grows exponentially in the number of input features, making it infeasible in real-world situations with hundreds or even thousands of features. Furthermore, with many (dependent) features, presenting, visualizing and interpreting the computed Shapley values also becomes challenging. I hereby present groupShapley, a conceptually simple approach for dealing with the aforementioned bottleneck. The idea is to group the features, for example by type, by data source, or by their dependence, and then to compute and present Shapley values for these groups instead of for all the individual features. Reducing hundreds or thousands of features to, say, half a dozen or a dozen groups makes precise computations practically feasible and greatly simplifies presentation and knowledge extraction. In many situations it can be more informative to know that a certain set of features in total contributes to the prediction in a certain way, rather than being confused by small positive and negative contributions from similar (types of) features. While this work focuses on prediction explanation, the general idea is also valid for other types of Shapley value based explanations, such as global model explanations. This is work in progress, but I will present some preliminary results and examples. The method is implemented in a development version of the R package shapr.
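To make the grouping idea concrete, here is a minimal Python sketch of exact Shapley values computed with feature groups as the players. It is an illustration only and not the shapr implementation: the function name `group_shapley`, its arguments, the toy linear model and the group names are all hypothetical. As a simplifying assumption, the contribution of a coalition is estimated by fixing the features of the included groups to the explained observation and averaging the prediction over a background sample for the rest, which ignores feature dependence. With g groups the loop visits 2^g coalitions rather than 2^p for p individual features.

```python
from itertools import combinations
from math import factorial


def group_shapley(predict, x, background, groups):
    """Exact Shapley values where the players are feature groups.

    predict    : function mapping one feature row (list) to a prediction
    x          : feature row to explain
    background : list of feature rows used to average out absent groups
                 (assumption: marginal averaging, dependence is ignored)
    groups     : dict mapping a group name to a list of feature indices
    """
    names = list(groups)
    g = len(names)

    def value(coalition):
        # Features in the coalition's groups are fixed to x;
        # all other features keep their background values.
        fixed = {j for name in coalition for j in groups[name]}
        preds = [
            predict([x[j] if j in fixed else row[j] for j in range(len(x))])
            for row in background
        ]
        return sum(preds) / len(preds)

    phi = {}
    for name in names:
        others = [n for n in names if n != name]
        total = 0.0
        for size in range(g):
            # Shapley weight for coalitions of this size among g players.
            weight = factorial(size) * factorial(g - size - 1) / factorial(g)
            for subset in combinations(others, size):
                total += weight * (value(subset + (name,)) - value(subset))
        phi[name] = total
    return phi


# Toy example (all values hypothetical): a linear model on four features
# split into two groups.
beta = [1.0, -2.0, 0.5, 3.0]


def predict(row):
    return sum(b * v for b, v in zip(beta, row))


background = [
    [0.0, 1.0, 2.0, 0.0],
    [2.0, 1.0, 0.0, 2.0],
    [1.0, -2.0, 1.0, 1.0],
]
x = [2.0, 1.0, 0.0, 3.0]
groups = {"source_A": [0, 1], "source_B": [2, 3]}

phi = group_shapley(predict, x, background, groups)
print(phi)
```

The group values sum to the difference between the prediction for `x` and the average background prediction, the same efficiency property that holds for feature-wise Shapley values.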

Jan 27, 2021