Efficient and simple prediction explanations with groupShapley: A practical perspective


Shapley values have established themselves as one of the most appropriate and theoretically sound frameworks for explaining predictions from complex machine learning models. Their popularity in the explanation setting is probably due to their unique theoretical properties. The main drawback of Shapley values, however, is that their computational complexity grows exponentially in the number of input features, making them infeasible in many real-world situations where there could be hundreds or thousands of features. Furthermore, with many (dependent) features, presenting, visualizing, and interpreting the computed Shapley values also becomes challenging. The present paper introduces and showcases a method that we call groupShapley. The idea is to group features and then compute and present Shapley values for these groups instead of for all individual features. Reducing hundreds or thousands of features to half a dozen or so feature groups makes precise computations practically feasible and greatly simplifies presentation and knowledge extraction. We give practical advice for using the approach and illustrate its usability in three real-world examples. The examples vary in data type (regular tabular data and time series), feature dimension (medium to high), and application (insurance, genetics, and banking).
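
To make the grouping idea concrete, the following is a minimal Python sketch, not the paper's implementation. It treats each feature group as a single player and enumerates all coalitions of groups, so the number of coalitions is exponential in the number of groups rather than in the number of features. The function name `group_shapley`, the toy linear model, and the choice of value function (averaging predictions over a background sample, which ignores feature dependence) are all assumptions made for illustration only.

```python
from itertools import combinations
from math import factorial

import numpy as np


def group_shapley(predict, x, background, groups):
    """Exact Shapley values over feature groups for one instance x.

    predict:    callable mapping an (n, d) array to n predictions
    x:          1-D array of length d (the instance to explain)
    background: (m, d) float array used to average out absent groups
                (an illustrative assumption, not the paper's estimator)
    groups:     dict mapping group name -> list of column indices
    """
    names = list(groups)
    g = len(names)

    def value(coalition):
        # Groups in the coalition are fixed to x; the rest keep background values.
        data = background.copy()
        for name in coalition:
            cols = groups[name]
            data[:, cols] = x[cols]
        return float(np.mean(predict(data)))

    phi = {}
    for name in names:
        others = [n for n in names if n != name]
        total = 0.0
        for size in range(g):
            # Standard Shapley weight for a coalition of this size among g players.
            weight = factorial(size) * factorial(g - size - 1) / factorial(g)
            for subset in combinations(others, size):
                total += weight * (value(subset + (name,)) - value(subset))
        phi[name] = total
    return phi


# Toy usage: six features split into three hypothetical groups.
rng = np.random.default_rng(0)
w = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.5])
background = rng.normal(size=(200, 6))
x = rng.normal(size=6)
groups = {"A": [0, 1], "B": [2, 3, 4], "C": [5]}


def predict(data):
    # Hypothetical linear model standing in for a fitted predictor.
    return data @ w


phi = group_shapley(predict, x, background, groups)
print(phi)
```

With three groups the sketch enumerates 2^3 = 8 coalitions instead of the 2^6 = 64 needed for the individual features. The group values sum to the prediction for `x` minus the mean prediction over the background sample, which is the usual efficiency property.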

Nov 30, 2021