MODEL INTERPRETABILITY

Today, almost everyone uses artificial intelligence algorithms, consciously or not, and more and more people are beginning to question how they work. In September 2020, Twitter apologized for its "racist" image-cropping algorithm. The anomaly was spotted by Colin Madland, a PhD candidate who wanted to tweet about a similar anomaly in the Zoom software. This is, of course, a minor issue, but a sentence such as "The algorithm said to turn left for a reason, but we do not know what the reason is" cannot be an acceptable answer after a self-driving car accident. This is where the question of interpretability arises.
If you work in data science, machine learning or artificial intelligence, you have probably heard about interpretability (if not, I recommend reading the book [5]). I am no exception to the rule: I was given the mission of designing an interpretable predictive algorithm for the management of financial assets. So I started doing some research. After a few readings, I found several algorithms presented as interpretable, but I also realized that everyone agrees there is not yet a definition of interpretability and that the notion is difficult to pin down. So, without any quantitative measure, how can we be sure that one algorithm is more interpretable than another? To answer this question, I decided to create a measure of interpretability based on the triptych predictivity, stability and simplicity proposed in [6].
Introduction
Usually, two main approaches can be distinguished for producing interpretable predictive models.
The first one relies on the use of a so-called post-hoc interpretable model. In this case, one uses an uninterpretable Machine Learning algorithm to create a predictive model, and then one tries to interpret the generated model.
The other way is to use an intrinsically interpretable algorithm that directly generates an interpretable model. When one wants to design an intrinsically interpretable algorithm, one usually bases it on rules. A rule is an "If … Then …" statement that is easily understandable by humans. There are two families of intrinsically interpretable algorithms: tree-based algorithms, which generate trees, and rule-based algorithms, which generate sets of rules. But any tree can be represented as a set of rules.
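To fix ideas, here is a minimal sketch of how such a rule could be represented in Python. The Rule class, its attributes and the interval-based condition format are illustrative choices of mine, not the representation used by any particular library.

from dataclasses import dataclass
from typing import Tuple

import numpy as np

@dataclass(frozen=True)
class Rule:
    # Each condition is a tuple (feature_index, lower_bound, upper_bound).
    conditions: Tuple[Tuple[int, float, float], ...]
    prediction: float

    def activation(self, X: np.ndarray) -> np.ndarray:
        """Boolean mask of the rows of X satisfying every condition."""
        mask = np.ones(len(X), dtype=bool)
        for j, low, high in self.conditions:
            mask &= (X[:, j] >= low) & (X[:, j] <= high)
        return mask

# "If x0 is in [0, 5] and x2 is in [1, 3], then predict 1.0"
rule = Rule(conditions=((0, 0.0, 5.0), (2, 1.0, 3.0)), prediction=1.0)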
In this article, I focus on the interpretability of algorithms based on rules. I describe how to evaluate the predictivity, stability and simplicity of a set of algorithms, and then how to combine these three scores into an interpretability measure that identifies the most interpretable one.
How to evaluate predictivity?
Predictivity is a number between 0 and 1 that evaluates the accuracy of the predictive model. Accuracy ensures trust in the generated model, and it is a well-studied notion in Machine Learning.
The accuracy function should be chosen according to the type of problem. For instance, one uses the mean squared error for a regression problem and the 0–1 error for a binary classification problem. Using an error term in the predictivity function imposes two constraints: a) the term has to be normalized to make it independent of the range of the predicted variable, and b) it must take its values between 0 and 1, with 1 being the highest level of accuracy.
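As an illustration, here is one possible way to compute such a normalized predictivity score in Python. Normalizing the mean squared error by the variance of the target (and clipping to [0, 1]) is an assumption of this sketch, not the only valid choice.

import numpy as np

def predictivity_regression(y_true, y_pred):
    # Mean squared error normalized by the variance of the target, so the
    # score does not depend on the range of the predicted variable.
    mse = np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
    return float(np.clip(1.0 - mse / np.var(y_true), 0.0, 1.0))

def predictivity_classification(y_true, y_pred):
    # 1 minus the 0-1 error, i.e. the usual classification accuracy.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))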
How to evaluate the stability?
Stability quantifies the noise sensitivity of an algorithm and thus evaluates its robustness. To measure the stability of a tree-based or rule-based algorithm, I rely on [1], where Bénard et al. state that
"A rule learning algorithm is stable if two independent estimations based on two independent samples result in two similar lists of rules."
Unfortunately, for continuous variables, the probability that a tree cut takes exactly the same value for a given rule evaluated on two independent samples is zero. For this reason, pure stability is too penalizing in this case.
To avoid this bias, I translate the rules into a discretized version of the feature space: each feature is discretized into q bins defined by its empirical q-quantiles, and the bounds of each rule's conditions are replaced by their corresponding bins. Usually q=10 is a good choice.
Finally, by applying the so-called Sørensen–Dice coefficient to two sets of rules generated by the same algorithm on two independent samples, I obtain the stability value, where 1 means that the two sets of rules are identical.
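Below is a minimal sketch of these two steps, assuming each rule is given by a tuple of (feature_index, lower_bound, upper_bound) conditions (with the Rule class above, one would pass rule.conditions); the function names and the exact binning convention are mine.

import numpy as np

def discretize_conditions(conditions, X, q=10):
    # Replace each bound by the index of the q-quantile bin it falls into.
    discretized = []
    for j, low, high in conditions:
        bins = np.quantile(X[:, j], np.linspace(0.0, 1.0, q + 1))
        low_bin = int(np.clip(np.searchsorted(bins, low, side="right") - 1, 0, q - 1))
        high_bin = int(np.clip(np.searchsorted(bins, high, side="right") - 1, 0, q - 1))
        discretized.append((j, low_bin, high_bin))
    return tuple(sorted(discretized))

def stability(rules_1, rules_2, X, q=10):
    # Sorensen-Dice coefficient between the two discretized rule sets.
    set_1 = {discretize_conditions(r, X, q) for r in rules_1}
    set_2 = {discretize_conditions(r, X, q) for r in rules_2}
    if not set_1 and not set_2:
        return 1.0
    return 2 * len(set_1 & set_2) / (len(set_1) + len(set_2))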
How to evaluate the simplicity?
Simplicity can be construed as the capacity to audit a prediction easily. A simple model makes it easy to check qualitative criteria such as ethics and morals. To measure the simplicity of a model generated by a tree-based or rule-based algorithm, I use the interpretability index defined in [3]: the sum of the lengths of the rules of the generated model. It should not be confused with the interpretability measure I define in this article.
To obtain a measure between 0 and 1, I evaluate the simplicity of an algorithm relative to a set of algorithms: it is defined as the ratio of the minimum interpretability index among the considered algorithms to the interpretability index of this algorithm. Thus, it is better to have a small set of rules with small lengths than the opposite.
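For instance, with rules stored as lists or tuples of conditions, the simplicity of each algorithm in a benchmark could be computed as in the sketch below (the function names are mine):

def interpretability_index(rule_set):
    # Sum of the lengths (number of conditions) of the rules of the model.
    return sum(len(rule) for rule in rule_set)

def simplicity_scores(rule_sets_by_algorithm):
    # rule_sets_by_algorithm: dict mapping an algorithm name to the set of
    # rules of the model it generated on the same dataset.
    indexes = {name: interpretability_index(rules)
               for name, rules in rule_sets_by_algorithm.items()}
    smallest = min(indexes.values())
    return {name: smallest / index for name, index in indexes.items()}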
How to evaluate interpretability?
Finally, considering a set of tree-based and rule-based algorithms, I can point out the most interpretable one by using a convex combination of predictivity, stability and simplicity. The coefficients of the combination can be chosen according to your desiderata. For instance, if you are trying to describe a phenomenon, simplicity and stability are more important than predictivity (provided the latter remains acceptable).
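Putting the three scores together is then a one-liner; the weights below, which favor stability and simplicity for a descriptive use case, are only an example.

def interpretability_score(predictivity, stability, simplicity,
                           alpha=1/3, beta=1/3, gamma=1/3):
    # Convex combination: the weights must be non-negative and sum to 1.
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return alpha * predictivity + beta * stability + gamma * simplicity

# Describing a phenomenon: weight stability and simplicity more heavily.
score = interpretability_score(0.80, 0.95, 0.90, alpha=0.2, beta=0.4, gamma=0.4)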
In [2], the authors define interpretability as follows:
"In the context of ML systems, we define interpretability as the ability to explain or to present in understandable terms to a human."
I argue that an algorithm with high predictivity, stability and simplicity is interpretable in the sense of this definition. Indeed, high predictivity ensures trust, high stability ensures robustness, and high simplicity ensures that the generated model can be easily understood by humans, because it relies on a limited number of short rules.
Conclusions
In this article, I have presented a quantitative criterion to compare the interpretability of tree-based and rule-based algorithms. This measure is based on the triptych predictivity, stability and simplicity. It has been designed to be fair and rigorous, and it can be adapted to the desiderata of the statistician by choosing appropriate coefficients in the convex combination.
The methodology described provides a fair way to rank the interpretability of a set of algorithms, because it integrates the main goals of interpretability: an algorithm designed to be accurate, stable or simple should keep this property whatever the dataset.
However, according to the definition of simplicity, 100 rules of length 1 have the same simplicity as a single rule of length 100, which is debatable. Moreover, the stability measure is purely syntactic and rather restrictive: if some features are duplicated, two rules may have different syntactic conditions and yet be identical in terms of their activations. One way of relaxing this stability criterion could be to compare the rules based on their activation sets, i.e. the sets of observations for which all conditions of a rule are met simultaneously.
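As a rough sketch of this idea, one could apply the same Sørensen–Dice coefficient to the activation sets themselves; the boolean masks below could, for example, come from the Rule.activation method sketched earlier. This is only one possible relaxation, not the method of the working paper.

import numpy as np

def stability_from_activations(masks_1, masks_2):
    # masks_1, masks_2: lists of boolean arrays (one per rule) indicating
    # which observations of a common sample satisfy the rule's conditions.
    # Two rules are considered the same if they activate exactly the same
    # observations, whatever their syntactic form.
    set_1 = {tuple(np.flatnonzero(m)) for m in masks_1}
    set_2 = {tuple(np.flatnonzero(m)) for m in masks_2}
    if not set_1 and not set_2:
        return 1.0
    return 2 * len(set_1 & set_2) / (len(set_1) + len(set_2))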
If you want more details about this measure of interpretability, I refer you to the working paper [4].
Thanks to Ygor Rebouças Serpa for his remarks and comments.
References
[1] C. Bénard, G. Biau, S. da Veiga and E. Scornet, SIRUS: Stable and Interpretable RUle Set (2020), arXiv
[2] F. Doshi-Velez and B. Kim, Towards A Rigorous Science of Interpretable Machine Learning (2017), arXiv
[3] V. Margot, J.-P. Baudry, F. Guilloux and O. Wintenberger, Consistent Regression using Data-Dependent Coverings (2020), arXiv
[4] V. Margot and G. Luta, A rigorous method to compare interpretability (2020), arXiv
[5] C. Molnar, Interpretable Machine Learning (2020), Lulu.com
[6] B. Yu and K. Kumbier, Veridical data science (2020), Proceedings of the National Academy of Sciences
About Us
Advestis is a European Contract Research Organization (CRO) with a deep understanding and practice of statistics and interpretable machine learning techniques. The expertise of Advestis covers the modeling of complex systems and predictive analysis for temporal phenomena.