
Logic Explained Networks

Deep learning models explainable by design


TL;DR

  • Problem – Neural networks cannot explain how they arrive at a prediction, so their deployment in safety-critical applications is discouraged.
  • Solution – Logic Explained Networks are novel "explainable-by-design" deep learning models that provide logic explanations for their predictions.
  • Have a go! – "PyTorch, Explain!" is a Python package providing simple APIs to implement Logic Explained Networks.

Why might you be interested?

I – Are you a researcher or a startup founder with a super cool deep learning system you want to deploy to save the world? Well, you probably can’t…

YOU – Why?

I – Because deep neural networks are "black boxes": They are not designed to explain how they arrive at a prediction!

YOU – And so what?

I – Well, the use of "black boxes" is now forbidden for many safety-critical applications in the US and Europe (if you don’t believe me, check out https://gdpr.eu/).

YOU – So, what can I do?

I – Read this post and learn how you can deploy your state-of-the-art deep learning system safely and lawfully! 🙂


A knowledge gap in Explainable Artificial Intelligence (XAI)

Why can’t we use (standard) deep learning to solve real-world problems?

The application of deep learning (DL) in safety-critical domains has been strongly limited by lawmakers, as DL models are generally considered black boxes whose decision process is opaque and too complex for laypeople (and even for DL experts!) to understand.

For this reason, explainable artificial intelligence (XAI) research has focused either on explaining black-box decisions or on developing machine learning models that are "interpretable by design" (like decision trees). However, while interpretable models engender trust in their predictions (which is why we like them…) [1], black-box models, such as neural networks, are the ones that generally provide state-of-the-art task performance (which is why we like them!) [2].

Knowledge gaps in XAI research

Most techniques explaining black boxes focus on finding or ranking the most relevant features used by the black box to make predictions. Such "feature-scoring" methods are very efficient and widely used, but they cannot explain how neural networks compose those features to make predictions [3]. In addition, a key issue of most explanation methods is that explanations are given in terms of input features (e.g. pixel intensities) that do not correspond to high-level categories that humans can easily understand [4].

To overcome this issue, concept-based approaches have become increasingly popular as they provide explanations in terms of human-understandable categories (i.e. the "concepts") rather than raw features [4]. In simple terms, a concept-based model is a function f mapping categorical inputs C (i.e. the "concepts") into categorical outputs Y (i.e. the target classes):

f: C → Y

If your input features are not categorical (e.g. pixel intensities for an image), you can first map your inputs X into a categorical "concept space" C, using a function g:

g: X → C

Have a look at this example based on the CUB dataset (a dataset for bird classification from images [5]):

  • First, the classifier g learns how to map pixel intensities (the input features of the image) into the categorical concept space C, where each category corresponds to a bird characteristic;
  • Second, the classifier f learns how to map the input sample, now described in terms of the concepts C, into categorical outputs corresponding to bird names (the target classes).
A black-box concept bottleneck model. Image by the author.
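
To make the two-step pipeline concrete, here is a minimal PyTorch sketch of a concept bottleneck model. The sizes and names (n_pixels, n_concepts, g, f) are purely illustrative and do not correspond to the actual CUB architecture:

import torch

n_pixels, n_concepts, n_classes = 32 * 32 * 3, 112, 200  # illustrative sizes

# g: X -> C, maps raw inputs (here, flattened pixels) to concept activations
g = torch.nn.Sequential(
    torch.nn.Linear(n_pixels, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, n_concepts),
    torch.nn.Sigmoid(),  # each output estimates whether a bird characteristic is present
)

# f: C -> Y, maps concepts to target classes (bird names)
f = torch.nn.Linear(n_concepts, n_classes)

x = torch.randn(8, n_pixels)  # a batch of (flattened) images
c = g(x)                      # predicted concepts
y_logits = f(c)               # class predictions computed from concepts only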

However, despite being useful, most of these concept-based approaches focus on identifying the most relevant concepts: they cannot explain how such concepts are leveraged by the classifier f, let alone provide concise explanations whose validity can be assessed quantitatively.

Very few approaches are able to explain how neural networks compose features/concepts to make predictions.

What do you need then?

In summary, what you need is a new deep learning paradigm that allows you to deploy models that:

  1. Achieve classification accuracies close to the state of the art reached by equivalent "standard" black boxes.
  2. Explain how input features are composed to make predictions.
  3. Provide clear explanations whose quality can be assessed quantitatively.

Do you still want to deploy your state-of-the-art DL-powered system safely and lawfully?

Continue reading…


Logic Explained Networks (LENs)

Logic Explained Networks (or LENs) are a special family of concept-based neural networks providing first-order logic (FOL) explanations for their decisions.

Have a look at the same example on the CUB dataset:

  • The classifier g performs the same action (i.e. predicting concepts from images);
  • However, the classifier f is now a Logic Explained Network, providing both predictions for the target classes and logic formulas explaining how the network f leverages the input concepts to arrive at a decision!
A Logic Explained Network in action! Image by the author.

Why logic explanations?

Compared to other XAI techniques, first-order logic explanations provide several key advantages:

  1. Clarity – An explanation reported in FOL is a rigorous and unambiguous statement. This formal clarity may serve cognitive-behavioral purposes such as engendering trust, aiding bias identification, or supporting actions and decisions. For instance, dropping quantifiers and variables for simplicity, the formula "snow ∧ tree ↔ wolf" may easily reveal a bias in the collection of training data.
  2. Modularity – Different logic-based explanations can be combined to describe groups of observations or global phenomena. For instance, for an image showing only the face of a person, an explanation could be "nose ∧ lips → human", while for another image showing a person from behind a valid explanation could be "feet ∧ hair ∧ ears → human". The two local explanations can be combined into "(nose ∧ lips) ∨ (feet ∧ hair ∧ ears) → human".
  3. Measurability – The quality of logic-based explanations can be quantitatively measured to check their validity and completeness. For instance, once the explanation "(nose ∧ lips) ∨ (feet ∧ hair ∧ ears)" is extracted for the class "human", this logic formula can be applied to a test set to check its generality in terms of quantitative metrics like accuracy, fidelity, and consistency.
  4. Simplifiability – FOL explanations can be rewritten in equivalent forms such as Disjunctive Normal Form (DNF) and Conjunctive Normal Form (CNF). Further, techniques such as the Quine–McCluskey algorithm can be used to simplify logic explanations. For instance, the formula "(person ∧ nose) ∨ (¬person ∧ nose)" can easily be simplified to "nose".
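
As a quick illustration of the last two points, FOL explanations are ordinary logic formulas, so they can be manipulated with off-the-shelf tools. The snippet below is a small sketch using sympy (a general-purpose symbolic library, not part of "PyTorch, Explain!") to simplify the example formula from point 4 and rewrite it in DNF:

from sympy import symbols
from sympy.logic.boolalg import simplify_logic, to_dnf

person, nose = symbols('person nose')

# "(person AND nose) OR (NOT person AND nose)" collapses to "nose"
formula = (person & nose) | (~person & nose)
print(simplify_logic(formula))         # nose

# the same formula rewritten (and simplified) in disjunctive normal form
print(to_dnf(formula, simplify=True))  # nose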

Logic Explained Networks are a special kind of concept-based neural networks providing first-order logic explanations for their decisions.

The LENs paradigm: Explainability by design

The LENs family is a class of neural models explainable by design.

"What do you mean by a model which is explainable by design?", you may ask. Well, there is a clear distinction between post-hoc methods and the LEN paradigm which is explainable by design. Post-hoc methods generally do not impose constraints on the classifier: the model itself is free from any constraints related to the explanation method. This is why this class of methods is called post hoc: After the usual training is completed, the XAI method kicks in. The LENs paradigm instead is explainable by design as it embeds additional constraints both in the architecture and in the loss function, making the network self explainable. This is why we say that LENs are explainable by design: The classifier itself is constrained to learn in a way that makes explanations emerge automatically.

If you try to understand what happened inside a "black box" only after the experiment, you are using a post-hoc method! Image from Max Pixel.

The LENs paradigm is explainable by design as it embeds additional constraints both in the architecture and in the loss function, making explanations emerge automatically.

Theoretical details (only a bit, I promise!)

The design of a Logic Explained Network requires the specification of three aspects: the architecture, the loss function, and the parsimony criteria. The mix of these three elements uniquely identifies a Logic Explained Network. While the architecture and the loss function are standard requirements for any neural network, parsimony criteria play a special role for LENs as they allow the classifier to mimic the way humans learn and provide explanations. In fact, when humans compare a set of hypotheses outlining the same outcomes, they tend to have an implicit bias towards the simplest ones. This phenomenon was observed by Aristotle in his "Posterior Analytics", rediscovered by later philosophers (like Ockham), and more recently studied in cognitive psychology, behavioral economics, and decision making (some of the most famous works are by George Miller [6] and the Nobel laureate and Turing Award winner Herbert Simon [7]). This cognitive bias is the main reason why XAI research cries out for simple explanations. Parsimony criteria are one way of encoding this inductive bias in end-to-end differentiable models – such as Logic Explained Networks!
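
To make the idea concrete, here is a deliberately simple sketch of one possible parsimony criterion: an L1 penalty on the network weights, which pushes most weights towards zero so that only a few concepts end up contributing to the final explanation. The model, data, and penalty coefficient below are illustrative and are not the exact criteria derived in the paper:

import torch

# a small concept classifier: 20 input concepts -> 1 target class
model = torch.nn.Sequential(
    torch.nn.Linear(20, 10),
    torch.nn.Sigmoid(),
    torch.nn.Linear(10, 1),
    torch.nn.Sigmoid(),
)

def l1_parsimony(model: torch.nn.Module) -> torch.Tensor:
    # sum of absolute weight values: sparse weights -> few "active" concepts -> shorter formulas
    return sum(p.abs().sum() for p in model.parameters())

c = torch.rand(64, 20)                    # random concept activations
y = torch.randint(0, 2, (64, 1)).float()  # random binary labels
loss = torch.nn.functional.binary_cross_entropy(model(c), y) + 1e-4 * l1_parsimony(model)
loss.backward()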

When humans compare a set of hypotheses outlining the same outcomes, they tend to have an implicit bias towards the simplest ones – parsimony criteria encode this inductive bias in end-to-end differentiable models.

If you are interested in learning more about LENs and their parsimony criteria, you can find additional details in our recent paper [8].

However, if you just want to play around with LENs, then the next section is made for you!


PyTorch, Explain!

Reading requirements: basic knowledge of Python, pip, and PyTorch.

What is "PyTorch, Explain!"?

"PyTorch, Explain!" is an extension library for PyTorch to develop Logic Explained Networks!

You can install torch_explain along with all its dependencies from PyPI:

pip install torch-explain

The code is freely available on GitHub.

Toy example #1

For this simple example, let’s solve the XOR problem using the first Logic Explained Network, proposed by Ciravegna et al. in 2020 and called the "ψ network" by the authors [9]. This LEN is characterized by:

  • a sequence of fully connected layers with sigmoid activation functions;
  • a pruning stage at training time to simplify the architecture.

You just need to import two libraries:

import torch
import torch_explain as te

and generate the training data as follows:

x_train = torch.tensor([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
], dtype=torch.float)
y_train = torch.tensor([0, 1, 1, 0], dtype=torch.long)

Let’s define a 3-layer ψ network:

layers = [
    torch.nn.Linear(x_train.shape[1], 10),
    torch.nn.Sigmoid(),
    torch.nn.Linear(10, 5),
    torch.nn.Sigmoid(),
    torch.nn.Linear(5, 1),
    torch.nn.Sigmoid(),
]
model = torch.nn.Sequential(*layers)

You can now train the network by optimizing the binary cross-entropy loss together with the l1_loss penalty, which incorporates the human prior towards simple explanations. We will prune the network after 1000 epochs:

# prune_equal_fanin is provided by torch_explain (assumed here to live in torch_explain.nn.functional)
from torch_explain.nn.functional import prune_equal_fanin

optimizer = torch.optim.AdamW(model.parameters(), lr=0.01)
loss_form = torch.nn.BCELoss()
model.train()
for epoch in range(6001):
    optimizer.zero_grad()
    y_pred = model(x_train).squeeze(-1)
    # BCELoss expects float targets with the same shape as the predictions
    loss = loss_form(y_pred, y_train.float()) + 0.000001 * te.nn.functional.l1_loss(model)
    loss.backward()
    optimizer.step()

    # prune the weakest incoming weights of each neuron after 1000 epochs
    model = prune_equal_fanin(model, epoch, prune_epoch=1000)

Once trained, you can extract first-order logic formulas describing how the network composed the input features to obtain its predictions:

from torch_explain.logic.nn import psi
from torch.nn.functional import one_hot

# one-hot labels are used further down to evaluate the extracted explanation
y1h = one_hot(y_train.squeeze().long())
explanation = psi.explain_class(model, x_train)

Explanations will be logic formulas in disjunctive normal form. In this case, the explanation will be y=1 IFF (f1 AND ~f2) OR (f2 AND ~f1) corresponding to y=1 IFF f1 XOR f2.

The quality of the logic explanation can be quantitatively assessed in terms of classification accuracy and rule complexity as follows:

from torch_explain.logic.metrics import test_explanation, complexity

accuracy, preds = test_explanation(explanation, x_train, y1h, target_class=1)
explanation_complexity = complexity(explanation)

In this case the accuracy is 100% and the complexity is 4.
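
As a quick sanity check (independent of torch_explain), you can verify by hand that the extracted formula reproduces the XOR truth table on the four training points, where f1 and f2 simply denote the two input features:

# manually evaluate "(f1 AND ~f2) OR (f2 AND ~f1)" on the four training points
for f1, f2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    pred = (bool(f1) and not bool(f2)) or (bool(f2) and not bool(f1))
    print(f1, f2, '->', int(pred))  # prints 0, 1, 1, 0 – matching y_train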

Toy example #2

Now let’s complicate the problem a bit: Let’s solve the XOR problem augmented with 100 dummy features. You just need to generate the training data as follows:

x0 = torch.zeros((4, 100))
x_train = torch.tensor([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
], dtype=torch.float)
x_train = torch.cat([x_train, x0], dim=1)
y_train = torch.tensor([0, 1, 1, 0], dtype=torch.long)

To solve this task, let’s define a more powerful LEN using a special layer (the EntropyLinear layer) we recently introduced to implement a very efficient kind of LEN, the "entropy-based LEN" [10]:

layers = [
    te.nn.EntropyLinear(x_train.shape[1], 10, n_classes=2),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(10, 4),
    torch.nn.LeakyReLU(),
    torch.nn.Linear(4, 1),
]
model = torch.nn.Sequential(*layers)

You can now train the network by optimizing the cross-entropy loss together with the entropy_logic_loss penalty, which incorporates the human prior towards simple explanations:

optimizer = torch.optim.AdamW(model.parameters(), lr=0.01)
loss_form = torch.nn.CrossEntropyLoss()
model.train()
for epoch in range(1001):
    optimizer.zero_grad()
    y_pred = model(x_train).squeeze(-1)
    loss = loss_form(y_pred, y_train) + 0.00001 * te.nn.functional.entropy_logic_loss(model)
    loss.backward()
    optimizer.step()

Once trained, you can extract first-order logic formulas describing how the network composed the input features to obtain its predictions:

from torch_explain.logic.nn import entropy
from torch.nn.functional import one_hot
y1h = one_hot(y_train)
explanation, _ = entropy.explain_class(model, x_train, y1h, x_train, y1h, target_class=1)

Explanations will be logic formulas in disjunctive normal form. In this case, the explanation will be y=1 IFF (f1 AND ~f2) OR (f2 AND ~f1) corresponding to y=1 IFF f1 XOR f2.

The quality of the logic explanation can be quantitatively assessed in terms of classification accuracy and rule complexity as follows:

from torch_explain.logic.metrics import test_explanation, complexity
accuracy, preds = test_explanation(explanation, x_train, y1h, target_class=1)
explanation_complexity = complexity(explanation)

In this case the accuracy is 100% and the complexity is 4.


How powerful are LENs?

At this point you may be convinced that LENs work as advertised. However, you may still have a few practical questions! I’ll do my best to guess some of them and provide brief answers 🙂

Question 1 – How powerful are LENs? Or, in other words, how much might you expect to lose in terms of classification accuracy if you use a LEN instead of an equivalent black-box neural network?

Answer 1 – The accuracy of LENs is comparable with that of equivalent black boxes. In our paper we showed that, on challenging datasets, entropy-based LENs are quite competitive with equivalent black boxes and usually outperform white-box methods like decision trees and Bayesian rule lists:

Classification accuracy of LENs with respect to state-of-the-art white box models. Image by the author.

Question 2 – What is the quality of the extracted FOL explanations in practice? Are the logic formulas accurate and concise? Or, in other words, do LENs provide flawed explanations? Is it total rubbish?

Answer 2 – The explanations are simple and accurate! How much? Well, at the very least comparable to the quality provided by state-of-the-art white-box models. The picture below shows the quality of the logic explanations in terms of (1) the average classification test error (y-axis) and (2) the average complexity (number of literals) of the logic explanations (x-axis). While providing some of the most accurate formulas, the entropy-based network also generates the simplest ones. How simple? Well, the vertical dotted red line marks the maximum explanation complexity laypeople can handle (~9 literals [6]). Most explanations from the entropy-based network are more concise (3–4 literals), so their interpretation is straightforward 🙂

Quantitative assessment of the quality of the explanations in terms of classification error (y-axis) and length of the logic rule (x-axis). Image by the author.

Take home messages

In this post I’ve tried to convey 4 key messages:

  1. There is an urgent & practical need for models that are BOTH explainable & accurate.
  2. First-order logic explanations are cool because they can be quantitatively evaluated (which is critical for practical real-world applications).
  3. Logic Explained Networks are a family of explainable-by-design neural models providing first-order logic explanations.
  4. Logic Explained Networks are BOTH explainable & accurate (and easy to implement!!!).

References

[1] Rudin, Cynthia. "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead." Nature Machine Intelligence 1.5 (2019): 206–215.

[2] Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

[3] Kindermans, Pieter-Jan, et al. "The (un)reliability of saliency methods." Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer, Cham, 2019. 267–280.

[4] Koh, Pang Wei, et al. "Concept bottleneck models." International Conference on Machine Learning. PMLR, 2020.

[5] Wah, Catherine, et al. "The Caltech-UCSD Birds-200-2011 dataset." (2011).

[6] Miller, George A. "The magical number seven, plus or minus two: Some limits on our capacity for processing information." Psychological review 63.2 (1956): 81.

[7] Simon, Herbert A. "Models of man; social and rational." (1957).

[8] Ciravegna, Gabriele, et al. "Logic Explained Networks." arXiv preprint arXiv:2108.05149 (2021).

[9] Ciravegna, Gabriele, et al. "A constraint-based approach to learning and explanation." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 34. No. 04. 2020.

[10] Barbiero, Pietro, et al. "Entropy-based Logic Explanations of Neural Networks." arXiv preprint arXiv:2106.06804 (2021).

