
Interpreto attribution tutorial¶
This notebook contains examples of what you can do with the attribution module of Interpreto.
The first part focuses on classification, while the second explores generation.
Authors: Fanny Jourdan & Antonin Poché
Available methods¶
All methods have at least one example in this notebook.
Inference-based methods:
- Occlusion
- LIME
- KernelSHAP
- Sobol

Gradient-based methods:
- Saliency
- Integrated Gradients
- SmoothGrad
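Before diving into the library, here is the core idea behind perturbation methods such as Occlusion, shown as a self-contained toy sketch (no Interpreto involved, and the scoring function is a made-up stand-in for a real model): mask each word in turn and record how much the score drops.

```python
def toy_score(words: list[str]) -> int:
    # stand-in for a model: counts occurrences of positive words
    return sum(w in {"best", "love", "good"} for w in words)

def occlusion(words: list[str]) -> list[int]:
    base = toy_score(words)
    # attribution of a word = score drop when that word is masked out
    return [base - toy_score(words[:i] + words[i + 1:]) for i in range(len(words))]

occlusion("this is the best movie".split())  # -> [0, 0, 0, 1, 0]
```

Only "best" matters to this toy scorer, so it is the only word with a non-zero attribution. Real methods replace the toy scorer with the model's output on the target class.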
Imports¶
import sys
sys.path.append("../..")
import torch
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification, AutoTokenizer
from interpreto import (
Granularity,
IntegratedGradients,
KernelShap,
Lime,
Occlusion,
Saliency,
SmoothGrad,
Sobol,
plot_attributions,
)
from interpreto.attributions import InferenceModes
from interpreto.attributions.metrics import Deletion, Insertion
from interpreto.commons import GranularityAggregationStrategy
I. Classification task¶
I.0 Setup¶
Loading a BERT model fine-tuned on the IMDB movie-review dataset.
model_name_classif = "textattack/bert-base-uncased-imdb"
model_classif = AutoModelForSequenceClassification.from_pretrained(model_name_classif)
tokenizer_classif = AutoTokenizer.from_pretrained(model_name_classif)
dico_name_classes = {0: "negative", 1: "positive"}
I.1 Minimal example¶
sentence = "This is the best movie I have ever seen. The cinematography was uncharacteristically breathtaking."
# Instantiate the Occlusion explainer with the model and tokenizer
explainer = Occlusion(model_classif, tokenizer_classif)
# Compute the attributions on a given sentence
attributions = explainer(sentence)
# Visualize the attributions
plot_attributions(attributions[0], classes_names=["negative", "positive"])
I.2 Compute explanations on multiple inputs efficiently¶
# Gradient-based methods have the exact same API
explainer = Saliency(model_classif, tokenizer_classif)
# Inputs can also be a list of strings, or even `input_ids`
attributions = explainer(
[
"This is the best movie I have ever seen.",
"I hate this movie.",
"This movie is super good. I love it.",
]
)
# To visualize multiple attributions, simply loop over the list of attributions
for attr in attributions:
plot_attributions(attr, classes_names=dico_name_classes)
I.3 Changing the granularity and inference_mode¶
# Let's modify the default parameters of the Lime explainer
explainer = Lime(
# ---------------------------------
# common to all attribution methods
model_classif,
tokenizer_classif,
batch_size=32, # default is 4
# study impact on the softmax of the logits, default is the logits
inference_mode=InferenceModes.SOFTMAX,
# attribution at the word level, default is the token level
granularity=Granularity.WORD,
# ----------------------------------------
# common to all perturbation-based methods
n_perturbations=20,
# ----------------
# specific to Lime
# possible argument values are exposed as Enums in the classes' static attributes
distance_function=Lime.distance_functions.HAMMING,
)
# The `__call__` method is an alias for the `explain` method
attrs = explainer.explain(model_inputs="Would Interpreto be a good movie name?")
# Let's visualize the attributions
plot_attributions(attrs[0], classes_names=dico_name_classes)
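To see what the `inference_mode` choice changes, here is a plain-torch sketch (independent of Interpreto) of the difference between studying raw logits and studying softmax probabilities:

```python
import torch

logits = torch.tensor([[2.0, -1.0]])   # raw model outputs (default inference mode)
probs = torch.softmax(logits, dim=-1)  # what InferenceModes.SOFTMAX studies instead

# probabilities sum to 1, so class attributions become complementary
assert torch.allclose(probs.sum(), torch.tensor(1.0))

# a perturbation shifting every logit equally is invisible after softmax
assert torch.allclose(torch.softmax(logits + 3.0, dim=-1), probs)
```

In short, SOFTMAX attributions measure impact on the predicted probability rather than on the unnormalized score.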
# gradient-based methods take an additional granularity-related argument:
# you must choose how the token scores computed from the gradients are aggregated
explainer = Saliency(
# common to all attribution methods
model_classif,
tokenizer_classif,
# attribution at the word level, default is the ALL_TOKENS level
granularity=Granularity.WORD,
# aggregation strategy for the word-level attributions (specific to gradient-based methods)
granularity_aggregation_strategy=GranularityAggregationStrategy.SIGNED_MAX,
)
# The `__call__` method is an alias for the `explain` method
attrs = explainer.explain(model_inputs="Would Interpreto be a bad movie name?")
# Let's visualize the attributions
plot_attributions(attrs[0], classes_names=dico_name_classes)
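The SIGNED_MAX strategy can be pictured with a small sketch. This is a plain-Python illustration, assuming (not confirmed by the source) that SIGNED_MAX keeps, for each word, the sub-token score with the largest magnitude while preserving its sign:

```python
import torch

def signed_max(token_scores: torch.Tensor) -> torch.Tensor:
    # keep the score with the largest absolute value, sign preserved
    idx = token_scores.abs().argmax()
    return token_scores[idx]

# a long word may be split into several sub-tokens by the tokenizer
sub_token_scores = torch.tensor([0.1, -0.7, 0.3])
word_score = signed_max(sub_token_scores)  # -> tensor(-0.7000)
```

Unlike MAX or a plain mean, this keeps strongly negative evidence from being drowned out by smaller positive scores.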
I.4 Explaining multiple classes at once¶
Specifying what to explain is the role of the targets argument of the explain method.
For n inputs, targets should contain n elements, and each element can specify several classes to explain.
Therefore, the targets shape should be (n, t), with t the number of classes to explain.
When targets is not specified, the explainer computes the model's prediction for each input, then explains that prediction.
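For instance, explaining both classes of a binary sentiment model for two inputs would use a targets tensor of shape (2, 2):

```python
import torch

# two inputs, both classes explained for each: shape (n, t) = (2, 2)
targets = torch.tensor([[0, 1], [0, 1]])
assert targets.shape == (2, 2)
```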
# remember, the BERT trained on IMDB, movie reviews
explainer = KernelShap(model_classif, tokenizer_classif)
# we explain the prediction for both the positive and negative class
attributions = explainer(
model_inputs=["I do not know if this is the best or the worst movie ever, I am confused."],
targets=torch.tensor([[0, 1]]),
)
# be careful, a different visualization class is used for multi-class attributions
plot_attributions(attributions[0], classes_names=dico_name_classes)
I.5 Metrics¶
To evaluate attributions, Interpreto provides two faithfulness metrics: Insertion and Deletion.
You just have to provide the model, tokenizer, and attribution outputs.
Note that such metric scores should only be used to compare explanations on the same model and samples.
Furthermore, the metrics have been shown to require about 1000 samples to converge.
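The intuition behind the Deletion metric can be sketched in a few lines of plain torch (a conceptual illustration, not Interpreto's implementation): tokens are removed in decreasing attribution order while tracking the model's score, and the area under that curve is the metric. A faithful explanation makes the score collapse early, hence a lower AUC.

```python
import torch

# toy prediction scores as the most-attributed tokens are deleted one by one
scores_faithful = torch.tensor([0.95, 0.40, 0.20, 0.10, 0.05])
scores_random = torch.tensor([0.95, 0.90, 0.80, 0.60, 0.05])

def deletion_auc(scores: torch.Tensor) -> float:
    # trapezoidal area under the deletion curve, x normalized to [0, 1]
    return torch.trapezoid(scores, dx=1.0 / (len(scores) - 1)).item()

assert deletion_auc(scores_faithful) < deletion_auc(scores_random)
```

Insertion works the other way around: tokens are added back in decreasing attribution order, so a faithful explanation recovers the score quickly and yields a higher AUC.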
# use the same model as for the explainer
metric = Deletion(model_classif, tokenizer_classif)
# compute scores on attributions
auc, detailed_scores = metric.evaluate(attributions)
print(f"Deletion AUC: {round(auc, 3)} (lower is better)")
II. Generation¶
II.0 Setup¶
Let's load a good old GPT2.
model_gen = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer_gen = AutoTokenizer.from_pretrained("gpt2")
II.1 Minimal example¶
# the API is the same as classification
explainer = SmoothGrad(
model_gen,
tokenizer_gen,
granularity=Granularity.WORD,
granularity_aggregation_strategy=GranularityAggregationStrategy.MAX,
)
# if no target is specified, the explainer generates the text itself
# `generation_kwargs` are optional
attributions = explainer("Roses are red, the sky is", max_length=16)
# there is a third visualization class for generation attributions
plot_attributions(attributions[0])
II.2 Explain your own outputs¶
# naturally, gradient-based methods also work
explainer = IntegratedGradients(model_gen, tokenizer_gen)
# you can pass strings as targets and even several samples at once
attributions = explainer(
model_inputs=["Interpreto can explain", "And even treat"],
targets=[" the outputs you provide.", " several samples at once."],
)
# for multiple samples, visualization needs to be done one by one
for attr in attributions:
plot_attributions(attr)
II.3 Explain from tokenized inputs¶
# the default granularity ignores the special tokens
# but we can set it to ALL_TOKENS to include them
explainer = Sobol(model_gen, tokenizer_gen, granularity=Granularity.ALL_TOKENS)
tokenized_inputs = tokenizer_gen("Hi there, how are you?", return_tensors="pt")
tokenized_targets = tokenizer_gen("I am fine, thank you!", return_tensors="pt")
# these inputs/targets can be passed just like the others
attributions = explainer(tokenized_inputs, tokenized_targets)
plot_attributions(attributions[0])
II.4 Metrics¶
Evaluating generation attributions follows the same API.
# use the same model as for the explainer
metric = Insertion(model_gen, tokenizer_gen)
# compute scores on attributions
auc, detailed_scores = metric.evaluate(attributions)
print(f"Insertion AUC: {round(auc, 3)} (higher is better)")