
Interpreto attribution tutorial¶

This notebook contains examples of what you can do with the attribution module of Interpreto.

The first part focuses on classification, while the second explores generation.

author: Fanny Jourdan & Antonin Poché

Available methods¶

Each method is used in at least one example in this notebook.

Inference-based methods:
- Occlusion
- LIME
- KernelSHAP
- Sobol

Gradient-based methods:
- Saliency
- Integrated Gradients
- SmoothGrad

Imports¶

import sys

sys.path.append("../..")

import torch
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification, AutoTokenizer

from interpreto import (
    Granularity,
    IntegratedGradients,
    KernelShap,
    Lime,
    Occlusion,
    Saliency,
    SmoothGrad,
    Sobol,
    plot_attributions,
)
from interpreto.attributions import InferenceModes
from interpreto.attributions.metrics import Deletion, Insertion
from interpreto.commons import GranularityAggregationStrategy

I. Classification task¶

I.0 Setup¶

Load a BERT model fine-tuned on the IMDB movie-review dataset.

model_name_classif = "textattack/bert-base-uncased-imdb"
model_classif = AutoModelForSequenceClassification.from_pretrained(model_name_classif)
tokenizer_classif = AutoTokenizer.from_pretrained(model_name_classif)
dico_name_classes = {0: "negative", 1: "positive"}


I.1 Minimal example¶

sentence = "This is the best movie I have ever seen. The cinematography was uncharacteristically breathtaking."

# Instantiate the Occlusion explainer with the model and tokenizer
explainer = Occlusion(model_classif, tokenizer_classif)

# Compute the attributions on a given sentence
attributions = explainer(sentence)

# Visualize the attributions
plot_attributions(attributions[0], classes_names=["negative", "positive"])


[Attribution plot: "Classes" and "Inputs" panels]
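Occlusion replaces one unit at a time and measures how much the model's score drops. The sketch below reproduces that idea in plain Python. It assumes a hypothetical keyword-counting scorer `toy_positive_score` standing in for the BERT classifier, and a `[MASK]` replacement token; Interpreto's actual replacement strategy may differ.

```python
from typing import Callable, List


def occlusion_attributions(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
    replacement: str = "[MASK]",
) -> List[float]:
    """Score each token by how much the model score drops when that token is replaced."""
    base_score = score_fn(tokens)
    token_scores = []
    for i in range(len(tokens)):
        occluded = tokens[:i] + [replacement] + tokens[i + 1 :]
        token_scores.append(base_score - score_fn(occluded))
    return token_scores


# Hypothetical stand-in for the sentiment classifier: counts positive keywords.
POSITIVE_WORDS = {"best", "breathtaking", "love"}


def toy_positive_score(tokens: List[str]) -> float:
    return float(sum(token in POSITIVE_WORDS for token in tokens))


tokens = "this is the best movie i have ever seen".split()
occlusion_scores = occlusion_attributions(tokens, toy_positive_score)
print(list(zip(tokens, occlusion_scores)))
```

Only "best" changes the toy score, so it is the only token with a non-zero attribution.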

I.2 Compute explanations on multiple inputs optimally¶

# Gradient-based methods have the exact same API
explainer = Saliency(model_classif, tokenizer_classif)

# Inputs can also be a list of strings, or even `input_ids`
attributions = explainer(
    [
        "This is the best movie I have ever seen.",
        "I hate this movie.",
        "This movie is super good. I love it.",
    ]
)

# To visualize multiple attributions, simply loop over the list of attributions
for attr in attributions:
    plot_attributions(attr, classes_names=dico_name_classes)

[Three attribution plots, one per sentence, each with "Classes" and "Inputs" panels]
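Saliency scores each token by the gradient of the target score with respect to that token's embedding. Interpreto obtains gradients through the model's autograd; the sketch below instead estimates them with central finite differences on a hypothetical linear `toy_score`, and reduces each gradient vector with an L2 norm. The norm-based reduction is an assumption made for illustration; the library may reduce gradients differently.

```python
import math
from typing import Callable, List

Embeddings = List[List[float]]  # one embedding vector per token


def finite_difference_saliency(
    score_fn: Callable[[Embeddings], float],
    embeddings: Embeddings,
    eps: float = 1e-5,
) -> List[float]:
    """Per-token saliency: L2 norm of d(score)/d(embedding), estimated by central differences."""
    token_scores = []
    for i, vector in enumerate(embeddings):
        grad = []
        for j in range(len(vector)):
            plus = [row[:] for row in embeddings]
            minus = [row[:] for row in embeddings]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad.append((score_fn(plus) - score_fn(minus)) / (2 * eps))
        token_scores.append(math.sqrt(sum(g * g for g in grad)))
    return token_scores


def toy_score(embeddings: Embeddings) -> float:
    """Hypothetical linear model: token 0 matters much more than token 1."""
    return 3.0 * embeddings[0][0] + 4.0 * embeddings[0][1] + 0.5 * embeddings[1][0]


toy_embeddings = [[0.2, -0.1], [0.7, 0.3]]
saliency_scores = finite_difference_saliency(toy_score, toy_embeddings)
print(saliency_scores)  # approximately [5.0, 0.5]
```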

I.3 Changing the `granularity` and `inference_mode`¶

# Let's modify the default parameters of the Lime explainer
explainer = Lime(
    # ---------------------------------
    # common to all attribution methods
    model_classif,
    tokenizer_classif,
    batch_size=32,  # default is 4
    # explain the softmax probabilities instead of the raw logits (the default)
    inference_mode=InferenceModes.SOFTMAX,
    # attribution at the word level, default is the token level
    granularity=Granularity.WORD,
    # ----------------------------------------
    # common to all perturbation-based methods
    n_perturbations=20,
    # ----------------
    # specific to Lime
    # possible argument values are exposed as Enum static attributes of the class
    distance_function=Lime.distance_functions.HAMMING,
)

# `explainer(...)` (the `__call__` method) is an alias for `explainer.explain(...)`
attrs = explainer.explain(model_inputs="Would Interpreto be a good movie name?")

# Let's visualize the attributions
plot_attributions(attrs[0], classes_names=dico_name_classes)

[Attribution plot: "Classes" and "Inputs" panels]
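LIME samples binary masks over words (1 keeps a word, 0 removes it), weights each perturbation by its proximity to the original input, then fits a weighted linear model whose coefficients are the word attributions. The sketch below shows only the sampling and the Hamming-distance weighting selected by `distance_function=Lime.distance_functions.HAMMING`. The normalized distance, the exponential kernel, and `kernel_width=0.25` are illustrative assumptions, not Interpreto's exact internals.

```python
import math
import random
from typing import List


def hamming_distance(mask: List[int]) -> float:
    """Normalized Hamming distance between a binary mask and the original input (all ones)."""
    return sum(1 for bit in mask if bit == 0) / len(mask)


def proximity_weight(distance: float, kernel_width: float = 0.25) -> float:
    """Exponential kernel: perturbations close to the original input get weights near 1."""
    return math.exp(-(distance**2) / kernel_width**2)


def sample_masks(n_words: int, n_perturbations: int, seed: int = 0) -> List[List[int]]:
    """Random binary masks over words: 1 keeps a word, 0 removes it."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(n_words)] for _ in range(n_perturbations)]


words = "Would Interpreto be a good movie name?".split()
masks = sample_masks(len(words), n_perturbations=20)
weights = [proximity_weight(hamming_distance(mask)) for mask in masks]
for mask, weight in zip(masks[:3], weights[:3]):
    print(mask, round(weight, 3))
```

Perturbations that remove few words keep a weight close to 1, so the linear surrogate is fitted mostly around the original sentence.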

# gradient-based methods take an extra granularity argument: gradients give one score per token,
# so we must choose how token scores are aggregated into word scores
explainer = Saliency(
    # common to all attribution methods
    model_classif,
    tokenizer_classif,
    # attribution at the word level, default is the token level
    granularity=Granularity.WORD,
    # aggregation strategy for the word-level attributions (specific to gradient-based methods)
    granularity_aggregation_strategy=GranularityAggregationStrategy.SIGNED_MAX,
)

# `explainer(...)` (the `__call__` method) is an alias for `explainer.explain(...)`
attrs = explainer.explain(model_inputs="Would Interpreto be a bad movie name?")

# Let's visualize the attributions
plot_attributions(attrs[0], classes_names=dico_name_classes)

[Attribution plot: "Classes" and "Inputs" panels]
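When a word is split into several sub-word tokens, their scores must be reduced to one word score. The sketch below assumes that `SIGNED_MAX` keeps the token score with the largest magnitude, sign included, while `MAX` keeps the largest value; these semantics are an assumption, not taken from Interpreto's source. `subword_word_ids` plays the role of the token-to-word mapping that a Hugging Face fast tokenizer's `word_ids()` returns, with hypothetical values.

```python
from typing import Callable, Dict, List


def signed_max(values: List[float]) -> float:
    """Return the value with the largest magnitude, keeping its sign."""
    return max(values, key=abs)


def aggregate_to_words(
    token_scores: List[float],
    word_ids: List[int],
    strategy: Callable[[List[float]], float],
) -> List[float]:
    """Group token scores by word index, then reduce each group with `strategy`."""
    groups: Dict[int, List[float]] = {}
    for score, word_id in zip(token_scores, word_ids):
        groups.setdefault(word_id, []).append(score)
    return [strategy(groups[w]) for w in sorted(groups)]


subword_scores = [0.1, -0.7, 0.3, 0.2]
subword_word_ids = [0, 1, 1, 2]  # hypothetical: tokens 1 and 2 belong to the same word
print(aggregate_to_words(subword_scores, subword_word_ids, signed_max))  # [0.1, -0.7, 0.2]
print(aggregate_to_words(subword_scores, subword_word_ids, max))  # [0.1, 0.3, 0.2]
```

With a plain maximum, the strongly negative sub-token of word 1 is hidden; a signed maximum keeps that evidence visible.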

I.4 Explaining multiple classes at once¶

Specifying what to explain is the role of the `targets` argument of the `explain` method.

For n inputs, `targets` must contain n elements, and each element can list several classes to explain. The targets therefore have shape (n, t), where t is the number of classes explained per input.

When `targets` is not specified, the explainer first computes the model's prediction for each input and then explains that prediction.
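The (n, t) layout can be illustrated with plain lists. The helpers `all_classes_targets` and `check_targets_shape` below are hypothetical, written for this illustration only; the nested list they build is what you would wrap in `torch.tensor(...)` before passing it as `targets`.

```python
from typing import List, Tuple


def all_classes_targets(n_inputs: int, n_classes: int) -> List[List[int]]:
    """Targets of shape (n, t) where every input explains every class (t = n_classes)."""
    return [list(range(n_classes)) for _ in range(n_inputs)]


def check_targets_shape(targets: List[List[int]], n_inputs: int) -> Tuple[int, int]:
    """Each of the n rows must list the same number t of class indices."""
    if len(targets) != n_inputs:
        raise ValueError(f"expected {n_inputs} rows of targets, got {len(targets)}")
    widths = {len(row) for row in targets}
    if len(widths) != 1:
        raise ValueError("every row of targets must have the same length t")
    return n_inputs, widths.pop()


example_targets = all_classes_targets(n_inputs=1, n_classes=2)
print(example_targets, check_targets_shape(example_targets, n_inputs=1))  # [[0, 1]] (1, 2)
```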

# reminder: this BERT model was fine-tuned on IMDB movie reviews
explainer = KernelShap(model_classif, tokenizer_classif)

# we explain the prediction for both the positive and negative class
attributions = explainer(
    model_inputs=["I do not know if this is the best or the worst movie ever, I am confused."],
    targets=torch.tensor([[0, 1]]),
)

# the same `plot_attributions` call also handles multi-class attributions
plot_attributions(attributions[0], classes_names=dico_name_classes)

[Attribution plot: "Classes" and "Inputs" panels]
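KernelSHAP evaluates the model on coalitions of present and absent features, then fits a linear model weighted by the Shapley kernel π(z) = (M − 1) / (C(M, |z|) · |z| · (M − |z|)), where M is the number of features and |z| the coalition size. The sketch below computes only these standard kernel weights; Interpreto's coalition sampling is not reproduced here.

```python
from math import comb


def shapley_kernel_weight(n_features: int, coalition_size: int) -> float:
    """Shapley kernel weight of a coalition with `coalition_size` of `n_features` features present.

    Empty and full coalitions get infinite weight; in practice they are enforced as constraints.
    """
    if coalition_size in (0, n_features):
        return float("inf")
    return (n_features - 1) / (
        comb(n_features, coalition_size) * coalition_size * (n_features - coalition_size)
    )


n_words = 5
for size in range(1, n_words):
    print(size, round(shapley_kernel_weight(n_words, size), 4))
```

The weights are symmetric and largest for very small or very large coalitions, which isolate the effect of individual features best.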

I.5 Metrics¶

To evaluate attributions, Interpreto provides two faithfulness metrics: Insertion and Deletion.

You only need to provide the model, the tokenizer, and the attribution outputs.

Note that such metric scores should only be used to compare explanations on the same model and samples.

Furthermore, the metrics have been shown to require about 1000 samples to converge.

# use the same model as for the explainer
metric = Deletion(model_classif, tokenizer_classif)

# compute scores on attributions
auc, detailed_scores = metric.evaluate(attributions)

print(f"Deletion AUC: {round(auc, 3)} (lower is better)")

Deletion AUC: 0.064 (lower is better)
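Deletion replaces tokens from most to least important and tracks the model score; a faithful explanation makes the score collapse early, giving a low area under the curve. This sketch uses a hypothetical scorer `toy_best_score`, a `[MASK]` replacement, and a trapezoidal AUC on [0, 1]; Interpreto's exact replacement and normalization are assumptions here.

```python
from typing import Callable, List


def trapezoid_auc(scores: List[float]) -> float:
    """Area under a curve sampled at evenly spaced points on [0, 1]."""
    step = 1 / (len(scores) - 1)
    return sum((a + b) / 2 * step for a, b in zip(scores, scores[1:]))


def deletion_curve(
    tokens: List[str],
    token_scores: List[float],
    score_fn: Callable[[List[str]], float],
    replacement: str = "[MASK]",
) -> List[float]:
    """Replace tokens from most to least important, recording the model score after each step."""
    order = sorted(range(len(tokens)), key=lambda i: token_scores[i], reverse=True)
    current = list(tokens)
    curve = [score_fn(current)]
    for i in order:
        current[i] = replacement
        curve.append(score_fn(current))
    return curve


def toy_best_score(tokens: List[str]) -> float:
    """Hypothetical model: fully confident as long as "best" is present."""
    return 1.0 if "best" in tokens else 0.0


sample = ["the", "best", "movie"]
good_auc = trapezoid_auc(deletion_curve(sample, [0.0, 1.0, 0.0], toy_best_score))
bad_auc = trapezoid_auc(deletion_curve(sample, [1.0, 0.0, 0.5], toy_best_score))
print(round(good_auc, 3), round(bad_auc, 3))  # 0.167 0.833
```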

II. Generation¶

II.0 Setup¶

Let's load good old GPT-2.

device = "cuda" if torch.cuda.is_available() else "cpu"
model_gen = AutoModelForCausalLM.from_pretrained("gpt2").to(device)
tokenizer_gen = AutoTokenizer.from_pretrained("gpt2")


II.1 Minimal example¶

# the API is the same as for classification

explainer = SmoothGrad(
    model_gen,
    tokenizer_gen,
    granularity=Granularity.WORD,
    granularity_aggregation_strategy=GranularityAggregationStrategy.MAX,
)


attributions = explainer("Roses are red, the sky is", " blue")

# for generation, `plot_attributions` displays both the inputs and the outputs
plot_attributions(attributions[0])

[Attribution plot: "Inputs" and "Outputs" panels]
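SmoothGrad averages gradients computed on several noisy copies of the input, which reduces the visual noise of plain saliency maps. In Interpreto the noise is added to embeddings and gradients come from autograd; this sketch uses a hypothetical toy function f(x) = Σ xᵢ² with analytic gradient 2x. The values `n_samples` and `noise_std` are illustrative assumptions, not the library defaults.

```python
import random
from typing import Callable, List

Vector = List[float]


def smoothgrad(
    grad_fn: Callable[[Vector], Vector],
    x: Vector,
    n_samples: int = 50,
    noise_std: float = 0.1,
    seed: int = 0,
) -> Vector:
    """Average the gradient over `n_samples` copies of x perturbed with Gaussian noise."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]
        for j, g in enumerate(grad_fn(noisy)):
            total[j] += g
    return [t / n_samples for t in total]


def grad_sum_of_squares(x: Vector) -> Vector:
    """Analytic gradient of the hypothetical toy model f(x) = sum(x_i ** 2)."""
    return [2 * xi for xi in x]


x_point = [1.0, -2.0, 0.5]
smoothed = smoothgrad(grad_sum_of_squares, x_point, n_samples=1000)
print(smoothed)  # close to [2.0, -4.0, 1.0]
```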

II.2 Explain several examples at the same time¶

# other gradient-based methods, such as Integrated Gradients, work the same way
explainer = IntegratedGradients(model_gen, tokenizer_gen)

# you can pass strings as targets and even several samples at once
attributions = explainer(
    model_inputs=["Interpreto can explain", "And even treat"],
    targets=[" the outputs you provide.", " several samples at once."],
)

# for multiple samples, visualize the attributions one by one
for attr in attributions:
    plot_attributions(attr)

[Two attribution plots, one per sample, each with "Inputs" and "Outputs" panels]
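Integrated Gradients accumulates gradients along the straight path from a baseline x′ to the input x: IGᵢ = (xᵢ − x′ᵢ) · ∫₀¹ ∂f(x′ + α(x − x′))/∂xᵢ dα. The sketch approximates the integral with a midpoint Riemann sum on a hypothetical toy function and checks the completeness property (the attributions sum to f(x) − f(x′)). The zero baseline and `n_steps=50` are assumptions; Interpreto's defaults are not shown here.

```python
from typing import Callable, List

Vector = List[float]


def integrated_gradients(
    grad_fn: Callable[[Vector], Vector],
    x: Vector,
    baseline: Vector,
    n_steps: int = 50,
) -> Vector:
    """Midpoint Riemann approximation of integrated gradients along baseline -> x."""
    total = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for j, g in enumerate(grad_fn(point)):
            total[j] += g
    return [(xi - b) * t / n_steps for xi, b, t in zip(x, baseline, total)]


def toy_f(x: Vector) -> float:
    """Hypothetical toy model."""
    return x[0] ** 2 + 3 * x[1]


def grad_toy_f(x: Vector) -> Vector:
    return [2 * x[0], 3.0]


x_input, zero_baseline = [2.0, 1.0], [0.0, 0.0]
ig_scores = integrated_gradients(grad_toy_f, x_input, zero_baseline)
print(ig_scores, sum(ig_scores), toy_f(x_input) - toy_f(zero_baseline))
```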

II.3 Explain from tokenized inputs¶

# the default granularity ignores the special tokens
# but we can set it to ALL_TOKENS to include them
explainer = Sobol(model_gen, tokenizer_gen, granularity=Granularity.ALL_TOKENS)

tokenized_inputs = tokenizer_gen("Hi there, how are you?", return_tensors="pt")
tokenized_targets = tokenizer_gen("I am fine, thank you!", return_tensors="pt")

# these inputs/targets can be passed just like the others
attributions = explainer(tokenized_inputs, tokenized_targets)

plot_attributions(attributions[0])

[Attribution plot: "Inputs" and "Outputs" panels]
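Sobol attributions measure how much of the output variance each token explains when tokens are randomly masked. The sketch below estimates total-order Sobol indices with the Jansen estimator on a hypothetical `toy_model` whose inputs are mask values drawn uniformly in [0, 1]. Interpreto may use quasi-Monte Carlo masks and a different estimator; treat this as an illustration of the quantity, not of the library's implementation.

```python
import random
from typing import Callable, List


def sobol_total_indices(
    f: Callable[[List[float]], float],
    n_features: int,
    n_samples: int = 10_000,
    seed: int = 0,
) -> List[float]:
    """Jansen estimator of total-order Sobol indices for inputs uniform on [0, 1]."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    f_A = [f(row) for row in A]
    mean = sum(f_A) / n_samples
    variance = sum((y - mean) ** 2 for y in f_A) / n_samples
    indices = []
    for i in range(n_features):
        # Rows of A with column i taken from B: only feature i changes.
        f_AB = [f(a[:i] + [b[i]] + a[i + 1 :]) for a, b in zip(A, B)]
        numerator = sum((ya - yab) ** 2 for ya, yab in zip(f_A, f_AB)) / (2 * n_samples)
        indices.append(numerator / variance)
    return indices


def toy_model(mask: List[float]) -> float:
    """Hypothetical output: token 0 matters a lot, token 1 a little, token 2 not at all."""
    return 2.0 * mask[0] + 0.5 * mask[1]


total_indices = sobol_total_indices(toy_model, n_features=3)
print([round(v, 3) for v in total_indices])
```

For this additive toy model the exact indices are 4/4.25 ≈ 0.94, 0.25/4.25 ≈ 0.06, and 0.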

II.4 Generate the output and then explain it¶

text = "Hi, how are you?"

tokenized_inputs = tokenizer_gen(text, return_tensors="pt").to(device)
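# max_length counts the prompt tokens; setting pad_token_id avoids the open-end generation warning
all_generation = model_gen.generate(**tokenized_inputs, max_length=15, pad_token_id=tokenizer_gen.eos_token_id)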
target = all_generation[:, tokenized_inputs["input_ids"].shape[1] :]

explainer = Occlusion(
    model_gen,
    tokenizer_gen,
    granularity=Granularity.WORD,
)


attributions = explainer(tokenized_inputs, target)
plot_attributions(attributions[0])


[Attribution plot: "Inputs" and "Outputs" panels]
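For decoder-only models, `generate` returns the prompt followed by the continuation, and `max_length` bounds the total length, prompt included. That is why the cell above slices off the first `input_ids.shape[1]` positions to keep only the generated target. The toy greedy decoder below mimics this behavior with a hypothetical bigram table standing in for GPT-2.

```python
from typing import Dict, List


def greedy_generate(prompt: List[str], next_token: Dict[str, str], max_length: int) -> List[str]:
    """Toy greedy decoding: like a decoder-only `generate`, returns prompt + continuation."""
    sequence = list(prompt)
    while len(sequence) < max_length and sequence[-1] in next_token:
        sequence.append(next_token[sequence[-1]])
    return sequence


# Hypothetical bigram "model" standing in for GPT-2.
bigrams = {"you?": "I", "I": "am", "am": "fine", "fine": "thanks"}
prompt = ["Hi,", "how", "are", "you?"]
full_sequence = greedy_generate(prompt, bigrams, max_length=7)
continuation = full_sequence[len(prompt) :]  # same slicing as all_generation[:, prompt_length:]
print(continuation)  # ['I', 'am', 'fine']
```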

II.5 Metrics¶

Evaluating generation attributions follows the same API.

# use the same model as for the explainer
metric = Insertion(model_gen, tokenizer_gen)

# compute scores on attributions
auc, detailed_scores = metric.evaluate(attributions)

print(f"Insertion AUC: {round(auc, 3)} (higher is better)")

Insertion AUC: 0.021 (higher is better)
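Insertion is the mirror image of Deletion: start from a fully masked input, restore tokens from most to least important, and track the model score. A faithful explanation makes the score rise early, giving a high area under the curve. The scorer `toy_blue_score`, the `[MASK]` token, and the trapezoidal AUC on [0, 1] are assumptions for illustration.

```python
from typing import Callable, List


def trapezoid_auc(scores: List[float]) -> float:
    """Area under a curve sampled at evenly spaced points on [0, 1]."""
    step = 1 / (len(scores) - 1)
    return sum((a + b) / 2 * step for a, b in zip(scores, scores[1:]))


def insertion_curve(
    tokens: List[str],
    token_scores: List[float],
    score_fn: Callable[[List[str]], float],
    replacement: str = "[MASK]",
) -> List[float]:
    """Start from a fully masked input and restore tokens from most to least important."""
    order = sorted(range(len(tokens)), key=lambda i: token_scores[i], reverse=True)
    current = [replacement] * len(tokens)
    curve = [score_fn(current)]
    for i in order:
        current[i] = tokens[i]
        curve.append(score_fn(current))
    return curve


def toy_blue_score(tokens: List[str]) -> float:
    """Hypothetical model: predicts the target only once "blue" is visible."""
    return 1.0 if "blue" in tokens else 0.0


sample = ["the", "sky", "is", "blue"]
good_auc = trapezoid_auc(insertion_curve(sample, [0.0, 0.1, 0.0, 1.0], toy_blue_score))
bad_auc = trapezoid_auc(insertion_curve(sample, [1.0, 0.5, 0.2, 0.0], toy_blue_score))
print(good_auc, bad_auc)  # 0.875 0.125
```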
