
Interpreto attribution tutorial¶
This notebook contains examples of what you can do with the attribution module of Interpreto.
The first part focuses on classification, while the second explores generation.
Authors: Fanny Jourdan & Antonin Poché
Available methods¶
All methods have at least one example in this notebook.
Inference-based methods:
- Occlusion
- LIME
- KernelSHAP
- Sobol

Gradient-based methods:
- Saliency
- Integrated Gradients
- SmoothGrad
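Before diving into the library, here is the core idea behind perturbation methods such as Occlusion, shown as a self-contained toy sketch (no Interpreto involved, and the scoring function is a made-up stand-in for a real model): mask each word in turn and record how much the score drops.

```python
def toy_score(words: list[str]) -> int:
    # stand-in for a model: counts occurrences of positive words
    return sum(w in {"best", "love", "good"} for w in words)

def occlusion(words: list[str]) -> list[int]:
    base = toy_score(words)
    # attribution of a word = score drop when that word is masked out
    return [base - toy_score(words[:i] + words[i + 1:]) for i in range(len(words))]

occlusion("this is the best movie".split())  # -> [0, 0, 0, 1, 0]
```

Only "best" matters to this toy scorer, so it is the only word with a non-zero attribution. Real methods replace the toy scorer with the model's output on the target class.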
Imports¶
import sys
sys.path.append("../..")
import torch
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification, AutoTokenizer
from interpreto import (
Granularity,
IntegratedGradients,
KernelShap,
Lime,
Occlusion,
Saliency,
SmoothGrad,
Sobol,
plot_attributions,
)
from interpreto.attributions import InferenceModes
from interpreto.attributions.metrics import Deletion, Insertion
from interpreto.commons import GranularityAggregationStrategy
I. Classification task¶
I.0 Setup¶
Loading a BERT model fine-tuned on the IMDB movie-review dataset.
model_name_classif = "textattack/bert-base-uncased-imdb"
model_classif = AutoModelForSequenceClassification.from_pretrained(model_name_classif)
tokenizer_classif = AutoTokenizer.from_pretrained(model_name_classif)
dico_name_classes = {0: "negative", 1: "positive"}
I.1 Minimal example¶
sentence = "This is the best movie I have ever seen. The cinematography was uncharacteristically breathtaking."
# Instantiate the Occlusion explainer with the model and tokenizer
explainer = Occlusion(model_classif, tokenizer_classif)
# Compute the attributions on a given sentence
attributions = explainer(sentence)
# Visualize the attributions
plot_attributions(attributions[0], classes_names=["negative", "positive"])
I.2 Compute explanations on multiple inputs efficiently¶
# Gradient-based methods have the exact same API
explainer = Saliency(model_classif, tokenizer_classif)
# Inputs can also be a list of strings, or even `input_ids`
attributions = explainer(
[
"This is the best movie I have ever seen.",
"I hate this movie.",
"This movie is super good. I love it.",
]
)
# To visualize multiple attributions, simply loop over the list of attributions
for attr in attributions:
plot_attributions(attr, classes_names=dico_name_classes)
I.3 Changing the granularity and inference_mode¶
# Let's modify the default parameters of the Lime explainer
explainer = Lime(
# ---------------------------------
# common to all attribution methods
model_classif,
tokenizer_classif,
batch_size=32, # default is 4
# study impact on the softmax of the logits, default is the logits
inference_mode=InferenceModes.SOFTMAX,
# attribution at the word level, default is the token level
granularity=Granularity.WORD,
# ----------------------------------------
# common to all perturbation-based methods
n_perturbations=20,
# ----------------
# specific to Lime
# possible argument values are exposed as Enums in the classes' static attributes
distance_function=Lime.distance_functions.HAMMING,
)
# The `__call__` method is an alias for the `explain` method
attrs = explainer.explain(model_inputs="Would Interpreto be a good movie name?")
# Let's visualize the attributions
plot_attributions(attrs[0], classes_names=dico_name_classes)
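To see what the `inference_mode` choice changes, here is a plain-torch sketch (independent of Interpreto) of the difference between studying raw logits and studying softmax probabilities:

```python
import torch

logits = torch.tensor([[2.0, -1.0]])   # raw model outputs (default inference mode)
probs = torch.softmax(logits, dim=-1)  # what InferenceModes.SOFTMAX studies instead

# probabilities sum to 1, so class attributions become complementary
assert torch.allclose(probs.sum(), torch.tensor(1.0))

# a perturbation shifting every logit equally is invisible after softmax
assert torch.allclose(torch.softmax(logits + 3.0, dim=-1), probs)
```

In short, SOFTMAX attributions measure impact on the predicted probability rather than on the unnormalized score.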
# gradient-based methods take an additional granularity-related argument:
# you must choose how the token scores computed from the gradients are aggregated
explainer = Saliency(
# common to all attribution methods
model_classif,
tokenizer_classif,
# attribution at the word level, default is the ALL_TOKENS level
granularity=Granularity.WORD,
# aggregation strategy for the word-level attributions (specific to gradient-based methods)
granularity_aggregation_strategy=GranularityAggregationStrategy.SIGNED_MAX,
)
# The `__call__` method is an alias for the `explain` method
attrs = explainer.explain(model_inputs="Would Interpreto be a bad movie name?")
# Let's visualize the attributions
plot_attributions(attrs[0], classes_names=dico_name_classes)
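The SIGNED_MAX strategy can be pictured with a small sketch. This is a plain-Python illustration, assuming (not confirmed by the source) that SIGNED_MAX keeps, for each word, the sub-token score with the largest magnitude while preserving its sign:

```python
import torch

def signed_max(token_scores: torch.Tensor) -> torch.Tensor:
    # keep the score with the largest absolute value, sign preserved
    idx = token_scores.abs().argmax()
    return token_scores[idx]

# a long word may be split into several sub-tokens by the tokenizer
sub_token_scores = torch.tensor([0.1, -0.7, 0.3])
word_score = signed_max(sub_token_scores)  # -> tensor(-0.7000)
```

Unlike MAX or a plain mean, this keeps strongly negative evidence from being drowned out by smaller positive scores.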
I.4 Explaining multiple classes at once¶
Specifying what to explain is the role of the targets argument of the explain method.
For n inputs, targets should contain n elements, and each element can specify several classes to explain.
Therefore, the targets shape should be (n, t), with t the number of classes to explain.
When targets is not specified, the explainer computes the model's prediction for each input, then explains that prediction.
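For instance, explaining both classes of a binary sentiment model for two inputs would use a targets tensor of shape (2, 2):

```python
import torch

# two inputs, both classes explained for each: shape (n, t) = (2, 2)
targets = torch.tensor([[0, 1], [0, 1]])
assert targets.shape == (2, 2)
```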
# remember, the BERT trained on IMDB, movie reviews
explainer = KernelShap(model_classif, tokenizer_classif)
# we explain the prediction for both the positive and negative class
attributions = explainer(
model_inputs=["I do not know if this is the best or the worst movie ever, I am confused."],
targets=torch.tensor([[0, 1]]),
)
# be careful, a different visualization class is used for multi-class attributions
plot_attributions(attributions[0], classes_names=dico_name_classes)
I.5 Metrics¶
To evaluate attributions, Interpreto provides two faithfulness metrics: Insertion and Deletion.
You just have to provide the model, tokenizer, and attribution outputs.
Note that such metric scores should only be used to compare explanations on the same model and samples.
Furthermore, the metrics have been shown to require about 1000 samples to converge.
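The intuition behind the Deletion metric can be sketched in a few lines of plain torch (a conceptual illustration, not Interpreto's implementation): tokens are removed in decreasing attribution order while tracking the model's score, and the area under that curve is the metric. A faithful explanation makes the score collapse early, hence a lower AUC.

```python
import torch

# toy prediction scores as the most-attributed tokens are deleted one by one
scores_faithful = torch.tensor([0.95, 0.40, 0.20, 0.10, 0.05])
scores_random = torch.tensor([0.95, 0.90, 0.80, 0.60, 0.05])

def deletion_auc(scores: torch.Tensor) -> float:
    # trapezoidal area under the deletion curve, x normalized to [0, 1]
    return torch.trapezoid(scores, dx=1.0 / (len(scores) - 1)).item()

assert deletion_auc(scores_faithful) < deletion_auc(scores_random)
```

Insertion works the other way around: tokens are added back in decreasing attribution order, so a faithful explanation recovers the score quickly and yields a higher AUC.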
# use the same model as for the explainer
metric = Deletion(model_classif, tokenizer_classif)
# compute scores on attributions
auc, detailed_scores = metric.evaluate(attributions)
print(f"Deletion AUC: {round(auc, 3)} (lower is better)")
II. Generation¶
II.0 Setup¶
Let's load a good old GPT2.
model_gen = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer_gen = AutoTokenizer.from_pretrained("gpt2")
II.1 Minimal example¶
# the API is the same as classification
explainer = SmoothGrad(
model_gen,
tokenizer_gen,
granularity=Granularity.WORD,
granularity_aggregation_strategy=GranularityAggregationStrategy.MAX,
)
# if no target is specified, the explainer generates the text itself
# `generation_kwargs` are optional
attributions = explainer("Roses are red, the sky is", max_length=16)
# there is a third visualization class for generation attributions
plot_attributions(attributions[0])
II.2 Explain your own outputs¶
# naturally, gradient-based methods also work
explainer = IntegratedGradients(model_gen, tokenizer_gen)
# you can pass strings as targets and even several samples at once
attributions = explainer(
model_inputs=["Interpreto can explain", "And even treat"],
targets=[" the outputs you provide.", " several samples at once."],
)
# for multiple samples, visualization needs to be done one by one
for attr in attributions:
plot_attributions(attr)
II.3 Explain from tokenized inputs¶
# the default granularity ignores the special tokens
# but we can set it to ALL_TOKENS to include them
explainer = Sobol(model_gen, tokenizer_gen, granularity=Granularity.ALL_TOKENS)
tokenized_inputs = tokenizer_gen("Hi there, how are you?", return_tensors="pt")
tokenized_targets = tokenizer_gen("I am fine, thank you!", return_tensors="pt")
# these inputs/targets can be passed just like the others
attributions = explainer(tokenized_inputs, tokenized_targets)
plot_attributions(attributions[0])
II.4 Metrics¶
Evaluating generation attributions follows the same API.
# use the same model as for the explainer
metric = Insertion(model_gen, tokenizer_gen)
# compute scores on attributions
auc, detailed_scores = metric.evaluate(attributions)
print(f"Insertion AUC: {round(auc, 3)} (higher is better)")