Interpreto attribution tutorial
This notebook contains examples of what you can do with the attribution module of Interpreto.
The first part focuses on classification, while the second explores generation.
Authors: Fanny Jourdan & Antonin Poché
Available methods
Each method has at least one example in this notebook.
Inference-based methods:
- Occlusion
- LIME
- KernelSHAP
- Sobol
Gradient-based methods:
- Saliency
- Integrated Gradients
- SmoothGrad
Imports
import sys
sys.path.append("../..")
import torch
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification, AutoTokenizer
from interpreto import (
AttributionVisualization,
Granularity,
IntegratedGradients,
KernelShap,
Lime,
Occlusion,
Saliency,
SmoothGrad,
Sobol,
)
from interpreto.attributions import InferenceModes
I. Classification task
I.0 Setup
Load a BERT model fine-tuned on the IMDB dataset (binary sentiment classification) together with its tokenizer.
model_name = "textattack/bert-base-uncased-imdb"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Map each class index to a human-readable label
dico_name_classes = {0: "negative", 1: "positive"}
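As an optional sanity check, you can run a single forward pass with plain PyTorch/transformers (this is standard Hugging Face usage, not part of Interpreto) and map the predicted index through the label dictionary:
# Optional sanity check: classify one review with the raw model (standard transformers API)
inputs = tokenizer("A thoroughly enjoyable film.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_class = logits.argmax(dim=-1).item()
print(dico_name_classes[predicted_class])  # likely "positive" for a positive review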
I.1 Minimal example
sentence = "This is the best movie I have ever seen. The cinematography was uncharacteristically breathtaking."
# Instantiate the Occlusion explainer with the model and tokenizer
explainer = Occlusion(model, tokenizer)
# Compute the attributions on a given sentence
attributions = explainer(sentence)
# Visualize the attributions
AttributionVisualization(attributions[0]).display()
# Optional: pass human-readable labels to the display, e.g. the mapping defined in the setup
# AttributionVisualization(attributions[0], class_names=dico_name_classes).display()
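The other explainers imported above can be used the same way. As a minimal sketch (assuming Saliency exposes the same model/tokenizer constructor and call interface as Occlusion shown here), a gradient-based attribution on the same sentence would look like this:
# Sketch: gradient-based attributions with Saliency, assuming the same interface as Occlusion above
saliency_explainer = Saliency(model, tokenizer)
saliency_attributions = saliency_explainer(sentence)
AttributionVisualization(saliency_attributions[0]).display()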