Saliency

Bases: MultitaskExplainerMixin, AttributionExplainer

Saliency is a simple and widely used gradient-based method for interpreting neural network predictions. The idea is to compute the gradient of the model's output with respect to its input embeddings to estimate which input tokens most influence the output.

Procedure:

- Pass the input through the model to obtain an output (e.g., class logit, token probability).
- Compute the gradient of that output with respect to the input embeddings.
- For each token, reduce the gradient vector (e.g., via its norm, or its product with the token embedding) to obtain a scalar importance score.
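
The same procedure can be written directly in PyTorch. The sketch below is illustrative only and is not Interpreto's internal implementation; the checkpoint name is a placeholder, and each token's gradient is reduced with its L2 norm.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

inputs = tokenizer("Saliency maps are easy to compute.", return_tensors="pt")

# 1. Embed the tokens and make the embeddings a leaf tensor that tracks gradients.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach().requires_grad_(True)

# 2. Forward pass from the embeddings and select the target output (here, the top logit).
logits = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"]).logits
target = logits[0, logits.argmax(dim=-1).item()]

# 3. Gradient of the target output with respect to the input embeddings.
target.backward()

# 4. Reduce each token's gradient vector to a scalar importance score (L2 norm here).
scores = embeddings.grad.norm(dim=-1).squeeze(0)
for token, score in zip(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()), scores):
    print(f"{token:>10}  {score.item():.4f}")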

Reference: Simonyan et al. (2013). Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv:1312.6034.

Examples:

>>> from interpreto import Saliency
>>> method = Saliency(model, tokenizer, batch_size=4)
>>> explanations = method.explain(text)
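
For context, a complete call with the model and tokenizer loaded explicitly could look like the sketch below; the checkpoint name and input text are placeholders, not part of the library's documentation.

>>> from transformers import AutoModelForSequenceClassification, AutoTokenizer
>>> from interpreto import Saliency
>>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"  # placeholder checkpoint
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> method = Saliency(model, tokenizer, batch_size=4)
>>> explanations = method.explain("The movie was surprisingly good.")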

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | PreTrainedModel | Model to explain. | required |
| tokenizer | PreTrainedTokenizer | Hugging Face tokenizer associated with the model. | required |
| batch_size | int | Batch size for the attribution method. | 4 |
| device | device | Device on which the attribution method will be run. | None |
| inference_mode | Callable[[Tensor], Tensor] | Mode used for inference: one of LOGITS, SOFTMAX, or LOG_SOFTMAX. Use InferenceModes to choose the appropriate mode. | LOGITS |
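
For instance, to attribute softmax probabilities instead of raw logits, pass the corresponding mode at construction time. The import path for InferenceModes below is an assumption; adjust it to wherever your Interpreto version exposes it.

>>> from interpreto.attributions import InferenceModes  # assumed import path
>>> method = Saliency(model, tokenizer, batch_size=4, inference_mode=InferenceModes.SOFTMAX)
>>> explanations = method.explain(text)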