Saliency
Bases: MultitaskExplainerMixin, AttributionExplainer
Saliency maps are a simple and widely used gradient-based method for interpreting neural network predictions. The idea is to compute the gradient of the model's output with respect to its input embeddings to estimate which input tokens most influence the output.
Procedure (see the sketch after this list):

- Pass the input through the model to obtain an output (e.g., a class logit or token probability).
- Compute the gradient of that output with respect to the input embeddings.
- For each token, reduce the gradient vector to a scalar importance score (e.g., by taking its norm).
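For intuition, here is a minimal PyTorch sketch of this computation, independent of interpreto's own implementation. It assumes a Hugging Face sequence-classification model; the checkpoint name and the `text` input are illustrative placeholders.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Illustrative checkpoint; any sequence-classification model works the same way.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.eval()

text = "The movie was surprisingly good."
inputs = tokenizer(text, return_tensors="pt")

# Embed the tokens and track gradients on the embeddings, not on the ids.
embeddings = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeddings.requires_grad_(True)

# Forward pass from the embeddings and pick the predicted class logit.
logits = model(inputs_embeds=embeddings, attention_mask=inputs["attention_mask"]).logits
predicted_class = logits.argmax(dim=-1).item()

# Backpropagate the chosen logit down to the input embeddings.
logits[0, predicted_class].backward()

# Reduce each token's gradient vector to a scalar importance score (here: L2 norm).
scores = embeddings.grad[0].norm(dim=-1)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, score in zip(tokens, scores.tolist()):
    print(f"{token}\t{score:.4f}")
```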
Reference: Simonyan et al. (2013). Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. Paper: https://arxiv.org/abs/1312.6034
Examples:
>>> from interpreto import Saliency
>>> method = Saliency(model, tokenizer, batch_size=4)
>>> explanations = method.explain(text)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `PreTrainedModel` | model to explain | *required* |
| `tokenizer` | `PreTrainedTokenizer` | Hugging Face tokenizer associated with the model | *required* |
| `batch_size` | `int` | batch size for the attribution method | `4` |
| `device` | `device` | device on which the attribution method will be run | `None` |
| `inference_mode` | `Callable[[Tensor], Tensor]` | The mode used for inference. It can be one of `LOGITS`, `SOFTMAX`, or `LOG_SOFTMAX`. Use `InferenceModes` to choose the appropriate mode. | `LOGITS` |
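As a hedged sketch of how the optional parameters from the table can be combined, the example below attributes probabilities instead of raw logits and selects a device explicitly. The import path of `InferenceModes` is an assumption, and `model`, `tokenizer`, and `text` are placeholders as in the example above; check them against your installed version.

```python
import torch
from interpreto import InferenceModes, Saliency  # InferenceModes import path is assumed

method = Saliency(
    model,
    tokenizer,
    batch_size=4,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    inference_mode=InferenceModes.SOFTMAX,  # attribute probabilities rather than raw logits
)
explanations = method.explain(text)
```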