Smoothgrad

Bases: `MultitaskExplainerMixin`, `AttributionExplainer`
SmoothGrad is an enhanced version of gradient-based interpretability methods, such as saliency maps. It reduces the noise and visual instability often seen in raw gradient attributions by averaging gradients over multiple noisy versions of the input. The result is a smoothed importance score for each token.
Procedure:

- Generate multiple perturbed versions of the input by adding Gaussian noise to the input embeddings.
- For each noisy input, compute the gradient of the output with respect to the embeddings.
- Average the gradients across all samples.
- Aggregate the result per token (e.g., by taking a norm of the gradient, optionally multiplied with the input) to obtain the final attribution scores.
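The steps above can be sketched in a few lines. This is a minimal, library-free illustration using NumPy: the toy function `f(x) = sum(x**2)` with its analytic gradient `grad_f` stands in for the model and its autograd machinery, and the names `smoothgrad`, `n_interpolations`, and `noise_level` mirror the parameters of this class (the helper itself is hypothetical, not part of interpreto).

```python
import numpy as np

def grad_f(x):
    """Analytic gradient of the toy objective f(x) = sum(x**2).

    In the real method this would be the gradient of the model
    output with respect to the input embeddings.
    """
    return 2.0 * x

def smoothgrad(x, n_interpolations=10, noise_level=0.1, seed=0):
    """Average gradients over noisy copies of the input (SmoothGrad)."""
    rng = np.random.default_rng(seed)
    grads = np.zeros_like(x)
    for _ in range(n_interpolations):
        # Perturb the input with Gaussian noise of std `noise_level`.
        noisy = x + rng.normal(0.0, noise_level, size=x.shape)
        grads += grad_f(noisy)
    avg = grads / n_interpolations
    # One possible per-token aggregation: |gradient x input|.
    return np.abs(avg * x)

x = np.array([1.0, -2.0, 0.5])
scores = smoothgrad(x)  # close to |2x * x| = [2.0, 8.0, 0.5]
```

Because the toy gradient is linear, the averaged gradient stays near `2x` and the noise mostly cancels; with a real nonlinear model, the averaging is what suppresses the high-frequency fluctuations seen in raw saliency maps.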
Reference: Smilkov et al. (2017). *SmoothGrad: removing noise by adding noise*. arXiv:1706.03825.
Examples:
>>> from interpreto import Smoothgrad
>>> method = Smoothgrad(model, tokenizer, batch_size=4,
...                     n_interpolations=50, noise_level=0.01)
>>> explanations = method.explain(text)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `PreTrainedModel` | model to explain | *required* |
| `tokenizer` | `PreTrainedTokenizer` | Hugging Face tokenizer associated with the model | *required* |
| `batch_size` | `int` | batch size for the attribution method | `4` |
| `device` | `device` | device on which the attribution method will be run | `None` |
| `inference_mode` | `Callable[[Tensor], Tensor]` | The mode used for inference. It can be either one of `LOGITS`, `SOFTMAX`, or `LOG_SOFTMAX`. Use `InferenceModes` to choose the appropriate mode. | `LOGITS` |
| `n_interpolations` | `int` | the number of interpolations to generate | `10` |
| `noise_level` | `float` | standard deviation of the Gaussian noise to add to the inputs | `0.1` |