VarGrad
Bases: MultitaskExplainerMixin, AttributionExplainer
VarGrad is a gradient-based attribution method that computes the variance of the input gradients under random perturbations of the input. Whereas methods such as SmoothGrad average the gradients over noisy samples, VarGrad measures how much those gradients vary, capturing the sensitivity of the model's response to local perturbations in the input space.
The resulting attributions highlight regions where the gradient signal fluctuates strongly under noise, which can point to areas where explanations are less reliable or more fragile.
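Written out, the quantity described above is as follows, for an input embedding $x$, a model output $f$, $N$ noisy samples (`n_perturbations`) and noise standard deviation $\sigma$ (`noise_std`). This assumes the plain (population) variance over the $N$ draws; the library may use a different estimator.

$$
\bar{g}(x) = \frac{1}{N}\sum_{i=1}^{N} \nabla_x f(x + \varepsilon_i), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2 I)
$$

$$
\mathrm{VarGrad}(x) = \frac{1}{N}\sum_{i=1}^{N} \big(\nabla_x f(x + \varepsilon_i) - \bar{g}(x)\big)^{2}
$$

The square is taken element-wise. Reporting $\bar{g}(x)$ itself instead of the spread around it gives SmoothGrad, which is the contrast drawn in the description.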
Procedure:
- Generate multiple perturbed versions of the input by adding Gaussian noise to the input embeddings.
- For each noisy input, compute the gradient of the output with respect to the embeddings.
- Compute the element-wise variance of the gradient values across these samples.
- Aggregate the result per token (for example, by optionally multiplying it with the input embeddings and reducing over the embedding dimension with a norm) to obtain the final attribution scores.
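The steps above can be sketched in NumPy as follows. This is not interpreto's implementation, only an illustration under stated assumptions: the "model" is a hypothetical toy scalar function f(E) = Σᵢ tanh(Eᵢ · w) whose gradient is written analytically (a real model would use autograd), the variance is the population variance (`np.var` with its default `ddof=0`), and step 4 is read as "optionally multiply by the clean embeddings, then take an L2 norm per token". The names `toy_model_grad` and `vargrad` are illustrative only.

```python
from typing import Callable

import numpy as np


def toy_model_grad(embeddings: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Gradient of the hypothetical toy model f(E) = sum_i tanh(E[i] @ w).

    df/dE[i, :] = (1 - tanh(E[i] @ w) ** 2) * w, computed analytically
    so the sketch needs no autodiff library.
    """
    activations = np.tanh(embeddings @ w)                 # shape (n_tokens,)
    return (1.0 - activations**2)[:, None] * w[None, :]   # shape (n_tokens, d)


def vargrad(
    embeddings: np.ndarray,
    grad_fn: Callable[[np.ndarray], np.ndarray],
    n_perturbations: int = 10,
    noise_std: float = 0.1,
    input_x_gradient: bool = True,
    seed: int = 0,
) -> np.ndarray:
    """Return one VarGrad score per token (row of `embeddings`)."""
    rng = np.random.default_rng(seed)
    grads = []
    for _ in range(n_perturbations):
        # 1. Perturb the embeddings with isotropic Gaussian noise.
        noisy = embeddings + rng.normal(0.0, noise_std, size=embeddings.shape)
        # 2. Gradient of the output w.r.t. the noisy embeddings.
        grads.append(grad_fn(noisy))
    # 3. Element-wise variance across the samples: shape (n_tokens, d).
    variance = np.var(np.stack(grads), axis=0)
    # 4. Assumed aggregation: optionally weight by the clean embeddings,
    #    then reduce each token to a scalar with the L2 norm over d.
    if input_x_gradient:
        variance = variance * embeddings
    return np.linalg.norm(variance, axis=-1)


# Usage on random toy data: 5 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(42)
E = rng.normal(size=(5, 8))
w = rng.normal(size=8)
scores = vargrad(E, lambda e: toy_model_grad(e, w), n_perturbations=50, noise_std=0.01)
print(scores.shape)  # (5,)
```

With `noise_std=0` every sample sees the same input, so the variance and hence every score is (numerically) zero, which is a quick sanity check. A real implementation would typically batch the noisy copies rather than loop over them one at a time.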
Reference: Richter et al. (2020). VarGrad: A Low-Variance Gradient Estimator for Variational Inference.
Examples:
>>> from interpreto import VarGrad
>>> # `model`, `tokenizer` and `text` are assumed to be defined beforehand
>>> method = VarGrad(model, tokenizer, batch_size=4,
...                  n_perturbations=50, noise_std=0.01)
>>> explanations = method.explain(text)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `PreTrainedModel` | Model to explain. | required |
| `tokenizer` | `PreTrainedTokenizer` | Hugging Face tokenizer associated with the model. | required |
| `batch_size` | `int` | Batch size for the attribution method. | `4` |
|  | `Granularity` | The level of granularity for the explanation. Options are: | `WORD` |
|  | `GranularityAggregationStrategy` | How to aggregate token-level attributions into granularity scores. Options are: `MEAN`, `MAX`, `MIN`, `SUM`, and `SIGNED_MAX`. Ignored for | `MEAN` |
|  | `device` | Device on which the attribution method will be run. | `None` |
|  | `Callable[[Tensor], Tensor]` | The mode used for inference: one of `LOGITS`, `SOFTMAX`, or `LOG_SOFTMAX`. Use `InferenceModes` to choose the appropriate mode. | `LOGITS` |
|  | `bool` | If `True`, multiplies the input embeddings by their gradients before aggregation. | `True` |
| `n_perturbations` | `int` | Number of noisy (perturbed) samples to generate. | `10` |
| `noise_std` | `float` | Standard deviation of the Gaussian noise added to the input embeddings. | `0.1` |
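The table names the aggregation strategies but not how they combine sub-word tokens into words. The sketch below shows one plausible reading of each strategy, assuming that `SIGNED_MAX` keeps the score with the largest magnitude together with its sign, and that the token-to-word mapping comes as a `word_ids` list like the one returned by `word_ids()` on a Hugging Face fast tokenizer's output (with special tokens, which map to `None`, already filtered out). `Aggregation` and `aggregate_to_words` are hypothetical names, not interpreto's API.

```python
from enum import Enum

import numpy as np


class Aggregation(Enum):
    """Hypothetical stand-in for the strategies listed in the table."""
    MEAN = "mean"
    MAX = "max"
    MIN = "min"
    SUM = "sum"
    SIGNED_MAX = "signed_max"


def aggregate_to_words(token_scores, word_ids, strategy=Aggregation.MEAN):
    """Group token scores by word index and reduce each group to one score.

    `word_ids[i]` is the index of the word that token i belongs to, so
    several sub-word tokens can share one word id.
    """
    token_scores = np.asarray(token_scores, dtype=float)
    word_ids = np.asarray(word_ids)
    out = []
    for wid in np.unique(word_ids):  # np.unique returns sorted word ids
        group = token_scores[word_ids == wid]
        if strategy is Aggregation.MEAN:
            out.append(group.mean())
        elif strategy is Aggregation.MAX:
            out.append(group.max())
        elif strategy is Aggregation.MIN:
            out.append(group.min())
        elif strategy is Aggregation.SUM:
            out.append(group.sum())
        elif strategy is Aggregation.SIGNED_MAX:
            # Keep the score with the largest magnitude, sign included.
            out.append(group[np.argmax(np.abs(group))])
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return np.array(out)


# Hypothetical WordPiece split: "un", "##believ", "##able" -> word 0, "movie" -> word 1
scores = [0.2, -0.9, 0.4, 0.5]
word_ids = [0, 0, 0, 1]
print(aggregate_to_words(scores, word_ids, Aggregation.SIGNED_MAX).tolist())  # [-0.9, 0.5]
```

`MAX` would give `0.4` for the first word here, while `SIGNED_MAX` keeps the strongly negative `-0.9`; the two only differ when a word's scores can be negative.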