Integrated Gradients¶
Bases: MultitaskExplainerMixin, AttributionExplainer
Integrated Gradients (IG) is a gradient-based interpretability method that attributes importance scores to input features (e.g., tokens) by integrating the model’s gradients along a path from a baseline input to the actual input.
The method is designed to address some of the limitations of standard gradients, such as saturation and noise, by averaging gradients over interpolated inputs rather than relying on a single local gradient.
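For a feature $x_i$, a baseline $x'$ and a model $F$, the attribution defined in the paper is the path integral of the gradients along the straight line from $x'$ to the input $x$, approximated in practice by a Riemann sum over `n_interpolations` steps $m$:

$$
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha (x - x')\big)}{\partial x_i}\, d\alpha
\;\approx\; (x_i - x'_i)\, \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F\big(x' + \tfrac{k}{m} (x - x')\big)}{\partial x_i}
$$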
Reference: Sundararajan et al. (2017). Axiomatic Attribution for Deep Networks. Paper: https://arxiv.org/abs/1703.01365
Examples:
>>> from interpreto import IntegratedGradients
>>> method = IntegratedGradients(model=model, tokenizer=tokenizer,
...                              batch_size=4, n_interpolations=50)
>>> explanations = method.explain(model_inputs=text)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| model | PreTrainedModel | model to explain | required |
| tokenizer | PreTrainedTokenizer | Hugging Face tokenizer associated with the model | required |
| batch_size | int | batch size for the attribution method | 4 |
| device | device | device on which the attribution method will be run | None |
| inference_mode | Callable[[Tensor], Tensor] | The mode used for inference. It can be one of LOGITS, SOFTMAX, or LOG_SOFTMAX. Use InferenceModes to choose the appropriate mode. | LOGITS |
| n_interpolations | int | the number of interpolations to generate | 10 |
| baseline | Tensor \| float \| None | the baseline to use for the interpolations | None |
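For intuition on how `n_interpolations` and `baseline` are used, the underlying computation can be written in a few lines of plain PyTorch. This is only an illustrative sketch, not interpreto's implementation: `forward_fn` (a function mapping an embedded input to class scores), the zero baseline and the `target` index are assumptions made for the example.

```python
import torch

def integrated_gradients(forward_fn, inputs, baseline=None, n_interpolations=10, target=0):
    """Approximate IG attributions for a single embedded input of shape (seq_len, hidden_dim)."""
    if baseline is None:
        baseline = torch.zeros_like(inputs)  # zero baseline by default
    # Interpolation coefficients alpha in (0, 1]
    alphas = torch.linspace(0.0, 1.0, n_interpolations + 1)[1:]
    grads = []
    for alpha in alphas:
        point = baseline + alpha * (inputs - baseline)  # point on the straight path baseline -> input
        point.requires_grad_(True)
        score = forward_fn(point)[target]                # scalar model output for the target class
        grad, = torch.autograd.grad(score, point)        # gradient at this interpolation step
        grads.append(grad)
    avg_grad = torch.stack(grads).mean(dim=0)            # Riemann-sum approximation of the path integral
    return (inputs - baseline) * avg_grad                # per-feature attributions
```

Averaging gradients over the interpolated points is what makes the attributions more robust to saturation than a single local gradient at the input.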