Integrated Gradients

Bases: MultitaskExplainerMixin, AttributionExplainer

Integrated Gradients (IG) is a gradient-based interpretability method that attributes importance scores to input features (e.g., tokens) by integrating the model’s gradients along a path from a baseline input to the actual input.

The method is designed to address some of the limitations of standard gradients, such as saturation and noise, by averaging gradients over interpolated inputs rather than relying on a single local gradient.
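Formally (from the cited paper), the attribution for the i-th feature of an input x, relative to a baseline x′, is the gradient of the model F integrated along the straight line from x′ to x:

$$
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i} \, d\alpha
$$

In practice, this integral is approximated by a Riemann sum over `n_interpolations` evenly spaced steps.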

Reference: Sundararajan et al. (2017). Axiomatic Attribution for Deep Networks. https://arxiv.org/abs/1703.01365

Examples:

>>> from interpreto import IntegratedGradients
>>> method = IntegratedGradients(model=model, tokenizer=tokenizer,
...                              batch_size=4, n_interpolations=50)
>>> explanations = method.explain(model_inputs=text)
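To make the underlying computation concrete, here is a minimal, self-contained sketch of the integrated-gradients estimate in plain PyTorch. It is independent of interpreto's API; `forward_fn`, the linear toy model, and all shapes are illustrative assumptions, not the library's implementation.

```python
import torch

def integrated_gradients(forward_fn, inputs, baseline, n_steps=50):
    """Riemann-sum approximation of the IG path integral (illustrative sketch)."""
    # Interpolate along the straight line from the baseline to the input.
    alphas = torch.linspace(0.0, 1.0, n_steps).view(-1, *([1] * inputs.dim()))
    path = (baseline + alphas * (inputs - baseline)).requires_grad_(True)
    # Gradient of the model's scalar score at every interpolation step.
    grads = torch.autograd.grad(forward_fn(path).sum(), path)[0]
    # Average the gradients and rescale by the input-baseline difference.
    return (inputs - baseline) * grads.mean(dim=0)

# Toy check: for a linear score, IG recovers w * x exactly (completeness axiom).
torch.manual_seed(0)
w, x = torch.randn(5), torch.randn(5)
attributions = integrated_gradients(lambda z: z @ w, x, baseline=torch.zeros(5))
assert torch.allclose(attributions, w * x, atol=1e-5)
```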

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `PreTrainedModel` | Model to explain. | *required* |
| `tokenizer` | `PreTrainedTokenizer` | Hugging Face tokenizer associated with the model. | *required* |
| `batch_size` | `int` | Batch size for the attribution method. | `4` |
| `device` | `device` | Device on which the attribution method will be run. | `None` |
| `inference_mode` | `Callable[[Tensor], Tensor]` | The mode used for inference: one of `LOGITS`, `SOFTMAX`, or `LOG_SOFTMAX`. Use `InferenceModes` to choose the appropriate mode. | `LOGITS` |
| `n_interpolations` | `int` | Number of interpolated inputs to generate between the baseline and the input. | `10` |
| `baseline` | `Tensor \| float \| None` | Baseline to use for the interpolations. | `None` |
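As a usage sketch combining these options: the constructor arguments below are taken from the table above, but the import path for `InferenceModes` and the exact semantics of a float baseline (e.g., whether `0.0` denotes zero embeddings) are assumptions, not confirmed by this page.

>>> from interpreto import IntegratedGradients
>>> from interpreto import InferenceModes  # assumed import path
>>> method = IntegratedGradients(model=model, tokenizer=tokenizer,
...                              inference_mode=InferenceModes.SOFTMAX,
...                              n_interpolations=20, baseline=0.0)
>>> explanations = method.explain(model_inputs=text)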