Gradient Shap

Bases: MultitaskExplainerMixin, AttributionExplainer

GradientSHAP is a gradient-based Shapley value estimator. It computes attributions from model gradients evaluated at points sampled along the path between a baseline (reference) and the input, and approximates Shapley values by averaging these stochastic integrated-gradient estimates over multiple randomly sampled paths.

By combining ideas from Integrated Gradients and Shapley value theory, GradientSHAP provides additive feature attributions with strong consistency guarantees, while capturing non-linear effects.
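The averaging described above can be illustrated with a minimal NumPy sketch. This is not Interpreto's implementation: the function name `gradient_shap_sketch`, the `grad_fn` callback, and the toy linear model are assumptions made for illustration, and the sketch works on a plain feature vector rather than on token embeddings. For each sample it adds Gaussian noise to the baseline, picks a random point on the straight line from that noisy baseline to the input, and multiplies the gradient at that point by the input-minus-baseline difference.

```python
import numpy as np


def gradient_shap_sketch(grad_fn, x, baseline, n_perturbations=10, noise_std=0.1, seed=0):
    """Illustrative GradientSHAP-style estimator (hypothetical helper, not the library API).

    grad_fn: callable returning the gradient of the model output w.r.t. its input.
    x, baseline: 1-D arrays of the same shape.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    total = np.zeros_like(x)
    for _ in range(n_perturbations):
        # Perturb the baseline with Gaussian noise of standard deviation noise_std.
        noisy_baseline = baseline + rng.normal(0.0, noise_std, size=x.shape)
        # Sample a random interpolation coefficient in [0, 1).
        alpha = rng.uniform()
        point = noisy_baseline + alpha * (x - noisy_baseline)
        # Gradient at the sampled point, scaled by the distance to the noisy baseline.
        total += grad_fn(point) * (x - noisy_baseline)
    # Average over all sampled paths.
    return total / n_perturbations


# Toy linear model f(z) = w . z, whose gradient is w everywhere (assumed for the demo).
w = np.array([1.0, -2.0, 0.5])
x = np.array([1.0, 2.0, 3.0])
attributions = gradient_shap_sketch(lambda z: w, x, np.zeros(3), n_perturbations=5, noise_std=0.0)
```

For a linear model with no noise, the attributions reduce to `w * (x - baseline)`, and their sum equals the difference between the model output at the input and at the baseline, which is the additivity property mentioned above.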

Reference: Lundberg and Lee (2017). A Unified Approach to Interpreting Model Predictions.

Examples:

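>>> from interpreto import GradientShap
>>> # `model` and `tokenizer` are an already-loaded Hugging Face model and its tokenizer;
>>> # `text` is the input text to explain.
>>> method = GradientShap(model, tokenizer, batch_size=4,
...                       n_perturbations=20,
...                       baseline=0,
...                       noise_std=0.1)
>>> explanations = method(text)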

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model | PreTrainedModel | Model to explain. | required |
| tokenizer | PreTrainedTokenizer | Hugging Face tokenizer associated with the model. | required |
| batch_size | int | Batch size used by the attribution method. | 4 |
| granularity | Granularity | Level of granularity of the explanation. Options are ALL_TOKENS, TOKEN, WORD, or SENTENCE. To select one, use from interpreto import Granularity, then for example Granularity.WORD. | WORD |
| granularity_aggregation_strategy | GranularityAggregationStrategy | How token-level attributions are aggregated into granularity-level scores. Options are MEAN, MAX, MIN, SUM, and SIGNED_MAX. Ignored when granularity is ALL_TOKENS or TOKEN. | MEAN |
| device | device | Device on which the attribution method is run. | None |
| inference_mode | Callable[[Tensor], Tensor] | Mode used for inference: one of LOGITS, SOFTMAX, or LOG_SOFTMAX. Use InferenceModes to choose the appropriate mode. | LOGITS |
| input_x_gradient | bool | If True, multiplies the input embeddings by their gradients before aggregation. | True |
| n_perturbations | int | Number of interpolations to generate. | 10 |
| baseline | Tensor \| float \| None | Baseline to use for the interpolations. | None |
| noise_std | float | Standard deviation of the noise added to the baseline. | 0.1 |
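
To make the granularity_aggregation_strategy options more concrete, here is a small NumPy sketch of how token-level scores might be reduced to word-level scores. It is only an illustration: the `aggregate_scores` helper and the `word_ids` mapping are hypothetical, and SIGNED_MAX is assumed here to mean "keep the score with the largest absolute value, with its sign", which may differ from the library's exact definition.

```python
import numpy as np


def aggregate_scores(token_scores, word_ids, strategy="mean"):
    """Reduce token-level scores to one score per word (hypothetical helper).

    token_scores: one attribution score per token.
    word_ids: for each token, the index of the word it belongs to.
    """
    token_scores = np.asarray(token_scores, dtype=float)
    word_ids = np.asarray(word_ids)
    reducers = {
        "mean": np.mean,
        "max": np.max,
        "min": np.min,
        "sum": np.sum,
        # Assumed meaning: the score with the largest magnitude, sign preserved.
        "signed_max": lambda group: group[np.argmax(np.abs(group))],
    }
    reduce = reducers[strategy]
    # np.unique returns the word indices in sorted order.
    return np.array([reduce(token_scores[word_ids == w]) for w in np.unique(word_ids)])


# Three tokens: the first two belong to word 0, the third to word 1.
scores = [0.2, -0.6, 0.1]
words = [0, 0, 1]
mean_scores = aggregate_scores(scores, words, "mean")
signed_max_scores = aggregate_scores(scores, words, "signed_max")
```

In this toy case MEAN lets the two tokens of the first word partly cancel each other, while the assumed SIGNED_MAX keeps the dominant negative contribution, which shows why the choice of strategy matters when tokens within a word have opposite signs.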