Gradient Shap
Bases: MultitaskExplainerMixin, AttributionExplainer
GradientSHAP is a gradient-based Shapley value estimator that computes attributions by integrating model gradients along paths between a baseline (reference) and the input. It approximates Shapley values by averaging stochastic integrated gradients over randomly sampled points on these paths, with Gaussian noise added to the baseline to diversify the references.
By combining ideas from Integrated Gradients and Shapley value theory, GradientSHAP provides additive feature attributions with strong consistency guarantees, while capturing non-linear effects.
Reference: Lundberg and Lee (2017). A Unified Approach to Interpreting Model Predictions. https://arxiv.org/abs/1705.07874
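The estimator itself is short enough to sketch directly. Below is a minimal, self-contained PyTorch illustration of the expected-gradients computation described above; it is not interpreto's actual implementation, and `gradient_shap_sketch` and its arguments are hypothetical names used only for this sketch:

```python
import torch

def gradient_shap_sketch(model, inputs, baseline, n_perturbations=10, noise_std=0.1):
    """Estimate E[grad f(x_alpha) * (inputs - baseline)], where x_alpha is a
    random point between a noised baseline and the input (hypothetical sketch)."""
    attributions = torch.zeros_like(inputs)
    for _ in range(n_perturbations):
        # Perturb the baseline with Gaussian noise (noise_std), then pick a
        # uniformly random point on the straight line from it to the input.
        noisy_baseline = baseline + noise_std * torch.randn_like(inputs)
        alpha = torch.rand(())
        point = (noisy_baseline + alpha * (inputs - noisy_baseline)).requires_grad_(True)
        # Reduce the model output to a scalar so autograd can differentiate it.
        output = model(point).sum()
        grads = torch.autograd.grad(output, point)[0]
        # Rescale gradients by the input-baseline difference (SHAP-style scaling).
        attributions += grads * (inputs - noisy_baseline)
    return attributions / n_perturbations
```

For a linear model, this average converges to weight × (input − baseline), which matches the exact Shapley values of a linear function under feature independence.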
Examples:
>>> from interpreto import GradientShap
>>> method = GradientShap(
...     model,
...     tokenizer,
...     batch_size=4,
...     n_perturbations=20,
...     baseline=0,
...     noise_std=0.1,
... )
>>> explanations = method(text)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | PreTrainedModel | model to explain | required |
| tokenizer | PreTrainedTokenizer | Hugging Face tokenizer associated with the model | required |
| batch_size | int | batch size for the attribution method | 4 |
| granularity | Granularity | the level of granularity for the explanation | WORD |
| granularity_aggregation_strategy | GranularityAggregationStrategy | how to aggregate token-level attributions into granularity scores; options are MEAN, MAX, MIN, SUM, and SIGNED_MAX; ignored for token-level granularity | MEAN |
| device | device | device on which the attribution method will be run | None |
| inference_mode | Callable[[Tensor], Tensor] | the mode used for inference; one of LOGITS, SOFTMAX, or LOG_SOFTMAX, chosen via InferenceModes | LOGITS |
| input_x_gradient | bool | if True, multiplies the input embeddings with their gradients before aggregation | True |
| n_perturbations | int | the number of interpolations to generate | 10 |
| baseline | Tensor \| float \| None | the baseline to use for the interpolations | None |
| noise_std | float | the standard deviation of the noise added to the baseline | 0.1 |