
API: Attribution Methods

Common API

from interpreto import Method

explainer = Method(model, tokenizer, **kwargs)
explanation = explainer(inputs, targets)

The API has two steps:

Step 1: Explainer instantiation. Method is a generic name for any attribution method from the list of available methods below. It inherits from the AttributionExplainer base class. Its initialization takes the general parameters explained here, plus method-specific parameters covered in each method's own documentation:

  • model (PreTrainedModel): Hugging Face model to explain,
  • tokenizer (PreTrainedTokenizer): Hugging Face tokenizer associated with the model,
  • batch_size (int): batch size for the attribution method,
  • device (torch.device | None = None): device on which the attribution method will be run,
  • **kwargs: method-specific arguments, documented with each method.

Note: Two additional arguments are common to all inference-based attribution methods (both appear in the sketch after this list):

  • granularity (Granularity): granularity level of the perturbations. It can be one of ALL_TOKENS (all sentence tokens, including special tokens), TOKEN, or WORD.
  • inference_mode (Callable[[torch.Tensor], torch.Tensor], optional): the mode used for inference, one of LOGITS, SOFTMAX, or LOG_SOFTMAX. Use InferenceModes to choose the appropriate mode. The default is LOGITS.
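
For illustration, here is a minimal Step 1 sketch. Occlusion stands in for any method from the list below, and the exact import locations of Granularity and InferenceModes are assumptions to verify against the package:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# "Occlusion" is a stand-in for any available method; import paths are assumed
from interpreto import Occlusion, Granularity, InferenceModes

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

explainer = Occlusion(
    model,
    tokenizer,
    batch_size=8,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    granularity=Granularity.WORD,           # perturb word by word
    inference_mode=InferenceModes.SOFTMAX,  # explain probabilities instead of raw logits
)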

Step 2: The AttributionExplainer class overloads the __call__ method to invoke the explain function directly, so calling explainer(inputs, targets) is equivalent to explainer.explain(inputs, targets). It takes two parameters (see the example after this list):

  • inputs: the samples for which explanations are requested.
  • targets: specifies what to explain in the inputs. This can be a specific class or a set of classes (for classification tasks), or target texts (for generation tasks). If targets=None, the target is automatically inferred by performing a prediction on the input using the provided model.
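
Continuing the sketch above, a call could look as follows; the accepted formats for inputs and targets (a list of strings, an integer class index) are assumptions:

# targets=None: the predicted class is inferred and explained
explanation = explainer(["The movie was surprisingly good."])

# Explicit target: explain a specific class index of the classifier
explanation = explainer(["The movie was surprisingly good."], targets=0)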

Available methods

➡️ Inference-based Methods:

🔁 Gradient-based methods:

Custom API

You can easily use Interpreto with your own attribution methods. You need to implement two things:

  • a Perturbator to generate perturbed elements.
  • an Aggregator to aggregate scores on perturbed elements.

Then inherit from MultitaskExplainerMixin and AttributionExplainer and, in your method's initializer, pass your perturbator and aggregator to super().__init__, as sketched below.
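
As a sketch (not the canonical implementation), a custom method could look like the snippet below. Perturbator and Aggregator stand in for your concrete classes, and the keyword names mirror the AttributionExplainer parameters listed further down; verify them against the actual signature:

# Import path assumed; adjust to the actual package layout
from interpreto.attributions import AttributionExplainer, MultitaskExplainerMixin


class MyMethod(MultitaskExplainerMixin, AttributionExplainer):
    """Custom attribution method wiring a perturbator and an aggregator together."""

    def __init__(self, model, tokenizer, batch_size=4, device=None):
        super().__init__(
            model=model,
            tokenizer=tokenizer,
            batch_size=batch_size,
            perturbator=Perturbator(model.get_input_embeddings()),  # your perturbator class
            aggregator=Aggregator(),                                 # your aggregator class
            device=device,
        )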

Please check how we did it for the implemented methods. If you have any questions, please file an issue or contact us directly.

Once you succeed, please make a pull request. We welcome your method and contributions to the library.

perturbator = Perturbator(inputs_embedder)  # generates perturbed versions of the inputs
aggregator = Aggregator()  # aggregates attribution scores over the perturbations
explainer = AttributionExplainer(model, tokenizer, batch_size, perturbator, aggregator, device)
explanation = explainer(inputs, targets)

The API has two steps:

Step 1: Instantiate the perturbator and aggregator that the attribution method will use (a minimal sketch follows the argument list).

perturbator:

  • inputs_embedder: Optional module used to embed input IDs when only input_ids are provided.

aggregator: takes no arguments.
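
A minimal Step 1 sketch, assuming the perturbator accepts the model's input embedding layer as its inputs_embedder:

inputs_embedder = model.get_input_embeddings()  # standard Hugging Face accessor
perturbator = Perturbator(inputs_embedder)      # generates the perturbed inputs
aggregator = Aggregator()                       # aggregates the resulting scores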

Step 2: Instantiate the AttributionExplainer class with the perturbator and aggregator.

AttributionExplainer is the class behind every attribution method: a method is defined by whether it uses gradients (the boolean argument use_gradient), its perturbation type, and its aggregation type. It takes the custom parameters (perturbator, aggregator, use_gradient) together with the usual explanation parameters (a full instantiation is sketched after this list):

  • model (PreTrainedModel): Hugging Face model to explain,
  • tokenizer (PreTrainedTokenizer): Hugging Face tokenizer associated with the model,
  • batch_size (int): batch size for the attribution method,
  • perturbator (BasePerturbator | None = None): Instance used to generate input perturbations. If None, the perturbator returns only the original input,
  • aggregator (Aggregator | None = None): Instance used to aggregate computed attribution scores. If None, the aggregator returns the original scores,
  • device (torch.device | None = None): device on which the attribution method will be run,
  • use_gradient (bool): If True, computes gradients instead of inference for targeted explanations.
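
Putting the parameters together, a Step 2 instantiation might look like this sketch (keyword names taken from the list above; use_gradient=False keeps the method purely inference-based):

explainer = AttributionExplainer(
    model=model,
    tokenizer=tokenizer,
    batch_size=8,
    perturbator=perturbator,
    aggregator=aggregator,
    device=torch.device("cpu"),
    use_gradient=False,  # set to True to back-propagate gradients instead of running plain inference
)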

The AttributionExplainer class overloads the __call__ method to directly invoke explain, so explainer(inputs, targets) is equivalent to explainer.explain(inputs, targets). The explain method takes two parameters (see the example after this list):

  • inputs: the samples for which explanations are requested; see the inputs section for more detail.
  • targets: specifies what to explain in the inputs; it can be a specific class or a set of classes (for classification tasks), or target texts (for generation tasks).
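
For example (the exact formats accepted for targets, an integer class index or a list of target texts, are assumptions):

# Classification: explain class index 1 for each input sentence
explanation = explainer(["I loved this film."], targets=1)

# Generation: explain the production of a given target continuation
explanation = explainer(["Once upon a time"], targets=["there was a princess"])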