Cockatiel
Implementation of the COCKATIEL framework from *COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP* by Jourdan et al. (2023).
interpreto.concepts.Cockatiel
Cockatiel(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', force_relu=False, **kwargs)
Bases: NMFConcepts
Code: concepts/methods/cockatiel.py
Implementation of the Cockatiel concept explainer by Jourdan et al. (2023)¹.

1. Jourdan F., Picard A., Fel T., Risser L., Loubes J.-M., and Asher N. COCKATIEL: COntinuous Concept ranKed ATtribution with Interpretable ELements for explaining neural net classifiers on NLP. Findings of the Association for Computational Linguistics (ACL 2023), pp. 5120–5136, 2023.
Attributes:

Name | Type | Description |
---|---|---|
`model_with_split_points` | `ModelWithSplitPoints` | The model to apply the explanation on. It should have at least one split point on which a concept explainer can be trained. |
`split_point` | `str \| None` | The split point used to train the `concept_model`. |
`concept_model` | `SemiNMF` | An Overcomplete NMF encoder-decoder. |
`force_relu` | `bool` | Whether to force the activations to be positive. |
`is_fitted` | `bool` | Whether the `concept_model` was fitted on model activations. |
`has_differentiable_concept_encoder` | `bool` | Whether the `encode_activations` operation is differentiable. |
`has_differentiable_concept_decoder` | `bool` | Whether the `decode_concepts` operation is differentiable. |
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`model_with_split_points` | `ModelWithSplitPoints` | The model to apply the explanation on. It should have at least one split point on which a concept explainer can be trained. | required |
`nb_concepts` | `int` | Size of the concept space. | required |
`split_point` | `str \| None` | The split point used to train the `concept_model`. | `None` |
`device` | `str` | Device to use for the `concept_model`. | `'cpu'` |
`force_relu` | `bool` | Whether to force the activations to be positive. | `False` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the `concept_model`. | `{}` |
Source code in interpreto/concepts/methods/overcomplete.py
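Below is a minimal usage sketch. The `Cockatiel` signature matches the one documented above; the `ModelWithSplitPoints` constructor arguments, the checkpoint, and the split point name are illustrative assumptions, not the library's confirmed API.

```python
from interpreto import ModelWithSplitPoints
from interpreto.concepts import Cockatiel

# Assumed setup: ModelWithSplitPoints wraps a HuggingFace classifier and
# exposes activations at the named split point. The checkpoint and the
# split point "bert.encoder.layer.10" are placeholders.
model = ModelWithSplitPoints(
    "textattack/bert-base-uncased-imdb",
    split_points=["bert.encoder.layer.10"],
)

explainer = Cockatiel(
    model,
    nb_concepts=20,                        # size of the concept space
    split_point="bert.encoder.layer.10",
    device="cpu",
    force_relu=True,                       # clamp activations to be non-negative
)
```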
fit
fit(activations, *, overwrite=False, **kwargs)
Fit an Overcomplete OptimDictionaryLearning model on the given activations.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`activations` | `Tensor \| dict[str, Tensor]` | The activations used for fitting the `concept_model`. | required |
`overwrite` | `bool` | Whether to overwrite the current model if it has already been fitted. | `False` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the `concept_model` fit. | `{}` |
Source code in interpreto/concepts/methods/overcomplete.py
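A sketch of the fitting step, assuming activations are collected with a `get_activations` helper on the wrapped model (an assumption; adapt to however activations are obtained in your setup):

```python
corpus = [
    "a fantastic movie with great acting",
    "a dull and predictable plot",
    "a beautiful, moving story",
]

# Assumed helper: returns a dict mapping split point names to activation
# tensors, matching the Tensor | dict[str, Tensor] type accepted by fit.
activations = model.get_activations(corpus)

explainer.fit(activations)  # pass overwrite=True to refit an already fitted explainer
```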
encode_activations
encode_activations(activations)
Encode the given activations using the `concept_model` encoder.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`activations` | `LatentActivations` | The activations to encode. | required |

Returns:

Type | Description |
---|---|
`Tensor` | The encoded concept activations. |
Source code in interpreto/concepts/methods/overcomplete.py
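Once fitted, latent activations can be projected into the concept space. The tensor layout below (one row per token) is an assumption for illustration:

```python
latent = activations["bert.encoder.layer.10"]      # assumed shape: (n_tokens, d_model)
concept_activations = explainer.encode_activations(latent)
print(concept_activations.shape)                   # expected: (n_tokens, nb_concepts)
```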
decode_concepts
decode_concepts(concepts)
Decode the given concepts using the `concept_model` decoder.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`concepts` | `ConceptsActivations` | The concepts to decode. | required |

Returns:

Type | Description |
---|---|
`Tensor` | The decoded model activations. |
Source code in interpreto/concepts/base.py
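Decoding maps concept activations back to the model's latent space, which gives a simple way to check the quality of the NMF factorization; a small sketch:

```python
reconstruction = explainer.decode_concepts(concept_activations)

# Relative reconstruction error of the concept decomposition.
relative_error = ((reconstruction - latent).norm() / latent.norm()).item()
print(f"relative reconstruction error: {relative_error:.3f}")
```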
get_dictionary
Get the dictionary learned by the fitted `concept_model`.
Returns:

Type | Description |
---|---|
`Tensor` | The dictionary learned by the fitted `concept_model`, as a `torch.Tensor`. |
Source code in interpreto/concepts/base.py
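The dictionary stacks one direction per concept, expressed in the latent space of the split point; the shape shown is an assumption consistent with the encoder and decoder above:

```python
dictionary = explainer.get_dictionary()
print(dictionary.shape)          # assumed: (nb_concepts, d_model)

# Each row is a concept direction in the latent space.
print(dictionary.norm(dim=-1))   # rough scale of each concept direction
```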
interpret
interpret(interpretation_method, concepts_indices, inputs=None, latent_activations=None, concepts_activations=None, **kwargs)
Interpret the concept dimensions of the latent space in a human-readable format. The interpretation is a mapping between concept indices and an object that allows interpreting them: a label, a description, examples, etc.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`interpretation_method` | `type[BaseConceptInterpretationMethod]` | The interpretation method to use to interpret the concepts. | required |
`concepts_indices` | `int \| list[int] \| Literal['all']` | The indices of the concepts to interpret. If `"all"`, all concepts are interpreted. | required |
`inputs` | `list[str] \| None` | The inputs to use for the interpretation. Whether they are required depends on the source used by the interpretation method. | `None` |
`latent_activations` | `LatentActivations \| dict[str, LatentActivations] \| None` | The latent activations to use for the interpretation. Whether they are required depends on the source used by the interpretation method. | `None` |
`concepts_activations` | `ConceptsActivations \| None` | The concepts activations to use for the interpretation. Whether they are required depends on the source used by the interpretation method. | `None` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the interpretation method. | `{}` |
Returns:

Type | Description |
---|---|
`Mapping[int, Any]` | A mapping between the concept indices and the interpretation of the concepts. |
Source code in interpreto/concepts/base.py
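A sketch of interpreting all concepts from a fitted explainer. `TopKInputs` and its import path are hypothetical placeholders; substitute any `BaseConceptInterpretationMethod` subclass actually provided by the library:

```python
# Hypothetical method and import path; replace with a real
# BaseConceptInterpretationMethod subclass from interpreto.
from interpreto.concepts.interpretations import TopKInputs

interpretations = explainer.interpret(
    interpretation_method=TopKInputs,
    concepts_indices="all",
    inputs=corpus,                    # raw texts backing the activations
    latent_activations=activations,
)
for index, interpretation in interpretations.items():
    print(index, interpretation)
```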
input_concept_attribution
Computes the attribution of each input to a given concept.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`inputs` | `ModelInputs` | The input data, which can be a string, a list of tokens/words/clauses/sentences, or a dataset. | required |
`concept` | `int \| list[int]` | The concept index (or list of concept indices) to analyze. | required |
Returns:

Type | Description |
---|---|
`list[float]` | A list of attribution scores for each input. |
Source code in interpreto/concepts/methods/cockatiel.py
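A sketch of attributing the parts of one input to a single concept. The parameter names follow the table above and are assumptions, since the signature is not shown here:

```python
scores = explainer.input_concept_attribution(
    inputs="The movie was a beautiful, moving story.",
    concept=3,   # index of the concept to analyze; keyword name assumed
)
# One attribution score per input element (e.g. per word or clause).
print(scores)
```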
concept_output_attribution
Computes the attribution of each concept for the logit of a target output element.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`inputs` | `ModelInputs` | An input datapoint for the model. | required |
`concepts` | `Tensor` | Concept activation tensor. | required |
`target` | `int` | The target class for which the concept output attribution should be computed. | required |
Returns:

Type | Description |
---|---|
`list[float]` | A list of attribution scores for each concept. |