Neurons as Concepts
interpreto.concepts.NeuronsAsConcepts
NeuronsAsConcepts(model_with_split_points, split_point=None)
Bases: ConceptAutoEncoderExplainer
Code: concepts/methods/neurons_as_concepts.py
Concept Bottleneck Explainer where the latent space itself is treated as the concept space.
Attributes:
Name | Type | Description |
---|---|---|
`model_with_split_points` | `ModelWithSplitPoints` | The model to apply the explanation on. It should have at least one split point on which a concept explainer can be trained. |
`split_point` | `str` | The split point used to train the `concept_model`. |
`concept_model` | `IdentityConceptModel` | An identity concept model for harmonization. |
`is_fitted` | `bool` | Whether the `concept_model` has been fitted. |
`has_differentiable_concept_encoder` | `bool` | Whether the concept encoder is differentiable. |
`has_differentiable_concept_decoder` | `bool` | Whether the concept decoder is differentiable. |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`model_with_split_points` | `ModelWithSplitPoints` | The model to apply the explanation on. It should have at least one split point on which a concept explainer can be trained. | required |
`split_point` | `str \| None` | The split point used to train the `concept_model`. | `None` |
Source code in interpreto/concepts/methods/neurons_as_concepts.py
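A minimal usage sketch. The `ModelWithSplitPoints` construction below is an assumption (the model name, split point path, and constructor arguments are illustrative placeholders); the `NeuronsAsConcepts` call itself follows the signature above.

```python
from interpreto import ModelWithSplitPoints  # import path assumed
from interpreto.concepts import NeuronsAsConcepts

# Illustrative values: the model name, split point path, and the exact
# ModelWithSplitPoints arguments are placeholders, not part of this page.
split = "bert.encoder.layer.6.output"
model_with_split_points = ModelWithSplitPoints(
    "bert-base-uncased",
    split_points=[split],
)

# Every neuron of the latent space at `split` is treated as one concept.
# `split_point` defaults to None; it is passed explicitly here.
explainer = NeuronsAsConcepts(model_with_split_points, split_point=split)
```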
interpret
interpret(interpretation_method, concepts_indices, inputs=None, latent_activations=None, concepts_activations=None, **kwargs)
Interpret the concept dimensions of the latent space in a human-readable format. The interpretation is a mapping between concept indices and an object that allows interpreting them: a label, a description, examples, etc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`interpretation_method` | `type[BaseConceptInterpretationMethod]` | The interpretation method to use to interpret the concepts. | required |
`concepts_indices` | `int \| list[int] \| Literal['all']` | The indices of the concepts to interpret. If `"all"`, all concepts are interpreted. | required |
`inputs` | `list[str] \| None` | The inputs to use for the interpretation. Whether they are required depends on the interpretation method's source. | `None` |
`latent_activations` | `LatentActivations \| dict[str, LatentActivations] \| None` | The latent activations to use for the interpretation. Whether they are required depends on the interpretation method's source. | `None` |
`concepts_activations` | `ConceptsActivations \| None` | The concepts activations to use for the interpretation. Whether they are required depends on the interpretation method's source. | `None` |
`**kwargs` | | Additional keyword arguments to pass to the interpretation method. | `{}` |
Returns:
Type | Description |
---|---|
`Mapping[int, Any]` | A mapping between the concept indices and the interpretation of the concepts. |
Source code in interpreto/concepts/base.py
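A sketch of a call to `interpret`, reusing the `explainer` from the constructor example. `MyInterpretationMethod` is a hypothetical placeholder for a `BaseConceptInterpretationMethod` subclass, and the input texts are illustrative.

```python
# MyInterpretationMethod is a hypothetical placeholder: substitute any
# BaseConceptInterpretationMethod subclass provided by interpreto.
interpretations = explainer.interpret(
    interpretation_method=MyInterpretationMethod,
    concepts_indices=[0, 12, 42],  # or "all" to interpret every neuron
    inputs=["The movie was great.", "The service was terrible."],
)

# The result maps each concept index to its interpretation,
# e.g. a label, a description, or representative examples.
for index, interpretation in interpretations.items():
    print(index, interpretation)
```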
input_concept_attribution
input_concept_attribution(inputs, concept, attribution_method, **attribution_kwargs)
Attributes model inputs for a selected concept.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`inputs` | `ModelInputs` | The input data, which can be a string, a list of tokens/words/clauses/sentences, or a dataset. | required |
`concept` | `int` | Index identifying the position of the concept of interest in the concept activations. | required |
`attribution_method` | `type[AttributionExplainer]` | The attribution method to obtain importance scores for input elements. | required |
Returns:
Type | Description |
---|---|
`list[float]` | A list of attribution scores for each input. |
Source code in interpreto/concepts/base.py
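A sketch of attributing input elements to a single concept, again reusing `explainer`. `MyAttributionExplainer` is a hypothetical placeholder for an `AttributionExplainer` subclass, and the input text and concept index are illustrative.

```python
# MyAttributionExplainer is a hypothetical placeholder: substitute any
# AttributionExplainer subclass provided by interpreto.
scores = explainer.input_concept_attribution(
    inputs="The movie was surprisingly good.",
    concept=42,  # index of the neuron/concept of interest
    attribution_method=MyAttributionExplainer,
)

# One attribution score per input element.
print(scores)
```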
concept_output_attribution
concept_output_attribution(inputs, concepts, target, attribution_method, **attribution_kwargs)
Computes the attribution of each concept for the logit of a target output element.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`inputs` | `ModelInputs` | An input data-point for the model. | required |
`concepts` | `Tensor` | Concept activation tensor. | required |
`target` | `int` | The target class for which the concept output attribution should be computed. | required |
`attribution_method` | `type[AttributionExplainer]` | The attribution method to obtain importance scores for input elements. | required |
Returns:
Type | Description |
---|---|
`list[float]` | A list of attribution scores for each concept. |
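A sketch of attributing a target logit to the concepts. The concept activation tensor and target class are illustrative placeholders, and `MyAttributionExplainer` is the same hypothetical stand-in as above.

```python
# `concepts_activations` stands for a tensor of concept activations for
# `inputs`; since this explainer's concept model is the identity, these
# coincide with the latent activations. It is a placeholder here.
concept_scores = explainer.concept_output_attribution(
    inputs="The movie was surprisingly good.",
    concepts=concepts_activations,
    target=1,  # e.g. the logit of the "positive" class
    attribution_method=MyAttributionExplainer,  # placeholder, as above
)

# One attribution score per concept.
print(concept_scores)
```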