Concepts Dictionary Metrics¶
Concept dictionary metrics evaluate the concept space via its dictionary. They can either be computed on a single dictionary or compare two or more dictionaries.
from interpreto.concepts.metrics import MetricClass
metric = MetricClass(concept_explainer1, concept_explainer2, ...)
score = metric.compute()
Stability¶
interpreto.concepts.metrics.Stability
¶
Stability(*concept_explainers, matching_algorithm=COSINE_HUNGARIAN)
Source code in interpreto/concepts/metrics/dictionary_metrics.py
Stability metric between sets of dictionaries, introduced by Fel et al. (2023)1. Also called Consistency by Paulo and Belrose (2025)2.
- If only one dictionary is provided, the metric is a self-comparison of the dictionary.
- If two dictionaries are provided, the metric is a comparison between the two dictionaries.
- If more than two dictionaries are provided, the metric is the mean of the pairwise comparisons.
1. Fel, T., Boutin, V., Béthune, L., Cadène, R., Moayeri, M., Andéol, L., Chavidal, M., & Serre, T. A holistic approach to unifying automatic concept extraction and concept importance estimation. Advances in Neural Information Processing Systems. 2023. ↩
2. Paulo, G., & Belrose, N. Sparse Autoencoders Trained on the Same Data Learn Different Features. 2025. ↩
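The pairwise logic described above can be sketched as follows. This is a minimal illustration, not interpreto's implementation: it assumes each dictionary is a `(nb_concepts, d)` array and uses cosine similarity with Hungarian matching (via `scipy.optimize.linear_sum_assignment`) to pair concepts between dictionaries.

```python
import itertools

import numpy as np
from scipy.optimize import linear_sum_assignment


def cosine_hungarian_score(dict_a: np.ndarray, dict_b: np.ndarray) -> float:
    """Mean cosine similarity under an optimal one-to-one concept matching."""
    # Normalize rows so the dot product gives cosine similarities
    a = dict_a / np.linalg.norm(dict_a, axis=1, keepdims=True)
    b = dict_b / np.linalg.norm(dict_b, axis=1, keepdims=True)
    similarity = a @ b.T  # (nb_concepts, nb_concepts)
    # Hungarian algorithm minimizes cost, so negate to maximize similarity
    rows, cols = linear_sum_assignment(-similarity)
    return float(similarity[rows, cols].mean())


def stability_sketch(*dictionaries: np.ndarray) -> float:
    """Mean of pairwise scores; a single dictionary is compared with itself."""
    if len(dictionaries) == 1:
        dictionaries = dictionaries * 2
    pairs = itertools.combinations(dictionaries, 2)
    return float(np.mean([cosine_hungarian_score(a, b) for a, b in pairs]))
```

Because the matching is optimal over permutations, a dictionary compared with a row-shuffled copy of itself scores 1.0, which is the behavior one wants from a stability measure across random seeds.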
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `concept_explainers` | `ConceptAutoEncoderExplainer \| Float[Tensor, 'cpt d']` | The concept explainers or concept dictionaries to compare. | `()` |
| `matching_algorithm` | `DistanceFunctionProtocol` | The algorithm used to match concepts between dictionaries. Defaults to `ConceptMatchingAlgorithm.COSINE_HUNGARIAN`. | `COSINE_HUNGARIAN` |
Examples:
>>> import torch
>>> from interpreto.concepts import NMFConcepts
>>> from interpreto.concepts.metrics import Stability
>>> # Iterate on random seeds
>>> concept_explainers = []
>>> for seed in range(10):
... # set seed
... torch.manual_seed(seed)
... # Create a concept model
... nmf_explainer = NMFConcepts(model_with_split_points, nb_concepts=20, device="cuda", force_relu=True)
... # Fit the concept model
... nmf_explainer.fit(activations)
... concept_explainers.append(nmf_explainer)
>>> # Compute the stability metric
>>> stability = Stability(*concept_explainers)
>>> score = stability.compute()
Raises:

| Type | Description |
|---|---|
| `ValueError` | If no concept explainer is provided. |
| `ValueError` | If the matching algorithm is not supported. |
| `ValueError` | If the dictionaries are not `torch.Tensor`. |
| `ValueError` | If the dictionaries have different shapes. |
Source code in interpreto/concepts/metrics/dictionary_metrics.py
compute
¶
Compute the mean score over pairwise comparison scores between dictionaries.
Returns:

| Name | Type | Description |
|---|---|---|
| `float` | `float` | The stability score. |