Concept Explanation Metrics

As described on the Concept Explainers page, concept-based explanations are obtained by applying several steps:

  • Defining the concept-space, usually through dictionary learning.

  • Interpreting the directions of the concept-space.

  • Optionally, measuring the importance of each concept in the prediction.

Concept-based metrics evaluate one of the first two steps, or both. To evaluate the importance of a concept, attribution metrics are used.

Each metric takes different arguments, hence no common API can be defined apart from the compute method:

```python
from interpreto.concepts.metrics import MetricClass

metric = MetricClass(...)
score = metric.compute(...)
```
| Metric family          | Property                    | Metrics                  |
|------------------------|-----------------------------|--------------------------|
| Dictionary Metrics     | Concept-space Stability     | Stability                |
| Reconstruction Metrics | Concept-space Faithfulness  | MSE, FID, Custom         |
| Sparsity Metrics       | Concept-space Complexity    | Sparsity, Sparsity Ratio |

Evaluating the concept-space

The concept-space is defined by a concept model encoding latent activations into concept activations. Some concept models can also reconstruct the latent activations from the concept activations.
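
As a rough illustration of these two operations, consider a linear concept model. This is a minimal numpy sketch: the linear form and variable names are assumptions for illustration, not interpreto's API.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_concepts = 16, 32
# Dictionary learned by the concept model: one latent direction per concept.
dictionary = rng.normal(size=(n_concepts, latent_dim))

latent_activations = rng.normal(size=(8, latent_dim))

# Encoding: express each latent activation in the concept basis (least squares).
concept_activations = latent_activations @ np.linalg.pinv(dictionary)

# Reconstruction: map concept activations back to the latent space.
reconstructed = concept_activations @ dictionary
```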

Several properties of the concept-space are desirable:

  • The concept-space should be faithful to the latent space data distribution.

  • The concept-space should have a low complexity to push toward interpretability.

  • The concept-space should be stable across different training regimes.

Concept-space faithfulness

Concept-space faithfulness is often measured via the reconstruction error of the latent activations when they are projected back and forth through the concept space. The error can be computed either in the latent space or in the logits space. The distance used to compare activations strongly influences what is actually measured.

In interpreto, you can use ReconstructionError to define a custom metric by specifying a reconstruction_space and a distance_function, or rely on the ready-made MSE and FID metrics.
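
For example, a hedged sketch of both options: reconstruction_space and distance_function are the parameters named above, but the positional concept-model argument and the placeholder values are assumptions, so check the API reference for the exact signatures.

```python
from interpreto.concepts.metrics import MSE, ReconstructionError

# Ready-made metric: mean squared error between latent activations
# and their reconstruction through the concept space.
mse = MSE(concept_model)  # assumed: metrics wrap a fitted concept model
mse_score = mse.compute(latent_activations)

# Custom metric: choose where reconstructions are compared and with which distance.
custom = ReconstructionError(
    concept_model,
    reconstruction_space=...,  # latent space or logits space
    distance_function=...,     # distance used to compare activations
)
custom_score = custom.compute(latent_activations)
```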

Concept-space complexity

The concept-space complexity is often measured via the sparsity of its activations.

In interpreto, you can use Sparsity and SparsityRatio.
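
To build intuition for what these metrics capture, here is a standalone numpy sketch. The definitions are illustrative; interpreto's exact formulas may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake concept activations: most entries zeroed, as a sparse code would be.
concept_activations = rng.normal(size=(100, 50)) * (rng.random((100, 50)) > 0.9)

# Sparsity: average number of active (non-zero) concepts per sample (L0 norm).
sparsity = np.count_nonzero(concept_activations, axis=1).mean()

# Sparsity ratio: the same quantity normalized by the number of concepts.
sparsity_ratio = sparsity / concept_activations.shape[1]

print(f"{sparsity:.1f} active concepts per sample ({sparsity_ratio:.0%})")
```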

Concept-space stability

The concept-space should stay the same across different training regimes, e.g. with different seeds or different data splits.

In interpreto, you can use Stability to compare concept-model dictionaries.
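
As an illustration of the underlying idea, one possible way to compare two dictionaries is sketched below. This is a sketch of the general principle, not interpreto's Stability implementation.

```python
import numpy as np

def dictionary_similarity(d1: np.ndarray, d2: np.ndarray) -> float:
    """Average best-match cosine similarity between two (n_concepts, latent_dim) dictionaries."""
    d1 = d1 / np.linalg.norm(d1, axis=1, keepdims=True)
    d2 = d2 / np.linalg.norm(d2, axis=1, keepdims=True)
    cosine = np.abs(d1 @ d2.T)               # pairwise cosine similarities (sign-invariant)
    return float(cosine.max(axis=1).mean())  # match each concept to its closest counterpart
```

A score close to 1 indicates that the two training runs recovered nearly the same concept directions.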