Concept Explanation Metrics

As described on the Concept Explainers page, concept-based explanations are obtained by applying several steps:

  • Defining the concept-space, usually through dictionary learning.

  • Interpreting the directions of the concept-space.

  • Optionally, measuring the importance of each concept in the prediction.

Concept-based metrics evaluate one of the first two steps, or both. To evaluate the importance of a concept, attribution metrics are used.

Each metric takes different arguments, hence no common API can be defined apart from the compute method:

```python
from interpreto.concepts.metrics import MetricClass

metric = MetricClass(...)
score = metric.compute(...)
```
| Metric family          | Property                    | Metrics                  |
|------------------------|-----------------------------|--------------------------|
| Dictionary Metrics     | Concept-space Stability     | Stability                |
| Reconstruction Metrics | Concept-space Faithfulness  | MSE, FID, Custom         |
| Sparsity Metrics       | Concept-space Complexity    | Sparsity, Sparsity Ratio |

Evaluating the concept-space

The concept-space is defined by a concept model encoding latent activations into concept activations. Some concept models can also reconstruct the latent activations from the concept activations.
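
As a rough illustration of these two operations, consider a linear concept model. This is a minimal numpy sketch: the linear form and variable names are assumptions for illustration, not interpreto's API.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, n_concepts = 16, 32
# Dictionary learned by the concept model: one latent direction per concept.
dictionary = rng.normal(size=(n_concepts, latent_dim))

latent_activations = rng.normal(size=(8, latent_dim))

# Encoding: express each latent activation in the concept basis (least squares).
concept_activations = latent_activations @ np.linalg.pinv(dictionary)

# Reconstruction: map concept activations back to the latent space.
reconstructed = concept_activations @ dictionary
```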

Several properties of the concept-space are desirable:

  • The concept-space should be faithful to the latent space data distribution.

  • The concept-space should have a low complexity to push toward interpretability.

  • The concept-space should be stable across different training regimes.

Concept-space faithfulness

Concept-space faithfulness is often measured via the reconstruction error of the latent activations when they are projected back and forth through the concept space. The error can be computed either in the latent space or in the logits space. The distance used to compare activations strongly influences what is actually measured.

In interpreto, you can use ReconstructionError to define a custom metric by specifying a reconstruction_space and a distance_function, or rely on the ready-made MSE and FID metrics.
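
For example, a hedged sketch of both options: reconstruction_space and distance_function are the parameters named above, but the positional concept-model argument and the placeholder values are assumptions, so check the API reference for the exact signatures.

```python
from interpreto.concepts.metrics import MSE, ReconstructionError

# Ready-made metric: mean squared error between latent activations
# and their reconstruction through the concept space.
mse = MSE(concept_model)  # assumed: metrics wrap a fitted concept model
mse_score = mse.compute(latent_activations)

# Custom metric: choose where reconstructions are compared and with which distance.
custom = ReconstructionError(
    concept_model,
    reconstruction_space=...,  # latent space or logits space
    distance_function=...,     # distance used to compare activations
)
custom_score = custom.compute(latent_activations)
```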

Concept-space complexity

The concept-space complexity is often measured via the sparsity of its activations.

In interpreto, you can use Sparsity and SparsityRatio.
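
To build intuition for what these metrics capture, here is a standalone numpy sketch. The definitions are illustrative; interpreto's exact formulas may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fake concept activations: most entries zeroed, as a sparse code would be.
concept_activations = rng.normal(size=(100, 50)) * (rng.random((100, 50)) > 0.9)

# Sparsity: average number of active (non-zero) concepts per sample (L0 norm).
sparsity = np.count_nonzero(concept_activations, axis=1).mean()

# Sparsity ratio: the same quantity normalized by the number of concepts.
sparsity_ratio = sparsity / concept_activations.shape[1]

print(f"{sparsity:.1f} active concepts per sample ({sparsity_ratio:.0%})")
```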

Concept-space stability

The concept-space should stay the same across different training regimes, e.g. with different seeds or different data splits.

In interpreto, you can use Stability to compare concept-model dictionaries.
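
As an illustration of the underlying idea, one possible way to compare two dictionaries is sketched below. This is a sketch of the general principle, not interpreto's Stability implementation.

```python
import numpy as np

def dictionary_similarity(d1: np.ndarray, d2: np.ndarray) -> float:
    """Average best-match cosine similarity between two (n_concepts, latent_dim) dictionaries."""
    d1 = d1 / np.linalg.norm(d1, axis=1, keepdims=True)
    d2 = d2 / np.linalg.norm(d2, axis=1, keepdims=True)
    cosine = np.abs(d1 @ d2.T)               # pairwise cosine similarities (sign-invariant)
    return float(cosine.max(axis=1).mean())  # match each concept to its closest counterpart
```

A score close to 1 indicates that the two training runs recovered nearly the same concept directions.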