Optimization-based Dictionary Learning

Base abstract class

interpreto.concepts.methods.DictionaryLearningExplainer

DictionaryLearningExplainer(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', **kwargs)

Bases: ConceptAutoEncoderExplainer[BaseOptimDictionaryLearning], Generic[_BODL_co]

Code: concepts/methods/overcomplete.py

Implementation of a concept explainer using an overcomplete.optimization.BaseOptimDictionaryLearning (NMF and PCA variants) as concept_model.

Attributes:

Name Type Description
model_with_split_points ModelWithSplitPoints

The model to apply the explanation on. It should have at least one split point on which concept_model can be fitted.

split_point str | None

The split point used to train the concept_model. Default: None, set only when the concept explainer is fitted.

concept_model BaseOptimDictionaryLearning

An Overcomplete BaseOptimDictionaryLearning variant for concept extraction.

is_fitted bool

Whether the concept_model was fit on model activations.

has_differentiable_concept_encoder bool

Whether the encode_activations operation is differentiable.

has_differentiable_concept_decoder bool

Whether the decode_concepts operation is differentiable.

Examples:

>>> import datasets
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from interpreto import ModelWithSplitPoints
>>> from interpreto.concepts import ICAConcepts
>>> from interpreto.concepts.interpretations import TopKInputs
>>> CLS_TOKEN = ModelWithSplitPoints.activation_granularities.CLS_TOKEN
>>> WORD = ModelWithSplitPoints.activation_granularities.WORD
...
>>> dataset = datasets.load_dataset("stanfordnlp/imdb")["train"]["text"][:1000]
>>> repo_id = "Qwen/Qwen3-0.6B"
>>> model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained(repo_id)
...
>>> # 1. Split your model in two parts
>>> splitted_model = ModelWithSplitPoints(
>>>     model, tokenizer=tokenizer, split_points=[5],
>>> )
...
>>> # 2. Compute a dataset of activations
>>> activations = splitted_model.get_activations(
>>>     dataset, activation_granularity=WORD
>>> )
...
>>> # 3. Fit a concept model on the dataset
>>> explainer = ICAConcepts(splitted_model, nb_concepts=20)
>>> explainer.fit(activations)
...
>>> # 4. Interpret the concepts
>>> interpreter = TopKInputs(
>>>     concept_explainer=explainer,
>>>     activation_granularity=WORD,
>>> )
>>> interpretations = interpreter.interpret(
>>>     inputs=dataset, latent_activations=activations
>>> )
...
>>> # Print the interpretations
>>> for id, words in interpretations.items():
>>>     print(f"Concept {id}: {list(words.keys())}")

Parameters:

Name Type Description Default

model_with_split_points

ModelWithSplitPoints

The model to apply the explanation on. It should have at least one split point on which a concept explainer can be trained.

required

nb_concepts

int

Size of the SAE concept space.

required

split_point

str | None

The split point used to train the concept_model. If None, tries to use the split point of model_with_split_points if a single one is defined.

None

device

device | str

Device to use for the concept_module.

'cpu'

**kwargs

dict

Additional keyword arguments to pass to the concept_module. See the Overcomplete documentation of the provided concept_model_class for more details.

{}
Source code in interpreto/concepts/methods/overcomplete.py
def __init__(
    self,
    model_with_split_points: ModelWithSplitPoints,
    *,
    nb_concepts: int,
    split_point: str | None = None,
    device: torch.device | str = "cpu",
    **kwargs,
):
    """
    Initialize the concept bottleneck explainer based on the Overcomplete BaseOptimDictionaryLearning framework.

    Args:
        model_with_split_points (ModelWithSplitPoints): The model to apply the explanation on.
            It should have at least one split point on which a concept explainer can be trained.
        nb_concepts (int): Size of the SAE concept space.
        split_point (str | None): The split point used to train the `concept_model`. If None, tries to use the
            split point of `model_with_split_points` if a single one is defined.
        device (torch.device | str): Device to use for the `concept_module`.
        **kwargs (dict): Additional keyword arguments to pass to the `concept_module`.
            See the Overcomplete documentation of the provided `concept_model_class` for more details.
    """
    concept_model = self.concept_model_class(
        nb_concepts=nb_concepts,
        device=device,  # type: ignore
        **kwargs,
    )
    super().__init__(model_with_split_points, concept_model, split_point)

fit

fit(activations, *, overwrite=False, **kwargs)

Fit an Overcomplete OptimDictionaryLearning model on the given activations.

Parameters:

Name Type Description Default

activations

Tensor | dict[str, Tensor]

The activations used for fitting the concept_model. If a dictionary is provided, the activation corresponding to split_point will be used.

required

overwrite

bool

Whether to overwrite the current model if it has already been fitted. Default: False.

False

**kwargs

dict

Additional keyword arguments to pass to the concept_model. See the Overcomplete documentation of the provided concept_model for more details.

{}
Source code in interpreto/concepts/methods/overcomplete.py
def fit(self, activations: LatentActivations | dict[str, LatentActivations], *, overwrite: bool = False, **kwargs):
    """Fit an Overcomplete OptimDictionaryLearning model on the given activations.

    Args:
        activations (torch.Tensor | dict[str, torch.Tensor]): The activations used for fitting the `concept_model`.
            If a dictionary is provided, the activation corresponding to `split_point` will be used.
        overwrite (bool): Whether to overwrite the current model if it has already been fitted.
            Default: False.
        **kwargs (dict): Additional keyword arguments to pass to the `concept_model`.
            See the Overcomplete documentation of the provided `concept_model` for more details.
    """
    split_activations = self._prepare_fit(activations, overwrite=overwrite)
    self.concept_model.fit(split_activations, **kwargs)
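
Example: a minimal sketch of fitting a DictionaryLearningExplainer subclass on precomputed activations, reusing the splitted_model and activations variables from the example at the top of this page (any of the classes listed below could be used in place of SemiNMFConcepts):

>>> from interpreto.concepts import SemiNMFConcepts
>>> explainer = SemiNMFConcepts(splitted_model, nb_concepts=20)
>>> # `activations` can be the dict returned by `get_activations`;
>>> # the entry matching the explainer's split point is selected internally
>>> explainer.fit(activations)
>>> explainer.is_fitted
True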

encode_activations

encode_activations(activations)

Encode the given activations using the concept_model encoder.

Parameters:

Name Type Description Default

activations

LatentActivations

The activations to encode.

required

Returns:

Type Description
Tensor

The encoded concept activations.

Source code in interpreto/concepts/base.py
@check_fitted
def encode_activations(self, activations: LatentActivations) -> torch.Tensor:  # ConceptsActivations
    """Encode the given activations using the `concept_model` encoder.

    Args:
        activations (LatentActivations): The activations to encode.

    Returns:
        The encoded concept activations.
    """
    if hasattr(self.concept_model, "device") and self.concept_model.device != activations.device:
        activations = activations.to(self.concept_model.device, non_blocking=True)
        self.concept_model.to(activations.device)
    return self.concept_model.encode(activations)  # type: ignore

decode_concepts

decode_concepts(concepts)

Decode the given concepts using the concept_model decoder.

Parameters:

Name Type Description Default

concepts

ConceptsActivations

The concepts to decode.

required

Returns:

Type Description
Tensor

The decoded model activations.

Source code in interpreto/concepts/base.py
@check_fitted
def decode_concepts(self, concepts: ConceptsActivations) -> torch.Tensor:  # LatentActivations
    """Decode the given concepts using the `concept_model` decoder.

    Args:
        concepts (ConceptsActivations): The concepts to decode.

    Returns:
        The decoded model activations.
    """
    if hasattr(self.concept_model, "device") and self.concept_model.device != concepts.device:
        concepts = concepts.to(self.concept_model.device, non_blocking=True)
        self.concept_model.to(concepts.device)
    return self.concept_model.decode(concepts)  # type: ignore
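
Example: a small sketch of an encode/decode round trip, assuming explainer is a fitted explainer and latents is a (n, d_model) tensor of activations at its split point (e.g. one entry of the activations dict from the example above):

>>> import torch
>>> concepts = explainer.encode_activations(latents)       # shape: (n, nb_concepts)
>>> reconstruction = explainer.decode_concepts(concepts)   # shape: (n, d_model)
>>> # relative reconstruction error of the concept bottleneck
>>> rel_error = torch.linalg.norm(latents - reconstruction) / torch.linalg.norm(latents)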

get_dictionary

get_dictionary()

Get the dictionary learned by the fitted concept_model.

Returns:

Type Description
Tensor

torch.Tensor: A torch.Tensor containing the learned dictionary.

Source code in interpreto/concepts/base.py
@check_fitted
def get_dictionary(self) -> torch.Tensor:  # TODO: add this to tests
    """Get the dictionary learned by the fitted `concept_model`.

    Returns:
        torch.Tensor: A `torch.Tensor` containing the learned dictionary.
    """
    return self.concept_model.get_dictionary()  # type: ignore
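
Example: inspecting the learned dictionary, assuming explainer was fitted with nb_concepts=20 (each row of the dictionary is expected to be a concept direction in the model's latent space):

>>> dictionary = explainer.get_dictionary()
>>> # expected shape: (nb_concepts, d_model), one direction per concept
>>> dictionary.shape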

interpret

interpret(*args, **kwargs)

Deprecated API for concept interpretation.

Interpretation methods should now be instantiated directly with the fitted concept explainer. For example:

TopKInputs(concept_explainer).interpret(inputs, latent_activations)

This method is kept only for backwards compatibility and will always raise a NotImplementedError.

Source code in interpreto/concepts/base.py
@check_fitted
def interpret(self, *args, **kwargs) -> Mapping[int, Any]:  # TODO: 0.5.0 remove
    """Deprecated API for concept interpretation.

    Interpretation methods should now be instantiated directly with the
    fitted concept explainer. For example:

    ``TopKInputs(concept_explainer).interpret(inputs, latent_activations)``

    This method is kept only for backwards compatibility and will always
    raise a :class:`NotImplementedError`.
    """
    raise NotImplementedError("Use the new API: TopKInputs(concept_explainer).interpret(...).")

concept_output_gradient

Compute the gradients of the predictions with respect to the concepts.

To clarify what this function does, let's introduce some notation. Suppose the initial model was split such that \(f = g \circ h\). The concept model was then fitted on \(A = h(X)\), with \(X\) a dataset of samples. The resulting concept model encoder and decoder are denoted \(t\) and \(t^{-1}\); \(t\) can be seen as a projection from the latent space to the concept space. Hence, the function going from the inputs to the concepts is \(f_{ic} = t \circ h\) and the function going from the concepts to the outputs is \(f_{co} = g \circ t^{-1}\).

Given a set of samples \(X\) and the functions \((h, t, t^{-1}, g)\), this function first computes \(C = t(A) = t \circ h(X)\), then returns \(\nabla f_{co}(C)\).

In practice, all computations are done by ModelWithSplitPoints._get_concept_output_gradients, which relies on NNsight. The current method only forwards \(t\) and \(t^{-1}\), i.e. the self.encode_activations and self.decode_concepts methods.

Parameters:

Name Type Description Default

inputs

list[str] | Tensor | BatchEncoding

The input data, either a list of samples, the tokenized input or a batch of samples.

required

targets

list[int] | None

Specify which outputs of the model should be used to compute the gradients. Note that \(f_{co}\) often has several outputs; by default, gradients are computed for each of them. The t dimension of the returned tensor is equal to the number of selected targets. (For classification, those are the class logits; for generation, those are the probabilities of the most probable tokens.)

None

split_point

str | None

The split point used to train the concept_model. If None, tries to use the split point of model_with_split_points if a single one is defined.

None

activation_granularity

ActivationGranularity

The granularity of the activations to use for the attribution. It is highly recommended to use the same granularity as the one used in the fit method. Possible values are:

  • ModelWithSplitPoints.activation_granularities.CLS_TOKEN: only the first token (e.g. [CLS]) activation is returned (batch, d_model).

  • ModelWithSplitPoints.activation_granularities.ALL_TOKENS: every token activation is treated as a separate element (batch x seq_len, d_model).

  • ModelWithSplitPoints.activation_granularities.TOKEN: remove special tokens.

  • ModelWithSplitPoints.activation_granularities.WORD: aggregate by words following the split defined by Granularity.WORD.

  • ModelWithSplitPoints.activation_granularities.SENTENCE: aggregate by sentences following the split defined by Granularity.SENTENCE. Requires spacy to be installed.

TOKEN

aggregation_strategy

GranularityAggregationStrategy

Strategy to aggregate token activations into larger input granularities. Applied for WORD and SENTENCE granularities. Token activations of shape n * (l, d) are aggregated along the sequence length dimension, then concatenated into (ng, d) tensors. Existing strategies are:

  • ModelWithSplitPoints.aggregation_strategies.SUM: Tokens activations are summed along the sequence length dimension.

  • ModelWithSplitPoints.aggregation_strategies.MEAN: Tokens activations are averaged along the sequence length dimension.

  • ModelWithSplitPoints.aggregation_strategies.MAX: The maximum of the token activations along the sequence length dimension is selected.

  • ModelWithSplitPoints.aggregation_strategies.SIGNED_MAX: The maximum of the absolute value of the activations multiplied by its initial sign. signed_max([[-1, 0, 1, 2], [-3, 1, -2, 0]]) = [-3, 1, -2, 2]

MEAN

concepts_x_gradients

bool

Whether the resulting gradients should be multiplied by the concept activations. True by default (similarly to attributions), because of its mathematical properties. In that case, the output is \(C * \nabla f_{co}(C)\).

True

normalization

bool

Whether to normalize the gradients. Gradients are normalized over the concept (c) and sequence length (g) dimensions, such that for a given sample-target pair, the sum of the absolute values of the gradients is equal to 1. (The granular elements depend on activation_granularity.)

True

tqdm_bar

bool

Whether to display a progress bar.

False

batch_size

int | None

Batch size for the model. It might be different from the one used in ModelWithSplitPoints.get_activations because gradients have a much larger impact on the memory.

None

Returns:

Type Description
list[Float[Tensor, 't g c']]

list[Float[torch.Tensor, "t g c"]]: The gradients of the model output with respect to the concept activations. The list length corresponds to the number of inputs. Each tensor has shape (t, g, c), with t the target dimension, g the number of granularity elements in one input, and c the number of concepts.

Source code in interpreto/concepts/base.py
@check_fitted
def concept_output_gradient(
    self,
    inputs: torch.Tensor | list[str] | BatchEncoding,
    targets: list[int] | None = None,
    split_point: str | None = None,
    activation_granularity: ActivationGranularity = ActivationGranularity.TOKEN,
    aggregation_strategy: GranularityAggregationStrategy = GranularityAggregationStrategy.MEAN,
    concepts_x_gradients: bool = True,
    normalization: bool = True,
    tqdm_bar: bool = False,
    batch_size: int | None = None,
) -> list[Float[torch.Tensor, "t g c"]]:
    """
    Compute the gradients of the predictions with respect to the concepts.

    To clarify what this function does, let's introduce some notation.
    Suppose the initial model was split such that $f = g \\circ h$.
    Hence the concept model was fitted on $A = h(X)$ with $X$ a dataset of samples.
    The resulting concept model encoder and decoder are denoted $t$ and $t^{-1}$.
    $t$ can be seen as a projection from the latent space to the concept space.
    Hence, the function going from the inputs to the concepts is $f_{ic} = t \\circ h$
    and the function going from the concepts to the outputs is $f_{co} = g \\circ t^{-1}$.

    Given a set of samples $X$ and the functions $(h, t, t^{-1}, g)$,
    this function first computes $C = t(A) = t \\circ h(X)$, then returns $\\nabla f_{co}(C)$.

    In practice all computations are done by `ModelWithSplitPoints._get_concept_output_gradients`,
    which relies on NNsight. The current method only forwards $t$ and $t^{-1}$,
    respectively the `self.encode_activations` and `self.decode_concepts` methods.

    Args:
        inputs (list[str] | torch.Tensor | BatchEncoding):
            The input data, either a list of samples, the tokenized input or a batch of samples.

        targets (list[int] | None):
            Specify which outputs of the model should be used to compute the gradients.
            Note that $f_{co}$ often has several outputs; by default, gradients are computed for each of them.
            The `t` dimension of the returned tensor is equal to the number of selected targets.
            (For classification, those are the class logits; for generation, those are the probabilities of the most probable tokens.)

        split_point (str | None):
            The split point used to train the `concept_model`.
            If None, tries to use the split point of `model_with_split_points` if a single one is defined.

        activation_granularity (ActivationGranularity):
            The granularity of the activations to use for the attribution.
            It is highly recommended to use the same granularity as the one used in the `fit` method.
            Possible values are:

            - ``ModelWithSplitPoints.activation_granularities.CLS_TOKEN``:
                only the first token (e.g. ``[CLS]``) activation is returned ``(batch, d_model)``.

            - ``ModelWithSplitPoints.activation_granularities.ALL_TOKENS``:
                every token activation is treated as a separate element ``(batch x seq_len, d_model)``.

            - ``ModelWithSplitPoints.activation_granularities.TOKEN``: remove special tokens.

            - ``ModelWithSplitPoints.activation_granularities.WORD``:
                aggregate by words following the split defined by
                :class:`~interpreto.commons.granularity.Granularity.WORD`.

            - ``ModelWithSplitPoints.activation_granularities.SENTENCE``:
                aggregate by sentences following the split defined by
                :class:`~interpreto.commons.granularity.Granularity.SENTENCE`.
                Requires `spacy` to be installed.

        aggregation_strategy:
            Strategy to aggregate token activations into larger input granularities.
            Applied for `WORD` and `SENTENCE` granularities.
            Token activations of shape n * (l, d) are aggregated along the sequence length dimension,
            then concatenated into (ng, d) tensors.
            Existing strategies are:

            - ``ModelWithSplitPoints.aggregation_strategies.SUM``:
                Tokens activations are summed along the sequence length dimension.

            - ``ModelWithSplitPoints.aggregation_strategies.MEAN``:
                Tokens activations are averaged along the sequence length dimension.

            - ``ModelWithSplitPoints.aggregation_strategies.MAX``:
                The maximum of the token activations along the sequence length dimension is selected.

            - ``ModelWithSplitPoints.aggregation_strategies.SIGNED_MAX``:
                The maximum of the absolute value of the activations multiplied by its initial sign.
                signed_max([[-1, 0, 1, 2], [-3, 1, -2, 0]]) = [-3, 1, -2, 2]

        concepts_x_gradients (bool):
            Whether the resulting gradients should be multiplied by the concept activations.
            True by default (similarly to attributions), because of its mathematical properties.
            In that case, the output is $C * \\nabla f_{co}(C)$.

        normalization (bool):
            Whether to normalize the gradients.
            Gradients are normalized over the concept (c) and sequence length (g) dimensions,
            such that for a given sample-target pair,
            the sum of the absolute values of the gradients is equal to 1.
            (The granular elements depend on the :arg:`activation_granularity`.)

        tqdm_bar (bool):
            Whether to display a progress bar.

        batch_size (int | None):
            Batch size for the model.
            It might be different from the one used in `ModelWithSplitPoints.get_activations`
            because gradients have a much larger impact on the memory.

    Returns:
        list[Float[torch.Tensor, "t g c"]]:
            The gradients of the model output with respect to the concept activations.
            List length: corresponds to the number of inputs.
            Tensor shape: (t, g, c) with t the target dimension, g the number of granularity elements
                in one input, and c the number of concepts.
    """
    if not self.has_differentiable_concept_decoder:
        raise ValueError(
            "The concept decoder of this explainer is not differentiable. This is required to compute concept-to-output gradients. "
            f"Current explainer class: {self.__class__.__name__}."
        )

    # put everything on device
    self.concept_model.to(self.model_with_split_points.device)

    # forward all computations to
    gradients = self.model_with_split_points._get_concept_output_gradients(
        inputs=inputs,
        targets=targets,
        encode_activations=self.encode_activations,
        decode_concepts=self.decode_concepts,
        split_point=split_point,
        activation_granularity=activation_granularity,
        aggregation_strategy=aggregation_strategy,
        concepts_x_gradients=concepts_x_gradients,
        tqdm_bar=tqdm_bar,
        batch_size=batch_size,
    )

    # normalize the gradients if required
    if normalization:
        gradients = [self.__normalize_gradients(g) for g in gradients]
    return gradients
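
Example: a hedged sketch of computing concept-to-output gradients for a few inputs, reusing explainer, dataset and WORD from the example at the top of this page. It assumes the explainer's concept decoder is differentiable (has_differentiable_concept_decoder is True); WORD matches the granularity used during fitting, and the small batch size is only illustrative:

>>> gradients = explainer.concept_output_gradient(
>>>     inputs=dataset[:8],
>>>     activation_granularity=WORD,
>>>     batch_size=2,
>>> )
>>> # one tensor per input, of shape (t, g, c):
>>> # targets x granular elements (here words) x concepts
>>> gradients[0].shape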

List of available methods

interpreto.concepts.ConvexNMFConcepts

ConvexNMFConcepts(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', **kwargs)

Bases: DictionaryLearningExplainer[ConvexNMF]

Code: concepts/methods/overcomplete.py

ConceptAutoEncoderExplainer with the ConvexNMF from Ding et al. (2008)1 as concept model.

ConvexNMF implementation from overcomplete.optimization.ConvexNMF class.


  1. C. H. Q. Ding, T. Li and M. I. Jordan, Convex and Semi-Nonnegative Matrix Factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1), 2010, pp. 45-55 

Parameters:

Name Type Description Default
model_with_split_points ModelWithSplitPoints

The model to apply the explanation on. It should have at least one split point on which a concept explainer can be trained.

required
nb_concepts int

Size of the SAE concept space.

required
split_point str | None

The split point used to train the concept_model. If None, tries to use the split point of model_with_split_points if a single one is defined.

None
device device | str

Device to use for the concept_module.

'cpu'
**kwargs dict

Additional keyword arguments to pass to the concept_module. See the Overcomplete documentation of the provided concept_model_class for more details.

{}

interpreto.concepts.DictionaryLearningConcepts

DictionaryLearningConcepts(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', **kwargs)

Bases: DictionaryLearningExplainer[SkDictionaryLearning]

Code: concepts/methods/overcomplete.py

ConceptAutoEncoderExplainer with the Dictionary Learning concepts from Mairal et al. (2009)1 as concept model.

Dictionary Learning implementation from overcomplete.optimization.SkDictionaryLearning class.


  1. J. Mairal, F. Bach, J. Ponce, G. Sapiro, Online dictionary learning for sparse coding Proceedings of the 26th Annual International Conference on Machine Learning, 2009, pp. 689-696. 

interpreto.concepts.ICAConcepts

ICAConcepts(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', **kwargs)

Bases: SkLearnWrapperExplainer[ICAWrapper]

Code: concepts/methods/overcomplete.py

ConceptAutoEncoderExplainer with the ICA from Hyvarinen and Oja (2000)1 as concept model.


  1. A. Hyvarinen and E. Oja, Independent Component Analysis: Algorithms and Applications, Neural Networks, 13(4-5), 2000, pp. 411-430. 

interpreto.concepts.KMeansConcepts

KMeansConcepts(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', **kwargs)

Bases: SkLearnWrapperExplainer[KMeansWrapper]

Code: concepts/methods/overcomplete.py

ConceptAutoEncoderExplainer with the K-Means as concept model.

interpreto.concepts.NMFConcepts

NMFConcepts(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', force_relu=False, **kwargs)

Bases: DictionaryLearningExplainer[NMF]

Code: concepts/methods/overcomplete.py

ConceptAutoEncoderExplainer with the NMF from Lee and Seung (1999)1 as concept model.

NMF implementation from overcomplete.optimization.NMF class.


  1. Lee, D., Seung, H. Learning the parts of objects by non-negative matrix factorization. Nature, 401, 1999, pp. 788–791. 

Parameters:

Name Type Description Default
model_with_split_points ModelWithSplitPoints

The model to apply the explanation on. It should have at least one split point on which a concept explainer can be trained.

required
nb_concepts int

Size of the SAE concept space.

required
split_point str | None

The split point used to train the concept_model. If None, tries to use the split point of model_with_split_points if a single one is defined.

None
device device | str

Device to use for the concept_module.

'cpu'
force_relu bool

Whether to force the activations to be positive.

False
**kwargs dict

Additional keyword arguments to pass to the concept_module. See the Overcomplete documentation of the provided concept_model_class for more details.

{}
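
Since NMF factorizes non-negative data, a short sketch of using force_relu when the latent activations can be negative (reusing splitted_model and activations from the example at the top of this page; whether clipping negative activations is appropriate depends on your use case):

>>> from interpreto.concepts import NMFConcepts
>>> nmf_explainer = NMFConcepts(
>>>     splitted_model, nb_concepts=20, force_relu=True,
>>> )
>>> nmf_explainer.fit(activations)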

encode_activations

encode_activations(activations)

Encode the given activations using the concept_model encoder.

Parameters:

Name Type Description Default
activations LatentActivations

The activations to encode.

required

Returns:

Type Description
Tensor

The encoded concept activations.

interpreto.concepts.PCAConcepts

PCAConcepts(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', **kwargs)

Bases: SkLearnWrapperExplainer[PCAWrapper]

Code: concepts/methods/overcomplete.py

ConceptAutoEncoderExplainer with the PCA from Pearson (1901)1 as concept model.


  1. K. Pearson, On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 2(11), 1901, pp. 559-572. 

interpreto.concepts.SemiNMFConcepts

SemiNMFConcepts(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', **kwargs)

Bases: DictionaryLearningExplainer[SemiNMF]

Code: concepts/methods/overcomplete.py

ConceptAutoEncoderExplainer with the SemiNMF from Ding et al. (2008)1 as concept model.

SemiNMF implementation from overcomplete.optimization.SemiNMF class.


  1. C. H. Q. Ding, T. Li and M. I. Jordan, Convex and Semi-Nonnegative Matrix Factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1), 2010, pp. 45-55 

interpreto.concepts.SparsePCAConcepts

SparsePCAConcepts(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', **kwargs)

Bases: DictionaryLearningExplainer[SkSparsePCA]

Code: concepts/methods/overcomplete.py

ConceptAutoEncoderExplainer with SparsePCA as concept model.

SparsePCA implementation from overcomplete.optimization.SkSparsePCA class.

interpreto.concepts.SVDConcepts

SVDConcepts(model_with_split_points, *, nb_concepts, split_point=None, device='cpu', **kwargs)

Bases: SkLearnWrapperExplainer[SVDWrapper]

Code: concepts/methods/overcomplete.py

ConceptAutoEncoderExplainer with SVD as concept model.
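
All of the classes above share the same constructor and the fit / encode_activations / decode_concepts interface, so they can be swapped freely. A hedged sketch comparing two of them on the same activations (reusing splitted_model from the example at the top of this page, and assuming latents is a (n, d_model) activation tensor):

>>> import torch
>>> from interpreto.concepts import PCAConcepts, SemiNMFConcepts
...
>>> for cls in (PCAConcepts, SemiNMFConcepts):
>>>     explainer = cls(splitted_model, nb_concepts=20)
>>>     explainer.fit(latents)
>>>     reconstruction = explainer.decode_concepts(explainer.encode_activations(latents))
>>>     rel_error = torch.linalg.norm(latents - reconstruction) / torch.linalg.norm(latents)
>>>     print(f"{cls.__name__}: relative reconstruction error = {float(rel_error):.3f}")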