LLM Labels
LLMLabels uses a large language model to generate natural-language labels for each concept,
based on the top-k activating inputs. This provides a human-readable summary of what each concept represents.
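To make the "top-k activating inputs" idea concrete, here is a minimal, library-independent sketch. It does not use interpreto and is not the library's implementation; the helper name `top_k_examples` and the toy activation values are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def top_k_examples(
    inputs: Sequence[str],
    activations: Sequence[Sequence[float]],
    concept: int,
    k: int,
) -> list[tuple[str, float]]:
    """Return the k inputs with the highest activation for one concept.

    `activations[i][c]` is assumed to hold the activation of concept `c`
    on `inputs[i]` (a hypothetical layout used only in this sketch).
    """
    if len(inputs) != len(activations):
        raise ValueError("inputs and activations must have the same length")
    # Pair each input with its activation on the requested concept.
    scored = [(text, row[concept]) for text, row in zip(inputs, activations)]
    # Highest activations first, then keep the first k.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: 4 inputs, 2 concepts.
inputs = ["the cat sat", "stocks fell today", "a dog barked", "rates rose"]
activations = [[0.9, 0.0], [0.1, 0.8], [0.7, 0.1], [0.0, 0.6]]

animal_examples = top_k_examples(inputs, activations, concept=0, k=2)
finance_examples = top_k_examples(inputs, activations, concept=1, k=2)
```

Examples selected this way are what a labeling prompt would show to the LLM, one group per concept.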
API Reference
interpreto.concepts.interpretations.LLMLabels
LLMLabels(*, concept_explainer, activation_granularity=None, aggregation_strategy=MEAN, llm_interface, concept_encoding_batch_size=1024, sampling_method=TOP, k_examples=30, k_context=0, use_vocab=False, use_unique_words=0, unique_words_kwargs={}, k_quantile=5, system_prompt=None)
Bases: BaseConceptInterpretationMethod
Code concepts/interpretations/llm_labels.py
Implements the automatic labeling method, which uses a large language model (LLM) to produce a short textual description of a concept from examples of what activates it. This method was first introduced in [1]; only step 1 of the method is implemented here.

[1] Steven Bills, Nick Cammarata, Dan Mossing, Henk Tillman, Leo Gao, Gabriel Goh, Ilya Sutskever, Jan Leike, Jeff Wu, William Saunders. Language models can explain neurons in language models. 2023.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `concept_explainer` | `ConceptEncoderExplainer` | The fitted concept explainer used for encoding activations. | required |
| `activation_granularity` | `ActivationGranularity` | The granularity of the activations to use for the interpretation. See :method: | `None` |
| `aggregation_strategy` | `GranularityAggregationStrategy` | The aggregation strategy to use for the activations. See :method: | `MEAN` |
| `llm_interface` | `LLMInterface` | The LLM interface to use for the interpretation. | required |
| `concept_encoding_batch_size` | `int` | The batch size to use for the concept encoding. | `1024` |
| `sampling_method` | `SAMPLING_METHOD` | The method to use for sampling the inputs provided to the LLM. | `TOP` |
| `k_examples` | `int` | The number of inputs to use for the interpretation. | `30` |
| `k_context` | `int` | The number of context tokens to use around the concept tokens. In the prompt, in the examples, the k context tokens before and after the concept token are selected. It is recommended to set it to between 5 and 10 for TOKEN and WORD granularities. However, if the granularity is CLS_TOKEN or SAMPLE, or | `0` |
| `use_vocab` | `bool` | If True, the interpretation will be computed from the vocabulary of the model. | `False` |
| `use_unique_words` | `bool` | If True, the interpretation will be computed from the unique words of the inputs. Incompatible with | `0` |
| `unique_words_kwargs` | `dict` | The kwargs to pass to the | `{}` |
| `k_quantile` | `int` | The number of quantiles to use for sampling the inputs, if | `5` |
| `system_prompt` | `str \| None` | The system prompt to use for the LLM. If None, a default prompt is used. | `None` |
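The `k_context` description says that the k context tokens before and after the concept token are selected. A minimal sketch of that windowing follows. The helper `context_window` is hypothetical and only illustrates the slicing; how interpreto actually formats the examples in the prompt is not shown here.

```python
def context_window(tokens: list[str], index: int, k_context: int) -> list[str]:
    """Return the token at `index` with up to `k_context` tokens on each side.

    The window is clipped at the sequence boundaries, so tokens near the
    start or end simply get fewer neighbours.
    """
    if not 0 <= index < len(tokens):
        raise IndexError("index is outside the token sequence")
    if k_context < 0:
        raise ValueError("k_context must be non-negative")
    start = max(0, index - k_context)
    end = min(len(tokens), index + k_context + 1)
    return tokens[start:end]


tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]

# Two tokens of context on each side of "fox".
window = context_window(tokens, index=3, k_context=2)

# At the start of the sequence the window is clipped on the left.
edge_window = context_window(tokens, index=0, k_context=2)

# k_context=0 keeps only the concept token itself.
bare_window = context_window(tokens, index=3, k_context=0)
```

With the default `k_context=0`, only the concept token is kept, which is why a larger value is recommended for TOKEN and WORD granularities.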
Source code in interpreto/concepts/interpretations/llm_labels.py
interpret
interpret(concepts_indices, inputs=None, latent_activations=None, concepts_activations=None)
Interpret the concept dimensions of the latent space in a human-readable format.
The interpretation is a mapping between the concepts indices and a short textual description.
The granularity of input examples is determined by the activation_granularity class attribute.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `concepts_indices` | `int \| list[int] \| Literal['all']` | The indices of the concepts to interpret. If "all", all concepts are interpreted. | required |
| `inputs` | `list[str] \| None` | The inputs to use for the interpretation. Necessary if not | `None` |
| `latent_activations` | `Float[Tensor, 'nl d'] \| None` | The latent activations matching the inputs. If not provided, they are computed from the inputs. | `None` |
| `concepts_activations` | `Float[Tensor, 'nl cpt'] \| None` | The concept activations matching the inputs. If not provided, they are computed from the inputs or latent activations. | `None` |
Returns:

| Type | Description |
|---|---|
| `Mapping[int, str \| None]` | The textual labels of the concepts indices. |
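Because the return type is `Mapping[int, str | None]`, calling code should be prepared for a concept whose label is `None`. The sketch below shows one way to consume such a mapping. The helper `describe_labels`, the placeholder text, and the example labels are all made up for illustration and do not come from interpreto.

```python
from typing import Mapping, Optional


def describe_labels(
    labels: Mapping[int, Optional[str]],
    placeholder: str = "<no label>",
) -> list[str]:
    """Format a concept-index-to-label mapping as sorted, printable lines."""
    return [
        f"concept {index}: {label if label is not None else placeholder}"
        for index, label in sorted(labels.items())
    ]


# Hypothetical output of an interpretation call, with one missing label.
labels = {2: "animal nouns", 0: "financial terms", 1: None}
lines = describe_labels(labels)
```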
interpreto.commons.llm_interface.LLMInterface
interpreto.concepts.interpretations.extract_ngrams
extract_ngrams(inputs, n=1, count_min_threshold=1, return_counts=False, lemmatize=False, words_to_ignore=None)
Extract n-grams (from 1-gram up to n-gram of words) from a list of texts.
If n=3, it extracts 1-grams, 2-grams, and 3-grams.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `inputs` | `Iterable[str]` | The texts to extract n-grams from. | required |
| `n` | `int` | The maximum n-gram size. All sizes from 1 to n are extracted. | `1` |
| `count_min_threshold` | `int` | The minimum total number of occurrences of an n-gram in the whole | `1` |
| `return_counts` | `bool` | Whether to return the counts of each n-gram. Defaults to False. | `False` |
| `lemmatize` | `bool` | Whether to lemmatize words before counting. | `False` |
| `words_to_ignore` | `list[str] \| None` | A list of words to ignore (applied to individual tokens before forming n-grams). | `None` |
Returns:

| Type | Description |
|---|---|
| `list[str] \| Counter[str]` | The list of unique n-grams or the counts of each n-gram. |
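The behaviour described above (all sizes from 1 to n, filtered by a minimum count) can be sketched in plain Python. This is an independent illustration, not the interpreto implementation: the function `ngram_counts` is hypothetical, and the lowercasing and whitespace tokenization are assumptions made for this sketch. Lemmatization and `words_to_ignore` are left out.

```python
from collections import Counter
from typing import Iterable


def ngram_counts(
    texts: Iterable[str],
    n: int = 1,
    count_min_threshold: int = 1,
) -> Counter:
    """Count word n-grams of every size from 1 to n across all texts."""
    counts: Counter = Counter()
    for text in texts:
        # Assumption of this sketch: lowercase and split on whitespace.
        words = text.lower().split()
        for size in range(1, n + 1):
            for start in range(len(words) - size + 1):
                counts[" ".join(words[start : start + size])] += 1
    # Keep only n-grams seen at least `count_min_threshold` times in total.
    return Counter(
        {gram: count for gram, count in counts.items() if count >= count_min_threshold}
    )


texts = ["the cat sat", "the cat ran"]

# With n=2, both 1-grams and 2-grams are counted; rare ones are filtered out.
counts = ngram_counts(texts, n=2, count_min_threshold=2)

# A list of unique n-grams, analogous to a call without counts.
unique_ngrams = sorted(counts)
```

Here "sat", "ran", "cat sat", and "cat ran" each occur once and are dropped by the threshold of 2.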