ConSim
interpreto.concepts.metrics.ConSim
¶
ConSim(model_with_split_points, user_llm, activation_granularity, classes=None, split_point=None)
Code: concepts/metrics/consim.py
ConSim stands for Concept-based Simulatability. It was introduced by Poché et al. in 2025[^1].
It evaluates all three components of a concept-based explanation:

- the concept space
- the concept interpretations
- the concept importances
To evaluate explanations of a given model \(f\), ConSim measures to what extent the explanations help a meta-predictor \(\Psi\) simulate the predictions of \(f\).
Here, the role of the meta-predictor is played by `user_llm`, an interface calling a model either locally or through a remote API, such as OpenAI or HuggingFace.
Therefore, most of the code consists of building the prompts for the LLM.
There are three steps to ConSim:

- Step 0: Instantiate the ConSim metric with the `model_with_split_points` (\(f\)) and the `user_llm` (\(\Psi\)).
- Step 1: Select interesting examples for ConSim with the `select_examples` method. Samples are selected to test how well \(\Psi\) can simulate \(f\): they cover every class and include many initial errors from \(f\).
- Step 2: Evaluate the ConSim score with the `evaluate` method. It is an accuracy score between the \(\Psi\) and \(f\) predictions. Because the examples are deliberately selected, it cannot be compared to a natural accuracy on the dataset; it must instead be compared to a baseline (e.g. `L2_baseline_with_lp`).

Tip

We highly recommend running steps 1 and 2 several times with different seeds to get more statistically significant results; see the multi-seed sketch after the example below. The original paper[^1] used five different seeds.
[^1]: A. Poché, A. Jacovi, A. M. Picard, V. Boutin, and F. Jourdan. ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability. In Proceedings of the Association for Computational Linguistics (ACL), 2025.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model_with_split_points` | `ModelWithSplitPoints` | The model to explain. It is a wrapper around a model and a tokenizer to easily get activations. | required |
| `user_llm` | `LLMInterface \| None` | The LLM interface that will serve as the meta-predictor. If not provided, the user will have to call the ConSim prompts manually. If your preferred LLM API is not supported, you can implement your own LLM interface by subclassing `LLMInterface`; see the sketch after this table. | required |
| `activation_granularity` | `ActivationGranularity` | The granularity of the activations to use for the explanations. | required |
| `classes` | `list[str] \| None` | The names of the classes of the dataset. | `None` |
| `split_point` | `str \| None` | Where to split the model to explain. | `None` |
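If your preferred provider is not covered by the built-in interfaces (such as `OpenAILLM` in the example below), you can implement your own LLM interface. The exact abstract methods of `LLMInterface` are not reproduced on this page, so the following is only a hedged sketch: the import location and the `generate` method name are assumptions to be checked against the library's source.

```python
# Hypothetical sketch of a custom meta-predictor interface.
# ASSUMPTIONS: the import location and the method name `generate` are
# illustrative only; check interpreto's LLMInterface for the real API.
import requests

from interpreto import LLMInterface  # assumed import location


class MyHTTPLLM(LLMInterface):
    """Calls an in-house HTTP endpoint instead of OpenAI or HuggingFace."""

    def __init__(self, url: str):
        self.url = url

    def generate(self, prompt: str) -> str:  # assumed abstract method
        # forward the ConSim prompt to the endpoint and return the raw answer
        response = requests.post(self.url, json={"prompt": prompt})
        return response.json()["answer"]
```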
Attributes:
| Name | Type | Description |
|---|---|---|
| `classes` | `list[str] \| None` | The names of the classes of the dataset. |
| `prompt_types` | `type[PromptTypes]` | Enum of the possible prompt types to use. |
| `model_with_split_points` | `ModelWithSplitPoints` | The model to explain. It is a wrapper around a model and a tokenizer to easily get activations. |
| `split_point` | `str` | Where to split the model to explain. |
| `user_llm` | `LLMInterface \| None` | The LLM interface that will serve as the meta-predictor. If your preferred LLM API is not supported, you can implement your own LLM interface by subclassing `LLMInterface` (see the sketch above). |
TODO
validate example in practice
Examples:
Preamble to a metric, fit a concept explainer:
>>> import datasets
>>> from transformers import AutoModelForSequenceClassification
>>> from interpreto import ConSim, ModelWithSplitPoints, ICAConcepts, OpenAILLM
>>> from interpreto.concepts.metrics.consim import PromptTypes
>>>
>>> # ------------------------
>>> # Load a model and wrap it
>>> model_with_split_points = ModelWithSplitPoints(
... "textattack/bert-base-uncased-ag-news",
... split_points=["bert.encoder.layer.10.output"],
... model_autoclass=AutoModelForSequenceClassification, # type: ignore
... batch_size=4,
... )
>>>
>>> # --------------------------------------
>>> # Load a dataset and compute activations
>>> dataset = datasets.load_dataset("fancyzhx/ag_news")
>>> classes = ["World", "Sports", "Business", "Sci/Tech"]
>>> activations = model_with_split_points.get_activations(dataset["train"]["text"])
>>>
>>> # -------------------------
>>> # Fit the concept explainer
>>> concept_explainer_1 = ICAConcepts(model_with_split_points, nb_concepts=50)
>>> concept_explainer_1.fit(activations)
The three steps of ConSim:
>>> # ------------------------------------------------------------------
>>> # Step 0: Define the User-LLM and instantiate the ConSim metric
>>> user_llm = OpenAILLM(api_key="YOUR_OPENAI_API_KEY", model="gpt-4.1-nano")
>>> consim = ConSim(
... model_with_split_points,
... user_llm,
...     activation_granularity=ModelWithSplitPoints.activation_granularities.TOKEN,
... classes=classes,
... )
>>>
>>> # ----------------------------------------------
>>> # Step 1: Select interesting examples for ConSim
>>> samples, labels, predictions = consim.select_examples(
... dataset["train"]["text"], dataset["train"]["label"],
... )
>>>
>>> # -------------------------------------------------------------
>>> # Step 2: Evaluate the ConSim score, do not forget the baseline
>>> baseline = consim.evaluate(samples, predictions, prompt_type=PromptTypes.L2_baseline_with_lp)
>>> consim_score = consim.evaluate(samples, predictions, concept_explainer_1, prompt_type=PromptTypes.E3_global_and_local_concepts_with_lp)
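As the Tip above recommends, running steps 1 and 2 over several seeds and comparing the explained score to the baseline gives more stable estimates. The loop below is a sketch reusing the objects defined above; the averaging logic is illustrative, not part of the interpreto API:

>>> # ---------------------------------------------------------------
>>> # Optional: repeat steps 1 and 2 with different seeds and average
>>> # the gap between the explained score and the baseline
>>> gaps = []
>>> for seed in range(5):
...     samples, labels, predictions = consim.select_examples(
...         dataset["train"]["text"], dataset["train"]["label"], seed=seed,
...     )
...     baseline = consim.evaluate(samples, predictions, prompt_type=PromptTypes.L2_baseline_with_lp)
...     score = consim.evaluate(samples, predictions, concept_explainer_1, prompt_type=PromptTypes.E3_global_and_local_concepts_with_lp)
...     gaps.append(score - baseline)
>>> mean_gap = sum(gaps) / len(gaps)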
Source code in interpreto/concepts/metrics/consim.py
select_examples
¶
select_examples(inputs, labels, nb_lp_samples=20, nb_ep_samples=20, seed=0, batch_size=64, device=None)
Select examples for the ConSim metric. It first computes the model's predictions on the inputs.
Then, it selects nb_lp_samples + nb_ep_samples samples from the inputs.
The goal is to select uniformly across classes (with respect to the labels),
with as many samples where the initial model prediction is correct as where it is incorrect.
The selected samples are then randomly shuffled:
the first `nb_lp_samples` samples go to the learning phase and
the last `nb_ep_samples` samples go to the evaluation phase.
Therefore, there is no guarantee on the class or error distribution within each phase.
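The selection strategy described above can be pictured with the following standalone sketch. It is an illustration under assumptions (the helper takes precomputed predictions instead of running the model, and splits each class evenly between correct and incorrect predictions); it is not the library's implementation.

```python
import torch

def sketch_select_examples(inputs, labels, predictions,
                           nb_lp_samples=20, nb_ep_samples=20, seed=0):
    """Illustrative re-implementation of the described selection strategy."""
    generator = torch.Generator().manual_seed(seed)
    per_class = (nb_lp_samples + nb_ep_samples) // len(labels.unique())
    selected = []
    for c in labels.unique():
        class_idx = (labels == c).nonzero(as_tuple=True)[0]
        correct = class_idx[predictions[class_idx] == c]
        incorrect = class_idx[predictions[class_idx] != c]
        # balance correct and incorrect initial predictions within each class
        selected += correct[: per_class // 2].tolist()
        selected += incorrect[: per_class - per_class // 2].tolist()
    # shuffle, then split: first samples -> learning phase, last -> evaluation phase
    order = torch.randperm(len(selected), generator=generator).tolist()
    selected = [selected[i] for i in order]
    lp_and_ep = selected[:nb_lp_samples] + selected[-nb_ep_samples:]
    return ([inputs[i] for i in lp_and_ep],
            labels[lp_and_ep],
            predictions[lp_and_ep])
```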
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `inputs` | `list[str]` | The inputs to predict. | required |
| `labels` | `torch.Tensor` | The labels of the inputs. | required |
| `nb_lp_samples` | `int` | The number of samples to select for the learning phase. | `20` |
| `nb_ep_samples` | `int` | The number of samples to select for the evaluation phase. | `20` |
| `seed` | `int` | The seed to use for the random selection. | `0` |
| `batch_size` | `int` | The batch size to use for the predictions. | `64` |
| `device` | `torch.device \| str \| None` | The device to use for the predictions. | `None` |
Returns:
| Name | Type | Description |
|---|---|---|
| `interesting_samples` | `list[str]` | The interesting samples. |
| `labels` | `torch.Tensor` | The labels of the interesting samples. |
| `predictions` | `torch.Tensor` | The predictions of the model on the interesting samples. |
Source code in interpreto/concepts/metrics/consim.py
evaluate
¶
evaluate(interesting_samples, predictions, concept_explainer=None, concepts_interpretation=None, global_importances=None, prompt_type=E3_global_and_local_concepts_with_lp, anonymize_classes=False, importance_threshold=0.05)
Evaluate the ConSim metric, i.e. the accuracy of the `user_llm` predictions with respect to the model predictions.
First, local concept importances are computed via the `concept_explainer`.
Then a prompt is constructed by combining the different elements according to the `prompt_type`.
The prompt is sent to the `user_llm` and its predictions are extracted from the response.
Finally, the score is computed by comparing the model predictions with the `user_llm` predictions.
The prompts have five parts:

- Initial Phase (IP.1): the first part is the task description, a list of questions to ask the LLM.
- Initial Phase (IP.2): the second is a global concepts explanation of \(f\), listing the important concepts for each class.
- Learning Phase (LP.1): the third gives examples of samples and predictions from the model \(f\).
- Learning Phase (LP.2): the fourth is a local concepts explanation of \(f\), listing the important concepts in each example.
- Evaluation Phase (EP.1): the fifth is a list of samples on which the meta-predictor \(\Psi\) is asked to predict the model \(f\)'s predictions.

The answer of the LLM is a list of predictions, one per sample. ConSim compares these predictions to the model \(f\)'s predictions and computes the accuracy of the explanations.
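To make the structure concrete, here is a schematic sketch of how such a prompt could be assembled for the `E3_global_and_local_concepts_with_lp` type. The section texts are illustrative placeholders, not the exact prompt wording generated by the library:

```python
# Illustrative assembly of the five prompt parts for the E3 prompt type.
# The wording of each part is a placeholder, not interpreto's actual prompt.
task_description = "You will simulate a text classifier. Answer with one class per sample."  # IP.1
global_explanation = "Class 'Sports': important concepts #12 (games), #7 (teams), ..."       # IP.2
learning_examples = "Sample: '...' -> Prediction: Sports"                                    # LP.1
local_explanations = "Sample 1 activates concepts #12 (0.8) and #7 (0.3)"                    # LP.2
evaluation_samples = "1. '...'\n2. '...'"                                                    # EP.1

prompt = "\n\n".join([
    task_description,     # IP.1: task description
    global_explanation,   # IP.2: global concepts explanation (dropped for the baselines)
    learning_examples,    # LP.1: learning-phase samples and model predictions
    local_explanations,   # LP.2: local concepts explanations (only for E3)
    evaluation_samples,   # EP.1: samples the meta-predictor must label
])
```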
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `interesting_samples` | `list[str]` | The interesting samples. | required |
| `predictions` | `torch.Tensor` | The predictions of the model on the interesting samples. | required |
| `concept_explainer` | `ConceptAutoEncoderExplainer \| None` | The concept explainer. Can be `None` for the baseline. | `None` |
| `concepts_interpretation` | `dict[int, str] \| None` | The words that activate the concepts the most and the least. A dictionary with the concepts as keys and another dictionary as values; the inner dictionary has the words as keys and the activations as values. Can be `None` for the baseline. | `None` |
| `global_importances` | `dict[str, dict[int, float]] \| None` | The importance of the concepts for each class. A dictionary with the classes as keys and another dictionary as values; the inner dictionary has the concepts as keys and the importances as values. Can be `None` for the baseline. | `None` |
| `prompt_type` | `PromptTypes` | The type of prompt to use. The possible values are listed in the `PromptTypes` enum below. | `E3_global_and_local_concepts_with_lp` |
| `anonymize_classes` | `bool` | Whether to anonymize the classes. Class names are replaced by "Class_i", where i is the index of the class. This prevents the `user_llm` from solving the task by simply guessing the class. | `False` |
| `importance_threshold` | `float` | The threshold used to select the most important concepts for each class. It corresponds to the cumulative importance of the concepts to keep. | `0.05` |
Returns:
| Type | Description |
|---|---|
| `float \| None \| tuple[list[tuple[Role, str]], list[str]]` | Score, or prompts and model predictions. If a `user_llm` is set, the score (a float) is returned; otherwise the prompts and the model predictions are returned so that the user can query an LLM manually. |
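If ConSim was instantiated without a `user_llm`, `evaluate` returns the prompts and the model predictions instead of a score, so that the prompts can be sent to an LLM manually (see the constructor parameters). The sketch below is a hypothetical workflow: `my_llm_call` is a placeholder for your own client, and the exact structure of the returned prompts should be checked in the source.

```python
# Hypothetical manual workflow when ConSim was built without a user_llm.
def my_llm_call(messages: list[dict]) -> str:
    """Placeholder for your own LLM client; replace with a real API call."""
    raise NotImplementedError

# Per the return type above, the first element is a list of (Role, str) pairs
# and the second is the model's predictions on the evaluation samples.
prompts, model_predictions = consim.evaluate(samples, predictions, concept_explainer_1)

messages = [{"role": str(role), "content": text} for role, text in prompts]
llm_answer = my_llm_call(messages)  # parse the per-sample answers yourself,
                                    # then compare them to model_predictions
```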
Source code in interpreto/concepts/metrics/consim.py
interpreto.concepts.metrics.consim.PromptTypes
¶
Bases: Enum
There are six types of prompts, including two baselines and an upper bound:

Attributes:

| Name | Type | Description |
|---|---|---|
| `L1_baseline_without_lp` |  | IP.1 and EP.1 are included in the prompt. Only the task description, but no explanations or learning phase. |
| `E1_global_concepts_without_lp` |  | IP.1, IP.2, and EP.1 are included in the prompt. Only the task description and the global concepts explanation, but no learning phase. |
| `L2_baseline_with_lp` |  | IP.1, LP.1, and EP.1 are included in the prompt. Task description and learning phase, but no explanations. |
| `E2_global_concepts_with_lp` |  | IP.1, IP.2, LP.1, and EP.1 are included in the prompt. Task description, global concepts explanation, and learning phase, but no local concepts explanation. |
| `E3_global_and_local_concepts_with_lp` |  | IP.1, IP.2, LP.1, LP.2, and EP.1 are included in the prompt. Task description, learning phase, and both global and local concepts explanations. |
| `U1_upper_bound_concepts_at_ep` |  | IP.1, IP.2, LP.1, LP.2, EP.1, and EP.2 are included in the prompt. Same as `E3_global_and_local_concepts_with_lp`, but local concepts explanations (EP.2) are also provided for the evaluation-phase samples, making it an upper bound. |