📍 Interpreto Roadmap

Welcome to the roadmap of Interpreto 🪄. This document outlines the planned features and improvements for upcoming releases.

🧭 Upcoming Features

1. Evaluation Metrics for Attribution Methods

We plan to integrate a set of standardized evaluation metrics to assess the quality and reliability of attribution methods. We'll start by adding the insertion/deletion metrics.
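
As a rough illustration of what a deletion metric measures, here is a minimal sketch in plain Python. It is not Interpreto's API: `deletion_curve`, `area_under_curve`, `toy_score`, and the `[MASK]` placeholder are all hypothetical names chosen for this example, and `score_fn` stands in for any function mapping a token list to a model score. Tokens are masked from most to least attributed, and the area under the resulting score curve is reported (lower is better for deletion; an insertion metric would do the reverse, starting from a fully masked input).

```python
from typing import Callable, List, Sequence


def deletion_curve(
    tokens: Sequence[str],
    attributions: Sequence[float],
    score_fn: Callable[[List[str]], float],
    mask_token: str = "[MASK]",  # assumption: masking stands in for deletion
) -> List[float]:
    """Model score after masking the k most-attributed tokens, for k = 0..n."""
    if len(tokens) != len(attributions):
        raise ValueError("tokens and attributions must have the same length")
    # Indices sorted from highest to lowest attribution.
    order = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    current = list(tokens)
    curve = [score_fn(current)]
    for i in order:
        current[i] = mask_token
        curve.append(score_fn(current))
    return curve


def area_under_curve(curve: Sequence[float]) -> float:
    """Trapezoidal area with the x-axis normalised to [0, 1]."""
    steps = len(curve) - 1
    if steps <= 0:
        return float(curve[0]) if curve else 0.0
    return sum((curve[k] + curve[k + 1]) / 2 for k in range(steps)) / steps


def toy_score(tokens: List[str]) -> float:
    """Hypothetical 'model': fraction of the two sentiment words still present."""
    return sum(t in {"love", "great"} for t in tokens) / 2


tokens = ["i", "love", "this", "great", "film"]
faithful = [0.0, 0.9, 0.1, 0.8, 0.0]    # highlights the words the toy model uses
unfaithful = [0.9, 0.0, 0.8, 0.0, 0.1]  # highlights irrelevant words

auc_faithful = area_under_curve(deletion_curve(tokens, faithful, toy_score))
auc_unfaithful = area_under_curve(deletion_curve(tokens, unfaithful, toy_score))
print(auc_faithful, auc_unfaithful)
```

In this toy setting the faithful attributions yield the smaller deletion area, because the score collapses as soon as the two words the model relies on are masked.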

2. Gradient-based Attribution: Token and Word Granularity

We aim to enhance our gradient-based attribution methods (e.g., Integrated Gradients, Saliency) by adding:

Token-level attribution
Word-level attribution

These methods currently support only the ALL_TOKENS granularity, which returns one attribution score for every token in the sentence, including special tokens. Adding token-level and word-level granularity will align gradient-based methods with inference-based ones.
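
To make the difference between granularities concrete, the sketch below post-processes per-token scores in plain Python. It rests on several assumptions that are not confirmed by this roadmap: that token-level attribution simply drops special tokens, that word-level scores are obtained by summing sub-token scores, and that a `word_ids` list is available in the form Hugging Face fast tokenizers return from `BatchEncoding.word_ids()` (with `None` for special tokens). The function names `token_level` and `word_level` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def token_level(
    scores: Sequence[float], word_ids: Sequence[Optional[int]]
) -> List[float]:
    """Keep one score per real token; special tokens (word id None) are dropped."""
    return [s for s, w in zip(scores, word_ids) if w is not None]


def word_level(
    scores: Sequence[float], word_ids: Sequence[Optional[int]]
) -> List[float]:
    """Sum the scores of sub-tokens that belong to the same word."""
    totals: Dict[int, float] = {}
    for s, w in zip(scores, word_ids):
        if w is None:
            continue  # skip special tokens
        totals[w] = totals.get(w, 0.0) + s
    return [totals[w] for w in sorted(totals)]


# ALL_TOKENS-style input: one score per token, special tokens included.
tokens = ["[CLS]", "inter", "##pret", "##able", "models", "[SEP]"]
word_ids = [None, 0, 0, 0, 1, None]
all_token_scores = [0.05, 0.2, 0.3, 0.1, 0.25, 0.1]

print(token_level(all_token_scores, word_ids))  # four scores, one per real token
print(word_level(all_token_scores, word_ids))   # two scores, one per word
```

Summation is only one possible aggregation; taking the mean or the maximum over sub-tokens would be equally easy to swap in.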

3. Integration of VarGrad

We will implement VarGrad, a variance-based extension of gradient attribution. It computes the variance of gradient attributions across multiple noise-perturbed copies of the input, which captures uncertainty and is intended to make explanations more robust and stable.
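
The idea can be sketched in a few lines of plain Python. This is an illustrative toy, not the planned implementation: `vargrad` and `toy_grad` are hypothetical names, the "model" is a simple weighted sum of squares with an analytic gradient, and the noise level, sample count, and use of the population variance are arbitrary choices made for the example.

```python
import random
from statistics import pvariance
from typing import Callable, List, Sequence


def vargrad(
    grad_fn: Callable[[List[float]], List[float]],
    x: Sequence[float],
    n_samples: int = 50,
    sigma: float = 0.1,
    seed: int = 0,
) -> List[float]:
    """Per-feature variance of the gradient over Gaussian-perturbed inputs."""
    rng = random.Random(seed)
    grads = []
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        grads.append(grad_fn(noisy))
    # zip(*grads) regroups the samples feature by feature.
    return [pvariance(column) for column in zip(*grads)]


WEIGHTS = [3.0, 0.0, 1.0]  # toy model: f(x) = sum(w_i * x_i ** 2)


def toy_grad(x: List[float]) -> List[float]:
    """Analytic gradient of the toy model: df/dx_i = 2 * w_i * x_i."""
    return [2.0 * w * xi for w, xi in zip(WEIGHTS, x)]


scores = vargrad(toy_grad, [1.0, 1.0, 1.0])
print(scores)
```

With a real network, `grad_fn` would be replaced by a backward pass through the model; the feature with zero weight gets zero variance here because its gradient never changes under noise.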

4. LLM-Based Concept Interpretation: Theme Prediction

We will introduce a novel interpretation method for concept vectors using Large Language Models (LLMs):

The "theme prediction" method will generate human-readable summaries of concept vectors by prompting an LLM with the top-K examples (words, tokens, or sentences).
This provides a natural-language explanation of abstract concepts, improving transparency and user comprehension.
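
One way such a pipeline could look is sketched below in plain Python. Everything here is an assumption made for illustration: the function names, the prompt wording, the idea of ranking examples by an activation score, and the `llm` argument, which stands in for any callable that takes a prompt string and returns text. No particular LLM provider or Interpreto interface is implied.

```python
from typing import Callable, List, Sequence


def top_k_examples(
    examples: Sequence[str], activations: Sequence[float], k: int = 5
) -> List[str]:
    """Return the k examples with the highest concept activation."""
    ranked = sorted(zip(examples, activations), key=lambda p: p[1], reverse=True)
    return [example for example, _ in ranked[:k]]


def build_theme_prompt(examples: Sequence[str]) -> str:
    """Format the selected examples into a single prompt (hypothetical wording)."""
    bullet_list = "\n".join(f"- {example}" for example in examples)
    return (
        "The following examples most strongly activate one concept:\n"
        f"{bullet_list}\n"
        "Summarize their common theme in a few words."
    )


def predict_theme(
    examples: Sequence[str],
    activations: Sequence[float],
    llm: Callable[[str], str],  # placeholder for any text-generation callable
    k: int = 5,
) -> str:
    """Select the top-k examples, build the prompt, and ask the LLM for a theme."""
    return llm(build_theme_prompt(top_k_examples(examples, activations, k)))


# Usage with a stub in place of a real model call.
words = ["cat", "car", "dog", "tree"]
concept_activations = [0.9, 0.1, 0.8, 0.2]
print(build_theme_prompt(top_k_examples(words, concept_activations, k=2)))
```

Keeping the LLM behind a plain callable makes the selection and prompt-building steps easy to test without network access.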

🙌 Contribute

Want to help shape the future of Interpreto? Check out our contributing guide and feel free to open an issue or pull request!