Given a classifier $b$ that outputs the decision $y = b(x)$ for an instance $x$, a counterfactual explanation consists of an instance $x'$ such that the decision for $b$ on $x'$ is different from $y$ (i.e. $b(x') \neq y$) and such that the difference between $x$ and $x'$ is minimal. ^[Guidotti - Counterfactual explanations and how to find them - literature review and benchmarking]
Note
Minimality can have different meanings according to the setting.
A counterfactual explainer, on the other hand, is a function $f_k$ that takes as input a classifier $b$, a set $X$ of known instances, which can be seen as the dataset, and a given instance $x$, and returns a set $C = f_k(x, b, X)$ of valid counterfactual examples, where $k$ is the number of counterfactuals required. ^[Guidotti - Counterfactual explanations and how to find them - literature review and benchmarking]
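The explainer interface above can be sketched as follows. This is a minimal illustration, not an actual implementation from the literature: the function name `explain` and the use of L2 distance as the notion of minimality are assumptions for the example.

```python
import numpy as np

def explain(b, X, x, k=1):
    """Illustrative counterfactual explainer interface (hypothetical name):
    given a classifier b, known instances X, and an instance x, return up to
    k valid counterfactuals, i.e. instances whose predicted class differs
    from b(x), ranked here by L2 distance to x (one notion of minimality)."""
    y = b(x)
    # Keep only instances the classifier labels differently from x.
    candidates = [c for c in X if b(c) != y]
    # Rank valid candidates by distance to x and return the k closest.
    candidates.sort(key=lambda c: np.linalg.norm(np.asarray(c) - np.asarray(x)))
    return candidates[:k]
```

Note that this sketch only selects counterfactuals from the known instances; many explainers instead generate new instances.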
Counterfactual explanations can be exploited to interpret the decisions returned by AI systems employed in various settings, such as classification, regression, knowledge engineering, planning and recommendation.
For a counterfactual to be good, it should satisfy a set of properties, such as plausibility, similarity, and diversity.
Properties of counterfactual explainers
- Efficiency
- Stability
- Fairness
Categorization of counterfactual explainers
#todo Insert this into Case-based vs Instance-based Counterfactual Generation.
Depending on the technique used to generate the counterfactuals, it is possible to categorize explainers into: #todo
- Optimization-based
- Heuristic Search Strategy-based
- Instance-based
- Decision Tree-based
Counterfactuals can be distinguished between:
- Model Agnostic, if the explainer can take in input any black-box model;
- Model Specific, if the explainer is only able to explain a specific black-box model.
and also:
- Data Agnostic, if the explainer can ingest any type of data;
- Data Specific, if the explainer can only ingest some specific types of data, such as images, text or tables.
Explainers are also divided into:
- Endogenous explainers, which return examples that are selected from the given dataset, or that use feature values from the given dataset;
- Exogenous explainers, which do not guarantee the presence of the generated feature values in the dataset, but they rely on instances which are obtained with interpolation of data and/or random data generation.
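The endogenous/exogenous distinction can be sketched as follows; both helpers are hypothetical illustrations, with the exogenous one using linear interpolation between the query and a differently-labeled dataset instance (one of the generation strategies mentioned above).

```python
import numpy as np

def endogenous_cf(b, X, x):
    """Endogenous: return the nearest dataset instance whose label differs
    from b(x); the result is guaranteed to come from X."""
    y = b(x)
    valid = [c for c in X if b(c) != y]
    return min(valid, key=lambda c: np.linalg.norm(np.asarray(c) - np.asarray(x)))

def exogenous_cf(b, X, x, steps=20):
    """Exogenous: interpolate between x and a differently-labeled dataset
    instance, returning the interpolated point closest to x that still flips
    the decision. The result need not appear in X."""
    target = np.asarray(endogenous_cf(b, X, x), dtype=float)
    x = np.asarray(x, dtype=float)
    y = b(x)
    for t in np.linspace(0.0, 1.0, steps + 1):
        cand = (1 - t) * x + t * target  # walk from x toward the target
        if b(cand) != y:
            return cand
    return target
```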
Techniques
Brute Force
Note for myself
In Thesis Notes it’s possible to see the brute force approach for sequential counterfactual examples, while in page=17 it’s possible to see the same approach but for non sequential counterfactuals. See if you need to insert them here to have a comparison or something.
RCE - Completely Pure Random
This technique is an alternative to brute force: it randomly varies randomly selected features and returns the counterfactual if it is valid. The approach has no guarantee of optimality.
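A minimal sketch of this random strategy, in the spirit of RCE (the exact perturbation scheme is an assumption here): perturb a random subset of features with Gaussian noise and stop at the first instance that flips the decision.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cf(b, x, n_trials=1000, scale=1.0):
    """Random counterfactual search sketch: repeatedly perturb a randomly
    selected subset of features with random noise and return the first
    perturbed instance that flips the decision. No optimality (minimality)
    guarantee; may return None if no valid counterfactual is found."""
    x = np.asarray(x, dtype=float)
    y = b(x)
    for _ in range(n_trials):
        cand = x.copy()
        # Pick a random non-empty subset of features to vary.
        k = rng.integers(1, len(x) + 1)
        idx = rng.choice(len(x), size=k, replace=False)
        cand[idx] += rng.normal(0.0, scale, size=k)
        if b(cand) != y:  # valid counterfactual found
            return cand
    return None
```

Because the search is undirected, the returned counterfactual may be far from $x$; this is exactly the lack-of-optimality issue noted above.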