Machine Learning Explanations:
LIME framework
Giorgio Visani
About Me
Giorgio Visani
PhD Student @ Bologna University, Computer Science & Engineering
Department (DISI)
Data Scientist @ Crif S.p.A.
Find me on: LinkedIn, Bologna University, GitHub
Why do we need Explanations?
It is difficult to understand on what grounds the algorithm took its
decision.
Right to Explanation concept: each individual affected by an
algorithm’s decisions has the right to know the model’s rationale.
Especially in Europe, there are quite strict regulatory requirements for
using Machine Learning models in sensitive fields:
• GDPR [6]
• ”Ethical Guidelines for trustworthy AI” [5]
• Report from the ”European Banking Authority” [1]
Other purposes of explanations:
• Help Data Scientists better understand how the model behaves
(e.g. for better parameter tuning)
• Interpret the powerful patterns discovered by Machine
Learning models (in order to gain a better understanding of the
phenomenon)
Background on Explanations
LIME
Generation Step
Weighting Step
Local Model Step
How to use LIME?
Models in the space
Almost any kind of information can be encoded as numbers:
• Images: intensity of the colour for each pixel
• Words: transform them with word embeddings
• Categories: create a variable with one value per category
(see the encoding sketch at the end of this slide)
The Learning Model Y = f(X1, ..., Xp)
is a surface in the space of the variables: it encodes the relationship
between the variable of interest Y and the other variables X.
Any prediction technique builds the best function f(x) to approximate
our data, given its constraints. Some techniques produce a simple f(x)
and return the formula; black-box techniques create very
complicated functions and do not provide the mathematical formula.
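As a minimal illustration of the categorical encoding mentioned above (one column per category), here is a small sketch using pandas; the column name and values are illustrative only.

```python
# One-hot encoding: each category becomes its own 0/1 column.
import pandas as pd

df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})
encoded = pd.get_dummies(df, columns=["colour"])
# -> colour_blue, colour_green, colour_red columns with 0/1 values
```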
Giorgio Visani 2 / 23
Machine vs Statistical Learning I
Why is ML more powerful than classical statistical learning?
[Figure: Statistical Learning (Linear Regression, Logistic Regression) imposes
constraints on the shape of the fitted function; Machine Learning (1D and 2D
ML models) is more flexible and adapts better to the data.]
ML handles Interactions and Correlation between variables well.
Giorgio Visani 3 / 23
Machine vs Statistical Learning II
Statistical methods:
• Constraints on the shape
• Less powerful: they adapt less to the data
• Simplicity: simple surfaces
• Parametric: they provide the f formula, given by a set of parameters.
We can understand how the function behaves without looking at a
picture of the geometrical space!

ML models:
• No constraints on the shape
• Better prediction: they capture correlation and interaction
• Complex functions
• They do not provide the f formula!
With 1 or 2 variables we can draw the f surface; with more variables
we have no way to understand the surface.
Giorgio Visani 4 / 23
Interpretability Issue
Interpretability: “the ability to explain or to present the results, in
understandable terms, to a human” [4].
Why is ML difficult to interpret?
• No formula for the surface, or a formula that is too complex
• The only alternative is to inspect the graph of the surface,
but no graph is possible with more than two X variables.
Imagine such a complicated surface in a space of 50 variables:
impossible to represent.
Giorgio Visani 5 / 23
Interpretable Models
Decision Tree
Linear Regression
Giorgio Visani 6 / 23
Interpretable Tools
Two main approaches:
• Transparent ML models:
build powerful models with simple-to-understand formulas
• Post-Hoc Techniques:
to be used on difficult black-box models
Post-Hoc methods can be further sub-divided (taxonomy credits to [9]).
Giorgio Visani 7 / 23
Surrogate Models
Surrogate:
A simpler model that mimics the ML model on the geometrical space,
but remains understandable.
Global Surrogates mimic the ML model on the entire geometrical
space
Local Surrogates focus on a small region and approximate the ML
model only in that part.
Giorgio Visani 8 / 23
Model Agnostic Techniques
Exploit the geometrical foundations of Prediction Tools:
each model (be it Statistical or Machine/Deep Learning) tries to
approximate an unknown function f(x) in the R^(p+1) geometrical space
spanned by the p independent variables X and the dependent
variable Y.
The dataset observations are points lying on the surface f(x).
The model produces a function f'(x), built using the dataset points
(y, x).
Model Agnostic Tools: the general idea is to gain insights about the
function f'(x). This can be done for any model.
Giorgio Visani 9 / 23
Background on Explanations
LIME
Generation Step
Weighting Step
Local Model Step
How to use LIME?
LIME
Model agnostic, Local technique, developed in 2016 [8]
Objective: find the tangent plane to the ML surface at the point
(y_i, x_i) we want to explain.
The tangent formula is human-understandable and it should be a good approximation
of the ML function in the neighbourhood of (y_i, x_i) (Taylor Theorem).
Analytically unfeasible:
• we don’t have a parametric formulation of the ML function
• the ML surface may have a huge number of discontinuity points → non-differentiable
Solution: sample points on the ML surface,
approximate the tangent with a linear model
through the points (Ridge Regression), in the
neighbourhood of the reference individual.
Giorgio Visani 10 / 23
LIME Intuition
LIME’s goal is to find the tangent at a precise point (the reference
individual).
From the Taylor Theorem, we know that any sufficiently smooth function f can be
approximated using a polynomial. The approximation error depends on the distance from
the reference point and on the degree of the polynomial (a higher degree
ensures a lower error).
Taylor polynomial of degree 1:
$f_{\mathrm{Taylor}}(x) = f(x_0) + f'(x_0)\,(x - x_0) + O\big((x - x_0)^2\big)$
The tangent corresponds to the Taylor polynomial of degree 1, the
simplest one. Since it is the lowest polynomial degree, to obtain a good
approximation we should consider a small region around the reference point:
the smaller the region, the more accurate the linear approximation of f(x).
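As a toy illustration (not taken from the slides), consider f(x) = x^2 around the reference point x_0 = 1:
$f_{\mathrm{Taylor}}(x) = f(1) + f'(1)(x - 1) = 1 + 2(x - 1) = 2x - 1, \qquad \text{error} = x^2 - (2x - 1) = (x - 1)^2$
The error grows quadratically with the distance from x_0, which is exactly why the linear approximation is only trusted locally.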
Giorgio Visani 11 / 23
All in all, it is just about computing the tangent.
Why doesn’t LIME compute the tangent analytically?
It would only require calculating the derivative of f(x): simpler (no need for
the generation step) and less time-consuming.
Unfortunately, for an ML model we have no formula for f(x). Without the formula, it is
impossible to calculate the derivative!
LIME reconstructs a part of the f(x) function, using the generation
step, and approximates its derivative with Linear Regression.
Giorgio Visani 12 / 23
LIME in a nutshell
Giorgio Visani 13 / 23
Background on Explanations
LIME
Generation Step
Weighting Step
Local Model Step
How to use LIME?
Generation Step
LIME generates n points x'_i all over the R^p space of the X variables,
also in regions far away from our red (reference) point.
x'_i denotes the LIME-generated points, while x_i denotes the observations of the
original dataset.
We generate only the X values for the n points x'_i, so we are missing the Y
value of the new units. We therefore plug each x'_i into the ML model and
obtain its prediction for the new point: ŷ'_i. In this way we actually generate a
brand-new dataset.
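A minimal sketch of this generation step (illustrative helper names, not the library’s internals): sample new X values and let the black-box model supply the missing Y values.

```python
# Hypothetical sketch of the generation step: sample points in feature space,
# then label them with the black-box model's predictions.
import numpy as np

def generate_dataset(X_train, black_box_predict, n_samples=5000, seed=0):
    rng = np.random.default_rng(seed)
    mean, std = X_train.mean(axis=0), X_train.std(axis=0)
    # Draw each feature from a Gaussian fitted on the original data
    X_new = rng.normal(loc=mean, scale=std, size=(n_samples, X_train.shape[1]))
    y_new = black_box_predict(X_new)   # the ML model provides y-hat' for each x'
    return X_new, y_new                # the brand-new dataset used by LIME
```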
Giorgio Visani 14 / 23
How does LIME do the Sampling?
First of all, LIME standardizes all the features (using (x − mean(x)) / stdev(x)).
This is important to make the variables comparable.
Then we sample each variable separately, as if it were Gaussian.
This assumption is not a problem, because it only influences how the points
are placed in the geometrical space. In particular, we will have more points
around the variable mean, while the concentration decreases moving away
from it.
For categorical variables, we sample the category ID at random; the
probability of obtaining a given category is the same as in the original
dataset.
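A hedged sketch of this per-feature sampling (an illustrative helper, not the library’s own code): numeric features are drawn from a Gaussian in standardized space, categorical ones from the empirical category frequencies.

```python
# Numeric features: Gaussian sampling around the feature mean/std.
# Categorical features: random draw with the dataset's category frequencies.
import numpy as np

def sample_feature(column, n_samples, is_categorical, rng):
    if is_categorical:
        cats, counts = np.unique(column, return_counts=True)
        return rng.choice(cats, size=n_samples, p=counts / counts.sum())
    mu, sigma = column.mean(), column.std()
    z = rng.normal(size=n_samples)   # sample in standardized space
    return mu + sigma * z            # map back to the original scale

rng = np.random.default_rng(42)
income = sample_feature(np.array([25e3, 32e3, 47e3, 51e3]), 1000, False, rng)
sex = sample_feature(np.array(["M", "F", "F", "M"]), 1000, True, rng)
```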
Giorgio Visani 15 / 23
Why not just generate points close to the reference?
This is a delicate subject.
In principle, it would be better to
consider only the points in the
region of interest, although the
proper size of the region is not
fixed but depends on the reference
point.
In fact, the neighborhood should
include all the linear area of the
ML curve around the reference
point, therefore it depends on the
local curvature of f(x).
Different reference points have
different proper sizes for the local
linear region.
Figure 1: The best neighborhood size
depends on the reference point and
on the curvature of the ML function
around it.
Giorgio Visani 16 / 23
Background on Explanations
LIME
Generation Step
Weighting Step
Local Model Step
How to use LIME?
Weighting Step
Since we are not interested in far-away points (LIME is a local
method), we must ignore them. How to do it? LIME gives a weight to
each generated point, using a Gaussian (RBF) Kernel:
$\mathrm{RBF}\big(x^{(i)}\big) = \exp\left(-\,\frac{\big\|x^{(i)} - x^{(\mathrm{ref})}\big\|^{2}}{kw}\right)$
Giorgio Visani 17 / 23
The Kernel Width parameter
The Gaussian Kernel assigns a value in the range [0, 1]: the higher the value,
the closer the point is to the reference. The kernel width kw parameter
decides how large the circle of meaningful weights around the red dot is.
RBF Kernel formula:
$\mathrm{RBF}\big(x^{(i)}\big) = \exp\left(-\,\frac{\big\|x^{(i)} - x^{(\mathrm{ref})}\big\|^{2}}{kw}\right)$
The Kernel Width (kw) is the only free parameter; it defines the radius of the weights.
Thanks to the weights, we can tell whether a point is far away from or
close to the red dot.
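A minimal sketch of the weighting computation (illustrative code, not the library’s own; note that LIME’s Python implementation typically squares the kernel width in the denominator):

```python
# RBF weights for the generated points, following the kernel above.
import numpy as np

def rbf_weights(X_generated, x_ref, kw):
    sq_dist = ((X_generated - x_ref) ** 2).sum(axis=1)  # squared distances
    return np.exp(-sq_dist / kw)   # weights in (0, 1]; close points get ~1

X_gen = np.array([[0.1, 0.2], [2.5, -1.0], [0.0, 0.0]])
weights = rbf_weights(X_gen, x_ref=np.array([0.0, 0.0]), kw=1.0)
```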
Giorgio Visani 18 / 23
Background on Explanations
LIME
Generation Step
Weighting Step
Local Model Step
How to use LIME?
Local Explainable Model
As the last step, LIME uses a surrogate model to approximate the ML
model in the small region around our reference red dot, determined
by the weights.
We may choose any kind of explainable model for the approximation
(Decision Trees, Logistic Regression, GLM, GAM, etc.), although my
preference goes to Linear Regression (it can be viewed as the tangent
to the ML model).
The default surrogate model in LIME’s Python implementation is Ridge
Regression, which belongs to the Linear Regression class of models
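A hedged end-to-end sketch of this local-model step (names are illustrative): fit a weighted Ridge regression on the generated dataset, so that points near the reference dominate the fit.

```python
# Weighted Ridge surrogate: the sample weights come from the RBF weighting step.
from sklearn.linear_model import Ridge

def fit_local_surrogate(X_generated, y_black_box, weights, alpha=1.0):
    surrogate = Ridge(alpha=alpha)
    surrogate.fit(X_generated, y_black_box, sample_weight=weights)
    return surrogate   # surrogate.coef_ holds the local explanation
```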
Giorgio Visani 19 / 23
A brief digression on Ridge Regression
Ridge Regression is just a linear model: $E(Y) = \alpha + \sum_{j=1}^{d} \beta_j X_j$
But the coefficients are estimated using a penalty based on their
norm:
$\hat{\beta}_R = \left(X^{\top} X + \lambda I\right)^{-1} X^{\top} y$
Figure 2: Ridge line tangent to the ML model
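A small numeric sketch of the closed-form estimator above (toy data; weights and intercept omitted for brevity):

```python
# beta_hat = (X'X + lambda*I)^(-1) X'y on simulated data
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

lam = 1.0
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
```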
Giorgio Visani 20 / 23
Background on Explanations
LIME
Generation Step
Weighting Step
Local Model Step
How to use LIME?
How to use LIME: Feature Importance
The explainable model is usually
exploited to understand which
variables are the most important
for the ML prediction on the
specific individual
(the largest coefficients, in absolute value, highlight the more
important variables).
Figure 3: LIME trained on a Medical
Dataset. We can see which are the
major death-risk factors.
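A hedged sketch with the lime Python package, reading off the local feature importances for one individual (the dataset and model below are toy placeholders, not the medical example from the slide):

```python
# Train a toy black box, then ask LIME for the local explanation of one unit.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

explainer = LimeTabularExplainer(
    X_train, feature_names=["x1", "x2", "x3", "x4"], mode="classification"
)
exp = explainer.explain_instance(X_train[0], model.predict_proba, num_features=4)
print(exp.as_list())   # [(feature condition, local weight), ...]
```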
Giorgio Visani 21 / 23
How to use LIME: What-If Tool
LIME models can also be used to test what-if scenarios:
If I were to earn $500 more a year, how many points would I gain on
my credit score?
It is important to remember that the LIME model is valid only locally:
the scenario we test should not be too distant from our reference.
Such a what-if tool is available only for surrogate models: it cannot be
done with other explanation methods, such as the ones based on
feature attribution, because they do not rely on prediction models.
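A toy what-if sketch using the fitted local surrogate from the previous steps (assumes the weighted Ridge surrogate; the feature index and the 500 increment are illustrative, and any standardization applied during training must also be applied to the increment):

```python
# Effect of changing one feature, estimated with the local linear surrogate.
import numpy as np

def what_if(surrogate, x_ref, feature_idx, delta):
    x_new = x_ref.copy()
    x_new[feature_idx] += delta
    # For a linear surrogate this equals surrogate.coef_[feature_idx] * delta
    return surrogate.predict([x_new])[0] - surrogate.predict([x_ref])[0]

# e.g. score_change = what_if(surrogate, x_ref, feature_idx=2, delta=500.0)
```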
Giorgio Visani 22 / 23
References I
[1] European Banking Authority EBA. Report on Big Data and Advanced Analytics. In:
(2020).
[2] Damien Garreau and Ulrike Luxburg. Explaining the Explainer: A First Theoretical Analysis of
LIME. In: International Conference on Artificial Intelligence and Statistics. PMLR, 2020,
pp. 1287–1296. ISSN: 2640-3498.
[3] Riccardo Guidotti et al. Local Rule-Based Explanations of Black Box Decision Systems.
en. In: arXiv:1805.10820 [cs] (May 2018). arXiv: 1805.10820 [cs].
[4] Patrick Hall and Navdeep Gill. An Introduction to Machine Learning Interpretability-Dataiku
Version. O’Reilly Media, Incorporated, 2018.
[5] AI HLEG. Ethics Guidelines for Trustworthy AI. In: (2019).
[6] John Kingston. Using Artificial Intelligence to Support Compliance with the General Data
Protection Regulation. en. In: Artificial Intelligence and Law 25.4 (Dec. 2017), pp. 429–443.
ISSN: 0924-8463, 1572-8382. DOI: 10.1007/s10506-017-9206-9.
[7] Thibault Laugel et al. Defining Locality for Surrogates in Post-Hoc Interpretablity. In: arXiv
preprint arXiv:1806.07498 (2018). arXiv: 1806.07498.
[8] Marco Tulio Ribeiro, Sameer Singh and Carlos Guestrin. Why Should I Trust You?: Explaining
the Predictions of Any Classifier. In: Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 1135–1144. ISBN:
1-4503-4232-9.
[9] Gregor Stiglic et al. Interpretability of Machine Learning Based Prediction Models in
Healthcare. In: arXiv preprint arXiv:2002.08596 (2020). arXiv: 2002.08596.
Giorgio Visani 23 / 23