ML Course
- How to formalize the data?
- 3 spaces: an input space X, a label space Y, a prediction space.
- e.g. classification of diabetes patients: X = R^2, Y = {yes, no}, prediction space = {yes, no}.
- The learner has a cost (loss) function ℓ : Y × Y → R.
- The learner generates a model (or prediction function) h : X → Y, x ↦ h(x).
- The empirical risk of a model h on a dataset is its average loss: R̂(h) = (1/n) Σ_i ℓ(h(x_i), y_i).
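A minimal sketch of this definition (hypothetical helper names, squared loss chosen as the example cost):

```python
import numpy as np

def squared_loss(y_pred, y_true):
    """Example cost function: l(y', y) = (y' - y)^2."""
    return (y_pred - y_true) ** 2

def empirical_risk(h, X, y, loss=squared_loss):
    """Average loss of the model h over the dataset (x_i, y_i)."""
    predictions = np.array([h(x) for x in X])
    return np.mean(loss(predictions, y))

# Toy usage: a constant predictor evaluated on three labeled points.
X = np.array([0.0, 1.0, 2.0])
y = np.array([0.1, 0.9, 2.2])
print(empirical_risk(lambda x: 1.0, X, y))
```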
- Linear regression as an ERM algorithm: here X = R, Y = R.
- The model we are learning is h_{a,b}(x) = a x + b.
- Usual formalization: squared loss ℓ(y, y') = (y - y')^2.
- ERM formulation: (â, b̂) = argmin_{a,b} (1/n) Σ_i (a x_i + b - y_i)^2.
- The loss function is strictly convex in a and b, so we can set the partial derivatives to zero. The solution to this system of equations gives (â, b̂) (derived in TD).
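Not written out in the notes (it is left for the TD), but a sketch assuming the standard closed form of simple least squares:

```python
import numpy as np

def fit_1d_least_squares(x, y):
    """Closed-form ERM solution for h_{a,b}(x) = a*x + b with squared loss:
    setting both partial derivatives to zero gives
    a = Cov(x, y) / Var(x) and b = mean(y) - a * mean(x)."""
    x_bar, y_bar = x.mean(), y.mean()
    a = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b = y_bar - a * x_bar
    return a, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.2, 1.9, 3.2])
print(fit_1d_least_squares(x, y))
```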
- Offset trick: in d dimensions, append a constant coordinate equal to 1 to each x_i, so the offset b becomes one more component of the weight vector a. This simulates the offset.
- With this trick our prediction function (for classification) becomes h_a(x) = sign(a^T x).
- 0-1 loss: ℓ(y, y') = 1 if y ≠ y', 0 else.
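A small sketch of the trick and the 0-1 loss (toy weights and points, purely illustrative):

```python
import numpy as np

def add_offset(X):
    """Append a constant 1 to each x_i, so a^T [x_i, 1] plays the role of a'^T x_i + b."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

def zero_one_loss(y_pred, y_true):
    """l(y, y') = 1 if y != y', 0 else."""
    return (y_pred != y_true).astype(float)

# Linear classifier h_a(x) = sign(a^T x) on augmented inputs.
X = np.array([[1.0, 2.0], [-1.0, 0.5], [2.0, -1.0]])
y = np.array([1, -1, 1])
a = np.array([1.0, -0.5, 0.2])            # weights for [x1, x2, offset]
preds = np.sign(add_offset(X) @ a)
print(zero_one_loss(preds, y).mean())     # empirical 0-1 risk
```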
- (Sketch: in regression and classification, a fit can be an under-fit (model not complex enough, high error even on the training points), a right fit, or an over-fit of the noisy data points.)
- Naïve vision. In regression, 3 cases of overfitting:
  - Benign
  - Tempered
  - Catastrophic
- When overfitting, the empirical risk is close to 0 but the generalization error is high (the error we get on future data).
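An illustrative sketch of this effect (synthetic data, polynomial fits; the degrees and noise level are arbitrary choices): a degree-9 polynomial on 10 points drives the training error close to 0 while the error on fresh data stays large.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
x_train = np.linspace(0, 1, n)
x_test = rng.uniform(0, 1, 200)
f = lambda x: np.sin(2 * np.pi * x)                   # true signal
y_train = f(x_train) + 0.3 * rng.normal(size=n)       # noisy training labels
y_test = f(x_test) + 0.3 * rng.normal(size=200)

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_mse, 4), round(test_mse, 4))
```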
- Assumptions:
  - Statistical, fixed design: the points x_i are fixed but the labels are random. For instance in linear regression, y_i = f(x_i) + ε_i with noise ε_i ~ N(0, σ^2), i.e. y_i ~ N(f(x_i), σ^2).
  - Random design (e.g. Naïve Bayes in ML): the x_i are not fixed either.
- Generalization error of a classifier h: c.e. = P_{X,Y}(h(X) ≠ Y); for regression, m.s.e. = E_{X,Y}[(h(X) - Y)^2].
- In fixed design, the error is averaged over the fixed points, e.g. c.e. = (1/n) Σ_i P(h(x_i) ≠ y_i^test).
- Expected MSE: EMSE = E_{y_1,...,y_n} E_{y_1^test,...,y_n^test}[(1/n) Σ_i (ĥ(x_i) - y_i^test)^2]; recall that y_i = f(x_i) + ε_i with ε_i a noise term.
- The more leaves a tree has, the better it fits the training points and the more it risks overfitting.
- This leads to the bias-variance decomposition.
II) Mitigating overfitting
- 2 options: reduce the bias, or reduce the variance.
1) Reduce the size of a tree / the number of leaves: this reduces the variance but increases the bias (hopefully less).
2) Build ensembles of classifiers: bias(ens(x)) = bias(h_k(x)) and Var(ens(x)) = (1/K) Var(h_k(x)) for K independent classifiers.
   SAME bias, SMALLER variance.
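A quick simulation of that claim (a toy setting where each "model" is the same target plus independent noise; the 1/K factor assumes the models' errors are independent):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_trials = 10, 100_000
target = 1.0

# Each "model" predicts target + independent noise (so they all share the same bias).
single = target + rng.normal(0.0, 1.0, size=n_trials)
ensemble = target + rng.normal(0.0, 1.0, size=(n_trials, K)).mean(axis=1)

print(single.mean(), ensemble.mean())   # same bias
print(single.var(), ensemble.var())     # variance drops by roughly 1/K
```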
3) Regularization: a way to reduce the complexity of the class of functions we learn.
- In parametrized regression (e.g. linear) we impose either soft or hard constraints on the norm of θ.
- Ivanov regularization (hard constraint): θ̂ = argmin_{||θ|| ≤ C} Σ_i (θ^T x_i - y_i)^2.
- Tikhonov regularization (soft constraint): θ̂_λ = argmin_θ Σ_i (θ^T x_i - y_i)^2 + λ ||θ||^2.
- Both are equivalent.
- Let us return to our 1D regression: θ̂_λ = argmin_θ Σ_i (θ x_i - y_i)^2 + λ θ^2.
- A small regularization is always better than no regularization at all (TP).
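A sketch of the 1D Tikhonov problem above, assuming a model through the origin so the closed form stays one line:

```python
import numpy as np

def ridge_1d(x, y, lam):
    """argmin_theta sum_i (theta*x_i - y_i)^2 + lam*theta^2
    has the closed form theta = sum(x*y) / (sum(x^2) + lam)."""
    return np.sum(x * y) / (np.sum(x ** 2) + lam)

x = np.array([0.5, 1.0, 2.0])
y = np.array([0.4, 1.1, 2.3])
print(ridge_1d(x, y, lam=0.0))   # ordinary least squares
print(ridge_1d(x, y, lam=1.0))   # regularized: shrunk toward 0
```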
Supervised generative learning: the Bayes classifier
- The classification setting: h : X → Y. Our goal is to learn h, which predicts the label of an example as well as possible.
- e.g. h(x) = yes if x is about sports, no otherwise.
- Empirical risk: R̂(h) = (1/n) Σ_i 1{h(x_i) ≠ y_i}.
- The true error of a classifier.
- Def: the true (generalization) error, or risk, of h is R(h) = E[ℓ(h(X), Y)], with (X, Y) random, drawn from P(X, Y), and h a measurable function X → Y.
- If X and Y are discrete: R(h) = Σ_{x,y} P(X = x, Y = y) ℓ(h(x), y); if X is continuous, the sum becomes an integral against the density.
- Here ℓ(h(X), Y) = 1{h(X) ≠ Y}, and since E[1_A] = P(A), R(h) = P(h(X) ≠ Y).
- (Plug-in method: estimate the distribution and plug the estimate into the optimal rule, see below.)
- Law of total expectation: E[g(X, Y)] = E_X[E_Y[g(X, Y) | X]].
- R(h) = E[1{h(X) ≠ Y}]
       = E_X[E_Y[1{h(X) ≠ Y} | X]]
       = E_X[P_Y(h(X) ≠ Y | X)]
       = E_X[1 - P(h(X) = Y | X)]
  (h(X) is deterministic once we condition on X.)
- h^opt ∈ argmin_{h measurable} R(h).
- h^opt(x) = argmax_{k ∈ Y} P(Y = k | X = x), and the corresponding Bayes risk is R(h^opt) = E_X[1 - max_k P(Y = k | X)].
- Another way to write h^opt:
  h^opt(x) = argmax_{k ∈ Y} P(Y = k | X = x)
           = argmax_{k ∈ Y} P(X = x | Y = k) P(Y = k) / P(X = x)
           = argmax_{k ∈ Y} P(X = x | Y = k) P(Y = k)     (P(X = x) does not depend on k)
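A toy numeric check of the optimal rule, assuming a small made-up joint distribution P(X, Y) (the table values are purely illustrative):

```python
import numpy as np

# Joint P(X, Y) over X in {0,1,2} (rows) and Y in {0,1} (columns); entries sum to 1.
P_xy = np.array([[0.20, 0.05],
                 [0.10, 0.25],
                 [0.05, 0.35]])

P_x = P_xy.sum(axis=1)                    # marginal P(X = x)
P_y_given_x = P_xy / P_x[:, None]         # posterior P(Y = k | X = x)

h_opt = P_y_given_x.argmax(axis=1)        # h_opt(x) = argmax_k P(Y = k | X = x)
bayes_risk = np.sum(P_x * (1 - P_y_given_x.max(axis=1)))   # E_X[1 - max_k P(Y = k | X)]
print(h_opt, bayes_risk)
```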
- Generative learning
- Our goal is to get estimates of P(X | Y) and P(Y).
- We have some data {(x_1, y_1), ..., (x_n, y_n)}.
- Let π = P(Y = yes) and 1 - π = P(Y = no), with π unknown; so P(y_i) = π if y_i = yes, 1 - π if y_i = no.
- Likelihood of the data: P(y_1, ..., y_n | π) = π^{N_yes} (1 - π)^{N_no}, with N_yes = #{i : y_i = yes} and N_no = #{i : y_i = no}.
- Maximum log-likelihood approach:
  π̂ = argmax_π log P(y_1, ..., y_n | π)
     = argmax_π N_yes log(π) + N_no log(1 - π)
- Setting the derivative to zero: N_yes/π - N_no/(1 - π) = 0, hence π̂ = N_yes / (N_no + N_yes) = N_yes / n.
- Example of plug-in estimates for a discrete feature with values {red, sun, ...}: P̂(red | yes) = 2/5, P̂(sun | yes) = 3/5, ...
- Prediction: ĥ(x) = argmax_{k ∈ Y} P̂(X = x | Y = k) P̂(Y = k). For k = yes, compute P̂(Y = yes) P̂(X = red | Y = yes), etc.
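A counting sketch of these plug-in estimates on made-up labeled data (the feature values and labels below are hypothetical, not the example from the lecture):

```python
import numpy as np

# Toy data: one categorical feature and a yes/no label.
X = np.array(["red", "red", "blue", "red", "blue", "blue"])
y = np.array(["yes", "yes", "yes", "no", "no", "no"])

classes, values = np.unique(y), np.unique(X)
prior = {c: np.mean(y == c) for c in classes}                      # pi_hat_c = N_c / n
cond = {c: {v: np.mean(X[y == c] == v) for v in values}            # P_hat(X = v | Y = c)
        for c in classes}

def predict(v):
    """Plug-in classifier: argmax_c P_hat(X = v | Y = c) * P_hat(Y = c)."""
    return max(classes, key=lambda c: cond[c][v] * prior[c])

print(predict("red"), predict("blue"))
```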
Generative learning for continuous data : Linear Discriminant Analysis (LDA)
I) Recap
- We have a dataset (x_1, y_1), ..., (x_n, y_n), with x_i ∈ X, y_i ∈ Y and Y = {c_1, ..., c_K}.
- LDA: we assume all (x_i, y_i) ~ P(X, Y).
- If P(·,·) were known, we would have the optimal Bayes classifier h*(x) = argmax_k P(X = x | Y = c_k) P(c_k), with K classes.
- With 2 classes: h*(x) = c_1 if P(x | c_1) P(c_1) ≥ P(x | c_2) P(c_2), c_2 else.
- In generative learning, we estimate P(x | c_k) and P(c_k), and plug them in to estimate h(x).
- Last week, we worked on the case where X is discrete.
- Now let X ∈ R with X ~ N(μ, σ^2): N(x | μ, σ^2) = (1 / (σ √(2π))) exp(-(x - μ)^2 / (2σ^2)).
- An easy extension to the multivariate case is to assume that each component X^(j) of X follows N(μ_j, σ_j^2) independently:
  P(x) = Π_j (1 / (σ_j √(2π))) exp(-(x^(j) - μ_j)^2 / (2σ_j^2)).
- Notation: X = (X^(1), ..., X^(d))^T ∈ R^d, μ ∈ R^d.
- For the general case of a multivariate Gaussian we need
  N(X | μ, Σ) = (1 / ((2π)^{d/2} det(Σ)^{1/2})) exp(-½ (X - μ)^T Σ^{-1} (X - μ)).
- The independent-components case above corresponds to Σ diagonal.
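A sketch evaluating this density directly from the formula and checking it against scipy (the numbers are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_density(x, mu, Sigma):
    """N(x | mu, Sigma) written exactly as in the formula above."""
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.8])
print(gaussian_density(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))   # same value, via scipy
```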
- Statistics: E[X] = μ, Cov(X^(i), X^(j)) = Σ_ij; in vector form, Var(X) = E[(X - μ)(X - μ)^T] = Σ.
- In 2D: Var(X^(1)) = σ_1^2, Var(X^(2)) = σ_2^2, Cov(X^(1), X^(2)) = ρ_12 σ_1 σ_2, so
  Σ = [ σ_1^2          ρ_12 σ_1 σ_2 ]
      [ ρ_12 σ_1 σ_2   σ_2^2        ]
- Let us analyse -½ (X - μ)^T Σ^{-1} (X - μ).
- Σ is real symmetric, so it can be decomposed as Σ = U D U^T with U orthonormal (columns = eigenvectors u_1, ..., u_d) and D diagonal (eigenvalues d_1, ..., d_d).
- Let Z = U^T (X - μ): Z is the vector X - μ expressed in the basis u_1, u_2, ..., u_d.
- Then -½ (X - μ)^T Σ^{-1} (X - μ) = -½ Z^T D^{-1} Z = -½ Σ_j z_j^2 / d_j.
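A quick numerical check of that change of basis (arbitrary Σ, μ and x):

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mu = np.array([0.0, 1.0])
x = np.array([1.5, -0.5])

d_vals, U = np.linalg.eigh(Sigma)       # Sigma = U diag(d_vals) U^T, U orthonormal
z = U.T @ (x - mu)                      # coordinates of x - mu in the eigenbasis

direct = (x - mu) @ np.linalg.solve(Sigma, x - mu)
via_eigen = np.sum(z ** 2 / d_vals)     # sum_j z_j^2 / d_j
print(direct, via_eigen)                # the two quadratic forms coincide
```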
- (Sketch: the level sets of P(x) are ellipses whose axes are the eigenvector directions of Σ; they are axis-aligned if the covariance matrix is diagonal, rotated otherwise.)
- If P(X^(1), ..., X^(d)) = N(X | μ, Σ), then the marginals (e.g. (X^(1), X^(2))) and the conditionals (e.g. (X^(1), X^(2)) | X^(3), ..., X^(d)) are also Gaussian.
II) LDA
- We need to approximate P(X | Y = c_k).
- Assumptions:
  - y ∈ {c_1, c_2}
  - P(X | c_1) and P(X | c_2) are Gaussians sharing the same covariance: Σ_1 = Σ_2 = Σ, i.e. P(x | c_k) = N(x | μ_k, Σ)
  - P(c_1) = π_1, P(c_2) = π_2
- (Sketch: 3 cases of the two class-conditional Gaussians and the resulting decision boundary.)
- If we are able to estimate (π̂_1, π̂_2, μ̂_1, μ̂_2, Σ̂), then we can compute the classifier: predict c_1 if P(x | c_1) π_1 ≥ P(x | c_2) π_2, c_2 otherwise.
- Taking logs of N(x | μ_k, Σ) π_k, the rule becomes:
  log π_1 - ½ (x - μ_1)^T Σ^{-1} (x - μ_1) ≥ log π_2 - ½ (x - μ_2)^T Σ^{-1} (x - μ_2)
- Expanding the quadratic forms, the x^T Σ^{-1} x terms cancel (same Σ on both sides), so the decision rule is linear in x.
- Write the parameters as θ = (π_1, ..., π_K, μ_1, ..., μ_K, Σ).
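A sketch of that decision rule once the parameters are given (toy values; the linear form w^T x + c is the result of the cancellation noted above):

```python
import numpy as np

def lda_decision(x, mu1, mu2, Sigma, pi1, pi2):
    """Predict c1 iff log(pi1) - 0.5*(x-mu1)^T Sigma^{-1} (x-mu1)
                    >= log(pi2) - 0.5*(x-mu2)^T Sigma^{-1} (x-mu2).
    After expansion this is the linear rule w^T x + c >= 0."""
    Sigma_inv = np.linalg.inv(Sigma)
    w = Sigma_inv @ (mu1 - mu2)
    c = -0.5 * (mu1 @ Sigma_inv @ mu1 - mu2 @ Sigma_inv @ mu2) + np.log(pi1 / pi2)
    return "c1" if w @ x + c >= 0 else "c2"

mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
Sigma = np.eye(2)
print(lda_decision(np.array([0.4, 0.2]), mu1, mu2, Sigma, pi1=0.5, pi2=0.5))
```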
- Maximum likelihood: let I_k = {i : y_i = c_k} and N_k = |I_k|.
- P(x_1, y_1, ..., x_n, y_n | θ) = Π_i P(x_i | y_i, θ) P(y_i | θ), with P(x_i | y_i = c_k, θ) = N(x_i | μ_k, Σ) (i.i.d. sample).
- θ̂ = argmax_θ log P(x_1, y_1, ..., x_n, y_n | θ)
     = argmax_θ Σ_k Σ_{i ∈ I_k} [ -½ (x_i - μ_k)^T Σ^{-1} (x_i - μ_k) ] - (n/2) log det(Σ) + Σ_k N_k log π_k
- For k ∈ {1, 2}: μ̂_k = (1/N_k) Σ_{i ∈ I_k} x_i, π̂_k = N_k / n, and Σ̂ is the pooled empirical covariance.
- This is how we get the linear classifier.
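A fitting sketch under these assumptions (per-class means, one pooled covariance, priors N_k / n); the data below is simulated only to exercise the function:

```python
import numpy as np

def fit_lda(X, y):
    """MLE under the LDA assumptions: class priors pi_k = N_k / n,
    per-class means mu_k, and one pooled covariance shared by all classes."""
    classes = np.unique(y)
    n, d = X.shape
    pis, mus = {}, {}
    Sigma = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        pis[c] = len(Xc) / n
        mus[c] = Xc.mean(axis=0)
        Sigma += (Xc - mus[c]).T @ (Xc - mus[c])
    return pis, mus, Sigma / n

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1.0, 0.0], 1.0, size=(50, 2)),
               rng.normal([-1.0, 0.0], 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(fit_lda(X, y))
```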
III) Beyond the LDA assumptions: Kernel Density Estimation
- LDA assumes each class is modeled by a single Gaussian.
- ĥ(x) = argmax_k ...