Cours ML

Formalization of supervised learning

- How to formalize the data? 3 spaces:
  - an input space X
  - a label space Y
  - a prediction space
  e.g. for classification of diabetes patients: X = ℝ², Y = {yes, no}, prediction space = {yes, no}.
- The learner has a cost (loss) function ℓ : Y × Y → ℝ, (y, ŷ) ↦ ℓ(y, ŷ).
- The learner generates a model (or prediction function) h : X → Y, x ↦ h(x).
- The dataset is (x_1, y_1), ..., (x_n, y_n) with x_i ∈ X, y_i ∈ Y.

- The empirical risk of a model h on a dataset is its average loss:
  R̂(h) = (1/n) Σ_i ℓ(h(x_i), y_i)

Empirical Risk Minimization (ERM) principle

A learner which outputs ĥ ∈ argmin_h R̂(h) is said to follow ERM.
Ex: ordinary least squares is ERM; nearest neighbors is not.

[Sketch: scatter plot of the points (x_i, y_i) with the fitted line]

Linear regression as an ERM algorithm: here X = ℝ, Y = ℝ.
- The model we are learning is h_{a,b}(x) = a·x + b.
- Usual formalization, with the squared loss ℓ(y, ŷ) = (y − ŷ)².
- ERM formulation:
  (â, b̂) = argmin_{a,b} (1/n) Σ_i (a·x_i + b − y_i)²

How do we compute â and b̂?
- The loss function is strictly convex in a and b, so we set the partial derivatives to zero:
  ∂R̂/∂a = 0 and ∂R̂/∂b = 0.
- The solution of this system of equations gives â and b̂ (derivation in TD); b̂ is the offset (intercept).
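As a minimal sketch (not the course's notation), this closed form can be coded as follows; the data-generating line below is only an illustrative assumption:

import numpy as np

# 1D ordinary least squares: h_{a,b}(x) = a*x + b.
# Closed form obtained by setting the partial derivatives of the
# empirical risk (mean squared error) to zero.
def fit_1d_ols(x, y):
    x_bar, y_bar = x.mean(), y.mean()
    a_hat = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
    b_hat = y_bar - a_hat * x_bar   # the offset (intercept)
    return a_hat, b_hat

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=50)
y = 2.0 * x + 0.5 + rng.normal(scale=0.1, size=50)   # noisy linear data (assumed)
print(fit_1d_ols(x, y))   # close to (2.0, 0.5)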

The "homogeneous coordinate trick"
- Assume we add a dimension equal to one to each x_i.
- This simulates the offset: with this trick our prediction function becomes h_θ(x) = sign(θᵀx).
- In the ERM formulation:
  θ̂ = argmin_θ (1/n) Σ_i ℓ(h_θ(x_i), y_i), with ℓ(y, ŷ) = 1 if y ≠ ŷ, 0 else (the 0-1 loss).
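A small sketch of the trick, assuming NumPy arrays and labels in {-1, +1}:

import numpy as np

def add_ones(X):
    # Append a coordinate equal to 1 to each example x_i:
    # the last component of theta then plays the role of the offset.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def predict(theta, X):
    # Linear classifier with the offset absorbed into theta.
    return np.sign(add_ones(X) @ theta)

def empirical_risk_01(theta, X, y):
    # Average 0-1 loss of h_theta on the dataset.
    return np.mean(predict(theta, X) != y)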

Complexity of ERM for linear classification


Overfitting: intuition in regression and classification
① We have few points and try to fit the curve perfectly: we observe oscillations, and things go badly on future data (overfitting).
② The data might be noisy. Underfitting: the model is not complex enough and the error stays high. Best: the "right fit" in between, reaching low error without fitting the noise.
Naïve vision

In regression: 3 cases of overfitting:
- Benign
- Tempered
- Catastrophic
When overfitting, the empirical risk is close to 0 but the generalization error is high (the error we get on future data).

Statistical assumptions:
- Fixed design: the points x_i are fixed but the labels are random. For instance in linear regression,
  y_i = f(x_i) + ε_i with noise ε_i ~ N(0, σ²)   (i.e. y_i ~ N(f(x_i), σ²)).
- Random design (e.g. ML with Naïve Bayes): the x_i are not fixed; each example has been drawn from an unknown distribution.

Generalization error: given a classifier h,
- classification error: c.e. = P_{X,Y}(h(X) ≠ Y)
- mean squared error (regression): m.s.e. = E[(h(X) − Y)²]

Fixed design assumption:
- x_1, ..., x_n are fixed; for each x_i we have a distribution on y_i, e.g. y_i ~ N(f(x_i), 1).
- We draw random labels y_1, ..., y_n to train on, and new random labels y_1^test, ..., y_n^test to evaluate on.
- classification error: c.e. = (1/n) Σ_i P(h(x_i) ≠ y_i^test)

Now in regression (fixed design):
[Sketch: the curve f(x) with training points and a fitted curve]

Expected MSE:
  EMSE = E_{y_1 ... y_n} E_{y_1^test ... y_n^test} [ (1/n) Σ_i (ĥ(x_i) − y_i^test)² ]
recall that y_i = f(x_i) + ε_i with ε_i a noise.
(The more complex the model, e.g. the more leaves in a tree, the better you fit the training y_i.)

Bias-Variance decomposition:
  EMSE = Bias² + Variance + noise
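An illustrative simulation sketch of this decomposition under a fixed design; the sine target, the polynomial model and all names below are assumptions made for the example, not from the course:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)           # fixed design points
f = np.sin(3 * x)                    # true regression function (assumed)
sigma = 0.3                          # noise standard deviation
degree = 5                           # complexity of the fitted model

preds = []
for _ in range(2000):                # many draws of the training labels
    y_train = f + rng.normal(scale=sigma, size=x.size)
    coeffs = np.polyfit(x, y_train, degree)
    preds.append(np.polyval(coeffs, x))
preds = np.array(preds)

bias2 = np.mean((preds.mean(axis=0) - f) ** 2)
var = np.mean(preds.var(axis=0))
# Expected MSE against fresh test labels y^test = f + noise:
print(bias2, var, sigma ** 2, bias2 + var + sigma ** 2)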

II) Mitigating overfitting

2 options: reduce bias, reduce variance.
1) Control the complexity of the learned classifiers (e.g. limit the size of a tree, the number of leaves)
   → reduces the variance but increases the bias (hopefully less).
2) Build ensembles of classifiers:
   bias(ens(x)) = bias(h_k(x)),  Var(ens(x)) = (1/K) · Var(h_k(x))  (for independent members)
   → SAME bias, SMALLER variance.
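A quick illustrative check of the "same bias, smaller variance" claim, assuming K independent noisy predictors of the same quantity (all values below are made up for the example):

import numpy as np

rng = np.random.default_rng(0)
target, K, n_trials = 1.0, 10, 100_000

# Each base predictor = target + a common bias + independent noise.
single = target + 0.2 + rng.normal(scale=1.0, size=n_trials)
members = target + 0.2 + rng.normal(scale=1.0, size=(n_trials, K))
ensemble = members.mean(axis=1)       # average of K independent members

print(single.mean() - target, ensemble.mean() - target)   # same bias (~0.2)
print(single.var(), ensemble.var())                        # variance ~1.0 vs ~1.0/K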

3) Regularization: a way to reduce the complexity of the class of functions we learn.
In parametrized regression (e.g. linear) we will impose either hard or soft constraints on the norm of θ:
- hard constraint: θ̂ = argmin_{‖θ‖ ≤ C} (1/n) Σ_i (θᵀx_i − y_i)²   (Ivanov regularization)
- soft constraint: θ̂ = argmin_θ (1/n) Σ_i (θᵀx_i − y_i)² + λ‖θ‖²   (Tikhonov regularization)
Both are equivalent.
Let us return to our 1D regression:
  â^λ = argmin_a (1/n) Σ_i (a·x_i − y_i)² + λ·a²
A small regularization is always better than no regularization at all (TP).
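A minimal sketch of this 1D Tikhonov (ridge) estimator; the closed form â^λ = (Σ_i x_i y_i) / (Σ_i x_i² + n·λ) follows from setting the derivative to zero, and the data below are only an illustrative assumption:

import numpy as np

def fit_ridge_1d(x, y, lam):
    # Closed form of argmin_a (1/n) * sum_i (a*x_i - y_i)^2 + lam * a^2
    n = x.size
    return (x @ y) / (x @ x + n * lam)

rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = 1.5 * x + rng.normal(scale=0.5, size=30)
for lam in (0.0, 0.01, 0.1, 1.0):
    print(lam, fit_ridge_1d(x, y, lam))   # the estimate shrinks toward 0 as lam grows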
Supervised generative learning: the Bayes classifier

- The classification setting.
- The classifier / the model / the predictor: h : X → Y. Our goal is to learn h which predicts the label of an example as well as possible, e.g.
  h(x) = yes if x is about sports, no otherwise.
- Empirical risk: R̂(h) = (1/n) Σ_i 1[h(x_i) ≠ y_i]
- RDA: each labeled example (x_i, y_i) has been drawn independently from an unknown distribution P(X, Y).

- The true risk (or generalization error) of a classifier h is
  R(h) = E[ℓ(h(X), Y)], where (X, Y) is a random pair drawn from P(X, Y).
- If X and Y are discrete: R(h) = Σ_{x,y} P(x, y) · ℓ(h(x), y).
- Here ℓ(h(X), Y) = 1[h(X) ≠ Y], so the risk of h, noted R(h), is
  R(h) = E[1[h(X) ≠ Y]] = P(h(X) ≠ Y).

- Plug-in method: from the data, estimate P(Y | X = x), then classify with the estimation (plug-in estimation of the optimal classifier).
- The OBC (Optimal Bayes Classifier) is the function h* ∈ argmin_h R(h), with h a measurable function X → Y.

Reminder on expectations:
· E[1_A] = P(A)
· for any g, if Y is continuous: E[g(X, Y) | X = x] = ∫ g(x, y) p(y | x) dy
· total law of expectation: E[g(X, Y)] = E_X[ E[g(X, Y) | X] ]

R(h) = E[1[h(X) ≠ Y]]
     = E_X[ E[1[h(X) ≠ Y] | X] ]
     = E_X[ P(h(X) ≠ Y | X) ]
     = E_X[ 1 − P(h(X) = Y | X) ]
The minimizer can be determined pointwise when conditioning on X:
  h^opt ∈ argmin_{h measurable} R(h)
  h^opt(x) = argmin_k E[1[Y ≠ k] | X = x] = argmax_k P(Y = k | X = x)
R(h^opt) = E_X[ 1 − max_k P(Y = k | X) ] is the Bayes risk.
Another way to write h^opt:
  h^opt(x) = argmax_{k ∈ Y} P(Y = k | X = x)
           = argmax_{k ∈ Y} P(X = x | Y = k) · P(Y = k) / P(X = x)
           = argmax_{k ∈ Y} P(X = x | Y = k) · P(Y = k)   (generative formulation)

- Generative learning: our goal is to get estimates P̂(X | Y) and P̂(Y) of P(X | Y) and P(Y).
- Start with Y = {yes, no}.
- We have some data {(x_1, y_1), ..., (x_n, y_n)}.
- We will estimate P(Y) using a maximum likelihood approach.
- Random design ⇒ P(Y) is a Bernoulli.
- Let π = P(Y = yes); π is unknown.
- Likelihood of π on the data: L(π) = Π_i P(y_i | π).
- Max log-likelihood approach:
  π̂ = argmax_π log P(y_1, ..., y_n | π)
     = argmax_π Σ_i log P(y_i | π)
     = argmax_π [ N_yes · log π + N_no · log(1 − π) ],  with N_yes = #{i : y_i = yes}
  Setting the derivative to zero: N_yes/π = N_no/(1 − π)
  ⇒ π̂ = N_yes / (N_no + N_yes) = N_yes / n
Worked example (discrete features):
  P̂(red | yes) = 3/5,  P̂(sur | yes) = 1/5,  P̂(dom | yes) = 3/5
  P̂(red | no)  = 2/5,  P̂(sur | no)  = 3/5,  P̂(dom | no)  = 3/5
  ĥ(red, sur, dom) = argmax_{k ∈ Y} P̂(X = (red, sur, dom) | Y = k) · P̂(Y = k)
  For k = yes: P̂(Y = yes) · P̂(red | yes) · ...
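A sketch of this plug-in computation in Python; the prior of 1/2 per class and the naïve (per-feature) factorization of P̂(x | k) are illustrative assumptions that mirror the example above:

# Hypothetical estimates mirroring the worked example above.
prior = {"yes": 0.5, "no": 0.5}                     # P_hat(Y = k), assumed
cond = {                                            # P_hat(feature value | Y = k)
    "yes": {"red": 3/5, "sur": 1/5, "dom": 3/5},
    "no":  {"red": 2/5, "sur": 3/5, "dom": 3/5},
}

def predict(values):
    # Generative plug-in rule: argmax_k  P_hat(x | k) * P_hat(k),
    # with P_hat(x | k) taken as a product over the observed values.
    scores = {}
    for k in prior:
        p = prior[k]
        for v in values:
            p *= cond[k][v]
        scores[k] = p
    return max(scores, key=scores.get), scores

print(predict(["red", "sur", "dom"]))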
Generative learning for continuous data: Linear Discriminant Analysis (LDA)

I) Recap

· We have a dataset (x_1, y_1), ..., (x_n, y_n)
· x_i ∈ X, y_i ∈ Y with Y = {c_1, ..., c_K}
· RDA: we assume all (x_i, y_i) ~ P(X, Y)
· We are looking for a classifier h minimizing the risk P(h(X) ≠ Y)
· If P(·, ·) were known, we would have the optimal Bayes classifier
  h*(x) = argmax_k P(x | c_k) · P(c_k);
  with 2 classes:
  h*(x) = c_1 if P(x | c_1) P(c_1) ≥ P(x | c_2) P(c_2), c_2 else
· In generative learning, we estimate P̂(x | c_k) and P̂(c_k), then we compute ĥ(x)
· Last week, we worked on the case where X is discrete
· Today X = ℝ^d is continuous, and to approximate P(x | c_k) we use Gaussians

II) Gaussian distributions

· Let X ∈ ℝ be a random variable following a Gaussian distribution X ~ N(μ, σ²); its density is
  N(x | μ, σ²) = 1/(σ·√(2π)) · exp( −(x − μ)² / (2σ²) )
· Let X ∈ ℝ^d. An easy extension to the multivariate case is to assume that each component X^(j) of X follows independently N(μ_j, σ_j²):
  P(x) = Π_j N(x^(j) | μ_j, σ_j²) = 1 / ( (2π)^{d/2} Π_j σ_j ) · exp( −Σ_j (x^(j) − μ_j)² / (2σ_j²) )
  [Sketch: level sets of the density in the (X^(1), X^(2)) plane]
· For the general case of a multivariate Gaussian in ℝ^d we need a mean μ ∈ ℝ^d and a covariance matrix Σ ∈ ℝ^{d×d} (symmetric, real, semi-positive definite: eigenvalues ≥ 0):
  N(x | μ, Σ) = 1 / ( (2π)^{d/2} det(Σ)^{1/2} ) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )
  If Σ is diagonal we recover the independent-components case.
· Statistics: E[X] = μ, Cov(X^(j), X^(k)) = Σ_{jk}; in vector form, Var(X) = E[(X − μ)(X − μ)ᵀ] = Σ.
· In 2D, relation with the standard deviations and the correlation coefficient ρ_{1,2}:
  Σ = [ σ_1²              ρ_{1,2} σ_1 σ_2 ]
      [ ρ_{1,2} σ_1 σ_2   σ_2²            ]
  Var(X^(1)) = σ_1²,  Cov(X^(1), X^(2)) = ρ_{1,2} σ_1 σ_2
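A small sketch of evaluating this multivariate density in Python (names and the example numbers are illustrative):

import numpy as np

def gaussian_density(x, mu, sigma):
    # N(x | mu, Sigma) = exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu)) / ((2*pi)^{d/2} * det(Sigma)^{1/2})
    d = mu.size
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])        # symmetric positive definite covariance
print(gaussian_density(np.array([0.5, -1.0]), mu, sigma))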
· Let us analyse −½ (x − μ)ᵀ Σ⁻¹ (x − μ):
  Σ is real and symmetric, so it can be decomposed as Σ = U D Uᵀ with U orthonormal (its columns u_1, ..., u_d are the eigenvectors) and D diagonal (the eigenvalues).
  So Σ⁻¹ = (U D Uᵀ)⁻¹ = U D⁻¹ Uᵀ.
  Let z = Uᵀ(x − μ): z is the vector x − μ expressed in the basis u_1, u_2, ..., u_d.
  Then −½ (x − μ)ᵀ Σ⁻¹ (x − μ) = −½ zᵀ D⁻¹ z = −½ Σ_j z_j² / d_j.
  The level sets of P(x) are therefore ellipsoids whose axis directions are determined by the eigenvectors of the covariance matrix: axis-aligned if Σ is diagonal, rotated if it is not.
· If P(X^(1), ..., X^(d)) = N(x | μ, Σ), then the marginals (X^(1), ..., X^(k)) and the conditionals (X^(1), ..., X^(k) | X^(k+1), ..., X^(d)) are Gaussians.
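A quick numerical check of this change of basis, as a sketch with an arbitrary covariance:

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
sigma = A @ A.T + 3 * np.eye(3)          # a symmetric positive definite covariance
mu = np.zeros(3)
x = rng.normal(size=3)

eigvals, U = np.linalg.eigh(sigma)        # Sigma = U diag(eigvals) U^T, U orthonormal
z = U.T @ (x - mu)                        # coordinates of x - mu in the eigenbasis

quad_direct = (x - mu) @ np.linalg.solve(sigma, x - mu)
quad_eigen = np.sum(z ** 2 / eigvals)     # sum_j z_j^2 / d_j
print(quad_direct, quad_eigen)            # the two quadratic forms agree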

III) LDA

- We need to approximate P(X | Y = c_k).
Assumptions:
- Y = {c_1, c_2}
- P(X | c_1) and P(X | c_2) are Gaussians sharing the same covariance: Σ_1 = Σ_2 = Σ
  P(x | c_1) = N(x | μ_1, Σ),  P(x | c_2) = N(x | μ_2, Σ)
- P(c_1) = π_1,  P(c_2) = π_2
If we are able to estimate (π̂_1, π̂_2, μ̂_1, μ̂_2, Σ̂), then we can compute the classifier: predict c_1 if
  log( P(x | c_1) π_1 / (P(x | c_2) π_2) ) ≥ 0,  and c_2 otherwise.
  log( P(x | c_1) π_1 / (P(x | c_2) π_2) )
    = log( π_1 exp(−½ (x − μ_1)ᵀ Σ⁻¹ (x − μ_1)) ) − log( π_2 exp(−½ (x − μ_2)ᵀ Σ⁻¹ (x − μ_2)) )
    = −½ (x − μ_1)ᵀ Σ⁻¹ (x − μ_1) + ½ (x − μ_2)ᵀ Σ⁻¹ (x − μ_2) + log(π_1/π_2)
  The quadratic terms xᵀ Σ⁻¹ x cancel, so the decision function is linear in x.
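A sketch of this decision rule, assuming the parameters (π_1, π_2, μ_1, μ_2, Σ) have already been estimated (all names and the example values are illustrative):

import numpy as np

def lda_log_ratio(x, mu1, mu2, sigma, pi1, pi2):
    # log[ P(x | c1) * pi1 / (P(x | c2) * pi2) ] with a shared covariance Sigma.
    inv = np.linalg.inv(sigma)
    q1 = (x - mu1) @ inv @ (x - mu1)
    q2 = (x - mu2) @ inv @ (x - mu2)
    return -0.5 * q1 + 0.5 * q2 + np.log(pi1 / pi2)

def lda_predict(x, mu1, mu2, sigma, pi1, pi2):
    # Predict c1 when the log-ratio is >= 0, c2 otherwise.
    return "c1" if lda_log_ratio(x, mu1, mu2, sigma, pi1, pi2) >= 0 else "c2"

mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
print(lda_predict(np.array([0.8, 0.3]), mu1, mu2, np.eye(2), 0.5, 0.5))   # -> 'c1'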

IV) Estimating the parameters by maximum likelihood

θ = (π_1, π_2, μ_1, ..., μ_K, Σ)
Log-likelihood: ℓℓ(θ) = log p(x_1, y_1, ..., x_n, y_n | θ);  θ̂ = argmax_θ ℓℓ(θ).
Let I_k = {i : y_i = c_k} and N_k = |I_k|.
p(x_1, y_1, ..., x_n, y_n | θ) = Π_i p(x_i, y_i | θ)   (the sample is i.i.d.)
  = Π_i p(x_i | y_i, θ) · p(y_i | θ)
  = Π_i N(x_i | μ_{y_i}, Σ) · π_{y_i}
The log-likelihood decomposes into a term in π and, for each class, terms in μ_k and Σ:
  θ̂ = argmax_θ Σ_k [ −½ Σ_{i ∈ I_k} (x_i − μ_k)ᵀ Σ⁻¹ (x_i − μ_k) − (N_k/2) log det(Σ) + N_k log π_k ]
For k ∈ {1, 2}:  μ̂_k = argmax_{μ_k} Σ_{i ∈ I_k} −½ (x_i − μ_k)ᵀ Σ⁻¹ (x_i − μ_k)  ⇒  μ̂_k = (1/N_k) Σ_{i ∈ I_k} x_i
Doing similarly for Σ, we get  Σ̂ = (1/n) Σ_i (x_i − μ̂_{y_i})(x_i − μ̂_{y_i})ᵀ
(and π̂_k = N_k / n, as derived before)
⇒ We get the linear classifier.
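A sketch of these maximum-likelihood estimates in Python (the function name and data layout are illustrative assumptions):

import numpy as np

def fit_lda(X, y):
    # LDA maximum-likelihood estimates:
    # pi_k = N_k / n, mu_k = mean of class k, one covariance Sigma shared by all classes.
    n, d = X.shape
    pis, mus = {}, {}
    sigma = np.zeros((d, d))
    for k in np.unique(y):
        Xk = X[y == k]
        pis[k] = len(Xk) / n
        mus[k] = Xk.mean(axis=0)
        centered = Xk - mus[k]
        sigma += centered.T @ centered
    sigma /= n
    return pis, mus, sigma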
V) Beyond the LDA assumptions: Kernel Density Estimation (KDE)

· LDA assumes each class is modeled by one Gaussian.
· KDE assumes each class is modeled by many Gaussians (one per point!), centered on that point, with covariance matrix I_d.
Let X ∈ ℝ^d be a random vector of class c_k:
  LDA assumption:  p(x | c_k) = N(x | μ_k, Σ)
  KDE:             p(x | c_k) = (1/N_k) Σ_{i ∈ I_k} N(x | x_i, I_d)
  ĥ(x) = argmax_k p̂(x | c_k) · P̂(c_k)
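A sketch of such a KDE-based generative classifier; the bandwidth parameter h below is an illustrative addition (h = 1 recovers the identity covariance described above):

import numpy as np

def kde_class_density(x, Xk, h=1.0):
    # p_hat(x | c_k) = (1/N_k) * sum_{i in I_k} N(x | x_i, h^2 * Id)
    d = Xk.shape[1]
    quads = np.sum((Xk - x) ** 2, axis=1) / (h ** 2)
    norm = (2 * np.pi * h ** 2) ** (d / 2)
    return np.mean(np.exp(-0.5 * quads) / norm)

def kde_predict(x, X, y, h=1.0):
    # Generative rule: argmax_k  p_hat(x | c_k) * P_hat(c_k)
    scores = {k: kde_class_density(x, X[y == k], h) * np.mean(y == k)
              for k in np.unique(y)}
    return max(scores, key=scores.get)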
