
Machine Learning Handwritten Notes

The document discusses various machine learning techniques, particularly focusing on Ridge Regression and its role in reducing overfitting by introducing bias to the model. It explains the mathematical foundations of Ridge Regression, including cost functions and the impact of regularization on model performance. Additionally, it highlights the differences between Ridge and Lasso Regression, emphasizing their applications in feature selection and prediction accuracy.

Uploaded by

Tatiana Pará
Copyright
© All Rights Reserved

•• Download Machine Learning:

https://t.me/AIMLDeepThaught
•• Join me on LinkedIn for the latest updates on ML:
https://www.linkedin.com/groups/7436898/
Regularization

Sometimes when we train a model, it starts to overfit. A way to avoid overfitting the data, especially for models (like linear regression) that are heavily affected by outliers, is to use regularization. This leads to a more general model that is technically less accurate on the training data but generalizes to new data better.

Ridge (L2) Regression
(used to reduce overfitting)

An overfit line passes through the training points, so on the training data the cost function = 0.

Training data: low bias
Testing data: low/high variance

If the new data (test data) is near the best-fit line, the performance will be good (low variance).
When the test data is far (away) from the best-fit line, the performance will be bad (high variance).

Aim: to reduce overfitting. We create multiple candidate lines to improve performance on the test data.

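Cost function:

$J = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda(\text{slope})^2$

where $h_\theta(x) = \theta_0 + \theta_1 x$, slope $= \theta_1$, and $\lambda$ is a hyperparameter.

If multiple features are present, then $(\text{slope})^2 = \sum_j (\text{slope}_j)^2$, with one slope per feature.

When $\lambda = 0$ the cost function is the same as the linear regression cost function.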
Relationship between slope and $\lambda$:

[Figure: cost plotted against slope; the global minimum shifts towards zero as $\lambda$ grows.]

The global minimum gets shifted towards the left with an increase in $\lambda$.

Cost function $= 0 + \lambda(\text{slope})^2$

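Change the $\theta$ value to create another best-fit line.

Slope and $\lambda$ are inversely proportional.
Increasing $\lambda$ makes sure that our line doesn't overfit.

$\lambda$ is the parameter that controls the amount of shrinkage and hence the complexity of the model: the larger the value of $\lambda$, the greater the amount of shrinkage. The coefficients are shrunk towards zero, but a coefficient's value never becomes exactly zero.

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 = \theta_0 + 0.95x_1 + 0.82x_2 + 0.10x_3$

The weakly contributing feature $x_3$ keeps a small coefficient; it is shrunk but will not get deleted.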

Ridge Regression introduces a small amount of bias into the model so that it generalizes better to new data, trading slightly higher bias for lower variance.
This is useful if you don't have much training data.
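
The shrinkage of the slope with growing $\lambda$ can be checked numerically. The following is a minimal sketch, assuming a single feature and an unscaled cost $\sum (y - \theta_0 - \theta_1 x)^2 + \lambda\theta_1^2$ (dropping the $\frac{1}{2m}$ factor only rescales $\lambda$). The function name `ridge_fit` and the data are made up for illustration.

```python
def ridge_fit(xs, ys, lam):
    """Fit y = intercept + slope * x by minimizing
    sum((y - intercept - slope * x)**2) + lam * slope**2."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)            # lam only enlarges the denominator
    intercept = y_mean - slope * x_mean  # the intercept is not penalized
    return slope, intercept


# Hypothetical training data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in [0, 1, 10, 100]:
    slope, intercept = ridge_fit(xs, ys, lam)
    print(f"lambda={lam:>3}  slope={slope:.3f}  intercept={intercept:.3f}")
```

The printed slopes decrease as $\lambda$ grows but stay above zero, which matches the statement that Ridge shrinks coefficients without ever making them exactly zero.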

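Lasso Regression (L1 regularization, L1 norm)

It is used to reduce the number of features; it helps in feature selection.

Cost function:

$J = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\,|\text{slope}|$

[Figure: cost plotted against slope.]

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3 = \theta_0 + 0.54x_1 + 0.23x_2 + 0.10x_3$, where $x_3$ is the least correlated feature.

If the data has outliers, use Ridge Regression.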
Lasso: Least Absolute Shrinkage and Selection Operator Regression.

Lasso Regression tends to eliminate the weights of the least important features by setting their weights to zero.
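
Why Lasso can set a weight to exactly zero is easiest to see with one feature, where the minimizer has a closed form (soft-thresholding). This is a sketch under the assumption of an unscaled cost $\sum (y - \theta_0 - \theta_1 x)^2 + \lambda|\theta_1|$; the helper names and data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Move value toward zero by threshold; return 0.0 if it would cross zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_slope(xs, ys, lam):
    """Slope minimizing sum((y - intercept - slope * x)**2) + lam * abs(slope)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return soft_threshold(sxy, lam / 2.0) / sxx


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for lam in [0, 5, 100]:
    print(f"lambda={lam:>3}  slope={lasso_slope(xs, ys, lam):.3f}")
```

Once $\lambda/2$ exceeds the covariance term `sxy`, the slope is exactly 0.0, whereas the Ridge slope would only approach zero.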

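Elastic Net
A combination of L1 and L2 regularization.

Cost function $= \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda_1(\text{slope})^2 + \lambda_2|\text{slope}|$

(the $\lambda_1$ term is the L2 penalty, the $\lambda_2$ term is the L1 penalty)

The error term can be changed to MAE, RMSE or MSE.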
Notes taken from Josh Starmer's StatQuest YouTube videos.
Regularization: Ridge (L2) Regression

Least squares finds the line that results in the minimum sum of squared residuals.

We end up with the equation of the line:

Size = y-intercept + 0.75 × Weight   (0.75 is the slope)

When we have a lot of measurements, we can be fairly confident that the least squares line accurately reflects the relationship between size and weight.

But what if we only have two measurements? We fit a new line. Since the new line overlaps the two data points, the minimum sum of squared residuals = 0.

New line equation: Size = 0.4 + 1.3 × Weight

The sum of the squared residuals for the testing data is large, which means the new line has high variance. In machine learning terms, the new line is overfit to the training data.

The main idea behind Ridge Regression is to find a new line that doesn't fit the training data as well.

In other words, we introduce a small amount of bias into how the new line is fit to the data, but in return for that small amount of bias, we get a significant drop in variance.

Ridge Regression can provide better long-term predictions.

When least squares determines values for the parameters in this equation:

Size = y-axis intercept + slope × Weight

it minimizes the sum of the squared residuals.

In contrast, when Ridge Regression determines values for the parameters in this equation:

Size = y-axis intercept + slope × Weight

it minimizes

the sum of the squared residuals + $\lambda \times (\text{slope})^2$

The $(\text{slope})^2$ part adds a penalty to the traditional least squares method, and lambda ($\lambda$) determines how severe that penalty is.

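The sum of squared residuals for the least squares fit is 0 (because the line overlaps the data points), and the slope is 1.3. With $\lambda = 1$:

$0 + \lambda \times (1.3)^2 = 0 + 1 \times (1.3)^2 = 1.69$

For the blue (Ridge Regression) line, with slope 0.8:

$(0.3)^2 + (0.1)^2 + 1 \times (0.8)^2 = 0.74$ altogether

Red (least squares line): 1.69
Blue (Ridge Regression line): 0.74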
Thus, if we wanted to minimize the sum of the squared residuals plus the Ridge Regression penalty, we would choose the Ridge Regression line over the least squares line.

Without the small amount of bias that the penalty creates, the least squares fit has a large amount of variance.
In contrast, the Ridge Regression line, which has a small amount of bias due to the penalty, has less variance.

[Figure: Size plotted against Weight.]
This line suggests that for every one unit increase in weight, there is a one unit increase in predicted size.

If the slope of the line is steeper, then for every one unit increase in weight, the prediction for size increases by over two units.

In other words, when the slope of the line is steep, the prediction for size is very sensitive to relatively small changes in weight.

When the slope is small, then for every one unit increase in weight, the prediction for size barely increases.

In other words, when the slope of the line is small, predictions for size are much less sensitive to changes in weight.
[Figure: least squares line and Ridge Regression line.]

The Ridge Regression penalty resulted in a line that has a smaller slope, which means that predictions made with the Ridge Regression line are less sensitive to weight than those of the least squares line.

Ridge Regression (RR)

the sum of the squared residuals + $\lambda \times (\text{slope})^2$

$\lambda$ can be any value from 0 to positive infinity.

When $\lambda = 0$, the Ridge Regression line = the least squares line.
When $\lambda > 0$, the RR line ends up with a smaller slope than the least squares line.

And the larger we make $\lambda$, the closer the slope gets, asymptotically, to 0.

So, the larger $\lambda$ gets, the less sensitive our predictions for size become to weight.

So how do we decide what value to give $\lambda$?

We just try a bunch of values for $\lambda$ and use cross-validation, typically 10-fold cross-validation, to determine which one results in the lowest variance.
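
Trying a bunch of $\lambda$ values with cross-validation can be sketched as follows. This is an illustration only, assuming one feature, the closed-form one-feature Ridge fit, a simple interleaved fold assignment, and made-up data; `ridge_fit`, `cross_validated_error` and `choose_lambda` are hypothetical names.

```python
def ridge_fit(xs, ys, lam):
    """One-feature Ridge fit: minimizes sum of squared residuals + lam * slope**2."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, y_mean - slope * x_mean


def cross_validated_error(xs, ys, lam, k):
    """Mean squared prediction error over k folds (every k-th point held out)."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        held_out = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = ridge_fit(train_x, train_y, lam)
        squared_error += sum(
            (ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out
        )
    return squared_error / n


def choose_lambda(xs, ys, candidates, k=10):
    """Return the candidate lambda with the lowest cross-validated error."""
    return min(candidates, key=lambda lam: cross_validated_error(xs, ys, lam, k))


# Hypothetical data: a noisy line, for illustration only.
xs = [0.5 * i for i in range(20)]
ys = [1.0 + 0.8 * x + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(xs)]
candidates = [0.0, 0.1, 1.0, 10.0, 100.0]

best_lambda = choose_lambda(xs, ys, candidates, k=10)
print("chosen lambda:", best_lambda)
```

The held-out error stands in for the variance the notes refer to: the $\lambda$ whose fitted lines predict unseen points best is the one that is kept.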
Up until now RR was applied to a continuous variable (weight).

However, RR also works when we use a discrete variable:

Size = 1.5 + 0.7 × High Fat Diet

The y-intercept (1.5) corresponds to the average size of the mice on the Normal Diet. The diet difference (0.7) is the difference between the average sizes on the two diets, and the sum of these two is the prediction for the size of the mice on the High Fat Diet.

[Figure: Size for Normal Diet vs. High Fat Diet.]

When least squares determines the parameter values, the distances between the data and the means are minimized.
When RR determines values for the parameters in the equation, it minimizes

the sum of the squared residuals + $\lambda \times (\text{diet difference})^2$

When $\lambda = 0$, the RR fit equals the least squares fit.
When $\lambda$ gets large, the only way to minimize the whole equation is to shrink the diet difference down.

In other words, as $\lambda$ gets larger, our prediction for the size of the mice on the High Fat Diet becomes less sensitive to the difference between the Normal Diet and the High Fat Diet.

The whole point of doing RR is that small sample sizes like these can lead to poor least squares estimates that result in terrible machine learning predictions.

Ridge Regression can also be applied to Logistic Regression:

the sum of the likelihoods + $\lambda \times (\text{slope})^2$

Note: when applied to Logistic Regression, Ridge Regression optimizes the sum of the likelihoods instead of the squared residuals, because Logistic Regression is solved using maximum likelihood.

Ridge Regression helps reduce variance by shrinking parameters and making our predictions less sensitive to them.

In general, the RR penalty contains all of the parameters except for the y-intercept. For a model with a weight slope and a diet difference:

the sum of the squared residuals + $\lambda \times (\text{slope}^2 + \text{diet difference}^2)$

With only a single data point, least squares can't find a single optimal solution, since any line that goes through the dot will minimize the sum of the squared residuals.

But RR can find a solution with cross-validation and the RR penalty, which favors smaller parameter values:

the sum of the squared residuals + $\lambda \times (\text{slope})^2$

Summary
When the sample sizes are relatively small, RR can improve predictions made from new data (i.e. reduce variance) by making the predictions less sensitive to the training data.

The RR penalty itself is $\lambda$ times the sum of all squared parameters, except for the y-intercept, and $\lambda$ is determined using cross-validation.
Lasso Regression (L1)

Ridge Regression penalty: $\lambda \times (\text{slope})^2$

Lasso Regression minimizes:

the sum of all the squared residuals + $\lambda \times |\text{slope}|$

The Lasso Regression penalty contains all of the estimated parameters except for the y-intercept.

When Ridge and Lasso Regression shrink parameters, they don't have to shrink them all equally.
The big difference between Ridge and Lasso Regression is that Ridge Regression can only shrink the slope asymptotically close to 0, while Lasso Regression can shrink the slope all the way to 0.

Since Lasso Regression (LR) can exclude useless variables from equations, it is a little better than RR at reducing the variance in models that contain a lot of useless variables.

Elastic-Net Regression

Elastic-Net Regression starts with least squares, then combines the Lasso Regression penalty with the Ridge Regression penalty:

the sum of the squared residuals + $\lambda_1 \times (|\text{variable}_1| + \dots + |\text{variable}_n|) + \lambda_2 \times (\text{variable}_1^2 + \dots + \text{variable}_n^2)$

Note: the LR and RR penalties get their own $\lambda$s.

The hybrid Elastic-Net Regression is especially good at dealing with situations where there are correlations between parameters.

This is because, on its own, Lasso Regression tends to pick just one of the correlated terms and eliminate the others, whereas RR tends to shrink all of the parameters for the correlated variables together.

By combining LR and RR, Elastic-Net Regression groups and shrinks the parameters associated with the correlated variables and either leaves them in the equation or removes them all at once.
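
The Elastic-Net criterion is simply the least squares term plus both penalties, each with its own $\lambda$. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def elastic_net_cost(rows, ys, intercept, slopes, lam1, lam2):
    """Sum of squared residuals + lam1 * sum(|slope|) + lam2 * sum(slope**2).

    rows is a list of feature lists; the intercept is not penalized.
    """
    ssr = 0.0
    for row, y in zip(rows, ys):
        prediction = intercept + sum(s * v for s, v in zip(slopes, row))
        ssr += (y - prediction) ** 2
    lasso_penalty = lam1 * sum(abs(s) for s in slopes)
    ridge_penalty = lam2 * sum(s ** 2 for s in slopes)
    return ssr + lasso_penalty + ridge_penalty


# Hypothetical data that the chosen parameters fit exactly (residuals are 0).
rows = [[1.0, 2.0], [2.0, 0.0]]
ys = [5.0, 3.0]
print(elastic_net_cost(rows, ys, intercept=1.0, slopes=[1.0, 1.5], lam1=2.0, lam2=3.0))
```

Setting `lam2 = 0` gives the Lasso criterion and `lam1 = 0` gives the Ridge criterion, so both earlier methods are special cases.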
Logistic Regression
(classification problem)

Dataset columns: Study hours, Play hours, O/P (Pass/Fail).
Legible entries: 2 → Fail, 3 → Fail, 3 → Pass, and an outlier: 1 → Pass.

We cannot perform regression; we need to perform classification into Pass (1) / Fail (0).

A regression line can give an output > 1, but we have two conditions only, decided by a threshold: $y \geq 0.5 \Rightarrow 1$ and $y < 0.5 \Rightarrow 0$.

[Figure: best-fit line with the 0.5 threshold; after an outlier pulls the line, a student is classified as Fail although in reality it is Pass.]

Why do we use Logistic Regression when we can solve a classification problem using Linear Regression?

Due to outliers the best-fit line changes, and the results will be wrong.
We cannot always remove outliers.

In logistic regression the threshold can't be changed; once we squash ("cut") the line, outliers will not change it.

Squash ("cut") the best-fit line using the sigmoid function.

$h_\theta(x) = \theta_0 + \theta_1 x$

The sigmoid function output will range between 0 and 1.
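
The squashing step can be sketched in code: the straight line $\theta_0 + \theta_1 x$ is passed through the sigmoid so the output stays between 0 and 1, and the 0.5 threshold turns it into Pass/Fail. The parameter values below are made up for illustration and `predict_pass` is a hypothetical helper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_pass(study_hours, theta0, theta1, threshold=0.5):
    """Return (probability of Pass, predicted label) for one student."""
    probability = sigmoid(theta0 + theta1 * study_hours)
    return probability, int(probability >= threshold)


# Hypothetical parameters: the boundary sits at 4 study hours.
theta0, theta1 = -4.0, 1.0
for hours in [2, 4, 7, 30]:
    probability, label = predict_pass(hours, theta0, theta1)
    print(hours, round(probability, 3), "Pass" if label == 1 else "Fail")
```

Even an extreme input such as 30 hours only pushes the output towards 1; it cannot exceed it, which is what a straight line fails to guarantee.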

So, even if a person studies for 6 to 7 hrs, y < 0.5 and the model will show Fail.
We cannot use Linear Regression in this type of problem statement.

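Sigmoid Function:

Steps before applying the sigmoid function:
1. Create a best-fit line.
2. Squashing → sigmoid function.

Linear Regression cost function:

$J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$ (MSE), with $h_\theta(x) = \theta_0 + \theta_1 x$

Gradient descent works well on this cost because it is a convex function (one global minimum).

Logistic Regression cost function

Steps:
1. Create the best-fit line.
2. Apply squashing using the sigmoid function.

Starting from the same cost $J(\theta_0, \theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$, the hypothesis now uses the sigmoid function:

$h_\theta(x) = \sigma(\theta_0 + \theta_1 x)$

Let $z = \theta_0 + \theta_1 x$. Then

$h_\theta(x) = \sigma(z) = \frac{1}{1 + e^{-z}} = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x)}}$

But after applying the sigmoid function, the cost function becomes a non-convex function, and gradient descent has a chance of getting stuck in a local minimum.

We change the cost function to solve the convexity problem.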

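Log Loss Cost Function

$\text{cost}(h_\theta(x), y) = -\log(h_\theta(x))$ if $y = 1$
$\text{cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$ if $y = 0$

This is a convex function, with $h_\theta(x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x)}}$.

Combined into one expression:

$\text{cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$

This will never give a local minimum.

Minimize the cost function $J(\theta_0, \theta_1)$ by changing $\theta_0, \theta_1$.

Convergence Algorithm

Repeat until convergence {
$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$
}
for $j = 0$ and $j = 1$.

Take threshold = 0.5 by default.
Using the ROC and AUC curve, we can define the threshold.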
Performance Metrics

1. Confusion matrix
2. Accuracy
3. Precision
4. Recall
5. F-Beta Score

Dataset: features, O/P y, and ŷ (model prediction).

Confusion matrix:

                 Actual 1    Actual 0
Predicted 1      TP = 3      FP = 2
Predicted 0      FN = 1      TN = 1

TP: True Positive (correct prediction)
TN: True Negative (correct prediction)
FP: False Positive (wrong prediction)
FN: False Negative (wrong prediction)

Accuracy

Accuracy $= \frac{TP + TN}{TP + FP + FN + TN}$

Accuracy $= \frac{3 + 1}{3 + 2 + 1 + 1} = \frac{4}{7} \approx 57\%$

Dataset: binary classification with 1000 data points, imbalanced (900 of one class, 100 of the other).

A dumb model that always predicts the majority class gets 90% accuracy. Even though the accuracy is 90%, it is not sufficient; the model is not good. To overcome this problem we can use Recall and Precision.

Precision

Precision $= \frac{TP}{TP + FP}$

Out of all the values predicted as positive, how many are actually positive.

Our aim is to reduce False Positives (FP).

Example: mail is spam or ham.

TP: actual spam and predicted spam.
FN: actual spam and predicted ham (not good).
FP: actual ham (mail is not spam) and predicted spam (critical problem).
Focus on reducing FP.

Example: diabetes or not diabetes.
FN: actual diabetes and predicted not diabetes.
Critical problem: reduce FN.

Recall

Recall $= \frac{TP}{TP + FN}$

Out of all the actual positive values, how many are correctly predicted.

Example: "Tomorrow the stock market is going to crash."
Two points of view: for consumers FN matters; for companies FP matters, because on a predicted crash the company can take certain decisions (for example, selling shares at a discounted price).

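F-Beta Score

F-Beta Score $= (1 + \beta^2)\,\frac{\text{Precision} \times \text{Recall}}{\beta^2 \times \text{Precision} + \text{Recall}}$

1. If FP and FN are both important: $\beta = 1$

F1 Score $= \frac{2PR}{P + R}$

2. If FP is more important than FN: $\beta = 0.5$

F0.5 Score $= (1 + 0.25)\,\frac{PR}{0.25P + R}$

3. If FN >> FP (FP is less important than FN): $\beta = 2$

F2 Score $= (1 + 4)\,\frac{PR}{4P + R}$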
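
The metrics in this section can be computed directly from the four confusion-matrix counts. The sketch below uses the counts from the accuracy example (TP = 3, FP = 2, FN = 1, TN = 1); the function names are chosen for illustration.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Of everything predicted positive, the share that is actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Of everything actually positive, the share that was predicted positive."""
    return tp / (tp + fn)


def f_beta(p, r, beta):
    """F-beta score; beta < 1 favors precision, beta > 1 favors recall."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


tp, fp, fn, tn = 3, 2, 1, 1
p, r = precision(tp, fp), recall(tp, fn)
print("accuracy :", round(accuracy(tp, fp, fn, tn), 3))
print("precision:", round(p, 3), " recall:", round(r, 3))
for beta in [0.5, 1, 2]:
    print(f"F{beta}:", round(f_beta(p, r, beta), 3))
```

With precision 0.6 and recall 0.75, the F0.5 score lands closer to the precision and the F2 score closer to the recall, which is the weighting the three cases above describe.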

In-depth Insights: Logistic Regression

Classification

The classification problem is just like the regression problem, except that the values y we now want to predict take on only a small number of discrete values.

Binary classification: $y \in \{0, 1\}$, where 0 is the negative (-ve) class and 1 is the positive (+ve) class.
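Hypothesis Representation

Intuitively, it doesn't make sense for $h_\theta(x)$ to take values larger than 1 or smaller than 0 when we know that $y \in \{0, 1\}$.

To fix this, let's change the form of our hypothesis $h_\theta(x)$ to satisfy $0 \leq h_\theta(x) \leq 1$.
This is accomplished by plugging $\theta^T x$ into the logistic function.

Our new form uses the sigmoid function, also called the "logistic function":

$h_\theta(x) = g(\theta^T x)$, $z = \theta^T x$, $g(z) = \frac{1}{1 + e^{-z}}$

The function $g(z)$ maps any real number to the (0, 1) interval, making it useful for transforming an arbitrary-valued function into a function better suited for classification.

$h_\theta(x)$ will give us the probability that our output is 1. For example, $h_\theta(x) = 0.7$ means a 70% probability that the output is 1, and thus a 30% (0.3) probability that it is 0.

$h_\theta(x) = P(y = 1 \mid x; \theta) = 1 - P(y = 0 \mid x; \theta)$
$P(y = 0 \mid x; \theta) + P(y = 1 \mid x; \theta) = 1$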

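Decision Boundary

In order to get our discrete 0 or 1 classification, we can translate the output of the hypothesis function as follows:

$h_\theta(x) \geq 0.5 \rightarrow y = 1$
$h_\theta(x) < 0.5 \rightarrow y = 0$

The way our logistic function g behaves is that when its input is greater than or equal to zero, its output is greater than or equal to 0.5:

$g(z) \geq 0.5$ when $z \geq 0$

Remember:
$z = 0$: $e^{0} = 1 \Rightarrow g(z) = 1/2$
$z \to \infty$: $e^{-\infty} \to 0 \Rightarrow g(z) = 1$
$z \to -\infty$: $e^{\infty} \to \infty \Rightarrow g(z) = 0$

So if our input to g is $\theta^T x$, then that means:

$h_\theta(x) = g(\theta^T x) \geq 0.5$ when $\theta^T x \geq 0$

The decision boundary is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.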
Example: $\theta = [5, -1, 0]^T$

$y = 1$ if $5 + (-1)x_1 + 0 \cdot x_2 \geq 0$
$5 - x_1 \geq 0$
$x_1 \leq 5$

In this case our decision boundary is a straight vertical line placed on the graph where $x_1 = 5$; everything to the left of it denotes y = 1, while everything to the right denotes y = 0.
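
The decision rule for this example can be written out in a few lines. A minimal sketch, assuming $\theta = [5, -1, 0]$ and feature vectors that start with the constant $x_0 = 1$; `predict` is a hypothetical helper name.

```python
def predict(theta, x):
    """Return 1 when theta . x >= 0 (equivalently h(x) >= 0.5), else 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0


theta = [5.0, -1.0, 0.0]

# Each input is [x0, x1, x2] with x0 = 1; x2 has weight 0 and is ignored.
for x in ([1, 4, 10], [1, 5, -2], [1, 6, 0]):
    print(x, "->", predict(theta, x))
```

Only the value of $x_1$ relative to 5 matters here, which is why the boundary is a vertical line.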

Again, the input to the sigmoid function $g(z)$ (e.g. $\theta^T x$) doesn't need to be linear; it could be a function that describes a circle (e.g. $z = \theta_0 + \theta_1 x_1^2 + \theta_2 x_2^2$) or any shape that fits our data.

Cost Function

We cannot use the same cost function that we use for linear regression, because the logistic function will cause the output to be wavy, causing many local optima. In other words, it will not be a convex function.

Cost function for logistic regression:

$J(\theta) = \frac{1}{m}\sum_{i=1}^{m} \text{Cost}(h_\theta(x^{(i)}), y^{(i)})$
$\text{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$ if $y = 1$
$\text{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$ if $y = 0$
[Figure: $J(\theta)$ plotted against $h_\theta(x)$ for y = 1 and for y = 0.]

$\text{Cost}(h_\theta(x), y) = 0$ if $h_\theta(x) = y$
$\text{Cost}(h_\theta(x), y) \to \infty$ if $y = 0$ and $h_\theta(x) \to 1$
$\text{Cost}(h_\theta(x), y) \to \infty$ if $y = 1$ and $h_\theta(x) \to 0$

If our correct answer y is 0, then the cost function will be 0 if our hypothesis function also outputs 0. If our hypothesis approaches 1, then the cost function will approach infinity.

If our correct answer y is 1, then the cost function will be 0 if our hypothesis function outputs 1. If our hypothesis approaches 0, then the cost function will approach infinity.

Note that writing the cost function in this way guarantees that $J(\theta)$ is convex for logistic regression.

Simplified Cost Function and Gradient Descent

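We can compress our cost function's two conditional cases into one case:

$\text{Cost}(h_\theta(x), y) = -y\log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$

Thus, when we substitute y = 1 in the above equation, we get $\text{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$.
Similarly, we get the other term when y = 0.

We can write out our entire cost function as:

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right]$

A vectorized implementation is:

$h = g(X\theta)$
$J(\theta) = \frac{1}{m}\left(-y^T\log(h) - (1 - y)^T\log(1 - h)\right)$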

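Gradient Descent:

The general form of gradient descent is:

Repeat {
$\theta_j := \theta_j - \alpha\frac{\partial}{\partial\theta_j}J(\theta)$
}

We can work out the derivative part using calculus to get:

Repeat {
$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$
}

Notice that this algorithm is identical to the one we used in linear regression; only the definition of $h_\theta(x)$ is different. We still have to simultaneously update all values in theta.

A vectorized implementation is:

$\theta := \theta - \frac{\alpha}{m}X^T\left(g(X\theta) - y\right)$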
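
The cost and the update rule translate almost line for line into code. This is a sketch in plain Python rather than a vectorized implementation, run on a small made-up data set; the learning rate, iteration count and function names are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def hypothesis(theta, x):
    """h_theta(x) = g(theta . x); x must include the constant x0 = 1."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def cost(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for xi, yi in zip(X, y):
        h = min(max(hypothesis(theta, xi), eps), 1.0 - eps)
        total += -yi * math.log(h) - (1 - yi) * math.log(1.0 - h)
    return total / len(X)


def gradient_step(theta, X, y, alpha):
    """One simultaneous update of every theta_j."""
    m = len(X)
    errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
    return [
        t - alpha / m * sum(e * xi[j] for e, xi in zip(errors, X))
        for j, t in enumerate(theta)
    ]


# Hypothetical data: [x0, x1] with x0 = 1, label 1 for the larger x1 values.
X = [[1, 0.5], [1, 1.0], [1, 1.5], [1, 3.0], [1, 3.5], [1, 4.0]]
y = [0, 0, 0, 1, 1, 1]

theta = [0.0, 0.0]
initial_cost = cost(theta, X, y)
for _ in range(2000):
    theta = gradient_step(theta, X, y, alpha=0.1)
final_cost = cost(theta, X, y)
print("cost before:", round(initial_cost, 4), "after:", round(final_cost, 4))
```

The list comprehension in `gradient_step` computes all new values from the old `theta` before replacing it, which is the simultaneous update the text asks for.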

Advanced Optimization

Cost function $J(\theta)$: we want $\min_\theta J(\theta)$.

Given $\theta$, we have code that can compute
$J(\theta)$
$\frac{\partial}{\partial\theta_j}J(\theta)$ (for j = 0, 1, ..., n)

Optimization algorithms:
Gradient descent
Conjugate gradient
BFGS
L-BFGS

Advantages:
1. No need to manually pick $\alpha$.
2. Often faster than gradient descent.
Disadvantage: more complex to implement.

Multiclass Classification

Now we will approach the classification of data when we have more than two categories.
Instead of $y = \{0, 1\}$ we will expand our definition so that $y = \{0, 1, \dots, n\}$.

Since $y = \{0, 1, \dots, n\}$, we divide our problem into n + 1 (+1 because the index starts at 0) binary classification problems; in each one, we predict the probability that y is a member of one of our classes.

$h_\theta^{(0)}(x) = P(y = 0 \mid x; \theta)$
$h_\theta^{(1)}(x) = P(y = 1 \mid x; \theta)$
...
$h_\theta^{(n)}(x) = P(y = n \mid x; \theta)$
prediction $= \max_i h_\theta^{(i)}(x)$

We are basically choosing one class and then lumping all the others into a single second class. We do this repeatedly, applying binary logistic regression to each case, and then use the hypothesis that returned the highest value as our prediction.

One-vs-all (one-vs-rest)

[Figure: three classes (class 1, class 2, class 3), each separated from the rest by its own classifier.]

$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$ (i = 1, 2, 3)

To summarize:
Train a logistic regression classifier $h_\theta^{(i)}(x)$ for each class i to predict the probability that y = i.

On a new input x, to make a prediction, pick the class i that maximizes $h_\theta^{(i)}(x)$.
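
The prediction step of one-vs-all is just an argmax over the per-class classifiers. The sketch below assumes the classifiers have already been trained; the three parameter vectors are made up for illustration and `predict_one_vs_all` is a hypothetical name.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_one_vs_all(thetas, x):
    """Evaluate every class's classifier on x and pick the most confident one."""
    probabilities = [
        sigmoid(sum(t * xi for t, xi in zip(theta, x))) for theta in thetas
    ]
    best_class = max(range(len(thetas)), key=lambda i: probabilities[i])
    return best_class, probabilities


# Hypothetical, already-trained parameters for classes 0, 1 and 2.
thetas = [
    [2.0, -1.0, -1.0],  # class 0: points near the origin
    [-3.0, 1.0, 0.0],   # class 1: large x1
    [-3.0, 0.0, 1.0],   # class 2: large x2
]

for x in ([1, 0, 0], [1, 6, 0], [1, 0, 6]):  # x0 = 1 in every input
    print(x, "->", predict_one_vs_all(thetas, x)[0])
```

The individual probabilities do not need to sum to 1; only their ordering is used.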


Regularization

[Figure: three fits: underfitting (high bias), accurate, overfitting (high variance).]

If we have too many features, the learned hypothesis may fit the training set very well but fail to generalize to new examples.

There are two main options to address the issue of overfitting:

1. Reduce the number of features:
   - manually select which features to keep
   - use a model selection algorithm

2. Regularization:
   - keep all the features, but reduce the magnitude of the parameters $\theta_j$
   - regularization works well when we have a lot of slightly useful features
Cost Function

If we have overfitting from our hypothesis function, we can reduce the weight that some of the terms in our function carry by increasing their cost.

Say we wanted to make the following function more quadratic:

$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$

We want to eliminate the influence of $\theta_3 x^3$ and $\theta_4 x^4$. Without actually getting rid of these features or changing the form of our hypothesis, we can instead modify our cost function:

$\min_\theta \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2$

Now, in order for the cost function to get close to zero, we will have to reduce the values of $\theta_3$ and $\theta_4$ to near zero. This will in turn greatly reduce the values of $\theta_3 x^3$ and $\theta_4 x^4$ in our hypothesis function.

As a result, the new hypothesis looks like a quadratic function but fits the data better due to the extra small terms $\theta_3 x^3$ and $\theta_4 x^4$.

[Figure: Price against Size of House; the pink graph shows the hypothesis obtained from this complete equation, after penalizing $\theta_3$ and $\theta_4$ to make them really small.]

We could also regularize all of our theta parameters in a single summation as:

$\min_\theta \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$

The $\lambda$, or lambda, is the regularization parameter. It determines how much the costs of our theta parameters are inflated.

Using the above cost function with the extra summation, we can smooth the output of our hypothesis function to reduce overfitting.

If lambda is chosen to be too large, it may smooth out the function too much and cause underfitting.

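Regularized Linear Regression

Gradient Descent
We will modify our gradient descent function to separate out $\theta_0$ from the rest of the parameters, because we do not want to penalize $\theta_0$.

Repeat {
$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$
$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$, $j \in \{1, 2, \dots, n\}$
}

The term $\frac{\lambda}{m}\theta_j$ performs our regularization. With some manipulation, our update rule can also be represented as:

$\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$

The factor $1 - \alpha\frac{\lambda}{m}$ will always be less than 1. Intuitively, we can see it as reducing the value of $\theta_j$ by some amount on every update.
The second term is exactly the same as it was before.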
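
The rewritten update rule can be sketched as a single step function for regularized linear regression. The data and hyperparameters below are made up so that the data-fit gradient is zero and only the shrink factor acts; `regularized_step` is a hypothetical name.

```python
def regularized_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression.

    theta[0] (the bias) is updated without the shrink factor; every other
    theta_j is first multiplied by (1 - alpha * lam / m).
    """
    m = len(X)
    errors = [
        sum(t * xi for t, xi in zip(theta, row)) - yi for row, yi in zip(X, y)
    ]
    new_theta = []
    for j, theta_j in enumerate(theta):
        gradient = sum(e * row[j] for e, row in zip(errors, X)) / m
        shrink = 1.0 if j == 0 else 1.0 - alpha * lam / m
        new_theta.append(theta_j * shrink - alpha * gradient)
    return new_theta


# Hypothetical data that theta = [0, 2] fits exactly, so all errors are zero.
X = [[1.0, 1.0], [1.0, 2.0]]  # each row is [x0, x1] with x0 = 1
y = [2.0, 4.0]
theta = [0.0, 2.0]

print(regularized_step(theta, X, y, alpha=0.1, lam=1.0))
```

With zero error the slope still shrinks from 2 to about 1.9 (a factor of 1 - 0.1 × 1 / 2 = 0.95), while the bias term is left alone.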

Normal Equation

To add in regularization, the equation is the same as our original, except that we add another term inside the parentheses:

$\theta = \left(X^T X + \lambda \cdot L\right)^{-1} X^T y$

where $L = \text{diag}(0, 1, 1, \dots, 1)$.

L is a matrix with 0 at the top left and 1's down the diagonal, with 0's everywhere else. It should have dimension (n+1) × (n+1). Intuitively, this is the identity matrix (though we are not including $x_0$), multiplied by a single real number $\lambda$.

Recall that if m < n, then $X^T X$ is non-invertible. However, when we add the term $\lambda \cdot L$, then $X^T X + \lambda \cdot L$ becomes invertible.
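Regularized Logistic Regression

Cost Function
Recall that our cost function for logistic regression was:

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right]$

We can regularize this equation by adding a term to the end:

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h_\theta(x^{(i)})) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

The second sum, $\sum_{j=1}^{n}\theta_j^2$, is meant to explicitly exclude the bias term $\theta_0$. I.e. the $\theta$ vector is indexed from 0 to n (holding n + 1 values, $\theta_0$ through $\theta_n$), and this sum explicitly skips $\theta_0$ by running from 1 to n.
Thus, when running gradient descent, we should continuously update the two following equations:

Gradient Descent

Repeat {
$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$
$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]$, $j \in \{1, 2, \dots, n\}$
}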
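Support Vector Machine (SVM)

SVM can solve both classification and regression problems.

1. Classification: SVC (Support Vector Classifier)
2. Regression: SVR (Support Vector Regressor)

Some basics

Equation of a line:

$y = mx + c$ OR
$y = \beta_0 + \beta_1 x$ OR
$ax + by + c = 0$, i.e. $y = -\frac{a}{b}x - \frac{c}{b}$ (coefficient $-\frac{a}{b}$, intercept $-\frac{c}{b}$)

With two features: $w_1 x_1 + w_2 x_2 + b = 0$, i.e. $w^T x + b = 0$.

If the line passes through the origin, b = 0 and $w^T x = 0$ (matrix multiplication):

$\begin{bmatrix} w_1 & w_2 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 0$ ($w^T$: w transposed)

The equation of a line passing through the origin is $w^T x = 0$.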

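[Figure: a point P at angle $\theta$ to the normal vector w of the plane $\pi$ (a line in 2D, a plane in 3D); the distance is measured at an angle of 90° to the plane.]

We have to find the distance of a point from the plane.

Distance of a point to the plane:

distance $d = \frac{w^T P}{\|w\|} = \frac{\|w\|\,\|P\|\cos\theta}{\|w\|}$

where $w^T$ is the transpose of the vector w and $\|w\|$ is the magnitude of w.

Unit vector: a vector which has a magnitude of 1 is called a unit vector.

Example: for $d = (3, 4)$, $\|d\| = \sqrt{3^2 + 4^2} = \sqrt{25} = 5$.
Unit vector: $\hat{d} = \frac{d}{\|d\|} = \left(\frac{3}{5}, \frac{4}{5}\right)$, and $\sqrt{(3/5)^2 + (4/5)^2} = 1$.

A unit vector is a way to focus on direction, not on magnitude.

Dot product of vectors:
$w^T P = \|w\|\,\|P\|\cos\theta$

Point above the plane: $\theta < 90°$, so $\cos\theta$ will always be +ve.
Point below the plane: $\theta > 90°$, so $\cos\theta$ will always be -ve.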
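
The unit-vector example and the sign of the distance can be checked with a few helper functions. A minimal sketch for a plane through the origin ($w^T x = 0$); the function names and the test points are chosen for illustration.

```python
import math


def magnitude(v):
    """Euclidean length ||v||."""
    return math.sqrt(sum(c * c for c in v))


def unit_vector(v):
    """Same direction as v, magnitude 1."""
    m = magnitude(v)
    return [c / m for c in v]


def signed_distance(w, p):
    """w . p / ||w||: distance of p from the plane w . x = 0.

    Positive when p lies on the side w points to, negative on the other side.
    """
    return sum(wi * pi for wi, pi in zip(w, p)) / magnitude(w)


print(unit_vector([3, 4]))                  # [0.6, 0.8]
print(signed_distance([0, 2], [5, 3]))      # 3.0, above the plane
print(signed_distance([0, 2], [5, -3]))     # -3.0, below the plane
```

The sign of the result is the sign of $\cos\theta$, so it tells on which side of the plane the point lies.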

For any point falling above the plane, $\theta$ must be less than 90° (+ve value).

For any point falling below the plane, $\theta$ must be greater than 90° (-ve value).

If w is a downward vector, the signs are reversed:

Point below the plane: $\theta < 90°$, so $\cos\theta$ will always be +ve.

Point above the plane: $\theta > 90°$, so $\cos\theta$ will always be -ve.

Geometric Intuition Behind Support Vector Machine

Support Vector Classifier (SVC)

[Figure: best-fit line (hyperplane) with a marginal plane on each side. Support vectors are the points crossed by the marginal planes. The marginal planes are equidistant from the best-fit line, the distance d between them should be maximum, and there can be more than one support vector.]

There will be many possible hyperplanes that separate the different classes.

We have learnt in logistic regression that the probability of a point belonging to any class, given that it is very close to the hyperplane, will be close to 0.5.

So, we want a hyperplane that separates the (+ve) points and the (-ve) points as far away as possible.

Key idea of SVM: such a hyperplane is called the margin-maximizing plane.

Hard margin and soft margin
Marginal plane: the line that passes through the nearest points.

Hard margin: in a hard margin, we will not find any errors, and we will be able to clearly separate all the points by using the marginal planes.

But in the real world there will be many overlaps with many errors, so such marginal planes are called a soft margin.

The marginal planes should be equidistant from the best-fit line.


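SVM Mathematical Intuition

[Figure: +ve and -ve points separated by the hyperplane $w^T x + b = 0$, with marginal planes $w^T x + b = +1$ and $w^T x + b = -1$.]

For a point $x_1$ on the +ve marginal plane and a point $x_2$ on the -ve marginal plane:

$w^T x_1 + b = +1$
$w^T x_2 + b = -1$

Subtracting: $w^T(x_1 - x_2) = 2$

Dividing by $\|w\|$ (so that $w / \|w\|$ is a unit vector):

$\frac{w^T}{\|w\|}(x_1 - x_2) = \frac{2}{\|w\|}$

d = distance between the marginal planes $= \frac{2}{\|w\|}$, which should be maximum.

Cost Function

We have to maximize the value of $\frac{2}{\|w\|}$ by changing the values of w and b, subject to the constraint that all points are correctly classified:

$y_i(w^T x_i + b) \geq 1$

$\max_{w,b} \frac{2}{\|w\|} \iff \min_{w,b} \frac{\|w\|}{2}$

Loss functions focus on minimization, so we use the minimization form.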

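Cost function (soft margin)

Minimize by changing (w, b):

$\min_{w,b} \frac{\|w\|}{2} + C\sum_{i=1}^{n}\xi_i$

The second term is the hinge loss, which gives the soft margin.

where
C: how many points we can ignore for misclassification (hyperparameter).
$\sum \xi_i$ (labeled eta in the notes): summation of the distances of the incorrect data points from the marginal plane.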
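
The soft-margin cost can be evaluated for a given (w, b) as follows. This sketch keeps the $\frac{\|w\|}{2}$ form written in the notes (many references use $\frac{\|w\|^2}{2}$ instead) and computes each slack as the hinge loss $\max(0, 1 - y_i(w^T x_i + b))$; the function name and the points are made up.

```python
import math


def soft_margin_cost(w, b, points, labels, C):
    """Return (||w|| / 2 + C * sum of slacks, list of slacks).

    Each slack is the hinge loss max(0, 1 - y * (w . x + b)): zero for points
    on the correct side of their marginal plane, positive otherwise.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    slacks = [
        max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, y in zip(points, labels)
    ]
    return norm_w / 2.0 + C * sum(slacks), slacks


# Hypothetical 2D points; labels are +1 or -1.
points = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0], [-3.0, 0.0]]
labels = [1, 1, 1, -1]

cost, slacks = soft_margin_cost([1.0, 0.0], 0.0, points, labels, C=1.0)
print(slacks)  # [0.0, 0.5, 2.0, 0.0]
print(cost)    # 3.0
```

A larger C makes every unit of slack more expensive, so the optimizer tolerates fewer margin violations.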

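Support Vector Regressor:

Problem statement: based on the size of the house, we have to predict the price of the house.

[Figure: Price against Size, with the best-fit line $w^T x + b$ and the marginal planes $w^T x + b + \epsilon$ and $w^T x + b - \epsilon$; $\epsilon$ (epsilon) is the marginal error.]

Cost function:

minimize $\frac{\|w\|}{2} + C\sum_{i=1}^{n}\xi_i$

Constraint: $|y_i - w_i x_i| \leq \epsilon + \xi_i$

$y_i$: truth point
$w_i x_i$: predicted point
$\epsilon$: margin error (decides the marginal planes)
$\xi_i$: error above the margin

Hyperparameter: keep adjusting $\epsilon$ to get the best margin.
We can't speak of an "incorrect point" in a regressor.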

[Figure: Price against Size, with the lines $w^T x + b + \epsilon$, $w^T x + b$ and $w^T x + b - \epsilon$ and the predicted points.]

In a regressor there is no completely incorrect value, because the output is a continuous value.

Is SVM impacted by outliers?

Yes, SVM is impacted by outliers.

Is standardization needed in SVM?

Yes, we need to perform normalization and standardization.
SVM Kernels

[Fig. 1: linearly separable data (Linear SVC). Fig. 2: data that is not linearly separable.]

When we create this type (Fig. 1) of best-fit line and marginal planes, we are actually solving linearly separable data. This is called Linear SVC.

If the data is not linearly separable, you will not be able to create a best-fit line or marginal planes; even if we create them, the accuracy will be very low (Fig. 2).
For this type of problem, we have some more SVM kernels.

What do SVM kernels do?
The main aim is to apply some transformation technique (some mathematical formula) on the dataset.

This transformation increases the dimension of the data.

Transformation (SVM kernels) → increasing the dimension of the data.

[Figure: 1D points that are not linearly separable; trying to separate them with a line will give errors.]

We will transform the data from 1D to 2D by adding $x^2$ as a second coordinate: $x \rightarrow (x, x^2)$.

Now we can use a linear separating line.
So $x = -3 \rightarrow x^2 = 9$, $x = 7 \rightarrow x^2 = 49$, and so on.

What is the advantage of doing this transformation?

After the transformation we can apply linear SVM or SVC.
When we convert 1D → 2D, we can divide all the points using a single line, which is called Linear SVC.
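
The 1D → 2D transformation can be tried on a tiny example. This sketch assumes made-up points in which one class sits near zero and the other class lies on both sides of it, so no single cut on $x$ separates them; the threshold 5.0 is hand-picked for illustration.

```python
def lift(x):
    """Map a 1D point to 2D by adding x**2 as a second coordinate."""
    return (x, x * x)


def classify(x, threshold):
    """Separate with the horizontal line x**2 = threshold in the lifted space."""
    _, x_squared = lift(x)
    return 1 if x_squared > threshold else 0


# Hypothetical 1D data: class 0 near zero, class 1 on both outer sides.
points = [-7, -3, -1, 0, 1, 3, 7]
labels = [1, 1, 0, 0, 0, 1, 1]

print([lift(x) for x in points])
print([classify(x, threshold=5.0) for x in points])
```

In the lifted space the class-0 points have a second coordinate of at most 1 and the class-1 points at least 9, so a straight line between them separates the classes.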

Types of SVM kernel

1. Polynomial kernel
2. RBF kernel
3. Sigmoid kernel

1. Polynomial kernel

[Fig. 3: 2D data, seen from the top, that is not separable with a best-fit line.]

So, we need to convert 2D → 3D. Our main aim was to increase the dimension: 2D → 3D.

[Fig. 4: in 3D, a hyperplane and marginal planes are created.]

Formula for the polynomial kernel:

$K(x_1, x_2) = (x_1^T x_2 + 1)^d$, where d is the dimension.

$\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\begin{bmatrix} x_1 & x_2 \end{bmatrix} = \begin{bmatrix} x_1^2 & x_1 x_2 \\ x_1 x_2 & x_2^2 \end{bmatrix}$

This gives 3 unique values: $x_1^2$, $x_1 x_2$, $x_2^2$.

Initially, at the time of Fig. 3, we have only the original features. After the transformation with the polynomial kernel we also have the features $x_1^2$, $x_2^2$ and $x_1 x_2$, which can be plotted in 3D, and once we have all these points we will be able to clearly separate them.

Use the polynomial kernel to get better accuracy.

2. Radial Basis Function kernel (RBF kernel)

$K(x_1, x_2) = e^{-\frac{\|x_1 - x_2\|^2}{2\sigma^2}}$, where $\sigma$ is a hyperparameter.

3. Sigmoid kernel
It can be used as a proxy for neural networks.
$K(x, x_i) = \tanh(\gamma\, x^T x_i + c)$
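
The three kernels are short formulas and can be written out directly. This is a sketch in plain Python; the parameter names `degree`, `c`, `sigma` and `gamma` are naming assumptions, not taken from the notes.

```python
import math


def dot(x1, x2):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x1, x2))


def polynomial_kernel(x1, x2, degree, c=1.0):
    """(x1 . x2 + c) ** degree"""
    return (dot(x1, x2) + c) ** degree


def rbf_kernel(x1, x2, sigma):
    """exp(-||x1 - x2||^2 / (2 * sigma^2)); equals 1 for identical points."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(x1, x2, gamma, c):
    """tanh(gamma * x1 . x2 + c); always strictly between -1 and 1."""
    return math.tanh(gamma * dot(x1, x2) + c)


a, b = [1.0, 2.0], [3.0, 4.0]
print(polynomial_kernel(a, b, degree=2))          # (11 + 1) ** 2 = 144.0
print(rbf_kernel(a, a, sigma=1.0))                # 1.0
print(round(rbf_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0), 4))
print(round(sigmoid_kernel(a, b, gamma=0.1, c=0.0), 4))
```

Each function returns a similarity score for a pair of points; an SVM that uses a kernel only ever needs these pairwise values, not the transformed coordinates themselves.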