Unit 1: Advanced Algorithms
• Feature extraction and engineering on numerical, categorical and text data.
• The concept of feature scaling and feature selection.
Model Selection:
Model selection is the process of choosing the most suitable model from a set of candidate models for a given problem. For example, you may need to decide whether a Neural Network model or a simpler model fits your problem better, which requires comparing the candidates directly. In order to select a model, you must first examine the type of problem you are trying to solve and the type of data you have.
Numerical data: You may use Support Vector Machines (SVM), logistic regression, and decision trees, depending on your data.
How do we select a model based on the task?
Classification tasks: SVM, logistic regression, decision trees, etc.
Regression tasks: Linear regression, random forest, polynomial regression, etc.
Clustering tasks: K-means clustering, hierarchical clustering, etc.
Depending on the type and amount of data you have and the task you are performing, you may use a variety of models.
M11t11·I S1•it•1111111 I 1•c·l,11l1 1111·N:
[Figure: model selection techniques: random train/test split, probabilistic measures, and resampling]
Cross-validation:
It is a resampling procedure to evaluate models by splitting the data. Consider a situation where you have two models and want to determine which one is the most appropriate for a certain issue. In this case, we can use a cross-validation process.
So, let's say you are working on an SVM model and have a dataset over which you iterate multiple times. We will now divide the dataset into a few groups; in each iteration, one group out of the five will be used as test data. Machine learning models are evaluated on test data after being trained on training data.
Let's say you calculated the accuracy of each iteration; the figure below illustrates the iteration and the accuracy of that iteration.
[Figure: in each iteration, a different fold of the dataset is held out as test data and the rest is used as training data]
Iteration 1: 88%    Iteration 2: 83%    Iteration 3: 86%
Iteration 4: 82%    Iteration 5: 84%    Iteration 6: 85%
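As a concrete sketch of this procedure, scikit-learn's cross_val_score can reproduce the per-iteration accuracies described above. The dataset, the six-fold split and the SVM-with-scaling pipeline below are illustrative assumptions, not prescribed by the text.

    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_wine(return_X_y=True)               # stand-in labelled dataset
    model = make_pipeline(StandardScaler(), SVC())  # SVM model, as in the example

    scores = cross_val_score(model, X, y, cv=6)     # six folds, as in the figure
    for i, acc in enumerate(scores, start=1):
        print(f"Iteration {i}: {acc:.2%}")
    print(f"Mean accuracy: {scores.mean():.2%}")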
Probabilistic Measures:
Probabilistic measures score a candidate model by weighing how well it fits the data against the complexity of the model, i.e., the amount of information (the number of parameters, or the number of bits needed to describe the model). A widely used measure of this kind is the Bayesian Information Criterion (BIC):
BIC = k ln(n) - 2 ln(L)
where L is the maximized value of the likelihood function of the model,
n is the number of data points, and
k is the number of free parameters to be estimated.
BIC is more commonly employed in time series and linear regression models. However, it may be applied broadly to any model based on maximum likelihood.
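To make the formula concrete, the sketch below computes BIC for an ordinary least-squares fit using the Gaussian log-likelihood. The synthetic data and the parameter count are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=100)

    model = LinearRegression().fit(X, y)
    resid = y - model.predict(X)
    n = len(y)
    k = X.shape[1] + 2   # assumed free parameters: coefficients + intercept + noise variance

    # Maximized Gaussian log-likelihood, then BIC = k*ln(n) - 2*ln(L)
    log_L = -0.5 * n * (np.log(2 * np.pi) + np.log(resid.var()) + 1)
    bic = k * np.log(n) - 2 * log_L
    print(f"BIC = {bic:.1f}")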
Structural Risk Minimization (SRM):
There are instances of overfitting when the model becomes biased toward the training data, which is its primary source of learning. A generalized model must frequently be chosen from a limited data set in machine learning, which leads to the issue of overfitting: the model becomes too fitted to the specifics of the training set and performs poorly on new data. By weighing the model's complexity against how well it fits the training data, the SRM principle solves this issue.
R_srm(f) = (1/N) Σ_{i=1}^{N} L(y_i, f(x_i)) + λ J(f)
Here, J(f) is the complexity of the model, L(y_i, f(x_i)) is the loss on the i-th training example, and λ weighs the complexity penalty against the empirical risk.
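As an illustrative sketch (not the text's own example), ridge-style regularization can be read as one instance of this principle, with squared error as the empirical risk and J(f) = ||w||^2 as the complexity term; the data below are synthetic.

    import numpy as np

    def srm_risk(w, X, y, lam):
        """(1/N) * sum of squared errors + lam * ||w||^2 (a ridge-style J(f))."""
        residuals = y - X @ w
        return np.mean(residuals ** 2) + lam * np.sum(w ** 2)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 4))
    y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

    w = np.linalg.lstsq(X, y, rcond=None)[0]   # unregularized least-squares fit
    print(srm_risk(w, X, y, lam=0.0), srm_risk(w, X, y, lam=0.1))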
Metrics for Evaluating Regression Models:
Model evaluation is crucial in machine learning. It simplifies presenting your model to others and helps you understand how well it performs. Several evaluation metrics are available, but only a few can be employed with regression.
• Mean Absolute Error (MAE): The MAE adds up each error's absolute value. It is an important metric to evaluate a model. You can simply calculate MAE by importing:
from sklearn.metrics import mean_absolute_error
• Mean Squared Error (MSE): While MAE handles all errors equally, MSE is computed by adding the squares of the differences between the real output and the expected output, then dividing the result by the total number of data points. It provides an exact number indicating how much your findings differ from what you projected.
from sklearn.metrics import mean_squared_error
• Adjusted R Square: R Square quantifies how much of the variation in the dependent variable the model can account for. Its name, R Square, refers to the fact that it is the square of the correlation coefficient (R). Adjusted R Square additionally penalizes the score for the number of predictors used.
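A minimal sketch of these metrics with scikit-learn; the toy values are assumptions, and the adjusted R Square line uses the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1), which the text does not spell out.

    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    y_true = [3.0, 5.0, 7.5, 10.0]   # assumed ground-truth values
    y_pred = [2.8, 5.4, 7.0, 10.3]   # assumed model predictions

    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)

    n, k = len(y_true), 2            # k = number of predictors (assumed)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    print(mae, mse, r2, adj_r2)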
When comparing machine learning models, you must choose a tool or platform that can support your team's needs and your business goal. With Censius, you can monitor each model's health in one place and use the user-friendly interface to comprehend models and analyze them for particular problems. With it, you can:
• Evaluate performance without ground truth.
• Compare the past performance of a model.
• Create personalized dashboards.
• Compare performance between model iterations.
TRAINING A MODEL FOR SUPERVISED LEARNING FEATURES -
UNDERSTAND YOUR DATA BETTER, FEATURE EXTRACTION AND
ENGINEERING
Training a model for supervised learning involves several steps, and understanding your data is a crucial part of this process. Feature extraction and engineering are techniques that help you represent your data in a way that is conducive to learning for your machine learning model. Here's a step-by-step guide (a minimal end-to-end code sketch follows the steps):
1. Understand Your Data:
(a) Exploratory Data Analysis (EDA):
• Examine the structure of your dataset.
• Check for missing values, outliers, and anomalies.
• Understand the distribution of your target variable.
(b) Statistical Summary:
• Use descriptive statistics to summarize key aspects of your data.
• Identify patterns, trends, and relationships.
(c) Visualization:
• Create visualizations (histograms, scatter plots, etc.) to gain insights.
• Identify potential correlations between features and the target variable.
2. Feature Extraction:
(a) Select Relevant Features:
• Identify features that are likely to have a significant impact on the target variable.
• Remove irrelevant or redundant features that may not contribute to the model's performance.
(b) Handling Categorical Data:
• Encode categorical variables using techniques like one-hot encoding or label encoding.
(c) Feature Scaling:
• Standardize or normalize numerical features to ensure they are on a similar scale.
• This is crucial for algorithms sensitive to feature scales, such as gradient-based optimization methods.
3. Feature Engineering:
(a) Create New Features:
• For example, extract date features from a timestamp, create interaction terms, or combine existing features.
(b) Polynomial Features:
• Introduce polynomial features to capture non-linear relationships.
• For instance, square or cube certain features to account for quadratic or cubic patterns.
(c) Dimensionality Reduction:
• Use techniques like Principal Component Analysis (PCA) to reduce the dimensionality of the dataset while retaining essential information.
4. Data Splitting:
(a) Training and Testing Sets:
• Split your dataset into training and testing sets to evaluate your model's performance on unseen data.
5. Model Training:
(a) Choose a Model:
• Select a suitable algorithm based on your problem (for example, regression, classification) and data characteristics.
(b) Train the Model:
• Fit the chosen model on the training data.
6. Model Evaluation:
(a) Evaluate on the Testing Set:
• Assess the model's performance on the testing set to estimate its generalization capability.
(b) Fine-tuning:
• If needed, fine-tune hyperparameters to improve performance.
7. Iterative Process:
(a) Refinement:
• Based on model performance, go back to feature engineering or adjust the model architecture.
(b) Cross-Validation:
• Perform cross-validation to ensure the robustness of your model.
8. Deployment:
• Once satisfied with the model, deploy it to make predictions on new, unseen data.
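The sketch below strings the steps above together with scikit-learn. The toy dataframe, column names and the random-forest model are illustrative assumptions rather than choices prescribed by the text.

    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    # 1. Understand your data (EDA abbreviated to a statistical summary).
    df = pd.DataFrame({
        "age": [22, 35, 47, 52, 29, 41, 60, 33],
        "income": [28000, 52000, 61000, 75000, 39000, 58000, 82000, 45000],
        "city": ["A", "B", "A", "C", "B", "C", "A", "B"],
        "bought": [0, 1, 1, 1, 0, 0, 1, 0],
    })
    print(df.describe(include="all"))

    # 2.-3. Feature extraction/engineering: encode categoricals, scale numericals.
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ])

    # 4. Data splitting.
    X, y = df.drop(columns="bought"), df["bought"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)

    # 5. Model training.
    model = Pipeline([("prep", preprocess),
                      ("clf", RandomForestClassifier(random_state=0))])
    model.fit(X_train, y_train)

    # 6. Evaluation on the held-out test set.
    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))

    # 7. Cross-validation for robustness (fine-tuning would follow here).
    print("CV accuracy:", cross_val_score(model, X, y, cv=4).mean())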
FEATURE ENGINEERING ON NUMERICAL DATA, CATEGORICAL DATA AND TEXT DATA
What is Feature Engineering?
Fig. 1.3: Feature engineering
Fig. 1.4: The impact of Standardization and Normalization on the Wine dataset
Methods for Scaling:
Now that you have an idea of what feature scaling is, let us explore what methods are available for doing feature scaling. Of all the methods available, the most common ones are:
Normalization:
Also known as min-max scaling or min-max normalization, it is the simplest method and consists of rescaling the range of features to the range [0, 1]. The general formula for normalization is given as:
x' = (x - min(x)) / (max(x) - min(x))
Here, max(x) and min(x) are the maximum and the minimum values of the feature respectively.
We can also rescale to a different interval. For example, to have the variable lie in an arbitrary range [a, b], the formula becomes:
x' = a + ((x - min(x)) (b - a)) / (max(x) - min(x))
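A small numeric check of the two formulas above; the sample values and the target range [-1, 1] are assumptions for illustration.

    import numpy as np

    x = np.array([2.0, 5.0, 8.0, 11.0])
    x01 = (x - x.min()) / (x.max() - x.min())                # min-max to [0, 1]

    a, b = -1.0, 1.0                                         # arbitrary target range
    xab = a + (x - x.min()) * (b - a) / (x.max() - x.min())
    print(x01)   # approx. [0, 0.33, 0.67, 1]
    print(xab)   # approx. [-1, -0.33, 0.33, 1]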
Standardization:
Feature standardization makes the values of each feature in the data have zero mean and unit variance. The general method of calculation is to determine the distribution mean and standard deviation for each feature, and to calculate the new data points by the following formula:
x' = (x - x̄) / σ
Here, σ is the standard deviation of the feature vector and x̄ is the average of the feature vector.
Scaling to unit length: The aim of this method is to scale the components of a feature vector such that the complete vector has length one. This usually means dividing each component by the Euclidean length of the vector:
x' = x / ||x||
Here, ||x|| is the Euclidean length of the feature vector.
In addition to the above widely used methods, there are some other methods to scale the features, viz. Power Transformer, Quantile Transformer, Robust Scaler, etc. For the scope of this discussion, we are deliberately not diving into the details of these techniques.
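For reference, scikit-learn provides ready-made implementations of the three methods above (as well as the Power, Quantile and Robust transformers just mentioned); a minimal sketch on an assumed toy matrix:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

    X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 600.0]])

    print(MinMaxScaler().fit_transform(X))     # x' = (x - min) / (max - min), per column
    print(StandardScaler().fit_transform(X))   # x' = (x - mean) / std, per column
    print(Normalizer().fit_transform(X))       # each row scaled to unit Euclidean length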
The Million-Dollar Question: Normalization or Standardization?
If you have ever built a machine learning pipeline, you must have faced this question of whether to Normalize or to Standardize. While there is no obvious answer to this question and it really depends on the application, there are still a few generalizations that can be drawn.
Normalization is good to use when the distribution of data does not follow a Gaussian distribution. It can be useful in algorithms that do not assume any distribution of the data, like K-Nearest Neighbors.
In Neural Networks and other algorithms that require data on a 0-1 scale, normalization is an essential pre-processing step. Another popular example of data normalization is image processing, where pixel intensities have to be normalized to fit within a certain range (i.e., 0 to 255 for the RGB color range).
Standardization can be helpful in cases where the data follows a Gaussian distribution, though this does not necessarily have to be true. Since standardization does not have a bounding range, even if there are outliers in the data, they will not be affected by standardization.
In clustering analyses, standardization comes in handy to compare similarities between features based on certain distance measures. Another prominent example is Principal Component Analysis, where we usually prefer standardization over min-max scaling since we are interested in the components that maximize the variance.
There are some points which can be considered while deciding whether we need Standardization or Normalization:
• Standardization may be used when data represent a Gaussian distribution, while Normalization is great with a non-Gaussian distribution.
• The impact of outliers is very high in Normalization.
To conclude, you can always start by fitting your model to raw, normalized, and standardized data and compare the performance for the best results.
The Link between Data Scaling and Data Leakage
To apply Normalization or Standardization, we can use the prebuilt functions in scikit-learn or create our own custom function.
Data leakage mainly occurs when some information from the training data is revealed to the validation data. In order to prevent this, the point to pay attention to is to fit the scaler on the train data only and then use it to transform the test data.
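A minimal sketch of the leak-free pattern just described, with synthetic data as a stand-in:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X = np.random.default_rng(0).normal(size=(100, 4))
    y = (X[:, 0] > 0).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # statistics learned from train only
    X_test_scaled = scaler.transform(X_test)        # no refit on test: prevents leakage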
Define Feature Selection:
Feature Selection is defined as, "It is a process of automatically or manually selecting the subset of the most appropriate and relevant features to be used in model building."
What is Feature Selection?
A feature is an attribute that has an impact on a problem or is useful for the problem, and choosing the important features for the model is known as feature selection. Each machine learning process depends on feature engineering, which mainly contains two processes: Feature Selection and Feature Extraction. Although feature selection and extraction
processes may have the same objective, both are completely different from each other. The main difference between them is that feature selection is about selecting a subset of the original feature set, whereas feature extraction creates new features.
Feature selection is a way of reducing the input variables for the model by using only relevant data in order to reduce overfitting in the model.
So, we can define Feature Selection as, "It is a process of automatically or manually selecting the subset of the most appropriate and relevant features to be used in model building." Feature selection is performed by either including the important features or excluding the irrelevant features in the dataset without changing them.
Need for Feature Selection:
Before implementing any technique, it is important to understand the need for it, and so for Feature Selection. As we know, in machine learning it is necessary to provide a pre-processed and good input dataset in order to get better outcomes. We collect a huge amount of data to train our model and help it to learn better. Generally, the dataset consists of noisy data, irrelevant data, and some part of useful data. Moreover, the huge amount of data also slows down the training process of the model, and with noise and irrelevant data, the model may not predict and perform well. So, it is very necessary to remove such noise and less-important data from the dataset, and to do this, Feature Selection techniques are used.
Selecting the best features helps the model to perform well. For example, suppose we want to create a model that automatically decides which car should be crushed for a spare part, and to do this, we have a dataset. This dataset contains the Model of the car, Year, Owner's name, and Miles. In this dataset, the name of the owner does not contribute to the model performance as it does not decide whether the car should be crushed or not, so we can remove this column and select the rest of the features (columns) for the model building.
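In code, this manual selection step simply drops the uninformative column; the column names and values below are assumptions for illustration.

    import pandas as pd

    cars = pd.DataFrame({
        "model": ["A", "B", "C"],
        "year": [2009, 2015, 2012],
        "owner_name": ["Ravi", "Meena", "John"],   # does not affect the decision
        "miles": [120000, 40000, 85000],
    })
    X = cars.drop(columns=["owner_name"])          # keep only the relevant features
    print(X.columns.tolist())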
Below are some benefits of using feature selection in machine learning:
• It helps in avoiding the curse of dimensionality.
• It helps in the simplification of the model so that it can be easily interpreted by the researchers.
• It reduces the training time.
• It reduces overfitting and hence enhances generalization.
Feature Selection Techniques:
There are mainly two types of Feature Selection techniques, which are:
• Supervised Feature Selection technique: Supervised feature selection techniques consider the target variable and can be used for the labelled dataset.
• Unsupervised Feature Selection technique: Unsupervised feature selection techniques ignore the target variable and can be used for the unlabelled dataset.
Fig. 1.5: Feature Selection techniques: Supervised Feature Selection and Unsupervised Feature Selection (methods shown include missing value ratio, forward feature selection, backward feature selection, regularization (L1, L2), random forest importance, and information gain)
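As an illustrative sketch of two of the supervised techniques named in the figure, the snippet below scores features by mutual information (an information-gain style filter) and by random forest importance; the dataset is a stand-in, not one used in the text.

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    X, y = load_breast_cancer(return_X_y=True)

    # Filter approach: keep the 5 features with the highest mutual information.
    selector = SelectKBest(mutual_info_classif, k=5).fit(X, y)
    print("Selected feature indices:", selector.get_support(indices=True))

    # Embedded approach: rank features by random forest importance.
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    print("Most important feature index:", forest.feature_importances_.argmax())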