Unit 1 Advanced Algorithm

The document discusses model selection and feature engineering in machine learning, emphasizing the importance of choosing the right model for specific data. It covers techniques such as cross-validation and bootstrap sampling to evaluate model performance and prevent overfitting. Additionally, it highlights the significance of feature extraction, scaling, and selection in improving model accuracy.

1. Model Selection and Feature Engineering
Chapter Outcomes...
After reading this chapter, students will be able to understand:
• Introduction of selection of a model.
• Training a model for supervised learning features.
• Feature extraction and engineering on numerical data, categorical data and text data.
• The concept of feature scaling, feature selection.

Learning Objectives...
• Select a suitable model for the given data with justification.
• Describe the process of using supervised learning on the given data.
• Describe the process of feature extraction and engineering on the given data.
• Compare feature engineering for the given type of data.
• Select feature scaling, feature selection, dimensionality reduction in the given situation with justification.

INTRODUCTION: SELECTING A MODEL


What is Model Selection?
"The process of selecting the machine learning model most appropriate for a given issue is known as model
selection."
Model selection is a procedure that may be used to compare models of the same type that have been set up with various model hyperparameters, and models of other types.
Why Model Selection?
Model selection is a procedure used by statisticians to examine the relative merits of different predictive methods and identify which one best fits the observed data. Model evaluation with the data used for training is not accepted in data science because it easily generates overoptimistic and overfitted models.
You may have to check things like:
• Overfitting and underfitting
• Generalization error
• Validation for model selection
For certain algorithms, the best way to reveal the problem's structure to the learning algorithm is through specific data preparation. The next logical step is to define model selection as the process of choosing amongst model development workflows.
So, depending on your use case, you choose an ML model.

How to Select a Model?
The first step is to understand the problem and the data you have, because the choice of model depends heavily on both:
• Video/Image data: If your model has to continuously analyze a live video feed (for example, for security or surveillance), a Neural Network model would be a suitable choice.
• Text data: The appropriate model depends on the task (for example, sentiment analysis) and on the nature of the text data.
• Numerical data: You may use Support Vector Machines (SVM), logistic regression, and decision trees, depending on the problem.
How to select an algorithm for the task?
• Classification tasks: SVM, logistic regression, decision trees.
• Regression tasks: Linear regression, Random Forest, polynomial regression, etc.
• Clustering tasks: K-means clustering, hierarchical clustering.
Depending on the type of data you have and the task you need to do, you may use a variety of models.
Model Selection Techniques:

Fig. 1.1: Model Selection Techniques — Resampling (Random Split, Train/Test Split, Cross-Validation, Bootstrap) and Probabilistic Measures
Resampling Methods:
As the name implies, resampling methods are straightforward methods of rearranging data samples to see how well the model performs on samples of data it has not been trained on. Resampling, in other words, enables us to determine the model's generalizability.
There are two main types of resampling techniques:

Cross-validation:
It is a resampling procedure to evaluate models by splitting the data. Consider a situation where you have two models and want to determine which one is the most appropriate for a certain issue. In this case, we can use a cross-validation process.
So, let's say you are working on an SVM model and have a dataset that iterates multiple times. We will now divide the dataset into a few groups. One group out of the five will be used as test data. Machine learning models are evaluated on test data after being trained on training data.
Let's say you calculated the accuracy of each iteration; the figure below illustrates the iteration and accuracy of that iteration.

Fig. 1.2: Cross-Validation Example — accuracy obtained on the test fold in each iteration: Iteration 1: 88%, Iteration 2: 83%, Iteration 3: 86%, Iteration 4: 82%, Iteration 5: 84%, Iteration 6: 85%

Now, let's calculate the mean accuracy of all the iterations, which comes to around 84.67%. You now use the same procedure once again for the logistic regression model.
You can now compare the mean accuracy of the logistic regression model with the SVM. So, according to accuracy, you might claim that a certain model is better for a given use case.
To implement cross-validation you can use sklearn.model_selection.cross_val_score, like this:
>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_val_score
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
>>> print(cross_val_score(lasso, X, y, cv=3))
[0.3315057  0.08022103  0.03531816]
Bootstrap:
Another resampling technique is called Bootstrap, and it involves replacing the data with random samples. It is used to sample a dataset with replacement to estimate statistics on a population.
• Used with smaller datasets.
• The number of samples must be chosen.
• Size of all samples and test data should be the same.
• The sample with the most scores is therefore taken into account.
In simple terms, you start by:
• Randomly selecting an observation.
• Noting that value.
• Putting that value back.
Now, you repeat the steps N times, where N is the number of observations in the initial dataset. So the final result is one bootstrap sample with N observations.
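A minimal sketch of this procedure, assuming a small toy dataset (the values below are made up for illustration); scikit-learn's resample utility draws a bootstrap sample with replacement, and the observations that were never drawn form the out-of-bag sample:

import numpy as np
from sklearn.utils import resample

data = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])   # N = 6 observations (hypothetical)

# Draw one bootstrap sample of size N with replacement
boot = resample(data, replace=True, n_samples=len(data), random_state=1)

# Observations never selected form the out-of-bag (test) sample
oob = np.array([v for v in data if v not in boot])

print("Bootstrap sample:", boot)
print("Out-of-bag sample:", oob)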
Probabilistic Measures:
Information Criterion is a kind of probabilistic measure that can be used to evaluate the effectiveness of statistical procedures. Its methods include a scoring system that selects the most effective candidate models using a log-likelihood framework or Maximum Likelihood Estimation (MLE).

Probabilistic measures take into account not only a model's performance on the training data but also its complexity, so a simpler model that explains the data well scores better than an unnecessarily complex one.
Akaike Information Criterion (AIC):
AIC was derived from frequentist probability. It scores a candidate model by trading off how well the model fits the training data against the number of parameters it uses:

AIC = 2k − 2 ln(L̂)

where,
k is the number of independent variables (free parameters) of the model.
L̂ is the maximized value of the model's likelihood function.
Lower AIC values indicate a better model. A limitation of AIC is that it is computed on the training data only, so it tends to favor complex models that capture a lot of information about the training set but may generalize poorly to new samples.

Minimum Description Length (MDL):
MDL is derived from information theory, which deals with quantities such as entropy that measure the average number of bits required to represent an event from a probability distribution or a random variable. MDL is the minimum number of bits required to represent both the model and its predictions on the training data, and the model with the lowest MDL value is preferred.

MDL = L(h) + L(D | h)

where,
L(h) is the number of bits needed to express the model.
L(D | h) is the number of bits needed to describe the model's predictions on the training data.
Bayesian Information Criterion (BIC):
BIC was derived from Bayesian probability and is appropriate for models that use maximum likelihood estimation during training.

BIC = k ln(n) − 2 ln(L̂)

where,
L̂ is the maximized value of the likelihood function of the model.
n is the number of data points.
k is the number of free parameters to be estimated.
BIC is more commonly employed in time series and linear regression models. However, it may be applied broadly for any model based on maximum likelihood.
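As an illustration (not from the original text), the statsmodels library reports AIC and BIC for fitted linear models, so two candidate models can be compared directly; the synthetic data below is an assumption made only for this sketch:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # 100 samples, 3 candidate predictors
y = 2.0 * X[:, 0] + rng.normal(size=100)      # only the first predictor matters

full = sm.OLS(y, sm.add_constant(X)).fit()            # candidate 1: all predictors
small = sm.OLS(y, sm.add_constant(X[:, [0]])).fit()   # candidate 2: one predictor

# Lower AIC/BIC indicates the preferred model
print("full  model: AIC=%.1f BIC=%.1f" % (full.aic, full.bic))
print("small model: AIC=%.1f BIC=%.1f" % (small.aic, small.bic))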
Structural Risk Minimization (SRM):
There are instances of overfitting when the model becomes biased toward the training data, which is its primary source of learning. A generalized model must frequently be chosen from a limited data set in machine learning, which leads to the issue of overfitting when the model becomes too fitted to the specifics of the training set and performs poorly on new data. By weighing the model's complexity against how well it fits the training data, the SRM principle solves this issue.

R_srm(f) = (1/N) Σ L(y_i, f(x_i)) + λ J(f)

Here, J(f) is the complexity of the model, L is the loss function, and the sum runs over the N training examples.
Metrics for Evaluating Regression Models:
Model evaluation is crucial in machine learning. It simplifies presenting your model to others and helps you understand how well it performs. Several evaluation metrics are available, but only a few can be employed with regression.
• Mean Absolute Error (MAE): The MAE adds up each error's absolute value. It is an important metric to evaluate a model. You can simply calculate MAE by importing:
from sklearn.metrics import mean_absolute_error
• Mean Squared Error (MSE): While MAE handles all errors equally, MSE is computed by adding the squares of the differences between the real output and the expected output, then dividing the result by the total number of data points. It provides an exact number indicating how much your findings differ from what you projected.
from sklearn.metrics import mean_squared_error
• Adjusted R Square: R Square quantifies how much of the variation in the dependent variable the model can account for. Its name, R Square, refers to the fact that it is the square of the correlation coefficient (R).
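The short sketch below (with made-up true values and predictions) shows how these metrics are computed with scikit-learn; r2_score is used here as a stand-in for the plain R Square discussed above:

from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [3.0, -0.5, 2.0, 7.0]    # hypothetical ground-truth values
y_pred = [2.5, 0.0, 2.0, 8.0]     # hypothetical model predictions

print("MAE:", mean_absolute_error(y_true, y_pred))   # mean of absolute errors
print("MSE:", mean_squared_error(y_true, y_pred))    # mean of squared errors
print("R2 :", r2_score(y_true, y_pred))              # fraction of variance explained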
When comparing machine learning models, you must choose a tool or platform that can support your team's needs and your business goal.
With Censius, you can monitor each model's health in one place and use the user-friendly interface to comprehend models and analyze them for particular problems:
• Evaluate performance without ground truth.
• Compare the past performance of a model.
• Create personalized dashboards.
• Compare performance between model iterations.
TRAINING A MODEL FOR SUPERVISED LEARNING FEATURES -
UNDERSTAND YOUR DATA BETTER, FEATURE EXTRACTION AND
ENGINEERING
Training a model for supervised learning involves several steps, and understanding your data is a crucial part of this process. Feature extraction and engineering are techniques that help you represent your data in a way that is conducive to learning for your machine learning model. Here's a step-by-step guide:
1. Understand Your Data:
(a) Exploratory Data Analysis (EDA):
• Examine the structure of your dataset.
• Check for missing values, outliers, and anomalies.
• Understand the distribution of your target variable.
(b) Statistical Summary:
• Use descriptive statistics to summarize key aspects of your data.
• Identify patterns, trends, and relationships.
(c) Visualization:
• Create visualizations (histograms, scatter plots, etc.) to gain insights.
• Identify potential correlations between features and the target variable.
2. Feature Extraction:
(a) Select Relevant Features:
• Identify features that are likely to have a significant impact on the target variable.
• Remove irrelevant or redundant features that may not contribute to the model's performance.
(b) Handling Categorical Data:
• Encode categorical variables using techniques like one-hot encoding or label encoding.
(c) Feature Scaling:
• Standardize or normalize numerical features to ensure they are on a similar scale.
• This is crucial for algorithms sensitive to feature scales, such as gradient-based optimization methods.

3. Feature Engineering:
(a) Create New Features:
• Combine existing features to capture important patterns or relationships.
• For example, extract date features from a timestamp, or create interaction terms by combining existing features.
(b) Polynomial Features:
• Introduce polynomial features to capture non-linear relationships.
• For instance, square or cube certain features to account for quadratic or cubic patterns.
(c) Dimensionality Reduction:
• Use techniques like Principal Component Analysis (PCA) to reduce the dimensionality of the dataset while retaining essential information, as in the sketch below.
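A minimal PCA sketch, assuming the familiar Iris data purely for illustration; the data is standardized first and then projected onto the two directions of largest variance:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

X_scaled = StandardScaler().fit_transform(X)   # scale before PCA
pca = PCA(n_components=2)                      # keep two components
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                  # (150, 2)
print(pca.explained_variance_ratio_)    # variance retained per component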
4. Data Splitting:
(a) Training and Testing Sets:
• Split your dataset into training and testing sets to evaluate your model's performance on unseen data (an end-to-end sketch covering steps 4 to 6 appears after step 8).
5. Model Training:
(a) Choose a Model:
• Select a suitable algorithm based on your problem (for example, regression, classification) and data characteristics.
(b) Train the Model:
• Feed the training data into the chosen model.
• Adjust model parameters using techniques like cross-validation.
6. Model Evaluation:
(a) Evaluate on Test Set:
• Assess the model's performance on the testing set to estimate its generalization capability.
(b) Fine-tuning:
• If needed, fine-tune hyperparameters to improve performance.
7. Iterative Process:
(a) Refinement:
• Based on model performance, go back to feature engineering or adjust the model architecture.
(b) Cross-Validation:
• Perform cross-validation to ensure robustness of your model.
8. Deployment:
• Once satisfied with the model, deploy it to make predictions on new, unseen data.
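The sketch below ties steps 4 to 6 together on the diabetes toy dataset; the Ridge model and the 80/20 split are assumptions chosen only to keep the example short:

from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Step 4: hold out a test set for the final evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: choose a model and train it (scaling is done inside the pipeline)
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
print("CV R2 :", cross_val_score(model, X_train, y_train, cv=5).mean())
model.fit(X_train, y_train)

# Step 6: estimate generalization on the unseen test set
print("Test R2:", model.score(X_test, y_test))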
FEATURE ENGINEERING ON NUMERICAL DATA, CATEGORICAL DATA AND TEXT DATA
What is Feature Engineering?

Fig. 1.3: Feature engineering — raw data is transformed into features, which are then used to derive insights

Feature Engineering:
• Feature engineering refers to the process of using domain knowledge to select and transform the most relevant variables from raw data when creating a predictive model using machine learning or statistical modeling.
• The goal of feature engineering and selection is to improve the performance of machine learning (ML) algorithms.
Data Preprocessing:
• Data preprocessing is an important step in the data mining process.
• It refers to the cleaning, transforming and integrating of data to make it ready for analysis.
• The goal of data preprocessing is to improve the quality of the data and to make it more suitable for the specific data mining task.
Feature engineering techniques for numerical data, categorical data, and text data are discussed separately below:
1. Numerical Data:
(a) Scaling:
• Standardize or normalize numerical features to ensure they are on a similar scale. This is important for algorithms sensitive to feature scales.
(b) Binning:
• Convert numerical features into categorical features by binning or bucketing. This can help capture non-linear relationships (see the sketch after this list).
(c) Polynomial Features:
• Introduce polynomial features to capture non-linear relationships in the data.
(d) Log Transform:
• Apply a log transformation to numerical features to handle skewed distributions.
(e) Interactions:
• Create interaction terms between two or more numerical features.
(f) Outlier Handling:
• Identify and handle outliers using techniques such as truncation, transformation, or imputation.
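A minimal sketch of binning, polynomial features, and a log transform, using a single made-up 'income' column (the values and column names are assumptions for illustration):

import numpy as np
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures

df = pd.DataFrame({"income": [20000, 35000, 52000, 75000, 400000]})   # skewed feature

# (b) Binning: bucket the values into 3 ordinal bins
binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="quantile")
df["income_bin"] = binner.fit_transform(df[["income"]]).ravel()

# (c) Polynomial features: output columns are [income, income^2]
poly_vals = PolynomialFeatures(degree=2, include_bias=False).fit_transform(df[["income"]])
df["income_sq"] = poly_vals[:, 1]

# (d) Log transform to reduce the skew
df["income_log"] = np.log1p(df["income"])
print(df)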
2. Categorical Data:
(a) One-Hot Encoding:
• Convert categorical variables into binary vectors using one-hot encoding (see the sketch after this list).
(b) Label Encoding:
• Transform categorical labels into numerical values if the ordinal relationship is essential.
(c) Target Encoding:
• Encode categorical features based on the mean or median of the target variable for each category.
(d) Frequency Encoding:
• Encode categorical variables based on their frequency in the dataset.
(e) Embeddings:
• Use embeddings for categorical variables, especially useful in deep learning models.
(f) Dummy Variables:
• Create dummy variables for categorical features with multiple levels.
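A brief sketch of one-hot, label, and frequency encoding on a made-up 'colour' column (names and values are assumptions for illustration):

import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# (a)/(f) One-hot encoding / dummy variables
one_hot = pd.get_dummies(df["colour"], prefix="colour")

# (b) Label encoding (only when an ordinal reading of the labels is acceptable)
df["colour_label"] = LabelEncoder().fit_transform(df["colour"])

# (d) Frequency encoding: replace each category by its relative frequency
df["colour_freq"] = df["colour"].map(df["colour"].value_counts(normalize=True))

print(pd.concat([df, one_hot], axis=1))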
3. Text Data:
(a) Tokenization:
• Break text into individual words or subwords (tokenization).
(b) TF-IDF (Term Frequency-Inverse Document Frequency):
• Convert text data into numerical vectors using TF-IDF to capture the importance of words in a document (a short sketch appears after this list).
(c) Word Embeddings:
• Use pre-trained word embeddings like Word2Vec, GloVe, or FastText to represent words in a continuous vector space.
(d) Bag-of-Words:
• Represent text as a bag of words, counting the frequency of each word.
(e) Text Length:
• Use the length of the text (for example, the number of words or characters) as a feature.
(f) Topic Modeling:
• Use topic-modeling techniques to extract topics from the text.
(g) Sentiment Analysis:
• Extract sentiment scores from the text as additional features.
FEATURE SCALING, FEATURE SELECTION


What is Feature Scaling?
Feature scaling is a method used to normalize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.
For example, if you have multiple independent variables like age, salary, and height, with their ranges as (18-100 years), (25,000-75,000 Euros), and (1-2 meters) respectively, feature scaling would help them all to be in the same range, for example, centered around 0 or in the range (0, 1).
In order to visualize the above, let us take an example of the Wine dataset from the UCI Machine Learning repository and look at the impact of the two most common scaling techniques, Normalization and Standardization, on its 'Alcohol' and 'Malic Acid' features.
Fig. 1.4: The impact of Standardization and Normalization on the Wine dataset — Alcohol and Malic Acid content shown on the input scale, standardized [N(µ=0, σ=1)], and min-max scaled [min=0, max=1]
Methods for Scaling:
Now that you have an idea of what feature scaling is, let us explore what methods are available for doing feature scaling. Of all the methods available, the most common ones are:
Normalization:
Also known as min-max scaling or min-max normalization, it is the simplest method and consists of rescaling the range of features to the range [0, 1]. The general formula for normalization is given as:

x' = (x − min(x)) / (max(x) − min(x))

Here, max(x) and min(x) are the maximum and the minimum values of the feature respectively.

We can also use a variation of this scaling, choosing to have the variable lie in any arbitrary range [a, b]. In this case, the formula becomes:

x' = a + ((x − min(x)) (b − a)) / (max(x) − min(x))

Standardization:
Feature standardization makes the values of each feature in the data have zero mean and unit variance. The general method of calculation is to determine the distribution mean and standard deviation for each feature and calculate the new data points by the following formula:

x' = (x − x̄) / σ

Here, σ is the standard deviation of the feature vector and x̄ is the average of the feature vector.
Scaling to unit length:
The aim of this method is to scale the components of a feature vector such that the complete vector has length one. This usually means dividing each component by the Euclidean length of the vector:

x' = x / ||x||

with ||x|| being the Euclidean length of the feature vector.
In addition to the above widely used methods, there are some other methods to scale the features, viz. Power Transformer, Quantile Transformer, Robust Scaler etc. For the scope of this discussion, we are deliberately not diving into the details of these techniques.
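The methods above map directly onto scikit-learn's prebuilt scalers; the small age/salary matrix below is made up for illustration:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, Normalizer

X = np.array([[18.0, 25000.0],
              [40.0, 60000.0],
              [100.0, 75000.0]])   # hypothetical age and salary columns

print(MinMaxScaler().fit_transform(X))     # normalization: each column rescaled to [0, 1]
print(StandardScaler().fit_transform(X))   # standardization: zero mean, unit variance per column
print(Normalizer().fit_transform(X))       # scaling to unit length: each row has Euclidean norm 1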
The Million-Dollar Question: Normalization or Standardization
If you have ever built a machine learning pipeline, you must have faced this question of whether to Normalize or to Standardize. While there is no obvious answer to this question and it really depends on the application, there are still a few generalizations that can be drawn.
Normalization is good to use when the distribution of data does not follow a Gaussian distribution. It can be useful in algorithms that do not assume any distribution of the data, like K-Nearest Neighbors.
In Neural Networks, and other algorithms that require data on a 0-1 scale, normalization is an essential pre-processing step. Another popular example of data normalization is image processing, where pixel intensities have to be normalized to fit within a certain range (i.e., 0 to 255 for the RGB color range).
Standardization can be helpful in cases where the data follows a Gaussian distribution, though this does not necessarily have to be true. Since standardization does not have a bounding range, even if there are outliers in the data, they will not be affected by standardization.
In clustering analyses, standardization comes in handy to compare similarities between features based on certain distance measures. Another prominent example is Principal Component Analysis, where we usually prefer standardization over min-max scaling since we are interested in the components that maximize the variance.
These are some points which can be considered while deciding whether we need Standardization or Normalization:
• Standardization may be used when data represent a Gaussian distribution, while Normalization is great with a non-Gaussian distribution.
• The impact of outliers is very high in Normalization.
To conclude, you can always start by fitting your model to raw, normalized, and standardized data and compare the performance for the best results.
The Link between Data Scaling and Data Leakage:
To apply Normalization or Standardization, we can use the prebuilt functions in scikit-learn or can create our own custom function.
Data leakage mainly occurs when some information from the training data is revealed to the validation data. In order to prevent this, the point to pay attention to is to fit the scaler on the train data and then use it to transform the test data.
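A minimal leakage-safe pattern, using the wine dataset mentioned earlier purely as an example: the scaler learns its statistics from the training split only and merely re-applies them to the test split:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit the scaler on the training data only
X_test_scaled = scaler.transform(X_test)         # reuse the same mean/std on the test data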
Define Feature Selection:
Feature Selection is defined as, "It is a process of automatically or manually selecting the subset of most appropriate and relevant features to be used in model building."
What is Feature Selection?
A feature is an attribute that has an impact on a problem or is useful for the problem, and choosing the important features for the model is known as feature selection. Each machine learning process depends on feature engineering, which mainly contains two processes, which are Feature Selection and Feature Extraction. Although feature selection and extraction
processes may have the same objective, both are completely different from each other. The main difference between them is that feature selection is about selecting a subset of the original feature set, whereas feature extraction creates new features. Feature selection is a way of reducing the input variables for the model by using only relevant data in order to reduce overfitting in the model.
So, we can define Feature Selection as, "It is a process of automatically or manually selecting the subset of most appropriate and relevant features to be used in model building." Feature selection is performed by either including the important features or excluding the irrelevant features in the dataset without changing them.
Need for Feature Selection:
Before implementing any technique, it is important to understand the need for the technique, and so for Feature Selection. As we know, in machine learning it is necessary to provide a pre-processed and good input dataset to get better outcomes. We collect a huge amount of data to train our model and help it to learn better. Generally, the dataset consists of noisy data, irrelevant data, and some part of useful data. Moreover, the huge amount of data also slows down the training process of the model, and with noise and irrelevant data, the model may not predict and perform well. So, it is very necessary to remove such noise and less-important data from the dataset, and to do this, Feature Selection techniques are used.
Selecting the best features helps the model to perform well. For example, suppose we want to create a model that automatically decides which car should be crushed for a spare part, and to do this, we have a dataset. This dataset contains the Model of the car, Year, Owner's name, and Miles. So, in this dataset, the name of the owner does not contribute to the model performance as it does not decide if the car should be crushed or not, so we can remove this column and select the rest of the features (columns) for the model building.
Below are some benefits of using feature selection in machine learning:
• It helps in avoiding the curse of dimensionality.
• It helps in the simplification of the model so that it can be easily interpreted by the researchers.
• It reduces the training time.
• It reduces overfitting, hence enhancing the generalization.
Feature Selection Techniques:
There are mainly two types of Feature Selection techniques, which are:
• Supervised Feature Selection technique: Supervised Feature Selection techniques consider the target variable and can be used for the labelled dataset.
• Unsupervised Feature Selection technique: Unsupervised Feature Selection techniques ignore the target variable and can be used for the unlabelled dataset.

Fig. 1.5: Feature Selection Techniques — Supervised Feature Selection (Filter Methods: Missing value, Information gain, Chi-square Test, Fisher's Score; Embedded Methods: Regularization L1/L2, Random Forest Importance; Wrapper Methods: Forward Feature Selection, Backward Feature Selection, Exhaustive Feature Selection, Recursive Feature Elimination) and Unsupervised Feature Selection
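As an illustration of these families (not from the original text), the sketch below applies a filter method (chi-square scores via SelectKBest) and a wrapper method (Recursive Feature Elimination) to the Iris data:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter method: keep the 2 features with the highest chi-square score
X_filtered = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)
print(X_filtered.shape)

# Wrapper method: Recursive Feature Elimination around a logistic regression
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
print(rfe.support_)   # boolean mask of the selected features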
Questions:
1. What is Model Selection?
2. Why Model Selection?
3. How to choose the best model in Machine Learning?
4. How to select a model based on the task?
5. Describe feature scaling.
6. Explain feature selection.
