Big Data Model Paper 1
The document discusses Big Data, its characteristics, and the challenges associated with it, including data structure and processing requirements. It also covers various analytics types, Hadoop ecosystem components, and NoSQL databases like MongoDB, highlighting their advantages and functionalities. Additionally, it explains the CAP theorem and provides a sample Hadoop program for word counting.
Q1 a) What is Big Data? Explain the challenges with Big Data.
=> Big Data
- Big Data refers to very large datasets that are generated rapidly and come from multiple sources. They are too complex for traditional systems to handle.
- Data can be structured (tables), semi-structured (XML, JSON), or unstructured (videos, images, audio, logs).
- Data is created at high speed (real-time or near real-time).
- Traditional data tools like RDBMS can't handle it effectively.
Challenges with Big Data
1. Volume:
   - Massive datasets are difficult to store and manage.
   - Needs distributed file systems like HDFS.
   - Requires more storage space.
   - Backup and recovery take more time.
   - Ex: IoT data, online transaction logs.
2. Velocity:
   - Data is generated at high speed.
   - Systems must process it quickly.
   - Requires high-throughput processing engines.
   - Ex: Fraud detection in banking.
3. Variety:
   - Multiple formats: text, images, videos, logs.
   - Hard to create one solution for all data types.
   - Data transformation tools are needed.
4. Veracity:
   - Data may be inconsistent, incomplete, irrelevant or incorrect.
   - Trustworthiness and accuracy are hard to ensure.
   - Requires data validation techniques.
5. Value:
   - Extracting meaningful information is difficult.
   - Not all data has real business value.
   - Need tools to derive insights.
   - Business Intelligence is needed to find hidden patterns.
Q1 b) What is Big Data Analytics? Explain the classification of Analytics.
=> Big Data Analytics
- Analyzing large data to find patterns, correlations and trends.
- Provides business insights.
- Uses ML, statistics and data mining techniques.
- Ex: Recommendation systems, fraud detection.
Classification of Analytics
1. Descriptive Analytics:
   - Explains what happened in the past.
   - Uses past data to generate reports and summaries.
   - Helps understand trends and performance.
   - Ex: Sales reports and dashboards.
2. Diagnostic Analytics:
   - Explains why something happened.
   - Uses drill-down and data discovery techniques.
   - Helps identify reasons for successes or failures.
   - Ex: Drop in website traffic analysis.
3. Predictive Analytics:
   - Forecasts what will happen in future.
   - Uses ML models and statistical techniques.
4. Prescriptive Analytics:
   - Recommends actions to achieve desired outcomes.
   - Uses optimization and simulation algorithms.
   - Helps automate decisions.
   - Ex: Route optimization.
Analytics 1.0, 2.0 and 3.0
Q2 a) Explain CAP Theorem.
=> CAP Theorem
- Proposed by Eric Brewer.
- In distributed systems, only two out of the following three are guaranteed: Consistency, Availability, Partition Tolerance.
Consistency:
   - Every node in the system has the same data at the same time.
   - Reads always return the most recent write.
Availability:
   - System responds to every request, even if some nodes are down.
   - Ensures quick responses, though the data returned may not be the most recent.
Partition Tolerance:
   - System continues to function even if messages between nodes are lost.
   - Must handle network failures gracefully.
CAP Combinations:
1. CA (No Partition Tolerance): ensures consistency and availability, but cannot tolerate network partitions.
2. CP (No Availability): strong consistency but no availability.
3. AP (No Consistency): always responsive but may return old data.
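The CP and AP combinations can be illustrated with a toy sketch. It assumes a made-up two-replica store cut by a network partition; `Replica`, `read` and the `mode` flag are hypothetical names chosen for illustration and do not belong to any real database.

```python
# Toy model of a two-replica store during a network partition.
# All names here are hypothetical and only illustrate the CP vs AP choice.

class Replica:
    def __init__(self):
        self.value = "v1"      # last value this replica has seen


def read(replica, partitioned, mode):
    """Read from a replica that may be cut off from the primary.

    mode="CP": refuse to answer during a partition (consistent, not available).
    mode="AP": answer with whatever the replica holds (available, maybe stale).
    """
    if partitioned and mode == "CP":
        raise RuntimeError("unavailable: cannot confirm latest value")
    return replica.value


primary, follower = Replica(), Replica()
primary.value = "v2"           # the write reaches the primary only; the link is down

ap_result = read(follower, partitioned=True, mode="AP")   # old value is returned
try:
    cp_result = read(follower, partitioned=True, mode="CP")
except RuntimeError as err:
    cp_result = str(err)       # the request is rejected instead
print(ap_result, "|", cp_result)
```

The AP read succeeds with old data, while the CP read gives up availability to avoid returning a value it cannot confirm.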
Q2 b) What is NewSQL? Explain the characteristics of NewSQL.
=> NewSQL
- Modern RDBMS: combines traditional SQL features with NoSQL performance benefits.
- Supports SQL: uses the standard SQL query language.
- Improved performance: scales horizontally and supports high throughput.
- Designed for cloud: built for distributed environments.
Characteristics of NewSQL
1. ACID compliance: ensures reliability and consistency of transactions.
2. High scalability: works efficiently with large datasets.
3. High concurrency: manages many users and operations simultaneously.
4. Real-time data handling: supports time-sensitive applications.
Q2 c) Explain the Hadoop Ecosystem Components for Data Processing and Data Analysis.
=> Core components
1. HDFS (Hadoop Distributed File System)
   - Stores big data across distributed machines.
   - Breaks files into blocks and stores them redundantly.
   - Provides fault tolerance and high availability.
2. MapReduce
   - Programming model for processing large data sets in parallel.
   - Consists of Map (filtering and sorting) and Reduce (aggregating) functions.
   - Works efficiently across clusters.
3. YARN (Yet Another Resource Negotiator)
   - Manages cluster resources.
   - Schedules and monitors jobs.
   - Decouples resource management from job processing.
Additional tools
4. Hive
   - Provides an SQL-like interface for querying.
   - Used for data summarization and analytics.
5. Pig
   - Scripting platform for analyzing large datasets.
   - Scripts are converted to MapReduce internally.
6. HBase
   - NoSQL database on top of HDFS.
   - Supports real-time read/write.
7. Sqoop: transfers data between RDBMS and Hadoop.
8. Flume: collects and moves large volumes of log data to HDFS.
9. Oozie: workflow engine to schedule Hadoop jobs.
10. ZooKeeper: coordinates distributed services (maintaining configuration).
Q3 a) With a neat diagram, explain HDFS Architecture.
=> HDFS Architecture
1. NameNode
   - Acts as the master server.
   - Manages the metadata: file names, blocks, locations.
   - Keeps track of which DataNode has which block.
2. DataNode
   - Stores actual data blocks.
   - Sends periodic heartbeats to the NameNode.
   - Performs block creation, deletion and replication on instruction.
3. Secondary NameNode
   - Takes periodic snapshots of NameNode metadata.
   - Helps in recovery (it is not a backup server).
4. Client
   - Interacts with the NameNode to read/write data.
   - Communicates directly with DataNodes for file operations.
5. Block Storage
   - Files are split into blocks (default: 128 MB).
   - Every block is replicated (default: 3 copies) for fault tolerance.
[Diagram: HDFS architecture showing the Client, the NameNode and the DataNodes]
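The block storage figures (128 MB default block size, 3 replicas) lend themselves to a small arithmetic sketch. The function name `hdfs_block_usage` and the 500 MB file are assumptions made for illustration only.

```python
import math


def hdfs_block_usage(file_size_mb, block_size_mb=128, replication=3):
    """Return (number of blocks, raw storage in MB) for one file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only occupies the bytes it actually holds,
    # so raw usage is the file size multiplied by the replication factor.
    raw_storage_mb = file_size_mb * replication
    return blocks, raw_storage_mb


blocks, raw = hdfs_block_usage(500)   # hypothetical 500 MB file
print(blocks, raw)
```

Under these defaults a 500 MB file is split into 4 blocks (three full blocks plus one partial block) and occupies 1500 MB of raw cluster storage.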
Q3 b) Explain HDFS Daemons.
=> HDFS Daemons
1. NameNode
   - Manages the file system namespace and metadata.
   - Maintains the mapping of blocks to DataNodes.
2. Secondary NameNode
   - Takes periodic checkpoints of the NameNode metadata.
   - Helps in recovery of the metadata held by the NameNode.
3. DataNode
   - Stores and retrieves blocks as instructed.
   - Sends block reports and heartbeats to the NameNode.
4. Checkpoint Node
   - Creates checkpoints by merging the fsimage and edit logs.
   - Supports faster recovery.
5. Backup Node
   - Maintains a real-time image of the NameNode namespace in memory.
Q3 c) Give HDFS commands for the following:
a) To get the list of all directories and files in HDFS
   => hadoop fs -ls -R /
b) To create a directory in HDFS
   => hadoop fs -mkdir /sample
c) To copy a file from the local file system to HDFS
   => hadoop fs -put /sample/test.txt /sample
d) To display the contents of an HDFS file on the console
   => hadoop fs -cat /sample/test.txt
e) To remove a directory from HDFS
   => hadoop fs -rm -r /sample

Q4 a) Explain the anatomy of HDFS read and write.
File Read Operation
1. Client requests file read: contacts the NameNode for block locations.
2. NameNode responds: provides the list of DataNodes having the file blocks.
3. Client contacts DataNodes: connects to the closest DataNode to read data.
4. Reads in blocks: data is read block-by-block in sequence.
5. Parallel reading: blocks can be read in parallel from multiple DataNodes.
File Write Operation
1. Client request to create file: sends a request to the NameNode to create a file.
2. Block allocation: NameNode allocates blocks and selects DataNodes.
3. Data pipeline: client writes to the first DataNode, which then forwards to the second and so on.
4. Replication: each block is replicated to other nodes (default: 3 replicas).
5. Block confirmation: DataNodes send an acknowledgement after a successful write.
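The write pipeline and acknowledgement steps can be mimicked with a short simulation. This is only a sketch: the dictionaries standing in for DataNodes, the names `dn1` to `dn3`, the block id `blk_0` and the function `write_block` are all hypothetical, and no real HDFS API is involved.

```python
def write_block(block, pipeline):
    """Store one block on each DataNode in order and collect acknowledgements."""
    acks = []
    for node in pipeline:                 # client -> first node -> second -> third
        node.setdefault("blocks", []).append(block)
        acks.append(node["name"])
    return list(reversed(acks))           # acks travel back up the pipeline


# Hypothetical DataNodes chosen by the NameNode for one block (3 replicas).
datanodes = [{"name": "dn1"}, {"name": "dn2"}, {"name": "dn3"}]
acks = write_block("blk_0", datanodes)
print(acks)
```

Each node ends up holding a copy of the block, and the acknowledgements come back in reverse pipeline order, from the last replica towards the client.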
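Q4 b) Implement Word Count Program in Hadoop.
=> Word Count Program Overview
1. Objective: count the number of occurrences of each word in a text file.
2. Mapper class:
   - Splits each line into words.
   - Emits key-value pairs as (word, 1).
3. Reducer class: sums the counts for each word.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
public class WordCount {
  // Mapper: emits (word, 1) for every token in a line
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) context.write(new Text(itr.nextToken()), new IntWritable(1));
    }
  }
  // Reducer: sums the counts received for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) sum += val.get();
      context.write(key, new IntWritable(sum));
    }
  }
}

4. Driver class: configures the job, mapper, reducer, and file paths.
5. Execution steps: use Hadoop commands to compile and run the program on a dataset.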
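The map, shuffle and reduce flow of word count can also be traced on a single machine. The following is a plain-Python sketch of the idea, not Hadoop code; the function names and the two sample lines are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


lines = ["big data is big", "data is everywhere"]   # hypothetical input
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

In a real cluster the mapper calls run on different nodes and the shuffle moves data across the network, but the per-key summation is the same.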
Q5 a) What is MongoDB? Illustrate the process of creating or generating a unique key.
=> MongoDB
1. NoSQL database: MongoDB is a document-oriented NoSQL database.
2. Uses BSON documents: stores data in JSON-like BSON format (Binary JSON).
3. Schema-less structure: collections don't require a pre-defined schema.
4. Designed for scalability: supports a distributed architecture and sharding.
5. Open source: freely available and maintained by MongoDB Inc.
Advantages:
1. High performance: faster read/write operations compared to RDBMS.
2. Scalability: easily scales horizontally using sharding.
3. Flexible schema: fields can vary between documents in the same collection.
4. Rich query language: supports powerful queries, indexing and aggregation.
5. Real-time analytics: suitable for big data and real-time apps.
Creating or Generating a Unique Key
1. Default _id field: every document has a unique _id field, which acts as the primary key.
2. Automatic generation: if _id is not specified, MongoDB auto-generates a 12-byte hexadecimal ObjectId.
3. Custom key: you can manually assign any unique value to _id.
   Ex: { "_id": 101, "name": "<name>", "branch": "<branch>" }
4. Uniqueness: MongoDB ensures _id values are unique within a collection.
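To make the 12-byte ObjectId concrete, here is a minimal sketch of its documented layout: a 4-byte timestamp, a 5-byte per-process random value and a 3-byte counter. The function `new_object_id` is a hypothetical stand-in and not MongoDB driver code; real applications would normally let the server or the driver's own `ObjectId` class generate the value.

```python
import itertools
import os
import struct
import time

_RANDOM = os.urandom(5)                                        # per-process random value
_COUNTER = itertools.count(int.from_bytes(os.urandom(3), "big"))  # random starting point


def new_object_id():
    """Build a 12-byte id: 4-byte timestamp + 5-byte random + 3-byte counter."""
    timestamp = struct.pack(">I", int(time.time()))            # seconds since the epoch
    counter = (next(_COUNTER) % 0x1000000).to_bytes(3, "big")  # wrap to 3 bytes
    return timestamp + _RANDOM + counter


first, second = new_object_id(), new_object_id()
print(first.hex(), second.hex())
```

Printed in hexadecimal, each id is 24 characters long, and two ids created in the same second still differ because the counter advances.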
Q5 b) Explain count, sort, limit, skip and aggregate using MongoDB with examples.
1. count() function: counts documents in a collection.
   - db.collection_name.count()
   Ex: db.students.count()
2. sort() function: sorts documents in ascending (1) or descending (-1) order.
   - db.collection_name.find().sort({field: 1})
   Ex: db.students.find().sort({marks: -1})
3. limit() function: limits the number of documents returned.
   - db.collection_name.find().limit(n)
   Ex: db.students.find().limit(3)
4. skip() function: skips a specified number of documents.
   - db.collection_name.find().skip(n)
   Ex: db.students.find().skip(2)
5. aggregate() function: processes data through a pipeline for grouping, filtering and transformations.
   - db.collection_name.aggregate([pipeline])
   Ex: db.students.aggregate([
         { $group: { _id: "$grade", total: { $sum: 1 } } }
       ])
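The effect of these five operations can be checked on a small in-memory list. This is a plain-Python analogy rather than a MongoDB client; the four sample documents are hypothetical, and each line notes the shell call it imitates.

```python
from collections import Counter

# Hypothetical sample documents; in MongoDB these would live in db.students.
students = [
    {"name": "Asha", "grade": "A", "marks": 91},
    {"name": "Bala", "grade": "B", "marks": 78},
    {"name": "Chitra", "grade": "A", "marks": 85},
    {"name": "Dev", "grade": "C", "marks": 64},
]

total = len(students)                                   # count()
by_marks_desc = sorted(students, key=lambda d: d["marks"], reverse=True)  # sort({marks: -1})
top_two = by_marks_desc[:2]                             # limit(2)
after_skip = by_marks_desc[2:]                          # skip(2)
per_grade = Counter(d["grade"] for d in students)       # $group with total: {$sum: 1}

print(total)
print([d["name"] for d in top_two])
print([d["name"] for d in after_skip])
print(dict(per_grade))
```

Chaining limit and skip on a sorted result is the usual way to page through a collection.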
Q6 a) Explain the following MongoDB methods with examples: save(), find(), pretty(), count(), skip().
1. save() method: inserts or updates a document.
   - db.collection.save(document)
   Ex: db.students.save({_id: 1, name: "Rahul"})
2. find() method: retrieves documents from a collection.
   - db.collection.find(query)
   Ex: db.students.find({name: "Rahul"})
3. pretty() method: formats the output of find() for better readability.
   - db.collection.find().pretty()
4. count() method: counts matching documents.
   - db.students.find({grade: "A"}).count()
5. skip() method: skips the first N results.
   - db.students.find().skip(8)
Q6 b) MongoDB commands for the following operations (on Students collection), given fields StudRollNo, StudName, Grade, Hobbies, DOJ.
i) Insert the details of a student.
   => db.students.insert({
        StudRollNo: 101,
        StudName: "Rahul",
        Grade: "A",
        Hobbies: ["Cricket"],
        DOJ: new Date("2023-08-01")
      })
ii) Display the details of the student having roll no 101.
   => db.students.find({ StudRollNo: 101 })
iii) Change the hobby of student "Rahul" from Cricket to Football.
   => db.students.update(
        { StudName: "Rahul" },
        { $set: { Hobbies: ["Football"] } }
      )
iv) Delete the records belonging to grade 'V'.
   => db.students.remove({ Grade: "V" })
v) Display the details of students whose name starts with 'B'.
   => db.students.find({ StudName: /^B/ })
Q7 a) What is Hive? List the features of Hive.
=> Hive
- Hive is an open-source data warehousing system built on top of Hadoop.
- Allows users to write queries in a language similar to SQL.
- Converts queries into MapReduce jobs internally for execution.
- Mainly used to handle structured and semi-structured data.
- Works well with the Hadoop ecosystem and supports large datasets.
Features:
1. SQL-like interface (HiveQL): similar to SQL syntax, making it easier for analysts.
2. Scalability: processes petabytes of data using Hadoop clusters.
3. Extensibility: supports UDFs (user-defined functions) to extend functionality.
4. Fault tolerant: inherits the fault tolerance of Hadoop.
5. Supports different storage types: works with HDFS, Apache HBase, and Amazon S3.
Q7 b) Explain Hive file formats.
=> Hive File Formats
1. Text File format
   - Default storage format in Hive.
   - Stores plain text data, separated by delimiters (like CSV).
   - Simple but slow for large-scale data.
2. Sequence File format
   - Binary file format.
   - Supports compression and stores key-value pairs.
   - Faster than Text format for large files.
3. RCFile (Record Columnar File)
   - Hybrid format combining row and column storage.
   - Efficient when reading only a subset of columns.
4. ORC (Optimized Row Columnar)
   - Stores data in a column-wise format.
   - High compression rate and good performance.
   - Suitable for read-heavy operations.
5. Parquet format
   - Another columnar format, optimized for complex nested data.
   - Compatible with Hive and other tools like Spark.
Q7 c) Explain Bucketing with an example.
=> Bucketing
- It divides data into a fixed number of files or buckets based on a hash function of a column.
- Used for better data management and efficient querying.
- Partitioning divides the data of a table, and bucketing further divides it into smaller parts within a partition.
- Makes joins and sampling faster by reducing the data scanned.
Ex:
CREATE TABLE students_marks (
  name STRING,
  rollno INT,
  marks INT
)
CLUSTERED BY (rollno) INTO 4 BUCKETS;
- This command creates 4 buckets based on rollno.
- Rows are assigned to buckets using a hashing function on rollno.
- Helps in optimizing queries with filters or joins on rollno.
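How rows fall into the 4 buckets can be sketched in a few lines. This assumes the commonly described rule that an integer column hashes to its own value and the bucket is that hash modulo the bucket count; the helper `bucket_for` and the sample rows are hypothetical, and the exact hash used by a given Hive version may differ.

```python
def bucket_for(rollno, num_buckets=4):
    """Assumed rule: an integer hashes to itself, then modulo the bucket count."""
    return (rollno & 0x7FFFFFFF) % num_buckets


# Hypothetical rows of (name, rollno, marks).
rows = [("Asha", 101, 91), ("Bala", 102, 78), ("Chitra", 105, 85), ("Dev", 108, 64)]

buckets = {b: [] for b in range(4)}
for name, rollno, marks in rows:
    buckets[bucket_for(rollno)].append(name)
print(buckets)
```

A query that filters on one rollno only needs to look at the single bucket that value maps to, which is where the speed-up for filters, joins and sampling comes from.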
Q8 a) What is Pig? Explain the key features of Pig.
=> Pig
- Apache Pig is a high-level platform for processing large datasets.
- Pig uses Pig Latin scripts that are converted into MapReduce jobs.
- Works with logs, text, JSON, and more.
- Scripts execute as MapReduce jobs.
Features of Pig
1. Ease of use: simple scripting language.
2. Optimization: Pig automatically optimizes execution plans.
3. Extensibility: allows creation of UDFs.
4. Schema flexibility: doesn't require a strict schema like RDBMS.
5. Interactive and batch mode: can run in local or Hadoop cluster environments.
Q8 b) With a neat diagram, explain the Anatomy of Pig.
=> Anatomy of Pig
1. Pig Latin Script: script written by the user in Pig Latin.
2. Parser: checks syntax and builds a logical plan.
3. Optimizer: optimizes the logical plan to create efficient MapReduce jobs.
4. Compiler: converts the optimized plan to a physical plan.
5. Execution Engine: executes the plan on Hadoop (MapReduce engine).
6. Data flow: follows steps from input (e.g. HDFS) to output result.
[Diagram: Pig Latin script -> Parser -> Optimizer -> Compiler -> MapReduce execution]
Q8 c) Explain any five relational operators of Pig with an example for each.
1. LOAD: loads data from HDFS or the local file system.
   => A = LOAD 'students.txt' USING PigStorage(',') AS (name:chararray, age:int);
2. FILTER: filters records based on a condition.
   => B = FILTER A BY age > 18;
3. FOREACH ... GENERATE: transforms data, selects fields.
   => C = FOREACH A GENERATE name, age;
4. GROUP: groups records by a field.
   => D = GROUP A BY age;
5. ORDER: sorts data based on one or more fields.
   => E = ORDER A BY name ASC;
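For readers who want to see what each operator produces, the same five steps can be imitated on a tiny in-memory file. This is a plain-Python analogy, not Pig; the contents of `students.txt` are hypothetical, and the FOREACH step here projects only the name to make the transformation visible.

```python
import csv
import io
from itertools import groupby

# Hypothetical contents of students.txt (name,age).
raw = "Ravi,17\nAnu,21\nKiran,19\nMeena,21\n"

# LOAD ... USING PigStorage(',') AS (name:chararray, age:int)
A = [(name, int(age)) for name, age in csv.reader(io.StringIO(raw))]
# FILTER A BY age > 18
B = [row for row in A if row[1] > 18]
# FOREACH A GENERATE name
C = [name for name, _ in A]
# GROUP A BY age (groupby needs its input sorted by the grouping key)
by_age = sorted(A, key=lambda row: row[1])
D = {age: [r[0] for r in rows] for age, rows in groupby(by_age, key=lambda row: row[1])}
# ORDER A BY name ASC
E = sorted(A, key=lambda row: row[0])

print(B, C, D, E, sep="\n")
```

Each relation is derived from an earlier one without modifying it, which mirrors the way Pig Latin builds a data flow step by step.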
Q9 a) With a neat diagram, explain the main features of Spark.
=> Main Features of Apache Spark
1. Speed
   - In-memory processing boosts speed: 10-100x faster than MapReduce.
   - Ideal for iterative algorithms and interactive queries.
2. Ease of use
   - Supports APIs in Java, Python, Scala and R.
   - Easy to write applications using built-in libraries.
3. Advanced analytics
   - Supports SQL and MLlib (ML).