0% found this document useful (0 votes)
36 views28 pages

Big Data Model Paper 1

The document discusses Big Data, its characteristics, and the challenges associated with it, including data structure and processing requirements. It also covers various analytics types, Hadoop ecosystem components, and NoSQL databases like MongoDB, highlighting their advantages and functionalities. Additionally, it explains the CAP theorem and provides a sample Hadoop program for word counting.

Uploaded by

Shrejal Patil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
36 views28 pages

Big Data Model Paper 1

The document discusses Big Data, its characteristics, and the challenges associated with it, including data structure and processing requirements. It also covers various analytics types, Hadoop ecosystem components, and NoSQL databases like MongoDB, highlighting their advantages and functionalities. Additionally, it explains the CAP theorem and provides a sample Hadoop program for word counting.

Uploaded by

Shrejal Patil
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 28
Qi> a. Wha is Biy Dala ? Fp ae chloe ote = neyo ny ve lage dotacets tna ore quested vapid, and Come frm nuetriple ounces « They are too cornpu% Jor padiional systos to trode . — pata Cam be Sbuckured (tables), Serrf-shuutured (XML, TSON), m unshuctured (videos , Images, ouclio, legs) — pate is Ceated ob a high sped, Cacal- me os neat vetal-hme) —Thadktonal data tools Uke RDBMS Gm't bamdle tt oppetnely. Chalter wit Biq Data MNolwMe t- + marine dataces as diy fart t +o Stove rromage. e nud dishibuled ple sqstorns lite HDFS. 4 space « . Backitp 4 oven] take more Hme ¢ 7 + Gxt lot data, online pans action Logs @ Nadoditg ioe data is ops . syptonas earl prow 4 qoute : « Roquines high. Hecougepuak processing engines. © Ski Proud dalection 4n bannichns . @® Nosiely t- o ulti ple | Heal, fenages y Vidoes Logs . » Hod to Create ome oluk om {7 a, types. . Dela hans Alum tools ae needed, ee ceeds free oe data, © Nesatity i= + Data may be sneonsistent , Invomplele oF | a (ila daelevant of incorrect dala. @ Scanned with OKEN Scanner anineceee a SS a o Taustunttrines acaraed ase laid do onsete. 0 Requires data validation teeluniques « ® Natue:- + Pleading mrcnting ul » nol alt data has busi nins eo noel 4 tools to deviue eneighic. eo Busines Inkelli gence % needed Yp yr hidden patterns . 8n formation is dijpiuwlh, teal value . 4b> pty Data Analatics 2 PAplain the et cc 4 Analyhtes. =>) Biq Data Analyte — Analyatng lawe dala +o fond. patizons, correlations 4 hrends - = provides buss ners 4 Adeendice taigiits - — Uses ML, Atel isles ; data rir beniq tts — xi Revommendakion eystens, paud detation- Classification o; His 1. Desoipive Amaly ties R- Caplauins what bappeneal in We past. + uses part data to qevitale auports 4 cummaaics. 6 helps understand ~“hends 4 Payormence - Gxt Sales oepont 4 haw Ladle. ae Diagnostic AnalytCs t+ @xplains Why somethin pened, + uses dill-deuin 4 data dove ce 4g hp fe helps ident fy reasons ~~ suceerses® oy joilaes, © £x+ Dp to webUle Payie analysis, 3, Predictive Aralyics t- ¢ PorecasK what will hanpen 3n petute 2 uses ML rodels 4 Adolistical Lédrniques - @ Scanned with OKEN Scanner Y. Povscuipve Analytics :— + Rewmmends Getionsty act awe dutived outcomes. + uses ophwization amd ginulation algortth ms, : 5s automate = deef tons + + exi- Rouke Opinization jor ee 4 .e Broly ho ).0, 2-0 43.0 fay ex plain CAP Thue. => CAP Thorn — proposed by Eric Brau. — Tn dishibuted gyskims ; only tuo out 3 the gollouing Hee are quexanted + Concistenay y Availability Parti Hon Tolerance . ; Consistenuys— 4 eunry rode to the Susie fas the Same data af- te Adame time. 2 Reads always aetumm tre moet xecent wile. Avoilabi lit 1G system awponds to every requert » : 4 come nodes ae lout. e eneures quick Responses ptheugh may not be Paxttion Tolerance t— « syten conkinues to gj smenwages bl nodes are Lor. ust handle nebuorle joins graegety CAP Combi nalions ,— : 1 CA (No Partition Tolerance) ensures TT nee, @ Scanned with OKEN Scanner 7 CRC No Availabitite ) 3 shmy ee but no albicans eee d ee + AP_(No Consistency ): Always acsponseve beet mma retum old data, Congistenty Q2> b) What is Nuosge? eae te charaditi¢nes 0} NUosge. => NwSOL — wedim RDBMS > combines haditenal SQL paras wit NoSGL pruyormance benudk » — Avppoks SQL> Uses He Atowdlerd SAL quay lengunge. — Awmproved Perjormante = Scales hovizen tally 4 uppers Vag Uroug pur, 5 — designed bs cloud > butt yor dishhbuted envivmmenks . CAP § Charackrighs 0) NuoSQb ( ACID cempliane — ensures Adbeloility 4 Consistoney 4 hansactions, (CO Hip Scolmbs ty — wares qypcenity wilh letge cLatas eld. ® ada Coneuweniy — managers many users 4 operations Gmullancously - © Reat- tive Data Hamelling — Supports Hme~ sensitive applicakions : @ Scanned with OKEN Scanner Qed Explain the Hadoop Ecosystem Components yor Dada preasing 4 Dela Analysis . —_= a] 4 Cove components ole 3. HOPS (Hadoop dishibured File System) — Brres big data Quo diasibuted machines . — Breaks ls into blocks 4 Stores hom hedtndanity « — provides eee + ge availability. 8, MapRedu ce : w Pebaearorti model for prowwing large dea Self I porate. S ee °4 Map Cpilewing 4 Sovting) g Restuse Capgrey ting) portions. — works epg terrtty aun clusters - 3. YARN C yet omonar Resource neq oHator) ey CLUsHA FLSOUACLS - — Schedules and monitors jobs. : ; — derouples ego Ce momnagenaurt pom jee prowoning . Additional tools : ies 6 y. Hie f- + provides sQu-dike dikeapace for queuying in : uel ge oct qummarizalion and omalytts « gigi «Aoagtng pogor” i omalyadag ents diet . surpts converted fo MapRecluce fnkernabia . sane aT @ Scanned with OKEN Scanner f én — Nos@L database on bop 4 HDFs . — Seepports Atal-H me read | vorite 4. Sqoop — bensfers data jo RDBMS 4 Hadoop. ge. Flume — Collects 4 moves lange volunes | 404 data ty HDFS- . gq. Dorie ~- Cutork How engine +0 schedule op fobs \o » Zookeep — Coordinates Aishtbuted Zerices { racimbaing Conptquration 7 Bad vottn a meal diagram yexplain HDeS Arclaibedlite. => HDES Ardulidmre 4. NameNode 3+ ads as martin sour. + manages tie metadata ae eee blocks , Locations . ° keeps ade 24 whith pataNode hey whéch loleck . 2. DaANode 5 6 Sterts adual data blocks, « Sends periodic fueartberts +> NameNode, £ prjorns block creation, deletion , and ruplical on on tnshouel | 3. Sevendary NameNode > +» Takes poricdiio Srrapshots 4 NaweNode wutadata . . helps an recovery Cnet a bactup SOUL) + 4 .Ubink > + Interads wath NamuNode jo vend |uurtte dada , | comamunicakes directly with DakaNodes qr fe operations. 5. Blok Chrage —> « Hus axe Apr Ato bloccs ( ult ¢ [2eMmB) + Gary block %s Aaplioned Cagault 2 3 Copies) yor 4 WI nce. @ Scanned with OKEN Scanner GY Client 9s Srberasls unity NowmeNode te read | Wile doa - + Communi ces duiectty Wik Dakalodes pr pile operations 4 Datatwede 3. hres auual chita blots . 1 Gunds periodic ~“Wenrtbeads bo pyamuetlody . eye Biot Canon , delitior 4 npliotions 3b) Explain HDPS Daum. = Dawes rome 4, Nome Node ae ne eumrdany NoweNode —) ee DabaNode — > Atos 4 veiw blocks as inh «Blinds block 2eporis 4 Wuaatbe manager ile sigs rachel. © maivkaines nanny ace qe we 6 ee ae as ke en 9 elas | a ae metadata (22 angsty ge © helps in aac p by NaneNoele « @ Scanned with OKEN Scanner A. Onak point Node a: eae jpn mage yay f edi hogs + Supports paslet rem ey « 5. ea Node ——9_ » maintains peal H me Irnage ay le ae NaweNede qn asad Fre, wg oprralis gcy Give HDFS Commands jor fue yollouuing P a a) To o¢t gue dite of doutones f piles od were) { => Nadoop po--R] ¢ To cwalz a diuumy tn HDPS nn dor _medin | sample - r¥ pom loco ye cyptom Fo DES 0 0 a ple 12 oly — put | connaple | pet.txt| Samples . ne console (BD) To display contenis 1 om HDPS fale ™ 5) ndeop po cer [ Soompit | dosh tad - i 0 To remove a dimectorg yo HDFS adeop ye — an | Somaple d ‘ exglata the aunatonuy 24 HDFS enced ond wile. Pile Read Opmalion . 4, Utink sequixts for tle vead > tents ants NaweNo pee Wr ble Locals. Y PFO ETE GOES PE @ Scanned with OKEN Scanner 3 CARRE ME AWN MEE EE : Nowe Neds vespennds —> prrvides Uist q) DatatVodey hawirng a Hur tgle lolocks - (uel contacts PataNodes —» Conneds to Closer DalaNode bo _ read data _peads_in_ Blocks — data tc vead block ~by- block is Sequence. « Paraltel reading > is can be read tn parallel pon 5 oa raullple Data Nodes - file wuite Opuabion 4, Cenk rusk to acate pie. > sends Suquint 4p NameNode to Ocede a fle. 2. Block allocation > aneNode allocates lolocks 4 sds palaNodes, weve = @ Scanned with OKEN Scanner 3. Data pipeline ~ Clif Lovkts to your DalaNlode , voldietn foronnds yo Second 4com, fie 5 : a: - oxplicalinn 5 tach block %S veplicaled to owe wedles (- faut ta aeplicas). I 5s . Block Conkirmalton > DataNodes send a aero welgernent ayer aucun jl vorile , 4b) Tnaplnent Word Count Progen in Hadocp . S Word Count Progam Ovewteng A, obyelve t- « count tic numbs o} Sn a text fle 2 Mappa, class:- «splits each lane tnto words. «Gals Kaya valine pairs as (smd, 1), 1 rag eacr rama appends protic class Tokent2AMapper extends Mappe £ prvolic void Aeduce CTeat Kou, Herable < TniWr. alues lonlert context) § mney : qnt 4WN =0 5 gov CIntOvitatele val + values) § Sum 4 = val. gerd), ) 4 { conenk. LOM (Hey Nod IntOritalle (sum) 4 4 | 4 4 DBO Claas: > conyequaes Jobs mapper , recur, 4 pile paths . &. Exeudio Steps + Use Hadeop commands to compile 4 AU? Bs a ita ates a dabaser. 5y a) What ds sada hy MargeDB ? Ulushate Hre pros 1 or qemating a bantquue key j > MengeP8 pa i 4. NoSQL database —MangoB ds a dowumnent-oriented Nesge Nosg@l wee database . he Q. Ue BSon douuments — stores data ¢n Tson-Like BSON forme iki . Uses _BSON aouuments CBinauy TON). | |g feema les Shruckure — Colleutions don't require apre-dapined fi | enna | nail | a. Derigned yer falabitng — Supports aa qucritecune | 4 oherdlng . ; | oar Opvo Gowree — freely available + maintained by MangaDB | Ince I (oie @ Scanned with OKEN Scanner | Advan ages :~ J ‘ped fomanee — yale yead | vovile eae comm pated } RDBMS . &. Salability — Fasily scales Woricontaty sing shnrding 3. flerible_ Schema = yields Can aay betuuen dourments tn the game collection. 4 Rich Query language — Supprts pouungul queies , Indariong 4 aaaregation : : 5. Real-time Analytis — £uitable for big data 4 veal-B me apps: Creating o6 Genevaling a Unique Dad pads — bney document” A unique td pide which ate as the pamary + ay Automatic Guwution +— 4 wid te nal speciiged, MengoPB gudo- genase a la-byte jenadeimal objet la. 3) Custom Key 5 You cam maivually asiqn oy Uaague Velue to id. 45 fur- £ "eas lol, "name" % “eharjal® ; "byomth" + MALE DS" 4 sy Uniqueness t— MengeDB evrurss SA values are ombraat- uuitrin a Collection . | @ Scanned with OKEN Scanner : rae - Dee aby Anplont \miowd + Count - Cot — init -Stup 4 ote - aeqals tasing MengeDB eg pegels | jaa a sLowdl) Fwnnubio +> lounks douuments in a collation e * , } ~ cb. collection name , Count) LXE Alo» Studien. Count () 2 SatO funtion 4- Lots clo wmuenky descending CD oder. ~ tes felleaion mame . pind 0 SOL($ ed td wv 13) Cxt- Ab. Stusents « had C), gp CE masks + -43) || im ascending (1) oF i 8. MrnitO HOM t- nile te numbet ff clecumments sckurmned — db. colchonengme Yndo- Uamat Cn) gxi- absctudenls. Find C). init C3) 4. Skip o janchory 1 Akins A spect Fred number 44 douumenks . ee ea O. Hip (n) bxs- Ab: Hradents . ind C> “Sip (2) 5. Aggregate () punclion ; | i PASO 5 ros \K yroup§ pilering 4 re prmalions | } i i ae - db. Colethoy—name ragpregale C Cpipeting}) Sate db. Students agyreqate CL £fqroups Std t Papade', total : Seoums 14% Y 3) bay Cxplaim “Ue follocuinng MenqoDB. metrods with orowmples GeO) Find Oy petlyO » Coun , Sep 0. a = ee @ Scanned with OKEN Scanner 4. Save) muted > Iwsals of updates a doumunt . — db. Colledion . Save (document) ex db tudents » Save CSiids 1, names "Rahul 9) 2. Find mttrod —> Rebiours douuments frm. Collerkion . — db. wlteton . pind Cquerq) oat ato -rudants . ind (fname, * abut”) 2. pretty O metrod ~> formats dine output of yond) fo beer Yeadabili — db. Collum, bral, pretgo Aea- 4 “counht) method 3 Counts makeing downs - —db students . pnd CR grade * "A™S). countC) 5 SeipO) wttived > seipg tre ger N asus. db Mudenls “endl, Seip (8) 6bS MongoDB Commands por tae following opaations (o» Studints Colleton) , Given rudst ShudRoNNO, ChudNanae j Grade , Hobbies , Dos. Binet ee dels fon dt 3 db .Helends Trserk (£ Studketlias tol) StudNome "Rahul", Gres A", Hobbies + 0" Gvicket — DoT : hwo Date ("20a3 -o8-01 ") D @ Scanned with OKEN Scanner =e . - ana oe on) Display he details 4 Atudent having avllne fol, _sdb - stucents spina C{ StudRolltvo + tol 3) ji) Chamage duc rebby of Atudunts "Rahul" prom Cricket +0 Preiball. _ydb. students update C & StudName + "Rahul" 3, ggeer t S Hobbies ¢ [Fortbau"I 44 ) gy) Delele tre recorel era belonging to grade’ V 1 Sab Students remove (¢ Grade t"V"3) vy) Rusia the deteils 0) Atucdens vohose namo Afarle with! Bp) a db tents. pind Cg StuctNome+/% B14) Fay What 4s Hive 7 ist Hae yeatones oy Mae — Hiver— — Daa wdarehousing Tee system bualt on tp f Hadovp » _— mlows wes fo write quuts fn a Language Kinsler tosQl. — converts guns inh Mapkedarer. fobs totemally yor emustion — nasirnlay used to hondle Aluctured 4 sonaim Sbrictured clots - — works ULE with hasloop exoty stern 4 Supports Large dodws ets. J hie is om opensource dats yoarehouse dures > Ogu intertaces (HiveQL) —ewtlier *° SAL HAV» making | H camtr gor aunals 2 © sualability — procenses pelabytes dain uring Haoop clusters ® Envensability — Apps UDP S Coser dtginned potions) te tlond yaionaty A @ Scanned with OKEN Scanner LT 4 pauutt: Toleremt 4 Trhuils full — toleawncee fHadoop S. ~ Supports Difloent Shree Types winks with HOPS; Apa, Hpace , and Amazon $3, Thy Caplan Hine file porwalls « => Hive Ple_jonmats d. Teak File format > + Atal Clorage yormnal in Hive. * Storts plain tear data, Separated boy dilinvitens Clite Cav), « Sdmmple. but Slow yor large scale CG: ‘ 2. Sequante Fie Pormak > + Binary fe fom + Supprs Compremion 4 Hoes value pals , « fasten than Next fomat fe Lage Z.RCRile CRovw Column Comat) > hybrid format combi wo 4 Colum ig Atorge. ‘ pen ad vending a ba one 4 columns . 4, ORC CopHrmized Revo ~y © Stoves daka 8 Column wire tow ° Mah Compression rake 4 qo re Oa, . J + hudkalole for Road — doe optakonis . 5. Parquet Pommal—>Anowur columnar jormak opliwaiced Com pie nested data. . : Compt ble with Hue 4 or tools Uke Spark. acy Explain Burketing with an Craamaple . =) Bee a — tt dtuidoy Arta into pix numba “4 ples oe Nh bbascA m A hash yumtion a column. TEE RI ITO @ Scanned with OKEN Scanner SS —hLlUrtéi“C@ PS? TARE — uscd for better dala managewmen| 4 ebiicunk 4 YOU ag. = Parkhorm diuicdes deta Stele is if 4 eee auaid gnto Smaller PIs writin a Hon. ns — makes Joins 4 hawapling taster by reducing phe Acoma. | éxt- | CREATE TABLE Sfudents masks ¢ name STRING oe\lno INT, maaks INT. { 7) CLUSTERED BY (rllno) INTO 4 BUCKETS, , Tis Command ereakes 4 buckets bared om ellno. 1 ROU GAL asriqned to buckets wring a haabinng prston m ans mM opliniuing qwrt udth pitts or joins on yollno. Say wohat ds Piq 2 explain Hae. toy features °f Pig. a a wight — Apache Pia 43 a Wigh-leuel plagorm 4" prownting large ‘ a — Pig uses Prg Latin Scripts Hak are converted {nto ieRedle es — works with bogs 5 Haut, JSON, and mare» nee a — Soipls oueube as MapReditee jobs. Foatnrs © Pigs (& Gase to Use. > Aivaple 4e7iple Aedule ton of UDdFS ONAL TO B15) ~ + MAS. ToL shaniin. Hi © aptimszabon -» ply autonnabicaly opHinaizes extention plans - { Shea 4 com |@ Cntensoitity —> allows | | re @ Scanned with OKEN Scanner | [ 4. Schema Puvibility —» dousnly xtquine atid ‘ ag like RDBMS. 5 .dvinading 4 Baten Mode — Cam run fn Local of Hadoop tuk environments. | Bb with a neat Atagram , explain fo Anatonny °1 Pig , > _frrafone of Pig nm © Pig Latin Seip Te aap wacitiom by user to Pla Latin © Parn> chars syntax + builds a logical plam . @ oponiner -> ephinatees logical plan to create, eyicint Maple Jobs. ® Comeiler Converts ep wiized plan to physical plan | ® Exeuton Enasne 9 Gxeiukts Hat plan Pn Hadloop Crap Recluce ane): oo © -Datujtoro pollo steps pm input Cog --HDPS) to output Saul . - 43 Reduee ‘8 pig a |u + [ Pansex : [Ps optinntcnn tL] comptlea | Bey Explain amy ye welattonal, opusters 4 Pra with any | syovnple yer tauhn . ba a @ Scanned with OKEN Scanner | @ shamud Avalati > « hagpoks SL) Muab (ME) 14 pee sanneneneenanrneenest I — 4, Lead loads data bro HDFS oy local LHe. — LoAD | Students. 4x4! : ' > Arbo ; USING Pig Storage C',") As (name; chakauterl » age: int) ; o, PUTER > Filters Aetds bared on umdition. _») Br FILTER A BY age 71k S g, VORCACH.... GENERATE > Tramstorms data, cles plas. = C= PORCACH A GENCRATE Nowe y ages 4. GROUP > epoups retmds by a pala. —> De GRouP A BY age, 5. ORDER > Aorts dnta based on one oF more dls . 5s € = ORDER A BY name Accs . ° Aa §@-9 with a neak diagrann , veplain “Ue masa sealihes OY SPARC, yA Movin Peaures of Apaine Sprnte © speed > vIn-muwery procustng boots ees lo-l00X poster then Mapkiduce « : l + Ideal [ot teadliue algorsbiun 4 tnbractine quads. © Lust op Luce ~ + Sespperé AP\s tn Save sPyran ; Scala ond R. . base #9 voile applica ons using built fo Ubrartes . @ Scanned with OKEN Scanner

You might also like