Chapter 1 - Apache Pig

This document provides an introduction to Apache Pig, a platform for analyzing large datasets. It discusses how Pig works by taking Pig Latin scripts written by users and converting them into sequences of MapReduce jobs. Pig provides a high-level language and execution framework for writing data flows and performs parallelization behind the scenes in Hadoop. The document outlines Pig's architecture and components and how it fits within the Hadoop ecosystem.

Uploaded by

Hai Do Viet

BIG DATA ANALYSIS

APACHE PIG

Le Thi Minh Chau

Faculty Of Information Technology
HCMC University Of Technology And Education
Module Contents
• Introduction to Big Data and Hadoop
• Introduction to Pig
• Hadoop Pig Architecture

Big Data and its Challenges
• Big data is a term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications.
• Systems and enterprises generate huge amounts of data, from terabytes up to petabytes of information.
• Data at this scale is very difficult to manage.

Why Hadoop?

Hadoop and its Characteristics
• Apache Hadoop is a framework that allows the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
• It is an open-source data management technology with scale-out storage and distributed processing.

Introduction to Hadoop
• HDFS
  ◦ Hadoop Distributed File System.
  ◦ A distributed, scalable, and portable file system written in Java for the Hadoop framework.
  ◦ Provides high-throughput access to application data.
  ◦ Runs on large clusters of commodity machines.
  ◦ Is used to store large datasets.

Introduction to Hadoop
• MapReduce
  ◦ A distributed data processing model and execution environment that runs on large clusters of commodity machines.
  ◦ Also called MR.
  ◦ MapReduce programs are inherently parallel.

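The MapReduce model can be illustrated with a small single-machine sketch. This is not Hadoop code: it only mimics the map, shuffle, and reduce steps in plain Python, and the function names and sample lines are invented for illustration. Each call to `map_phase` depends only on its own input line, and each call to `reduce_phase` depends only on its own key, which is why such programs can run in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: collect all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


def word_count(lines):
    # Every line is mapped independently of the others.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Every key is reduced independently of the others.
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Sample input invented for this sketch.
counts = word_count(["pig runs on hadoop", "hadoop runs mapreduce"])
print(counts)
```

In a real cluster the framework, not the program, distributes the map and reduce calls over many machines; the sketch only shows the shape of the computation.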
Hadoop Ecosystem
What is Pig?
• Pig is an open-source data flow platform.
• Its language, Pig Latin, is used to express queries and data manipulation operations in simple scripts.
• Pig converts the scripts into a sequence of underlying MapReduce jobs.

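To make the data flow idea concrete, here is a minimal Python sketch. It is neither Pig Latin nor a description of how Pig is implemented; it only mimics the shape of a typical flow, in which records are loaded, filtered, grouped by a key, and aggregated. The records, field names, and step functions are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical input records of the form (user, page).
records = [
    ("alice", "home"),
    ("bob", "home"),
    ("alice", "cart"),
    ("carol", "help"),
    ("bob", "cart"),
]


def filter_step(rows, excluded_page):
    """Keep only rows whose page differs from the excluded one."""
    return [row for row in rows if row[1] != excluded_page]


def group_step(rows):
    """Group the pages visited by each user."""
    groups = defaultdict(list)
    for user, page in rows:
        groups[user].append(page)
    return dict(groups)


def count_step(groups):
    """Count the rows in each group."""
    return {user: len(pages) for user, pages in groups.items()}


# The flow reads as a chain of steps, each consuming the previous output.
visits = count_step(group_step(filter_step(records, "help")))
print(visits)
```

A data flow language lets the user state only these steps; turning them into parallel jobs is left to the platform.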
Internalizing Pig

Why Pig?

Equivalent Java MapReduce Code

Ways to handle Pig
• Grunt Mode
  ◦ The interactive mode of Pig.
  ◦ Very useful for testing, syntax checking, and ad-hoc data exploration.
• Script Mode
  ◦ Runs a set of instructions from a file.
  ◦ Similar to a SQL script file.
• Embedded Mode
  ◦ Executes Pig programs from a Java program.
  ◦ Suitable for creating Pig scripts on the fly.

Modes of Pig
• Local
  ◦ Needs access to a single machine only.
  ◦ All files are installed and run using your local host and file system.
  ◦ Invoked with the -x local flag:
    pig -x local
• MapReduce
  ◦ The default mode.
  ◦ Needs access to a Hadoop cluster and an HDFS installation.
  ◦ Invoked with plain pig, or explicitly with the -x mapreduce flag:
    pig
    pig -x mapreduce

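As a small illustration of the two execution modes, the following Python sketch assembles the `pig` command line for a chosen mode. The helper `build_pig_command` and the script name `wordcount.pig` are hypothetical, and the sketch only builds and prints the command; it does not run Pig. It covers only the two modes listed above.

```python
import shlex

# Only the two modes described above; this list is an assumption of the sketch.
VALID_MODES = ("local", "mapreduce")


def build_pig_command(mode="mapreduce", script=None):
    """Return the argument list that starts Pig in the given execution mode."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    argv = ["pig", "-x", mode]
    if script is not None:
        # With a script file, Pig runs the statements in that file;
        # without one, it starts the interactive Grunt shell.
        argv.append(script)
    return argv


print(shlex.join(build_pig_command("local")))
print(shlex.join(build_pig_command("mapreduce", "wordcount.pig")))
```

Returning a list rather than a single string makes it straightforward to pass the result to `subprocess.run` on a machine where Pig is installed.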
Pig Components
Pig Programs Execution
• Pig is a wrapper on top of the MapReduce layer.
• It parses, optimizes, and converts the Pig script into a series of MapReduce jobs.

Q&A