BIG DATA WITH HADOOP AND SPARK
Course Objectives:
1. Understand the concepts of big data and its impact on businesses.
2. Learn about the Hadoop ecosystem and its components, such as HDFS, MapReduce, Hive,
Pig, and HBase.
3. Gain hands-on experience with Hadoop and Spark, including writing applications and running
them on a cluster.
4. Learn about the different types of big data analytics and how to use Hadoop and Spark to
perform them.
5. Be able to apply Hadoop and Spark to real-world big data problems.
Course Outcomes:
At the end of the course, students will be able to:
CO1: Apply advanced data processing skills using the Hadoop and Spark ecosystems.
CO2: Design scalable data storage and management with HDFS.
CO3: Explain the distributed computing concepts underlying MapReduce, YARN, and Spark.
CO4: Build real-time data processing pipelines with Spark Streaming and Kafka.
UNIT-I
1. Introduction to Big Data and Hadoop (7 Hours)
(a) What is Big Data?
(b) The Rise of Bytes
(c) Data Explosion and its Sources
(d) Types of Data – Structured, Semi-structured, Unstructured data
(e) Characteristics of Big Data
(f) Limitations of Traditional Large-Scale Systems
(g) Use Cases for Big Data
(h) Challenges of Big Data
(i) Hadoop Introduction - What is Hadoop? Why Hadoop?
(j) Supported Operating Systems
(k) Organizations using Hadoop
(l) Hadoop Job Trends
(m) History of Hadoop
(n) Hadoop Core Components – MapReduce & HDFS
UNIT-II
2. HDFS Architecture (4 Hours)
(a) Regular File System v/s HDFS
(b) HDFS Architecture
(c) Components of HDFS – NameNode, DataNode, SecondaryNameNode
(d) HDFS Features – Fault Tolerance, Horizontal Scaling, Data Replication, Rack Awareness
(e) Anatomy of a file write on HDFS
(f) Anatomy of a file read on HDFS
(g) Hands-on with Hadoop HDFS, the Web UI, and Linux Terminal Commands
(h) HDFS File System Operations
(i) NameNode Metadata, File System Namespace, NameNode Operations
(j) Data Block Split
(k) Benefits of the Data Block Approach
(l) Topology, Data Replication Representation
(m) HDFS Programming Basics – Java API (see the sketch after this list)
(n) Hadoop Configuration API
(o) HDFS API Overview
(p) When Hadoop is not suitable
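To ground the Java API items above, here is a minimal sketch of basic HDFS file operations, written in Scala against Hadoop's FileSystem API; the NameNode address and file paths are placeholder assumptions:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsBasics {
      def main(args: Array[String]): Unit = {
        // Hadoop Configuration API: reads core-site.xml/hdfs-site.xml if present
        val conf = new Configuration()
        conf.set("fs.defaultFS", "hdfs://localhost:9000") // assumed NameNode address

        val fs = FileSystem.get(conf)

        // Write a small file (the NameNode records metadata; DataNodes store blocks)
        val dir = new Path("/user/demo")
        fs.mkdirs(dir)
        val out = fs.create(new Path(dir, "hello.txt"))
        out.writeBytes("hello hdfs\n")
        out.close()

        // List the directory, mirroring `hdfs dfs -ls /user/demo`
        fs.listStatus(dir).foreach(status => println(status.getPath))

        fs.close()
      }
    }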
3. MapReduce (2.5 Hours)
(a) What is MapReduce and why is it popular?
(b) MapReduce Framework – Introduction, Driver, Mapper, Reducer, Combiner, Split, Shuffle & Sort (illustrated in the WordCount sketch after this list)
(c) Hadoop 1.0 Limitations
(d) MapReduce Limitations
(e) YARN Architecture
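A compact WordCount, the canonical MapReduce example, showing the Driver, Mapper, Reducer, and Combiner roles named in (b). It calls Hadoop's Java MapReduce API from Scala 2.13; input and output paths come from the command line:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import scala.jdk.CollectionConverters._

    // Mapper: emit (word, 1) for each token in the input split
    class TokenMapper extends Mapper[Object, Text, Text, IntWritable] {
      private val one  = new IntWritable(1)
      private val word = new Text()
      override def map(key: Object, value: Text,
                       context: Mapper[Object, Text, Text, IntWritable]#Context): Unit =
        value.toString.split("\\s+").filter(_.nonEmpty).foreach { w =>
          word.set(w); context.write(word, one)
        }
    }

    // Reducer (reused as Combiner): sum the counts after shuffle & sort
    class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                          context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit =
        context.write(key, new IntWritable(values.asScala.map(_.get).sum))
    }

    // Driver: configures and submits the job
    object WordCount {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "word count")
        job.setJarByClass(classOf[TokenMapper])
        job.setMapperClass(classOf[TokenMapper])
        job.setCombinerClass(classOf[SumReducer]) // local pre-aggregation
        job.setReducerClass(classOf[SumReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }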
UNIT-III
4. Hive (3.5 Hours)
(a) Limitations of MapReduce
(b) Need for High Level Languages
(c) Analytical Processing (OLAP) – Data Warehousing with Apache Hive
(d) What is Hive?
(e) Hive Query Language
(f) Background of Hive
(g) Hive Installation and Configuration
(h) Hive Architecture, Data Types, Data Model, Examples
(i) Create/Show Database, Drop Tables
(j) SELECT, INSERT, OVERWRITE, EXPLAIN
(k) CREATE, ALTER, DROP, TRUNCATE, JOINS
(l) SerDe (Serialization / Deserialization)
(m) Partitions and Buckets
(n) Limitations of Hive
(o) SQL vs. Hive
(p) File Formats: Avro, Parquet, and ORC (a HiveQL sketch follows this list)
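As one possible illustration of the HiveQL statements listed above (CREATE, INSERT OVERWRITE, SELECT, partitions, ORC storage), the sketch below runs HiveQL through Spark's Hive integration. It assumes a Spark build with Hive support; the database, table, and sample values are made up for the example:

    import org.apache.spark.sql.SparkSession

    object HiveDemo {
      def main(args: Array[String]): Unit = {
        // enableHiveSupport() lets Spark use the Hive metastore and HiveQL
        val spark = SparkSession.builder()
          .appName("hive-demo")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql("CREATE DATABASE IF NOT EXISTS retail")
        // A partitioned table stored as ORC (one of the formats in item (p))
        spark.sql("""CREATE TABLE IF NOT EXISTS retail.sales
                     (item STRING, amount DOUBLE)
                     PARTITIONED BY (sale_date STRING) STORED AS ORC""")
        spark.sql("""INSERT OVERWRITE TABLE retail.sales
                     PARTITION (sale_date = '2024-01-01')
                     VALUES ('pen', 2.5), ('book', 10.0)""")
        spark.sql("SELECT item, SUM(amount) FROM retail.sales GROUP BY item").show()

        spark.stop()
      }
    }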
5. Scala (Object Oriented and Functional Programming) (2.5 Hours)
(a) Getting started With Scala.
(b) Scala Background, Scala vs. Java, and Basics.
(c) Interactive Scala – REPL, data types, variables, expressions, simple functions.
(d) Running the program with Scala Compiler.
(e) Exploring the Type Lattice and Using Type Inference.
(f) Defining Methods and Pattern Matching (see the sketch after this list).
(g) Scala setup on Windows.
(h) Scala setup on Unix.
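A few first-steps snippets for the topics above (values, type inference, simple functions, pattern matching), as one might type them at the Scala REPL; the names are arbitrary:

    object ScalaBasics extends App {
      val greeting: String = "hello" // immutable value, explicit type
      var counter = 0                // mutable variable, type inferred as Int
      counter += 1

      // A simple function; the result type Int is inferred
      def square(x: Int) = x * x
      println(square(4)) // prints 16

      // Pattern matching with type patterns and a guard
      def describe(x: Any): String = x match {
        case 0               => "zero"
        case n: Int if n > 0 => s"positive int $n"
        case s: String       => s"a string: $s"
        case _               => "something else"
      }
      println(describe(7))
      println(describe(greeting))
    }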
UNIT-IV
6. Functional Programming, Object-Oriented Programming, and Integrations (2.5 Hours)
(a) Classes and Properties. Objects. Packaging and Imports. Traits.
(b) Objects, Classes, Inheritance, Lists with Multiple Related Types, and the apply Method (sketched after this list)
(c) What is SBT? Integration of Scala in Eclipse IDE. Integration of SBT with Eclipse.
(d) Batch versus real-time data processing
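A minimal sketch for items (a) and (b): a trait, two classes extending it, a companion object providing apply, and one list holding multiple related types. The shapes are an illustrative choice, not from the source:

    trait Shape {
      def area: Double
    }

    class Circle(val radius: Double) extends Shape {
      def area: Double = math.Pi * radius * radius
    }

    class Rectangle(val w: Double, val h: Double) extends Shape {
      def area: Double = w * h
    }

    object Circle {
      // Companion-object factory: Circle(1.0) instead of new Circle(1.0)
      def apply(radius: Double): Circle = new Circle(radius)
    }

    object Shapes extends App {
      // One list with multiple related types, typed by the common trait
      val shapes: List[Shape] = List(Circle(1.0), new Rectangle(2.0, 3.0))
      shapes.foreach(s => println(f"${s.area}%.2f"))
    }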
7. Spark Core (3 Hours)
(a) Introduction to Spark, Spark versus Hadoop
(b) Architecture of Spark.
(c) Data Partitioning and Parallelism
(d) Coding Spark jobs in Scala
(e) Exploring the Spark Shell – Creating a SparkContext.
(f) RDD Programming – Operations on RDDs (see the sketch after this list).
(g) Transformations and Actions.
(h) Loading Data and Saving Data.
(i) Key Value Pair RDD.
(j) Root Cause Analysis (RCA) of Spark Application Failures
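A minimal RDD job in Scala covering items (e)-(i): creating a SparkContext, loading data, transformations, a key-value pair RDD, actions, and saving. The input file name and the local master URL are assumptions for a standalone run:

    import org.apache.spark.{SparkConf, SparkContext}

    object RddDemo {
      def main(args: Array[String]): Unit = {
        // local[*] runs in-process; on a cluster the master URL differs
        val sc = new SparkContext(
          new SparkConf().setAppName("rdd-demo").setMaster("local[*]"))

        val lines = sc.textFile("input.txt")     // loading data (assumed file)
        val counts = lines
          .flatMap(_.split("\\s+"))              // transformation (lazy)
          .map(word => (word, 1))                // key-value pair RDD
          .reduceByKey(_ + _)                    // shuffle + aggregation

        counts.take(10).foreach(println)         // action triggers execution
        counts.saveAsTextFile("counts-out")      // saving data

        sc.stop()
      }
    }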
UNIT-V
8. Spark SQL (2 Hours)
(a) Introduction to Apache Spark SQL
(b) The SQLContext
(c) Importing and saving data
(d) Processing Text, JSON, and Parquet Files
(e) DataFrames (sketched after this list)
(f) Using Hive
(g) PySpark and ML demo with use cases
(h) Connectivity with MySQL
(i) Error Handling
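A short DataFrame sketch for the items above: reading JSON, querying through both SQL and the DataFrame API, and writing Parquet. File names are placeholders, and the commented JDBC block shows the usual shape of a MySQL connection with illustrative credentials:

    import org.apache.spark.sql.SparkSession

    object SparkSqlDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("sql-demo").master("local[*]").getOrCreate()

        // DataFrame from a JSON file (placeholder name)
        val people = spark.read.json("people.json")
        people.createOrReplaceTempView("people")

        // Same query via SQL and via the DataFrame API
        spark.sql("SELECT name, age FROM people WHERE age > 21").show()
        people.filter(people("age") > 21).select("name").show()

        // Save as Parquet and read it back
        people.write.mode("overwrite").parquet("people.parquet")
        spark.read.parquet("people.parquet").printSchema()

        // Connectivity with MySQL via JDBC (URL and credentials illustrative):
        // val orders = spark.read.format("jdbc")
        //   .option("url", "jdbc:mysql://localhost:3306/shop")
        //   .option("dbtable", "orders")
        //   .option("user", "root").option("password", "secret")
        //   .load()

        spark.stop()
      }
    }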
9. Spark Streaming (1.5 Hours)
(a) Introduction to Spark Streaming.
(b) Architecture of Spark Streaming
(c) Processing Distributed Log Files in Real Time
(d) Discretized Streams (DStreams) of RDDs.
(e) Applying Transformations and Actions on Streaming Data (see the sketch after this list)
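A minimal DStream sketch: a socket source (e.g. fed by `nc -lk 9999`), word counts per micro-batch, and an output action. The host, port, and 5-second batch interval are assumptions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object StreamingDemo {
      def main(args: Array[String]): Unit = {
        // local[2]: one thread for the receiver, one for processing
        val conf = new SparkConf().setAppName("streaming-demo").setMaster("local[2]")
        // A DStream is a sequence of RDDs, one per 5-second micro-batch
        val ssc = new StreamingContext(conf, Seconds(5))

        val lines = ssc.socketTextStream("localhost", 9999)

        // The familiar RDD transformations, applied to each micro-batch
        val counts = lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)
        counts.print() // output action, runs once per batch

        ssc.start()
        ssc.awaitTermination()
      }
    }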
10. Kafka (1.5 Hours)
(a) Understanding Kafka Cluster
(b) Installing and Configuring Kafka Cluster
(c) Kafka Producer. Kafka Consumer
(d) Producer and Consumer in Action
(e) Reading Data from Kafka
(f) Lab: Implement a Kafka Producer and Consumer using real-time streaming data (starter sketch after this list)
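A starter sketch for the lab: a producer and a consumer built on the kafka-clients Java API from Scala 2.13. The broker address, topic, and group id are placeholder assumptions, and a real consumer would poll in a loop rather than once:

    import java.time.Duration
    import java.util.Properties
    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import scala.jdk.CollectionConverters._

    object KafkaDemo {
      val topic = "events" // assumed topic name

      def produce(): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        producer.send(new ProducerRecord(topic, "k1", "hello kafka"))
        producer.close()
      }

      def consume(): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("group.id", "demo-group")
        props.put("auto.offset.reset", "earliest")
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        val consumer = new KafkaConsumer[String, String](props)
        consumer.subscribe(List(topic).asJava)
        // Single poll for demonstration only
        consumer.poll(Duration.ofSeconds(5)).asScala
          .foreach(r => println(s"${r.key} -> ${r.value}"))
        consumer.close()
      }

      def main(args: Array[String]): Unit = { produce(); consume() }
    }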
TOTAL………………. (30 Hours)
Text Books:
1. Raj Kamal and Preeti Saxena, Big Data Analytics: Introduction to Hadoop, Spark, and Machine-Learning, McGraw Hill Education, 2019.
2. Tom White, Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, 4th Edition, O'Reilly Media, 2015.
Reference Books:
1. Dirk deRoos, Paul C. Zikopoulos, Roman B. Melnyk, Bruce Brown, and Rafael Coss, Hadoop for Dummies.
2. Srinath Perera and Thilina Gunarathne, Hadoop MapReduce Cookbook.