What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial | Simplilearn
What’s in it for you?
1. History of Spark
2. What is Spark?
3. Hadoop vs Spark
4. Components of Apache Spark
5. Spark Architecture
6. Applications of Spark
7. Spark Use Case
History of Apache Spark
2009: Started as a project at UC Berkeley’s AMPLab
2010: Open sourced under a BSD license
2013: Donated to the Apache Software Foundation; Spark became an Apache top-level project in early 2014
2014: Used by Databricks to sort large-scale datasets and set a new world record
What is Apache Spark?
Apache Spark is an open-source engine for processing data, in batches or in near real time, across clusters of computers using simple programming constructs. Spark keeps working data in memory, but it is not a storage system; it reads from and writes to external storage
Supports various programming languages
Developers and data scientists incorporate Spark into their applications to rapidly query, analyze, and transform data at scale
Hadoop vs Spark
Hadoop: Processing data using MapReduce is slow because intermediate results are written to disk between steps
Spark: Processes data up to 100 times faster than MapReduce for some workloads because intermediate data is kept in memory
Hadoop: Performs batch processing of data
Spark: Performs both batch processing and real-time processing of data
Hadoop: MapReduce jobs, typically written in Java, need more lines of code and take longer to develop
Spark: Programs need fewer lines of code thanks to a high-level API; Spark itself is implemented in Scala
Hadoop: Supports Kerberos authentication, which is difficult to manage
Spark: Supports authentication via a shared secret. It can also run on YARN, leveraging the capability of Kerberos
Spark Features
Fast processing
Spark contains Resilient Distributed Datasets (RDDs), which save time on read and write operations, so it can run ten to a hundred times faster than Hadoop MapReduce, depending on the workload
In-memory computing
In Spark, data is stored in RAM, so it can access the data quickly and accelerate the speed of analytics
Flexible
Spark supports multiple languages and allows developers to write applications in Java, Scala, R, or Python
Fault tolerance
Spark’s RDDs are designed to handle the failure of any worker node in the cluster. Lost partitions are recomputed from the steps that produced them, which keeps data loss to a minimum
Better analytics
Spark has a rich set of SQL queries, machine learning algorithms, complex analytics, etc. With all these functionalities, analytics can be performed better
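The fault-tolerance idea can be sketched in plain Python. This is a toy model, not Spark's API: `ToyRDD`, `get_partition`, and `lose_partition` are hypothetical names invented for illustration. It assumes the source data is still available and that each dataset remembers the list of functions (its lineage) used to derive it, so a lost partition can be rebuilt.

```python
from typing import Callable, Dict, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD: source partitions plus a lineage of functions."""

    def __init__(self, source_partitions: List[List[int]],
                 lineage: Optional[List[Callable[[int], int]]] = None):
        self.source_partitions = source_partitions
        self.lineage = lineage or []
        self.cache: Dict[int, List[int]] = {}  # partition index -> computed rows

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Immutable style: return a new ToyRDD with one more lineage step.
        return ToyRDD(self.source_partitions, self.lineage + [fn])

    def compute_partition(self, index: int) -> List[int]:
        # Replay every lineage step over the source rows of one partition.
        rows = self.source_partitions[index]
        for fn in self.lineage:
            rows = [fn(x) for x in rows]
        return rows

    def get_partition(self, index: int) -> List[int]:
        # Serve from the in-memory cache, recomputing only if it is missing.
        if index not in self.cache:
            self.cache[index] = self.compute_partition(index)
        return self.cache[index]

    def lose_partition(self, index: int) -> None:
        # Simulate a worker failure that wipes one cached partition.
        self.cache.pop(index, None)


rdd = ToyRDD([[1, 2], [3, 4]]).map(lambda x: x * 10).map(lambda x: x + 1)
before = rdd.get_partition(1)
rdd.lose_partition(1)          # the "worker" holding partition 1 fails
after = rdd.get_partition(1)   # rebuilt from lineage, same rows as before
```

Only the lost partition is recomputed here; partition 0 is never touched by the failure.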
Components of Spark
Spark Core
Spark SQL
Spark Streaming
MLlib
GraphX
Components of Spark – Spark Core
Spark Core is the base engine for large-scale parallel and distributed data processing
It is responsible for:
• memory management
• fault recovery
• scheduling, distributing and monitoring jobs on a cluster
• interacting with storage systems
Resilient Distributed Dataset
Spark Core is built around RDDs (Resilient Distributed Datasets), immutable, fault-tolerant, distributed collections of objects that can be operated on in parallel
Transformation: operations (such as map, filter, join, union) that are performed on an RDD and yield a new RDD containing the result
Action: operations (such as reduce, first, count) that return a value after running a computation on an RDD
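The split between transformations and actions can be sketched in plain Python. This is a toy model rather than Spark's API: `LazyDataset` is a hypothetical class, and only the method names `map`, `filter`, `reduce`, `first`, and `count` mirror the operations named on the slide. It assumes transformations are lazy, meaning they only record a step, and that nothing runs until an action asks for a value.

```python
from functools import reduce as fold


class LazyDataset:
    """Toy stand-in for an RDD: immutable data plus a recorded list of steps."""

    def __init__(self, data, ops=()):
        self._data = list(data)
        self._ops = tuple(ops)

    # Transformations: return a NEW dataset, do no work yet.
    def map(self, fn):
        return LazyDataset(self._data, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._data, self._ops + (("filter", pred),))

    def _run(self):
        # Replay the recorded steps over the data, in order.
        items = iter(self._data)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: trigger the computation and return a plain value.
    def count(self):
        return len(self._run())

    def first(self):
        return self._run()[0]

    def reduce(self, fn):
        return fold(fn, self._run())


calls = []  # records which elements the map function actually touched


def traced_square(x):
    calls.append(x)
    return x * x


numbers = LazyDataset(range(1, 11))
even_squares = numbers.filter(lambda x: x % 2 == 0).map(traced_square)
calls_before_action = list(calls)                   # still empty: nothing ran
total = even_squares.reduce(lambda a, b: a + b)     # action: runs the pipeline
```

Each transformation leaves `numbers` unchanged and hands back a new object, which matches the immutability described for RDDs.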
Components of Spark – Spark SQL
The Spark SQL framework component is used for structured and semi-structured data processing
Spark SQL Architecture (layers, top to bottom):
• DataFrame DSL; Spark SQL and HQL
• DataFrame API
• Data Source API
• CSV, JSON, JDBC
Components of Spark – Spark Streaming
Spark Streaming is a lightweight API that allows developers to perform batch processing and real-time streaming of data with ease
Provides secure, reliable, and fast processing of live data streams
The streaming engine divides the input data stream into batches of input data, which are then processed into batches of processed data
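The batch-oriented flow can be sketched in plain Python. This is a toy model, not the Spark Streaming API: `micro_batches` and `process_batch` are hypothetical helpers, and batches are cut by element count for simplicity, whereas a real streaming engine typically cuts them by time interval.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def micro_batches(stream: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Cut an input stream into consecutive batches of at most batch_size items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # stream exhausted
        yield batch


def process_batch(batch: List[int]) -> int:
    """Ordinary batch logic applied to each small batch (here: a sum)."""
    return sum(batch)


events = [3, 1, 4, 1, 5, 9, 2]  # stand-in for a live input stream
input_batches = list(micro_batches(events, batch_size=3))
processed = [process_batch(b) for b in input_batches]
```

The same `process_batch` function could run over a full historical dataset, which is the sense in which one API covers both batch and streaming work.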
Components of Spark – Spark MLlib
MLlib is a low-level machine learning library that is simple to use, is scalable, and compatible with various programming languages
MLlib eases the deployment and development of scalable machine learning algorithms
It contains implementations of various machine learning algorithms:
• Clustering
• Classification
• Collaborative Filtering
Components of Spark – GraphX
GraphX is Spark’s own graph computation engine, built on top of RDDs
Provides a uniform tool for:
• ETL
• Exploratory data analysis
• Interactive graph computations
Spark Architecture
Apache Spark uses a master-slave architecture that consists of a driver, which runs on a master node, and multiple executors, which run across the worker nodes in the cluster
Diagram: Master Node (Driver Program with SparkContext), Cluster Manager, and Worker Nodes (each with an Executor that holds a Cache and runs Tasks)
• Master Node has a Driver Program
• The Spark code behaves as a driver program and creates a SparkContext, which is a gateway to all the Spark functionalities
• Spark applications run as independent sets of processes on a cluster
• The driver program and SparkContext take care of the job execution within the cluster
• A job is split into multiple tasks that are distributed over the worker nodes
• When an RDD is created in the SparkContext, it can be distributed across various nodes
• Worker nodes are slaves that run different tasks
• The Executor is responsible for the execution of these tasks
• Worker nodes execute the tasks assigned by the Cluster Manager and return the results back to the SparkContext
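The driver and executor roles can be sketched in plain Python. This is a toy model, not Spark's API: a standard-library thread pool stands in for the executors on worker nodes, and `split_into_partitions`, `task`, and `run_job` are hypothetical names. It assumes the job is a sum of squares so that partial results can simply be added together.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: cut the dataset into roughly equal partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def task(partition):
    """Executor side: one task computes a partial result for one partition."""
    return sum(x * x for x in partition)


def run_job(data, num_workers=2, num_partitions=4):
    """Driver side: create one task per partition, then combine the results."""
    partitions = split_into_partitions(data, num_partitions)
    # The pool plays the role of executors running tasks in parallel.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(task, partitions))
    # Partial results come back to the driver, which produces the final answer.
    return sum(partial_results)


total = run_job(list(range(1, 9)))  # sum of squares of 1..8
```

There are more partitions than workers here, so each worker runs several tasks, as in the diagram where one executor holds more than one task.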
Spark Cluster Managers
1. Standalone mode: By default, applications submitted to the standalone mode cluster will run in FIFO order, and each application will try to use all available nodes
2. Apache Mesos: An open-source project to manage computer clusters, which can also run Hadoop applications
3. Hadoop YARN: The cluster resource manager of Hadoop 2. Spark can be run on YARN
4. Kubernetes: An open-source system for automating deployment, scaling, and management of containerized applications
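The FIFO behaviour described for standalone mode can be sketched in plain Python. This is a toy model, not the scheduler's actual code: `schedule_fifo` is a hypothetical function, and it assumes each application takes every core in the cluster, with run time equal to its work units divided by the core count, rounded up.

```python
import math
from collections import deque


def schedule_fifo(applications, total_cores):
    """Return (name, start, end) per application, run strictly in arrival order."""
    queue = deque(applications)  # (name, work_units) in submission order
    clock = 0
    timeline = []
    while queue:
        name, work_units = queue.popleft()
        # The running application occupies all cores, so later ones must wait.
        duration = math.ceil(work_units / total_cores)
        timeline.append((name, clock, clock + duration))
        clock += duration
    return timeline


timeline = schedule_fifo([("etl", 8), ("report", 4), ("ml", 10)], total_cores=4)
```

In this model the short `report` application cannot start until `etl` has finished. In a real standalone cluster, the `spark.cores.max` setting caps the cores one application takes, which lets several applications run at the same time.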
Applications of Spark
Banking
JPMorgan uses Spark to detect fraudulent transactions, analyze an individual’s spending to suggest offers, and identify patterns to decide how much to invest and where to invest
E-Commerce
Alibaba uses Spark to analyze large sets of data such as real-time transaction details, browsing history, etc. in the form of Spark jobs and provides recommendations to its users
Healthcare
IQVIA is a leading healthcare company that uses Spark to analyze patient data, identify possible health issues, and diagnose them based on medical history
Entertainment
Entertainment and gaming companies like Netflix and Riot Games use Apache Spark to showcase relevant advertisements to their users based on the videos that they watch, share, and like
Spark Use Case
Conviva is one of the world’s leading video streaming companies
Video streaming is a challenge, especially with increasing demand for high-quality streaming experiences
Conviva collects data about video streaming quality to give its customers visibility into the end-user experience they are delivering
Using Apache Spark, Conviva delivers a better quality of service to its customers by reducing screen buffering and learning in detail about network conditions in real time
This information is stored in the video player to manage live video traffic coming from 4 billion video feeds every month, to ensure maximum retention
Using Apache Spark, Conviva has created an auto diagnostics alert
• It automatically detects anomalies along the video streaming pipeline and diagnoses the root cause of the issue
• Reduces waiting time before the video starts
• Avoids buffering and recovers the video from a technical error
• Goal is to maximize viewer engagement
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Future Of Social Media | Social Media Trends and Strategies 2025 | Instagram ...
Simplilearn
 
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
SQL Query Optimization | SQL Query Optimization Techniques | SQL Basics | SQL...
Simplilearn
 
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
SQL INterview Questions .pTop 45 SQL Interview Questions And Answers In 2025 ...
Simplilearn
 
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
How To Start Influencer Marketing Business | Influencer Marketing For Beginne...
Simplilearn
 
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Cyber Security Roadmap 2025 | How To Become Cyber Security Engineer In 2025 |...
Simplilearn
 
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
How To Become An AI And ML Engineer In 2025 | AI Engineer Roadmap | AI ML Car...
Simplilearn
 
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
What Is GitHub Copilot? | How To Use GitHub Copilot? | How does GitHub Copilo...
Simplilearn
 
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Simplilearn
 
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Simplilearn
 
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Top 7 High Paying AI Certifications Courses For 2025 | Best AI Certifications...
Simplilearn
 
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Data Cleaning In Data Mining | Step by Step Data Cleaning Process | Data Clea...
Simplilearn
 
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Top 10 Data Analyst Projects For 2025 | Data Analyst Projects | Data Analysis...
Simplilearn
 
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
AI Engineer Roadmap 2025 | AI Engineer Roadmap For Beginners | AI Engineer Ca...
Simplilearn
 
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Machine Learning Roadmap 2025 | Machine Learning Engineer Roadmap For Beginne...
Simplilearn
 
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Kotter's 8-Step Change Model Explained | Kotter's Change Management Model | S...
Simplilearn
 
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Gen AI Engineer Roadmap For 2025 | How To Become Gen AI Engineer In 2025 | Si...
Simplilearn
 
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Top 10 Data Analyst Certification For 2025 | Best Data Analyst Certification ...
Simplilearn
 
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Complete Data Science Roadmap For 2025 | Data Scientist Roadmap For Beginners...
Simplilearn
 

What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial | Simplilearn

  • 2-8. What’s in it for you? 1. History of Spark 2. What is Spark? 3. Hadoop vs Spark 4. Components of Apache Spark (Spark Core, Spark SQL, Spark Streaming, Spark MLlib, GraphX) 5. Spark Architecture 6. Applications of Spark 7. Spark Use Case
  • 9-12. History of Apache Spark. 2009: Started as a project at UC Berkeley's AMPLab. 2010: Open sourced under a BSD license. 2013: Donated to the Apache Software Foundation (Spark became an Apache top-level project in February 2014). 2014: Used by Databricks to sort large-scale datasets and set a new world record.
  • 13. What is Apache Spark?
  • 14-16. What is Apache Spark? Apache Spark is an open-source data processing engine that processes data, in batches or in near real time, across clusters of computers using simple programming constructs. It reads from and writes to external storage systems rather than acting as a data store itself. It supports various programming languages. Developers and data scientists incorporate Spark into their applications to rapidly query, analyze, and transform data at scale.
  • 17. Hadoop vs Spark
  • 18-21. Hadoop vs Spark. Speed: processing data using MapReduce in Hadoop is slow; Spark can process data up to 100 times faster than MapReduce because it works in memory. Processing model: Hadoop MapReduce performs batch processing of data; Spark performs both batch processing and real-time processing. Code: MapReduce jobs, typically written in Java, need more lines of code; Spark programs need fewer lines of code, as Spark is implemented in Scala and exposes concise high-level APIs. Security: Hadoop supports Kerberos authentication, which is difficult to manage; Spark supports authentication via a shared secret and can also run on YARN, leveraging Kerberos.
  • 22. Spark Features
  • 23-27. Spark Features. Fast processing: Spark is built around Resilient Distributed Datasets (RDDs), which save time on read and write operations, so it runs roughly ten to a hundred times faster than Hadoop MapReduce. In-memory computing: data is held in RAM, so Spark can access it quickly and speed up analytics. Flexible: Spark supports multiple languages and lets developers write applications in Java, Scala, R, or Python. Fault tolerance: RDDs are designed to handle the failure of any worker node in the cluster, so that data loss is reduced to zero. Better analytics: Spark offers SQL queries, machine learning algorithms, and other complex analytics in one engine, which makes analytics easier to perform.
  • 28. Components of Spark
  • 29-33. Components of Apache Spark: Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX
  • 34. Components of Spark – Spark Core
  • 35-36. Spark Core. Spark Core is the base engine for large-scale parallel and distributed data processing. It is responsible for: memory management; fault recovery; scheduling, distributing, and monitoring jobs on a cluster; interacting with storage systems.
  • 37. Resilient Distributed Dataset. Spark Core is built around RDDs (Resilient Distributed Datasets): immutable, fault-tolerant, distributed collections of objects that can be operated on in parallel. An RDD supports two kinds of operations. Transformations are operations (such as map, filter, join, union) that are performed on an RDD and yield a new RDD containing the result. Actions are operations (such as reduce, first, count) that return a value after running a computation on an RDD.
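The split between transformations and actions on slide 37 can be sketched with a toy, single-machine model. This is not Spark's API: `ToyRDD` and its internals are hypothetical names chosen for illustration, and the sketch assumes only that transformations record work for later while actions trigger it.

```python
from functools import reduce as fold


class ToyRDD:
    """Toy single-machine stand-in for an RDD (hypothetical, not Spark's class)."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)     # immutable input data
        self._lineage = tuple(lineage)   # recorded transformations, not yet run

    # Transformations: return a NEW ToyRDD and perform no computation.
    def map(self, fn):
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, predicate):
        return ToyRDD(self._source, self._lineage + (("filter", predicate),))

    def _compute(self):
        """Replay the recorded lineage over the source data."""
        items = iter(self._source)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: run the computation and return a plain value.
    def count(self):
        return sum(1 for _ in self._compute())

    def first(self):
        return next(self._compute())

    def reduce(self, fn):
        return fold(fn, self._compute())


numbers = ToyRDD(range(1, 11))
# Two transformations: nothing is computed at this point.
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

# Actions trigger the computation.
print(even_squares.count())                      # 5
print(even_squares.first())                      # 4
print(even_squares.reduce(lambda a, b: a + b))   # 220
```

Because each `ToyRDD` keeps its source and the list of recorded steps, a result can always be recomputed from scratch, which is the same idea that lets real RDDs recover from a lost partition.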
  • 38. Components of Spark – Spark SQL
  • 39-40. Spark SQL. The Spark SQL framework component is used for structured and semi-structured data processing. Spark SQL architecture (diagram): the DataFrame DSL and Spark SQL/HQL sit on top of the DataFrame API, which sits on the Data Source API (CSV, JSON, JDBC).
  • 41. Components of Spark – Spark Streaming
  • 42-43. Spark Streaming. Spark Streaming is a lightweight API that allows developers to perform batch processing and real-time streaming of data with ease. It provides secure, reliable, and fast processing of live data streams. Diagram: the streaming engine turns the input data stream into batches of input data, which are then processed into batches of processed data.
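The batching flow in the diagram on slide 43 can be sketched in plain Python. This is a simplified model, not Spark Streaming itself: it cuts batches by record count, whereas Spark Streaming cuts them by time interval, and `micro_batches` and `process_batch` are hypothetical helper names.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Split an input stream into consecutive batches of up to batch_size records."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(batch):
    """Hypothetical per-batch job: count how often each event occurs."""
    counts = {}
    for event in batch:
        counts[event] = counts.get(event, 0) + 1
    return counts


# Input data stream -> batches of input data -> batches of processed data.
events = ["play", "pause", "play", "stop", "play", "pause", "stop"]
results = [process_batch(batch) for batch in micro_batches(events, batch_size=3)]

for result in results:
    print(result)
```

Each batch is processed with ordinary batch logic, which is why the same engine can serve both batch and streaming workloads.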
  • 44. Components of Spark – Spark MLlib
  • 45-46. Spark MLlib. MLlib is a low-level machine learning library that is simple to use, scalable, and compatible with various programming languages. MLlib eases the development and deployment of scalable machine learning algorithms. It contains implementations of various machine learning algorithms, including clustering, classification, and collaborative filtering.
  • 47. Components of Spark – GraphX
  • 48-49. GraphX. GraphX is Spark’s own graph computation engine and data store. It provides a uniform tool for ETL, exploratory data analysis, and interactive graph computations.
  • 50. Spark Architecture
  • 51. Spark Architecture. Apache Spark uses a master-slave architecture that consists of a driver, which runs on a master node, and multiple executors, which run across the worker nodes in the cluster. Diagram: a master node running the driver program (SparkContext), a cluster manager, and worker nodes that each run an executor with a cache and tasks. The master node has a driver program. The Spark code behaves as a driver program and creates a SparkContext, which is a gateway to all the Spark functionalities.
  • 52. Spark Architecture. Spark applications run as independent sets of processes on a cluster. The driver program and SparkContext take care of the job execution within the cluster.
  • 53. Spark Architecture. A job is split into multiple tasks that are distributed over the worker nodes. When an RDD is created in the SparkContext, it can be distributed across various nodes. Worker nodes are slaves that run different tasks.
  • 54. Spark Architecture. The executor is responsible for the execution of these tasks. Worker nodes execute the tasks assigned by the cluster manager and return the results to the SparkContext.
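The driver and executor roles on slides 51 to 54 can be sketched on one machine, with a thread pool standing in for the executors. This is an analogy only: the function names are hypothetical, and a real cluster manager and network transport are assumed away.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: slice the dataset into roughly equal partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def run_task(partition):
    """Executor side: one task computes a partial sum of squares for its partition."""
    return sum(x * x for x in partition)


def run_job(data, num_executors=2, num_partitions=4):
    """Driver side: split the job into tasks, hand them out, combine the results."""
    partitions = split_into_partitions(data, num_partitions)
    # The pool plays the role of the executors running on worker nodes.
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        partial_results = list(pool.map(run_task, partitions))
    # Results come back to the driver, which merges them.
    return sum(partial_results)


total = run_job(list(range(1, 101)))
print(total)  # 338350
```

One task per partition is the design choice that lets the job scale: adding executors lets more partitions be processed at the same time without changing the job's logic.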
  • 55-58. Spark Cluster Managers. 1. Standalone mode: by default, applications submitted to a standalone-mode cluster run in FIFO order, and each application tries to use all available nodes. 2. Apache Mesos: an open-source project to manage computer clusters, which can also run Hadoop applications. 3. Hadoop YARN: Apache YARN is the cluster resource manager of Hadoop 2, and Spark can run on YARN. 4. Kubernetes: an open-source system for automating deployment, scaling, and management of containerized applications.
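In practice, the cluster manager is chosen through the master URL passed to Spark, for example with `spark-submit --master`. The sketch below builds such URLs in the formats used by the Spark documentation. The helper `master_url` is hypothetical, the host names are placeholders, and the default ports (7077 for standalone, 5050 for Mesos, 443 for the Kubernetes API server) are assumptions that a given cluster may override.

```python
def master_url(manager, host="localhost", port=None):
    """Return a Spark master URL string for the given cluster manager (illustrative helper)."""
    if manager == "standalone":
        return f"spark://{host}:{port or 7077}"
    if manager == "mesos":
        return f"mesos://{host}:{port or 5050}"
    if manager == "yarn":
        # On YARN the cluster location comes from the Hadoop configuration, not the URL.
        return "yarn"
    if manager == "kubernetes":
        return f"k8s://https://{host}:{port or 443}"
    raise ValueError(f"unknown cluster manager: {manager}")


for name in ("standalone", "mesos", "yarn", "kubernetes"):
    print(name, "->", master_url(name, host="cluster.example.com"))
```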
  • 59. Applications of Spark
  • 60-63. Applications of Spark. Banking: JPMorgan uses Spark to detect fraudulent transactions, analyze an individual's business spending to suggest offers, and identify patterns to decide how much to invest and where. E-commerce: Alibaba runs Spark jobs over large data sets, such as real-time transaction details and browsing history, to provide recommendations to its users. Healthcare: IQVIA is a leading healthcare company that uses Spark to analyze patients' data, identify possible health issues, and diagnose them based on medical history. Entertainment: entertainment and gaming companies like Netflix and Riot Games use Apache Spark to show relevant advertisements to their users based on the videos that they watch, share, and like.
  • 64. Spark Use Case
  • 65-67. Spark Use Case. Conviva is one of the world’s leading video streaming companies. Video streaming is a challenge, especially with increasing demand for high-quality streaming experiences. Conviva collects data about video streaming quality to give its customers visibility into the end-user experience they are delivering.
  • 68-69. Spark Use Case. Using Apache Spark, Conviva delivers a better quality of service to its customers by removing screen buffering and learning in detail about network conditions in real time. This information is stored in the video player to manage live video traffic coming from 4 billion video feeds every month, to ensure maximum retention.
  • 70-74. Spark Use Case. Using Apache Spark, Conviva has created an auto diagnostics alert. It automatically detects anomalies along the video streaming pipeline and diagnoses the root cause of the issue. It reduces waiting time before the video starts, avoids buffering, and recovers the video from a technical error. The goal is to maximize viewer engagement.

Editor's Notes