1 - Big Data Analytics & IoT
“Data is the New Oil”
– World Economic Forum 2011
What is Big Data?
According to Dr. Kirk Borne, Principal Data Scientist, the definition of Big Data is
"Everything, Quantified and Tracked."
Smarter Decisions
Better Products
Deeper Insights
Greater Knowledge
Optimal Solutions
More Automated Processes
More accurate Predictive and Prescriptive Analytics
Better models of future behaviors and outcomes
What is IoT?
The Internet of Things (IoT) is the network of physical objects—devices, vehicles, buildings and other
items embedded with electronics, software, sensors, and network connectivity—that enables these
objects to collect and exchange data.
Big Data ingestion involves connecting to various data sources, extracting the
data, and detecting changed data. It is about moving data, especially
unstructured data, from where it originates into a system where it can be stored
and analyzed.
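
As an illustration of the "detecting the changed data" step, here is a minimal sketch that compares two snapshots of a source by content hash. The snapshot layout (a dict keyed by record id) and the names `fingerprint` and `detect_changes` are assumptions made for this example, not part of any particular ingestion tool.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record's contents (keys sorted so order does not matter)."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def detect_changes(previous, current):
    """Compare two snapshots keyed by record id and classify each id."""
    prev_fp = {key: fingerprint(rec) for key, rec in previous.items()}
    curr_fp = {key: fingerprint(rec) for key, rec in current.items()}
    return {
        "inserted": sorted(k for k in curr_fp if k not in prev_fp),
        "updated": sorted(
            k for k in curr_fp if k in prev_fp and curr_fp[k] != prev_fp[k]
        ),
        "deleted": sorted(k for k in prev_fp if k not in curr_fp),
    }


# Hypothetical sensor snapshots taken at two points in time.
previous = {"s1": {"temp": 21.5}, "s2": {"temp": 19.0}, "s3": {"temp": 25.1}}
current = {"s1": {"temp": 21.5}, "s2": {"temp": 19.4}, "s4": {"temp": 18.2}}

changes = detect_changes(previous, current)
print(changes)
```

Only the inserted and updated records would then need to be moved downstream, which is the point of change detection during ingestion.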
Data Collection (Integration) Layer
In this layer, the focus is on transporting data from the ingestion layer to the
rest of the data pipeline. Here we use a messaging system that acts as a mediator
between all the programs that send and receive messages.
• Kafka works with Storm, HBase, and Spark for real-time analysis and rendering
of streaming data
– Building real-time streaming data pipelines that reliably get data between systems or
applications
– Building real-time streaming applications that transform or react to the streams of data.
• The data pipeline is the main component of data integration
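
To make the "mediator" idea concrete, the sketch below models a tiny in-memory message broker with topics and per-consumer read positions. It is not the Kafka API; the `Broker` class, its methods, and the topic and consumer names are hypothetical and only mirror the general publish/subscribe pattern.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory broker: producers append to a topic, consumers read at their own pace."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic -> append-only list of messages
        self._offsets = defaultdict(int)  # (topic, consumer) -> next index to read

    def publish(self, topic, message):
        self._topics[topic].append(message)

    def poll(self, topic, consumer):
        """Return the messages this consumer has not seen yet and advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(topic, consumer)]
        self._offsets[(topic, consumer)] = len(log)
        return log[start:]


broker = Broker()
broker.publish("sensor-readings", {"device": "s1", "temp": 21.5})
broker.publish("sensor-readings", {"device": "s2", "temp": 19.4})

first = broker.poll("sensor-readings", "dashboard")    # sees both messages
broker.publish("sensor-readings", {"device": "s3", "temp": 25.1})
second = broker.poll("sensor-readings", "dashboard")   # sees only the new one
archive = broker.poll("sensor-readings", "archiver")   # independent consumer, sees all
```

Because each consumer keeps its own offset, producers and consumers never call each other directly, which is the decoupling the messaging layer is meant to provide.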
Data Processing Layer
In this Layer, data collected in the previous layer is processed and made ready to
route to different destinations.
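
A minimal sketch of "process, then route" is shown below, assuming hypothetical temperature readings and two made-up destinations (`storage` and `alerts`). The cleaning rule and the alert threshold are illustrative assumptions only.

```python
def process(reading):
    """Validate and normalise one raw reading; return None if it is unusable."""
    if reading.get("temp_c") is None:
        return None
    return {"device": reading["device"], "temp_c": round(float(reading["temp_c"]), 1)}


def route(record, alert_threshold=30.0):
    """Choose destinations for a processed record (destination names are hypothetical)."""
    destinations = ["storage"]
    if record["temp_c"] >= alert_threshold:
        destinations.append("alerts")
    return destinations


raw = [
    {"device": "s1", "temp_c": "21.46"},  # string value, needs normalising
    {"device": "s2", "temp_c": None},     # missing value, dropped
    {"device": "s3", "temp_c": 31.2},     # hot reading, also routed to alerts
]

sinks = {"storage": [], "alerts": []}
for reading in raw:
    record = process(reading)
    if record is None:
        continue
    for destination in route(record):
        sinks[destination].append(record)
```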
Data Storage Layer
Next, the major issue is to keep data in the right place based on usage. A
combination of distributed file systems and NoSQL databases provides scalable
data storage platforms for Big Data / IoT.
• HDFS - A Java-based file system that provides scalable and reliable data
storage; it was designed to span large clusters of commodity servers.
• Amazon Simple Storage Service (Amazon S3) - Object storage with a simple
web service interface to store and retrieve any amount of data from anywhere
on the web.
• NoSQL – Non-relational databases that provide a mechanism for storage and
retrieval of data which is modeled in means other than the tabular relations
used in relational databases.
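
To illustrate what "modeled in means other than tabular relations" can look like, here is a toy document store in which records under the same collection carry different fields. The `DocumentStore` class and its methods are hypothetical and do not correspond to any specific NoSQL product.

```python
class DocumentStore:
    """Toy key-to-document store: no fixed schema, each document is a free-form dict."""

    def __init__(self):
        self._docs = {}

    def put(self, key, doc):
        self._docs[key] = dict(doc)  # store a copy so later caller edits do not leak in

    def get(self, key):
        return self._docs.get(key)

    def find(self, **criteria):
        """Return keys of documents whose fields match every given criterion."""
        return [
            key
            for key, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


store = DocumentStore()
# Documents with different shapes can live side by side.
store.put("dev-1", {"type": "thermostat", "temp_c": 21.5, "room": "lab"})
store.put("dev-2", {"type": "camera", "resolution": "1080p"})
store.put("dev-3", {"type": "thermostat", "temp_c": 19.0})

thermostats = store.find(type="thermostat")
```

A relational table would need a column for every possible field; here each device simply stores the fields it has.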
Data Query (Access) Layer
This is the layer where the heavy analytic processing takes place. Data analytics
is an essential step that addresses the inefficiency of traditional data platforms
in handling large amounts of data for interactive queries, ETL, storage, and
processing.
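
As a small-scale stand-in for an interactive query, the sketch below loads a few hypothetical readings into an in-memory SQLite database and aggregates them per device. SQLite is used only because it ships with Python; the table name and columns are assumptions, and a real Big Data query engine would run the same kind of query over distributed storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device TEXT, temp_c REAL)")

# Load step of a tiny ETL: insert already-cleaned readings.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("s1", 21.5), ("s1", 22.5), ("s2", 19.0)],
)

# Interactive query: average temperature and reading count per device.
rows = conn.execute(
    "SELECT device, AVG(temp_c), COUNT(*) "
    "FROM readings GROUP BY device ORDER BY device"
).fetchall()
conn.close()

print(rows)
```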
Data Visualization Layer
This layer focuses on Big Data visualization. We need something that will grab
people's attention, pull them in, and make the findings well understood. This is
where the value of the data is perceived by the user.