
Big Data-Introduction

The document introduces big data, including its definition, characteristics, importance and architecture. It describes big data in terms of volume, velocity, variety and veracity, and explains the key components of a big data architecture: data ingestion, processing, storage and visualization, along with data sources, batch processing, stream processing, the analytical data store, analysis and reporting, and orchestration.
Copyright © All Rights Reserved

AJAY KUMAR GARG ENGINEERING COLLEGE, GHAZIABAD

Computer Science And Engineering

Big Data and Analytics (KDS-601)

Introduction to Big Data and its Architecture

Presented By:
Ms. Neeharika Tripathi
Assistant Professor
Department of Computer Science And Engineering
Introduction to Big data
• Data consists of raw facts that have not been processed
to explain their meaning.
• Big Data is a term used to describe a collection of
data that is huge in volume and still growing
exponentially with time.
• A few examples of Big Data are:
▫ The Stock Exchange generates about one terabyte of new
trade data per day.
▫ Statistics show that more than 500 terabytes of new data
are ingested into the databases of the social media site
Facebook every day.
Characteristics Of Big Data
• Volume: Volume means “how much data is generated”.
Nowadays, organizations, people and systems generate or
receive vast amounts of data, from terabytes (TB) to
petabytes (PB) to exabytes (EB) and more. The size of the
data plays a crucial role in determining the value that can
be extracted from it. Whether a particular dataset can
actually be considered Big Data also depends on its volume.
• Velocity: Velocity means “how fast data is produced”. Big
Data velocity deals with the speed at which data flows in
from sources such as business processes, application logs,
networks, social media sites, sensors and mobile devices.
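Volume and velocity are linked by simple arithmetic: the rate at which a source produces records, multiplied by the average record size, gives the volume it adds per day. The sketch below is a minimal illustration assuming decimal (SI) units (1 TB = 10^12 bytes) and a hypothetical source emitting 10,000 events per second of about 1 KB each; these input figures are illustrative only and the helper names are hypothetical.

```python
# Decimal (SI) units: 1 KB = 1000 bytes, 1 TB = 10**12 bytes (an assumption for this sketch).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_size(num_bytes: float) -> str:
    """Divide by 1000 until the value drops below 1000 (or the units run out)."""
    value = float(num_bytes)
    unit_index = 0
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:.2f} {UNITS[unit_index]}"


def daily_volume(events_per_second: float, bytes_per_event: float) -> float:
    """Velocity (events/s) x record size (bytes) x seconds per day = bytes added per day."""
    seconds_per_day = 24 * 60 * 60
    return events_per_second * bytes_per_event * seconds_per_day


if __name__ == "__main__":
    # Hypothetical source: 10,000 events per second, about 1 KB per event.
    per_day = daily_volume(10_000, 1_000)
    print("per day: ", human_size(per_day))
    print("per year:", human_size(per_day * 365))
    # The 500 TB/day figure from the Facebook example, accumulated over a year.
    print("500 TB/day for a year:", human_size(500e12 * 365))
```

With these assumed inputs the source adds 864 GB per day, about 315 TB per year, while the 500 TB/day figure quoted for Facebook accumulates to roughly 182.5 PB over a year.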
Characteristics Of Big Data
• Variety: Variety means “different forms of data”.
Variety refers to the heterogeneity of data sources and of
the data itself, which can be structured or unstructured.
Nowadays, emails, photos, videos, PDFs, audio and readings
from monitoring devices are also considered in analysis
applications.
• Veracity: Veracity means “the quality, correctness or
accuracy of the captured data”. Of the four Vs, it is the
most important for any Big Data solution, because without
correct data there is no use in storing large amounts of
data at high speed and in different formats.
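One practical consequence of veracity is that records are checked before they are stored or analyzed. The sketch below is a minimal example assuming a hypothetical sensor feed with `sensor_id` and `temperature_c` fields and a hypothetical plausible range of -50 to 60 °C; real validation rules depend on the data source.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Reading:
    """One record from a hypothetical sensor feed."""
    sensor_id: Optional[str]
    temperature_c: Optional[float]


# Hypothetical plausibility range; real rules depend on the source.
MIN_TEMP_C, MAX_TEMP_C = -50.0, 60.0


def validate(reading: Reading) -> List[str]:
    """Return the list of problems found; an empty list means the record passed."""
    problems = []
    if not reading.sensor_id:
        problems.append("missing sensor_id")
    if reading.temperature_c is None:
        problems.append("missing temperature")
    elif not MIN_TEMP_C <= reading.temperature_c <= MAX_TEMP_C:
        problems.append("temperature out of range")
    return problems


def split_by_quality(readings: List[Reading]) -> Tuple[List[Reading], List[Tuple[Reading, List[str]]]]:
    """Separate usable records from rejected ones, keeping the reasons for rejection."""
    clean, rejected = [], []
    for reading in readings:
        issues = validate(reading)
        if issues:
            rejected.append((reading, issues))
        else:
            clean.append(reading)
    return clean, rejected


if __name__ == "__main__":
    batch = [Reading("s1", 21.5), Reading(None, 19.0), Reading("s3", 999.0)]
    clean, rejected = split_by_quality(batch)
    print("clean:", clean)
    for reading, issues in rejected:
        print("rejected:", reading, issues)
```

Keeping the rejection reasons, rather than silently dropping records, makes it possible to trace data-quality problems back to the source.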
Importance of Big Data
• Cost Saving: Big Data tools such as Apache Hadoop and
Spark bring cost-saving benefits to businesses that have
to store large amounts of data.
• Time Saving: Tools such as Hadoop help businesses analyze
data quickly, which supports fast decisions based on what
they learn.
• Understand the market condition: Big Data
analysis helps businesses get a better understanding
of market situations. For example, analysis of customer
purchasing behavior helps companies identify the
best-selling products and plan production of those
products accordingly.
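Identifying the products that sell most, as in the market-condition example above, reduces to counting sales per product. The sketch below uses a small, hypothetical in-memory purchase log; at Big Data scale the same group-and-count step would typically run as a distributed job (for example on Hadoop or Spark) rather than in a single Python process.

```python
from collections import Counter

# Hypothetical purchase log: one (customer_id, product) pair per item sold.
PURCHASES = [
    ("c1", "tea"), ("c2", "coffee"), ("c1", "coffee"),
    ("c3", "biscuits"), ("c2", "coffee"), ("c3", "tea"),
]


def top_products(sales, n=2):
    """Count items sold per product and return the n best sellers with their counts."""
    counts = Counter(product for _customer, product in sales)
    return counts.most_common(n)


if __name__ == "__main__":
    print(top_products(PURCHASES))  # [('coffee', 3), ('tea', 2)]
```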
Importance of Big Data
• Social Media Listening: Big data tools can perform
sentiment analysis, so a company can get feedback about
who is saying what about it.
• Boost Customer Acquisition and Retention: Customers
are a vital asset on which every business depends. No
business can succeed without building a robust customer
base. Big data analytics helps businesses identify
customer-related trends and patterns, and customer behavior
analysis can lead to a more profitable business.
• Solve Advertisers’ Problems and Offer Marketing
Insights: Big data analytics can shape business operations
and helps companies meet customer expectations. It can
inform changes to a company’s product line and support
more effective marketing campaigns.
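The sentiment analysis mentioned under Social Media Listening can be illustrated with a deliberately simple lexicon-based scorer: each positive word adds one point and each negative word subtracts one. This is a toy sketch with hypothetical word lists and posts, not how production tools work; those typically use much larger lexicons or trained models.

```python
import re

# Hypothetical word lists; practical tools use far larger lexicons or trained models.
POSITIVE = {"good", "great", "love", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "hate", "broken", "rude"}


def sentiment_score(text: str) -> int:
    """Add 1 for each positive word and subtract 1 for each negative word."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((word in POSITIVE) - (word in NEGATIVE) for word in words)


def label(text: str) -> str:
    """Map the numeric score to a positive, negative or neutral label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical social media posts mentioning a company, keyed by author
# (the "who is saying what" part of social media listening).
POSTS = {
    "@asha": "Love the new app, support was helpful",
    "@ben": "Delivery was slow and the box arrived broken",
    "@chen": "Ordered a phone case today",
}

if __name__ == "__main__":
    for author, text in POSTS.items():
        print(f"{author}: {label(text)}")
```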
Big data architecture
• Data Ingestion: Data ingestion is the first layer,
responsible for collecting and storing data from various
sources. Data ingestion is the process of extracting data
from various sources and loading it into a data repository.
It is a key component of a Big Data architecture because it
determines how data will be ingested, transformed, and stored.
• Data Processing: Data processing is the second layer,
responsible for collecting, cleaning, and preparing the
data for analysis. This layer is critical for ensuring that
the data is of high quality and ready for later use.
• Data Storage: Data storage is the third layer,
responsible for storing the data in a format that can
be easily accessed and analyzed. This layer is
essential for ensuring that the data is accessible and
available to the other layers.
• Data Visualization: Data visualization is the
fourth layer and is responsible for creating
visualizations of the data that humans can easily
understand. This layer is important for making the
data accessible.
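The four layers can be pictured as a chain of functions, each consuming the previous layer's output. The sketch below is a single-process toy under stated assumptions: hard-coded JSON strings stand in for real sources, a Python dictionary stands in for the storage layer, and a text bar chart stands in for a visualization tool. All function names are hypothetical.

```python
import json
from statistics import mean
from typing import Dict, List


def ingest() -> List[str]:
    """Layer 1, ingestion: collect raw records (hard-coded here in place of real sources)."""
    return [
        '{"store": "A", "sales": "120"}',
        '{"store": "B", "sales": "95"}',
        '{"store": "A", "sales": "not-a-number"}',
        '{"store": "B", "sales": "105"}',
    ]


def process(raw_records: List[str]) -> List[dict]:
    """Layer 2, processing: parse and clean records, dropping ones that cannot be used."""
    cleaned = []
    for line in raw_records:
        try:
            record = json.loads(line)
            cleaned.append({"store": record["store"], "sales": float(record["sales"])})
        except (KeyError, ValueError):  # json.JSONDecodeError is a ValueError subclass
            continue  # skip malformed records
    return cleaned


def store(records: List[dict]) -> Dict[str, List[float]]:
    """Layer 3, storage: keep processed values in a structure other layers can read by key."""
    by_store: Dict[str, List[float]] = {}
    for record in records:
        by_store.setdefault(record["store"], []).append(record["sales"])
    return by_store


def visualize(by_store: Dict[str, List[float]]) -> str:
    """Layer 4, visualization: a text bar chart of average sales (one '#' per 10 units)."""
    lines = []
    for name, values in sorted(by_store.items()):
        average = mean(values)
        lines.append(f"{name} | {'#' * int(average // 10)} {average:.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(visualize(store(process(ingest()))))
```

In a real deployment each function would be replaced by a separate system (an ingestion service, a processing engine, a distributed store, a dashboard tool), but the direction of data flow is the same.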
Components of Big Data

• Data sources: All big data solutions start with one or
more data sources. Examples include:
▫ Application data stores, such as relational databases.
• Data storage: Data for batch processing operations
is typically stored in a distributed file store that can
hold high volumes of large files in various formats.
This kind of store is often called a data lake. Options
for implementing this storage include Azure Data
Lake Store or blob containers in Azure Storage.
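A data lake is often organized as folders of raw files. The sketch below writes one batch of rows exported from an application data store into a date-partitioned folder tree on the local file system, standing in for Azure Data Lake Store or a blob container. The `year=/month=/day=` folder naming is one common (Hive-style) convention assumed here, and the `land_batch` helper and `orders_db` source name are hypothetical.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path
from typing import Dict, List


def land_batch(lake_root: Path, source: str, day: date, rows: List[Dict[str, object]]) -> Path:
    """Write one batch of rows as a CSV file under a date-partitioned folder and return its path."""
    if not rows:
        raise ValueError("nothing to land")
    folder = (lake_root / "raw" / source
              / f"year={day.year}" / f"month={day.month:02d}" / f"day={day.day:02d}")
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "part-0000.csv"
    with target.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Rows as they might be exported from a relational "orders" table (hypothetical).
        rows = [{"order_id": 1, "amount": 250}, {"order_id": 2, "amount": 90}]
        path = land_batch(Path(tmp), "orders_db", date(2024, 3, 5), rows)
        print(path.relative_to(tmp).as_posix())
```

Partitioning by date means a later batch job can read only the folders for the days it needs instead of scanning the whole store.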
Components of Big Data
• Batch processing: Because the data sets are so large,
often a big data solution must process data files using long-
running batch jobs to filter, aggregate, and otherwise
prepare the data for analysis. Usually these jobs involve
reading source files, processing them, and writing the
output to new files.
• Real-time message ingestion: If the solution includes
real-time sources, the architecture must include a way to
capture and store real-time messages for stream processing.
This might be a simple data store, where incoming
messages are dropped into a folder for processing. Options
include Azure Event Hubs, Azure IoT Hub, and Apache Kafka.
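The batch-processing pattern described above (read source files, filter and aggregate, write the output to new files) can be sketched with the Python standard library. The folder layout, the `region`/`amount` columns and the `run_batch_job` function are hypothetical; a real solution would typically run the same logic as a distributed job over much larger files.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict


def run_batch_job(input_dir: Path, output_file: Path) -> Dict[str, float]:
    """Read every CSV file in input_dir, total 'amount' per 'region', and write the totals to a new file."""
    totals: Dict[str, float] = defaultdict(float)
    for source in sorted(input_dir.glob("*.csv")):
        with source.open(newline="") as f:
            for row in csv.DictReader(f):
                amount = (row.get("amount") or "").strip()
                if not amount:  # filter: skip rows with no amount
                    continue
                totals[row["region"]] += float(amount)  # aggregate
    with output_file.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "total_amount"])
        for region in sorted(totals):
            writer.writerow([region, totals[region]])
    return dict(totals)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        input_dir = Path(tmp, "input")
        input_dir.mkdir()
        # Two hypothetical source files; the second has a row with a missing amount.
        (input_dir / "day1.csv").write_text("region,amount\nnorth,10\nsouth,5\n")
        (input_dir / "day2.csv").write_text("region,amount\nnorth,2.5\nsouth,\n")
        output_file = Path(tmp, "totals.csv")
        print(run_batch_job(input_dir, output_file))
        print(output_file.read_text())
```

The output is written to a new file rather than back into the input folder, so the source files stay untouched and the job can be rerun.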
Components of Big Data
• Stream processing: After capturing real-time
messages, the solution must process them by filtering,
aggregating, and otherwise preparing the data for
analysis. The processed stream data is then written to an
output sink.
• Analytical data store: Many big data solutions prepare
data for analysis and then serve the processed data in a
structured format that can be queried using analytical
tools. The data could be presented through a low-latency
NoSQL technology such as HBase, or an interactive Hive
database that provides a metadata abstraction over data
files in the distributed data store.
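Stream processing often aggregates events over fixed time windows before writing them to an output sink, from which an analytical store can serve queries. The sketch below assumes events arrive in timestamp order, uses 60-second tumbling (fixed, non-overlapping) windows, and uses an in-memory SQLite database as a stand-in for the sink and analytical data store; it does not show how HBase or Hive are accessed. The stream contents and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

WINDOW_SECONDS = 60  # tumbling window length, chosen for this sketch

# Hypothetical stream of (timestamp_in_seconds, device_id) events, already in time order.
SAMPLE_STREAM = [(0, "d1"), (15, "d2"), (42, "d1"), (61, "d1"), (75, "d2"), (130, "d2")]


def tumbling_windows(events: Iterable[Tuple[int, str]],
                     window_seconds: int = WINDOW_SECONDS) -> Iterator[Tuple[int, str, int]]:
    """Consume events one at a time and yield (window_start, device_id, count)
    rows whenever a window closes. Assumes events arrive in timestamp order."""
    current_start: Optional[int] = None
    counts: Dict[str, int] = defaultdict(int)
    for timestamp, device in events:
        start = timestamp - timestamp % window_seconds
        if current_start is not None and start != current_start:
            # A new window has begun, so the previous one is complete: emit it.
            for dev, n in sorted(counts.items()):
                yield current_start, dev, n
            counts.clear()
        current_start = start
        counts[device] += 1
    if current_start is not None:  # flush the last window when the stream ends
        for dev, n in sorted(counts.items()):
            yield current_start, dev, n


def write_to_sink(conn: sqlite3.Connection, rows: Iterable[Tuple[int, str, int]]) -> None:
    """Append processed rows to a table that analytical queries can read."""
    conn.execute("CREATE TABLE IF NOT EXISTS device_counts "
                 "(window_start INTEGER, device TEXT, events INTEGER)")
    conn.executemany("INSERT INTO device_counts VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    write_to_sink(conn, tumbling_windows(iter(SAMPLE_STREAM)))
    # An analytical query over the processed output.
    query = "SELECT device, SUM(events) FROM device_counts GROUP BY device ORDER BY device"
    for row in conn.execute(query):
        print(row)
```

Because the generator emits each window as soon as the next one starts, results reach the sink while the stream is still being read, which is the key difference from the batch job that waits for complete files.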
Components of Big Data
• Analysis and reporting: The goal of most big data
solutions is to provide insights into the data through analysis
and reporting. Analysis and reporting can also take the form
of interactive data exploration by data scientists or data
analysts.
• Orchestration: Most big data solutions consist of repeated
data processing operations, encapsulated in workflows, that
transform source data, move data between multiple sources
and sinks, load the processed data into an analytical data
store, or push the results straight to a report or dashboard. To
automate these workflows, we can use an orchestration
technology such as Azure Data Factory or Apache Oozie and
Sqoop.
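The dependency-ordering part of orchestration can be sketched with Python's `graphlib.TopologicalSorter`, which orders tasks so that each runs only after its prerequisites. The workflow and task names below are hypothetical, and the sketch shows ordering only; tools such as Azure Data Factory or Apache Oozie typically also handle scheduling, triggers, retries and monitoring, which this sketch does not attempt.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List, Set

# Hypothetical workflow: each task maps to the set of tasks that must finish first.
WORKFLOW: Dict[str, Set[str]] = {
    "ingest_orders": set(),
    "ingest_clicks": set(),
    "clean_and_join": {"ingest_orders", "ingest_clicks"},
    "load_analytical_store": {"clean_and_join"},
    "refresh_dashboard": {"load_analytical_store"},
}


def run_workflow(dag: Dict[str, Set[str]], run_task: Callable[[str], None]) -> List[str]:
    """Run every task exactly once, each only after all of its dependencies; return the order used."""
    completed = []
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    for task in TopologicalSorter(dag).static_order():
        run_task(task)
        completed.append(task)
    return completed


if __name__ == "__main__":
    run_workflow(WORKFLOW, lambda name: print("running", name))
```

The two ingestion tasks have no dependencies on each other, so their relative order is not fixed; a real orchestrator could run them in parallel.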
