What Is Big Data - Introduction
Big data is a collection of large datasets that cannot be processed using traditional computing techniques. It is not a single technique or tool; rather, it has become a complete subject that involves various tools, techniques, and frameworks.
“Big Data” is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it.
Thus, Big Data involves huge volume, high velocity, and an extensible variety of data. The data in it falls into three types:
Structured data − Relational data.
Semi-structured data − XML data.
Unstructured data − Word, PDF, Text, Media Logs.
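As a rough illustration of the three types, the Python sketch below handles one small hypothetical sample of each using only the standard library: an in-memory SQLite table for relational data, xml.etree.ElementTree for an XML snippet, and a naive word count for free text. The table, rows, tags, and sentence are invented for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET
from collections import Counter

# Structured (relational) data: a fixed schema queried with SQL.
# The table and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],
)
structured = conn.execute("SELECT name FROM users WHERE city = ?", ("Pune",)).fetchall()

# Semi-structured data (XML): self-describing tags, but records can differ in shape.
xml_doc = """
<orders>
  <order id="101"><item>book</item></order>
  <order id="102"><item>pen</item><coupon>SAVE10</coupon></order>
</orders>
"""
root = ET.fromstring(xml_doc.strip())
semi_structured = [
    (order.get("id"), order.findtext("item"), order.findtext("coupon"))
    for order in root.findall("order")
]

# Unstructured data (free text): no schema, so structure has to be extracted,
# here by a naive split into lowercase words.
text = "Big data is big. Data keeps growing."
words = [w.strip(".,").lower() for w in text.split()]
unstructured = Counter(words).most_common(2)

print(structured)       # [('Asha',)]
print(semi_structured)  # [('101', 'book', None), ('102', 'pen', 'SAVE10')]
print(unstructured)
```

Only the relational query relies on a fixed schema; the XML records differ in shape (one has a coupon, one does not), and the text has to be tokenized before anything can be counted.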
Data is everywhere. In fact, the amount of digital data that exists is growing at a rapid rate,
doubling every two years, and changing the way we live. According to IBM, 2.5 billion gigabytes
(GB) of data was generated every day in 2019.
An article by Forbes states that data is growing faster than ever before and that, by the year 2020, about 1.7 megabytes of new information would be created every second for every human being on the planet.
This makes it extremely important to know at least the basics of the field. After all, this is where our future lies.
https://www.simplilearn.com/data-science-vs-big-data-vs-data-analytics-article
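The growth figures quoted above can be sanity-checked with simple arithmetic. The Python sketch below assumes decimal (SI) units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes, 1 EB = 10^18 bytes) and models "doubling every two years" as clean exponential growth, which is a simplification for illustration rather than a forecasting model.

```python
# Decimal (SI) units assumed throughout: 1 MB = 10**6 B, 1 GB = 10**9 B, 1 EB = 10**18 B.
GB = 10**9
EB = 10**18
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# IBM figure: 2.5 billion GB generated per day, expressed in exabytes.
daily_exabytes = 2.5e9 * GB / EB


def growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """Multiplier after `years` if the data volume doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)


# Forbes figure: 1.7 MB per person per second, scaled to one day.
mb_per_person_per_day = 1.7 * SECONDS_PER_DAY
gb_per_person_per_day = mb_per_person_per_day / 1000  # 1 GB = 1000 MB

print(f"{daily_exabytes:.1f} EB per day")                    # 2.5 EB per day
print(f"{growth_factor(10):.0f}x growth over 10 years")      # 32x growth over 10 years
print(f"{gb_per_person_per_day:.2f} GB per person per day")  # 146.88 GB per person per day
```

In other words, 2.5 billion GB is 2.5 exabytes per day, and a doubling period of two years compounds to a factor of 2^5 = 32 over a decade.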
Big Data Challenges
The major challenges associated with big data are as follows −
Capturing data
Storage
Searching
Sharing
Transfer
Analysis
Presentation
Complexity:
Various formats, types, and structures
Text, numerical data, images, audio, video, sequences, time series, social media data, multi-dimensional arrays, etc.
Static data vs. streaming data
A single application may generate or collect many types of data
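The static-versus-streaming distinction can be sketched as follows: with static data, the whole dataset is available and can be processed in one batch, whereas streaming data arrives record by record and is typically processed incrementally, without holding everything in memory. This Python sketch computes the same mean both ways; the readings and the sensor_stream generator are hypothetical stand-ins for a real data source.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Static data: the full dataset is in memory, so compute over the whole list."""
    return sum(values) / len(values)


class RunningMean:
    """Streaming data: update the mean one record at a time using O(1) memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, x: float) -> float:
        self.count += 1
        # Incremental update: new_mean = old_mean + (x - old_mean) / n
        self.mean += (x - self.mean) / self.count
        return self.mean


def sensor_stream(readings: Iterable[float]) -> Iterator[float]:
    """Hypothetical stand-in for an unbounded source such as a sensor feed."""
    yield from readings


readings = [21.0, 22.5, 23.0, 21.5]
static_result = batch_mean(readings)

running = RunningMean()
for value in sensor_stream(readings):
    running.update(value)

print(static_result, running.mean)
```

The streaming version never needs the full list, which is what makes it usable when the data never stops arriving; the price is that it can only answer questions that can be updated incrementally.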
Social networking sites: Facebook, Google, and LinkedIn generate huge amounts of data on a day-to-day basis, as they have billions of users worldwide.
E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge volumes of logs from which users' buying trends can be traced.
Weather stations: Weather stations and satellites produce very large volumes of data, which are stored and processed to forecast the weather.
Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their plans accordingly; to do this, they store the data of millions of users.
Share market: Stock exchanges across the world generate huge amounts of data through their daily transactions.
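The e-commerce example mentions tracing buying trends from logs. A minimal sketch of that idea in Python, assuming a hypothetical log format of "date,user_id,category" lines (real platforms use their own, much richer schemas), counts purchases per category per month:

```python
from collections import Counter, defaultdict

# Hypothetical purchase-log lines in the assumed format "YYYY-MM-DD,user_id,category".
LOG_LINES = [
    "2024-01-03,u1,books",
    "2024-01-15,u2,electronics",
    "2024-01-20,u3,books",
    "2024-02-02,u1,electronics",
    "2024-02-11,u4,electronics",
    "2024-02-25,u2,books",
]


def monthly_category_counts(lines):
    """Group purchases by month (YYYY-MM) and count them per category."""
    trends = defaultdict(Counter)
    for line in lines:
        date, _user_id, category = line.strip().split(",")
        month = date[:7]  # "2024-01-03" -> "2024-01"
        trends[month][category] += 1
    return trends


trends = monthly_category_counts(LOG_LINES)
for month in sorted(trends):
    top_category, count = trends[month].most_common(1)[0]
    print(month, dict(trends[month]), "top:", top_category)
```

At real scale the same group-and-count step would be distributed across many machines, but the logic of the aggregation stays the same.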
Fields that come under big data include:
Black Box Data
Social Media Data
Stock Exchange Data
Power Grid Data
Transport Data
Search Engine Data
Operational Big Data
NoSQL Big Data systems are designed to take advantage of new cloud computing architectures
that have emerged over the past decade to allow massive computations to be run inexpensively
and efficiently. This makes operational big data workloads much easier to manage, cheaper, and
faster to implement.
Some NoSQL systems can provide insights into patterns and trends based on real-time data with
minimal coding and without the need for data scientists and additional infrastructure.
Analytical Big Data
These include systems such as Massively Parallel Processing (MPP) database systems and MapReduce, which provide analytical capabilities for retrospective and complex analysis that may touch most or all of the data.
MapReduce provides a new method of analyzing data that is complementary to the capabilities provided by SQL, and a system based on MapReduce can be scaled up from a single server to thousands of high- and low-end machines.
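To illustrate the programming model, here is a single-process Python simulation of the classic MapReduce word-count example. It only mimics the map, shuffle, and reduce phases in memory; a real framework such as Hadoop distributes these phases across many machines and handles partitioning, fault tolerance, and disk I/O, none of which is modelled here. The function names are illustrative and are not any framework's API.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


# Each string stands in for an input split that would live on a different node.
splits = ["big data needs new tools", "new data new tools"]

mapped = chain.from_iterable(map_phase(doc) for doc in splits)
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(k, v) for k, v in grouped.items())

print(sorted(word_counts.items()))
```

Because each map call looks at only one split and each reduce call at only one key, both phases can run in parallel on separate machines, which is what lets the model scale out.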
These two classes of technology are complementary and frequently deployed together.
Operational vs. Analytical Systems
Property        Operational        Analytical
Latency         1 ms - 100 ms      1 min - 100 min
Concurrency     1000 - 100,000     1 - 10
Access Pattern  Writes and Reads   Reads
Queries         Selective          Unselective
Data Scope      Operational        Retrospective
End User        Customer           Data Scientist
Technology      NoSQL              MapReduce, MPP Database
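The "Access Pattern" and "Queries" rows can be illustrated with a toy example. In this Python sketch, a dict keyed by order ID stands in for an operational NoSQL store (selective reads and writes of single records by key), while the analytical query scans every record to compute an aggregate (unselective and read-only). The record fields are hypothetical, and neither function reflects the API of any particular database.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical order records keyed by order ID (a stand-in for a key-value/document store).
orders: dict[str, dict] = {}


def put_order(order_id: str, customer: str, region: str, amount: float) -> None:
    """Operational write: insert or update a single record by key."""
    orders[order_id] = {"customer": customer, "region": region, "amount": amount}


def get_order(order_id: str) -> Optional[dict]:
    """Operational read: selective lookup that touches one record."""
    return orders.get(order_id)


def revenue_by_region() -> dict[str, float]:
    """Analytical query: unselective scan over all records, read-only aggregation."""
    totals: dict[str, float] = defaultdict(float)
    for record in orders.values():
        totals[record["region"]] += record["amount"]
    return dict(totals)


put_order("o1", "alice", "north", 120.0)
put_order("o2", "bob", "south", 80.0)
put_order("o3", "carol", "north", 50.0)

print(get_order("o2"))      # operational-style access: one record by key
print(revenue_by_region())  # analytical-style access: aggregate over every record
```

The operational functions touch one record per call, so many users can run them concurrently; the analytical query touches everything, which is why it runs less often and on systems built for large scans.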