What Is Big Data - Introduction

Big data refers to large datasets that cannot be processed using traditional computing techniques due to their scale, diversity, and complexity. It requires new techniques and tools to manage and extract value from the data. Big data comes from a variety of internal and external sources, including transactions, sensors, social media, and more. The challenges of big data include capturing, storing, searching, sharing, transferring, analyzing, and presenting data from various formats and sources. However, big data also provides benefits like personalized marketing, predictive analytics, and more efficient operations across many industries.

Uploaded by

Paritosh Belekar
Copyright
© All Rights Reserved

Big Data

Big data is a collection of large datasets that cannot be processed using traditional computing
techniques. It is not a single technique or tool; rather, it has become a complete subject that
involves various tools, techniques, and frameworks.
“Big Data” is data whose scale, diversity, and complexity require new architectures, techniques,
algorithms, and analytics to manage it and to extract value and hidden knowledge from it.

Thus big data is characterized by huge volume, high velocity, and a wide variety of data. The data
falls into three types:
• Structured data − relational data.
• Semi-structured data − XML data.
• Unstructured data − Word, PDF, text, media logs.
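
The three data types can be illustrated with a minimal Python sketch. The sample records, field names, and the choice of CSV as a stand-in for a relational table are assumptions made for illustration only; they do not come from the original notes.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: rows with a fixed schema (CSV stands in for a relational table).
structured = "id,name,amount\n1,Asha,250\n2,Ravi,400\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: XML is self-describing, but fields may vary per record.
semi_structured = "<orders><order id='1'><item>pen</item></order><order id='2'/></orders>"
orders = ET.fromstring(semi_structured)
order_ids = [order.get("id") for order in orders.findall("order")]

# Unstructured: free text has no schema, so structure has to be extracted.
unstructured = "Customer 1 called at 10:42 to report a late delivery."
times = re.findall(r"\b\d{1,2}:\d{2}\b", unstructured)

print(rows[0]["name"], order_ids, times)
```

Structured data can be queried by column name directly, semi-structured data needs a parser that tolerates missing fields, and unstructured data needs pattern matching or more advanced analysis before it can be used.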

Data is everywhere. The amount of digital data in existence is growing rapidly, doubling every
two years, and changing the way we live. According to IBM, 2.5 billion gigabytes (GB) of data
was generated every day in 2019.
An article in Forbes states that data is growing faster than ever before and that by the year
2020 about 1.7 megabytes of new information will be created every second for every human being
on the planet.
This makes it extremely important to know at least the basics of the field. After all, this is
where our future lies.
Source: https://www.simplilearn.com/data-science-vs-big-data-vs-data-analytics-article
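
To put these figures in perspective, the short Python sketch below converts them to other units. It takes the quoted numbers at face value and assumes decimal units (1 GB = 10^9 bytes); the names `growth_factor` and `doubling_period_years` are illustrative, not taken from the cited sources.

```python
GB = 10**9   # bytes in a decimal gigabyte (assumed unit convention)
MB = 10**6   # bytes in a decimal megabyte
SECONDS_PER_DAY = 24 * 60 * 60

# 2.5 billion GB per day, expressed in bytes and in exabytes (10^18 bytes).
daily_bytes = 2_500_000_000 * GB
daily_exabytes = daily_bytes / 10**18

# 1.7 MB per second per person, expressed in GB per person per day.
per_person_per_day_gb = 1.7 * MB * SECONDS_PER_DAY / GB

def growth_factor(years, doubling_period_years=2):
    """Multiplier on data volume after `years` if it doubles every period."""
    return 2 ** (years / doubling_period_years)

print(daily_exabytes)
print(round(per_person_per_day_gb, 2))
print(growth_factor(10))
```

Under these assumptions, 2.5 billion GB per day is 2.5 exabytes per day, and a doubling period of two years implies a 32-fold increase over ten years.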
Big Data Challenges
The major challenges associated with big data are as follows −
• Capturing data
• Storage
• Searching
• Sharing
• Transfer
• Analysis
• Presentation
Complexity:
• Various formats, types, and structures
• Text, numerical data, images, audio, video, sequences, time series, social media data,
multi-dimensional arrays, etc.
• Static data vs. streaming data
• A single application can generate or collect many types of data

To extract knowledge, all these types of data need to be linked together.

Harnessing Big Data


• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)

Sources of Big Data


Information from multiple internal and external sources:
• Transactions
• Social media
• Enterprise content
• Sensors
• Mobile devices

This flood of data is coming from many sources.


• The Stock Exchange generates about one terabyte of new trade data per day.
• Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.
• Ancestry.com, the genealogy site, stores around 2.5 petabytes of data.
• The Internet Archive stores around 2 petabytes of data and is growing at a rate of 20 terabytes
per month.
• The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data
per year.

Social networking sites: Facebook, Google, and LinkedIn all generate huge amounts of data every
day, as they have billions of users worldwide.
E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge volumes of logs from
which users' buying trends can be traced.
Weather stations: Weather stations and satellites produce very large amounts of data, which are
stored and processed to forecast the weather.
Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their
plans accordingly; for this they store the data of their millions of users.
Share market: Stock exchanges across the world generate huge amounts of data through their daily
transactions.
Categories of big data include:
• Black box data
• Social media data
• Stock exchange data
• Power grid data
• Transport data
• Search engine data

The Applications of Big Data are


• Banking and Securities
• Communications
• Media and Entertainment
• Healthcare Providers
• Education
• Manufacturing and Natural Resources
• Government
• Insurance
• Retail and Wholesale Trade
• Transportation
• Energy and Utilities

The Uses of Big Data are


• Location Tracking
• Precision Medicine
• Fraud Detection & Handling
• Advertising
• Entertainment & Media

Real World Big Data Examples


• Discovering consumer shopping habits.
• Personalized marketing.
• Fuel optimization tools for the transportation industry.
• Monitoring health conditions through data from wearables.
• Live road mapping for autonomous vehicles.
• Streamlined media streaming.
• Predictive inventory ordering.
Challenges in Handling Big Data
• The bottleneck is in technology: new architectures, algorithms, and techniques are needed.
• The bottleneck is also in technical skills: experts are needed who can use the new technology
and deal with big data.
Issues with Big Data
A huge amount of unstructured data needs to be stored, processed, and analyzed.
There are three issues with big data, as follows −
Low-Quality and Inaccurate Data
Low-quality or inaccurate data may lead to inaccurate results or predictions, which wastes the
time and effort of the people involved.
To solve problems, make predictions, or find new patterns in the data, the data must be accurate
and of high quality.
Processing Large Data Sets
Because of the sheer amount of data, no traditional data management tool or software can process
it directly or easily; these data sets are usually terabytes in size, which makes them very hard
to process.
So the data has to go through various processing stages, such as removing unnecessary
low-quality data and partitioning the data by some defined factor.
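
The two stages named above, removing low-quality records and partitioning by a defined factor, can be sketched in Python as follows. This is a toy in-memory illustration, not a big data tool; the record fields, the validity rule in `is_valid`, and the partition key `region` are all assumptions chosen for the example.

```python
from collections import defaultdict

# Hypothetical sample records; two of them are low quality.
records = [
    {"user": "u1", "region": "north", "amount": 120.0},
    {"user": "u2", "region": "south", "amount": None},   # missing value
    {"user": "u3", "region": "north", "amount": -5.0},   # implausible value
    {"user": "u4", "region": "south", "amount": 80.0},
]

def is_valid(record):
    """Assumed quality rule: the amount must be present and non-negative."""
    amount = record.get("amount")
    return amount is not None and amount >= 0

def partition_by(rows, key):
    """Group rows into partitions keyed by the value of one field."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)

clean = [r for r in records if is_valid(r)]
partitions = partition_by(clean, "region")

for region, rows in partitions.items():
    print(region, [r["user"] for r in rows])
```

At real scale, each partition would typically be handled by a separate machine or process, which is what makes partitioning useful for large data sets.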
Integrating Data from a Variety of Sources
Data comes from many kinds of sources, such as social media, websites, captured images and
videos, customer logs, reports created by individuals, newspapers, and emails.
Collecting and integrating data of such different types is a very challenging task.
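
One common way to approach integration is to map each source onto a shared schema before analysis. The Python sketch below assumes two hypothetical sources, a JSON feed of social media posts and a CSV export of support tickets; the field names and the common schema (`source`, `who`, `what`, `when`) are invented for illustration.

```python
import csv
import io
import json

# Hypothetical raw inputs from two differently shaped sources.
social_json = '[{"user": "asha", "text": "Great product!", "ts": "2020-01-05"}]'
support_csv = "customer,message,date\nravi,Late delivery,2020-01-06\n"

def from_social(raw):
    """Map social media posts (JSON) onto the common schema."""
    return [
        {"source": "social", "who": p["user"], "what": p["text"], "when": p["ts"]}
        for p in json.loads(raw)
    ]

def from_support(raw):
    """Map support tickets (CSV) onto the common schema."""
    return [
        {"source": "support", "who": r["customer"], "what": r["message"], "when": r["date"]}
        for r in csv.DictReader(io.StringIO(raw))
    ]

# Once every source uses the same fields, the records can be combined and sorted.
unified = sorted(from_social(social_json) + from_support(support_csv),
                 key=lambda r: r["when"])

for record in unified:
    print(record)
```

The difficulty in practice lies in writing and maintaining one such mapping per source, especially for unstructured inputs such as images or free-form reports.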

Benefits of Big Data


• Using the information kept in social networks like Facebook, marketing agencies learn about
the response to their campaigns, promotions, and other advertising mediums.
• Using social media information such as the preferences and product perception of their
consumers, product companies and retail organizations plan their production.
• Using data on the previous medical history of patients, hospitals provide better and quicker
service.
Big Data Technologies
Big data technologies are important in providing more accurate analysis, which may lead to more
concrete decision-making resulting in greater operational efficiencies, cost reductions, and reduced
risks for the business.
To harness the power of big data, you need an infrastructure that can manage and process huge
volumes of structured and unstructured data in real time and that can protect data privacy and
security.
There are various technologies in the market from different vendors including Amazon, IBM,
Microsoft, etc., to handle big data. While looking into the technologies that handle big data, we
examine the following two classes of technology −
Operational Big Data
This class includes systems like MongoDB that provide operational capabilities for real-time,
interactive workloads where data is primarily captured and stored.

NoSQL Big Data systems are designed to take advantage of new cloud computing architectures
that have emerged over the past decade to allow massive computations to be run inexpensively
and efficiently. This makes operational big data workloads much easier to manage, cheaper, and
faster to implement.
Some NoSQL systems can provide insights into patterns and trends based on real-time data with
minimal coding and without the need for data scientists and additional infrastructure.
Analytical Big Data
This class includes systems like Massively Parallel Processing (MPP) database systems and
MapReduce that provide analytical capabilities for retrospective and complex analysis that may
touch most or all of the data.
MapReduce provides a new method of analyzing data that is complementary to the capabilities
provided by SQL, and a system based on MapReduce can be scaled up from single servers to
thousands of high-end and low-end machines.
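
The MapReduce idea can be illustrated with a single-process word count in Python. This is a simplified sketch of the programming model only, not the API of Hadoop or any other framework; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample documents are assumptions for illustration.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

documents = ["big data is big", "data is everywhere"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

Because each map call looks at only one document and each reduce call at only one key, a real framework can spread both phases across many machines, which is what allows the scaling described above.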
These two classes of technology are complementary and frequently deployed together.
Operational vs. Analytical Systems
                 Operational         Analytical
Latency          1 ms - 100 ms       1 min - 100 min
Concurrency      1000 - 100,000      1 - 10
Access Pattern   Writes and Reads    Reads
Queries          Selective           Unselective
Data Scope       Operational         Retrospective
End User         Customer            Data Scientist
Technology       NoSQL               MapReduce, MPP Database
