
What is Big Data and Data Science_.pdf

The document provides an overview of Big Data and Data Science, discussing their definitions, technological changes, and the impact of digital transformation on society. It highlights the differences between traditional analytics and Big Data, emphasizing the importance of real-time data processing and the variety of data types. Additionally, it addresses common myths surrounding Big Data and outlines the role of Data Science in extracting valuable insights from large volumes of data.

BBVA Campus

Update yourself

What is Big Data and Data Science? Myths and Realities

What is Big Data and Data Science?

01 Digital Transformation
02 Technological and Cultural Change

03 What is Big Data?

04 What is Data Science?

05 Big Data and Data Science - Myths vs Reality

01 Digital Transformation

Things we used to do on a trip… that we don’t do anymore

…that are transforming people's lives
Spain is the most connected country in Europe:
• 81% of Spaniards who connect to the web do so via mobile
• 19% of users use 5 or more connected devices (second only to the US)

Spain is the 5th country in the world in m-commerce:
• 6 out of 10 buy via mobile
• We buy 3 times more from apps than from mobile browsers

Internet of PEOPLE: digitalization of people's lives
“Always connected”: Millennials are heavy users of mobile devices. They:
• Look at their phone in the first 5 minutes after getting up, and use it while in the bathroom
• Check their phone every 10 minutes
• Prefer to lose their wallet rather than their cell phone
Figures shown on the slide for these three behaviors: 1/2, 3/10 and 70%.
Internet of THINGS: digitization of physical objects

The number of M2M connections is increasing: 5 billion in 2014, 26 billion in 2024.
By connecting objects, their capabilities are multiplied.

Reducing the cost of sensing and computing: $22 in 1992, $1.40 in 2014, under 50 cents in 2024.
Connecting an object to the Internet via sensors is becoming less and less expensive.

Organizations are betting on IoT:
• 95% of companies will invest in IoT within a maximum period of three years
• IoT investments have grown by 60% since 2012
How much did it cost to store 1 GB in 1992 vs what it costs now, in 2018?

1992: 600€
2018: 0.03€

Storing 1 GB is 20,000 times cheaper (x20,000).
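The x20,000 figure can be checked directly from the two prices on the slide. The snippet below is only a quick arithmetic sketch; the variable names are illustrative and not part of the original material.

```python
# Prices for storing 1 GB, in euros, as quoted on the slide.
cost_1992_eur = 600.0
cost_2018_eur = 0.03

# How many times cheaper storage became between 1992 and 2018.
reduction_factor = cost_1992_eur / cost_2018_eur

print(f"Storing 1 GB became about {reduction_factor:,.0f} times cheaper")
```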
Cost to Launch an Internet Tech Startup

Chart: the cost falls from $5M to $500K (2005), $50K (2009) and $5K (2011).
• 2005: Open Source + Horizontal Scaling
• 2009: Cloud + AWS
• 2011: Developers Start Companies

Source: Mark Sus

02 Technological and Cultural Change
Other changes that have allowed the emergence of Big Data
Technological changes

Initially there was a Monolithic System:
• A single machine
• That single machine processed the information
• Limit on the amount of data that could be processed
• Limit on the speed of that processing

In the face of these limitations, Distributed Systems arise:
• A unified set of machines
• Allows processing large amounts of data
• Speed was limited because programming them was too expensive
• Machine failure could lead to loss of information

Distributed systems are effective, but not always efficient…
Diagram: from a Distributed System to a Big Data System, in which each machine has its own hard drive (HDD).

To correct these existing limitations, Big Data Systems emerge:

• Changes in the distribution of physical elements: a master computer manages the rest of the computers, the workers, ensuring that the information to be processed is well distributed among them (each “worker” works on the data stored on its own hard drive).

• Implementation of a software layer: it is responsible for managing the group of computers and abstracts that complexity away from the user.
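The master/worker arrangement described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular Big Data platform is implemented: threads stand in for separate machines, in-memory lists stand in for each worker's hard drive, and the function names (`distribute`, `worker_task`, `run_job`) and the sample records are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def distribute(records, n_workers):
    """Master step: spread the records evenly over the workers' local storage."""
    partitions = [[] for _ in range(n_workers)]
    for i, record in enumerate(records):
        partitions[i % n_workers].append(record)
    return partitions


def worker_task(local_partition):
    """Worker step: process only the data held in its own partition."""
    counts = Counter()
    for line in local_partition:
        counts.update(line.lower().split())
    return counts


def run_job(records, n_workers=3):
    """Software layer: hand out the work, then merge the partial results."""
    partitions = distribute(records, n_workers)
    # Assumption: threads play the role of the worker machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(worker_task, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


# Hypothetical input data.
records = [
    "big data needs storage",
    "big data needs processing",
    "data science needs data",
]
word_counts = run_job(records)
print(word_counts.most_common(3))
```

The caller only sees `run_job`: how the data is split and where each piece is processed stays hidden, which is the role the slide assigns to the software layer.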
Changes and evolution of analytics

DATA TYPES

Traditionally:
• Operational data
• Customer data: demographic information
• Market data: trends

Present:
• Structured: data with an established data format and structure. Example: transactional data and OLAP.
• Semi-structured: text data with a recognizable pattern, which makes it suitable for parsing. Example: XML files that are defined by an XSD schema.
• Quasi-structured: text data with a difficult-to-identify pattern. It can be formatted with effort, time and specific tools. Example: records of events or actions on a website, logs.
• Unstructured: data that has no consistency or pattern and is usually stored in different types of files. Example: text files, PDFs, images, videos...
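The difference between these data types shows up in how much work it takes to read them. The sketch below uses only the Python standard library; all sample data (the CSV rows, the XML fragment and the web log line) is invented for illustration.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Structured: fixed columns, readable directly as rows.
csv_text = "name,description,cost\nBob,Bread,3.10\nBob,Milk,1.40\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing tags that a parser can walk.
xml_text = "<purchase><customer>Bob</customer><item cost='3.10'>Bread</item></purchase>"
root = ET.fromstring(xml_text)
item = root.find("item")

# Quasi-structured: a pattern exists, but it has to be extracted with effort
# (here, a hand-written regular expression for a hypothetical web log line).
log_line = '10.0.0.7 - - [12/Mar/2018:10:15:32 +0000] "GET /cart HTTP/1.1" 200'
match = re.search(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})', log_line
)

print(rows[0]["description"], rows[0]["cost"])
print(item.text, item.get("cost"))
print(match.group("method"), match.group("path"), match.group("status"))
```

Unstructured data (images, video, free text) has no equivalent shortcut: there is no schema or pattern to parse, so it typically needs dedicated processing before it can be analyzed.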
Changes and evolution of analytics

TRADITIONAL ANALYTICS (BUSINESS INTELLIGENCE)
It allows business objectives to be achieved at the product/service level based on data analysis.

ANALYSIS FOCUS: Reports, KPIs, trends
ANALYSIS: Retrospective and descriptive
ANALYSIS PROCESS: Comparative
Changes and evolution of analytics

BUSINESS INTELLIGENCE LIMITATIONS
• Slow processing: processing large quantities of data requires very heavy and slow processes.
• High risk of possible machine failure: a machine failure can lead to loss of information if not managed correctly.
• Centralized data storage: single point of access.
• Information silos, lack of information sharing.
• Use of internal data (sometimes enriched with very generic market studies).
• Lack of capacity to store all data.
• Under-exploitation of available data (due to limitations in terms of processing and time).

Conclusion:
Analysis by area, not global.
Analytics of past events, not predictive.
BI is not enough…
BIG DATA

• New interactions
• New data types
• New technologies
• New products and services
• New behaviors
• New opportunities
• New business models

03 What is Big Data?

Big Data is a concept that refers to a set of processes, technologies and models based on the massive storage of data and on processing and transforming that data into knowledge, in order to anticipate what will happen in a complex world with many interactions.

The organization captures BILLIONS OF INTERACTIONS and MILLIONS OF TRANSACTIONS from:

SOCIAL NETWORKS
• Twitter, Facebook, Linkedin, Pinterest, Google+, Youtube, Yelp, blogs, wikis

CUSTOMERS
• Purchase/customer cards
• Offer / response
• SMS
• Clickstream
• In-store behavior
• Ratings and surveys
• Geolocation
• Web forums
• Call centers
• Email

SYSTEMS AND SUPPLY CHAIN
• Purchase orders
• Shipments
• Returns
• Sensors
• Store
• Receipts
• Transporters
• Product information
• In-store placement
• Market intelligence

IoT
• Sensors
• Beacons
• Smart mirrors
• Digital signage
• RFID
• Smart packaging
• Smart price tags

MARKET
• Trade
• Organization
• Demographic
• Economic situation
• Competition
• Events
• Industry news
• Meteorological conditions
BIG DATA GLOBAL ADOPTION

Chart: adoption by industry. Industries shown: Financial Services, Technology, Telecommunications, Retail, Government, Healthcare, Advertising and Entertainment, Video Games, Data Services, Energy and Utilities, IT Consulting, Maritime Transport, Transport. Values legible on the chart: 18, 14%, 9%, 7%, 7%.

Chart: big companies vs SMEs (others), comparing adoption (80% legible) and spend (60% and 40%).

Sources: Sciencedaily.com, thegovlab.org, 2015
BIG DATA GLOBAL ADOPTION: market size

2016: $130 billion
2017: $151 billion
2020: $203 billion

Annual growth rate of 11.7%

Sources: IDC 2016, IDC 2017, Sciencedaily.com, Datameer 2015
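The quoted annual growth rate can be cross-checked against the market-size figures with the usual compound annual growth rate formula. This is a rough sketch that assumes the 2016 figure reads as $130 billion and the 2020 figure as $203 billion; because those inputs are rounded, the result only lands close to the 11.7% on the slide rather than matching it exactly.

```python
# Market size in USD billions (rounded figures, read from the slide).
start_value = 130.0  # 2016
end_value = 203.0    # 2020
years = 2020 - 2016

# Compound annual growth rate: the constant yearly growth that turns
# start_value into end_value over the given number of years.
cagr = (end_value / start_value) ** (1 / years) - 1

print(f"Compound annual growth rate: {cagr:.1%}")
```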
The V's of Big Data

• Volume
• Variety
• Speed

VOLUME
Global data volume:
2009: 0.8 ZB
2010: 1 ZB
2011: 1.8 ZB
2018: estimated 35 ZB
2025: estimated 163 ZB

Unit         Abbreviation   Equivalence
Byte/Octet   B              8 bits
Kilobyte     KB             1024 bytes
Megabyte     MB             1024 KB
Gigabyte     GB             1024 MB
Terabyte     TB             1024 GB
Petabyte     PB             1024 TB
Exabyte      EB             1024 PB
Zettabyte    ZB             1024 EB
Yottabyte    YB             1024 ZB
Brontobyte   BB             1024 YB
Geopbyte     GeB            1024 BB
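Since every unit in the table is 1024 times the previous one, conversions reduce to powers of 1024. The helper functions below are a minimal sketch with hypothetical names (`to_bytes`, `humanize`); they cover the units up to the yottabyte.

```python
# Units in increasing order; each one is 1024 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a quantity in the given unit to bytes."""
    return value * 1024 ** UNITS.index(unit)


def humanize(n_bytes):
    """Express a byte count in the largest unit that keeps the value below 1024."""
    value = float(n_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


# The 2025 estimate from the slide, expressed in bytes.
print(to_bytes(163, "ZB"))
print(humanize(to_bytes(35, "ZB")))
```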
VOLUME

Structured: for example, a temp table loaded from a .csv file:
Name,Description,Cost
Bob,Bread,$3.10
Bob,Milk,$1.40
Bob,Cookies,$2.26

Unstructured: for example, the social media landscape.
What type of data does today's big data consist of?

Structured and unstructured data (for example, the social media landscape).

Around 90% of current data is unstructured data.


SPEED

Data is generated very quickly and needs to be processed at high speed.

Types of information processing:

• Batch processing: data is accumulated and processed periodically, requiring a scalable architecture with large storage capacity.
• Streaming processing: data is processed immediately and requires a low-latency architecture.
• Hybrid processing: Batch + Streaming; requires an architecture with capacity for both processes.
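The contrast between batch and streaming processing can be illustrated with a simple average. This is a toy sketch with made-up sensor readings and hypothetical function names: the batch version waits for all the data, while the streaming version keeps a small running state and has an answer available after every record.

```python
def batch_average(readings):
    """Batch: wait until all data has accumulated, then process it in one pass."""
    return sum(readings) / len(readings)


def streaming_average(readings):
    """Streaming: update the result as each record arrives, keeping only small state."""
    count, mean = 0, 0.0
    for value in readings:
        count += 1
        mean += (value - mean) / count  # incremental update of the running mean
        yield mean  # an up-to-date answer is available immediately


# Hypothetical sensor readings.
readings = [12.0, 15.0, 11.0, 14.0]

running = list(streaming_average(readings))
print("After each record:", running)
print("Batch result:", batch_average(readings))
```

Both approaches agree once all the data has been seen; the difference is when the answer becomes available and how much has to be stored to produce it.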

Late decisions lead to lost opportunities


VARIETY
• Different formats, types and structures
• Text, numbers, images, audio, video, sequences, time series, social media data, etc.
• Static data vs real-time data
• A simple application can generate and store many different types of data
VERACITY & VALUE

To Volume, Variety and Speed, two more V's are added:

Veracity
• Trust
• Authenticity
• Origin and reputation
• Availability
• Responsibility

Value (worth)
It is important that the data provide knowledge that allows us to take advantage of it, whether it be benefits, competitive advantage, etc.
Summary of differences between Big Data and Business Intelligence

• Real Time: better connectivity enables real-time based applications to be possible.
• From reactive to predictive strategies: Business Intelligence has been able to analyze what has happened, while advanced analytics allows us to predict what will happen next.
• Datafication: digitalization generates enormous amounts of data. Processing such a large amount of data was possible with distributed systems, but it was not accessible to everyone.
• Democratization: from data control to Open Data. Access to information is much easier than before.

04 What is Data Science?

Data Science is a multidisciplinary field that combines principles, processes, and techniques that allow us to understand phenomena through the automated analysis of large volumes of data, with the ultimate goal of extracting valuable information (insights) and knowledge.
Differences with other disciplines

Statistics is the analysis, interpretation and presentation of facts based on numbers and data.

Data Science offers a holistic view of data exploitation: in addition to building analytical models from a set of data and explaining the relationships between the variables in the model, it deals with all stages of the data life cycle. These are the collection, cleaning and transformation of data, the construction of analytical models, and the interpretation of the results and presentation of the conclusions in a comprehensible format suitable for dissemination to other areas of the business.
Data mining is the process of extracting hidden and unknown patterns from a set of raw data, with the aim of transforming large amounts of data into useful information.

Both terms refer to the process of transforming large volumes of data into knowledge. However, data science is understood as a set of fundamental principles that guide the extraction of knowledge, while data mining refers to the technologies that incorporate these principles.
Data Science Disciplines

In Big Data environments (massive and unstructured data), data analysis must combine methods and techniques from different disciplines, such as:

- Statistics
- Databases
- Computing
- Machine Learning
- Visualization, among others.
Data Science. Value of data

Data Insights
It consists of extracting hidden knowledge from data that allows companies to make better business decisions. These decisions can be of two types:
- Decisions that require making discoveries in the data.
- Repetitive decisions for which a (small) increase in the accuracy of the analysis offers a benefit to the company.

Data Products
They are data-driven technological assets that encapsulate an algorithm designed to process data and generate results, and which are directly integrated as a visible component of the business operation.
What is the impact of Data Science and Big Data on BBVA’s business?

Characteristics of a Data Scientist

05 Big Data and Data Science - Myths vs Reality

Big Data - Myth #1: Big Data is a fad!

Data analytics was not invented yesterday.

1958: H. Peter Luhn defines BI at IBM

Machine Learning:
- 50s Turing test
- 60s Perceptron
- 80s Neural Networks
- 90s Bayesian Models
- 2000+ SVM (kernels)
- 2008+ Deep Learning

But the hype created by the press can lead people to think that Big Data is just a fad.

However, the reality is that current technology has reduced the costs of collecting, storing, integrating and processing large quantities of data.

So, Big Data is not a fad, it is a reality in which companies are investing heavily to achieve real competitive advantages.
Big Data - Myth #2: The amount and size of data is what matters

Nowadays we have the capacity to store huge amounts of data: we are in the age of the Petabyte!

Having a lot of data without ensuring its quality will get us nowhere!

A biased set of data sources will not give us the best results either!

The focus must therefore shift towards the variety and veracity of the data.
Big Data - Myth #3: Advanced analytics is the most important thing in Big Data

Business Insights!

• The challenges of Big Data are not limited to building the analytical model.

• There are other challenges to manage: the variety of data, the scale and volume, the speed of processing, privacy and legality, and even cultural and human challenges.

• We don't just need data scientists; we need data engineers, architects, business experts, statisticians, visualizers, etc.!

• To reach the analytical phase, all the challenges that arise from data acquisition through cleaning and integration must first be resolved.

• In all these phases, the challenges of variety, volume, speed, privacy and human interaction are present.

Other references to understand the challenges of Big Data. Read this paper:

https://cra.org/ccc/wp-content/uploads/sites/2/2015/05/bigdatawhitepaper.pdf
Big Data - Myth #4: Reusing existing data is easy

There are millions of datasets available for free or at very low cost that can be used to enrich analytical models or even data collected by the company in the past.

However, using previously collected data without properly understanding the context in which it was obtained can make it impossible to integrate this data into an analytical model.

The challenge then is to:
- Improve data governance so that legacy data is properly described with metadata.
- Develop methods to fit legacy data to current analytical problems.
Big Data - Myth #5: Data Science is the same as Big Data

Data Science involves using data to solve a business problem in a specific domain.

To achieve this, data management, data analysis and expert knowledge of the business or domain to which it is to be applied are required.

Today, most business problems solved with Data Science involve using Big Data.

But Big Data is not always necessary: there may be business problems that can be solved with Data Science without involving Big Data.

Figure: The Fields of Data Science (labels legible: Experimental, Theoretical).
Big Data - Myth #6: If I do Big Data Analytics my entire organization will be able to exploit it!

We can buy larger computer systems, with more machines, better CPUs, and more storage space.

But human capacity does not scale as quickly as that of computer systems.

So the challenge is: how do I prepare my staff to be able to exploit Data Science and Big Data as levers of business impact?

I have to think not only about implementing Big Data, but also about improving the “usability” of my analytical systems.

The end user who is business savvy is typically not computer savvy.

Thanks