What is Big Data and Data Science_.pdf
Update yourself
01 Digital Transformation
02 Technological and Cultural Change
01
Digital Transformation
95% invest in IoT within a maximum period of three years
Investments have grown by 60% since 2012
1992: 600€
2018: 0,03€
x20,000
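The x20,000 factor is consistent with the two prices shown. A quick check, assuming the factor is simply the ratio of the 1992 figure to the 2018 figure (prices are held in euro cents so the arithmetic stays exact):

```python
# Prices from the slide, expressed in euro cents to keep the arithmetic exact.
price_1992_cents = 600 * 100   # 600€
price_2018_cents = 3           # 0,03€

# Assumption: the "x20,000" factor is the ratio of the two prices.
reduction_factor = price_1992_cents // price_2018_cents
print(reduction_factor)
```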
[Figure: chart with the values $5M, $500K, $50K and $5K over the years 2005, 2009 and 2011. Labels: Open Source + Horizontal Scaling; Cloud + AWS; Developers Start Companies. Source: Mark Sus]
BBVA Campus
02
Technological and Cultural Change
Other changes that have allowed the emergence of Big Data
Technological changes
Initially there was a Monolithic System:
• A single machine
• That single machine processed all the information
• Limits on the amount of data that could be processed
• Limits on the speed of that processing
In the face of these limitations, Distributed Systems arise:
• A unified set of machines
• Allows processing large amounts of data
• Limited speed, because programming them was too expensive
• Machine failure could lead to loss of information
Distributed systems are effective, but not always efficient…
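The contrast between one machine doing all the work and a set of machines sharing it can be sketched in a few lines. This is only an illustration under stated assumptions: threads inside one process stand in for separate machines, and the function names (`split_into_chunks`, `monolithic_sum`, `distributed_sum`) are hypothetical, not part of any Big Data framework.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data, n_chunks):
    """Split a list into n_chunks contiguous parts of near-equal size."""
    size, remainder = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

def monolithic_sum(data):
    """Monolithic system: a single machine processes everything."""
    return sum(data)

def distributed_sum(data, n_workers=4):
    """Distributed system: each worker sums its chunk, then results are combined."""
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)

data = list(range(1, 1001))
print(monolithic_sum(data), distributed_sum(data))
```

Both approaches give the same answer; the distributed version adds the splitting and combining steps, which is where the extra programming cost mentioned above comes from.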
[Figure: Distributed System and Big Data System, with HDD storage]
DATA TYPES
Structured Data (TRADITIONALLY PRESENT)
• Data with an established data format and structure.
• Example: Transactional data and OLAP.
Changes and evolution of analytics
TRADITIONAL ANALYTICS (BUSINESS INTELLIGENCE)
It allows business objectives to be achieved at the product/service level based on data analysis.
• High risk of possible machine failure
• Lack of capacity to store all data
New interactions
New data types
New technologies
New products and services
New behaviors
New opportunities
New business models
03
What is Big Data?
Big Data is a concept that refers to a set of processes, technologies and models
based on the massive storage of data and on its processing and transformation
into knowledge, in order to anticipate what will happen in a complex world with many
interactions.
SOCIAL NETWORKS
• Twitter
• Facebook
• LinkedIn
• Pinterest
• Google+
• YouTube
• Blogs
• Wikis
• Yelp
CUSTOMERS
• Purchase/customer cards
• Offer / response
• SMS
• Clickstream
• In-store behavior
• Ratings and surveys
• Geolocation
• Web forums
• Call centers
• Email
SYSTEMS CAPTURE THE ORGANIZATION
BILLIONS OF INTERACTIONS
MILLIONS OF TRANSACTIONS
IoT
• Sensors
• Beacons
• Smart mirrors
• Digital Signage
• RFID
• Smart packaging
• Smart price tags
SUPPLY CHAIN
• Purchase orders
• Shipments
• Returns
• Sensors
• Store
• Receipts
• Transporters
• Product information
• In-store placement
• Market intelligence
MARKET
• Trade
• Competition
• Economic situation
• Organization
• Events
• Demographic
• Industry news
• Meteorological conditions
[Figure: $130 billion (2016), $151 billion (2017), $203 billion (2020); other labels: 80%, Spent 60%, Spent 40%]
Volume
Variety
Speed
VOLUME
Global data volume:
2009: 0.8 ZB
2010: 1 ZB
2011: 1.8 ZB
2018 (estimated): 35 ZB
2025 (estimated): 163 ZB
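To get a feel for how fast these figures grow, the compound annual growth rate between two years of the table can be computed. This is a rough sketch: the helper `annual_growth_rate` is a hypothetical name, and the calculation assumes smooth year-over-year growth, which the slide does not claim.

```python
# Global data volume from the slide, in zettabytes.
volume_by_year = {2009: 0.8, 2010: 1.0, 2011: 1.8, 2018: 35.0, 2025: 163.0}

def annual_growth_rate(start_year, end_year, volumes):
    """Compound annual growth rate between two years of the table."""
    years = end_year - start_year
    ratio = volumes[end_year] / volumes[start_year]
    return ratio ** (1 / years) - 1

rate = annual_growth_rate(2009, 2025, volume_by_year)
print(f"{rate:.1%} per year")
```

Under that assumption, the 2009 and 2025 figures correspond to growth of roughly 39% per year.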
Structured
Temp table (from .csv):
Name,Description,Cost
Bob,Bread,$3.10
Bob,Milk,$1.40
Bob,Cookies,$2.26
Unstructured
[Figure: Social Media Landscape]
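What makes the .csv example "structured" is that every row follows the same known layout, so a program can work with it directly. A minimal sketch using Python's standard `csv` module follows; the header names and the helper `total_cost_by_name` are assumptions made for illustration.

```python
import csv
import io
from decimal import Decimal

# The rows of the structured example, as they might appear in a .csv file.
# The header names are an assumption based on the slide.
CSV_TEXT = """Name,Description,Cost
Bob,Bread,$3.10
Bob,Milk,$1.40
Bob,Cookies,$2.26
"""

def total_cost_by_name(csv_text):
    """Sum the Cost column per Name, using Decimal to avoid rounding errors."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cost = Decimal(row["Cost"].lstrip("$"))
        totals[row["Name"]] = totals.get(row["Name"], Decimal("0")) + cost
    return totals

totals = total_cost_by_name(CSV_TEXT)
print(totals["Bob"])
```

Unstructured data such as social media posts has no such fixed columns, so it cannot be queried this way without prior processing.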
SPEED
VARIETY
• Different formats, types and structures
• Text, numbers, images, audio, video, sequences, time series, social media
data, etc.
• Static data vs real-time data
• A simple application can generate and store many different types of data
VERACITY & VALUE
Veracity
• Trust
• Authenticity
• Origin and reputation
• Availability
• Responsibility
Datafication
Digitalization generates enormous amounts of data. Processing such a large
amount of data was possible with distributed systems, but it was not accessible to
everyone.
Democratization
From data control to Open Data. Access to information is much easier than before.
04
What is Data Science?
Data Science offers a holistic view of data exploitation: in addition to building
analytical models from a set of data and explaining the relationships between the
variables in the model, it deals with every stage of the data life cycle. These stages are the
collection, cleaning, and transformation of data, the construction of analytical models, and the
interpretation of the results and presentation of the conclusions in a
comprehensible format suitable for dissemination to other areas of the business.
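The stages of the data life cycle can be pictured as a chain of small steps, each feeding the next. The sketch below is illustrative only: the records are invented, the function names are hypothetical, and a least-squares line stands in for whatever analytical model a real project would use.

```python
from statistics import mean

def collect():
    """Collection: hypothetical raw records as (x, y) text fields."""
    return [("1", "10"), ("2", "20"), ("3", None), ("4", "40"), ("5", "50")]

def clean(raw):
    """Cleaning: drop records with missing values."""
    return [record for record in raw if None not in record]

def transform(rows):
    """Transformation: convert text fields to numbers."""
    return [(float(x), float(y)) for x, y in rows]

def build_model(points):
    """Model construction: least-squares line y = slope * x + intercept."""
    xs, ys = zip(*points)
    x_mean, y_mean = mean(xs), mean(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in points)
             / sum((x - x_mean) ** 2 for x in xs))
    return slope, y_mean - slope * x_mean

def interpret(slope, intercept):
    """Interpretation: state the result in plain language for the business."""
    return f"Each extra unit of x is associated with {slope:.1f} more units of y."

slope, intercept = build_model(transform(clean(collect())))
print(interpret(slope, intercept))
```

Only `build_model` is the modelling step; the other functions are the surrounding life-cycle work that the definition above says Data Science also covers.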
Differences with other disciplines
Data mining is the process of extracting hidden and unknown patterns
from a set of raw data, with the aim of transforming large amounts of data
into useful information.
Both terms, data science and data mining, refer to the process of transforming large volumes of data into
knowledge; however, data science is understood as a set of fundamental principles
that guide the extraction of knowledge, while data mining refers to
the technologies that incorporate these principles.
Data Science Disciplines
In Big Data environments (massive and unstructured data), data
analysis must combine methods and techniques from different disciplines,
such as:
- Statistics
- Databases
- Computing
- Machine Learning
- Visualization, among others.
Data Science. Value of data
Data Insights
It consists of extracting hidden knowledge from data that allows
companies to make better business decisions. These decisions can be of two types:
- Decisions that require making discoveries in the data.
- Repetitive decisions for which a (small) increase in the accuracy of the analysis
offers a benefit to the company.
Data Products
They are data-driven technological assets that encapsulate an algorithm
designed to process data and generate results, and which is
directly integrated as a visible component of the business operation.
What is the impact of DS and BD on BBVA's business?
Characteristics of a Data Scientist
05
Big Data and Data Science - Myths vs Reality
Big Data - Myth #1: Big Data is a fad!
Data analytics was not invented yesterday.
1958: H. Peter Luhn defines BI at IBM
Machine Learning:
Nowadays we have the capacity to store huge amounts of data; we are in the age of the
Petabyte!
The focus must therefore shift towards the variety and veracity of the data.
Big Data - Myth #3: Advanced analytics is the most important
thing in Big Data
Business Insights!
• The challenges of Big Data are not only focused on building the analytical model.
• There are challenges to manage: the variety of data, the scale and volume, the speed of
processing, privacy and legality, and even cultural and human challenges.
• We don't just need data scientists; we need data engineers, architects, business experts,
statisticians, visualization specialists, etc.
• To reach the analytical phase, all the challenges that arise from data acquisition through cleaning
and integration must first be resolved.
• In all these phases, the challenges of variety, volume, speed, privacy and human interaction
are present.
https://cra.org/ccc/wp-content/uploads/sites/2/2015/05/bigdatawhitepaper.pdf
Big Data - Myth #4: Reusing existing data is easy
Millions of datasets are available for free or at very low cost and can be used to
enrich analytical models, as can data collected by the company in the past.
However, using previously collected data without a proper understanding of the context in
which it was obtained can make it impossible to integrate this data into an analytical model.
To achieve this, data management, data analysis and expert knowledge of the business
or domain to which it is to be applied are required.
The Fields of Data Science
Today, most business problems solved with Data Science involve using Big Data.
[Figure labels: Experimental, Theoretical]
Big Data - Myth #6: If I do Big Data Analytics my entire
organization will be able to exploit it!
We can buy larger computer systems, with more machines, better CPUs, and more
storage space.
Thanks