Big Data Intro
Maximilien Brice, © CERN
The Earthscope
• The Earthscope is the world's
largest science project. Designed to
track North America's geological
evolution, this observatory records
data over 3.8 million square miles,
amassing 67 terabytes of data. It
analyzes seismic slips in the San
Andreas fault, sure, but also the
plume of magma underneath
Yellowstone and much, much more.
(http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)
Big Data: Definition
• Big data is a collection of data sets so large
and complex that it becomes difficult to
process using on-hand database management
tools.
• The challenges include capture, storage,
search, sharing, analysis, and visualization.
Big Data: A definition
• Put another way, big data is the realization of
greater business intelligence by storing,
processing, and analyzing data that was
previously ignored due to the limitations of
traditional data management technologies
Source: Harness the Power of Big Data: The IBM Big Data Platform
Types of Data
• Relational Data (Tables/Transaction/Legacy
Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
– Social Network, Semantic Web (RDF), …
• Streaming Data
– You can only scan the data once
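The single-scan constraint on streaming data can be illustrated with a one-pass statistic. The sketch below uses Welford's algorithm, one common choice and not something the slides prescribe, to keep a running mean and variance without storing or revisiting any value. The names `RunningStats`, `sensor_stream`, and `summarize`, and the sample readings, are hypothetical.

```python
from typing import Iterable, Iterator


class RunningStats:
    """Mean and variance computed in a single pass (Welford's algorithm)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new value into the summary; the value is not stored."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


def sensor_stream() -> Iterator[float]:
    """Hypothetical data stream; a generator can be consumed only once."""
    for reading in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):
        yield reading


def summarize(stream: Iterable[float]) -> RunningStats:
    """Scan the stream exactly once, updating the summary per item."""
    stats = RunningStats()
    for value in stream:
        stats.update(value)
    return stats


if __name__ == "__main__":
    stats = summarize(sensor_stream())
    print(stats.count, stats.mean, stats.variance)
```

Memory use stays constant however long the stream runs, which is the property a single-scan setting requires.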
Who’s Generating Big Data
Mobile devices
(tracking all objects all the time)
• Data Volume
– 44x increase from 2009 to 2020
– From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially
Exponential increase in
collected/generated data
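The volume figures can be checked with a little arithmetic. The sketch below takes only the two endpoints quoted above (0.8 zettabytes in 2009, 35 zettabytes in 2020) and derives the overall growth factor and the annual rate they imply. The constant year-over-year rate is an assumption made for illustration; the slide gives only the endpoints. The function names are hypothetical.

```python
def growth_factor(start: float, end: float) -> float:
    """Overall multiple between the starting and ending volumes."""
    return end / start


def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Constant compound rate that turns `start` into `end` over `years`.

    Assumes the same percentage growth every year (an illustrative
    simplification, not a claim from the source).
    """
    return (end / start) ** (1 / years) - 1


START_ZB = 0.8        # zettabytes in 2009 (from the slide)
END_ZB = 35.0         # zettabytes in 2020 (from the slide)
YEARS = 2020 - 2009   # 11 years

if __name__ == "__main__":
    factor = growth_factor(START_ZB, END_ZB)
    rate = annual_growth_rate(START_ZB, END_ZB, YEARS)
    print(f"growth factor: {factor:.2f}x")
    print(f"implied annual growth: {rate:.1%}")
```

The factor works out to 43.75, which the slide rounds to 44x, and the implied compound rate is roughly 41% per year.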
• A typical PC might have had 10 gigabytes of storage in 2000.
• Today, Facebook ingests 500 terabytes of new data every day.
• A Boeing 737 generates 240 terabytes of flight data during a
single flight across the US.
• Smartphones, the data they create and consume, and sensors
embedded into everyday objects will soon result in billions of new,
constantly updated data feeds containing environmental, location,
and other information, including video.
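To put these quantities on a common scale, the sketch below converts the daily ingest figure into a sustained rate and into the number of year-2000 PCs it would fill. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes); the slide does not say which convention it uses. All names are hypothetical.

```python
GB = 10**9    # bytes, decimal convention (assumption)
TB = 10**12   # bytes, decimal convention (assumption)
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(bytes_per_day: int) -> float:
    """Average sustained rate needed to absorb a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def pcs_filled(total_bytes: int, pc_capacity_bytes: int) -> int:
    """How many PCs of the given capacity the volume would fill."""
    return total_bytes // pc_capacity_bytes


DAILY_INGEST = 500 * TB    # daily ingest figure from the slide
PC_2000_CAPACITY = 10 * GB  # typical PC storage in 2000, from the slide

if __name__ == "__main__":
    rate_gb_s = bytes_per_second(DAILY_INGEST) / GB
    print(f"sustained ingest: {rate_gb_s:.2f} GB/s")
    print(f"year-2000 PCs filled per day: {pcs_filled(DAILY_INGEST, PC_2000_CAPACITY)}")
```

Under these assumptions, 500 terabytes per day corresponds to just under 6 GB every second, or the full storage of 50,000 typical PCs from 2000 each day.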
Characteristics of Big Data:
2-Complexity (Variety)
• Various formats, types, and structures
• Text, numerical, images, audio, video,
sequences, time series, social media
data, multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be
generating/collecting many types of
data
• Big Data isn't just numbers, dates, and strings. Big Data is
also geospatial data, 3D data, audio and video, and
unstructured text, including log files and social media.
• Traditional database systems were designed to address
smaller volumes of structured data, fewer updates, and a
predictable, consistent data structure.
• Big Data analysis includes different types of data.
Characteristics of Big Data:
3-Speed (Velocity)
• Clickstreams and ad impressions capture user behavior at
millions of events per second.
• High-frequency stock trading algorithms reflect market
changes within microseconds.
• Machine-to-machine processes exchange data between
billions of devices.
• Infrastructure and sensors generate massive log data in
real time.
• Online gaming systems support millions of concurrent users,
each producing multiple inputs per second.
Big Data is a Hot Topic Because Technology Makes
it Possible to Analyze ALL Available Data
Cost-effectively manage and analyze
all available data in its native form:
unstructured, structured, streaming
Why Big Data and BI
Source: Business Intelligence Strategy: A Framework for Achieving BI
Excellence
Big Data Conundrum
• Problems:
– Although there is a massive spike in available data,
the percentage of the data that an enterprise can
understand is on the decline
– The data that the enterprise is trying to
understand is saturated with both useful signals
and lots of noise