Big Data Analytics
What Is Big Data Analytics?
● Big Data
– Buzzword
– Two definitions:
● Data sets too large for modern relational databases
● Semi-structured/Unstructured data sets
● Analytics
– The science of measuring and discovering patterns
and trends with data
Big Data Analytics - Introduction
Source: http://www.socialtalent.co/blog/big-data-whats-the-big-deal
Data, Data, Everywhere...
● In 2004:
– Internet traffic: 1 Exabyte (that's 134,217,728 8 GB
flash drives)
– A lot of other media:
● Newspapers/books/magazines
● DVDs
Data, Data, Everywhere...
● Today:
– Internet traffic: 1.3 Zettabytes a year (that's
roughly 178.7 billion 8 GB flash drives)
● 110.3 exabytes per month
– Even more media:
● Mobile devices (phones/tablets/mp3 players/etc)
● The Internet of Things
● Streaming Media
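The flash-drive comparisons on these two slides are plain unit arithmetic. Here is a minimal sketch for checking them, assuming binary units (1 GB = 2^30 bytes, 1 exabyte = 2^60 bytes, 1 zettabyte = 2^70 bytes). The function name is made up for illustration.

```python
# Binary unit sizes in bytes (assumption: the slides use powers of two).
GB = 2**30
EXABYTE = 2**60
ZETTABYTE = 2**70


def flash_drives_needed(total_bytes: int, drive_gb: int = 8) -> int:
    """Return how many whole drives of `drive_gb` GB hold `total_bytes`."""
    drive_bytes = drive_gb * GB
    # Ceiling division so a partly filled drive still counts as one drive.
    return -(-total_bytes // drive_bytes)


drives_2004 = flash_drives_needed(1 * EXABYTE)
# 1.3 zettabytes written as 13/10 to stay in exact integer arithmetic.
drives_today = flash_drives_needed(13 * ZETTABYTE // 10)

print(f"1 exabyte      -> {drives_2004:,} drives")
print(f"1.3 zettabytes -> {drives_today:,} drives")
```

Under these assumptions the 2004 figure reproduces the 134,217,728 drives quoted on the slide, and the second figure lands near 178.7 billion drives. With decimal units (10^18 and 10^21 bytes) both counts would come out somewhat lower.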
The Internet of Things
● How many of you have...
– Fitness trackers?
– E-readers?
– iPods?
● Tie them to social sites (e.g., Facebook)?
The Internet of Things
● You're being tracked!
● So what?
– Marketing
– Medical
– Government
● Building a fuller picture of what's tracked
Social Network Integration
Six Degrees of Separation
Source: http://www.83toinfinity.com
Source: http://www.math.cornell.edu/~numb3rs/blanco/social_net.jpg
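"Six degrees of separation" is the idea that any two people are linked by a short chain of acquaintances. One common way to measure that chain length on a social graph is breadth-first search. The sketch below assumes a tiny, invented friendship graph; the names and the function are illustrative only.

```python
from collections import deque

# Hypothetical friendship graph: each person maps to their direct contacts.
FRIENDS = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Dev"],
    "Cara": ["Ann", "Dev"],
    "Dev": ["Bob", "Cara", "Eli"],
    "Eli": ["Dev", "Fay"],
    "Fay": ["Eli"],
    "Gus": [],  # not connected to anyone
}


def degrees_of_separation(graph, start, goal):
    """Return the fewest hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, hops = queue.popleft()
        for friend in graph.get(person, []):
            if friend == goal:
                return hops + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, hops + 1))
    return None


print(degrees_of_separation(FRIENDS, "Ann", "Fay"))
print(degrees_of_separation(FRIENDS, "Ann", "Gus"))
```

Breadth-first search visits people in order of distance, so the first time it reaches the goal it has found the shortest chain.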
Data Storage
● Relational Databases
– Structured data
– Can scale to huge volumes of data
● Hadoop
– Semi-structured/unstructured data
– Massively parallel storage and processing
Relational Database
Source: http://www.ntu.edu.sg/home/ehchua/programming/sql/images/ManyToOne.png
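The diagram for this slide showed a many-to-one relationship between tables. As a rough stand-in, here is a small sketch using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical and are not taken from the original image.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE departments (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE employees (
        id            INTEGER PRIMARY KEY,
        first_name    TEXT NOT NULL,
        last_name     TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments(id)
    );
    """
)
conn.execute("INSERT INTO departments (id, name) VALUES (1, 'Analytics')")
conn.executemany(
    "INSERT INTO employees (first_name, last_name, department_id) VALUES (?, ?, ?)",
    [("John", "Doe", 1), ("Anna", "Smith", 1)],
)

# Many employees point at one department; a join reassembles the picture.
rows = conn.execute(
    """
    SELECT e.first_name, e.last_name, d.name
    FROM employees AS e
    JOIN departments AS d ON d.id = e.department_id
    ORDER BY e.id
    """
).fetchall()
print(rows)
```

The fixed columns and the foreign key are what make this data "structured": every row has the same shape, and the database can enforce the relationship.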
Unstructured Data
Source: http://storagegaga.com/2011/12/
Semi-structured
Source: http://www.stylusstudio.com/images/figures/sql_xml_xml_fragment.gif
What Solution to Pick?
● Data Volume and Speed
– Relational Databases Will Cap Out
– “Big Data” Stores Scale (For Now)
● Hadoop
● Spark
● Lucene
– Alternative Modeling Techniques
● Hyper Normalized (6-8NF)
– Inmon's Textual Disambiguation
– Anchor Modeling
– Data Vault
Hadoop
● Version 1
– Giant data store
– File distribution
– File parsing tools
– Generic security
● Version 2
– Giant data store
– Rebuilt foundation (YARN for resource management)
– Unified security: LDAP/Kerberos support
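Hadoop's classic processing model is MapReduce: a map step emits key/value pairs from each piece of input, the framework groups the pairs by key, and a reduce step collapses each group. The sketch below imitates those three steps in a single Python process to show the idea. It does not use the Hadoop API, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big analytics", "data everywhere"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the map and reduce calls run on many machines at once, each working on the file blocks stored locally, which is where the parallelism comes from.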
Tools
● Oozie
● Hive
● NoSQL Databases
– HBase
– MongoDB
JSON
{
  "employees": [
    { "firstName": "John", "lastName": "Doe" },
    { "firstName": "Anna", "lastName": "Smith" },
    { "firstName": "Peter", "lastName": "Jones" }
  ]
}
Source: http://www.w3schools.com/json/json_syntax.asp
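To show how a semi-structured record like this is typically consumed, here is a short sketch that parses the employee document with Python's standard json module. The variable names are illustrative.

```python
import json

# The employee document from the slide, as a JSON string.
document = """
{
  "employees": [
    { "firstName": "John",  "lastName": "Doe" },
    { "firstName": "Anna",  "lastName": "Smith" },
    { "firstName": "Peter", "lastName": "Jones" }
  ]
}
"""

# json.loads turns JSON objects into dicts and JSON arrays into lists.
data = json.loads(document)
full_names = [f"{e['firstName']} {e['lastName']}" for e in data["employees"]]
print(full_names)
```

Nothing forces every employee object to carry the same keys, which is the practical difference from a relational table with fixed columns.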
How to Analyze?
● Performance
● Timeliness
● Accuracy
● Feedback
“Big Data” Solutions
● Search the entire data set
● Great performance
● Highly accurate
● Integrates into Analytics tools
– Only some of the tools are able to support Hadoop,
etc.
Statistics
● Designed for all sizes of data sets
● Decreases time to results
● As accurate as needed
● Fully supported by analytics tools
● Supported by most “Big Data” tools
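One way statistics can cut time to results while staying "as accurate as needed" is sampling: estimate a quantity from a random subset and report how uncertain the estimate is. The sketch below assumes synthetic, made-up data and a simple random sample; the function name is illustrative.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Synthetic "full data set": 200,000 made-up measurements.
population = [random.gauss(100.0, 15.0) for _ in range(200_000)]


def estimate_mean(data, sample_size):
    """Estimate the mean from a simple random sample, with its standard error."""
    sample = random.sample(data, sample_size)
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / sample_size ** 0.5
    return mean, std_error


for n in (100, 10_000):
    mean, se = estimate_mean(population, n)
    # 1.96 standard errors gives an approximate 95% interval.
    print(f"n={n:>6}: mean ~ {mean:.2f} +/- {1.96 * se:.2f}")
```

The standard error shrinks with the square root of the sample size, so the sample can be grown until the interval is tight enough for the decision at hand, without scanning the entire data set.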
Analytics Tools
● Can access data of most sizes
– Most can handle Hadoop and some NoSQL
databases
● Built for Predictive Modeling
● Starting to handle social/network modeling
How to Get Started
● Grab some tools!
– RapidMiner (http://rapidminer.com/)
– R (http://www.r-project.org/)
– Weka (http://www.cs.waikato.ac.nz/ml/weka/)
● Grab some data!
– http://www.kdnuggets.com/datasets/index.html
– http://aws.amazon.com/publicdatasets/
– http://www.reddit.com/r/datasets
Prizes/Challenges
● Kaggle - https://www.kaggle.com/
● MIT - http://bigdata.csail.mit.edu/challenge
● Heritage Health Prize - http://www.heritagehealthprize.com/c/hhp
● Twitter - @OpenDataAlex
● LinkedIn - alexmeadows
● Github - dbaAlex
Questions? Comments?
