Introduction to Big Data
& Basic Data Analysis
Basic Concepts in Big Data
What is big data?
"Big Data are high-volume, high-velocity,
and/or high-variety information assets that
require new forms of processing to enable
enhanced decision making, insight discovery
and process optimization (Gartner 2012)
Complicated (intelligent) analysis of data
may make a small data appear to be big
Bottom line: Any data that exceeds our
current capability of processing can be
regarded as big
Why is big data a big deal?
Government
The Obama administration announced a big data initiative
Many different big data programs were launched
Private Sector
Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data
Facebook handles 40 billion photos from its user base.
Falcon Credit Card Fraud Detection System protects 2.1 billion active
accounts world-wide
Science
Large Synoptic Survey Telescope will generate 140 terabytes of data every 5 days
Biomedical computation, like decoding the human genome & personalized medicine
Social science revolution
Lifecycle of Data: 4 As
[Figure: the data lifecycle as a cycle of four As (Acquisition, Aggregation, Analysis, Application), moving scattered data to logged data to integrated data and finally to knowledge, which feeds back into new acquisition]
Computational View of Big Data
[Figure: a layered stack, from top to bottom: Data Visualization; Data Access and Data Analysis; Data Understanding and Data Integration; Formatting and Cleaning; Data Storage]
Big Data & Related Topics/Courses
[Figure: the layered stack above annotated with related topics/courses for CS199: Human-Computer Interaction, Data Visualization, Machine Learning, Databases, Information Retrieval, Data Mining, Computer Vision, Speech Recognition, Natural Language Processing, Data Warehousing, Signal Processing, Information Theory, and many applications]
Some Data Analysis Techniques
Visualization
Classification
Predictive Modeling
Time Series
Clustering
Big Data Everywhere!
Lots of data is being collected
and warehoused
Web data, e-commerce
purchases at department/
grocery stores
Bank/Credit Card
transactions
Social Network
How much data?
Google processes 20 PB a day (2008)
Facebook has 2.5 PB of user data + 15
TB/day (4/2009)
eBay has 6.5 PB of user data + 50 TB/day
(5/2009)
640K ought to be
enough for
anybody.
The Earthscope
The Earthscope is the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data, and much more.
(http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)
Type of Data
Relational Data
(Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
Social Network, Semantic Web (RDF),
What to do with these data?
Aggregation and Statistics
Data warehouse and OLAP
Indexing, Searching, and Querying
Keyword based search
Pattern matching (XML/RDF)
Knowledge discovery
Data Mining
Statistical Modeling
OLAP and Data Mining
Warehouse Architecture
[Figure: data sources at the bottom feed an Integration layer, which loads the Warehouse (with its Metadata); a Query & Analysis layer on top serves multiple Clients]
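A minimal sketch of this flow in Python, assuming three made-up in-memory sources and using sqlite3 to stand in for the warehouse (table and column names are ours, not from the slide):

```python
# Sketch of the warehouse flow above: extract from operational sources,
# integrate into one schema, load the warehouse, then run analytical queries.
import sqlite3

# Hypothetical operational sources (in reality: OLTP systems, files, feeds).
source_a = [("o100", "p1", 12.0)]
source_b = [("o102", "p2", 11.0)]
source_c = [("o105", "p1", 50.0)]

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sale (orderId TEXT, prodId TEXT, amt REAL)")

# Integration step: normalize each source into the common schema, then load.
for source in (source_a, source_b, source_c):
    warehouse.executemany("INSERT INTO sale VALUES (?, ?, ?)", source)

# Query & analysis step: clients issue analytical queries against the warehouse.
for row in warehouse.execute("SELECT prodId, SUM(amt) FROM sale GROUP BY prodId"):
    print(row)   # e.g. ('p1', 62.0) and ('p2', 11.0)
```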
Star Schemas
A star schema is a common
organization for data at a
warehouse. It consists of:
1. Fact table : a very large accumulation
of facts such as sales.
Often insert-only.
2. Dimension tables : smaller, generally
static information about the entities
involved in the facts.
Terms
[Figure: the fact table sale(orderId, date, custId, prodId, storeId, qty, amt) in the middle, linked to the dimension tables product(prodId, name, price), customer(custId, name, address, city) and store(storeId, city); the measures are qty and amt]
Star
product:  prodId  name  price
          p1      bolt  10
          p2      nut   5

store:    storeId  city
          c1       nyc
          c2       sfo
          c3       la

sale:     orderId  date    custId  prodId  storeId  qty  amt
          o100     1/7/97  53      p1      c1       1    12
          o102     2/7/97  53      p2      c1       2    11
          o105     3/8/97  111     p1      c3       5    50

customer: custId  name   address    city
          53      joe    10 main    sfo
          81      fred   12 main    sfo
          111     sally  80 willow  la
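For illustration, a small sketch of an analytical query over this star schema, using pandas joins in place of SQL (the library choice and variable names are assumptions; the data is copied from the tables above):

```python
# Sketch: "total sales amount per store city" over the star schema above,
# joining the fact table to the store dimension and aggregating the measure.
import pandas as pd

sale = pd.DataFrame({
    "orderId": ["o100", "o102", "o105"],
    "date":    ["1/7/97", "2/7/97", "3/8/97"],
    "custId":  [53, 53, 111],
    "prodId":  ["p1", "p2", "p1"],
    "storeId": ["c1", "c1", "c3"],
    "qty":     [1, 2, 5],
    "amt":     [12, 11, 50],
})
store = pd.DataFrame({"storeId": ["c1", "c2", "c3"],
                      "city":    ["nyc", "sfo", "la"]})

joined = sale.merge(store, on="storeId")          # fact-to-dimension join
print(joined.groupby("city")["amt"].sum())        # la 50, nyc 23
```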
Cube
Fact table view:
  sale:  prodId  storeId  amt
         p1      c1       12
         p2      c1       11
         p1      c3       50
         p2      c2       8

Multi-dimensional cube (dimensions = 2):
        c1   c2   c3
  p1    12        50
  p2    11    8
3-D Cube
Fact table view:
  sale:  prodId  storeId  date  amt
         p1      c1       1     12
         p2      c1       1     11
         p1      c3       1     50
         p2      c2       1     8
         p1      c1       2     44
         p1      c2       2     4

Multi-dimensional cube (dimensions = 3):
  day 1:        c1   c2   c3
          p1    12        50
          p2    11    8
  day 2:        c1   c2   c3
          p1    44    4
          p2
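One way to see the cube concretely, assuming pandas: pivot_table rolls the fact table above up into a (date x prodId x storeId) cube, which is roughly what a MOLAP system materializes:

```python
# Sketch of the 3-D cube above built from the fact table with a pivot table.
import pandas as pd

sale = pd.DataFrame({
    "prodId":  ["p1", "p2", "p1", "p2", "p1", "p1"],
    "storeId": ["c1", "c1", "c3", "c2", "c1", "c2"],
    "date":    [1, 1, 1, 1, 2, 2],
    "amt":     [12, 11, 50, 8, 44, 4],
})

cube = sale.pivot_table(values="amt", index=["date", "prodId"],
                        columns="storeId", aggfunc="sum")
print(cube)
# storeId        c1   c2    c3
# date prodId
# 1    p1      12.0  NaN  50.0
#      p2      11.0  8.0   NaN
# 2    p1      44.0  4.0   NaN
```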
ROLAP vs. MOLAP
ROLAP:
Relational On-Line Analytical
Processing
MOLAP:
Multi-Dimensional On-Line Analytical
Processing
Aggregates
Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
  sale:  prodId  storeId  date  amt
         p1      c1       1     12
         p2      c1       1     11
         p1      c3       1     50
         p2      c2       1     8
         p1      c1       2     44
         p1      c2       2     4

  Result: 81
Aggregates
Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
  sale:  prodId  storeId  date  amt
         p1      c1       1     12
         p2      c1       1     11
         p1      c3       1     50
         p2      c2       1     8
         p1      c1       2     44
         p1      c2       2     4

  ans:   date  sum
         1     81
         2     48
Another Example
Add up amounts by day, product
In SQL: SELECT prodId, date, sum(amt) FROM SALE
GROUP BY prodId, date

  sale:  prodId  storeId  date  amt
         p1      c1       1     12
         p2      c1       1     11
         p1      c3       1     50
         p2      c2       1     8
         p1      c1       2     44
         p1      c2       2     4

  result:  prodId  date  amt
           p1      1     62
           p2      1     19
           p1      2     48

Grouping by fewer attributes (e.g. dropping prodId) is a roll-up; grouping by more attributes again is a drill-down.
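A small sketch of roll-up vs. drill-down on the same fact table, using pandas group-bys as a stand-in for the SQL above (the library choice is an assumption):

```python
# Roll-up and drill-down as group-bys over the fact table from the slides.
import pandas as pd

sale = pd.DataFrame({
    "prodId":  ["p1", "p2", "p1", "p2", "p1", "p1"],
    "storeId": ["c1", "c1", "c3", "c2", "c1", "c2"],
    "date":    [1, 1, 1, 1, 2, 2],
    "amt":     [12, 11, 50, 8, 44, 4],
})

by_day_product = sale.groupby(["prodId", "date"])["amt"].sum()  # finer grain
by_day = sale.groupby("date")["amt"].sum()                      # roll-up: drop prodId
print(by_day_product)   # (p1,1)=62, (p1,2)=48, (p2,1)=19
print(by_day)           # day 1 -> 81, day 2 -> 48
```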
Aggregates
Operators: sum, count, max, min, median, avg
Having clause
Using dimension hierarchy
  average by region (within store)
  maximum by month (within date)
What is Data Mining?
Discovery of useful, possibly unexpected, patterns in data
Extraction of implicit, previously unknown and potentially useful information from data
Exploration & analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns
Data Mining Tasks
Classification [Predictive]
Clustering [Descriptive]
Association Rule Discovery [Descriptive]
Sequential Pattern Discovery [Descriptive]
Regression [Predictive]
Classification: Definition
Given a collection of records (training set)
  Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
  A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
Decision Trees
Example:
  Conducted a survey to see which customers were interested in a new model car
  Want to select customers for an advertising campaign
[Figure: a decision tree built from the survey training set]
Clustering
[Figure: customers plotted along income, education and age axes, falling into natural groups (clusters)]
K-Means Clustering
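A minimal k-means sketch on the three attributes from the clustering figure (income, education, age); scikit-learn and the data values are our assumptions, not part of the slide:

```python
# Group customers into k clusters by repeatedly assigning points to the
# nearest centroid and recomputing centroids (handled inside KMeans).
from sklearn.cluster import KMeans

customers = [
    [30000, 12, 25], [32000, 12, 27], [90000, 16, 45],
    [95000, 18, 50], [60000, 14, 35], [61000, 14, 36],
]

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster index assigned to each customer
print(kmeans.cluster_centers_)  # centroid (income, education, age) per cluster
```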
Association Rule Mining
[Figure: sales records laid out as market-basket data, with columns: transaction id, customer id, products bought]
Trend: products p5, p8 often bought together
Trend: customer 12 likes product p9
Association Rule Discovery
Marketing and Sales Promotion:
Let the rule discovered be
{Bagels, …} --> {Potato Chips}
Potato Chips as consequent => Can be used to
determine what should be done to boost its sales.
Bagels in the antecedent => can be used to see which
products would be affected if the store discontinues
selling bagels.
Bagels in antecedent and Potato chips in consequent
=> Can be used to see what products should be sold
with Bagels to promote sale of Potato chips!
Supermarket shelf management.
Inventory Management
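A small sketch of how the support and confidence of such a rule would be measured over made-up market-basket data (item names and numbers are illustrative only):

```python
# Measure one rule, {bagels} -> {potato chips}:
# support    = fraction of all baskets containing both items
# confidence = fraction of bagel baskets that also contain potato chips
baskets = [
    {"bagels", "potato chips", "milk"},
    {"bagels", "cream cheese"},
    {"potato chips", "beer"},
    {"bagels", "potato chips"},
]

with_bagels = [b for b in baskets if "bagels" in b]
with_both = [b for b in with_bagels if "potato chips" in b]

support = len(with_both) / len(baskets)          # 2/4 = 0.5
confidence = len(with_both) / len(with_bagels)   # 2/3 ~ 0.67
print(support, confidence)
```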
Other Types of Mining
Text mining: application of data mining to textual documents
  cluster Web pages to find related pages (see the sketch below)
  cluster pages a user has visited to organize their visit history
  classify Web pages automatically into a Web directory
Graph mining: deals with graph data
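A minimal sketch of the Web-page clustering example above, assuming scikit-learn: TF-IDF features plus k-means group toy page texts with similar vocabulary:

```python
# Cluster (toy) Web page texts into related groups.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

pages = [
    "cheap flights and hotel deals for your next trip",
    "book flights hotels and car rentals online",
    "python tutorial for data mining and machine learning",
    "machine learning course with python examples",
]

X = TfidfVectorizer(stop_words="english").fit_transform(pages)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # pages with similar vocabulary land in the same cluster
```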
The Meaning of Big Data: the 3 Vs
Big Volume
With simple (SQL) analytics
With complex (non-SQL) analytics
Big Velocity
Drink from the fire hose
Big Variety
Large number of diverse data sources to
integrate
The Participants
Row storage and row executor
  Microsoft Madison, DB2, Netezza, Oracle(!)
Column store grafted onto a row executor (wannabes)
  Teradata/Aster Data, EMC/Greenplum
Column store and column executor
  HP/Vertica, Sybase IQ, ParAccel
Oracle Exadata is not:
  a column store
  a scalable shared-nothing architecture
Hadoop…
Simple analytics
  roughly 100x slower than a parallel DBMS
Complex analytics (Mahout or roll-your-own)
  roughly 100x slower than ScaLAPACK
Parallel programming
  Parallel grep (great)
  Everything else (awful)
Hadoop lacks
  Stateful computations
  Point-to-point communication
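A rough sketch of the "parallel grep" pattern that fits MapReduce well: each worker filters its own chunk independently and the results are simply concatenated, with no shared state or point-to-point communication (plain Python multiprocessing stands in for Hadoop here):

```python
# Map-only "parallel grep": split the input, filter each split in parallel, merge.
from multiprocessing import Pool

def grep_chunk(args):
    pattern, lines = args
    return [line for line in lines if pattern in line]

if __name__ == "__main__":
    lines = ["error: disk full", "ok", "error: timeout", "ok", "warning"]
    chunks = [lines[:3], lines[3:]]                      # stand-in for input splits
    with Pool(2) as pool:
        parts = pool.map(grep_chunk, [("error", c) for c in chunks])
    matches = [m for part in parts for m in part]
    print(matches)   # ['error: disk full', 'error: timeout']
```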
Big Velocity
Sensor tagging everything of value sends velocity through the roof
  e.g. car insurance
Smart phones as a mobile platform send velocity through the roof
State of multi-player internet games must be recorded, which sends velocity through the roof
New OLTP
You need to ingest a fire hose in real time
You need to perform high-volume OLTP
You often need real-time analytics
VoltDB: an example of
New SQL
A main memory SQL engine
Open source
Shared nothing, Linux, TCP/IP on jelly beans
Light-weight transactions
Run-to-completion with no locking
Single-threaded
Multi-core by splitting main memory
About 100x RDBMS on TPC-C
Big Variety
Typical enterprise has 5000 operational systems
Only a few get into the data warehouse
What about the rest?
And what about all the rest of your data?
Spreadsheets
Access databases
Web pages
And public data from the web?
The World of Data Integration
[Figure: the landscape integration must cover: the enterprise data warehouse, text, and the rest of your data]
Summary
The rest of your data (public and private)
Is a treasure trove of incredibly valuable
information
Largely untapped
IoT Meets Big Data
Big Data Value Chain
Collection → Ingestion → Discovery & Cleansing → Integration → Analysis → Delivery

Collection: structured, unstructured and semi-structured data from multiple sources
Ingestion: loading vast amounts of data onto a single data store
Discovery & Cleansing: understanding format and content; clean-up and formatting
Integration: linking, entity extraction, entity resolution, indexing and data fusion
Analysis: intelligence, statistics, predictive and text analytics, machine learning
Delivery: querying, visualization, real-time delivery on enterprise-class availability

Need for standardized approaches at each step
Source: O'Reilly Strata 2012
Considerations for Big Data Standardization
Variety of use cases
Security & privacy
Lifecycle management & data quality
System management & other issues
Data characteristics: distributed/centralized, mobility, the 4 Vs (Volume, Velocity, Variety, Veracity)
Data collection
Data visualization
Data quality
Data analytics & action
Data Sources
Sources: sensors, applications, software agents, individuals, organizations, hardware resources
Any*: anytime, anything, any device, any context, any place, anywhere, anyone
Big Data Standardization Challenges (1)
Big Data use cases, definitions, vocabulary and reference architectures
(e.g. system, data, platforms, online/offline)
Specifications and standardization of metadata including data
provenance
Application models (e.g. batch, streaming)
Query languages including non-relational queries to support diverse
data types (XML, RDF, JSON, multimedia) and Big Data operations (e.g.
matrix operations)
Domain-specific languages
Semantics of eventual consistency
Advanced network protocols for efficient data transfer
General and domain specific ontologies and taxonomies for describing
data semantics including interoperation between ontologies
Source: ISO
Big Data Standardization Challenges (2)
Big Data security and privacy access controls
Remote, distributed, and federated analytics (taking the
analytics to the data) including data and processing
resource discovery and data mining
Data sharing and exchange
Data storage, e.g. memory storage system, distributed file
system, data warehouse, etc.
Human consumption of the results of big data analysis (e.g.
visualization)
Interface between relational (SQL) and non-relational
(NoSQL)
Big Data Quality and Veracity description and management
Source: ISO
The Structure of Big Data
Structured
Most traditional data sources
Semi-structured
Many sources of big data
Unstructured
Video data, audio data
Benefits of Big Data
Big Data is already an important part of the $64 billion
database and data analytics market
It offers commercial opportunities of a comparable
Sekhar Kondepudi
[email protected]
www.kondepudi-group.info
M : +65 98566472