qCube: Efficient integration of range query operators over a high dimension data cube
Rodrigo Rocha Silva
Doctorate Student
Prof. Dr. Celso Massaki Hirata
Advisor
Prof. Dr. Joubert de Castro Lima
Co-Advisor
ITA – INSTITUTO TECNOLÓGICO DE AERONÁUTICA

Electronic Engineering and Computer Science Division - EEC/I
Department of Computer Science
Brazil

Goal
Present a new cube approach designed for high dimension range queries.
Our approach, named Query Cube (qCube), implements the Equal, Not Equal,
Greater or Less than, Some, Between and Similar range query operators and
the Distinct, Sub-cube and Top-k Similar inquire query operators.

Wednesday, October 02, 2012

28º Simpósio Brasileiro de Banco de Dados - Rodrigo Rocha Silva


Topics
– Motivation
– Data Cube
– Related Work
– Query Cube (qCube)
– Experiments
– Results
– Conclusions


Motivation
Users need to view data in a tangible way, such as reports,
cross tables and histograms


Motivation
• Suppose that a decision-making process requires the following
information:
“What is the women journal research papers variance impact, using
months {1, 3, 5, 7, 11}, year 2012 and ages varying from 25-40
years? Return results for all countries”

“The average temperatures above 30 degrees Celsius on the weekends
of leap years in the last 200 years.”

Data Cube
A data cube, introduced by Gray et al., 1996, is
a generalization of the group-by operator over all
possible combinations of dimensions with
various granularity aggregates.


Data Cube

A data cube has exponential complexity with respect to the
number of dimensions.
For an input with d dimensions, the output has size 2^d
(one cuboid per subset of the dimensions).
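A minimal sketch of where the 2^d figure comes from: every subset of the dimensions is one group-by (cuboid). The helper name `cuboids` is illustrative and not part of the talk.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every group-by (cuboid) over the dimensions, one per subset."""
    result = []
    for k in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, k))
    return result

all_cuboids = cuboids(["A", "B", "C"])
# For d = 3 this yields 2**3 = 8 cuboids, including the empty group-by.
print(len(all_cuboids))
```

Adding a single dimension doubles the number of cuboids, which is why full materialization quickly becomes impractical for high dimension cubes.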


Data Cube
• Hierarchies
[Figure: example dimension hierarchies; the surviving labels are Year, Day, Hour, Discipline and Department.]

Data Cube
Base Relation R – 11 tuples

A    B    C
a1   b1   c1
a1   b1   c2
a1   b3   c2
a2   b1   c1
a2   b1   c2
a2   b2   c1
a2   b2   c2
a2   b3   c2
a3   b1   c1
a3   b1   c2
a3   b3   c2

FULL 3D CUBE – 38 tuples (the 11 base cells plus 27 aggregated cells)

A    B    C    COUNT
*    *    *    11
a1   *    *    3
a2   *    *    5
a3   *    *    3
*    b1   *    6
*    b2   *    2
*    b3   *    3
*    *    c1   4
*    *    c2   7
a1   b1   *    2
a1   b3   *    1
a2   b1   *    2
a2   b2   *    2
a2   b3   *    1
a3   b1   *    2
a3   b3   *    1
a1   *    c1   1
a1   *    c2   2
a2   *    c1   2
a2   *    c2   3
a3   *    c1   1
a3   *    c2   2
*    b1   c1   3
*    b1   c2   3
*    b2   c1   1
*    b2   c2   1
*    b3   c2   3
a1   b1   c1   1
a1   b1   c2   1
a1   b3   c2   1
a2   b1   c1   1
a2   b1   c2   1
a2   b2   c1   1
a2   b2   c2   1
a2   b3   c2   1
a3   b1   c1   1
a3   b1   c2   1
a3   b3   c2   1
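The 38-tuple figure can be checked by brute force from the 11 tuples of base relation R on this slide. The sketch below is illustrative only (the function name `full_cube` is an assumption): it counts, for each tuple, every cell obtained by replacing some of its values with `*`.

```python
from collections import Counter
from itertools import product

# Base relation R from the slide (11 tuples over dimensions A, B, C).
R = [
    ("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b3", "c2"),
    ("a2", "b1", "c1"), ("a2", "b1", "c2"), ("a2", "b2", "c1"),
    ("a2", "b2", "c2"), ("a2", "b3", "c2"),
    ("a3", "b1", "c1"), ("a3", "b1", "c2"), ("a3", "b3", "c2"),
]

def full_cube(rows):
    """COUNT cube: each tuple contributes to the 2^d cells that generalize it."""
    cube = Counter()
    for row in rows:
        for mask in product([False, True], repeat=len(row)):
            cell = tuple("*" if hide else v for v, hide in zip(row, mask))
            cube[cell] += 1
    return cube

cube = full_cube(R)
print(len(cube), cube[("*", "*", "*")])
```

Only non-empty cells are produced, which matches the slide: combinations such as (a1, b2, *) never occur in R and do not appear in the cube.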

Related Work – Frag-Cubing Approach
• Partitions the data vertically
• Reduces a high-dimensional cube into a set of lower-dimensional cubes
• Lossless reduction
• Offers tradeoffs between the amount of pre-processing and the speed
of online computation

From book Han and Kamber: Data Mining Concepts and Techniques

Related Work – Frag-Cubing Example
• Let the cube aggregation function be count
tid  A    B    C    D    E
1    a1   b1   c1   d1   e1
2    a1   b2   c1   d2   e1
3    a1   b2   c1   d1   e2
4    a2   b1   c1   d1   e2
5    a2   b1   c1   d1   e3
• Divide the 5 dimensions into 2 shell fragments:
– (A, B, C) and (D, E)
From book Han and Kamber: Data Mining Concepts and Techniques

Related Work – Frag-Cubing 1-D Inverted Indices
• Build a traditional inverted index or RID list

Attribute Value   TID List         List Size
a1                1, 2, 3          3
a2                4, 5             2
b1                1, 4, 5          3
b2                2, 3             2
c1                1, 2, 3, 4, 5    5
d1                1, 3, 4, 5       4
d2                2                1
e1                1, 2             2
e2                3, 4             2
e3                5                1
From book Han and Kamber: Data Mining Concepts and Techniques
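As an illustration, the 1-D inverted index above can be rebuilt from the five-tuple example table with a single scan. This is a sketch in Python, not the book's or the authors' code; `build_inverted_index` is a hypothetical helper name.

```python
from collections import defaultdict

# Example base table from the Frag-Cubing slides: tid -> (A, B, C, D, E).
base_table = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

def build_inverted_index(table):
    """Map each attribute value to the sorted list of tids that contain it."""
    index = defaultdict(list)
    for tid in sorted(table):
        for value in table[tid]:
            index[value].append(tid)
    return dict(index)

index = build_inverted_index(base_table)
for value, tids in index.items():
    print(value, tids, len(tids))
```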

Related Work – Frag-Cubing Approach
• Generalize the 1-D inverted indices to multi-dimensional
ones in the data cube sense
• Compute all cuboids for data cubes ABC and DE while
retaining the inverted indices
• For example, shell fragment cube ABC contains 7 cuboids:
– A, B, C
– AB, AC, BC
– ABC
• This completes the offline computation stage

Cell     Intersection           TID List   List Size
a1 b1    {1, 2, 3} ∩ {1, 4, 5}  1          1
a1 b2    {1, 2, 3} ∩ {2, 3}     2, 3       2
a2 b1    {4, 5} ∩ {1, 4, 5}     4, 5       2
a2 b2    {4, 5} ∩ {2, 3}        ∅          0

From book Han and Kamber: Data Mining Concepts and Techniques
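The AB cuboid above follows directly from intersecting the 1-D TID lists. A minimal sketch, assuming the lists are held as Python sets; `cuboid_cells` is a hypothetical name.

```python
from itertools import product

# 1-D TID lists for dimensions A and B, taken from the inverted index slide.
tid_lists = {
    "a1": {1, 2, 3}, "a2": {4, 5},
    "b1": {1, 4, 5}, "b2": {2, 3},
}

def cuboid_cells(tid_lists, dim_values):
    """Intersect 1-D TID lists for every combination of values, one per dimension."""
    cells = {}
    for combo in product(*dim_values):
        tids = set.intersection(*(tid_lists[v] for v in combo))
        cells[combo] = sorted(tids)
    return cells

ab = cuboid_cells(tid_lists, [["a1", "a2"], ["b1", "b2"]])
for cell, tids in ab.items():
    print(cell, tids, len(tids))
```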

Related Work – Frag-Cubing Measure Table
• If measures other than count are present, store them in an
ID_measure table, separate from the shell fragments

tid  count  sum
1    5      70
2    3      10
3    8      20
4    5      40
5    2      30

From book Han and Kamber: Data Mining Concepts and Techniques

Related Work – Frag-Cubing Query
• Given the fragment cubes, process a query as follows:
1. Divide the query into fragments, following the same partition as the
shell fragments
2. Fetch the corresponding TID list for each fragment from the fragment cube
3. Intersect the TID lists from each fragment to construct the
instantiated base table
4. Compute the data cube using the base table with any cubing algorithm
From book Han and Kamber: Data Mining Concepts and Techniques
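The four steps can be sketched for a single point query over the example data. This is a simplified illustration, not the Frag-Cubing implementation: fragment TID lists are derived on the fly from the 1-D lists instead of being read from precomputed fragment cubes, and step 4 is reduced to aggregating the ID_measure rows of the matching tids. All names are hypothetical.

```python
# 1-D TID lists and ID_measure table from the example slides.
ONE_D = {
    "a1": {1, 2, 3}, "a2": {4, 5}, "b1": {1, 4, 5}, "b2": {2, 3},
    "c1": {1, 2, 3, 4, 5}, "d1": {1, 3, 4, 5}, "d2": {2},
    "e1": {1, 2}, "e2": {3, 4}, "e3": {5},
}
ID_MEASURE = {1: (5, 70), 2: (3, 10), 3: (8, 20), 4: (5, 40), 5: (2, 30)}
ALL_TIDS = set(ID_MEASURE)
FRAGMENTS = [slice(0, 3), slice(3, 5)]  # shell fragments (A, B, C) and (D, E)

def fragment_tids(values):
    """Step 2 stand-in: TID list of one fragment cell ('*' matches everything)."""
    tids = set(ALL_TIDS)
    for value in values:
        if value != "*":
            tids &= ONE_D[value]
    return tids

def answer(query):
    tids = set(ALL_TIDS)
    for frag in FRAGMENTS:                    # step 1: split the query by fragment
        tids &= fragment_tids(query[frag])    # step 3: intersect across fragments
    count = sum(ID_MEASURE[t][0] for t in tids)   # step 4, simplified to one cell
    total = sum(ID_MEASURE[t][1] for t in tids)
    return sorted(tids), count, total

result = answer(("a1", "*", "c1", "d1", "*"))
print(result)
```

For the query (a1, *, c1, d1, *), the (A, B, C) fragment yields tids {1, 2, 3} and the (D, E) fragment yields {1, 3, 4, 5}, so only tids 1 and 3 reach the measure table.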


Related Work – Frag-Cubing Approach
[Figure: base table with dimensions A B C D E F G H I J K L M N …, labeled "Online Computation".]

From book Han and Kamber: Data Mining Concepts and Techniques
qCube Approach
qCube keeps a set of tuple identifiers (TIDs) per dimension attribute
value, similar to Frag-Cubing.
It can therefore answer point queries using TID intersections and range
queries using unions plus intersections, regardless of the measure
function type.
Frag-Cubing implements only point queries and some inquire queries.
There is no Frag-Cubing solution for queries like

“What is the women journal research papers variance impact,
using months {1, 3, 5, 7, 11}, year 2012 and ages varying
from 25-40 years? Return results for all countries”
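A minimal sketch of "unions plus intersections", assuming an index keyed by (dimension, value). The data and the helper names `tids_equal`, `tids_some` and `tids_between` are invented for illustration and are not qCube's API: a range operator unions the TID lists of every value it accepts within one dimension, and the per-dimension results are then intersected.

```python
# Hypothetical inverted index: (dimension, value) -> set of tids.
index = {
    ("month", 1): {1, 4}, ("month", 2): {2}, ("month", 3): {3, 5}, ("month", 7): {6},
    ("year", 2011): {2, 6}, ("year", 2012): {1, 3, 4, 5},
    ("age", 23): {1}, ("age", 25): {3}, ("age", 31): {4, 6}, ("age", 44): {2, 5},
}

def tids_equal(dim, value):
    """Point operator: the TID list of a single value."""
    return set(index.get((dim, value), ()))

def tids_some(dim, values):
    """Some operator: union of the TID lists of the listed values."""
    result = set()
    for value in values:
        result |= tids_equal(dim, value)
    return result

def tids_between(dim, low, high):
    """Between operator: union over every indexed value in [low, high]."""
    result = set()
    for (d, value), tids in index.items():
        if d == dim and low <= value <= high:
            result |= tids
    return result

# Unions inside each dimension, intersections across dimensions.
answer = (tids_equal("year", 2012)
          & tids_some("month", [1, 3, 5, 7, 11])
          & tids_between("age", 25, 40))
print(sorted(answer))
```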

qCube Approach
Implements the range query operators:
• Equal;
• Not Equal;
• Greater or Less than;
• Some;
• Between and Similar.
Also implements inquire query operators:
• Distinct;
• Sub-cube;
• Top-k Similar.
Over a high dimension data cube.

qCube Architecture


qCube Computation
Base relation

TID  A    B    C    D    E
1    a1   b1   c1   d1   e1
2    a2   b2   c2   d2   e2
3    a1   b1   c1   d1   e1
4    a3   b3   c3   d2   e2
5    a1   b1   c4   d1   e2
6    a5   b5   c5   d2   e2

Measures per tuple (Function)

tid  M1 (Variance)  M2 (Count)  M3 (Average)  M4 (Skewness)  M5 (Standard deviation)
1    2.56           1           10            1              877686769698
2    3.14           1           20            0              7986676867.99
3    2.45           1           10            1              -7878789.8777
4    6.7            1           11            1              -99974333.23
5    9              1           3             1              100045.655
6    1              1           1             1              1

Attribute Value   TID List
a1                1, 3, 5
a2                2
a3                4
a5                6
b1                1, 3, 5
b2                2
b3                4
b5                6
c1                1, 3
c2                2
c3                4
c4                5
c5                6
d1                1, 3, 5
d2                2, 4, 6
e1                1, 3
e2                2, 4, 5, 6

qCube Update

Updates use the same algorithm as qCube Computation.


qCube Update
Base relation before the update

TID  A    B    C    D    E
1    a1   b1   c1   d1   e1
2    a2   b2   c2   d2   e2
3    a1   b1   c1   d1   e1
4    a3   b3   c3   d2   e2
5    a1   b1   c4   d1   e2
6    a5   b5   c5   d2   e2

Tuples applied in the update (tid 5 already exists; tids 7 to 9 are new)

tid  A    B    C    D    E    F
5    a3   b1   c4   d1   e2
7    a3   b2   c3   d3   e3
8    a2   b3   c4   d3   e2   f1
9    a5   b5   c5   d1   e1   f2

Attribute Value   TID List
a1                1, 3
a2                2, 8
a3                4, 5, 7
a5                6, 9
b1                1, 3, 5
b2                2, 7
b3                4, 8
b5                6, 9
c1                1, 3
c2                2
c3                4, 7
c4                5, 8
c5                6, 9
d1                1, 3, 5, 9
d2                2, 4, 6
d3                7, 8
e1                1, 3, 9
e2                2, 4, 5, 6, 8
e3                7
f1                8
f2                9
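The update on this slide can be replayed with one routine used for both the initial load and the update batch, in the spirit of reusing the computation algorithm. This is a sketch under assumptions, not the authors' Java code: `apply_tuple` is a hypothetical name, rows list only the values that are present (so tids 5 and 7 carry no F value), and an existing tid is treated as overwritten.

```python
from collections import defaultdict

def apply_tuple(index, table, tid, row):
    """Insert a new tuple or overwrite an existing one, keeping TID lists in sync."""
    for value in table.get(tid, ()):
        index[value].remove(tid)      # drop postings of the overwritten version
    table[tid] = row
    for value in row:
        index[value].append(tid)
        index[value].sort()           # keep each TID list ordered

table, index = {}, defaultdict(list)

initial = {
    1: ("a1", "b1", "c1", "d1", "e1"), 2: ("a2", "b2", "c2", "d2", "e2"),
    3: ("a1", "b1", "c1", "d1", "e1"), 4: ("a3", "b3", "c3", "d2", "e2"),
    5: ("a1", "b1", "c4", "d1", "e2"), 6: ("a5", "b5", "c5", "d2", "e2"),
}
update = {
    5: ("a3", "b1", "c4", "d1", "e2"),
    7: ("a3", "b2", "c3", "d3", "e3"),
    8: ("a2", "b3", "c4", "d3", "e2", "f1"),
    9: ("a5", "b5", "c5", "d1", "e1", "f2"),
}

for batch in (initial, update):
    for tid, row in batch.items():
        apply_tuple(index, table, tid, row)

print(index["a1"], index["a3"], index["f1"])
```

After both batches, tid 5 has moved from the a1 list to the a3 list, and the new dimension F appears in the index only through the values f1 and f2.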

qCube Query
pQ = a1:*:*:*:e1

Attribute Value   TID List
a1                1, 3
a2                2, 8
a3                4, 5, 7
a5                6, 9
b1                1, 3, 5
b2                2, 7
b3                4, 8
b5                6, 9
c1                1, 3
c2                2
c3                4, 7
c4                5, 8
c5                6, 9
d1                1, 3, 5, 9
d2                2, 4, 6
d3                7, 8
e1                1, 3, 9
e2                2, 4, 5, 6, 8
e3                7
f1                8
f2                9
qCube Range and Inquire Query
rOp = (greater than + less than + between + some + different + similar x (fv1 … fvn))
iOp = (sub-cube + distinct + top-k similar x (fv1 … fvn))
qCube rearranges the sub-queries of Q in order to improve query
response times.

As a result of Q we have qR = (TID1, TID2 … TIDk), where TIDi is
the ith tuple identifier of relation R.
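One plausible way to rearrange sub-queries is sketched below, under the assumption that each sub-query has already been resolved to a set of TIDs and that the cheapest order is smallest set first. The slides do not spell out the exact ordering rule, so treat `evaluate` as an illustration rather than qCube's algorithm.

```python
def evaluate(sub_queries):
    """Intersect resolved sub-queries, smallest TID set first, to obtain qR."""
    if not sub_queries:
        return []
    ordered = sorted(sub_queries, key=len)   # most selective sub-query first
    result = set(ordered[0])
    for tids in ordered[1:]:
        if not result:                       # nothing left to match: stop early
            break
        result &= tids
    return sorted(result)

qR = evaluate([{1, 2, 3, 4, 5}, {2, 3}, {3, 9}])
print(qR)
```

Starting from the smallest set keeps every intermediate result small, so later intersections touch fewer tuple identifiers.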


qCube Query - example
“What is the women journal research papers variance impact,
using months {1, 3, 5, 7, 11}, year 2012 and ages varying from
25-40 years? Return results for all countries”
In Q, the point queries are (sex = women, paperType = journal, year = 2012).
The range queries (month = (1, 3, 5, 7, 11), age between 25-40) are also
sorted according to their cardinalities.
Q also contains one inquire query (country = distinct).
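A sketch of how the inquire query (country = distinct) might be applied once the point and range queries have produced a set of TIDs: the selected TIDs are split by each distinct country value and the measure is computed per group. The data, the names, and the use of population variance are assumptions made for illustration only.

```python
from statistics import pvariance

# Hypothetical data: country TID lists, an impact measure per tid, and the
# tids that survived the point and range sub-queries.
country_index = {"BR": {1, 2, 5}, "PT": {3, 4}, "US": {6}}
impact = {1: 2.0, 2: 4.0, 3: 1.0, 4: 3.0, 5: 6.0, 6: 5.0}
selected = {1, 2, 3, 4, 5}

def distinct_variance(selected, value_index, measure):
    """One result per distinct value that still has selected tids."""
    out = {}
    for value, tids in value_index.items():
        hits = sorted(tids & selected)
        if hits:
            out[value] = pvariance([measure[t] for t in hits])
    return out

per_country = distinct_variance(selected, country_index, impact)
print(per_country)
```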



Experiments
• We tested the qCube Computation and Query algorithms against the
Frag-Cubing algorithm used in [Li et al. 2004];
• The qCube algorithms were coded in 64-bit Java;
• Frag-Cubing is a free and open source C++ application
(http://illimine.cs.uiuc.edu/);
• The synthetic base relations were created using the data generator
provided by the IlliMine project;
• The IlliMine project is an open-source project that provides various
approaches for data mining and machine learning;
• Frag-Cubing is part of the IlliMine project;
• We ran the algorithms on a machine with two six-core Intel Xeon
processors at 2.4 GHz per core, 12 MB of cache and 128 GB of 1333 MHz
DDR3 RAM;
• The system runs the 64-bit High Performance version of Windows Server 2008.


Results - Performance Evaluation of Point Queries and Skewed Relations

Response time per query over 100 trials: T = 10^7; C = 5000; D = 30; S = 0

Response time per query over 100 trials: T = 10^7; C = 5000; D = 30; S = 2.5


Results - Performance Evaluation of Range Query Operators and Skewed Relations

Response time of queries with one infrequent point operator: T = 10^7; C = 5000; D = 30; S = 2.5

Results - Performance Evaluation of Inquire Operators and Skewed Relations

Response time of queries with inquire operators: T = 10^7; C = 5000; D = 30; S = 2.5


Results - Runtime and Memory Consumption


Conclusions
• qCube has linear runtime and memory consumption, similar to
Frag-Cubing;
• It implements the Not Equal, Greater or Less than, Some, Between
and Similar range query operators and the Distinct, Sub-cube and
Top-k Similar inquire query operators;
• Compared with Frag-Cubing, qCube answers point queries and inquire
queries with sub-cube operators faster;
• It introduces a different cube representation with fewer empty cells
than Frag-Cubing;
• Frag-Cubing cannot answer queries with two sub-cube operators on a data
cube with 10^7 tuples, C = 5000, D = 30 and S = 2.5.

Conclusions
Interesting research directions to further extend qCube:
• qCube must be evaluated with holistic measures. Update and computation
experiments with many holistic measures are a hard problem;
• TID lists can become huge, making memory consumption and intersection
costs impractical, so an efficient way to partition TID lists with fast
data retrieval is needed;
• Multicore and multicomputer versions of qCube must be implemented;
• qCube must be improved to answer top-k queries combined with range, point
and inquire queries;
• Experiments with high dimensional text cubes must be made to evaluate
qCube, especially its computation of text measures.


Acknowledgements


Nikki Chapple
 
Palo Alto Networks Cybersecurity Foundation
Palo Alto Networks Cybersecurity FoundationPalo Alto Networks Cybersecurity Foundation
Palo Alto Networks Cybersecurity Foundation
VICTOR MAESTRE RAMIREZ
 
Cyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptxCyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptx
Ghimire B.R.
 
UiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build PipelinesUiPath Community Zurich: Release Management and Build Pipelines
UiPath Community Zurich: Release Management and Build Pipelines
UiPathCommunity
 
6th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 20256th Power Grid Model Meetup - 21 May 2025
6th Power Grid Model Meetup - 21 May 2025
DanBrown980551
 
Introducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and ARIntroducing FME Realize: A New Era of Spatial Computing and AR
Introducing FME Realize: A New Era of Spatial Computing and AR
Safe Software
 
Jira Administration Training – Day 1 : Introduction
Jira Administration Training – Day 1 : IntroductionJira Administration Training – Day 1 : Introduction
Jira Administration Training – Day 1 : Introduction
Ravi Teja
 
Cognitive Chasms - A Typology of GenAI Failure Failure Modes
Cognitive Chasms - A Typology of GenAI Failure Failure ModesCognitive Chasms - A Typology of GenAI Failure Failure Modes
Cognitive Chasms - A Typology of GenAI Failure Failure Modes
Dr. Tathagat Varma
 
LSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection FunctionLSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection Function
Takahiro Harada
 
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
ELNL2025 - Unlocking the Power of Sensitivity Labels - A Comprehensive Guide....
Jasper Oosterveld
 
UiPath Community Berlin: Studio Tips & Tricks and UiPath Insights
UiPath Community Berlin: Studio Tips & Tricks and UiPath InsightsUiPath Community Berlin: Studio Tips & Tricks and UiPath Insights
UiPath Community Berlin: Studio Tips & Tricks and UiPath Insights
UiPathCommunity
 
Microsoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentationMicrosoft Build 2025 takeaways in one presentation
Microsoft Build 2025 takeaways in one presentation
Digitalmara
 
Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...
pranavbodhak
 
European Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility TestingEuropean Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility Testing
Julia Undeutsch
 
Measuring Microsoft 365 Copilot and Gen AI Success
Measuring Microsoft 365 Copilot and Gen AI SuccessMeasuring Microsoft 365 Copilot and Gen AI Success
Measuring Microsoft 365 Copilot and Gen AI Success
Nikki Chapple
 
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Aaryan Kansari
 
TrustArc Webinar: Mastering Privacy Contracting
TrustArc Webinar: Mastering Privacy ContractingTrustArc Webinar: Mastering Privacy Contracting
TrustArc Webinar: Mastering Privacy Contracting
TrustArc
 
STKI Israel Market Study 2025 final v1 version
STKI Israel Market Study 2025 final v1 versionSTKI Israel Market Study 2025 final v1 version
STKI Israel Market Study 2025 final v1 version
Dr. Jimmy Schwarzkopf
 
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)
Nix(OS) for Python Developers - PyCon 25 (Bologna, Italia)
Peter Bittner
 
SDG 9000 Series: Unleashing multigigabit everywhere
SDG 9000 Series: Unleashing multigigabit everywhereSDG 9000 Series: Unleashing multigigabit everywhere
SDG 9000 Series: Unleashing multigigabit everywhere
Adtran
 
Protecting Your Sensitive Data with Microsoft Purview - IRMS 2025
Protecting Your Sensitive Data with Microsoft Purview - IRMS 2025Protecting Your Sensitive Data with Microsoft Purview - IRMS 2025
Protecting Your Sensitive Data with Microsoft Purview - IRMS 2025
Nikki Chapple
 
Ad

qCube: Efficient integration of range query operators over a high dimension data cube

  • 1. qCube: Efficient integration of range query operators over a high dimension data cube Rodrigo Rocha Silva Doctorate Student Prof. Dr. Celso Massaki Hirata Advisor Prof. Dr. Joubert de Castro Lima Co-Advisor ITA – INSTITUTO TECNOLÓGICO DE AERONÁUTICA Electronic Engineering and Computer Science Division - EEC/I Department of Computer Science Brazil
  • 2. Goal: Present a new cube approach designed for high-dimension range queries. Our approach, named Query Cube (qCube), implements the Equal, Not Equal, Greater or Less than, Some, Between and Similar range query operators and the Distinct, Sub-cube and Top-k Similar inquire query operators. Wednesday, October 02, 2012. 28º Simpósio Brasileiro de Banco de Dados - Rodrigo Rocha Silva
  • 3. Topics: Motivation; Data Cube; Related Work; Query Cube (qCube); Experiments; Results; Conclusions
  • 4. Motivation: Users need to view data in a tangible way, such as reports, cross tables and histograms.
  • 5. Motivation: Suppose that a decision-making process requires the following information: “What is the women journal research papers variance impact, using months {1, 3, 5, 7, 11}, year 2012 and ages varying from 25-40 years? Return results for all countries” and “The average temperatures above 30 degrees Celsius on the weekends of leap years in the last 200 years.”
  • 6. Data Cube: A data cube, introduced by Gray et al., 1996, is a generalization of the group-by operator over all possible combinations of dimensions with various granularity aggregates.
  • 7. Data Cube: A data cube has exponential complexity with respect to the number of dimensions. For an input with d dimensions, the output has 2^d group-bys (cuboids).
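To make the exponential growth concrete, the short Python sketch below enumerates every group-by (cuboid) for a list of dimension names. The function name `cuboids` is illustrative and does not come from the slides.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every subset of the dimensions; each subset is one group-by (cuboid)."""
    result = []
    for k in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, k))
    return result

all_cuboids = cuboids(["A", "B", "C"])
# 2**3 = 8 group-bys, including the empty one (the apex cuboid).
print(len(all_cuboids))
```

Adding one dimension doubles the number of group-bys, which is why full materialization becomes impractical for high-dimension cubes.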
  • 8. Data Cube: Hierarchies. Diagram labels: Year, Discipline, Day, Department, Year, Hour.
  • 9. Data Cube: Base relation R (11 tuples) and its full 3-D cube (38 tuples), with COUNT as the aggregate.
      Base relation R (A, B, C): (a1, b1, c1), (a3, b3, c2), (a2, b3, c2), (a3, b1, c1), (a2, b1, c1), (a2, b2, c2), (a1, b1, c2), (a2, b2, c1), (a3, b1, c2), (a1, b3, c2), (a2, b1, c2)
      Full 3-D cube (A, B, C: COUNT):
        Apex: (*, *, *): 11
        A: (a1, *, *): 3; (a2, *, *): 5; (a3, *, *): 3
        B: (*, b1, *): 6; (*, b2, *): 2; (*, b3, *): 3
        C: (*, *, c1): 4; (*, *, c2): 7
        AB: (a1, b1, *): 2; (a1, b3, *): 1; (a2, b1, *): 2; (a2, b2, *): 2; (a2, b3, *): 1; (a3, b1, *): 2; (a3, b3, *): 1
        AC: (a1, *, c1): 1; (a1, *, c2): 2; (a2, *, c1): 2; (a2, *, c2): 3; (a3, *, c1): 1; (a3, *, c2): 2
        BC: (*, b1, c1): 3; (*, b1, c2): 3; (*, b2, c1): 1; (*, b2, c2): 1; (*, b3, c2): 3
        ABC: each of the 11 base tuples: 1
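The full cube on this slide can be reproduced with a few lines of Python. This is a naive sketch for illustration only (it is not the qCube or Frag-Cubing algorithm); the function name `full_cube` is an assumption.

```python
from collections import Counter
from itertools import product

# Base relation R from the slide (11 tuples over dimensions A, B, C).
R = [
    ("a1", "b1", "c1"), ("a3", "b3", "c2"), ("a2", "b3", "c2"),
    ("a3", "b1", "c1"), ("a2", "b1", "c1"), ("a2", "b2", "c2"),
    ("a1", "b1", "c2"), ("a2", "b2", "c1"), ("a3", "b1", "c2"),
    ("a1", "b3", "c2"), ("a2", "b1", "c2"),
]

def full_cube(relation):
    """Count every cell obtained by replacing any subset of values with '*'."""
    cube = Counter()
    for row in relation:
        # Each tuple contributes to 2^d cells: keep or generalize each dimension.
        for mask in product([False, True], repeat=len(row)):
            cell = tuple("*" if star else value for value, star in zip(row, mask))
            cube[cell] += 1
    return cube

cube = full_cube(R)
print(len(cube))                # 38 cells, as on the slide
print(cube[("*", "*", "*")])    # 11
print(cube[("a2", "*", "c2")])  # 3
```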
  • 10. Related Work – Frag-Cubing Approach: Partitions the data vertically; reduces a high-dimensional cube into a set of lower-dimensional cubes; the reduction is lossless; offers tradeoffs between the amount of pre-processing and the speed of online computation. Source: Han and Kamber, Data Mining: Concepts and Techniques.
  • 11. Related Work – Frag-Cubing Example: Let the cube aggregation function be count. Divide the 5 dimensions into 2 shell fragments: (A, B, C) and (D, E). Source: Han and Kamber, Data Mining: Concepts and Techniques.
      tid  A   B   C   D   E
      1    a1  b1  c1  d1  e1
      2    a1  b2  c1  d2  e1
      3    a1  b2  c1  d1  e2
      4    a2  b1  c1  d1  e2
      5    a2  b1  c1  d1  e3
  • 12. Related Work – Frag-Cubing 1-D Inverted Indices: Build a traditional inverted index or RID list. Source: Han and Kamber, Data Mining: Concepts and Techniques.
      Attribute Value  TID List       List Size
      a1               1, 2, 3        3
      a2               4, 5           2
      b1               1, 4, 5        3
      b2               2, 3           2
      c1               1, 2, 3, 4, 5  5
      d1               1, 3, 4, 5     4
      d2               2              1
      e1               1, 2           2
      e2               3, 4           2
      e3               5              1
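The 1-D inverted index above can be built in one pass over the base table. The sketch below assumes attribute values are unique across dimensions (as in the example, where `a1`, `b1`, and so on carry their dimension in the name); `build_inverted_index` is an illustrative name.

```python
from collections import defaultdict

# Base table from the Frag-Cubing example: tid -> (A, B, C, D, E).
TABLE = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

def build_inverted_index(table):
    """Map each attribute value to the sorted list of tuple IDs that contain it."""
    index = defaultdict(list)
    for tid in sorted(table):
        for value in table[tid]:
            index[value].append(tid)
    return dict(index)

index = build_inverted_index(TABLE)
for value in sorted(index):
    print(value, index[value], len(index[value]))
```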
  • 13. Related Work – Frag-Cubing Approach: Generalize the 1-D inverted indices to multi-dimensional ones in the data cube sense. Compute all cuboids for data cubes ABC and DE while retaining the inverted indices. For example, shell fragment cube ABC contains 7 cuboids: A, B, C; AB, AC, BC; ABC. This completes the offline computation stage. Source: Han and Kamber, Data Mining: Concepts and Techniques.
      Cell   Intersection         TID List  List Size
      a1 b1  {1, 2, 3} ∩ {1, 4, 5}  1         1
      a1 b2  {1, 2, 3} ∩ {2, 3}     2, 3      2
      a2 b1  {4, 5} ∩ {1, 4, 5}     4, 5      2
      a2 b2  {4, 5} ∩ {2, 3}        ∅         0
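The AB cuboid in the table above follows directly from set intersection of the 1-D TID lists. A minimal Python sketch, with `cuboid_ab` as an illustrative helper name:

```python
from itertools import product

# 1-D inverted index entries for dimensions A and B, taken from the example.
index = {
    "a1": {1, 2, 3},
    "a2": {4, 5},
    "b1": {1, 4, 5},
    "b2": {2, 3},
}

def cuboid_ab(index, a_values, b_values):
    """Intersect the TID sets of every (A value, B value) pair."""
    return {(a, b): index[a] & index[b] for a, b in product(a_values, b_values)}

ab = cuboid_ab(index, ["a1", "a2"], ["b1", "b2"])
for cell, tids in ab.items():
    print(cell, sorted(tids), len(tids))
```

The cell (a2, b2) yields an empty set, which is the empty cell shown in the last row of the table.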
  • 14. Related Work – Frag-Cubing Measure Table: If measures other than count are present, store them in an ID_measure table separate from the shell fragments. Source: Han and Kamber, Data Mining: Concepts and Techniques.
      tid  count  sum
      1    5      70
      2    3      10
      3    8      20
      4    5      40
      5    2      30
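Once a cell's TID list is known, its measures can be derived by looking up each TID in the ID_measure table. The sketch below is an illustration under that reading of the slide; the `aggregate` function is a hypothetical helper.

```python
# ID_measure table from the slide: tid -> (count, sum).
MEASURES = {1: (5, 70), 2: (3, 10), 3: (8, 20), 4: (5, 40), 5: (2, 30)}

def aggregate(tids):
    """Add up the count and sum measures of all tuples in a TID list."""
    count = sum(MEASURES[tid][0] for tid in tids)
    total = sum(MEASURES[tid][1] for tid in tids)
    return count, total

# Cell (a1, b2) has TID list {2, 3}; cell (a2, b1) has TID list {4, 5}.
print(aggregate({2, 3}))  # (11, 30)
print(aggregate({4, 5}))  # (7, 70)
```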
  • 15. Related Work – Frag-Cubing Query: Given the fragment cubes, process a query as follows. Source: Han and Kamber, Data Mining: Concepts and Techniques.
      1. Divide the query into fragments, the same way as the shell fragments.
      2. Fetch the corresponding TID list for each fragment from the fragment cube.
      3. Intersect the TID lists from each fragment to construct the instantiated base table.
      4. Compute the data cube using the base table with any cubing algorithm.
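The four steps can be sketched in Python on the five-tuple example. Two simplifications are assumptions of this sketch: TID lists are obtained by scanning the table rather than read from precomputed fragment cubes, and step 4 computes a single count group-by instead of a full cube. The names `tid_list` and `answer` are illustrative.

```python
from collections import Counter

DIMS = ("A", "B", "C", "D", "E")
FRAGMENTS = [("A", "B", "C"), ("D", "E")]
TABLE = {
    1: ("a1", "b1", "c1", "d1", "e1"),
    2: ("a1", "b2", "c1", "d2", "e1"),
    3: ("a1", "b2", "c1", "d1", "e2"),
    4: ("a2", "b1", "c1", "d1", "e2"),
    5: ("a2", "b1", "c1", "d1", "e3"),
}

def tid_list(dim, value):
    """Stand-in for a fragment-cube lookup: TIDs whose `dim` equals `value`."""
    pos = DIMS.index(dim)
    return {tid for tid, row in TABLE.items() if row[pos] == value}

def answer(constraints, group_by):
    # Steps 1-2: split the constraints by fragment and get one TID set per fragment.
    per_fragment = []
    for fragment in FRAGMENTS:
        tids = set(TABLE)
        for dim in fragment:
            if dim in constraints:
                tids &= tid_list(dim, constraints[dim])
        per_fragment.append(tids)
    # Step 3: intersect across fragments to obtain the instantiated base table.
    selected = set.intersection(*per_fragment)
    # Step 4 (simplified): one count group-by over the selected tuples.
    positions = [DIMS.index(dim) for dim in group_by]
    return Counter(tuple(TABLE[tid][p] for p in positions) for tid in selected)

result = answer({"A": "a2", "D": "d1"}, group_by=("E",))
print(dict(result))
```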
  • 16. Related Work – Frag-Cubing Approach: Diagram labels: dimensions A B C D E F G H I J K L M N …; Base Table; Online Computation. Source: Han and Kamber, Data Mining: Concepts and Techniques.
  • 17. qCube Approach: qCube keeps a set of tuple identifiers (TIDs) per attribute value of each dimension, similar to Frag-Cubing. It can therefore answer point queries by intersecting TID sets and range queries by combining unions and intersections, regardless of the type of measure function. Frag-Cubing implements only point queries and some inquire queries. There is no Frag-Cubing solution for queries like “What is the women journal research papers variance impact, using months {1, 3, 5, 7, 11}, year 2012 and ages varying from 25-40 years? Return results for all countries”
  • 18. qCube Approach: Implements the range query operators Equal, Not Equal, Greater or Less than, Some, Between and Similar. Also implements the inquire query operators Distinct, Sub-cube and Top-k Similar. All operators work over a high-dimension data cube.
  • 19. qCube Architecture
  • 20. qCube Computation
      Base relation:
      TID  A   B   C   D   E
      1    a1  b1  c1  d1  e1
      2    a2  b2  c2  d2  e2
      3    a1  b1  c1  d1  e1
      4    a3  b3  c3  d2  e2
      5    a1  b1  c4  d1  e2
      6    a5  b5  c5  d2  e2
      Measures per tid:
      Function                 tid 1         tid 2          tid 3          tid 4         tid 5       tid 6
      Variance (M1)            2.56          3.14           2.45           6.7           9           1
      Count (M2)               1             1              1              1             1           1
      Average (M3)             10            20             10             11            3           1
      Skewness (M4)            1             0              1              1             1           1
      Standard deviation (M5)  877686769698  7986676867.99  -7878789.8777  -99974333.23  100045.655  1
      TID list per attribute value:
      a1: 1, 3, 5; a2: 2; a3: 4; a5: 6
      b1: 1, 3, 5; b2: 2; b3: 4; b5: 6
      c1: 1, 3; c2: 2; c3: 4; c4: 5; c5: 6
      d1: 1, 3, 5; d2: 2, 4, 6
      e1: 1, 3; e2: 2, 4, 5, 6
  • 21. qCube Update: Uses the same algorithm as qCube Computation.
  • 22. qCube Update
      Base relation before the update:
      TID  A   B   C   D   E
      1    a1  b1  c1  d1  e1
      2    a2  b2  c2  d2  e2
      3    a1  b1  c1  d1  e1
      4    a3  b3  c3  d2  e2
      5    a1  b1  c4  d1  e2
      6    a5  b5  c5  d2  e2
      Update tuples:
      tid  A   B   C   D   E   F
      5    a3  b1  c4  d1  e2
      7    a3  b2  c3  d3  e3
      8    a2  b3  c4  d3  e2  f1
      9    a5  b5  c5  d1  e1  f2
      TID list per attribute value after the update:
      a1: 1, 3; a2: 2, 8; a3: 4, 5, 7; a5: 6, 9
      b1: 1, 3, 5; b2: 2, 7; b3: 4, 8; b5: 6, 9
      c1: 1, 3; c2: 2; c3: 4, 7; c4: 5, 8; c5: 6, 9
      d1: 1, 3, 5, 9; d2: 2, 4, 6; d3: 7, 8
      e1: 1, 3, 9; e2: 2, 4, 5, 6, 8; e3: 7
      f1: 8; f2: 9
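The update example can be replayed with a small Python sketch in which one routine handles both the initial computation and later updates, in the spirit of the previous slide. The routine `apply_tuple` and its remove-then-add handling of a changed tuple are assumptions of this sketch, not the authors' implementation; the placement of `f1` and `f2` on tids 8 and 9 follows the resulting TID lists.

```python
def apply_tuple(index, rows, tid, row):
    """Insert a new tuple or overwrite an existing one, keeping TID lists sorted."""
    old = rows.get(tid)
    if old is not None:
        # Remove the tid from the lists of the values it used to carry.
        for value in old:
            index[value].remove(tid)
            if not index[value]:
                del index[value]
    rows[tid] = row
    for value in row:
        tids = index.setdefault(value, [])
        tids.append(tid)
        tids.sort()

index, rows = {}, {}
base = {
    1: ("a1", "b1", "c1", "d1", "e1"), 2: ("a2", "b2", "c2", "d2", "e2"),
    3: ("a1", "b1", "c1", "d1", "e1"), 4: ("a3", "b3", "c3", "d2", "e2"),
    5: ("a1", "b1", "c4", "d1", "e2"), 6: ("a5", "b5", "c5", "d2", "e2"),
}
updates = {
    5: ("a3", "b1", "c4", "d1", "e2"),        # changes A of tid 5 from a1 to a3
    7: ("a3", "b2", "c3", "d3", "e3"),
    8: ("a2", "b3", "c4", "d3", "e2", "f1"),  # new dimension F appears
    9: ("a5", "b5", "c5", "d1", "e1", "f2"),
}
for batch in (base, updates):
    for tid, row in batch.items():
        apply_tuple(index, rows, tid, row)

print(index["a1"], index["a3"], index["e2"])
```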
  • 23. qCube Query: pQ = a1:*:*:*:e1
      TID list per attribute value:
      a1: 1, 3; a2: 2, 8; a3: 4, 5; a5: 6, 9
      b1: 1, 3, 5; b2: 2, 7; b3: 4, 8; b5: 6, 9
      c1: 1, 3; c2: 2; c3: 4, 7; c4: 5, 8; c5: 6, 9
      d1: 1, 3, 5, 9; d2: 2, 4, 6; d3: 7, 8
      e1: 1, 3, 9; e2: 2, 4, 5, 6, 8; e3: 7
      f1: 8; f2: 9
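A point query such as pQ = a1:*:*:*:e1 reduces to intersecting the TID lists of its non-wildcard values. The Python sketch below uses only the index entries it needs; `point_query` is an illustrative name, and starting from the shortest list is a common optimization assumed here.

```python
# Subset of the TID lists shown on the slide.
INDEX = {
    "a1": [1, 3], "a2": [2, 8], "a5": [6, 9],
    "e1": [1, 3, 9], "e2": [2, 4, 5, 6, 8], "e3": [7],
}

def point_query(index, query):
    """Answer a colon-separated point query; '*' means the dimension is unconstrained."""
    values = [value for value in query.split(":") if value != "*"]
    if not values:
        raise ValueError("at least one dimension must be instantiated")
    # Intersect starting with the shortest list to keep intermediate results small.
    tid_sets = sorted((set(index.get(value, ())) for value in values), key=len)
    return sorted(set.intersection(*tid_sets))

print(point_query(INDEX, "a1:*:*:*:e1"))  # [1, 3]
print(point_query(INDEX, "a2:*:*:*:e2"))  # [2, 8]
```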
  • 24. qCube Range and Inquire Query: rOp = (greater than + less than + between + some + different + similar × (fv1 … fvn)); iOp = (sub-cube + distinct + top-k similar × (fv1 … fvn)). qCube rearranges the sub-queries of Q in order to improve query response times. As a result of Q we have qR = (TID1, TID2, …, TIDk), where TIDi is the i-th tuple identifier of relation R.
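A range operator can be read as a union of the TID sets of every attribute value it accepts, followed by intersection with the other sub-queries. The Python sketch below illustrates that idea on hypothetical `MONTH` and `AGE` indices; treating Between as inclusive is an assumption, and Similar is left out because the slides do not define its semantics.

```python
# Hypothetical per-dimension indices: attribute value -> set of TIDs.
MONTH = {1: {1, 4}, 2: {2}, 3: {3, 6}, 5: {5}}
AGE = {22: {1}, 25: {2, 3}, 31: {4}, 40: {5}, 47: {6}}

def range_tids(dim_index, op, *args):
    """Union the TID sets of every attribute value accepted by the operator."""
    predicates = {
        "equal": lambda v: v == args[0],
        "different": lambda v: v != args[0],
        "greater": lambda v: v > args[0],
        "less": lambda v: v < args[0],
        "between": lambda v: args[0] <= v <= args[1],  # inclusive bounds assumed
        "some": lambda v: v in args,
    }
    accept = predicates[op]
    result = set()
    for value, tids in dim_index.items():
        if accept(value):
            result |= tids
    return result

months = range_tids(MONTH, "some", 1, 3, 5)
ages = range_tids(AGE, "between", 25, 40)
# Intersect the smaller set first, mirroring the cardinality-based reordering.
first, second = sorted([months, ages], key=len)
print(sorted(first & second))  # [3, 4, 5]
```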
  • 25. qCube Query - example
  • 26. qCube Query - example: “What is the women journal research papers variance impact, using months {1, 3, 5, 7, 11}, year 2012 and ages varying from 25-40 years? Return results for all countries” In Q, the point queries are (sex = women, paperType = journal, year = 2012). The range queries (month = (1, 3, 5, 7, 11), age <> 25-40) are also sorted according to their cardinalities. In Q, there is one inquire query (country = distinct).
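The example query can be traced end to end in Python. Everything below is a hypothetical illustration: the nine rows in `ROWS`, the dimension names, the use of population variance via `statistics.pvariance`, and the function names are assumptions, not data or code from the qCube work.

```python
from collections import defaultdict
from statistics import pvariance

# Hypothetical relation: tid -> (sex, paperType, year, month, age, country, impact).
FIELDS = ("sex", "paperType", "year", "month", "age", "country", "impact")
ROWS = {
    1: ("women", "journal", 2012, 1, 30, "BR", 2.0),
    2: ("women", "journal", 2012, 3, 35, "BR", 4.0),
    3: ("women", "journal", 2012, 2, 28, "BR", 9.0),
    4: ("men", "journal", 2012, 5, 33, "US", 1.0),
    5: ("women", "journal", 2012, 7, 26, "US", 3.0),
    6: ("women", "journal", 2012, 11, 40, "US", 7.0),
    7: ("women", "conference", 2012, 1, 29, "US", 7.0),
    8: ("women", "journal", 2011, 5, 31, "BR", 8.0),
    9: ("women", "journal", 2012, 5, 45, "US", 6.0),
}

def build_index(rows):
    """dimension -> attribute value -> set of TIDs (the measure is not indexed)."""
    index = defaultdict(lambda: defaultdict(set))
    for tid, row in rows.items():
        for dim, value in zip(FIELDS[:-1], row[:-1]):
            index[dim][value].add(tid)
    return index

def variance_by_country(rows):
    index = build_index(rows)
    # Point sub-queries: one TID set each.
    point = [index["sex"]["women"], index["paperType"]["journal"], index["year"][2012]]
    # Range sub-queries: union of the TID sets of all accepted values.
    month = set().union(*(index["month"][m] for m in (1, 3, 5, 7, 11)))
    age = set().union(*(t for a, t in index["age"].items() if 25 <= a <= 40))
    # Sort each group by cardinality, then intersect everything.
    ordered = sorted(point, key=len) + sorted([month, age], key=len)
    selected = set.intersection(*ordered)
    # Inquire sub-query (country = distinct): one result per country value.
    result = {}
    for country, tids in index["country"].items():
        impacts = [rows[tid][-1] for tid in sorted(tids & selected)]
        if impacts:
            result[country] = pvariance(impacts)
    return result

print(variance_by_country(ROWS))  # {'BR': 1.0, 'US': 4.0}
```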
  • 27. qCube Query - example
  • 28. Experiments: We tested the qCube Computation and Query algorithms against the Frag-Cubing algorithm used in [Li et al. 2004]. The qCube algorithms were coded in 64-bit Java. Frag-Cubing is a free and open-source C++ application (http://illimine.cs.uiuc.edu/). The synthetic base relations were created using the data generator provided by the IlliMine project, an open-source project that provides various approaches for data mining and machine learning; the Frag-Cubing approach is part of the IlliMine project. We ran the algorithms on two Intel Xeon six-core processors at 2.4 GHz per core, with 12 MB of cache and 128 GB of DDR3 1333 MHz RAM. The system runs 64-bit Windows Server 2008, High Performance version.
  • 29. Results - Performance Evaluation of Point Queries and Skewed Relations: Response time per query over 100 trials: T = 10^7; C = 5000; D = 30; S = 0. Response time per query over 100 trials: T = 10^7; C = 5000; D = 30; S = 2.5.
  • 30. Results - Performance Evaluation of Range Query Operators and Skewed Relations: Response time of queries with one infrequent point operator: T = 10^7; C = 5000; D = 30; S = 2.5.
  • 31. Results - Performance Evaluation of Inquire Operators and Skewed Relations: Response time of queries with inquire operators: T = 10^7; C = 5000; D = 30; S = 2.5.
  • 32. Results - Runtime and Memory Consumption
  • 33. Conclusions: qCube has linear runtime and memory consumption, similar to Frag-Cubing. It implements the Not Equal, Greater or Less than, Some, Between and Similar range query operators and the Distinct, Sub-cube and Top-k Similar inquire query operators. Compared with Frag-Cubing, qCube is faster at answering point queries and inquire queries with sub-cube operators. It introduces a different cube representation with fewer empty cells than Frag-Cubing. Frag-Cubing cannot answer queries with two sub-cube operators in a data cube with 10^7 tuples, C = 5000, D = 30 and S = 2.5.
  • 34. Conclusions: Interesting research directions to further extend qCube. First, qCube must be evaluated with holistic measures; update and computation experiments with many holistic measures are a hard problem. TID lists can become huge, so memory consumption and intersection costs can become impractical; an efficient solution for partitioning TIDs with fast data retrieval is therefore needed. Multicore and multicomputer versions of qCube must be implemented. qCube must be improved to answer top-k queries combined with range, point and inquire queries. Experiments with high-dimensional text cubes must be carried out to evaluate qCube, especially its computation of text measures.
  • 35. Acknowledgements