COURSEFILE
Department: CSE
Mission
M1: To produce the best-quality computer science professionals through intellectual inputs.
M2: To impart quality training, hands-on experience and value education to solve real-world
problems.
M3: To provide an environment that values and encourages knowledge acquisition.
COURSE OBJECTIVES
To understand data warehouse concepts, architecture, business analysis and tools
To understand data pre-processing and data visualization techniques
To study algorithms for finding hidden and interesting patterns in data
To understand and apply various classification and clustering techniques using tools
COURSE OUTCOMES
At the end of the course, the students will be able to:
Design a data warehouse system and perform business analysis with OLAP tools
Apply suitable pre-processing and visualization techniques for data analysis
Apply frequent pattern and association rule mining techniques for data analysis
Apply appropriate classification techniques for data analysis
Apply appropriate clustering techniques for data analysis
Topic No    Date    Name of the Concept    No. of Classes Required
UNIT-I
1 01/10/2021 Data Warehousing, Business Analysis and On-Line Analytical Processing (OLAP): Basic Concepts 1
2 02/10/2021 Data Warehousing Components 2
3 05/10/2021 Building a Data Warehouse 1
4 06/10/2021 Database Architectures for Parallel Processing, Parallel DBMS Vendors 3
5 09/10/2021 Multidimensional Data Model, Data Warehouse Schemas for Decision Support 3
6 14/10/2021 Concept Hierarchies, Characteristics of OLAP Systems 2
7 18/10/2021 Typical OLAP Operations, OLAP and OLTP 2
Total number of hours 14
UNIT-II
1 21/10/2021 Introduction: Introduction to Data Mining Systems 1
2 22/10/2021 Knowledge Discovery Process 1
3 23/10/2021 Data Mining Techniques, Issues, applications 3
4 27/10/2021 Data Objects and Attribute Types 2
5 29/10/2021 Basic Statistical Descriptions of Data 1
6 30/10/2021 Data Visualization, Measuring Data Similarity and Dissimilarity 3
7 03/11/2021 Data Pre-processing: Data Cleaning 2
8 06/11/2021 Data Integration, Data Reduction 2
9 09/11/2021 Data Transformation and Data Discretization 2
Total number of hours 17
UNIT-III
1 10/11/2021 Frequent Pattern Analysis: Mining Frequent Patterns 1
2 11/11/2021 Associations and Correlations 1
3 12/11/2021 Mining Methods 3
4 17/11/2021 Classification using Frequent Patterns 2
5 29/11/2021 Pattern Evaluation Method 1
6 01/12/2021 Pattern Mining in Multilevel, Multidimensional space 2
7 04/12/2021 Constraint Based Frequent Pattern Mining 2
Total number of hours 12
UNIT-IV
1 07/12/2021 Classification: Decision Tree Induction 2
2 09/12/2021 Bayes’ Theorem 1
3 10/12/2021 Naïve Bayesian Classification 1
4 13/12/2021 Bayesian Belief Networks 2
5 15/12/2021 Rule Based Classification 1
6 16/12/2021 Classification by Back Propagation 1
7 17/12/2021 Support Vector Machines 1
8 18/12/2021 Lazy Learners 1
9 20/12/2021 Model Evaluation and Selection 2
10 22/12/2021 Techniques to improve Classification Accuracy 2
Total number of hours 14
UNIT-V
1 27/12/2021 Clustering: Clustering Techniques, Cluster analysis 2
2 29/12/2021 Partitioning Methods 2
3 31/12/2021 Hierarchical methods 1
4 03/01/2022 Density Based Methods 2
5 05/01/2022 Grid Based Methods 1
6 06/01/2022 Evaluation of clustering 1
7 07/01/2022 Clustering high dimensional data, Clustering with constraints 2
8 11/01/2022 Outlier analysis, outlier detection methods 2
Total number of hours 13
DAY / PERIOD: 1 2 3 4 5 6 7
Time slots: 09:00–10:00 am | 10:00–10:55 am | 10:55–11:50 am | 11:50 am–12:45 pm | 12:45–01:40 pm | 01:40–02:30 pm | 02:30–03:20 pm | 03:20–04:10 pm
MON CN CD STM AI REASONING DWDM CD
-------- CN LAB---------------
THU CD CN AI DWDM
(LAB4)
REASON ---------AITT LAB-----------
FRI ----- CODING---------- STM
ING (LAB3)
----DM LAB--------
SAT ES REASONING DWDM CN
(LAB3)
REASONING Mr.PRASANNA
DAY / PERIOD: 1 2 3 4 5 6 7
Time slots: 09:00–10:00 am | 10:00–10:55 am | 10:55–11:50 am | 11:50 am–12:45 pm | 12:45–01:40 pm | 01:40–02:30 pm | 02:30–03:20 pm | 03:20–04:10 pm
-------------- DM LAB----------------
MON (LAB3) CN CD AI CN
------------- CN LAB------------------
TUE L DWDM REASONING STM DWDM
(LAB4)
----------------AITT LAB------------ U
WED N CN STM CD ES
(LAB3)
C
THU DWDM STM CD H ------- CODING---------- STM REASONING
REASONING Mr.PRASANNA
COURSE OUTCOME PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
DATA WAREHOUSING AND DATA MINING
CO1 Able to understand data warehouse concepts, architecture, business analysis and tools 2 2
CO2 Able to understand data pre-processing and data visualization techniques 2 2
CO3 Able to study algorithms for finding hidden and interesting patterns in data 3 2
CO4 Able to understand and apply various classification techniques using tools 3 2
CO5 Able to understand and apply various clustering techniques using tools 2 2
Average 2.40 2.00
COURSE OUTCOME PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
DATA MINING LAB
CO1 Able to understand the
mathematical basics quickly
and covers each and every
3 3 3 2 2 1 1
condition of data mining in
order to prepare for real-
world problems
CO2 The various classes of 2
algorithms will be covered to
give a foundation to further
3 3 3 1 1
apply knowledge to dive
deeper into the different
flavors of algorithms
CO3 Students should be aware of the 2
packages and libraries of R
and also be familiar with the
2 3 3 1 1
functions used in R for
visualization
CO4 To enable students to use R to 2
conduct analytics on large
real-life datasets
3 3 3 2 1 1
CO5 To familiarize students with 2
how various statistics (mean,
median, etc.) and data can
3 3 3 2 1 1
be collected for data
exploration in R
2.80 3.00 3.00 2.00 2.00 1.00 1.00
SUB CODE: R1931051
S.No Reg No | MID 1: DES (10M) ONL (10M) A1 (5M) A2 (5M) A3 (5M) AVG (5M) MID1 TOTAL | MID 2: DES (10M) ONL (10M) A1 (5M) A2 (5M) AVG (5M) MID2 TOTAL | Max (x0.8) Min (x0.2) Internal
1 19HP1A0501 4 4 5 5 5 5 13 3 4 5 5 5 12 10 2 12
2 19HP1A0502 8 3 5 5 5 5 16 7 4 5 5 5 16 13 3 16
3 19HP1A0503 9 AB 5 5 5 5 14 8 3 5 5 5 16 13 3 16
4 19HP1A0504 8 5 5 5 5 5 18 9 4 5 5 5 18 14 4 18
5 19HP1A0505 7 5 5 5 5 5 17 8 AB 5 5 5 13 14 3 17
6 19HP1A0506 9 7 5 5 5 5 21 7 5 5 5 5 17 17 3 20
7 19HP1A0507 7 2 5 5 5 5 14 9 2 5 5 5 16 13 3 16
8 19HP1A0508 10 5 5 5 5 5 20 8 5 5 5 5 18 16 4 20
9 19HP1A0509 10 6 5 5 5 5 21 8 5 5 5 5 18 17 4 21
10 19HP1A0510 8 6 5 5 5 5 19 8 5 5 5 5 18 15 4 19
11 19HP1A0511 6 5 5 5 5 5 16 2 2 5 5 5 9 13 2 15
12 19HP1A0512 7 5 5 5 5 5 17 7 5 5 5 5 17 14 3 17
13 19HP1A0513 0 0 5 5 5 5 5 9 3 5 5 5 17 14 1 15
14 19HP1A0514 10 7 5 5 5 5 22 10 2 5 5 5 17 18 3 21
15 19HP1A0515 AB AB 0 0 0 0 0 AB AB 0 0 0 0 0 0 0
16 19HP1A0516 9 6 5 5 5 5 20 6 5 5 5 5 16 16 3 19
17 19HP1A0517 10 6 5 5 5 5 21 9 5 5 5 5 19 17 4 21
18 19HP1A0518 8 4 5 5 5 5 17 8 4 5 5 5 17 14 3 17
19 19HP1A0519 8 3 5 5 5 5 16 7 5 5 5 5 17 14 3 17
20 19HP1A0520 10 7 5 5 5 5 22 7 5 5 5 5 17 18 3 21
21 19HP1A0521 9 6 5 5 5 5 20 0 3 5 5 5 8 16 2 18
22 19HP1A0522 8 6 5 5 5 5 19 8 7 5 5 5 20 16 4 20
23 19HP1A0523 9 5 5 5 5 5 19 7 4 5 5 5 16 15 3 18
24 19HP1A0524 AB AB 5 5 5 5 5 3 5 5 5 5 13 10 1 11
25 19HP1A0525 8 4 5 5 5 5 17 8 4 5 5 5 17 14 3 17
26 19HP1A0526 10 5 5 5 5 5 20 10 3 5 5 5 18 16 4 20
27 19HP1A0527 10 6 5 5 5 5 21 10 4 5 5 5 19 17 4 21
28 19HP1A0528 9 6 5 5 5 5 20 8 3 5 5 5 16 16 3 19
29 19HP1A0529 9 5 5 5 5 5 19 10 6 5 5 5 21 17 4 21
30 19HP1A0530 10 5 5 5 5 5 20 9 AB 5 5 5 14 16 3 19
31 19HP1A0531 7 4 5 5 5 5 16 7 AB 5 5 5 12 13 2 15
32 19HP1A0532 AB AB 5 5 5 5 5 9 2 5 5 5 16 13 1 14
33 19HP1A0533 9 3 5 5 5 5 17 8 5 5 5 5 18 14 3 17
34 19HP1A0534 8 6 5 5 5 5 19 6 4 5 5 5 15 15 3 18
35 19HP1A0535 7 4 5 5 5 5 16 3 3 5 5 5 11 13 2 15
36 19HP1A0536 9 5 5 5 5 5 19 9 6 5 5 5 20 16 4 20
37 19HP1A0537 10 5 5 5 5 5 20 10 6 5 5 5 21 17 4 21
38 19HP1A0539 5 5 5 5 5 5 15 6 8 5 5 5 19 15 3 18
39 19HP1A0540 5 6 5 5 5 5 16 8 7 5 5 5 20 16 3 19
40 19HP1A0541 6 5 5 5 5 5 16 5 6 5 5 5 16 13 3 16
41 19HP1A0542 7 5 5 5 5 5 17 7 5 5 5 5 17 14 3 17
42 19HP1A0543 4 6 5 5 5 5 15 5 4 5 5 5 14 12 3 15
43 19HP1A0544 3 3 5 5 5 5 11 3 2 5 5 5 10 9 2 11
44 19HP1A0545 3 5 5 5 5 5 13 3 6 5 5 5 14 11 3 14
45 19HP1A0546 5 6 5 5 5 5 16 3 4 5 5 5 12 13 2 15
46 19HP1A0547 6 4 5 5 5 5 15 7 6 5 5 5 18 14 3 17
47 19HP1A0549 7 6 5 5 5 5 18 AB AB 5 5 5 5 14 1 15
48 19HP1A0550 8 7 5 5 5 5 20 7 6 5 5 5 18 16 4 20
49 19HP1A0551 8 4 5 5 5 5 17 7 5 5 5 5 17 14 3 17
50 19HP1A0552 5 3 5 5 5 5 13 4 2 5 5 5 11 10 2 12
51 19HP1A0553 8 6 5 5 5 5 19 8 5 5 5 5 18 15 4 19
52 19HP1A0554 7 2 5 5 5 5 14 3 4 5 5 5 12 11 2 13
53 19HP1A0555 10 4 5 5 5 5 19 7 AB 5 5 5 12 15 2 17
54 19HP1A0556 7 5 5 5 5 5 17 8 5 5 5 5 18 14 3 17
55 19HP1A0557 6 6 5 5 5 5 17 6 4 5 5 5 15 14 3 17
56 19HP1A0558 9 7 5 5 5 5 21 8 5 5 5 5 18 17 4 21
57 19HP1A0559 9 7 5 5 5 5 21 7 6 5 5 5 18 17 4 21
58 19HP1A0560 7 5 5 5 5 5 17 8 4 5 5 5 17 14 3 17
59 19HP1A0561 10 6 5 5 5 5 21 10 6 5 5 5 21 17 4 21
60 19HP1A0563 7 3 5 5 5 5 15 5 4 5 5 5 14 12 3 15
61 19HP1A0564 9 4 5 5 5 5 18 7 7 5 5 5 19 15 4 19
62 19HP1A0565 7 4 5 5 5 5 16 7 5 5 5 5 17 14 3 17
63 20HP5A0501 7 6 5 5 5 5 18 7 3 5 5 5 15 14 3 17
64 20HP5A0502 6 4 5 5 5 5 15 9 5 5 5 5 19 15 3 18
65 20HP5A0503 10 6 5 5 5 5 21 9 7 5 5 5 21 17 4 21
66 20HP5A0504 8 4 5 5 5 5 17 9 5 5 5 5 19 15 3 18
67 20HP5A0505 7 3 5 5 5 5 15 7 3 5 5 5 15 12 3 15
68 20HP5A0506 8 6 5 5 5 5 19 7 3 5 5 5 15 15 3 18
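The Max, Min and Internal columns follow a weighting that can be checked against the rows. The sheet does not state the rule. The sketch below encodes the rule the rows appear to follow, inferred from them. Each mid total is DES + ONL + AVG, with AB counted as 0. Internal = round(0.8 × better mid total) + round(0.2 × weaker mid total). The function names are hypothetical.

```python
def to_mark(value):
    """'AB' (absent) contributes 0, as it does in the rows above."""
    return 0 if value == "AB" else value

def mid_total(descriptive, online, assignment_avg):
    """Mid total as inferred from the sheet: DES + ONL + assignment AVG."""
    return to_mark(descriptive) + to_mark(online) + to_mark(assignment_avg)

def internal_marks(mid1_total, mid2_total):
    """80% of the better mid plus 20% of the weaker mid, each rounded."""
    better = max(mid1_total, mid2_total)
    weaker = min(mid1_total, mid2_total)
    return round(0.8 * better) + round(0.2 * weaker)

# Row 1 (19HP1A0501): MID 1 = 4 + 4 + 5, MID 2 = 3 + 4 + 5
m1, m2 = mid_total(4, 4, 5), mid_total(3, 4, 5)
print(m1, m2, internal_marks(m1, m2))  # 13 12 12
```

For integer totals, 0.8 × t and 0.2 × t never end in .5. Python's round-half-to-even therefore gives the same result as ordinary rounding here.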
Q.no | Question | Marks | CO | PO | PSO | Bloom’s Level | Marks | % of Marks
1a | Compare and contrast OLAP and OLTP. (or) Define the terms OLTP and OLAP. | 5 | CO1 | PO2 | - | L3 | 5/30 | 16.67%
1b | What is a data cube? Give an example of a 2-D view and a 3-D data cube representation of the data. | 5 | CO1 | PO1, PO2 | - | L3 | 5/30 | 16.67%
2a | Given the following measurements for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28. Standardize the variable as follows: (i) Compute the mean absolute deviation for age. (ii) Compute the z-score for the first four measurements. | 5 | CO2 | PO1, PO2 | - | L3 | 5/30 | 16.67%
2b | What is noisy data? Explain the binning methods for data smoothing. | 5 | CO2 | PO1 | - | L2 | 5/30 | 16.67%
3 | Write the algorithm to discover frequent itemsets without candidate generation and explain it with an example. | 10 | CO3 | PO2 | - | L2 | 10/30 | 33.33%
SCHEME OF EVALUATION
DATE: 22-11-2021 YEAR/BRANCH: III CSE
TIME: 01:00 PM TO 02:30 PM MAX MARKS: 30
ANSWERS
b) What is a data cube? Give an example of a 2-D view and a 3-D data cube representation of the data.
“What is a data cube?” : A data cube allows data to be modeled and viewed in multiple dimensions.
It is defined by dimensions and facts.
In general terms, dimensions are the perspectives or entities with respect to which an organization
wants to keep records. For example, AllElectronics may create a sales data warehouse in order to
keep records of the store’s sales with respect to the dimensions time, item, branch, and location.
These dimensions allow the store to keep track of things like monthly sales of items and the branches
and locations at which the items were sold. Each dimension may have a table associated with it,
called a dimension table, which further describes the dimension.
For example, a dimension table for item may contain the attributes item name, brand, and type.
Dimension tables can be specified by users or experts, or automatically generated and adjusted based
on data distributions.
A multidimensional data model is typically organized around a central theme, such as sales. This
theme is represented by a fact table.
Facts are numeric measures. Think of them as the quantities by which we want to analyze
relationships between dimensions. Examples of facts for a sales data warehouse include dollars sold
(sales amount in dollars), units sold (number of units sold), and amount budgeted. The fact table
contains the names of the facts, or measures, as well as keys to each of the related dimension tables.
Although we usually think of cubes as 3-D geometric structures, in data warehousing the data cube is
n-dimensional.
To gain a better understanding of data cubes and the multidimensional data model, let’s start by
looking at a simple 2-D data cube that is, in fact, a table or spreadsheet for sales data from
AllElectronics. In particular, we will look at the AllElectronics sales data for items sold per quarter
in the city of Vancouver. These data are shown in the table.
In this 2-D representation, the sales for Vancouver are shown with respect to the time dimension
(organized in quarters) and the item dimension (organized according to the types of items sold). The
fact or measure displayed is dollars sold (in thousands).
Now, suppose that we would like to view the sales data with a third dimension. For instance, suppose
we would like to view the data according to time and item, as well as location, for the cities Chicago,
New York, Toronto, and Vancouver. These 3-D data are shown in the table.
The 3-D data in the table are represented as a series of 2-D tables. Conceptually, we may also
represent the same data in the form of a 3-D data cube, as in the figure.
Suppose that we would now like to view our sales data with an additional fourth dimension, such as
supplier. Viewing things in 4-D becomes tricky. However, we can think of a 4-D cube as being a
series of 3-D cubes, as shown in the figure.
If we continue in this way, we may display any n-dimensional data as a series of (n − 1)-dimensional
“cubes.” The data cube is a metaphor for multidimensional data storage. The actual physical storage
of such data may differ from its logical representation. The important thing to remember is that data
cubes are n-dimensional and do not confine data to 3-D.
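The 2-D view, the 3-D cube and the idea of aggregating over dimensions can be sketched with plain dictionaries. This is a minimal illustration of the logical model only. As noted above, physical cube storage may differ. The fact rows and the helpers `slice_cube` and `build_cube` are hypothetical. The dollar figures are made up and are not the AllElectronics values.

```python
from collections import defaultdict

def slice_cube(facts, **fixed):
    """Slice: keep only the fact rows whose dimension values match `fixed`."""
    return [row for row in facts if all(row[d] == v for d, v in fixed.items())]

def build_cube(facts, dims):
    """Sum the measure over every dimension not listed in `dims`."""
    cells = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dims)
        cells[key] += row["dollars_sold"]
    return dict(cells)

# Hypothetical fact rows (dollars_sold in thousands).
facts = [
    {"time": "Q1", "item": "computer", "location": "Vancouver", "dollars_sold": 10},
    {"time": "Q1", "item": "phone", "location": "Vancouver", "dollars_sold": 4},
    {"time": "Q2", "item": "computer", "location": "Vancouver", "dollars_sold": 12},
    {"time": "Q1", "item": "computer", "location": "Toronto", "dollars_sold": 7},
    {"time": "Q2", "item": "phone", "location": "Toronto", "dollars_sold": 3},
]

# 2-D view: time x item for one city (a slice on location).
view_2d = build_cube(slice_cube(facts, location="Vancouver"), ("time", "item"))
# 3-D cube: time x item x location.
cube_3d = build_cube(facts, ("time", "item", "location"))
# Aggregating up to time alone sums over item and location.
by_quarter = build_cube(facts, ("time",))

print(view_2d)     # {('Q1', 'computer'): 10, ('Q1', 'phone'): 4, ('Q2', 'computer'): 12}
print(by_quarter)  # {('Q1',): 21, ('Q2',): 15}
```

Adding a fourth dimension such as supplier only means adding one more key to each fact row and one more name to `dims`. This mirrors the "series of (n − 1)-dimensional cubes" view.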
2 a) Given the following measurements for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28.
Standardize the variable as follows:
(i) Compute the mean absolute deviation for age.
(ii) Compute the Z-score for the first four measurements.
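The standardization asked for in 2 a) can be worked through directly. This is a minimal sketch that assumes the mean-absolute-deviation form of the z-score. Here m_f is the mean, s_f = (1/n) × Σ|x_i − m_f| is the mean absolute deviation, and z_i = (x_i − m_f) / s_f. Dividing by the standard deviation instead is also common and would give different z-scores. The function names are illustrative.

```python
def mean_absolute_deviation(values):
    """s_f = (1/n) * sum(|x_i - m_f|), where m_f is the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def z_scores(values, first_k):
    """z = (x - m_f) / s_f for the first `first_k` measurements."""
    m = sum(values) / len(values)
    s = mean_absolute_deviation(values)
    return [(x - m) / s for x in values[:first_k]]

ages = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]
print(sum(ages) / len(ages))                      # 33.0
print(mean_absolute_deviation(ages))              # 8.8
print([round(z, 2) for z in z_scores(ages, 4)])   # [-1.7, -1.25, -0.91, 1.02]
```

The mean is 330 / 10 = 33. The absolute deviations sum to 88, so the mean absolute deviation is 8.8. The first four z-scores are about -1.70, -1.25, -0.91 and 1.02.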
b) What is noisy data? Explain the binning methods for data smoothing.
“What is noise?” Noise is a random error or variance in a measured variable. Given a numeric
attribute such as price, how can we “smooth” out the data to remove the noise? Let’s look at the
following data smoothing techniques.
Binning: Binning methods smooth a sorted data value by consulting its “neighborhood,” that is, the
values around it. The sorted values are distributed into a number of “buckets,” or bins. Because
binning methods consult the neighborhood of values, they perform local smoothing.
In this example, the data for price are first sorted and then partitioned into equal-frequency bins of
size 3 (i.e., each bin contains three values).
In smoothing by bin means, each value in a bin is replaced by the mean value of the bin. For
example, the mean of the values 4, 8, and 15 in Bin 1 is 9.
Therefore, each original value in this bin is replaced by the value 9. Similarly, smoothing by bin
medians can be employed, in which each bin value is replaced by the bin median.
In smoothing by bin boundaries, the minimum and maximum values in a given bin are identified as
the bin boundaries. Each bin value is then replaced by the closest boundary value.
In general, the larger the width, the greater the effect of the smoothing. Alternatively, bins may be
equal width, where the interval range of values in each bin is constant. Binning is also used as a
discretization technique.
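The equal-frequency binning and the three smoothing rules described above can be sketched as follows. This minimal sketch makes three assumptions. The data are the nine sorted prices 4, 8, 15, 21, 21, 24, 25, 28, 34, from the usual textbook version of this example; the first bin matches the 4, 8, 15 quoted above. In smoothing by bin boundaries, a value equidistant from both boundaries goes to the lower one. The function names are hypothetical.

```python
from statistics import median

def equal_frequency_bins(sorted_values, size):
    """Split already-sorted values into consecutive bins of `size` values."""
    return [sorted_values[i:i + size] for i in range(0, len(sorted_values), size)]

def smooth_by_means(bins):
    """Replace every value in a bin by the bin mean."""
    return [[sum(b) / len(b)] * len(b) for b in bins]

def smooth_by_medians(bins):
    """Replace every value in a bin by the bin median."""
    return [[median(b)] * len(b) for b in bins]

def smooth_by_boundaries(bins):
    """Replace each value by the closer of the bin's min and max (ties -> min)."""
    smoothed = []
    for b in bins:
        lo, hi = min(b), max(b)
        smoothed.append([lo if x - lo <= hi - x else hi for x in b])
    return smoothed

prices = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])
bins = equal_frequency_bins(prices, 3)
print(bins)                        # [[4, 8, 15], [21, 21, 24], [25, 28, 34]]
print(smooth_by_means(bins))       # [[9.0, 9.0, 9.0], [22.0, 22.0, 22.0], [29.0, 29.0, 29.0]]
print(smooth_by_boundaries(bins))  # [[4, 4, 15], [21, 21, 24], [25, 25, 34]]
```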
Regression: Data smoothing can also be done by regression, a technique that conforms data values
to a function. Linear regression involves finding the “best” line to fit two attributes (or variables)
so that one attribute can be used to predict the other. Multiple linear regression is an extension of
linear regression, where more than two attributes are involved and the data are fit to a
multidimensional surface.
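Smoothing by simple linear regression can be sketched with the closed-form least-squares fit: slope = Sxy / Sxx and intercept = mean(y) − slope × mean(x). Replacing each y by its fitted value is one straightforward reading of "conforming data values to a function". The readings below are hypothetical, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def smooth_by_regression(xs, ys):
    """Replace each y by the value predicted from its x on the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]

# Hypothetical noisy readings lying roughly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3.2, 4.8, 7.1, 8.9, 11.0]
print(smooth_by_regression(xs, ys))
```

Multiple linear regression generalizes the same idea to several predictor attributes. It is usually solved with a matrix least-squares routine rather than these scalar formulas.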
Outlier analysis: Outliers may be detected by clustering, for example, where similar values are
organized into groups, or “clusters.” Intuitively, values that fall outside of the set of clusters may be
considered outliers.
Many data smoothing methods are also used for data discretization and data reduction. For
example, the binning techniques described before reduce the number of distinct values per attribute.
This acts as a form of data reduction for logic-based data mining methods, such as decision tree
induction, which repeatedly makes value comparisons on sorted data.
Concept hierarchies are a form of data discretization that can also be used for data smoothing. A
concept hierarchy for price, for example, may map real price values into inexpensive, moderately
priced, and expensive, thereby reducing the number of data values to be handled by the mining
process. Some methods of classification (e.g., neural networks) have built-in data smoothing
mechanisms.
3 Write the algorithm to discover frequent itemsets without candidate generation and explain it with
an example.
FP-growth is an algorithm that takes a radically different approach to discovering frequent itemsets.
It does not follow the generate-and-test paradigm of Apriori. Instead, it encodes the data set using a
compact data structure called an FP-tree and extracts frequent itemsets directly from this structure.
1 FP-Tree Representation
An FP-tree is a compressed representation of the input data. It is constructed by reading the data set
one transaction at a time and mapping each transaction onto a path in the FP-tree.
As different transactions can have several items in common, their paths may overlap. The more the
paths overlap with one another, the more compression we can achieve using the FP-tree structure.
If the size of the FP-tree is small enough to fit into main memory, this will allow us to extract
frequent itemsets directly from the structure in memory instead of making repeated passes over the
data stored on disk.
The figure below shows a data set that contains ten transactions and five items, together with the
structure of the FP-tree after each of the first three transactions is read. Each node in the tree
contains the label of an item along with a counter that shows the number of transactions mapped onto
the given path. Initially, the FP-tree contains only the root node, represented by the null symbol.
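The construction described above and the recursive mining step of FP-growth can be sketched in Python. The construction maps each transaction onto a path from a null root, with a counter on every node. The sketch rests on several assumptions. The ten transactions over items a to e are hypothetical stand-ins for the figure's data set. Items in a transaction are ordered by decreasing support, with ties broken alphabetically. A header list links all nodes of an item, so that its prefix paths (the conditional pattern base) can be collected. The names `FPNode`, `build_fp_tree` and `fp_growth` are illustrative.

```python
from collections import defaultdict

class FPNode:
    """One FP-tree node: an item label, a counter, and parent/child links."""
    def __init__(self, item, parent):
        self.item = item            # None for the root ("null") node
        self.count = 0
        self.parent = parent
        self.children = {}          # item -> FPNode

def build_fp_tree(transactions, min_support):
    """transactions: list of (items, count) pairs. Returns (header, frequent)."""
    # Pass 1: count the support of every item and keep the frequent ones.
    support = defaultdict(int)
    for items, n in transactions:
        for item in set(items):
            support[item] += n
    frequent = {i: s for i, s in support.items() if s >= min_support}
    # Pass 2: map each transaction onto a path below the null root.
    root = FPNode(None, None)
    header = defaultdict(list)      # item -> every node carrying that item
    for items, n in transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n        # overlapping paths just bump the counter
            node = child
    return header, frequent

def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) for every frequent itemset, without candidates."""
    header, frequent = build_fp_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = (item,) + suffix
        yield frozenset(itemset), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recursively mine the conditional FP-tree built from that base.
        yield from fp_growth(base, min_support, itemset)

# Hypothetical transactions over items a..e.
data = [("a", "b"), ("b", "c", "d"), ("a", "c", "d", "e"), ("a", "d", "e"),
        ("a", "b", "c"), ("a", "b", "c", "d"), ("a",), ("a", "b", "c"),
        ("a", "b", "d"), ("b", "c", "e")]
result = dict(fp_growth([(t, 1) for t in data], min_support=2))
print(result[frozenset("ab")])   # 5
print(result[frozenset("ade")])  # 2
```

The recursion never enumerates candidate itemsets. Each conditional tree contains only items that co-occur with the current suffix, so support counts come straight from node counters.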
Q.no | Question | Marks | CO | PO | PSO | Bloom’s Level | Marks | % of Marks
1 | What is correlation analysis? How can correlation analysis be used to find strong association rules that are also interesting? | 10 | CO3 | PO2 | - | L3 | 10/30 | 33.33%
2a | Explain in detail the characteristics of Naive Bayes classifiers. | 5 | CO4 | PO1, PO2 | - | L2 | 5/30 | 16.67%
2b | Explain in detail rule induction using a sequential covering algorithm. | 5 | CO4 | PO2 | - | L2 | 5/30 | 16.67%
3a | How can we conduct cluster analysis on high-dimensional data? Explain in detail. | 5 | CO5 | PO2 | - | L2 | 5/30 | 16.67%
3b | Explain clustering based on density distribution functions. | 5 | CO5 | PO1, PO2 | - | L2 | 5/30 | 16.67%
SCHEME OF EVALUATION
DATE: 18-01-2022 YEAR/BRANCH: III CSE
TIME: 01:00 PM TO 02:30 PM MAX MARKS: 30
In this way, the rules learned should be of high accuracy. The rules need not necessarily be of high
coverage. This is because we can have more than one rule for a class, so that different rules may
cover different tuples within the same class.
The process continues until the terminating condition is met, such as when there are no more training
tuples or the quality of a rule returned is below a user-specified threshold.
The Learn_One_Rule procedure finds the “best” rule for the current class, given the current set of
training tuples.
“How are rules learned?” Typically, rules are grown in a general-to-specific manner (see the figure
below). We can think of this as a beam search, where we start off with an empty rule and then
gradually keep appending attribute tests to it.
We append by adding the attribute test as a logical conjunct to the existing condition of the rule
antecedent.
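The sequential covering loop can be sketched as follows. It learns the best rule for the class, removes the tuples that rule covers, and stops when no tuples of the class remain or rule quality falls below a threshold. The sketch makes several assumptions. Tuples are dicts with a `class` key. Rule quality is plain accuracy on the covered tuples, since the text does not fix a quality measure. Learn_One_Rule keeps only the single best candidate at each step, a beam search of width 1. The weather-style tuples and all function names are hypothetical.

```python
def covers(rule, t):
    """A rule is a list of (attribute, value) tests joined by logical AND."""
    return all(t[a] == v for a, v in rule)

def rule_accuracy(rule, tuples, target):
    """Return (accuracy, coverage) of `rule` for class `target`."""
    covered = [t for t in tuples if covers(rule, t)]
    if not covered:
        return 0.0, 0
    correct = sum(1 for t in covered if t["class"] == target)
    return correct / len(covered), len(covered)

def learn_one_rule(tuples, target, attributes):
    """Grow one rule general-to-specific, keeping the best conjunct each step."""
    rule = []                        # empty antecedent covers every tuple
    best_acc, _ = rule_accuracy(rule, tuples, target)
    while best_acc < 1.0:
        used = {a for a, _ in rule}
        candidates = []
        for a in attributes:
            if a in used:
                continue
            for v in {t[a] for t in tuples}:
                acc, cov = rule_accuracy(rule + [(a, v)], tuples, target)
                if cov:
                    candidates.append((acc, cov, rule + [(a, v)]))
        if not candidates:
            break
        acc, _, cand = max(candidates, key=lambda c: (c[0], c[1]))
        if acc <= best_acc:
            break                    # no conjunct improves the rule
        rule, best_acc = cand, acc
    return rule, best_acc

def sequential_covering(tuples, target, attributes, min_accuracy=0.8):
    rules, remaining = [], list(tuples)
    while any(t["class"] == target for t in remaining):
        rule, acc = learn_one_rule(remaining, target, attributes)
        if not rule or acc < min_accuracy:
            break                    # rule quality below the user threshold
        rules.append(rule)
        remaining = [t for t in remaining if not covers(rule, t)]
    return rules

data = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "stay"},
    {"outlook": "rain", "windy": "no", "class": "play"},
    {"outlook": "rain", "windy": "yes", "class": "stay"},
    {"outlook": "overcast", "windy": "yes", "class": "play"},
]
print(sequential_covering(data, "play", ["outlook", "windy"]))
# [[('windy', 'no')], [('outlook', 'overcast')]]
```

A wider beam would keep the top k candidate rules at each step instead of one.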
• Clustering high-dimensional data is the search for clusters and the space in which they exist.
• There are two major kinds of methods:
• Subspace clustering approaches search for clusters existing in subspaces of the given
high-dimensional data space, where a subspace is defined using a subset of attributes in the full space.
• Dimensionality reduction approaches try to construct a much lower-dimensional space and search for
clusters in that space. Often, such a method constructs new dimensions by combining some
dimensions from the original data.
b) Explain Clustering Based on Density Distribution Functions
• Density-Based Clustering refers to unsupervised learning methods that identify distinctive
groups/clusters in the data, based on the idea that a cluster in a data space is a contiguous region of
high point density, separated from other such clusters by contiguous regions of low point density.
• Partitioning and hierarchical methods are designed to find spherical-shaped clusters. They have
difficulty finding clusters of arbitrary shape such as the “S” shape and oval clusters.
• Three well-known density-based algorithms are:
• Density-based spatial clustering of applications with noise (DBSCAN)
• Ordering Points to Identify the Clustering Structure (OPTICS)
• Clustering-Based on Density Distribution Functions (DENCLUE)
• DENCLUE was proposed by Hinneburg and Keim. It provides a compact mathematical description of
arbitrarily shaped clusters in high-dimensional data, and it works well for data sets with a large
amount of noise.
• Density is a measure that compares the amount of matter an object has to its volume. An object
with much matter in a given volume has high density; an object with little matter in the same
volume has low density.
• Density estimation is a core issue in density-based clustering methods. DENCLUE (DENsity-based
CLUstEring) is a clustering method based on a set of density distribution functions.
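• Formally, let $x_1, \ldots, x_n$ be an independent and identically distributed sample of a random
variable $f$. The kernel density approximation of the probability density function is
$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$,
where $K()$ is a kernel and $h$ is the bandwidth, which acts as a smoothing parameter.
• A frequently used kernel is a standard Gaussian function with a mean of 0 and a variance of 1:
$K\left(\frac{x - x_i}{h}\right) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x - x_i)^2}{2h^2}}$.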
• DENCLUE uses a Gaussian kernel to estimate density based on the given set of objects to be
clustered.
• A point $x^*$ is called a density attractor if it is a local maximum of the estimated density function.
• To avoid trivial local maximum points, DENCLUE uses a noise threshold, $\xi$, and only considers
those density attractors $x^*$ such that $\hat{f}_h(x^*) \ge \xi$.
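The kernel density estimate and the density-attractor test above can be sketched in one dimension. This simplified sketch uses the Gaussian kernel as written. It climbs from each point with a mean-shift-style fixed-point update, which is a swapped-in hill-climbing step; the original DENCLUE paper describes gradient-guided hill climbing. Attractors closer than a small tolerance, `merge_tol`, are treated as one cluster, whereas full DENCLUE also merges attractors connected by a path of density above ξ. A point whose attractor has density below ξ is labeled noise. The sample points, `h` and ξ are hypothetical values.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel: mean 0, variance 1."""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def density(x, data, h):
    """Kernel density estimate f_h(x) = 1/(n*h) * sum K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

def climb(x, data, h, tol=1e-6, max_iter=200):
    """Move x uphill to a local density maximum (mean-shift-style update)."""
    for _ in range(max_iter):
        weights = [gaussian_kernel((x - xi) / h) for xi in data]
        x_new = sum(w * xi for w, xi in zip(weights, data)) / sum(weights)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def denclue_1d(data, h, xi_threshold, merge_tol=1e-2):
    """Group points by density attractor; attractors below xi_threshold mean noise."""
    clusters, noise = {}, []
    for x in data:
        attractor = climb(x, data, h)
        if density(attractor, data, h) < xi_threshold:
            noise.append(x)
            continue
        # Reuse an existing attractor key if this one lies within merge_tol.
        key = next((k for k in clusters if abs(k - attractor) < merge_tol), attractor)
        clusters.setdefault(key, []).append(x)
    return clusters, noise

points = [0.8, 1.0, 1.2, 4.9, 5.0, 5.1, 10.0]   # hypothetical 1-D sample
clusters, noise = denclue_1d(points, h=0.5, xi_threshold=0.2)
print(noise)   # [10.0]
```

The isolated point at 10.0 is its own local maximum, but its estimated density (about 0.11) falls below ξ = 0.2. It is therefore reported as noise rather than as a third cluster.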
III B.TECH SEM-I, WEEKLY TEST–I EXAMINATION
DATE: 06-10-2021 YEAR/BRANCH: III CSE
TIME: 09:00 AM TO 10:00 AM MAX MARKS: 20
Q.no | Question | Marks | CO | PO | PSO | Bloom’s Level | Marks | % of Marks
1a | Compare and contrast OLAP and OLTP. (or) Define the terms OLTP and OLAP. | 5 | CO1 | PO2 | - | L3 | 5/20 | 25%
1b | What is a data cube? Give an example of a 2-D view and a 3-D data cube representation of the data. | 5 | CO1 | PO1, PO2 | - | L3 | 5/20 | 25%
2a | What is data warehousing? Explain the various operations on multidimensional data models. | 5 | CO1 | PO2 | - | L2 | 5/20 | 25%
2b | What is the main advantage of using Multidimensional OLAP (MOLAP) servers? | 5 | CO1 | PO1, PO2 | - | L2 | 5/20 | 25%
III B.TECH SEM-I, WEEKLY TEST–II EXAMINATION
DATE: 08-12-2021 YEAR/BRANCH: III CSE
TIME: 09:00 AM TO 10:00 AM MAX MARKS: 20
Q.no | Question | Marks | CO | PO | PSO | Bloom’s Level | Marks | % of Marks
1 | Explain in detail the mining of multilevel and multidimensional association rules. | 10 | CO3 | PO2 | - | L3 | 10/20 | 50%
2 | What is correlation analysis? How can correlation analysis be used to find strong association rules that are also interesting? | 10 | CO3 | PO2 | - | L3 | 10/20 | 50%
Students List
SNO | Roll No | MID-I: Q1 Q2 Q3 | MID-II: Q4 Q5 Q6
1 19HP1A0501 7 2 2 4 3 0 6
2 19HP1A0502 3 10 9 10 8.5 0 14
3 19HP1A0503 9 10 7 8 7 8 16
4 19HP1A0504 10 9 3 10 9.5 6.5 16
5 19HP1A0505 9 10 2 6 10 5.5 14
6 19HP1A0506 9 9 8 10 2.5 6 15
7 19HP1A0507 8 9 3 7 10 8.5 15
8 19HP1A0508 10 10 9 8 4.5 9 17
9 19HP1A0509 10 10 10 6 9.5 6 17
10 19HP1A0510 10 9 4 5 10 7.5 15
11 19HP1A0511 10 6 0 0 0 5 7
12 19HP1A0512 6 6 8 6.5 3.5 8.5 13
13 19HP1A0513 AB AB AB 7 10 9 9
14 19HP1A0514 10 10 10 10 10 8.5 20
15 19HP1A0515 AB AB AB AB AB AB 0
16 19HP1A0516 10 6 9 10 0 5.5 14
17 19HP1A0517 10 10 9 10 4.5 10 18
18 19HP1A0518 10 9 3 10 7 5 15
19 19HP1A0519 10 10 2 5 7 7.5 14
20 19HP1A0520 10 10 8 10 10 0 16
21 19HP1A0521 9 10 8 0 0 0 9
22 19HP1A0522 8 9 7 10 8.5 3 15
23 19HP1A0523 10 10 7 5 7.5 7 16
24 19HP1A0524 AB AB AB 0 0 7 2
25 19HP1A0525 9 6 7 6 6 10 15
26 19HP1A0526 10 10 8 10 10 7.5 19
27 19HP1A0527 10 10 10 10 8 10 19
28 19HP1A0528 9 10 8 7 5 9.5 16
29 19HP1A0529 9 10 8 10 10 9 19
30 19HP1A0530 10 10 9 9 10 8 19
31 19HP1A0531 10 10 0 10 5 4 13
32 19HP1A0532 AB AB AB 10 10 6 9
33 19HP1A0533 6 10 9 10 8.5 3 16
34 19HP1A0534 10 10 3 5 3.5 8 13
35 19HP1A0535 9 10 0 8 0 0 9
36 19HP1A0536 10 10 5 10 7 9 17
37 19HP1A0537 10 10 9 10 10 7.5 19
38 19HP1A0539 8 6 0 3 4 8.5 10
39 19HP1A0540 6 7 1 9 2.5 10 12
40 19HP1A0541 1 7 8 8.5 0 4 10
41 19HP1A0542 10 9 0 6 7 6.5 13
42 19HP1A0543 6 4 0 5 2 7 8
43 19HP1A0544 2 5 0 0 0 6.5 5
44 19HP1A0545 3 5 0 4 0 3.5 5
45 19HP1A0546 8 6 0 4 0 2.5 7
46 19HP1A0547 10 6 0 8.5 5 7 12
47 19HP1A0549 6 9 4 AB AB AB 6
48 19HP1A0550 8 7 8 8 3 8 14
49 19HP1A0551 6 10 8 5 5 9 14
50 19HP1A0552 4 9 0 3.5 2 4 8
51 19HP1A0553 9 6 7 8 7.5 6.5 15
52 19HP1A0554 5 9 7 4 2 1 9
53 19HP1A0555 10 9 9 0 10 9 16
54 19HP1A0556 8 5 7 8 5 8.5 14
55 19HP1A0557 10 6 0 8 1 7 11
56 19HP1A0558 8 10 9 8 4.5 9 16
57 19HP1A0559 10 9 8 10 0 8.5 15
58 19HP1A0560 8 8 3 8 3.5 10 14
59 19HP1A0561 10 9 9 10 9 9 19
60 19HP1A0563 8 5 7 6 0 6.5 11
61 19HP1A0564 9 10 8 6 5.5 7 15
62 19HP1A0565 9 9 1 10 0 9 13
63 20HP5A0501 10 9 0 8.5 0 10 13
64 20HP5A0502 10 7 0 7 9 10 14
65 20HP5A0503 10 10 8 7 9 8.5 18
66 20HP5A0504 10 7 5 9 9.5 6 16
67 20HP5A0505 10 9 2 8 7 4 13
68 20HP5A0506 9 9 4 7.5 10 2 14
SNO | Roll No | MID-I: Q1 Q2 Q3 | MID-II: Q4 Q5 Q6
1 19HP1A0566 5 9 8 0 2.5 5 10
2 19HP1A0567 10 10 9 10 10 9 19
3 19HP1A0568 9 10 9 8 9 5 17
4 19HP1A0569 9 10 7 10 5 7 16
5 19HP1A0570 9 10 0 10 5 1 12
6 19HP1A0571 8 10 1 6.5 4.5 7.5 13
7 19HP1A0572 2 5 0 2 1.5 3 5
8 19HP1A0573 8 9 8 9.5 5 10 17
9 19HP1A0574 10 10 9 10 9 8.5 19
10 19HP1A0575 9 10 0 0 4 0 8
11 19HP1A0577 10 9 8 10 9 10 19
12 19HP1A0578 9 10 7 10 10 9 18
13 19HP1A0579 3 10 9 9 0 10 14
14 19HP1A0580 9 5 3 9 7 9 14
15 19HP1A0581 9 3 1 4 8 6.5 11
16 19HP1A0582 10 3 0 10 8.5 3 12
17 19HP1A0583 10 10 9 10 5 10 18
18 19HP1A0584 10 10 9 10 9 9.5 19
19 19HP1A0585 9 10 9 8 4 3.5 15
20 19HP1A0586 10 8 5 10 4.5 10 16
21 19HP1A0587 4 5 2 4 2 6.5 8
22 19HP1A0588 10 10 9 10 5 9.5 18
23 19HP1A0589 10 8 2 10 10 0 13
24 19HP1A0590 10 10 9 10 9 9 19
25 19HP1A0591 9 8 7 5 0 7.5 12
26 19HP1A0592 10 5 9 10 1 4.5 13
27 19HP1A0593 10 10 10 10 10 9 20
28 19HP1A0594 10 10 9 8 7 7 17
29 19HP1A0596 10 10 9 10 10 9 19
30 19HP1A0597 9 7 4 8.5 9 7 15
31 19HP1A0598 8 7 0 1 1 0 6
32 19HP1A0599 9 10 0 7 5 10 14
33 19HP1A05A0 10 10 8 7 7 9 17
34 19HP1A05A1 6 6 5 10 6.5 5 13
35 19HP1A05A2 8 1 8 7 2.5 0 9
36 19HP1A05A3 10 10 10 10 9 9.5 20
37 19HP1A05A4 10 5 7 10 0 0 11
38 19HP1A05A5 2 3 0 4 3 0 4
39 19HP1A05A6 10 10 9 10 8 10 19
40 19HP1A05A7 10 9 0 10 10 0 13
41 19HP1A05A8 10 0 0 5 1 7 8
42 19HP1A05A9 10 10 6 9 10 7 17
43 19HP1A05B0 7 2 1 7 0 4 7
44 19HP1A05B1 10 10 10 9 6.5 10 19
45 19HP1A05B2 10 10 9 8 9 8 18
46 19HP1A05B3 8 8 0 5 3 5 10
47 19HP1A05B4 10 8 3 6 0 10 12
48 19HP1A05B5 10 10 5 8 4 8 15
49 19HP1A05B6 10 9 8 8 9 9 18
50 19HP1A05B7 10 10 10 10 9 9 19
51 19HP1A05B8 6 10 9 7.5 5 9 16
52 19HP1A05B9 9 1 0 4.5 4 1 7
53 19HP1A05C0 10 9 6 0 10 0 12
54 19HP1A05C1 10 5 1 0 5 0 7
55 19HP1A05C2 10 8 0 5.5 6 10 13
56 19HP1A05C3 9 6 1 5 7 5 11
57 19HP1A05C4 10 4 7 9 4 9 14
58 19HP1A05C5 9 1 0 3.5 1 3 6
59 19HP1A05C6 10 10 8 9 10 9 19
60 19HP1A05C7 10 10 9 5 4.5 9 16
61 20HP5A0507 10 0 3 6 4 9 11
62 20HP5A0508 10 0 6 6 5 10 12
63 20HP5A0509 9 4 0 7 4 2 9
64 20HP5A0510 9 3 1 0 6.5 4 8
65 20HP5A0511 10 10 3 9 10 8.5 17
66 20HP5A0512 10 10 2 9 10 9 17
67 20HP5A0513 10 4 1 7 6.5 5 11
COURSE ASSESSMENT – STUDENT (Course-End Survey)
Please answer all the questions in this form. Your response is very important for the continuous
quality improvement of this course. These responses will be treated as confidential.
A.Y: 2021-2022 Year: III B.Tech Semester: I Course Code: R1931051
Course Title: DATA WAREHOUSING AND DATA MINING
Indicate your rating for each of the indicators in the following table:
3: Good 2: Fair 1: Poor
Course Outcome | Indicator | Rating
1 | Able to understand data warehouse concepts, architecture, business analysis and tools |
2 | Able to understand data pre-processing and data visualization techniques |
3 | Able to study algorithms for finding hidden and interesting patterns in data |
4 | Able to understand and apply various classification techniques using tools |
5 | Able to understand and apply various clustering techniques using tools |
SUGGESTION:
How can this course be modified to improve its quality and enhance the learning process?
(Write your suggestions below)
-------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------
COURSE ASSESSMENT – STUDENT (Course–end-Survey)
Please answer all the questions in this form. Your response is very important for the continuous
quality improvement of this course. These responses will be treated as confidential.
A.Y: 2021-2022 Year: III B.Tech Semester: I Course Code: R1931051
Course Title: DATA WAREHOUSING AND DATA MINING
Indicate your rating for each of the indicators in the following table:
3: Good 2: Fair 1: Poor
Course Indicator Rating
outcome
1 Able to understand data warehouse concepts, architecture,
business analysis and tools
2 Able to understand data pre-processing and data visualization
techniques
3 Able to study algorithms for finding hidden and interesting
patterns in data
4 Able to understand and apply various classification techniques
using tools
5 Able to understand and apply various clustering techniques
using tools
SUGGESTION:
How this course can be modified to improve the quality and enhance the learning process?
(Write your suggestions below)
-------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------
COURSE ASSESSMENT – STUDENT (Course–end-Survey)
Please answer all the questions in this form. Your response is very important for the continuous
quality improvement of this course. These responses will be treated as confidential.
A.Y: 2021-2022 Year: III B.Tech Semester: I Course Code: R1931051
Course Title: DATA WAREHOUSING AND DATA MINING
Indicate your rating for each of the indicators in the following table:
3: Good 2: Fair 1: Poor
Course Indicator Rating
outcome
1 Able to understand data warehouse concepts, architecture,
business analysis and tools
2 Able to understand data pre-processing and data visualization
techniques
3 Able to study algorithms for finding hidden and interesting
patterns in data
4 Able to understand and apply various classification techniques
using tools
5 Able to understand and apply various clustering techniques
using tools
SUGGESTION:
How this course can be modified to improve the quality and enhance the learning process?
(Write your suggestions below)
-------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------
COURSE FILE