COURSEFILE

This document provides a lesson plan for a course on Data Warehousing and Data Mining. It includes the course objectives, outcomes, topics to be covered over 14 weeks divided into 5 units, and the faculty details. The units cover concepts related to data warehousing, data pre-processing, frequent pattern mining, classification techniques and clustering. Each topic is allotted classes and the overall number of classes required is 70. Textbooks and references for the course are also listed.

Uploaded by

siva 278

COURSE FILE

Subject: DATA WAREHOUSING AND DATA MINING

Academic Year: 2021-2022

Name of the Faculty: MD ARSHA SULTANA

Department: CSE

Batch & Year: 19HP BATCH & III YEAR SEM-I

Course File Index

S.No. Item Page No. (From – To)

Vision, Mission & PEO
1 Syllabus
2 Lesson Plan
3 Timetable
4 Academic Calendar
5 List of COs
6 List of CO-PO Mapping
7 Evaluation and CO Assessment Tools
8 Internal (CIE-1) Test Paper & Solution and Schema
9 Internal (CIE-2) Test Paper & Solution and Schema
10 Quiz paper and Solution. (if any)
11 Student List
12 Internal (CIE) Marks
13 External (SEE) Marks
14 CO Attainment Sheet
15 Course End Survey
16 Sample Answer Booklets/Lab Records/Reports
17 Course Materials (PPTs, class note (few pages))
18 Faculty Comments/Concluding Report
Vision
To equip the students with adequate technical knowledge that promotes them to contribute
their expertise in the arena of academics, industry and society.

Mission
M1: To produce the best quality computer science professionals through intellectual inputs.
M2: To impart quality training, hands-on experience and value education to solve the real-world
problems.
M3: To provide an environment that values and encourages knowledge acquisition.

Program Educational Objectives

PEO-I: To make them apply their computer science knowledge and problem solving skills to be
successful computer professionals in diverse career paths including supportive and leadership roles on
multidisciplinary teams.
PEO-II: To make graduates change their ability to master the latest areas of computer science.
PEO-III: Ability to design and develop software applications to meet societal needs.

Program Specific Outcomes

PSO-I: Able to develop code on multiple platforms meeting the needs of IT industry.
PSO-II: Able to analyze & apply the optimization techniques for the enhancement of application.
LESSON PLAN

Branch & Section : III B.Tech-I Sem Regulation: R19

Subject : DATA WAREHOUSING AND DATA MINING Academic Year: 2021-2022
Name of the Faculty : MD ARSHA SULTANA

COURSE OBJECTIVES
• To understand data warehouse concepts, architecture, business analysis and tools
• To understand data pre-processing and data visualization techniques
• To study algorithms for finding hidden and interesting patterns in data
• To understand and apply various classification and clustering techniques using tools
COURSE OUTCOMES
At the end of the course, the students will be able to:
• Design a data warehouse system and perform business analysis with OLAP tools
• Apply suitable pre-processing and visualization techniques for data analysis
• Apply frequent pattern and association rule mining techniques for data analysis
• Apply appropriate classification techniques for data analysis
• Apply appropriate clustering techniques for data analysis

Topic No | Date | Name of the Concept | No. of Classes Required

UNIT-I
1 | 01/10/2021 | Data Warehousing, Business Analysis and On-Line Analytical Processing (OLAP): Basic Concepts | 1
2 | 02/10/2021 | Data Warehousing Components | 2
3 | 05/10/2021 | Building a Data Warehouse | 1
4 | 06/10/2021 | Database Architectures for Parallel Processing, Parallel DBMS Vendors | 3
5 | 09/10/2021 | Multidimensional Data Model, Data Warehouse Schemas for Decision Support | 3
6 | 14/10/2021 | Concept Hierarchies, Characteristics of OLAP Systems | 2
7 | 18/10/2021 | Typical OLAP Operations, OLAP and OLTP | 2
Total number of hours: 14

UNIT-II
1 | 21/10/2021 | Introduction: Introduction to Data Mining Systems | 1
2 | 22/10/2021 | Knowledge Discovery Process | 1
3 | 23/10/2021 | Data Mining Techniques, Issues, Applications | 3
4 | 27/10/2021 | Data Objects and Attribute Types | 2
5 | 29/10/2021 | Basic Statistical Descriptions of Data | 1
6 | 30/10/2021 | Data Visualization, Measuring Data Similarity and Dissimilarity | 3
7 | 03/11/2021 | Data Pre-processing: Data Cleaning | 2
8 | 06/11/2021 | Data Integration, Data Reduction | 2
9 | 09/11/2021 | Data Transformation and Data Discretization | 2
Total number of hours: 17

UNIT-III
1 | 10/11/2021 | Frequent Pattern Analysis: Mining Frequent Patterns | 1
2 | 11/11/2021 | Associations and Correlations | 1
3 | 12/11/2021 | Mining Methods | 3
4 | 17/11/2021 | Classification using Frequent Patterns | 2
5 | 29/11/2021 | Pattern Evaluation Method | 1
6 | 01/12/2021 | Pattern Mining in Multilevel, Multidimensional Space | 2
7 | 04/12/2021 | Constraint Based Frequent Pattern Mining | 2
Total number of hours: 12

UNIT-IV
1 | 07/12/2021 | Classification: Decision Tree Induction | 2
2 | 09/12/2021 | Bayes’ Theorem | 1
3 | 10/12/2021 | Naïve Bayesian Classification | 1
4 | 13/12/2021 | Bayesian Belief Networks | 2
5 | 15/12/2021 | Rule Based Classification | 1
6 | 16/12/2021 | Classification by Back Propagation | 1
7 | 17/12/2021 | Support Vector Machines | 1
8 | 18/12/2021 | Lazy Learners | 1
9 | 20/12/2021 | Model Evaluation and Selection | 2
10 | 22/12/2021 | Techniques to Improve Classification Accuracy | 2
Total number of hours: 14

UNIT-V
1 | 27/12/2021 | Clustering: Clustering Techniques, Cluster Analysis | 2
2 | 29/12/2021 | Partitioning Methods | 2
3 | 31/12/2021 | Hierarchical Methods | 1
4 | 03/01/2022 | Density Based Methods | 2
5 | 05/01/2022 | Grid Based Methods | 1
6 | 06/01/2022 | Evaluation of Clustering | 1
7 | 07/01/2022 | Clustering High Dimensional Data, Clustering with Constraints | 2
8 | 11/01/2022 | Outlier Analysis, Outlier Detection Methods | 2
Total number of hours: 13

OVERALL NUMBER OF CLASSES REQUIRED: 70

TEXT BOOKS:
1. Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Third Edition, Elsevier, 2012.
2. Pang-Ning Tan, Michael Steinbach and Vipin Kumar, “Introduction to Data Mining”, Pearson, 2016.
REFERENCES:
1) Alex Berson and Stephen J. Smith, “Data Warehousing, Data Mining & OLAP”, Tata McGraw-Hill Edition, 35th Reprint, 2016.
2) K.P. Soman, Shyam Diwakar and V. Ajay, “Insight into Data Mining: Theory and Practice”, Eastern Economy Edition, Prentice Hall of India, 2006.
3) Ian H. Witten and Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques”, Elsevier, Second Edition.
III CSE-1 SEM-1 TIME TABLE FOR THE ACADEMIC YEAR 2021-22

DAY / PERIOD: 1 2 3 4 5 6 7
Time slots: 09:00 am – 10:00 am, 10:00 am – 10:55 am, 10:55 am – 11:50 am, 11:50 am – 12:45 pm, 12:45 pm – 01:40 pm, 01:40 pm – 02:30 pm, 02:30 pm – 03:20 pm, 03:20 pm – 04:10 pm

MON  CN  CD  STM  AI  REASONING  DWDM  CD
TUE  AI  STM  DWDM  ES  CN  AI  CD
WED  DWDM  CN  STM  --- CODING ---  AI  REASONING
THU  --- CN LAB (LAB4) ---  CD  CN  AI  DWDM
FRI  REASONING  --- AITT LAB (LAB3) ---  --- CODING ---  STM
SAT  --- DM LAB (LAB3) ---  ES  REASONING  DWDM  CN

SUBJECT NAME FACULTY NAME

DATA WAREHOUSING AND DATA MINING Ms. Md. ARSHA SULTANA

COMPUTER NETWORKS Mr. B. V. SATISH BABU

COMPILER DESIGN Mr.L.V.RAMESH

ARTIFICIAL INTELLIGENCE Mr.K. VENKATESWARA RAO

SOFTWARE TESTING METHODOLOGIES Dr. A. SRINIVAS RAO

COMPUTER NETWORKS LAB Mr. B. V. SATISH BABU

Ms. Md. ARSHA SULTANA
AI TOOLS & TECHNIQUES LAB Mr. K. VENKATESWARA RAO
Mr.L.V.RAMESH
DATA MINING LAB Ms. Md. ARSHA SULTANA
Dr. A. SRINIVAS RAO
EMPLOYABILITY SKILLS -II Mr.SANTHANAM PILLAI

CODING Mr.ANIL KUMAR

REASONING Mr.PRASANNA

CO-ORDINATOR HOD PRINCIPAL

III CSE-2 SEM-1 TIME TABLE FOR THE ACADEMIC YEAR 2021-22

DAY / PERIOD: 1 2 3 4 5 6 7
Time slots: 09:00 am – 10:00 am, 10:00 am – 10:55 am, 10:55 am – 11:50 am, 11:50 am – 12:45 pm, 12:45 pm – 01:40 pm, 01:40 pm – 02:30 pm, 02:30 pm – 03:20 pm, 03:20 pm – 04:10 pm
LUNCH: a single break column shared by all days

MON  --- DM LAB (LAB3) ---  CN  CD  AI  CN
TUE  --- CN LAB (LAB4) ---  DWDM  REASONING  STM  DWDM
WED  --- AITT LAB (LAB3) ---  CN  STM  CD  ES
THU  DWDM  STM  CD  --- CODING ---  STM  REASONING
FRI  AI  CD  CN  DWDM  REASONING  DWDM  ES
SAT  --- CODING ---  REASONING  AI  CN  AI  CD

SUBJECT NAME FACULTY NAME

DATA WAREHOUSING AND DATA MINING Ms. Md. ARSHA SULTANA

COMPUTER NETWORKS Mr. B. V. SATISH BABU

COMPILER DESIGN Mr.L.V.RAMESH

ARTIFICIAL INTELLIGENCE Mr.K. VENKATESWARA RAO

SOFTWARE TESTING METHODOLOGIES Dr. A. SRINIVAS RAO

COMPUTER NETWORKS LAB Mr. B. V. SATISH BABU, Mrs. M. MOHANA DEEPTHI

AI TOOLS & TECHNIQUES LAB Mr. K. VENKATESWARA RAO, Mr. L. V. RAMESH

DATA MINING LAB Ms. Md. ARSHA SULTANA, Mr. K. SIVA RAMA KRISHNA

EMPLOYABILITY SKILLS -II Mr.SANTHANAM PILLAI

CODING Mr.ANIL KUMAR

REASONING Mr.PRASANNA

CO-ORDINATOR HOD PRINCIPAL

LIST OF COs

DATA WAREHOUSING AND DATA MINING

CO1 Able to understand data warehouse concepts, architecture, business analysis and tools
CO2 Able to understand data pre-processing and data visualization techniques
CO3 Able to study algorithms for finding hidden and interesting patterns in data
CO4 Able to understand and apply various classification techniques using tools
CO5 Able to understand and apply various clustering techniques using tools

DATA MINING LAB

CO1 Able to understand the mathematical basics quickly and cover each and every condition
of data mining in order to prepare for real-world problems
CO2 The various classes of algorithms will be covered to give a foundation to further apply
knowledge to dive deeper into the different flavors of algorithms
CO3 Students should be aware of the packages and libraries of R and be familiar with the
functions used in R for visualization
CO4 To enable students to use R to conduct analytics on large real-life datasets
CO5 To familiarize students with how various statistics (mean, median, etc.) and data can be
collected for data exploration in R
List of CO-PO Mapping

COURSE OUTCOME PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
DATA WAREHOUSING AND DATA MINING
CO1 Able to understand data warehouse concepts, architecture, business analysis and tools 2 2
CO2 Able to understand data pre-processing and data visualization techniques 2 2
CO3 Able to study algorithms for finding hidden and interesting patterns in data 3 2
CO4 Able to understand and apply various classification techniques using tools 3 2
CO5 Able to understand and apply various clustering techniques using tools 2 2
Average 2.40 2.00

COURSE OUTCOME PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2
DATA MINING LAB
CO1 Able to understand the
mathematical basics quickly
and covers each and every
3 3 3 2 2 1 1
condition of data mining in
order to prepare for real-
world problems
CO2 The various classes of 2
algorithms will be covered to
give a foundation to further
3 3 3 1 1
apply knowledge to dive
deeper into the different
flavors of algorithms
CO3 Students should aware of 2
packages and libraries of R
and also familiar with
2 3 3 1 1
functions used in R for
visualization
CO4 To enable students to use R to 2
conduct analytics on large
real life datasets
3 3 3 2 1 1
CO5 To familiarize students with 2
how various statistics like
mean median etc and data can
3 3 3 2 1 1
be collected for data
exploration in R
2.80 3.00 3.00 2.00 2.00 1.00 1.00
SUB CODE: R1931051
Columns: S.No | Reg No | MID 1: DES (10M), ONL (10M), A1 (5M), A2 (5M), A3 (5M), AVG (5M), MID1 TOTAL | MID 2: DES (10M), ONL (10M), A1 (5M), A2 (5M), AVG (5M), MID2 TOTAL | Max (0.8) | Min (0.2) | Internal
1 19HP1A0501 4 4 5 5 5 5 13 3 4 5 5 5 12 10 2 12
2 19HP1A0502 8 3 5 5 5 5 16 7 4 5 5 5 16 13 3 16
3 19HP1A0503 9 AB 5 5 5 5 14 8 3 5 5 5 16 13 3 16
4 19HP1A0504 8 5 5 5 5 5 18 9 4 5 5 5 18 14 4 18
5 19HP1A0505 7 5 5 5 5 5 17 8 AB 5 5 5 13 14 3 17
6 19HP1A0506 9 7 5 5 5 5 21 7 5 5 5 5 17 17 3 20
7 19HP1A0507 7 2 5 5 5 5 14 9 2 5 5 5 16 13 3 16
8 19HP1A0508 10 5 5 5 5 5 20 8 5 5 5 5 18 16 4 20
9 19HP1A0509 10 6 5 5 5 5 21 8 5 5 5 5 18 17 4 21
10 19HP1A0510 8 6 5 5 5 5 19 8 5 5 5 5 18 15 4 19
11 19HP1A0511 6 5 5 5 5 5 16 2 2 5 5 5 9 13 2 15
12 19HP1A0512 7 5 5 5 5 5 17 7 5 5 5 5 17 14 3 17
13 19HP1A0513 0 0 5 5 5 5 5 9 3 5 5 5 17 14 1 15
14 19HP1A0514 10 7 5 5 5 5 22 10 2 5 5 5 17 18 3 21
15 19HP1A0515 AB AB 0 0 0 0 0 AB AB 0 0 0 0 0 0 0
16 19HP1A0516 9 6 5 5 5 5 20 6 5 5 5 5 16 16 3 19
17 19HP1A0517 10 6 5 5 5 5 21 9 5 5 5 5 19 17 4 21
18 19HP1A0518 8 4 5 5 5 5 17 8 4 5 5 5 17 14 3 17
19 19HP1A0519 8 3 5 5 5 5 16 7 5 5 5 5 17 14 3 17
20 19HP1A0520 10 7 5 5 5 5 22 7 5 5 5 5 17 18 3 21
21 19HP1A0521 9 6 5 5 5 5 20 0 3 5 5 5 8 16 2 18
22 19HP1A0522 8 6 5 5 5 5 19 8 7 5 5 5 20 16 4 20
23 19HP1A0523 9 5 5 5 5 5 19 7 4 5 5 5 16 15 3 18
24 19HP1A0524 AB AB 5 5 5 5 5 3 5 5 5 5 13 10 1 11
25 19HP1A0525 8 4 5 5 5 5 17 8 4 5 5 5 17 14 3 17
26 19HP1A0526 10 5 5 5 5 5 20 10 3 5 5 5 18 16 4 20
27 19HP1A0527 10 6 5 5 5 5 21 10 4 5 5 5 19 17 4 21
28 19HP1A0528 9 6 5 5 5 5 20 8 3 5 5 5 16 16 3 19
29 19HP1A0529 9 5 5 5 5 5 19 10 6 5 5 5 21 17 4 21
30 19HP1A0530 10 5 5 5 5 5 20 9 AB 5 5 5 14 16 3 19
31 19HP1A0531 7 4 5 5 5 5 16 7 AB 5 5 5 12 13 2 15
32 19HP1A0532 AB AB 5 5 5 5 5 9 2 5 5 5 16 13 1 14
33 19HP1A0533 9 3 5 5 5 5 17 8 5 5 5 5 18 14 3 17
34 19HP1A0534 8 6 5 5 5 5 19 6 4 5 5 5 15 15 3 18
35 19HP1A0535 7 4 5 5 5 5 16 3 3 5 5 5 11 13 2 15
36 19HP1A0536 9 5 5 5 5 5 19 9 6 5 5 5 20 16 4 20
37 19HP1A0537 10 5 5 5 5 5 20 10 6 5 5 5 21 17 4 21
38 19HP1A0539 5 5 5 5 5 5 15 6 8 5 5 5 19 15 3 18
39 19HP1A0540 5 6 5 5 5 5 16 8 7 5 5 5 20 16 3 19
40 19HP1A0541 6 5 5 5 5 5 16 5 6 5 5 5 16 13 3 16
41 19HP1A0542 7 5 5 5 5 5 17 7 5 5 5 5 17 14 3 17
42 19HP1A0543 4 6 5 5 5 5 15 5 4 5 5 5 14 12 3 15
43 19HP1A0544 3 3 5 5 5 5 11 3 2 5 5 5 10 9 2 11
44 19HP1A0545 3 5 5 5 5 5 13 3 6 5 5 5 14 11 3 14
45 19HP1A0546 5 6 5 5 5 5 16 3 4 5 5 5 12 13 2 15
46 19HP1A0547 6 4 5 5 5 5 15 7 6 5 5 5 18 14 3 17
47 19HP1A0549 7 6 5 5 5 5 18 AB AB 5 5 5 5 14 1 15
48 19HP1A0550 8 7 5 5 5 5 20 7 6 5 5 5 18 16 4 20
49 19HP1A0551 8 4 5 5 5 5 17 7 5 5 5 5 17 14 3 17
50 19HP1A0552 5 3 5 5 5 5 13 4 2 5 5 5 11 10 2 12
51 19HP1A0553 8 6 5 5 5 5 19 8 5 5 5 5 18 15 4 19
52 19HP1A0554 7 2 5 5 5 5 14 3 4 5 5 5 12 11 2 13
53 19HP1A0555 10 4 5 5 5 5 19 7 AB 5 5 5 12 15 2 17
54 19HP1A0556 7 5 5 5 5 5 17 8 5 5 5 5 18 14 3 17
55 19HP1A0557 6 6 5 5 5 5 17 6 4 5 5 5 15 14 3 17
56 19HP1A0558 9 7 5 5 5 5 21 8 5 5 5 5 18 17 4 21
57 19HP1A0559 9 7 5 5 5 5 21 7 6 5 5 5 18 17 4 21
58 19HP1A0560 7 5 5 5 5 5 17 8 4 5 5 5 17 14 3 17
59 19HP1A0561 10 6 5 5 5 5 21 10 6 5 5 5 21 17 4 21
60 19HP1A0563 7 3 5 5 5 5 15 5 4 5 5 5 14 12 3 15
61 19HP1A0564 9 4 5 5 5 5 18 7 7 5 5 5 19 15 4 19
62 19HP1A0565 7 4 5 5 5 5 16 7 5 5 5 5 17 14 3 17
63 20HP5A0501 7 6 5 5 5 5 18 7 3 5 5 5 15 14 3 17
64 20HP5A0502 6 4 5 5 5 5 15 9 5 5 5 5 19 15 3 18
65 20HP5A0503 10 6 5 5 5 5 21 9 7 5 5 5 21 17 4 21
66 20HP5A0504 8 4 5 5 5 5 17 9 5 5 5 5 19 15 3 18
67 20HP5A0505 7 3 5 5 5 5 15 7 3 5 5 5 15 12 3 15
68 20HP5A0506 8 6 5 5 5 5 19 7 3 5 5 5 15 15 3 18
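
The Max (0.8), Min (0.2) and Internal columns above are consistent with a weighted sum of the two mid totals: 80% of the higher total plus 20% of the lower one, each rounded to a whole mark. The sheet does not state this rule, so the Python sketch below is an assumption inferred from the rows, and `internal_marks` is a hypothetical helper name.

```python
def internal_marks(mid1_total: int, mid2_total: int) -> int:
    """Assumed rule: 80% of the better mid total plus 20% of the other."""
    best = max(mid1_total, mid2_total)
    worst = min(mid1_total, mid2_total)
    # Each weighted part is rounded to a whole mark before adding,
    # which matches what the Max and Min columns appear to show.
    return round(0.8 * best) + round(0.2 * worst)


# MID1 and MID2 totals copied from rows 1, 6 and 13 of the sheet above.
for mid1, mid2 in [(13, 12), (21, 17), (5, 17)]:
    print(mid1, mid2, internal_marks(mid1, mid2))
```

A row whose printed Internal value differs from this result would be worth checking against the original mark sheet rather than treated as an error in the rule.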

SUB CODE: R1931051
Columns: S.No | Reg No | MID 1: DES (10M), ONL (10M), A1 (5M), A2 (5M), A3 (5M), AVG (5M), MID1 TOTAL | MID 2: DES (10M), ONL (10M), A1 (5M), A2 (5M), AVG (5M), MID2 TOTAL | Max (0.8) | Min (0.2) | Internal
1 19HP1A0566 8 6 5 5 5 5 19 3 4 5 5 5 12 15 2 17
2 19HP1A0567 10 7 5 5 5 5 22 10 7 5 5 5 22 18 4 22
3 19HP1A0568 10 6 5 5 5 5 21 8 4 5 5 5 17 17 3 20
4 19HP1A0569 9 6 5 5 5 5 20 8 3 5 5 5 16 16 3 19
5 19HP1A0570 7 7 5 5 5 5 19 6 3 5 5 5 14 15 3 18
6 19HP1A0571 7 6 5 5 5 5 18 7 2 5 5 5 14 14 3 17
7 19HP1A0572 3 3 5 5 5 5 11 3 4 5 5 5 12 10 2 12
8 19HP1A0573 9 5 5 5 5 5 19 9 4 5 5 5 18 15 4 19
9 19HP1A0574 10 6 5 5 5 5 21 10 6 5 5 5 21 17 4 21
10 19HP1A0575 7 7 5 5 5 5 19 2 3 5 5 5 10 15 2 17
11 19HP1A0577 9 6 5 5 5 5 20 10 7 5 5 5 22 18 4 22
12 19HP1A0578 9 6 5 5 5 5 20 10 7 5 5 5 22 18 4 22
13 19HP1A0579 8 4 5 5 5 5 17 7 5 5 5 5 17 14 3 17
14 19HP1A0580 6 7 5 5 5 5 18 9 5 5 5 5 19 15 4 19
15 19HP1A0581 5 6 0 0 0 0 11 7 3 0 0 0 10 13 3 16
16 19HP1A0582 5 3 5 5 5 5 13 8 4 5 5 5 17 14 3 17
17 19HP1A0583 10 6 5 5 5 5 21 9 6 5 5 5 20 17 4 21
18 19HP1A0584 10 7 5 5 5 5 22 10 5 5 5 5 20 18 4 22
19 19HP1A0585 10 4 5 5 5 5 19 6 6 5 5 5 17 15 3 18
20 19HP1A0586 8 7 5 5 5 5 20 9 6 5 5 5 20 16 4 20
21 19HP1A0587 4 3 5 5 5 5 12 5 1 5 5 5 11 10 2 12
22 19HP1A0588 10 7 5 5 5 5 22 9 3 5 5 5 17 18 3 21
23 19HP1A0589 7 6 5 5 5 5 18 7 5 5 5 5 17 14 3 17
24 19HP1A0590 10 6 5 5 5 5 21 10 6 5 5 5 21 17 4 21
25 19HP1A0591 8 6 5 5 5 5 19 5 3 5 5 5 13 15 3 18
26 19HP1A0592 8 6 5 5 5 5 19 6 3 5 5 5 14 15 3 18
27 19HP1A0593 10 5 5 5 5 5 20 10 5 5 5 5 20 16 4 20
28 19HP1A0594 10 5 5 5 5 5 20 8 5 5 5 5 18 16 4 20
29 19HP1A0596 10 6 5 5 5 5 21 10 5 5 5 5 20 17 4 21
30 19HP1A0597 7 5 5 5 5 5 17 9 3 5 5 5 17 14 3 17
31 19HP1A0598 5 2 5 5 5 5 12 1 4 5 5 5 10 10 2 12
32 19HP1A0599 7 6 5 5 5 5 18 8 4 5 5 5 17 14 3 17
33 19HP1A05A0 10 6 5 5 5 5 21 8 4 5 5 5 17 17 3 20
34 19HP1A05A1 6 6 5 5 5 5 17 8 4 5 5 5 17 14 3 17
35 19HP1A05A2 6 7 5 5 5 5 18 4 4 5 5 5 13 14 3 17
36 19HP1A05A3 10 6 5 5 5 5 21 10 6 5 5 5 21 17 4 21
37 19HP1A05A4 8 6 5 5 5 5 19 4 5 5 5 5 14 15 3 18
38 19HP1A05A5 2 6 5 5 5 5 13 3 2 5 5 5 10 10 2 12
39 19HP1A05A6 10 5 5 5 5 5 20 10 4 5 5 5 19 16 4 20
40 19HP1A05A7 7 7 5 5 5 5 19 7 3 5 5 5 15 15 3 18
41 19HP1A05A8 4 4 5 5 5 5 13 5 5 5 5 5 15 12 3 15
42 19HP1A05A9 9 7 5 5 5 5 21 9 6 5 5 5 20 17 4 21
43 19HP1A05B0 4 8 5 5 5 5 17 4 5 5 5 5 14 14 3 17
44 19HP1A05B1 10 7 5 5 5 5 22 9 5 5 5 5 19 18 4 22
45 19HP1A05B2 10 7 5 5 5 5 22 9 6 5 5 5 20 18 4 22
46 19HP1A05B3 6 6 5 5 5 5 17 5 4 5 5 5 14 14 3 17
47 19HP1A05B4 7 5 5 5 5 5 17 6 6 5 5 5 17 14 3 17
48 19HP1A05B5 9 7 5 5 5 5 21 7 6 5 5 5 18 17 4 21
49 19HP1A05B6 9 6 5 5 5 5 20 9 4 5 5 5 18 16 4 20
50 19HP1A05B7 10 5 5 5 5 5 20 10 6 5 5 5 21 17 4 21
51 19HP1A05B8 9 5 5 5 5 5 19 8 4 5 5 5 17 15 3 18
52 19HP1A05B9 4 6 5 5 5 5 15 4 5 5 5 5 14 12 3 15
53 19HP1A05C0 9 6 5 5 5 5 20 4 7 5 5 5 16 16 3 19
54 19HP1A05C1 6 7 5 5 5 5 18 2 4 5 5 5 11 14 2 16
55 19HP1A05C2 6 6 5 5 5 5 17 8 6 5 5 5 19 15 3 18
56 19HP1A05C3 6 5 5 5 5 5 16 6 5 5 5 5 16 13 3 16
57 19HP1A05C4 7 7 5 5 5 5 19 8 5 5 5 5 18 15 4 19
58 19HP1A05C5 4 5 5 5 5 5 14 3 3 5 5 5 11 11 2 13
59 19HP1A05C6 10 6 5 5 5 5 21 10 4 5 5 5 19 17 4 21
60 19HP1A05C7 10 5 5 5 5 5 20 7 5 5 5 5 17 16 3 19
61 20HP5A0507 5 4 5 5 5 5 14 7 4 5 5 5 16 13 3 16
62 20HP5A0508 6 3 5 5 5 5 14 7 3 5 5 5 15 12 3 15
63 20HP5A0509 5 5 5 5 5 5 15 5 2 5 5 5 12 12 2 14
64 20HP5A0510 5 5 5 5 5 5 15 4 4 5 5 5 13 12 3 15
65 20HP5A0511 8 5 5 5 5 5 18 10 4 5 5 5 19 15 4 19
66 20HP5A0512 8 5 5 5 5 5 18 10 5 5 5 5 20 16 4 20
67 20HP5A0513 5 5 5 5 5 5 15 7 4 5 5 5 16 13 3 16

III B.TECH SEM-I, MID–I EXAMINATION

DATE: 22-11-2021 YEAR/BRANCH: III CSE
TIME: 01:00 PM TO 02:30 PM MAX MARKS: 30

SUBJECT: DATA WAREHOUSING AND DATA MINING

ANSWER ALL QUESTIONS

Q.No | Question | Marks | CO | PO | PSO | Bloom’s Level | Marks | % of Marks
1a | Compare and contrast OLAP and OLTP (or) Define the terms OLTP and OLAP. | 5 | CO1 | PO2 | - | L3 | 5/30 | 16.66%
1b | What is a data cube? Give an example of a 2-D view and a 3-D data cube representation of the data. | 5 | CO1 | PO1, PO2 | - | L3 | 5/30 | 16.66%
2a | Given the following measurements for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28. Standardize the variable by the following: (i) Compute the mean absolute deviation for age. (ii) Compute the Z-score for the first four measurements. | 5 | CO2 | PO1, PO2 | - | L3 | 5/30 | 16.66%
2b | What is noisy data? Explain the binning methods for data smoothing. | 5 | CO2 | PO1 | - | L2 | 5/30 | 16.66%
3 | Write the algorithm to discover frequent itemsets without candidate generation and explain it with an example. | 10 | CO3 | PO2 | - | L2 | 10/30 | 33.33%

SCHEME OF EVALUATION
DATE: 22-11-2021 YEAR/BRANCH: III CSE
TIME: 01:00 PM TO 02:30 PM MAX MARKS: 30

SUBJECT: DATA WAREHOUSING AND DATA MINING

1a) Compare and contrast OLAP and OLTP (or) Define the terms OLTP and OLAP
OLAP and OLTP difference: 5 Marks
b) What is a data cube? Give an example of a 2-D view and a 3-D data cube representation of the data.
Data cube: 1 Mark
2-D view: 2 Marks
3-D view: 2 Marks
2 a) Given the following measurements for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28
Standardize the variable by the following:
(i) Compute the mean absolute deviation for age.
(ii) Compute the Z-score for the first four measurements.
Mean absolute deviation for age: 2.5 Marks
Z-score for the first four measurements: 2.5 Marks
b) What is noisy data? Explain the binning methods for data smoothing.
Noisy data: 1 Mark
Binning Method: 4 Marks
3 Write the algorithm to discover frequent item sets without candidate generation and explain it with
an example.
Frequent itemset: 2 Marks
Explanation: 3 Marks
Example: 5 Marks

ANSWERS
DATE: 22-11-2021 YEAR/BRANCH: III CSE
TIME: 01:00 PM TO 02:30 PM MAX MARKS: 30

SUBJECT: DATA WAREHOUSING AND DATA MINING

1a) Compare and contrast OLAP and OLTP (or) Define the terms OLTP and OLAP

b) What is a data cube? Give an example of a 2-D view and a 3-D data cube representation of the data.
“What is a data cube?”: A data cube allows data to be modeled and viewed in multiple dimensions.
It is defined by dimensions and facts.
• In general terms, dimensions are the perspectives or entities with respect to which an organization
wants to keep records. For example, AllElectronics may create a sales data warehouse in order to
keep records of the store’s sales with respect to the dimensions time, item, branch, and location.
• These dimensions allow the store to keep track of things like monthly sales of items and the branches
and locations at which the items were sold. Each dimension may have a table associated with it,
called a dimension table, which further describes the dimension.
• For example, a dimension table for item may contain the attributes item name, brand, and type.
Dimension tables can be specified by users or experts, or automatically generated and adjusted based
on data distributions.
• A multidimensional data model is typically organized around a central theme, such as sales. This
theme is represented by a fact table.
• Facts are numeric measures. Think of them as the quantities by which we want to analyze
relationships between dimensions. Examples of facts for a sales data warehouse include dollars sold
(sales amount in dollars), units sold (number of units sold), and amount budgeted. The fact table
contains the names of the facts, or measures, as well as keys to each of the related dimension tables.
Although we usually think of cubes as 3-D geometric structures, in data warehousing the data cube is
n-dimensional.
• To gain a better understanding of data cubes and the multidimensional data model, let’s start by
looking at a simple 2-D data cube that is, in fact, a table or spreadsheet for sales data from
AllElectronics. In particular, we will look at the AllElectronics sales data for items sold per quarter
in the city of Vancouver. These data are shown in Table.

• In this 2-D representation, the sales for Vancouver are shown with respect to the time dimension
(organized in quarters) and the item dimension (organized according to the types of items sold). The
fact or measure displayed is dollars sold (in thousands).
• Now, suppose that we would like to view the sales data with a third dimension. For instance, suppose
we would like to view the data according to time and item, as well as location, for the cities Chicago,
New York, Toronto, and Vancouver. These 3-D data are shown in Table.

• The 3-D data in the table are represented as a series of 2-D tables. Conceptually, we may also
represent the same data in the form of a 3-D data cube, as in Figure.

• Suppose that we would now like to view our sales data with an additional fourth dimension such as
supplier. Viewing things in 4-D becomes tricky. However, we can think of a 4-D cube as being a
series of 3-D cubes, as shown in Figure.
• If we continue in this way, we may display any n-dimensional data as a series of (n − 1)-dimensional
“cubes.” The data cube is a metaphor for multidimensional data storage. The actual physical storage
of such data may differ from its logical representation. The important thing to remember is that data
cubes are n-dimensional and do not confine data to 3-D.
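
The idea that a 3-D cube is a series of 2-D tables can be illustrated with a minimal Python sketch. The sales figures, the two item types and the helper names `slice_2d` and `as_series_of_tables` are all hypothetical; they are not the values from the tables referred to above.

```python
from collections import defaultdict

# Hypothetical facts: dollars sold (in thousands). The numbers are made up.
# Each key is (location, quarter, item_type), i.e. three dimensions.
cube = {
    ("Vancouver", "Q1", "home entertainment"): 10,
    ("Vancouver", "Q1", "computer"): 20,
    ("Vancouver", "Q2", "home entertainment"): 30,
    ("Vancouver", "Q2", "computer"): 40,
    ("Toronto", "Q1", "home entertainment"): 5,
    ("Toronto", "Q1", "computer"): 15,
    ("Toronto", "Q2", "home entertainment"): 25,
    ("Toronto", "Q2", "computer"): 35,
}


def slice_2d(cube, location):
    """Fix the location dimension to get a 2-D (quarter x item type) view."""
    view = defaultdict(dict)
    for (loc, quarter, item), dollars in cube.items():
        if loc == location:
            view[quarter][item] = dollars
    return dict(view)


def as_series_of_tables(cube):
    """Represent the 3-D cube as one 2-D table per location."""
    locations = sorted({loc for (loc, _, _) in cube})
    return {loc: slice_2d(cube, loc) for loc in locations}


vancouver = slice_2d(cube, "Vancouver")
print(vancouver["Q1"])
print(list(as_series_of_tables(cube)))
```

Adding a fourth dimension such as supplier would only lengthen the key tuple, which mirrors the point that the cube is a logical model rather than a 3-D storage layout.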
2 a) Given the following measurement for the variable age: 18, 22, 25, 42, 28, 43, 33, 35, 56, 28
Standardize the variables by the following:
(i) Compute the mean absolute deviation for age.
(ii) Compute the Z-score for the first four measurements.
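
A minimal Python sketch of the two computations asked for above. It assumes the z-score is taken relative to the mean absolute deviation, as the wording of the question suggests; if the standard deviation were intended instead, the z-scores would differ.

```python
ages = [18, 22, 25, 42, 28, 43, 33, 35, 56, 28]

mean_age = sum(ages) / len(ages)

# Mean absolute deviation: average distance of each value from the mean.
mad = sum(abs(x - mean_age) for x in ages) / len(ages)


def z_score(x: float) -> float:
    """Standardized value, using the mean absolute deviation as the spread."""
    return (x - mean_age) / mad


first_four = [round(z_score(x), 2) for x in ages[:4]]

print(mean_age)    # 33.0
print(mad)         # 8.8
print(first_four)  # [-1.7, -1.25, -0.91, 1.02]
```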

b) What is noisy data? Explain the binning methods for data smoothing.
“What is noise?” Noise is a random error or variance in a measured variable. Given a numeric
attribute such as price, how can we “smooth” out the data to remove the noise? Let’s look at the
following data smoothing techniques.
• Binning: Binning methods smooth a sorted data value by consulting its “neighborhood,” that is, the
values around it. The sorted values are distributed into a number of “buckets,” or bins. Because
binning methods consult the neighborhood of values, they perform local smoothing.
• In this example, the data for price are first sorted and then partitioned into equal-frequency bins of
size 3 (i.e., each bin contains three values).
• In smoothing by bin means, each value in a bin is replaced by the mean value of the bin. For
example, the mean of the values 4, 8, and 15 in Bin 1 is 9.
• Therefore, each original value in this bin is replaced by the value 9. Similarly, smoothing by bin
medians can be employed, in which each bin value is replaced by the bin median.
• In smoothing by bin boundaries, the minimum and maximum values in a given bin are identified as
the bin boundaries. Each bin value is then replaced by the closest boundary value.
• In general, the larger the width, the greater the effect of the smoothing. Alternatively, bins may be
equal width, where the interval range of values in each bin is constant. Binning is also used as a
discretization technique.
• Regression: Data smoothing can also be done by regression, a technique that conforms data values
to a function. Linear regression involves finding the “best” line to fit two attributes (or variables)
so that one attribute can be used to predict the other. Multiple linear regression is an extension of
linear regression, where more than two attributes are involved and the data are fit to a
multidimensional surface.
• Outlier analysis: Outliers may be detected by clustering, for example, where similar values are
organized into groups, or “clusters.” Intuitively, values that fall outside of the set of clusters may be
considered outliers.
• Many data smoothing methods are also used for data discretization and data reduction. For
example, the binning techniques described before reduce the number of distinct values per attribute.
This acts as a form of data reduction for logic-based data mining methods, such as decision tree
induction, which repeatedly makes value comparisons on sorted data.
• Concept hierarchies are a form of data discretization that can also be used for data smoothing. A
concept hierarchy for price, for example, may map real price values into inexpensive, moderately
priced, and expensive, thereby reducing the number of data values to be handled by the mining
process. Some methods of classification (e.g., neural networks) have built-in data smoothing
mechanisms.
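
The three binning variants described above can be sketched in Python as follows. Only the Bin 1 values 4, 8 and 15 come from the text; the other six prices are assumed for illustration, and sending ties to the lower boundary is a choice made here, not something the text specifies.

```python
from statistics import mean, median

# 4, 8 and 15 are the Bin 1 values from the text; the rest are assumed.
prices = sorted([4, 8, 15, 21, 21, 24, 25, 28, 34])


def equal_frequency_bins(values, size):
    """Split sorted values into consecutive bins of `size` values each."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def smooth_by_means(bins):
    """Replace every value in a bin with the bin mean."""
    return [[mean(b)] * len(b) for b in bins]


def smooth_by_medians(bins):
    """Replace every value in a bin with the bin median."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace each value with the nearer of its bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        low, high = min(b), max(b)
        # Ties go to the lower boundary (an arbitrary choice).
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


bins = equal_frequency_bins(prices, 3)
print(smooth_by_means(bins))
print(smooth_by_medians(bins))
print(smooth_by_boundaries(bins))
```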
3 Write the algorithm to discover frequent itemsets without candidate generation and explain it with
an example.
• FP-growth is an algorithm that takes a radically different approach to discovering frequent itemsets. The
algorithm does not subscribe to the generate-and-test paradigm of Apriori. Instead, it encodes the
data set using a compact data structure called an FP-tree and extracts frequent itemsets directly
from this structure.
1. FP-Tree Representation
• An FP-tree is a compressed representation of the input data. It is constructed by reading the data set
one transaction at a time and mapping each transaction onto a path in the FP-tree.
• As different transactions can have several items in common, their paths may overlap. The more the
paths overlap with one another, the more compression we can achieve using the FP-tree structure.
• If the size of the FP-tree is small enough to fit into main memory, this will allow us to extract
frequent itemsets directly from the structure in memory instead of making repeated passes over the
data stored on disk.
• The figure below shows a data set that contains ten transactions and five items. The structures of the
FP-tree after reading the first three transactions are also depicted in the diagram. Each node in the tree
contains the label of an item along with a counter that shows the number of transactions mapped onto
the given path. Initially, the FP-tree contains only the root node represented by the null symbol.

The FP-tree is subsequently extended in the following way:

1. The data set is scanned once to determine the support count of each item. Infrequent items are
discarded, while the frequent items are sorted in decreasing support counts. For the data set shown in
the figure, a is the most frequent item, followed by b, c, d, and e.
2. The algorithm makes a second pass over the data to construct the FP-tree. After reading the first
transaction, {a,b}, the nodes labeled as a and b are created. A path is then formed from null → a → b to
encode the transaction. Every node along the path has a frequency count of 1.
3. After reading the second transaction, {b,c,d}, a new set of nodes is created for items b, c, and d. A
path is then formed to represent the transaction by connecting the nodes null → b → c → d. Every node
along this path also has a frequency count equal to one. Although the first two transactions have an
item in common, which is b, their paths are disjoint because the transactions do not share a common
prefix.
4. The third transaction, {a,c,d,e}, shares a common prefix item (which is a) with the first
transaction. As a result, the path for the third transaction, null → a → c → d → e, overlaps with the path
for the first transaction, null → a → b. Because of their overlapping path, the frequency count for node
a is incremented to two, while the frequency counts for the newly created nodes, c, d, and e, are equal
to one.
5. This process continues until every transaction has been mapped onto one of the paths given in the
FP-tree. The resulting FP-tree after reading all the transactions is shown at the bottom of the figure.
• The size of an FP-tree is typically smaller than the size of the uncompressed data because many
transactions in market basket data often share a few items in common. In the best-case scenario,
where all the transactions have the same set of items, the FP-tree contains only a single branch of
nodes. The worst-case scenario happens when every transaction has a unique set of items.
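
The tree construction walked through above can be sketched in Python as follows. This covers only the insertion of transactions into the FP-tree, not the later extraction of frequent itemsets. The class and function names are hypothetical, and the item order a, b, c, d, e is passed in directly instead of being computed by a first scan, since only the first three transactions are given in the text.

```python
class FPNode:
    """One FP-tree node: an item label, a counter, and links to children."""

    def __init__(self, item=None, parent=None):
        self.item = item      # None marks the root ("null") node
        self.count = 0
        self.parent = parent
        self.children = {}    # item label -> FPNode


def build_fp_tree(transactions, item_order):
    """Insert each transaction as a path, sharing prefixes with earlier paths."""
    rank = {item: i for i, item in enumerate(item_order)}
    root = FPNode()
    for transaction in transactions:
        # Keep only items present in the given order, sorted by that order.
        items = sorted((i for i in transaction if i in rank), key=rank.get)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def paths(node, prefix=()):
    """Yield every path from the root as a tuple of (item, count) pairs."""
    for child in node.children.values():
        current = prefix + ((child.item, child.count),)
        yield current
        yield from paths(child, current)


# The first three transactions from the walkthrough above.
transactions = [{"a", "b"}, {"b", "c", "d"}, {"a", "c", "d", "e"}]
# Assumed: this order is the result of the first scan over the full data set.
tree = build_fp_tree(transactions, item_order=["a", "b", "c", "d", "e"])

for path in paths(tree):
    print("null -> " + " -> ".join(f"{item}:{count}" for item, count in path))
```

After these three insertions node a under the root carries a count of two, while the separate b branch under the root keeps a count of one, matching steps 2 to 4.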
III B.TECH SEM-I, MID–II EXAMINATION
DATE: 18-01-2022 YEAR/BRANCH: III CSE
TIME: 01:00 PM TO 02:30 PM MAX MARKS: 30

SUBJECT: DATA WAREHOUSING AND DATA MINING

ANSWER ALL QUESTIONS

Q.No | Question | Marks | CO | PO | PSO | Bloom’s Level | Marks | % of Marks
1 | What is correlation analysis? How is correlation analysis used to generate interesting strong association rules? | 10 | CO3 | PO2 | - | L3 | 10/30 | 33.33%
2a | Explain in detail the characteristics of Naive Bayes classifiers. | 5 | CO4 | PO1, PO2 | - | L2 | 5/30 | 16.66%
2b | Explain in detail rule induction using a sequential covering algorithm. | 5 | CO4 | PO2 | - | L2 | 5/30 | 16.66%
3a | How can we conduct cluster analysis on high-dimensional data? Explain in detail. | 5 | CO5 | PO2 | - | L2 | 5/30 | 16.66%
3b | Explain clustering based on density distribution functions. | 5 | CO5 | PO1, PO2 | - | L2 | 5/30 | 16.66%
SCHEME OF EVALUATION
DATE: 18-01-2022 YEAR/BRANCH: III CSE
TIME: 01:00 PM TO 02:30 PM MAX MARKS: 30

SUBJECT: DATA WAREHOUSING AND DATA MINING

1 What is correlation analysis? How is correlation analysis used to generate interesting strong
association rules?
Association rule: 3 Marks
Correlation analysis: 2 Marks
Correlation analysis to generate interesting strong association rules: 5 Marks
2 a) Explain in detail about Characteristics of Naive Bayes Classifiers
Naive Bayes Classifiers: 2 Marks
Characteristics of Naive Bayes Classifiers: 3 Marks
b) Explain in detail about rule induction using a Sequential Covering Algorithm
Association rule: 2 Marks
Rule induction using a Sequential Covering Algorithm: 3 Marks
3 a) How can we conduct cluster analysis on high-dimensional data? Explain in detail.
Cluster: 2 Marks
Cluster analysis on high-dimensional data: 3 Marks
b) Explain Clustering Based on Density Distribution Functions
Clustering: 1 Mark
Clustering Based on Density Distribution Functions: 4 Marks
ANSWERS
1 What is correlation analysis? How is correlation analysis used to generate interesting, strong
association rules?
Association rule:
• Given: (1) a database of transactions, where (2) each transaction is a list of items (purchased by a
customer in one visit).
• Find: all rules that correlate the presence of one set of items with that of another set of items.
   • E.g., 98% of people who purchase tires and auto accessories also get automotive services
   done.
   • E.g., Market Basket Analysis: this process analyzes customer buying habits by finding
   associations between the different items that customers place in their "shopping baskets".
   The discovery of such associations can help retailers develop marketing strategies by gaining
   insight into which items are frequently purchased together by customers.
• The support and confidence measures are insufficient for filtering out uninteresting association rules.
To tackle this weakness, a correlation measure can be used to augment the support–confidence
framework for association rules. This leads to correlation rules of the form
A ⇒ B [support, confidence, correlation].
• That is, a correlation rule is measured not only by its support and confidence but also by the
correlation between itemsets A and B. There are many different correlation measures from which to
choose.
• Lift is a simple correlation measure, defined as follows. The occurrence of itemset A is
independent of the occurrence of itemset B if P(A∪B) = P(A)P(B); otherwise, itemsets A and B are
dependent and correlated as events. This definition can easily be extended to more than two itemsets.
The lift between the occurrence of A and B can be measured by computing
lift(A, B) = P(A∪B) / (P(A)P(B))    (Eq. 1)
If the lift is less than 1, the occurrence of A is negatively correlated with the occurrence of B; if it is
greater than 1, A and B are positively correlated; if it equals 1, A and B are independent.
• Correlation analysis using χ². To compute the correlation using χ² analysis for nominal data, we
need the observed value and the expected value (displayed in parentheses) for each slot of the
contingency table, as shown in Table 6.7. From the table, we can compute the χ² value as follows:
χ² = Σ (observed − expected)² / expected, summed over all slots of the table.
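Both measures can be computed directly from transaction counts. The sketch below is illustrative only: the function names and the counts (10,000 transactions, 6,000 containing A, 7,500 containing B, 4,000 containing both) are assumptions made for the example and are not taken from Table 6.7.

```python
def lift(n_ab, n_a, n_b, n_total):
    """lift(A, B) = P(A∪B) / (P(A) P(B)), with probabilities taken from counts.

    Here P(A∪B) is the fraction of transactions containing both A and B.
    """
    p_ab = n_ab / n_total
    return p_ab / ((n_a / n_total) * (n_b / n_total))


def chi_square(n_ab, n_a, n_b, n_total):
    """Pearson chi-square statistic for the 2x2 contingency table of A and B."""
    observed = [
        [n_ab, n_a - n_ab],                        # A present: B, not B
        [n_b - n_ab, n_total - n_a - n_b + n_ab],  # A absent:  B, not B
    ]
    row_totals = [n_a, n_total - n_a]
    col_totals = [n_b, n_total - n_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts, chosen only to illustrate the calculation.
lift_value = lift(4000, 6000, 7500, 10000)
chi2_value = chi_square(4000, 6000, 7500, 10000)
print(round(lift_value, 3), round(chi2_value, 1))
```

With these counts the lift is below 1 and the χ² value is large, so a rule A ⇒ B would be flagged as negatively correlated even though its support and confidence look high.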
2 a) Explain in detail the characteristics of Naive Bayes classifiers
Naive Bayes classifiers generally have the following characteristics:
• They are robust to isolated noise points because such points are averaged out when estimating
conditional probabilities from data. Naive Bayes classifiers can also handle missing values by
ignoring the example during model building and classification.
• They are robust to irrelevant attributes. If Xi is an irrelevant attribute, then P(Xi|Y) becomes almost
uniformly distributed, so the class-conditional probability for Xi has no impact on the overall
computation of the posterior probability.
• Correlated attributes can degrade the performance of naive Bayes classifiers because the conditional
independence assumption no longer holds for such attributes. For example, consider the following
probabilities:
   P(A=0|Y=0) = 0.4, P(A=1|Y=0) = 0.6,
   P(A=0|Y=1) = 0.6, P(A=1|Y=1) = 0.4,
where A is a binary attribute and Y is a binary class variable. Suppose there is another binary
attribute B that is perfectly correlated with A when Y=0, but is independent of A when Y=1. For
simplicity, assume that the class-conditional probabilities for B are the same as for A. Given a record
with attributes A=0, B=0, we can compute its posterior probabilities as follows:
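The effect of the correlated attribute can be checked numerically. The sketch below makes two assumptions that the text does not state: the class priors are equal, P(Y=0) = P(Y=1) = 0.5, and "perfectly correlated" means that B always takes the same value as A when Y=0. The function names are hypothetical.

```python
# Class-conditional probabilities from the text, keyed as p_given_y[y][value].
# By assumption in the text, B has the same class-conditional table as A.
p_given_y = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.6, 1: 0.4}}
prior = {0: 0.5, 1: 0.5}  # assumption: equal class priors


def naive_score(a, b, y):
    """Naive Bayes numerator: treats A and B as independent given Y."""
    return p_given_y[y][a] * p_given_y[y][b] * prior[y]


def true_score(a, b, y):
    """Numerator using the actual dependence structure described in the text."""
    if y == 0:
        # Assumption: B equals A when Y=0, so P(A=a, B=b | Y=0) = P(A=a | Y=0).
        joint = p_given_y[0][a] if a == b else 0.0
    else:
        joint = p_given_y[1][a] * p_given_y[1][b]  # independent when Y=1
    return joint * prior[y]


def posterior(score, a, b):
    """Normalize the per-class scores into posterior probabilities."""
    scores = {y: score(a, b, y) for y in (0, 1)}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}


naive = posterior(naive_score, 0, 0)
actual = posterior(true_score, 0, 0)
print(naive)   # naive Bayes favours class 1
print(actual)  # the true joint distribution favours class 0
```

Under these assumptions naive Bayes multiplies 0.4 × 0.4 for class 0 and 0.6 × 0.6 for class 1 and therefore picks class 1, while the true class-conditional probability for class 0 is 0.4, which exceeds 0.36.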

b) Explain in detail about rule induction using a Sequential Covering Algorithm

• IF-THEN rules can be extracted directly from the training data (i.e., without having to
generate a decision tree first) using a sequential covering algorithm. The name comes from the
notion that the rules are learned sequentially (one at a time), where each rule for a given class will
ideally cover many of the class's tuples (and hopefully none of the tuples of other classes).
• Sequential covering algorithms are the most widely used approach to mining disjunctive sets of
classification rules. There are many sequential covering algorithms. Popular variations include AQ,
CN2, and the more recent RIPPER. The general strategy is as follows. Rules are learned one at a
time. Each time a rule is learned, the tuples covered by the rule are removed, and the process repeats
on the remaining tuples. This sequential learning of rules is in contrast to decision tree induction.
Because the path to each leaf in a decision tree corresponds to a rule, we can consider decision tree
induction as learning a set of rules simultaneously.

• In this way, the rules learned should be of high accuracy. The rules need not necessarily be of high
coverage. This is because we can have more than one rule for a class, so that different rules may
cover different tuples within the same class.
• The process continues until the terminating condition is met, such as when there are no more training
tuples or the quality of a rule returned is below a user-specified threshold.
• The Learn_One_Rule procedure finds the "best" rule for the current class, given the current set of
training tuples.
• "How are rules learned?" Typically, rules are grown in a general-to-specific manner (see the figure
below). We can think of this as a beam search, where we start off with an empty rule and then
gradually keep appending attribute tests to it.
• We append by adding the attribute test as a logical conjunct to the existing condition of the rule
antecedent.
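The general strategy above can be sketched in Python. This is a simplified illustration, not AQ, CN2 or RIPPER: it uses a greedy search (a beam of width 1) instead of a full beam search, measures rule quality by plain accuracy, and the function names, the `min_accuracy` threshold and the toy data set are all assumptions made for the example.

```python
def covers(rule, row):
    """A rule is a dict of attribute tests joined by logical AND."""
    return all(row[attr] == value for attr, value in rule.items())


def accuracy(rule, rows, target):
    """Fraction of the tuples covered by the rule that belong to the target class."""
    covered = [r for r in rows if covers(rule, r)]
    if not covered:
        return 0.0
    return sum(r["class"] == target for r in covered) / len(covered)


def learn_one_rule(rows, target, attributes):
    """General-to-specific search: start with an empty rule, append the best test."""
    rule = {}
    while accuracy(rule, rows, target) < 1.0:
        candidates = [
            {**rule, attr: value}
            for attr in attributes if attr not in rule
            for value in sorted({r[attr] for r in rows})
        ]
        if not candidates:
            break
        # Prefer higher accuracy, then higher coverage.
        best = max(candidates, key=lambda c: (accuracy(c, rows, target),
                                              sum(covers(c, r) for r in rows)))
        if accuracy(best, rows, target) <= accuracy(rule, rows, target):
            break
        rule = best
    return rule


def sequential_covering(rows, target, attributes, min_accuracy=0.8):
    """Learn rules one at a time, removing the tuples each rule covers."""
    rules, remaining = [], list(rows)
    while any(r["class"] == target for r in remaining):
        rule = learn_one_rule(remaining, target, attributes)
        if accuracy(rule, remaining, target) < min_accuracy:
            break  # rule quality below the user-specified threshold
        rules.append(rule)
        remaining = [r for r in remaining if not covers(rule, r)]
    return rules


# Hypothetical training tuples.
rows = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "play"},
    {"outlook": "rainy", "windy": "no", "class": "play"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
    {"outlook": "overcast", "windy": "yes", "class": "stay"},
]
rules = sequential_covering(rows, "play", ["outlook", "windy"])
print(rules)
```

Each learned rule is accurate on the tuples it covers but covers only part of the class, which is why more than one rule per class is needed.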

3 a) How can we conduct cluster analysis on high-dimensional data? Explain in detail.

Clustering is the process of partitioning a set of data objects (or observations) into subsets. Each
subset is a cluster, such that objects in a cluster are similar to one another, yet dissimilar to objects in
other clusters.
• The set of clusters resulting from a cluster analysis can be referred to as a clustering.
Clustering High-Dimensional Data
• Clustering high-dimensional data is the cluster analysis of data with anywhere from a few dozen to
many thousands of dimensions.
• A data object may be described by 10 or more attributes. Such objects are said to lie in a
high-dimensional data space.
• Such high-dimensional data are often encountered in areas such as medicine,
where DNA microarray technology can produce many measurements at once, and the clustering
of text documents, where, if a word-frequency vector is used, the number of dimensions equals
the size of the vocabulary.
• The clustering methods discussed so far work well when the dimensionality is not high, that is, with
fewer than 10 attributes.
• How can we conduct cluster analysis on high-dimensional data?
• Graph-based clustering is perhaps the most robust for high-dimensional data because it uses a
distance defined on a graph, e.g., the number of shared neighbors, which is more meaningful in high
dimensions than the Euclidean distance.

• Clustering high-dimensional data is the search for clusters and the space in which they exist.
• Thus, there are two major kinds of methods. Subspace clustering approaches search for clusters
existing in subspaces of the given high-dimensional data space, where a subspace is defined using a
subset of attributes in the full space.
• Dimensionality reduction approaches try to construct a much lower-dimensional space and search for
clusters in such a space. Often, a method may construct new dimensions by combining some
dimensions from the original data.
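The shared-neighbor idea mentioned for graph-based clustering can be illustrated with a short sketch. It is a minimal example under assumptions: the function names, the choice k = 2 and the six three-dimensional points are hypothetical, and real high-dimensional data would have far more attributes.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return set(others[:k])


def shared_neighbors(points, i, j, k):
    """Shared-nearest-neighbor similarity: overlap of the two k-NN lists."""
    return len(k_nearest(points, i, k) & k_nearest(points, j, k))


# Hypothetical data: two well-separated groups of three points each.
points = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0),
    (10, 10, 10), (11, 10, 10), (10, 11, 10),
]
same_group = shared_neighbors(points, 0, 1, k=2)
different_group = shared_neighbors(points, 0, 3, k=2)
print(same_group, different_group)
```

Points in the same group share at least one neighbor, while points from different groups share none, so the similarity depends on neighborhood structure rather than on raw distance values.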
b) Explain Clustering Based on Density Distribution Functions
• Density-based clustering refers to unsupervised learning methods that identify distinctive
groups/clusters in the data, based on the idea that a cluster in a data space is a contiguous region of
high point density, separated from other such clusters by contiguous regions of low point density.
• Partitioning and hierarchical methods are designed to find spherical-shaped clusters. They have
difficulty finding clusters of arbitrary shape, such as "S"-shaped and oval clusters.
• Three representative density-based algorithms are:
   • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
   • Ordering Points to Identify the Clustering Structure (OPTICS)
   • Clustering Based on Density Distribution Functions (DENCLUE)
• DENCLUE is a density-based clustering method by Hinneburg and Keim. It enables a compact
mathematical description of arbitrarily shaped clusters in high-dimensional data, and it is good for
data sets with a large amount of noise.
• Density is a measurement that compares the amount of matter an object has to its volume. An object
with much matter in a certain volume has a high density; an object with little matter in the same
volume has a low density.
• Density estimation is a core issue in density-based clustering methods. DENCLUE (DENsity-based
CLUstEring) is a clustering method based on a set of density distribution functions.
• Formally, let x1, ..., xn be an independent and identically distributed sample of a random variable
f. The kernel density approximation of the probability density function is
$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$,
where K() is a kernel and h is the bandwidth, which serves as a smoothing parameter.
• A frequently used kernel is a standard Gaussian function with a mean of 0 and a variance of 1:
$K\left(\frac{x - x_i}{h}\right) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x - x_i)^2}{2h^2}}$
• DENCLUE uses a Gaussian kernel to estimate density based on the given set of objects to be
clustered.
• A point $x^*$ is called a density attractor if it is a local maximum of the estimated density function.
• To avoid trivial local maximum points, DENCLUE uses a noise threshold, ξ, and only considers
those density attractors $x^*$ such that $\hat{f}(x^*) \ge \xi$.
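The density estimate and the noise threshold can be illustrated in one dimension. This sketch is an assumption-laden simplification: DENCLUE finds density attractors by hill climbing, whereas the code below simply scans a fixed grid for local maxima; the function names, the sample, the bandwidth h = 0.5 and the threshold ξ = 0.2 are all made up for the example.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel with mean 0 and variance 1."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde(x, sample, h):
    """Kernel density estimate at x with bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (len(sample) * h)


def density_attractors(sample, h, threshold, lo, hi, step=0.01):
    """Grid points that are local maxima of the estimate and reach the threshold."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    dens = [kde(x, sample, h) for x in grid]
    return [
        grid[i]
        for i in range(1, n - 1)
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1] and dens[i] >= threshold
    ]


# Hypothetical sample: two dense groups and one isolated (noise) point at 9.0.
sample = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 9.0]
attractors = density_attractors(sample, h=0.5, threshold=0.2, lo=0.0, hi=10.0)
print(attractors)
```

The isolated point also produces a local maximum of the estimate, but its density stays below the threshold, so it is not reported as a density attractor.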
III B.TECH SEM-I, WEEKLY TEST–I EXAMINATION
DATE: 06-10-2021 YEAR/BRANCH: III CSE
TIME: 9:00 AM TO 10:00AM MAX MARKS: 20

SUBJECT: DATA WAREHOUSING AND DATA MINING

ANSWER ALL QUESTIONS

Q.no | Question | Marks | CO | PO | PSO | Bloom's Level | Marks | % of Marks
1a | Compare and contrast OLAP and OLTP. (or) Define the terms OLTP and OLAP. | 5 | CO1 | PO2 | - | L3 | 5/20 | 25%
1b | What is a data cube? Give an example of a 2-D view and a 3-D data cube representation of the data. | 5 | CO1 | PO1, PO2 | - | L3 | 5/20 | 25%
2a | What is data warehousing? Explain the various operations on multidimensional data models. | 5 | CO1 | PO2 | - | L2 | 5/20 | 25%
2b | What is the main advantage of using Multidimensional OLAP (MOLAP) servers? | 5 | CO1 | PO1, PO2 | | L2 | 5/20 | 25%
III B.TECH SEM-I, WEEKLY TEST–II EXAMINATION
DATE: 08-12-2021 YEAR/BRANCH: III CSE
TIME: 9:00 AM TO 10:00AM MAX MARKS: 20

SUBJECT: DATA WAREHOUSING AND DATA MINING

ANSWER ALL QUESTIONS

Q.no | Question | Marks | CO | PO | PSO | Bloom's Level | Marks | % of Marks
1 | Explain in detail the mining of multilevel and multidimensional association rules. | 10 | CO3 | PO2 | - | L3 | 10/20 | 50%
2 | What is correlation analysis? How is correlation analysis used to generate interesting, strong association rules? | 10 | CO3 | PO2 | - | L3 | 10/20 | 50%
Students List

SNO  Roll No      SNO  Roll No
1    19HP1A0501   35   19HP1A0535
2    19HP1A0502   36   19HP1A0536
3    19HP1A0503   37   19HP1A0537
4    19HP1A0504   38   19HP1A0539
5    19HP1A0505   39   19HP1A0540
6    19HP1A0506   40   19HP1A0541
7    19HP1A0507   41   19HP1A0542
8    19HP1A0508   42   19HP1A0543
9    19HP1A0509   43   19HP1A0544
10   19HP1A0510   44   19HP1A0545
11   19HP1A0511   45   19HP1A0546
12   19HP1A0512   46   19HP1A0547
13   19HP1A0513   47   19HP1A0549
14   19HP1A0514   48   19HP1A0550
15   19HP1A0515   49   19HP1A0551
16   19HP1A0516   50   19HP1A0552
17   19HP1A0517   51   19HP1A0553
18   19HP1A0518   52   19HP1A0554
19   19HP1A0519   53   19HP1A0555
20   19HP1A0520   54   19HP1A0556
21   19HP1A0521   55   19HP1A0557
22   19HP1A0522   56   19HP1A0558
23   19HP1A0523   57   19HP1A0559
24   19HP1A0524   58   19HP1A0560
25   19HP1A0525   59   19HP1A0561
26   19HP1A0526   60   19HP1A0563
27   19HP1A0527   61   19HP1A0564
28   19HP1A0528   62   19HP1A0565
29   19HP1A0529   63   20HP5A0501
30   19HP1A0530   64   20HP5A0502
31   19HP1A0531   65   20HP5A0503
32   19HP1A0532   66   20HP5A0504
33   19HP1A0533   67   20HP5A0505
34   19HP1A0534   68   20HP5A0506

SNO  Roll No      SNO  Roll No
1    19HP1A0566   35   19HP1A05A2
2    19HP1A0567   36   19HP1A05A3
3    19HP1A0568   37   19HP1A05A4
4    19HP1A0569   38   19HP1A05A5
5    19HP1A0570   39   19HP1A05A6
6    19HP1A0571   40   19HP1A05A7
7    19HP1A0572   41   19HP1A05A8
8    19HP1A0573   42   19HP1A05A9
9    19HP1A0574   43   19HP1A05B0
10   19HP1A0575   44   19HP1A05B1
11   19HP1A0577   45   19HP1A05B2
12   19HP1A0578   46   19HP1A05B3
13   19HP1A0579   47   19HP1A05B4
14   19HP1A0580   48   19HP1A05B5
15   19HP1A0581   49   19HP1A05B6
16   19HP1A0582   50   19HP1A05B7
17   19HP1A0583   51   19HP1A05B8
18   19HP1A0584   52   19HP1A05B9
19   19HP1A0585   53   19HP1A05C0
20   19HP1A0586   54   19HP1A05C1
21   19HP1A0587   55   19HP1A05C2
22   19HP1A0588   56   19HP1A05C3
23   19HP1A0589   57   19HP1A05C4
24   19HP1A0590   58   19HP1A05C5
25   19HP1A0591   59   19HP1A05C6
26   19HP1A0592   60   19HP1A05C7
27   19HP1A0593   61   20HP5A0507
28   19HP1A0594   62   20HP5A0508
29   19HP1A0596   63   20HP5A0509
30   19HP1A0597   64   20HP5A0510
31   19HP1A0598   65   20HP5A0511
32   19HP1A0599   66   20HP5A0512
33   19HP1A05A0   67   20HP5A0513
34   19HP1A05A1

Internal (CIE) Marks

SNO  Roll No  MID-I (Q1 Q2 Q3)  MID-II (Q4 Q5 Q6)
1 19HP1A0501 7 2 2 4 3 0 6
2 19HP1A0502 3 10 9 10 8.5 0 14
3 19HP1A0503 9 10 7 8 7 8 16
4 19HP1A0504 10 9 3 10 9.5 6.5 16
5 19HP1A0505 9 10 2 6 10 5.5 14
6 19HP1A0506 9 9 8 10 2.5 6 15
7 19HP1A0507 8 9 3 7 10 8.5 15
8 19HP1A0508 10 10 9 8 4.5 9 17
9 19HP1A0509 10 10 10 6 9.5 6 17
10 19HP1A0510 10 9 4 5 10 7.5 15
11 19HP1A0511 10 6 0 0 0 5 7
12 19HP1A0512 6 6 8 6.5 3.5 8.5 13
13 19HP1A0513 AB AB AB 7 10 9 9
14 19HP1A0514 10 10 10 10 10 8.5 20
15 19HP1A0515 AB AB AB AB AB AB 0
16 19HP1A0516 10 6 9 10 0 5.5 14
17 19HP1A0517 10 10 9 10 4.5 10 18
18 19HP1A0518 10 9 3 10 7 5 15
19 19HP1A0519 10 10 2 5 7 7.5 14
20 19HP1A0520 10 10 8 10 10 0 16
21 19HP1A0521 9 10 8 0 0 0 9
22 19HP1A0522 8 9 7 10 8.5 3 15
23 19HP1A0523 10 10 7 5 7.5 7 16
24 19HP1A0524 AB AB AB 0 0 7 2
25 19HP1A0525 9 6 7 6 6 10 15
26 19HP1A0526 10 10 8 10 10 7.5 19
27 19HP1A0527 10 10 10 10 8 10 19
28 19HP1A0528 9 10 8 7 5 9.5 16
29 19HP1A0529 9 10 8 10 10 9 19
30 19HP1A0530 10 10 9 9 10 8 19
31 19HP1A0531 10 10 0 10 5 4 13
32 19HP1A0532 AB AB AB 10 10 6 9
33 19HP1A0533 6 10 9 10 8.5 3 16
34 19HP1A0534 10 10 3 5 3.5 8 13
35 19HP1A0535 9 10 0 8 0 0 9
36 19HP1A0536 10 10 5 10 7 9 17
37 19HP1A0537 10 10 9 10 10 7.5 19
38 19HP1A0539 8 6 0 3 4 8.5 10
39 19HP1A0540 6 7 1 9 2.5 10 12
40 19HP1A0541 1 7 8 8.5 0 4 10
41 19HP1A0542 10 9 0 6 7 6.5 13
42 19HP1A0543 6 4 0 5 2 7 8
43 19HP1A0544 2 5 0 0 0 6.5 5
44 19HP1A0545 3 5 0 4 0 3.5 5
45 19HP1A0546 8 6 0 4 0 2.5 7
46 19HP1A0547 10 6 0 8.5 5 7 12
47 19HP1A0549 6 9 4 AB AB AB 6
48 19HP1A0550 8 7 8 8 3 8 14
49 19HP1A0551 6 10 8 5 5 9 14
50 19HP1A0552 4 9 0 3.5 2 4 8
51 19HP1A0553 9 6 7 8 7.5 6.5 15
52 19HP1A0554 5 9 7 4 2 1 9
53 19HP1A0555 10 9 9 0 10 9 16
54 19HP1A0556 8 5 7 8 5 8.5 14
55 19HP1A0557 10 6 0 8 1 7 11
56 19HP1A0558 8 10 9 8 4.5 9 16
57 19HP1A0559 10 9 8 10 0 8.5 15
58 19HP1A0560 8 8 3 8 3.5 10 14
59 19HP1A0561 10 9 9 10 9 9 19
60 19HP1A0563 8 5 7 6 0 6.5 11
61 19HP1A0564 9 10 8 6 5.5 7 15
62 19HP1A0565 9 9 1 10 0 9 13
63 20HP5A0501 10 9 0 8.5 0 10 13
64 20HP5A0502 10 7 0 7 9 10 14
65 20HP5A0503 10 10 8 7 9 8.5 18
66 20HP5A0504 10 7 5 9 9.5 6 16
67 20HP5A0505 10 9 2 8 7 4 13
68 20HP5A0506 9 9 4 7.5 10 2 14

SNO  Roll No  MID-I (Q1 Q2 Q3)  MID-II (Q4 Q5 Q6)
1 19HP1A0566 5 9 8 0 2.5 5 10
2 19HP1A0567 10 10 9 10 10 9 19
3 19HP1A0568 9 10 9 8 9 5 17
4 19HP1A0569 9 10 7 10 5 7 16
5 19HP1A0570 9 10 0 10 5 1 12
6 19HP1A0571 8 10 1 6.5 4.5 7.5 13
7 19HP1A0572 2 5 0 2 1.5 3 5
8 19HP1A0573 8 9 8 9.5 5 10 17
9 19HP1A0574 10 10 9 10 9 8.5 19
10 19HP1A0575 9 10 0 0 4 0 8
11 19HP1A0577 10 9 8 10 9 10 19
12 19HP1A0578 9 10 7 10 10 9 18
13 19HP1A0579 3 10 9 9 0 10 14
14 19HP1A0580 9 5 3 9 7 9 14
15 19HP1A0581 9 3 1 4 8 6.5 11
16 19HP1A0582 10 3 0 10 8.5 3 12
17 19HP1A0583 10 10 9 10 5 10 18
18 19HP1A0584 10 10 9 10 9 9.5 19
19 19HP1A0585 9 10 9 8 4 3.5 15
20 19HP1A0586 10 8 5 10 4.5 10 16
21 19HP1A0587 4 5 2 4 2 6.5 8
22 19HP1A0588 10 10 9 10 5 9.5 18
23 19HP1A0589 10 8 2 10 10 0 13
24 19HP1A0590 10 10 9 10 9 9 19
25 19HP1A0591 9 8 7 5 0 7.5 12
26 19HP1A0592 10 5 9 10 1 4.5 13
27 19HP1A0593 10 10 10 10 10 9 20
28 19HP1A0594 10 10 9 8 7 7 17
29 19HP1A0596 10 10 9 10 10 9 19
30 19HP1A0597 9 7 4 8.5 9 7 15
31 19HP1A0598 8 7 0 1 1 0 6
32 19HP1A0599 9 10 0 7 5 10 14
33 19HP1A05A0 10 10 8 7 7 9 17
34 19HP1A05A1 6 6 5 10 6.5 5 13
35 19HP1A05A2 8 1 8 7 2.5 0 9
36 19HP1A05A3 10 10 10 10 9 9.5 20
37 19HP1A05A4 10 5 7 10 0 0 11
38 19HP1A05A5 2 3 0 4 3 0 4
39 19HP1A05A6 10 10 9 10 8 10 19
40 19HP1A05A7 10 9 0 10 10 0 13
41 19HP1A05A8 10 0 0 5 1 7 8
42 19HP1A05A9 10 10 6 9 10 7 17
43 19HP1A05B0 7 2 1 7 0 4 7
44 19HP1A05B1 10 10 10 9 6.5 10 19
45 19HP1A05B2 10 10 9 8 9 8 18
46 19HP1A05B3 8 8 0 5 3 5 10
47 19HP1A05B4 10 8 3 6 0 10 12
48 19HP1A05B5 10 10 5 8 4 8 15
49 19HP1A05B6 10 9 8 8 9 9 18
50 19HP1A05B7 10 10 10 10 9 9 19
51 19HP1A05B8 6 10 9 7.5 5 9 16
52 19HP1A05B9 9 1 0 4.5 4 1 7
53 19HP1A05C0 10 9 6 0 10 0 12
54 19HP1A05C1 10 5 1 0 5 0 7
55 19HP1A05C2 10 8 0 5.5 6 10 13
56 19HP1A05C3 9 6 1 5 7 5 11
57 19HP1A05C4 10 4 7 9 4 9 14
58 19HP1A05C5 9 1 0 3.5 1 3 6
59 19HP1A05C6 10 10 8 9 10 9 19
60 19HP1A05C7 10 10 9 5 4.5 9 16
61 20HP5A0507 10 0 3 6 4 9 11
62 20HP5A0508 10 0 6 6 5 10 12
63 20HP5A0509 9 4 0 7 4 2 9
64 20HP5A0510 9 3 1 0 6.5 4 8
65 20HP5A0511 10 10 3 9 10 8.5 17
66 20HP5A0512 10 10 2 9 10 9 17
67 20HP5A0513 10 4 1 7 6.5 5 11
COURSE ASSESSMENT – STUDENT (Course–end-Survey)
Please answer all the questions in this form. Your response is very important for the continuous
quality improvement of this course. These responses will be treated as confidential.
A.Y: 2021-2022 Year: III B.Tech Semester: I Course Code: R1931051
Course Title: DATA WAREHOUSING AND DATA MINING
Indicate your rating for each of the indicators in the following table:
3: Good 2: Fair 1: Poor
Course outcome | Indicator | Rating
1 | Able to understand data warehouse concepts, architecture, business analysis and tools |
2 | Able to understand data pre-processing and data visualization techniques |
3 | Able to study algorithms for finding hidden and interesting patterns in data |
4 | Able to understand and apply various classification techniques using tools |
5 | Able to understand and apply various clustering techniques using tools |

SUGGESTION:

How can this course be modified to improve its quality and enhance the learning process?
(Write your suggestions below)
-------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------------

-------------------------------------------------------------------------------------------------------------------------