21uad404-Dwdm April 2024 QB
CO No. | Course Outcomes | Taxonomy
CO1 | Explain the basic concepts of Data warehousing, Data mining techniques and its tools. | Understand
CO2 | Apply the knowledge of OLAP models & schema, and implement various DM algorithms in an optimized way to solve the complex engineering problems using various tools. | Apply
CO3 | Analyze how data warehousing and data mining maps to current industry. | Analyze
CO4 | Create a model for various real time data mining applications using the concepts of Schema, DM algorithms & techniques to solve the complex engineering problems. | Create
CO5 | Evaluate various storage and DM evaluation algorithm plans to optimize implementation cost. | Evaluate
CO6 | Work individually or in teams and demonstrate the solutions to the given exercises through presentation. | Value
PART A (2 Marks)
UNIT - I (Minimum 8 Questions)
1. How does a data warehouse differ from a database? How are they similar? CO1 U
2. Define data warehousing. CO1 U
3. State the issues in data warehousing and data mining. CO1 U
4. List down the major tasks involved in the data cleaning process. CO1 U
5. Apply the OLAP operations for the dimension location (Tamilnadu, Karnataka, and Kerala). CO2 APP
6. Apply the drill-down operation on time Q1 and Q2 for the following multidimensional cube. CO2 APP
7. Apply the snowflake schema for any real-time application with proper description. CO2 APP
8. Apply the Galaxy schema for a college library and draw the proper schema for the above. CO2 APP
UNIT - II (Minimum 8 Questions)
1. Illustrate the OLAP operations and explain the best operation for real-time implementation. CO1 U
2. Differentiate fact and dimension table. CO1 U
3. What is a snowflake schema? CO1 U
4. Differentiate between OLTP and OLAP. CO1 U
5. Apply any one of the multi-dimensional data tools for an engineering problem with proper description. CO2 APP
6. Apply the concept hierarchy for our department with necessary description. CO2 APP
7. Apply the HOLAP process for an engineering problem with proper description. CO2 APP
8. Draw the OLAP operations for any application. CO2 APP
UNIT - III (Minimum 8 Questions)
1. Define data mining. CO1 U
2. What are the steps in the process of knowledge discovery in databases (KDD)? CO1 U
3. Define the data mining task. CO1 U
4. Specify the major tasks involved in data preprocessing. CO1 U
5. List down the major tasks involved in the data cleaning process. CO2 APP
6. State why data preprocessing is an important issue for data warehousing and data mining. CO2 APP
7. State why concept hierarchies are useful in data mining. CO2 APP
8. What is pattern evaluation? Explain with any data set. CO2 APP
UNIT - IV (Minimum 8 Questions)
1. What is Apriori algorithm? CO1 U
2. Define association rule mining. CO1 U
3. What is meant by Lazy Learner? CO1 U
4. What is market basket analysis (MBA) in data mining (candidate generation technique)? CO1 U
5. What factors degrade the performance of the Apriori candidate generation technique? CO2 APP
6. Give a few techniques to improve the efficiency of the Apriori algorithm. CO2 APP
7. List the two interestingness measures of an association rule. CO2 APP
8. Define constraint-based association mining. CO2 APP
UNIT - V (Minimum 8 Questions)
1. Define k-means clustering. CO1 U
2. State the difference between classification and clustering. CO1 U
3. What are the requirements of cluster analysis? CO1 U
4. List the types of data used in cluster analysis. CO1 U
5. Apply Manhattan distance with your own dataset. CO2 APP
6. Classify hierarchical clustering methods. CO2 APP
7. Differentiate agglomerative and divisive approaches of clustering. CO2 APP
8. Compare CLARA and CLARANS. CO2 APP
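For question 5 of this unit, the Manhattan distance between two points is the sum of the absolute differences of their coordinates. The following is a minimal sketch only; the function name `manhattan_distance` and the sample points are assumptions made for illustration, since the question asks for a dataset of your own.

```python
def manhattan_distance(p, q):
    """Return the sum of absolute coordinate differences between p and q."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical sample points, chosen only for illustration.
points = {"A": (1, 2), "B": (4, 6), "C": (0, 0)}

# |1 - 4| + |2 - 6| = 3 + 4 = 7
d_ab = manhattan_distance(points["A"], points["B"])
print("d(A, B) =", d_ab)
```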
PART – B
UNIT - I
1. Suppose that a data warehouse consists of the three dimensions time, doctor, and patient, and the two measures count and charge, where charge is the fee that a doctor charges a patient for a visit. CO2 App (16)
(a) Enumerate three classes of schemas that are popularly used for modeling data warehouses. (8 Marks)
(b) Draw a schema diagram for the above data warehouse using one of the schema classes listed in (a). (8 Marks)
2. Construct a data warehouse for a University / Hospital / Enterprise using Galaxy schemas with necessary description. CO2 App (16)
3. Identify the appropriate data warehousing schema techniques for the result analysis of our SIT Exam cell. CO2 App (16)
4. Identify the appropriate data warehousing techniques for an engineering problem. CO2 App (16)
5. Explain in detail about the 3-Tier Data Warehouse architecture. CO1 U (16)
6. Explain in detail about the Data Warehouse Components with neat diagram. CO1 U (16)
7. Design a data warehouse for a regional weather bureau. The weather bureau has about 1,000 probes, which are scattered throughout various land and ocean locations in the region to collect basic weather data, including air pressure, temperature, and precipitation at each hour. All data are sent to the central station, which has collected such data for over 10 years. Your design should facilitate efficient querying and on-line analytical processing, and derive general weather patterns in multidimensional space. CO2 App (16)
8. Suppose your task as a software engineer at Big-University is to design a data mining system to examine their university course database, which contains the following information: the name, address, and status (e.g., undergraduate or graduate) of each student, the courses taken, and their cumulative grade point average (GPA). Describe the architecture you would choose. What is the purpose of each component of this architecture? CO2 App (16)
9. Describe the architecture you would choose for our SIT Exam cell. What is the purpose of each component of this architecture? CO2 App (16)
10. Apply the 3-tier data warehouse architecture with a real-time example. CO2 App (16)
11. Explain about the data warehouse schema and its types in detail. CO1 U (16)
12. i) Define Concept Hierarchies in detail with neat diagram. (6) CO1 U (16)
ii) Define Schemas and types with neat diagram. (10)
UNIT - II
1. Describe in detail the OLAP operations in the multi-dimensional data model. CO1 U (16)
2. Define Concept Hierarchies in detail with diagrams. CO1 U (16)
3. Apply the OLAP Operational tools / Multi-dimensional data tools for a University or Hospital or Enterprise with necessary description. CO2 App (16)
4. i) List the differences between OLAP and OLTP. (8) CO1 U (16)
ii) What are the four views regarding the design of a data warehouse? (8)
5. i) Differentiate HOLAP vs MOLAP vs ROLAP. (10) CO1 U (16)
ii) With relevant examples discuss the different OLAP operations. (6)
6. Apply the Data warehousing multi-dimensional model for a University or Hospital or Enterprise with necessary description. CO2 App (16)
7. Define the types of OLAP servers and explain their operations in detail. CO1 U (16)
8. Suppose that a data warehouse for Big University consists of the following four dimensions: student, course, semester, and instructor, and two measures count and avg_grade. When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination. CO2 App (16)
(a) Draw a three multi-dimensional data model diagram for the data warehouse. (8 Marks)
(b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each Big University student? (8 Marks)
9. Identify the appropriate OLAP operations for our college result analysis process. CO2 App (16)
10. Illustrate the different schemas for Multidimensional databases. CO1 U (16)
11. i) Differentiate Star schema vs Snowflake schema vs Galaxy schema. (10) CO1 U (16)
ii) With relevant examples discuss the different schema operations. (6)
12. Define OLAP Servers and OLAP operations in detail. CO1 U (16)
UNIT - III
1. Define the data mining functionalities and types of data mining in detail. CO1 U (16)
2. i) Explain in detail about Data mining functionalities. (8) CO1 U (16)
ii) Explain in detail about Interestingness of patterns in data mining. (8)
3. Discuss and define the integration of a Data Mining system with a Data Warehouse. CO1 U (16)
4. i) Describe the various patterns in classification algorithm. (8) CO1 U (16)
ii) Describe the data pre-processing techniques in data mining. (8)
5. i) Apply the data pre-processing techniques in data mining for your own dataset. (8) CO2 App (16)
ii) Identify the various issues in data mining. (8)
6. Apply any of the classification techniques for the Data Mining systems with your own dataset. CO2 App (16)
7. Define Data preprocessing in detail. CO1 U (16)
8. i) Describe the various descriptive statistical measures for data mining. (8) CO1 U (16)
ii) Describe the various data transformation techniques used as data pre-processing techniques. (8)
9. Discuss and define the Classification in Data Mining Systems and explain the various types of classification algorithm. CO1 U (16)
10. Define Data mining with associated algorithms in detail. CO1 U (16)
11. Discuss in detail about the Integration of Data warehousing or Database in Data Mining. CO1 U (16)
12. Define issues in Data mining and Data preprocessing in data mining. CO1 U (16)
UNIT - IV
1. Make use of Apriori algorithm to find the support and confidence from the following transaction table. CO2 App (16)
2. Apply the Apriori algorithm for discovering frequent item sets for mining association rules of the following table. Use 3 for the minimum support value and Confidence of 50%. Illustrate each step of the Apriori algorithm. CO2 App (16)
Trans ID | Items Purchased
101 | milk, bread, eggs
102 | milk, juice
103 | juice, butter
104 | milk, bread, eggs
105 | coffee, eggs
106 | coffee
107 | coffee, juice
108 | milk, bread, cookies, eggs
109 | cookies, butter
110 | milk, bread
3. Write the difference between the various kinds of association rules and explain them with respective diagram. CO1 U (16)
4. Apply FP growth for discovering frequent item sets for mining association rules of the following table. Min. Sup = 2, Min. Conf = 50%. CO2 App (16)
Trans ID | Items Purchased
101 | milk, bread, eggs
102 | milk, juice
103 | juice, butter
104 | milk, bread, eggs
105 | coffee, eggs
106 | coffee
107 | coffee, juice
108 | milk, bread, cookies, eggs
109 | cookies, butter
110 | milk, bread
5. i) Explain how constraint-based associations are trained to perform association. (8) CO1 U (16)
ii) What is Correlation? Explain with Correlation Analysis. (8)
6. Define association rule mining in detail with neat diagram. CO1 U (16)
7. i) Explain multi-level association rules from transaction database. (12) CO1 U (16)
ii) Write the algorithm for mining frequent item sets without candidate generation. (4)
8. Consider a fictional dataset that describes the weather conditions for playing a game of golf. Given the weather conditions, each tuple classifies the conditions as fit ("Yes") or unfit ("No") for playing golf. Here is a tabular representation of our dataset. CO2 App (16)
No. | Outlook | Temperature | Humidity | Windy | Play Golf
1 | Rainy | Hot | High | False | No
2 | Rainy | Hot | High | True | No
3 | Overcast | Hot | High | False | Yes
4 | Sunny | Mild | High | False | Yes
5 | Sunny | Cool | Normal | False | Yes
6 | Sunny | Cool | Normal | True | No
7 | Overcast | Cool | Normal | True | Yes
8 | Rainy | Mild | High | False | No
9 | Rainy | Cool | Normal | False | Yes
10 | Sunny | Mild | Normal | False | Yes
11 | Rainy | Mild | Normal | True | Yes
12 | Overcast | Mild | High | True | Yes
13 | Overcast | Hot | Normal | False | Yes
14 | Sunny | Mild | High | True | No
9. Suppose we have a dataset of weather conditions and the corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. CO2 App (16)
Problem: If the weather is sunny, should the player play or not?
Index | Outlook | Play
0 | Rainy | Yes
1 | Sunny | Yes
2 | Overcast | Yes
3 | Overcast | Yes
4 | Sunny | No
5 | Rainy | Yes
6 | Sunny | Yes
7 | Overcast | Yes
8 | Rainy | No
9 | Sunny | No
10 | Sunny | Yes
11 | Rainy | No
12 | Overcast | Yes
13 | Overcast | Yes
10. Apply Naive Bayes algorithm for the below data set. CO2 App (16)
Problem: If the color is Red, then the item is Stolen or not?
11. i) Explain the different types of classification rules available. (8) CO1 U (16)
ii) What is a lazy learner? Compare it with eager learners and explain with an example. (8)
12. i) Explain how the Bayesian belief networks are trained to perform classification. (8) CO1 U (16)
ii) What is classification? Explain with Back propagation. (8)
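The Apriori steps asked for in question 2 of this unit (minimum support count 3, minimum confidence 50%) can be sketched roughly as follows. This is an illustrative sketch only, not a model answer: the names `support_count`, `apriori` and `association_rules` are assumptions, and the transactions are copied from the table in question 2 with the item "butter" read from the garbled entry for transaction 103.

```python
from itertools import combinations

# Transactions from question 2 (Trans ID -> items purchased).
transactions = {
    101: {"milk", "bread", "eggs"},
    102: {"milk", "juice"},
    103: {"juice", "butter"},
    104: {"milk", "bread", "eggs"},
    105: {"coffee", "eggs"},
    106: {"coffee"},
    107: {"coffee", "juice"},
    108: {"milk", "bread", "cookies", "eggs"},
    109: {"cookies", "butter"},
    110: {"milk", "bread"},
}
MIN_SUPPORT = 3       # minimum support count
MIN_CONFIDENCE = 0.5  # minimum confidence (50%)


def support_count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori():
    """Return a dict mapping each frequent itemset to its support count."""
    all_items = sorted({item for items in transactions.values() for item in items})
    current = [frozenset([i]) for i in all_items
               if support_count(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support_count(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent,
        # then count support against the transactions.
        current = [c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                   and support_count(c) >= MIN_SUPPORT]
    return frequent


def association_rules(frequent):
    """Return (antecedent, consequent, confidence) for rules meeting MIN_CONFIDENCE."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = count / frequent[lhs]
                if confidence >= MIN_CONFIDENCE:
                    rules.append((set(lhs), set(itemset - lhs), confidence))
    return rules


frequent = apriori()
rules = association_rules(frequent)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Each pass of the `while` loop corresponds to one level of the algorithm (L1, L2, L3), so the printed itemsets can be checked against a hand-worked answer level by level.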
UNIT - V
1. i) Explain the different types of data used in cluster analysis. (10) CO1 U (16)
ii) Discuss the use of outlier analysis. (6)
2. What is grid based clustering? With an example explain an algorithm for grid based clustering. CO1 U (16)
3. Choose a dataset of your own and analyze the performance using various clustering techniques. CO2 App (16)
4. Let's say we want to cluster a group of 20 individuals between the ages of 20 and 40. We have collected data on their ages, which are as follows:
25, 22, 28, 36, 32, 23, 27, 30, 31, 29, 33, 24, 26, 34, 37, 38, 21, 35, 39, 40
Our goal is to divide these individuals into two clusters based on their age using the k-means algorithm. CO2 App (16)
5. Consider five points {x1, x2, x3, x4, x5} with the following coordinates as a two-dimensional sample for clustering: x1=(0,2), x2=(1,0), x3=(2,1), x4=(4,1) and x5=(5,3). Illustrate the k-means algorithm on the above data set. The required number of clusters is two, and initially the clusters are formed from a random distribution of samples: c1={x1, x2, x4} and c2={x3, x5}. Compare the cluster results with k-medoids. CO2 App (16)
6. Explain in detail about the method involved in hierarchical clustering and write the difference between the partitioning method and the hierarchical method. CO1 U (16)
7. Compare the hierarchical clustering method with the density based clustering techniques. CO2 App (16)
8. With relevant example discuss constraint based cluster analysis CO1 U (16)
9. i) Select the suitable example to compare and analyze the systematic way of implementing agglomerative and divisive hierarchical clustering. (10) CO2 App (16)
ii) Compare and contrast the CLARA and CLARANS. (6)
10. Explain with an example about the partitioning based methods with neat diagram. CO1 U (16)
11. Apply k-means clustering algorithm on a dataset of 10 two-dimensional points: (3, 5), (4, 4), (3, 4), (4, 5), (6, 8), (7, 9), (6, 9), (8, 8), (10, 10), (9, 9), where k=2. CO2 App (16)
12. Explain in detail about the centroid based techniques (k-means) and object based techniques (k-medoids) with relevant example. CO1 U (16)
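The k-means iterations in question 5 of this unit can be traced with a short sketch like the one below. It starts from the initial clusters given in the question and uses squared Euclidean distance, which is an assumption since the question does not name a distance measure. The function names are hypothetical, empty clusters are not handled, and the k-medoids comparison is left out.

```python
# Points and initial clusters as given in question 5.
points = {"x1": (0, 2), "x2": (1, 0), "x3": (2, 1), "x4": (4, 1), "x5": (5, 3)}
initial_clusters = [["x1", "x2", "x4"], ["x3", "x5"]]


def centroid(names):
    """Mean of the named points (assumes the cluster is not empty)."""
    xs = [points[n][0] for n in names]
    ys = [points[n][1] for n in names]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def k_means(clusters, max_iterations=100):
    """Repeat the update and assignment steps until the clusters stop changing."""
    for _ in range(max_iterations):
        # Update step: recompute each centroid from its current members.
        centroids = [centroid(c) for c in clusters]
        # Assignment step: move every point to its nearest centroid.
        new_clusters = [[] for _ in centroids]
        for name, p in points.items():
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            new_clusters[nearest].append(name)
        if new_clusters == clusters:
            return clusters, centroids
        clusters = new_clusters
    return clusters, [centroid(c) for c in clusters]


final_clusters, final_centroids = k_means(initial_clusters)
print(final_clusters)
print(final_centroids)
```

Starting from the given partition, the first centroids are (5/3, 1) and (3.5, 2); one reassignment moves x3 into the first cluster and x4 into the second, after which the clusters no longer change.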