UNIVERSITY OF GUJRAT
Hafiz Hayat Campus (Morning)
FINAL TERM EXAMINATIONS
FALL-2023
Course Code : IT-446 Course Title : Data Mining
Section A
Q2 Answer the following Questions.
i) What are overfitted models? Explain their effects on performance.
Ans : Definition: Overfitting occurs when a machine learning model learns the training data too well,
capturing noise or random fluctuations in the data instead of just the underlying patterns.
Causes:
▪ Too complex model architecture with too many parameters.
▪ Too many features relative to the amount of training data.
▪ Lack of regularization techniques (e.g., L1/L2 penalties, dropout, or early stopping).
Effects on Performance:
▪ Poor Generalization: Overfitted models may perform exceptionally well on the training data but
fail to generalize to new, unseen data.
▪ High Variance: Overfitting increases the variance of the model, making it sensitive to small
fluctuations in the training data.
▪ Loss of Interpretability: The model captures noise as if it were meaningful patterns, leading to less
interpretable and less useful models.
▪ Increased Error on Test Data: The model tends to perform poorly on new, unseen data because
it has essentially memorized the training set rather than learning the underlying patterns.
▪ Complex Models: Overfitting often results from overly complex models that capture noise
rather than the true underlying relationships in the data.
▪ Loss of Predictive Power: Overfitted models may not provide accurate predictions or
classifications for real-world scenarios due to their focus on training data specifics.
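The effect described above can be shown with a minimal NumPy sketch (the sine-curve dataset and polynomial degrees are illustrative assumptions, not part of the question): a degree-9 polynomial has enough parameters to memorize 10 noisy training points, driving its training error toward zero while its error on fresh test points from the same curve stays high.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sine curve: true pattern plus random fluctuation."""
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)
    return x, y

x_train, y_train = make_data(10)
x_test, y_test = make_data(100)

def fit_and_mse(degree):
    """Least-squares polynomial fit; return (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

simple = fit_and_mse(3)    # moderate capacity
complex_ = fit_and_mse(9)  # enough parameters to memorize all 10 points

print("degree 3 (train, test):", simple)
print("degree 9 (train, test):", complex_)
```

Because least-squares models of increasing degree are nested, the degree-9 fit always matches the training data at least as well as the degree-3 fit; the gap between its training and test error is the overfitting signature.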
ii) Explain K-Fold-Cross Validation Technique with diagram.
The k-fold cross-validation approach divides the input dataset into K groups of samples of equal size. These
groups are called folds. For each learning run, the prediction function is trained on k-1 folds, and the
remaining fold is used as the test set. This is a very popular CV approach because it is easy to
understand, and its output is less biased than other methods.
The steps for k-fold cross-validation are:
▪ Split the input dataset into K groups (folds).
▪ For each group:
   ▪ Take that group as the reserve or test data set.
   ▪ Use the remaining groups as the training dataset.
   ▪ Fit the model on the training set and evaluate its performance using the test set.
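The steps above can be sketched in plain Python. The "model" here is a trivial mean predictor, an illustrative stand-in just to make the loop concrete; any learner could take its place.

```python
def k_fold_scores(data, k):
    """Split `data` (a list of (x, y) pairs) into k folds; each fold serves
    as the test set once while the other k-1 folds form the training set."""
    folds = [data[i::k] for i in range(k)]  # round-robin split into k groups
    scores = []
    for i in range(k):
        test = folds[i]                     # reserve one fold for testing
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        mean_y = sum(y for _, y in train) / len(train)  # "fit" the model
        mse = sum((y - mean_y) ** 2 for _, y in test) / len(test)
        scores.append(mse)
    return scores                           # one evaluation per fold

data = [(x, 2 * x) for x in range(10)]
print(k_fold_scores(data, 5))
```

Averaging the returned per-fold scores gives the usual cross-validated estimate of model performance.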
iii) Why Binning is used in Data Preprocessing?
Ans : Binning, or discretization, is used in data preprocessing to improve the quality and effectiveness of
data analysis and machine learning models. Its main purposes are:
Quantization and Error Reduction: Binning reduces the impact of minor errors in data by grouping
values into intervals and assigning representative values, aiding in error reduction.
Non-Linearity and Model Performance: Introducing non-linearity through binning can improve model
performance, especially when transforming continuous variables into categorical features.
Overfitting Prevention: Binning helps prevent overfitting by providing a smoother representation of the
data, which is particularly beneficial in small datasets.
Identification of Outliers and Missing Values: Binning can be used to identify outliers and missing values
in the dataset.
Categorical Transformation: It transforms continuous variables into categorical features, enhancing
interpretability and addressing non-linear relationships in the data.
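As an illustration of the error-reduction point, here is a sketch of equal-width binning with smoothing by bin means; the sample values and the choice of three bins are illustrative assumptions.

```python
import numpy as np

def equal_width_bin(values, n_bins):
    """Replace each value with the mean of its equal-width bin,
    smoothing out minor fluctuations."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # np.digitize assigns each value to an interval; clip keeps the
    # maximum value inside the last bin instead of overflowing it.
    idx = np.clip(np.digitize(values, edges) - 1, 0, n_bins - 1)
    return np.array([values[idx == b].mean() for b in idx])

data = [4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34]
print(equal_width_bin(data, 3))
```

Each raw value is replaced by its bin's mean, so small measurement errors within a bin no longer affect downstream analysis.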
iv) What is Clustering? Explain Hierarchical clustering with Example.
Ans : Definition: Clustering is a data analysis technique that involves grouping similar data points
together based on certain characteristics, aiming to discover patterns and structures within a dataset. It
is commonly used in machine learning, exploratory data analysis, and pattern recognition to reveal
inherent groupings within the data.
Hierarchical Clustering (Type: Agglomerative, Bottom-Up)
Definition: Hierarchical clustering is an algorithm that builds a hierarchy of clusters. It starts with
individual data points and progressively merges or divides them.
Example:
Dataset: Consider customer data with features like spending habits and types of products bought.
Steps:
1. Start with Individual Points: Each customer is a separate cluster.
2. Calculate Similarity: Measure similarity based on features.
3. Merge Similar Clusters: Merge the two most similar clusters.
4. Repeat: Iterate until all points are in one cluster.
5. Result: A hierarchical tree (dendrogram) visually representing clusters.
Application: Identifying customer segments for tailored marketing strategies.
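The agglomerative steps above can be sketched with single-linkage merging in NumPy. The customer coordinates (spend, product variety) are invented for illustration; the recorded merge sequence is exactly what a dendrogram draws.

```python
import numpy as np

def agglomerative(points):
    """Single-linkage agglomerative clustering: start with one cluster per
    point, repeatedly merge the two closest clusters, record each merge."""
    clusters = [[i] for i in range(len(points))]   # step 1: singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a][:], clusters[b][:], d))
        clusters[a] = clusters[a] + clusters[b]    # merge the closest pair
        del clusters[b]
    return merges

# Hypothetical customers: (annual spend, distinct product types bought)
pts = np.array([[1.0, 1.0], [1.2, 1.1], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0]])
for left, right, dist in agglomerative(pts):
    print(left, right, round(dist, 2))
```

The nearby pairs merge first at small distances and the outlying customer joins last, mirroring steps 1-5 above.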
Section B
Q. 2 Write Down about your Term project that clearly mentions the Objectives/Goal of the
Project, Feature Extraction/Addition/Deletion Techniques, Machine Learning Algorithms and
Experimental Results.
Ans : Title: Automated Speech Emotion Recognition
Objectives/Goals:
▪ Develop a system capable of accurately recognizing and classifying emotions from spoken
language.
▪ Improve human-computer interaction by enabling machines to understand and respond to
users' emotional states.
▪ Enhance applications such as customer service and virtual assistants with emotion-aware
functionalities.
Feature Extraction/Addition/Deletion Techniques:
▪ Utilize signal processing techniques to extract acoustic features such as pitch, intensity, and
formants from speech signals.
▪ Explore natural language processing (NLP) techniques for extracting linguistic features, including
sentiment-related words and tone.
▪ Investigate the addition of prosodic features like speech rate and pauses to capture emotional
nuances.
▪ Experiment with feature scaling and normalization for better model convergence.
Machine Learning Algorithms:
▪ Employ deep learning models, such as Convolutional Neural Networks (CNNs) and Recurrent
Neural Networks (RNNs), for end-to-end emotion recognition from audio signals.
▪ Combine acoustic and linguistic features using ensemble methods like Stacking or Fusion
models.
▪ Implement transfer learning approaches to leverage pre-trained models on large speech
datasets.
Experimental Results:
▪ Evaluate the model on diverse datasets containing a variety of emotional expressions in speech.
▪ Measure accuracy, precision, recall, and F1 score to assess the model's performance across
different emotion categories.
▪ Conduct user studies to evaluate the system's effectiveness in real-world scenarios.
▪ Showcase the system's potential applications in improving user experience in virtual
environments, customer service applications, and other relevant domains.
OR
Title: Fraud Detection in Financial Transactions
Objectives/Goals:
▪ Develop a robust system for real-time detection of fraudulent activities in financial transactions.
▪ Enhance security measures and protect customers from unauthorized access and financial loss.
▪ Improve the efficiency of fraud detection systems to minimize false positives and negatives.
Feature Extraction/Addition/Deletion Techniques:
▪ Implement feature scaling and normalization to standardize numerical variables in transaction
data.
▪ Explore dimensionality reduction techniques such as Principal Component Analysis (PCA) to
handle high-dimensional data.
▪ Investigate the addition of derived features like transaction frequency, geographical location,
and user behavior patterns.
▪ Experiment with anomaly detection methods to identify irregularities in transaction patterns.
Machine Learning Algorithms:
▪ Utilize supervised learning algorithms, including Logistic Regression and Random Forests, for
binary classification of transactions into fraudulent and non-fraudulent categories.
▪ Implement unsupervised learning algorithms such as Isolation Forest and One-Class SVM for
anomaly detection in transaction data.
▪ Combine multiple models using ensemble techniques like Bagging or Boosting to improve
overall system performance.
Experimental Results:
▪ Evaluate the model on a large and diverse dataset of financial transactions to assess its accuracy
and efficiency.
▪ Measure metrics such as precision, recall, and F1 score to quantify the system's ability to detect
fraudulent activities.
▪ Analyze the system's performance in real-time scenarios to ensure timely and accurate fraud
detection.
▪ Showcase the reduction in financial losses and false alarms achieved through the implemented
fraud detection system.
Q. 3 With the help of the given confusion matrix,

                Predicted (YES)   Predicted (NO)
Actual YES      44 (TP)           36 (FN)
Actual NO       59 (FP)           101 (TN)
Find :
a) F1 Score
b) Accuracy
c) Precision
Solution:
The formulas used to solve this question are:
▪ Precision = TP / (TP + FP)
▪ Recall = TP / (TP + FN)
▪ F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
▪ Accuracy = (TP + TN) / (TP + FP + FN + TN)
From the question, we get the following values:
▪ TP = 44
▪ FP = 59
▪ FN = 36
▪ TN = 101
Note: this video explains other related concepts regarding the confusion matrix:
Link : https://www.youtube.com/watch?si=71QU5QS1dMLWqGEt&v=AyP85ocS-8Y&feature=youtu.be
Putting the values into the formulas gives:
▪ Precision = 44 / (44 + 59) = 44/103 ≈ 0.427
▪ Recall = 44 / (44 + 36) = 44/80 = 0.55
▪ F1 Score = 2 × (0.427 × 0.55) / (0.427 + 0.55) ≈ 0.481
▪ Accuracy = (44 + 101) / 240 = 145/240 ≈ 0.604
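The confusion-matrix arithmetic can be checked in a few lines of Python:

```python
# Counts taken directly from the question's confusion matrix.
TP, FP, FN, TN = 44, 59, 36, 101

precision = TP / (TP + FP)                  # 44 / 103
recall = TP / (TP + FN)                     # 44 / 80
f1 = 2 * precision * recall / (precision + recall)
accuracy = (TP + TN) / (TP + TN + FP + FN)  # 145 / 240

print(round(precision, 3), round(recall, 3), round(f1, 3), round(accuracy, 3))
# precision ≈ 0.427, recall = 0.55, F1 ≈ 0.481, accuracy ≈ 0.604
```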
Q.4 Use the computed conditional probabilities to predict the class label for a test sample
(A=1, B=0, C=0) using the Naïve Bayes classifier.
A B C Q
1 0 1 1
1 1 1 1
0 1 1 0
1 1 0 0
1 0 1 0
0 0 0 1
0 0 0 1
0 0 1 0
Ans :
Step 1: Count class instances
▪ Instances where Q = 1 (positive class): 4
▪ Instances where Q = 0 (negative class): 4
Step 2: Calculate prior probabilities
▪ P(Q=1) = 4/8 = 0.5
▪ P(Q=0) = 4/8 = 0.5
Step 3: Calculate conditional probabilities
▪ For Q = 1: P(A=1|Q=1) = 2/4 = 0.5, P(B=0|Q=1) = 3/4 = 0.75, P(C=0|Q=1) = 2/4 = 0.5
▪ For Q = 0: P(A=1|Q=0) = 2/4 = 0.5, P(B=0|Q=0) = 2/4 = 0.5, P(C=0|Q=0) = 1/4 = 0.25
Step 4: Posterior probabilities for the test sample (A=1, B=0, C=0)
▪ Q = 1: P(Q=1) × P(A=1|Q=1) × P(B=0|Q=1) × P(C=0|Q=1) = 0.5 × 0.5 × 0.75 × 0.5 = 0.09375
▪ Q = 0: P(Q=0) × P(A=1|Q=0) × P(B=0|Q=0) × P(C=0|Q=0) = 0.5 × 0.5 × 0.5 × 0.25 = 0.03125
Step 5: Normalize the probabilities
The normalization constant is the sum of the unnormalized posteriors:
0.09375 + 0.03125 = 0.125
Step 6: Calculate actual probabilities
We obtain the actual probabilities by dividing each posterior by the normalization constant:
▪ P(Q=1 | A=1, B=0, C=0) = 0.09375 / 0.125 = 0.75
▪ P(Q=0 | A=1, B=0, C=0) = 0.03125 / 0.125 = 0.25
These values represent the likelihood of the test sample belonging to each class given the observed
attributes A=1, B=0, C=0. Since 0.75 > 0.25, the predicted class label is Q = 1.
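The hand computation above can be re-derived in Python; the eight training rows and the test sample (A=1, B=0, C=0) come straight from the question's table.

```python
rows = [  # (A, B, C, Q)
    (1, 0, 1, 1), (1, 1, 1, 1), (0, 1, 1, 0), (1, 1, 0, 0),
    (1, 0, 1, 0), (0, 0, 0, 1), (0, 0, 0, 1), (0, 0, 1, 0),
]
test = {"A": 1, "B": 0, "C": 0}

def posterior(q):
    """Unnormalized Naive Bayes posterior: prior times the product of
    per-feature conditional probabilities for class Q = q."""
    subset = [r for r in rows if r[3] == q]
    prior = len(subset) / len(rows)
    likelihood = 1.0
    for i, name in enumerate("ABC"):
        matches = sum(1 for r in subset if r[i] == test[name])
        likelihood *= matches / len(subset)   # P(feature | Q=q)
    return prior * likelihood

p1, p0 = posterior(1), posterior(0)
total = p1 + p0                               # normalizing constant
print(p1, p0, p1 / total, p0 / total)
# p1 = 0.09375, p0 = 0.03125 -> normalized 0.75 vs 0.25, so predict Q = 1
```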