Course Outline
Spring 2025
Name: Ms. Sumaira Saeed
Email: [email protected]
Counselling hours: Tues/Thurs 2:30 to 3:30
Course Description
The course introduces students to the fundamentals of data mining theory and algorithms. In addition to
building a strong mathematical foundation, the course puts heavy emphasis on the analysis and mining of
actual data sets using popular data mining tools such as KNIME and Python. The covered topics
include classification (k-nearest neighbors, classification trees, naïve Bayes, random forest),
regression, clustering (k-means, fuzzy c-means, hierarchical clustering), association rules, and text mining.
Feature selection, data cleaning, data transformation, model evaluation, and data visualization are also
covered in sufficient detail.
Course Objectives
• To excite students about the potential that resides in data and the value that data analytics can
add to business processes
• To impart skills related to data cleaning/wrangling, data transformation/preprocessing, and data
comprehension through statistical analysis
• To impart skills related to analytical (mathematical) data modeling
Learning Outcomes
• Thorough knowledge of data-driven decision making in data science and its role in solving
core business problems, illustrated with success stories
• Knowledge of, and practical skills in, data cleaning/wrangling
• Knowledge of, and practical skills in, data transformation/preprocessing
• Practical skills in extracting initial insights from data to facilitate data comprehension (through
hands-on activity)
• Theoretical and mathematical knowledge of standard predictive modeling algorithms (supervised
learning)
• Practical skills in generating predictions from data
Grading Scheme
Midterm: 30
Final Exam: 40
Quiz: 10
Assignments: 15
CP: 5
Course Outline
Week Topics
1 Course Overview, What is Data Mining and its Origin, Typical Data
Mining Tasks, Data Mining Applications/Examples, Data Mining vs.
OLAP, Statistics and Machine Learning
2 CRISP-DM Model, Data Preparation, Data Cleaning, Introduction to
Decision Trees
3 Handling Continuous Variables, Avoiding Overfitting in Decision Trees,
Python Demo of DT
4 Variance-Bias Tradeoff, Receiver Evaluation Metrics
5 Lazy Learner vs. Eager Learner, k-Nearest Neighbor: Pros and Cons,
Hold-out Method vs Cross-Validation
6 ROC curve, Feature Selection and Correlation Analysis through
Hypothesis Testing, Scatterplots
7 Naïve Bayes Classifier, Feature Selection: Filter vs Wrappers, Forward
and Backward Selection
8 Ensemble Methods: Bagging vs Boosting, Working of Random Forest
and AdaBoost
9 Stacking, Revisiting Variance-Bias Tradeoff, Feature Reduction using
Principal Component Analysis (PCA), Python Code
10 Multiple Linear Regression, Regression Diagnostics and Evaluation
11 kNN Regression, Regression Tree and Tree Ensemble Regression
12 Clustering: Agglomerative vs Partitional
13 Association Rule Mining
14 Project Presentations
Reference Books
Principles of Data Mining by Max Bramer (2020)
Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzic (2020)
Data Mining for Business Analytics: Concepts, Techniques and Applications in Python by Galit Shmueli, Peter C. Bruce, Peter Gedeck, and Nitin R. Patel (2020)