MDI3002 Foundations of Data Science
L T P J C: 3 0 0 0 3
Pre-requisite: NIL
Syllabus version: 1.0
Course Objectives:
1. To provide fundamental knowledge of data science and to understand the role of
statistics and optimization in performing mathematical operations in the field of data
science.
2. To understand the process of handling heterogeneous data and visualize them for
better understanding.
3. To gain fundamental knowledge of various open-source data science tools and
understand how they are applied to solve industrial problems.
Course Outcomes:
1. Acquire fundamental knowledge of data science.
2. Demonstrate proficiency in statistical analysis of data.
3. Develop mathematical knowledge and study various optimization techniques to
perform data science operations.
4. Handle various types of data and visualize them through programming for
knowledge representation.
5. Demonstrate the use of various open-source data science tools to solve real-world problems
through industrial case studies.
Student Learning Outcomes (SLO): 1,5,14
Module:1 Basics of Data Science 5 hours
Introduction; Typology of problems; Importance of linear algebra, statistics and optimization
from a data science perspective; Structured thinking for solving data science problems,
Structured and unstructured data
Module:2 Statistical Foundations 7 hours
Descriptive statistics, Statistical Features, summarizing the data, outlier analysis, Understanding
distributions and plots, Univariate statistical plots and usage, Bivariate and multivariate
statistics, Dimensionality Reduction, Over and Under Sampling, Bayesian Statistics, Statistical
Modeling for data analysis
Module:3 Algorithmic Foundations 8 hours
Linear algebra: Matrices and their properties (determinants, traces, rank, nullity, etc.); Eigenvalues
and eigenvectors; Matrix factorizations; Inner products; Distance measures; Projections; Notion
of hyperplanes; Half-planes; Elementary spectral graph theory. Sampling and VC-dimension -
Random walks and graph sampling, MCMC algorithms, learning, linear and non-linear
separators, PAC learning
Module:4 Optimization 7 hours
Unconstrained optimization; Necessary and sufficient conditions for optima; Gradient descent
methods; Constrained optimization, KKT conditions; Introduction to non-gradient techniques;
Introduction to least squares optimization
Module:5 Programming Foundation and Exploratory Data Analysis 6 hours
Introduction to Python Programming, Types, Expressions and Variables, String Operations,
Selection, Iteration, Data Structures: Strings, Regular Expressions, Lists and Tuples, Dictionaries,
Sets; Exploratory Data Analysis (EDA): Definition, Motivation, Steps in data exploration, The
basic data types, Data Type Portability, Basic Tools of EDA, Data Analytics Lifecycle, Discovery
Module:6 Data Handling and Visualization 6 hours
Data Acquisition, Data Pre-processing and Preparation, Data Quality and Transformation,
Proceedings of the 61st Meeting of the Academic Council [18.02.2021] 292
Handling Text Data; Introduction to data visualization; Visualization workflow: describing the data
visualization workflow; Visualization Periodic Table; Data Abstraction; Task Abstraction;
Analysis: Four Levels for Validation; Data Representation: chart types (categorical,
hierarchical, relational, temporal and spatial)
Module:7 Data Science Tools and Techniques 4 hours
Overview and demonstration of open-source tools such as R, Octave and Scilab. Python libraries:
SciPy, scikit-learn, PyBrain and Pylearn2; Weka.
Module:8 Recent Trends 2 hours
Total Lecture hours 45 hours
Text Books
1. R. V. Hogg, J. W. McKean and A. Craig, Introduction to Mathematical Statistics, 8th Ed.,
Pearson Education India, 2019.
2. Avrim Blum, John Hopcroft, Ravindran Kannan, “Foundations of Data Science”,
Cambridge University Press, 2020.
Reference Books
1. Ani Adhikari and John DeNero, “Computational and Inferential Thinking: The Foundations
of Data Science”, GitBook, 2019.
2. Cathy O’Neil and Rachel Schutt, “Doing Data Science: Straight Talk from the
Frontline”, O’Reilly Media, 2013.
3. Hossein Pishro-Nik, “Introduction to Probability, Statistics, and Random Processes”,
Kappa Research, LLC, 2014.
Mode of Evaluation: CAT / Assignment / Quiz / FAT / Project / Seminar
Recommended by Board of Studies 11-02-2021
Approved by Academic Council No. 61 Date 18-02-2021