Cab112:Introduction To Data Science: Session 2024-25 Page:1/2
CO2 :: perform EDA techniques to understand data distributions, relationships, and patterns
among problems and solutions.
CO3 :: execute machine learning algorithms for regression, classification, and predictive
modeling.
CO4 :: develop a comprehensive data science project from start to finish.
Unit I
Introduction to Data Science and Python for Data Analysis: Foundations of Data Science :
Overview of data science and its importance, The data science process and lifecycle
Python Libraries for Data Science : NumPy for numerical computations, Pandas for data
manipulation and analysis, Matplotlib and Seaborn for data visualization
Data Collection and Cleaning : Data collection techniques, Handling missing data and outliers, Data
transformation and normalization
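The normalization topic above can be illustrated with a minimal NumPy sketch. The function names and the sample values are hypothetical, chosen only for illustration; the Z-score here uses the population standard deviation, which is one of several common conventions.

```python
import numpy as np

def min_max_scale(x):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    if span == 0:
        # A constant feature carries no spread; map it to all zeros.
        return np.zeros_like(x)
    return (x - x.min()) / span

def z_score(x):
    """Standardize to zero mean and unit (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / std

ages = [20, 30, 40, 50, 60]  # hypothetical feature values
scaled = min_max_scale(ages)
standardized = z_score(ages)
print(scaled)
print(standardized)
```

Min-Max scaling keeps the original shape of the distribution but is sensitive to outliers, since a single extreme value sets the range. Z-score scaling is usually preferred when features have very different spreads.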
Unit II
Exploratory Data Analysis and Data Visualization: Exploratory Data Analysis (EDA) :
Descriptive statistics, Data visualization techniques, Correlation and covariance
Advanced Data Visualization : Advanced plotting with Matplotlib and Seaborn, Interactive
visualizations with Plotly and Dash
Unit III
Supervised Learning Algorithms: Regression Algorithms : Linear regression, Polynomial
regression, Ridge and Lasso regression
Classification Algorithms : Logistic regression, k-Nearest Neighbors (k-NN), Decision Trees and
Random Forests, Support Vector Machines (SVM)
Model Evaluation : Metrics for regression: MAE, MSE, RMSE, R², Metrics for classification: accuracy,
precision, recall, F1-score, ROC-AUC, Cross-validation techniques
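The evaluation metrics listed above can be computed directly from their definitions. The following is a sketch under stated assumptions: the function names and sample labels are hypothetical, the classification helper assumes binary labels with 1 as the positive class, and a ratio with a zero denominator is reported as 0.0.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for paired true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    ss_res = np.sum(errors ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return {
        "MAE": np.mean(np.abs(errors)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": 1.0 - ss_res / ss_tot,
    }

def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

reg = regression_metrics([3, 5, 7, 9], [2, 5, 8, 10])
clf = classification_metrics([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1])
print(reg)
print(clf)
```

RMSE is in the same units as the target, which makes it easier to interpret than MSE. Precision and recall matter most when classes are imbalanced and accuracy alone is misleading.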
Unit IV
Unsupervised Learning Algorithms: Clustering Algorithms : k-Means clustering, Hierarchical
clustering, DBSCAN
Dimensionality Reduction : Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor
Embedding (t-SNE)
Association Rule Learning : Apriori algorithm, Eclat algorithm
Unit V
Advanced Topics in Machine Learning: Neural Networks and Deep Learning : Introduction to
neural networks, Deep learning basics with TensorFlow and Keras, Convolutional Neural Networks
(CNNs), Recurrent Neural Networks (RNNs)
Unit VI
Natural Language Processing : Text preprocessing and normalization, Tokenization and
stemming/lemmatization, Bag-of-Words and TF-IDF, Sentiment analysis and text classification
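Tokenization, Bag-of-Words, and TF-IDF can be sketched with the Python standard library alone. This is an illustrative example, not a production pipeline: the helper names and sample sentences are hypothetical, the tokenizer is deliberately naive, and the weighting uses one common textbook variant (term frequency times log(N / document frequency)). Library implementations often apply smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:\"'()") for word in text.lower().split())
    return [t for t in tokens if t]

def bag_of_words(docs):
    """Return one term-count mapping per document."""
    return [Counter(tokenize(doc)) for doc in docs]

def tf_idf(docs):
    """Weight each term by (count / doc length) * log(N / document frequency)."""
    counts = bag_of_words(docs)
    n_docs = len(counts)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())  # each document counts a term once
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append({
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        })
    return weights

docs = [  # hypothetical sample texts
    "The movie was great",
    "The movie was boring",
    "Great acting, great plot",
]
counts = bag_of_words(docs)
weights = tf_idf(docs)
print(counts[2])
print(weights[1])
```

Terms that appear in few documents (such as "boring" here) receive a higher weight than terms shared across documents, which is why TF-IDF features often help text classifiers more than raw counts.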
Practicals
• Perform complex data manipulations such as merging, joining, and group operations on a real-world
dataset.
• Implement matrix operations and linear algebraic computations.
• Visualize data distributions using histograms, box plots, and scatter plots.
• Normalize features using Min-Max scaling and standardize features using Z-score scaling.
• Use advanced imputation techniques (e.g., KNN imputation) and evaluate their impact on the
dataset.
• Implement linear regression, polynomial regression, and support vector regression (SVR).
• Evaluate regression models using metrics such as MAE, MSE, and RMSE.
• Implement Random Forest, AdaBoost, and Gradient Boosting models on a classification problem.
• Build and evaluate a convolutional neural network (CNN) for image classification.
• Implement sentiment analysis on social media data using machine learning algorithms.
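As one possible starting point for the matrix operations practical in the list above, the following NumPy sketch covers a product, determinant, inverse, linear solve, and eigenvalues. The matrix and vector values are hypothetical, chosen so the results are easy to check by hand.

```python
import numpy as np

# Hypothetical 2x2 symmetric matrix and right-hand side vector.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

product = A @ A.T                     # matrix product of A with its transpose
determinant = np.linalg.det(A)        # 2*3 - 1*1
inverse = np.linalg.inv(A)            # exists because the determinant is nonzero
x = np.linalg.solve(A, b)             # solves A x = b without forming the inverse
eigenvalues = np.linalg.eigvalsh(A)   # eigvalsh is valid because A is symmetric

print(product)
print(determinant)
print(x)
print(eigenvalues)
```

Using `np.linalg.solve` is generally preferred over multiplying by the explicit inverse, since it is faster and numerically more stable.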
Text Books:
1. INTRODUCTION TO DATA SCIENCE: PRACTICAL APPROACH WITH R AND PYTHON by B.
UMA MAHESWARI, R. SUJATHA, NA
2. MACHINE LEARNING USING PYTHON by MANARANJAN PRADHAN, U DINESH KUMAR, NA
References:
1. DATA SCIENCE AND MACHINE LEARNING USING PYTHON by REEMA THAREJA, MC GRAW
HILL