Data Analytics
Certification Program
Detailed Syllabus
Preparatory Session Module 0 (08 hours)
Preparatory Session
A brief introduction to tools related to data
Learn about particular real-time projects and Capstone projects
Data and its impact on career opportunities
Fundamental relevance of projects using data
Role of data in businesses
Significance of data in decision-making
Scope of data in research and development
Utilizing data to enhance industrial operations and management
Data in performance evaluation
Data in customer segmentation
Fundamentals of programming
Types of code editors in Python
Introduction to Anaconda & Jupyter notebook
Flavors of Python
Introduction to Git, GitHub
Python Fundamentals
Source code vs Byte code vs Machine code
Compiler & Interpreter
Memory Management in Python
Fundamentals of Statistics
Mean, Median, Mode
Standard Deviation, Average
Probability, permutations, and combinations
Introduction to Linear Algebra
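As a small illustration of the measures listed under Fundamentals of Statistics, the sketch below computes them with Python's standard library. The sample values and variable names are made up for illustration and are not part of the syllabus.

```python
import math
import statistics

# Hypothetical sample data, chosen only so the results are easy to check by hand.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)        # arithmetic average
median = statistics.median(scores)    # middle value of the sorted data
mode = statistics.mode(scores)        # most frequent value
std_dev = statistics.pstdev(scores)   # population standard deviation

# Ways to arrange 2 items out of 5 (order matters) and to choose 2 out of 5 (order ignored).
arrangements = math.perm(5, 2)
selections = math.comb(5, 2)

print(mean, median, mode, std_dev)
print(arrangements, selections)
```

Here `pstdev` treats the data as a whole population; `statistics.stdev` would be the choice for a sample.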
TERM 1
Program Syllabus
Python Programming Module 1 (50 hours)
Programming Basics & Environment Setup
Installing Anaconda, Anaconda Basics and Introduction
Get familiar with version control, Git and GitHub.
Basic GitHub Commands.
Introduction to Jupyter Notebook environment. Basic Jupyter Notebook Commands.
Programming language basics
Python Programming Overview
Python Overview
Python 2.7 vs Python 3
Writing your First Python Program
Lines and Indentation, Python Identifiers
Various Operators and Operator Precedence
Getting input from User, Comments, Multi-line Comments
Python Data Types
Strings, Decisions & Loop Control
Working With Numbers, Booleans and Strings, String types and formatting, String operations
Simple if Statement, if-else Statement, if-elif Statement.
Introduction to while Loops, for Loops, Using continue and break
Class Hands-on:
6 programs/coding exercises on strings, loops and conditions in classroom
List, Tuples, Dictionaries
Python Lists, Tuples, Dictionaries
Accessing Values, Basic Operations
Indexing, Slicing, and Matrices
Built-in Functions & Methods
Exercises on Lists, Tuples and Dictionaries
Functions And Modules
Introduction To Functions
Defining & Calling Functions
Functions With Multiple Arguments
Anonymous Functions - Lambda
Using Built-In Modules, User-Defined Modules, Module Namespaces, Iterators And Generators
Class Hands-on:
8+ Programs to be covered in class on functions, Lambda, modules, Generators and Packages.
File I/O, Exception Handling and Regular Expressions
Opening and Closing Files
open Function, file Object Attributes
close() Method, Read, write, seek.
Exception Handling, try-finally Clause
Raising an Exception, User-Defined Exceptions
Regular Expression - Search and Replace
Regular Expression Modifiers
Regular Expression Patterns
Class hands-on:
10+ Programs to be covered in class from File I/O, regex and exception handling.
Data Analysis Using NumPy
Introduction to NumPy. Array Creation, Printing Arrays, Basic Operations - Indexing, Slicing and Iterating, Shape Manipulation - Changing shape, stacking and splitting of arrays
Vector stacking, Broadcasting with NumPy, NumPy for Statistical Operations
Data Analysis Using Pandas
Pandas: Introduction to Pandas
Importing data into Python
Pandas DataFrames, Indexing DataFrames, Basic Operations With DataFrames, Renaming Columns, Subsetting and filtering a DataFrame.
Assignment 1 (Week 2):
10 Coding exercises on Python Basics - Variables, Operators, Strings, Loops, Control Statements
Assignment 2 (Week 3):
10 Python programs and practice set on Lists, Tuples, Dictionaries & Matrix operations
Assignment 3 (Week 4):
10 Coding exercises on Functions, Lambda, Input-Output, File and Regular Expression
Data Visualization using Matplotlib
Matplotlib: Introduction, plot(), Controlling Line Properties, Subplot with Functional Method, Multiple Plots, Working with Multiple Figures, Histograms
Data Visualization using Seaborn
Seaborn: Intro to Seaborn and Visualizing statistical relationships, Import and Prepare data. Plotting with categorical data and Visualizing linear relationships.
Seaborn Exercise
Case Study
3 Case Studies on NumPy, Pandas, Matplotlib
1 Case Study on Pandas and Seaborn
Assessment Test in Python:
2-hour Assessment Test in Python (Coding & Objective Questions)
Real-time use cases in Python to be covered in class with 5 assignments
TERM 2
Program Syllabus
Statistics Module - 1 (30 hours)
Fundamentals of Math and Probability
Probability distribution function & cumulative distribution function.
Conditional Probability, Bayes' Theorem
Problem solving for probability assignments
Random Experiments, Mutually Exclusive Events, Joint Events, Dependent & Independent Events
All about Population & Sample
Population vs Sample, Sample Size
Simple Random Sampling, Systematic Sampling, Cluster Sampling, Stratified Sampling, Convenience Sampling, Quota Sampling, Snowball Sampling and Judgement Sampling
Introduction to Statistics, Statistical Thinking
Variable and its types
Quantitative, Categorical, Discrete, Continuous (all with examples)
Five Point Summary and Box Plot
Outliers, Causes of Outliers, How to treat Outliers, IQR Method and Z-Score Method
Descriptive Statistics
Measures of Central Tendency - Mean, Median and Mode
Measures of Dispersion - Standard Deviation, Variance, Range, IQR (Interquartile Range)
Measure of Symmetry/Shape - Skewness and Kurtosis
Inferential Statistics
Central Limit Theorem
Point estimate and Interval estimate
Creating confidence interval for population parameter
Characteristics of Z-distribution and T-Distribution.
Type of test and rejection region.
Type of errors in Hypothesis Testing
Hypothesis Testing
Type of test and Rejection Region
Type of errors - Type 1 Errors, Type 2 Errors. P value method, Z score Method. The Chi-Square Test of Independence. Regression. Factorial Analysis of Variance. Pearson Correlation Coefficients in Depth. Statistical Significance
Null and Alternative Hypothesis, One-tailed and Two-tailed Tests, Critical Value, Rejection region, Inference based on Critical Value
Binomial Distribution: Assumptions of Binomial Distribution, Normal Distribution, Properties of Normal Distribution, Z table, Empirical Rule of Normal Distribution & Central Limit Theorem and its Applications
Linear Algebra
Dot Product, Projecting Point on Axis.
Matrices in Python, Element Indexing, Square Matrix, Triangular Matrix, Diagonal Matrix, Identity Matrix, Addition of Matrices, Scalar Multiplication, Matrix Multiplication, Matrix Transpose, Determinant, Trace
T-Test, Analysis of variance (ANOVA), and Analysis of Covariance (ANCOVA)
Regression analysis in ANOVA
Class Hands-on:
Problem solving for C.L.T
Problem solving for Hypothesis Testing
Problem solving for T-test, Z-score test
Case study and model run for ANOVA, ANCOVA
Data Processing & Exploratory Data Analysis
What is Data Wrangling?
Data Pre-processing and cleaning
How to Restructure the data?
What is Data Integration and Transformation?
EDA
Finding and Dealing with Missing Values.
What are Outliers?
Using Z-scores to Find Outliers.
Bivariate Analysis, Scatter Plots and Heatmaps.
Introduction to Multivariate Analysis
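The first EDA topics above (dealing with missing values, then using z-scores to find outliers) can be sketched as follows. This is a minimal illustration using only the standard library; the function names, the median-fill strategy, the sample readings and the threshold of 2 are all assumptions made for the example, not methods prescribed by the course.

```python
import math
import statistics


def fill_missing_with_median(values):
    """Replace None/NaN entries with the median of the observed values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    median = statistics.median(observed)
    return [median if v is None or math.isnan(v) else v for v in values]


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical readings with one missing entry and one extreme value.
readings = [10, 12, 11, 13, None, 12, 11, 95]
cleaned = fill_missing_with_median(readings)
outliers = zscore_outliers(cleaned)
print(cleaned)
print(outliers)
```

A threshold of 3 is a common convention for larger datasets; with only eight points no z-score can reach 3, which is why the sketch uses 2.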
Note: Problem-Solving Techniques and Case Studies using Statistics will be covered
in class from week 2
Statistics Assignments: 4 practice sets and assignments in total
TERM 2
Program Syllabus
Machine Learning Module - 2 (40 hours)
Machine Learning Introduction
Definition, Examples, Importance of Machine Learning
Definition of ML Elements: Algorithm, Model, Predictor Variable, Response Variable, Training - Test Split, Steps in Machine Learning
Types of ML Models: Supervised Learning, Unsupervised Learning and Reinforcement Learning
Data Preprocessing
Types of Missing values (MCAR, MAR, MNAR), Methods to handle missing values
Outliers, Methods to handle outliers: IQR Method, Z Method
Feature Scaling: Definition, Methods: Absolute Maximum Scaling, Min-Max Scaler, Normalization, Standardization, Robust Scaling
Encoding the data: Definition, Methods: OneHot Encoding, Mean Encoding, Label Encoding, Target Guided Ordinal Encoding
Logistic Regression Model
Definition. Why is it called the “Regression model”?
Sigmoid Function, Transformation & Graph of Sigmoid Function
Evaluation Metrics for Classification model
Confusion Matrix, Accuracy, Misclassification, TPR, FPR, TNR, Precision, Recall, F1 Score, ROC Curve, and AUC.
Using Python library Sklearn to create the Logistic Regression Model and evaluate the model created
K Nearest Neighbours Model
Definition, Steps in KNN Model, Types of Distance: Manhattan Distance, Euclidean Distance, ‘Lazy Learner Model’.
Confusion Matrix of Multi Class Classification
Using Python library Sklearn to create the K Nearest Neighbours Model and evaluate the model
Decision Tree Model
Definition, Basic Terminologies, Tree Splitting Constraints, Splitting Algorithms: CART, C4.5, ID3, CHAID
Splitting Methods: GINI, Entropy, Chi-Square, and Reduction in Variance
Using Python library Sklearn to create the Decision Tree Model and evaluate the model created
Random Forest Model
Ensemble Techniques: Bagging/bootstrapping & Boosting.
Definition of Random Forest, OOB Score
K-Fold Cross-Validation
Hyperparameter Tuning
GridSearchCV, Variable Importance.
Using Python library Sklearn to create the Random Forest Model and evaluate the model created.
Naive Bayes Model
Definition, Advantages, Bayes' Theorem Applicability, Disadvantages of Naive Bayes Model, Laplace's Correction, Types of Classifiers: Gaussian, Multinomial and Bernoulli
Using Python library Sklearn to create the Naive Bayes Model and evaluate the model created
Use cases
Case Study
Business Case Study for CART Model
Business Case Study for Random Forest
Business Case Study for SVM
Classifying an email as spam or not spam using Logistic Regression
Application of Linear Regression for Housing Price Prediction
K Means and Hierarchical Clustering
Definition of Clustering, Use cases of Clustering
K Means Clustering Algorithm, Assumptions of K Means Clustering
Sum of Squares Curve or Elbow Curve
Hierarchical Clustering
Dendrogram, Agglomerative Clustering, Divisive Clustering, Comparison of K Means Clustering and Hierarchical Clustering
Using Python library Sklearn to create and evaluate the clustering model
Principal Component Analysis (PCA)
Definition, Curse of Dimensionality, Dimensionality Reduction Technique, When to use PCA, Use Cases
Steps in PCA, Eigenvalues and Eigenvectors, Scree Plot.
Using Python library Sklearn to create Principal Components
Support Vector Machine (SVM)
Model: Definition, Use Cases, Kernel Function, Aim of Support Vectors, Hyperplane, Gamma Value, Regularization Parameter
Using Python library Sklearn to create and evaluate the SVM Model
Summary of all Machine Learning Models and Discussion about the Capstone
Project
Note: All Machine Learning algorithms are covered in depth with a real-time case
study for each algorithm. Once 60% of the ML module is completed, the Capstone Project will be
released for the batch.
CASE STUDY
Recommendation Engine for e-commerce/retail chain
Twitter data analysis using NLP
TERM 3
Program Syllabus
SQL Module - 1 (14 hours)
SQL and RDBMS
RDBMS and SQL Operations.
Single Table Queries - SELECT, WHERE, ORDER BY, DISTINCT, AND, OR
Multiple Table Queries: INNER, SELF, CROSS, and OUTER Join, Left Join, Right Join, Full Join, Union
Advanced SQL
Advanced SQL Operations
Data Aggregations and summarizing the data
Ranking Functions: Top-N Analysis
Advanced SQL Queries for Analytics
NoSQL, HBase & MongoDB
NoSQL Databases
Introduction to HBase
HBase Architecture, HBase Components, Storage Model of HBase
HBase vs RDBMS
Introduction to MongoDB, CRUD
Advantages of MongoDB over RDBMS
JSON Data & CRUD
Basics and CRUD Operation
Databases, Collection & Documents
Shell & MongoDB drivers
What is JSON Data
Create, Read, Update, Delete
Finding, Deleting, Updating, Inserting Elements
Working with Arrays
Understanding Schemas and Relations
Programming with SQL
Mathematical Functions
Variables
Conditional Logic
Loops
Custom Functions
Grouping and Ordering
Partitioning
Filtering Data
Subqueries
Assignments
Working with multiple tables
Practice Joins, Grouping and Subqueries
Using GROUP BY and HAVING Clauses
Practice Aggregation Queries
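To give a flavour of the assignment topics above (joins across multiple tables, aggregation, GROUP BY and HAVING), here is a small sketch run from Python against an in-memory SQLite database. The `customers` and `orders` tables, their columns and their rows are hypothetical and exist only for this example; the course may use a different database engine.

```python
import sqlite3

# In-memory database so the example leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO orders VALUES
    (1, 1, 120.0), (2, 1, 80.0), (3, 2, 40.0), (4, 3, 300.0), (5, 3, 50.0);
""")

# Join the two tables, aggregate per customer, and keep only customers
# whose total spend exceeds 100 (HAVING filters after grouping).
rows = conn.execute("""
SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
FROM customers AS c
INNER JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.name
HAVING SUM(o.amount) > 100
ORDER BY total DESC
""").fetchall()

for name, order_count, total in rows:
    print(name, order_count, total)
conn.close()
```

WHERE would filter individual order rows before grouping, whereas HAVING filters the grouped totals, which is why the customer with a single 40.0 order is dropped only after aggregation.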
TERM 3
Program Syllabus
MongoDB Module - 2 (14 hours)
Introduction to MongoDB
What is MongoDB
Characteristics and Features
MongoDB Ecosystem
Installation process
Connecting to MongoDB database
Introduction to NoSQL
Introduction of MongoDB module
What are Object Ids in MongoDB
Understanding Create, Read, Update, Delete
Schemas & Relations
Document Structure
Working with Numeric Data
MongoDB (Advanced)
MongoDB Use cases
MongoDB Structures
MongoDB Shell vs MongoDB Server
Data Formats in MongoDB
MongoDB Aggregation Framework
Aggregating Documents
Working with MongoDB Compass & exploring data visually
Assignment
Working on Schema Design
Obtain the data in the format you want by formulating queries that are both effective and high-performing.
TERM 3
Program Syllabus
Tableau Module - 3 (14 hours)
Introduction to Tableau
Connecting to data source
Creating dashboard pages
How to create calculated columns
Different charts
Visual Analytics
Getting Started With Visual Analytics
Sorting and grouping
Working with sets, set action
Filters: Ways to filter, Interactive Filters
Forecasting and Clustering
Dashboard and Stories
Working in Views with Dashboards and Stories
Working with Sheets
Fitting Sheets
Legends and Quick Filters
Tiled and Floating Layouts, Floating Objects
Tableau (Advanced)
Mapping
Coordinate points
Plotting Latitude and Longitude
Custom Geocoding
Polygon Maps
WMS and Background Image
Hands-on Assignments
Connecting data source and data cleansing
Working with various charts
Deployment of Predictive model in visualization
TERM 3
Program Syllabus
PowerBI Module - 4 (14 hours)
Getting Started With Power BI
Installing Power BI Desktop and Connecting to Data
Overview of the Workflow in Power BI Desktop
Introducing the Different Views of the Data Model
Query Editor Interface
Working on Data Model
Programming with Power BI
Working with Time Series
Understanding aggregation and granularity
Filters and Slicers in Power BI Maps
Scatterplots and BI Reports
Connecting Dataset with Power BI
Creating a Customer Segmentation Dashboard
Analyzing the Customer Segmentation Dashboard
Assignments
Create Bar charts
Create Pie charts
Create Tree maps
Create Donut Charts
Create Waterfall Diagrams
Creating Table Calculations for Gender
TERM 3
Program Syllabus
Big Data & Spark Analytics Module - 5 (16 hours)
Introduction To Hadoop & Big Data
Distributed Architecture - A Brief Overview. Understanding Big Data
Introduction To Hadoop, Hadoop Architecture
HDFS, Overview of MapReduce Framework
Hadoop Master-Slave Architecture
MapReduce Architecture
Use cases of MapReduce
What is Spark
Introduction to Spark RDD
Introduction to Spark SQL and DataFrames
Using R-Spark for machine learning
Hands-on:
Installation and configuration of Spark
Using R-Spark for machine learning programming
Hands-on
MapReduce Use Case 1: YouTube data analysis
MapReduce Use Case 2: Uber data analytics
Spark RDD programming
Spark SQL and DataFrame programming
TERM 3
Program Syllabus
Time Series Module - 6 (14 hours)
Introduction to Time Series Forecasting
Basics of Time Series Analysis and Forecasting
Method Selection in Forecasting
Moving Average (MA) Forecast Example
Different Components of Time Series Data
Log Based Differencing, Linear Regression for Detrending
Introduction to ARIMA Models
ARIMA Model Calculations, Manual ARIMA Parameter Selection
ARIMA with Explanatory Variables
Understanding Multivariate Time Series and their Structure
Checking for Stationarity and Differencing the MTS
CASE STUDY
Time series classification of smartphone data to predict user behavior
Performing Time Series Analysis on Stock Prices
Time series forecasting of sales data
Note: All the assignments and case studies will be covered in-depth with real-time
examples
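For readers new to the Time Series module, the two simplest ideas it lists, a moving average forecast and differencing, can be sketched in a few lines of plain Python. The function names and the sales figures are hypothetical; the forecast shown is the simple rolling-mean kind, which is assumed here and is not the same thing as the MA term inside an ARIMA model.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window


def difference(series, lag=1):
    """Return the lag-differenced series, a common first step toward stationarity."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Hypothetical monthly sales with a steady upward trend.
monthly_sales = [100, 110, 120, 130, 140, 150]
forecast = moving_average_forecast(monthly_sales, window=3)
changes = difference(monthly_sales)
print(forecast)
print(changes)
```

On trending data like this the rolling mean lags behind the trend, while the differenced series is constant, which is the intuition behind differencing before fitting ARIMA-style models.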
TERM 4
Program Syllabus
Excel Essentials (30 hours)
Getting started with Excel
Creating a New Workbook
Navigating in Excel
Moving the Cell Pointer
Using Excel Menus
Using Excel Toolbars: Hiding, Displaying, and Moving Toolbars
Entering Values in a Worksheet and Selecting a Cell Range
Previewing and Printing a Worksheet
Saving a Workbook & Re-opening a saved workbook
Creating Headers, Footers, and Page Numbers
Adjusting Page Margins and Orientation
Adding Print Titles and Gridlines, rows to repeat at top of each page
Inserting Page Breaks
Advanced Printing Options
Switching Between Sheets in a Workbook
Inserting and Deleting Worksheets
Renaming and Moving Worksheets
Protecting a Workbook
Hiding Columns, Rows and Sheets
Splitting and Freezing a Window
Formatting Fonts & Values
Adjusting Row Height and Column Width
Changing Cell Alignment
Adding Borders
Applying Colors and Patterns
Using the Format Painter
Merging Cells, Rotating Text
Using Auto Fill
Entering Date Values and using AutoComplete
Editing, Clearing, and Replacing Cell Contents
Cutting, Copying, and Pasting Cells
Moving and Copying Cells with Drag and Drop
Collecting and Pasting Multiple Items
Using the Paste Special Command
Inserting and Deleting Cells, Rows, and Columns
Using Undo, Redo, and Repeat
Checking Your Spelling
Finding and Replacing Information
Inserting Cell Comments
Creating a basic Formula
Cell Referencing
Calculating Value Totals with AutoSum
Editing & Copying Formulas
Fixing Errors in Your Formulas
Formulas with Several Operators
Cell Ranges
Conditional Formatting
Working with the Forms Menu
Creating & Working with Charts
Creating a Chart
Moving and Resizing a Chart
Formatting and Editing Objects in a Chart
Changing a Chart's Source Data
Changing a Chart Type and Working with Pie Charts
Adding Titles, Gridlines, and a Data Table
Formatting a Data Series and Chart Axis
Using Fill Effects
Data Analysis & Pivot Tables
Sorting, Subtotaling & Filtering Data
Copy & Paste Filtered Records
Using Data Validation
Creating a PivotTable
Specifying the Data a PivotTable Analyzes
Changing a PivotTable's Calculation
Selecting What Appears in a PivotTable
Grouping Dates in a PivotTable
Updating a PivotTable
Formatting and Charting a PivotTable
Automating Tasks with Macros
Recording a Macro
Playing a Macro and Assigning a Macro Shortcut Key
BONUS MODULE
Program Syllabus
AI Generative Tools and Future Trends
Emerging Trends in AI and Generative Modeling
Exploring other AI generative tools beyond ChatGPT and DALL·E
Overview of Midjourney
Discussion on future trends and advancements in AI generative tools
Open-ended project and/or presentation on a selected topic, incorporating learned concepts
Natural Language Processing and ChatGPT
Introduction to natural language processing techniques
Understanding ChatGPT and its architecture
Hands-on exercises using ChatGPT for text generation and completion tasks
Fine-tuning ChatGPT for specific applications
DALL·E: Image Generation with AI
Introduction to DALL·E and its capabilities
Exploring image generation using DALL·E
Hands-on exercises for creating unique images with DALL·E
Ethical considerations and limitations of AI-generated images
Graph Neural Networks (GNN) for Data Analysis
Introduction to graph theory and its relevance in data analysis
Overview of Graph Neural Networks (GNN) and their applications
Hands-on exercises using GNN for tasks such as node classification and link prediction
Case studies on real-world applications of GNN in data science
CASE STUDY
Midjourney
DALL·E
BONUS MODULE
TERM 1
Program Syllabus
Adv. Gen-AI
Python Bootcamp for AI
Python Essentials: Syntax, Data Types, and Variables
Flow Control: Conditionals and Loops
Functions and Custom Modules
Data Handling with Pandas
Linux Basics and Environment Setup
Assessment: MCQ and Mini-Project.
Build Your Interview Assistant
Project Overview: Interview Automation Bot
Components & Architecture
Large Language Models (LLMs): Introduction and Uses
GPT-3 Deep Dive: Attention, Transformers, RL
Interview Prompt Design
Evaluation Metrics and Performance Tuning
Speech Integration using Whisper
Deployment with Flask
Assessment: MCQ and Project.
Large Language Models (LLMs)
Historical Overview of NLP: From Rule-Based Systems to Machine Learning.
Evolution of Neural Network Architectures in NLP.
Milestones in NLP: Key Models and Breakthroughs leading to LLMs.
The Rise of Transformer Models and their Impact on NLP.
Visual AI for eCommerce
Introduction: Digital Transformation for Offline Businesses
Multimodal Models: DALL-E and Beyond
Style & Photography Principles for Visual AI
Designing Image Prompts
Standardizing Product Image Generation
Image and Text Synchronization
Assessment: MCQ and Project.
Intelligent News Aggregator
Project Outline: Personalized News Recommendation
GPT-3 & Copilot for Code Automation
Data Loading and Cleaning Techniques
Generating Data Analysis Code with Prompts
Model Development for Content Recommendation
Assessment: MCQ and Mini-Project.
Customer Support Bot - HelpMate Pro
Project Introduction and Components
Embeddings vs Fine-Tuning: When and How
Semantic Search in Customer Service
Query Answering with Vectorstore
Scaling with Pinecone
UI Improvements with GPT Models
Assessment: MCQ and Project.
Knowledge Discovery Bot
Project Overview and Architecture
LangChain Tools and Concepts
Backend Development with Vectorstore
Intelligent Indexing and Search
Connecting Components with LangChain Chains
User Feedback and Continuous Improvements
Assessment: MCQ and Mini-Project.
Azure OpenAI Integration
OpenAI on Azure: Services and Scalability
Revisiting HelpMate Pro: Scaling Strategy
UI/UX Best Practices for Bots
Azure OpenAI Services in Action
Assessment: MCQ and Mini-Project.
The Future & Ethics of Generative AI
Responsible AI: Bias and Fairness
Future Trends: Multimodal Models and RLHF
Closing Remarks
Assessment: MCQ
Capstone Project (3 Weeks)
Building an Integrated Prompt Engineering Solution
Project Submission and Peer Review
Contact Us
Call us at
+91 77956 87988
www.learnbay.co