INTRODUCTION TO
MACHINE LEARNING
PRESENTED BY
J.KOWSY SARA
ASSISTANT PROFESSOR
Before diving into Machine Learning (ML), it's
essential to build a strong foundation in several
key areas. Here’s a structured learning path:
1. Before starting with Machine
Learning, study:
1. Mathematics
2. Programming
3. Data Handling
4. Machine Learning Basics
5. Advanced Topics
1. Before starting with Machine
Learning, study:
MATHEMATICS
1. Before starting with Machine
Learning, study:
PROGRAMMING
1. Before starting with Machine
Learning, study:
Data Handling – Data Cleaning (handling missing values, duplicates, outliers), Feature Engineering
(encoding categorical variables, feature scaling), and Visualization (plotting data to explore
patterns).
1. Before starting with Machine
Learning, study:
Machine Learning Basics
1. Before starting with Machine
Learning, study:
Advanced Topics
4. Machine Learning Basics
WHAT IS MACHINE LEARNING
• A branch of AI that allows computers to learn from data and make predictions without explicit programming.
TYPES OF MACHINE LEARNING
• Supervised Learning – Labeled data used for training (e.g., regression, classification). Example: spam detection in
emails (labeled as spam or not).
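A minimal supervised-learning sketch in Python with scikit-learn; the four emails and their labels below are invented purely for illustration, not a real corpus:

```python
# Minimal supervised-learning sketch: spam detection from labeled emails.
# The tiny "dataset" below is invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["win a free prize now", "meeting at 10am tomorrow",
          "free money click here", "project report attached"]
labels = ["spam", "not spam", "spam", "not spam"]  # the supervision signal

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, labels)                        # learn from labeled data
print(clf.predict(["claim your free prize"]))  # likely -> ['spam']
```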
4. Machine Learning Basics
TYPES OF MACHINE LEARNING
• Unsupervised Learning – No labels; finds hidden patterns (e.g., clustering,
dimensionality reduction). Example: customer segmentation in marketing (grouping users
without labels).
4. Machine Learning Basics
TYPES OF MACHINE LEARNING
• Reinforcement Learning – An agent learns by interacting with the
environment. Example: self-driving cars learning optimal routes by interacting
with the environment.
UNIT – 1 INTRODUCTION TO MACHINE LEARNING
• Review of Linear Algebra for machine learning
• Introduction and motivation for machine learning
• Examples of machine learning applications
• Vapnik-Chervonenkis (VC) dimension
• Probably Approximately Correct (PAC) learning
• Hypothesis spaces
• Inductive bias
• Generalization
• Bias-variance trade-off
1. Review of Linear Algebra for machine
learning
• Linear algebra is the backbone of machine learning.
• It lets us solve and compute with large, complex datasets.
• It underpins handling high-dimensional data, transformations, and optimization techniques.
Data in Linear Algebra:
Scalars, Vectors, Matrices, and Tensors
Operations in Linear Algebra:
Matrix operations:
• Scalar-matrix multiplication
• Matrix-matrix addition, subtraction, and multiplication
Vector-matrix operations:
• Vector-matrix multiplication
• Transpose
• Inverse
1.1 Scalars (Single Values)
• A scalar is a single numerical value. It represents a simple quantity, such
as a temperature reading (e.g., 30°C).
1.2 Vectors (1D Arrays)
• A vector is a 1D array of numbers and represents features or parameters.
Example: Feature Representation
A student's test scores across 3 subjects.
1.3 Matrices (2D Arrays)
• A matrix is a 2D array of numbers and is used to store datasets,
transformation functions, and model parameters.
Example: Dataset Representation
A dataset with 3 students' test scores.
1.4 Tensors (Higher-Dimensional Arrays)
• A tensor is a multi-dimensional array and is used in deep learning to store
inputs, outputs, and gradients.
Example: Image Data Representation
A color image (RGB) with height h, width w, and 3 color channels is
represented as a 3D tensor.
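The four data shapes above map directly onto NumPy arrays. A hedged sketch with example values (the temperature, test scores, and image size are illustrative choices, not from a real dataset):

```python
# Scalars, vectors, matrices, and tensors as NumPy arrays.
# All values here are illustrative.
import numpy as np

scalar = 30.0                      # single value, e.g., 30 °C
vector = np.array([85, 90, 78])    # one student's scores in 3 subjects
matrix = np.array([[85, 90, 78],   # 3 students x 3 subjects
                   [72, 88, 95],
                   [60, 75, 82]])
tensor = np.zeros((64, 64, 3))     # h x w x 3 RGB image tensor

print(scalar, vector.shape, matrix.shape, tensor.shape)
```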
2. Vector and Matrix Operations in Machine
Learning
2.1 Dot Product (Inner Product)
The dot product of two vectors measures similarity.
Example: Feature Weighting in Linear Regression
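A small sketch of the dot product as feature weighting in a linear model; the feature values x, weights w, and bias b below are hypothetical:

```python
# Dot product as feature weighting: the core of linear regression.
# Feature values, weights, and bias are hypothetical.
import numpy as np

x = np.array([2.0, 3.0, 5.0])   # input features
w = np.array([0.4, 0.1, 0.2])   # learned weights
b = 1.0                         # bias term

prediction = np.dot(w, x) + b   # w·x + b
print(prediction)               # 0.8 + 0.3 + 1.0 + 1.0 = 3.1
```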
2. Vector and Matrix Operations in Machine
Learning
2.2 Matrix Multiplication
Multiplying a matrix by a vector transforms data.
Example: Linear Transformation
For matrix A and vector x:
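A quick sketch of that transformation in NumPy; the matrix A and vector x are example values chosen to make the effect easy to read:

```python
# Linear transformation y = A @ x. A and x are example values.
import numpy as np

A = np.array([[2, 0],
              [0, 3]])   # scales the x-axis by 2 and the y-axis by 3
x = np.array([1, 1])

y = A @ x                # matrix-vector multiplication
print(y)                 # [2 3]
```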
2. Vector and Matrix Operations in Machine
Learning
3. Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors are used to understand transformations.
Example: PCA (Principal Component Analysis), a machine learning technique that reduces
the number of dimensions in large datasets.
PCA finds principal components using eigenvectors of the covariance matrix.
• Compute the covariance matrix of the dataset.
• Find its eigenvectors and eigenvalues.
• Select the top k eigenvectors to reduce dimensionality (see the NumPy sketch after the covariance-matrix slide).
2. Vector and Matrix Operations in Machine
Learning
Compute the covariance matrix
(A covariance matrix is a square matrix that shows how much different variables in a dataset change together.)
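Putting the three PCA steps together, a minimal NumPy sketch; the data is randomly generated just for illustration:

```python
# PCA exactly as the steps above describe: covariance matrix ->
# eigenvectors/eigenvalues -> top-k projection. Data is random.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # 100 samples, 3 features
Xc = X - X.mean(axis=0)                  # center the data first

cov = np.cov(Xc, rowvar=False)           # 3x3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices

order = np.argsort(eigvals)[::-1]        # sort by variance explained
k = 2
top_k = eigvecs[:, order[:k]]            # top-k principal directions

X_reduced = Xc @ top_k                   # project 3D data down to 2D
print(X_reduced.shape)                   # (100, 2)
```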
2. Vector and Matrix Operations in Machine
Learning
3. Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors are used to understand transformations.
Example: PCA for Dimensionality Reduction
2. Vector and Matrix Operations in Machine
Learning
4. Gradient Descent and Matrix Calculus
Machine learning models optimize parameters using gradient descent.
Gradient Descent:
• Gradient Descent is a technique used in Machine Learning to find the best solution by
minimizing errors.
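A minimal gradient-descent sketch for a 1D linear model, minimizing mean squared error; the synthetic data, learning rate, and iteration count are illustrative assumptions:

```python
# Gradient descent for 1D linear regression (y ~ w*x + b),
# minimizing mean squared error. Data and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 3 * x + 2 + rng.normal(0, 0.1, 50)   # true w = 3, b = 2, plus noise

w, b, lr = 0.0, 0.0, 0.1
for _ in range(2000):
    err = (w * x + b) - y                # prediction error
    w -= lr * 2 * np.mean(err * x)       # gradient of MSE w.r.t. w
    b -= lr * 2 * np.mean(err)           # gradient of MSE w.r.t. b

print(round(w, 2), round(b, 2))          # should be close to 3 and 2
```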
Importance of Linear Algebra
Data Representation:
• Datasets are represented as matrices or vectors for efficient processing.
• Each row can represent a data point, and each column can represent a feature.
Importance of Linear Algebra
Model Computations:
• Linear transformations (e.g., matrix multiplication) are used in algorithms to make predictions.
• Weight updates in models like linear regression and neural networks involve matrix operations.
Importance of Linear Algebra
Dimensionality Reduction:
Techniques like PCA (Principal Component Analysis) use eigenvectors and eigenvalues to reduce data
dimensions while preserving variance.
• Eigenvectors are special vectors that don’t change
direction when a matrix transformation is applied.
• Eigenvalues are scaling factors that tell how much an
eigenvector is stretched or shrunk.
Introduction and motivation for machine
learning
📌 What is Machine Learning?
• A subset of AI that enables computers to learn from data without explicit
programming.
• Uses statistical techniques to improve performance on tasks over time.
📌 Types of Machine Learning:
• Supervised Learning (e.g., spam detection in emails)
• Unsupervised Learning (e.g., customer segmentation in marketing)
• Reinforcement Learning (e.g., self-driving cars optimizing routes)
Introduction and motivation for machine
learning
Why Machine Learning? (Motivation)
📌 Real-World Applications:
• Healthcare – Disease prediction (e.g., AI-powered cancer diagnosis)
• Finance – Fraud detection in transactions
• Retail – Recommendation systems (e.g., Amazon, Netflix)
• Manufacturing – Predictive maintenance of machines
📌 Key Benefits:
✅ Automates repetitive tasks
✅ Enhances decision-making with data-driven insights
✅ Improves efficiency and accuracy
Motivation for Machine Learning
✅ Automation & Efficiency – Reduces manual effort by automating complex tasks.
✅ Data-Driven Decisions – Extracts valuable insights from large datasets.
✅ Improved Accuracy – Enhances predictions and reduces human errors.
✅ Scalability – Can handle vast amounts of data in real-time.
✅ Personalization – Powers recommendation systems (e.g., Netflix, Amazon).
✅ Solving Complex Problems – Used in healthcare, finance, self-driving cars, and more.
Introduction and motivation for machine
learning
Real-Time Example – Self-Driving Cars
📌 How ML Is Used in Autonomous Vehicles:
• Computer Vision – Identifies pedestrians, traffic signs, and other vehicles.
• Sensor Fusion – Combines data from cameras, radar, and LiDAR for
navigation.
• Decision Making – Uses Reinforcement Learning to optimize driving behavior.
📌 Why Is It Important?
🚗 Reduces human error and accidents
⚡ Increases efficiency in transportation
🌎 Leads to smart, interconnected cities
Examples of machine learning applications
📌 Real-World Examples:
• 🎯 Recommendation Systems – Netflix, YouTube, and Amazon suggest
content/products.
• 💬 Chatbots & Virtual Assistants – Siri, Alexa, and ChatGPT for customer
support.
• 📸 Image Recognition – Facial recognition in smartphones (Face ID).
• 🔎 Search Engines – Google ranks and personalizes search results.
• 🏦 Fraud Detection – Banks use ML to detect fraudulent transactions.
Vapnik-Chervonenkis (VC) dimension
BALANCING BIAS AND VARIANCE
• A low VC dimension means the model is simple and may underfit.
• A high VC dimension means the model is complex and may overfit.
For a linear classifier in 2D, the VC dimension is 3 (it can shatter 3 points, but not 4).
The VC dimension can help in choosing the right algorithm for a machine learning problem!
Vapnik-Chervonenkis (VC) dimension
Definition:
• The VC (Vapnik-Chervonenkis) dimension in Machine Learning is a measure of a model's capacity
to classify data, counting the maximum number of points it can shatter (perfectly separate).
• The VC dimension is the largest number of points that a hypothesis class can shatter.
• The VC dimension is a measure of a model's complexity.
Hypothesis Class:
• A set of functions or decision rules that a learning algorithm can choose from when trying to
classify data.
• Each function in the hypothesis class represents one possible way to assign labels (e.g., + or -) to
input data.
In simple terms, "shatter" in machine learning means a
model’s ability to perfectly classify all possible label
combinations for a set of points.
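To make "shatter" concrete, here is a small empirical sketch: it tries every +/- labeling of a point set and checks whether a linear classifier can fit each one perfectly. The point sets and the hard-margin trick (a linear SVM with very large C) are my own illustrative choices:

```python
# Empirical check of "shattering": can a linear classifier realize
# every labeling of a given point set? Illustrative sketch only.
import itertools
import numpy as np
from sklearn.svm import SVC

def can_shatter(points):
    """True if a (near) hard-margin linear SVM fits every +/- labeling."""
    for labels in itertools.product([0, 1], repeat=len(points)):
        if len(set(labels)) < 2:
            continue  # a constant classifier handles all-same labelings
        clf = SVC(kernel="linear", C=1e6)       # large C ~ hard margin
        clf.fit(points, labels)
        if clf.score(points, labels) < 1.0:     # some labeling not separable
            return False
    return True

three = np.array([[0, 0], [1, 0], [0, 1]])          # non-collinear triple
four  = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])  # XOR-style square

print(can_shatter(three))  # True  -> 3 points can be shattered
print(can_shatter(four))   # False -> the XOR labeling is not linearly separable
```

This matches the claim above: a line can shatter 3 non-collinear points but not 4, so the VC dimension of 2D linear classifiers is 3.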
Vapnik-Chervonenkis (VC) dimension
BALANCING BIAS AND VARIANCE
• Low VC dimension:
A simple model with a low VC dimension (e.g., a linear classifier) may have high bias, meaning it
might underfit the data and miss important patterns.
• High VC dimension:
A complex model with a high VC dimension (e.g., deep neural networks) has more flexibility but
may overfit, leading to high variance.
Overfitting
Model learns too much from the training
data, including noise and irrelevant details.
Underfitting
Model doesn’t learn enough from the
training data, missing key patterns.
Vapnik-Chervonenkis (VC) dimension
Avoiding Overfitting
• The VC dimension can guide you in selecting models with the right level of complexity:
• If the data is simple: choose an algorithm with a lower VC dimension (e.g., logistic
regression or decision trees with limited depth).
• If the data is complex: you may need an algorithm with a higher VC dimension (e.g., deep
learning or support vector machines with non-linear kernels).
Vapnik-Chervonenkis (VC) dimension
Finding VC Dimension (Model Complexity Analysis)
• The VC dimension is often theoretical, but we can approximate model complexity using:
✅ Python + Scikit-Learn – To analyze decision boundaries and model complexity.
✅ MATLAB – Used in research for theoretical VC dimension calculations.
✅ TensorFlow/PyTorch – Can help visualize model complexity and capacity.
🔹 Example:
Using Python to check how a model's complexity (e.g., decision trees, SVMs) affects its performance
and generalization.
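A sketch of the example just described, using decision-tree depth as a rough stand-in for capacity; the synthetic dataset and the depth values are illustrative choices:

```python
# Vary decision-tree depth (a rough proxy for capacity / VC dimension)
# and compare train vs. test accuracy. The synthetic dataset is illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, 10, None):          # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
```

Typically, train accuracy keeps rising with depth while test accuracy plateaus or drops, the overfitting pattern a high-capacity model can show.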
Probably Approximately Correct (PAC)
Learning
What is PAC Learning?
• A framework in machine learning that ensures an algorithm can learn a function with high probability and
low error.
• The goal is to find a hypothesis that is, with high probability (Probably), close to the true function
(Approximately Correct).
• PAC Learning is a way to check whether a machine learning model can make good predictions on new
data with high confidence and few mistakes.
• The model doesn't have to be perfect but should be "probably" correct most of the time.
• It should learn from past data and make good guesses on new data with minimal errors.
Probably Approximately Correct (PAC)
Learning
Real-World Example: Spam Detection
How PAC Learning Helps in Spam Detection:
• An email filter learns from past emails labeled as "spam" or "not spam."
• It doesn't need to be 100% perfect but should be correct most of the time.
• If it marks 98 out of 100 spam emails correctly, it is "probably approximately correct" with a small
error (2 missed spams).
• Over time, as it gets more data, it improves and makes fewer mistakes.
Probably Approximately Correct (PAC)
Learning
Checking PAC Learning (Model Generalization & Error Bounds)
• PAC learning is tested by evaluating model performance on unseen data using:
✅ Scikit-Learn – For train-test splits, cross-validation, and error measurement.
✅ TensorFlow/PyTorch – For deep learning models and checking generalization.
✅ NumPy & StatsModels – To analyze statistical confidence and error bounds.
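A hedged sketch combining these tools: estimate the error on unseen data, then attach a PAC-style confidence bound using Hoeffding's inequality. The dataset, model, and delta = 0.05 are illustrative assumptions:

```python
# Estimate generalization error and a (1 - delta) confidence bound
# in the PAC spirit, via a Hoeffding-style deviation term.
# Dataset, model, and delta are illustrative choices, not from the slides.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
test_error = 1 - model.score(X_te, y_te)      # empirical error on unseen data

delta = 0.05                                  # allow a 5% chance of being wrong
n = len(y_te)
bound = np.sqrt(np.log(1 / delta) / (2 * n))  # Hoeffding deviation term

print(f"test error  = {test_error:.3f}")
print(f"with prob >= {1 - delta}, true error <= {test_error + bound:.3f}")
```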
Introduction to Hypothesis Spaces
📌 What is a Hypothesis Space?
• A hypothesis space is the set of all possible functions (models) a learning algorithm can choose
from to map inputs to outputs.
• Different algorithms have different hypothesis spaces (e.g., decision trees, neural networks, linear
regression).
Introduction to Hypothesis Spaces
Example:
• A linear model assumes data follows a straight line → small hypothesis space (fewer functions to
choose from).
• A neural network can fit complex patterns → large hypothesis space (many possible functions).
Real-World Example:
🏠 House Price Prediction
• A simple linear model assumes price changes linearly with square footage.
• A more complex model (neural network) considers location, crime rate, and market trends.
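A small sketch contrasting a small hypothesis space (linear regression) with a larger one (a small neural network); the synthetic nonlinear data stands in for the house-price idea, and all values are invented:

```python
# Small vs. large hypothesis space on synthetic nonlinear data.
# The data is invented and stands in for the house-price example.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, 200)  # nonlinear pattern

linear = LinearRegression().fit(X, y)              # can only draw straight lines
mlp = MLPRegressor(hidden_layer_sizes=(32, 32),    # can bend to fit the curve
                   max_iter=5000, random_state=0).fit(X, y)

print("linear R^2:", round(linear.score(X, y), 2))  # low: pattern is not linear
print("MLP    R^2:", round(mlp.score(X, y), 2))     # high: larger hypothesis space
```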
Introduction to Hypothesis Spaces
Common Learning Algorithms:
• Linear Regression (for predicting continuous values)
• Decision Trees (for classification)
• Neural Networks (for deep learning tasks)
• K-Means Clustering (for grouping similar data)
• Support Vector Machines (SVM) (for classification problems)
Introduction to Hypothesis Spaces
Practical Applications & Challenges
📌 Where is Hypothesis Space Used?
✔ Medical Diagnosis – Finding disease patterns from symptoms.
✔ Fraud Detection – Identifying unusual financial transactions.
✔ Autonomous Cars – Predicting safe driving actions based on traffic conditions.
📌 Challenges:
⚠ Computational Cost – Large hypothesis spaces need more training time.
⚠ Bias-Variance Trade-off – Picking the right complexity is crucial for performance.
Inductive Bias
What is Inductive Bias?
• Inductive bias refers to the assumptions a learning algorithm makes to generalize from
training data to unseen data.
• Since we never have infinite data, a model needs biases to make reasonable
predictions.
• Inductive bias is the assumption a machine learning model makes to predict new data based
on past learning.
Inductive Bias
Why is it needed?
• A model never sees all possible data in the world.
• To make good predictions, it needs some assumptions about how things work.
Generalization & Bias-Variance Trade-off
What is Generalization?
• Generalization is the ability of a machine learning model to perform well on new, unseen data,
not just on the training data.
• A good model should learn patterns, not just memorize past data.
Generalization & Bias-Variance Trade-off
Example: Handwriting Recognition ✍️
• You train a model using handwriting samples from 100 people.
• A well-generalized model can correctly recognize handwriting from a new person it has never
seen before.
• If the model only memorizes the 100 people's writing styles, it will fail on new handwriting (poor
generalization).
Generalization & Bias-Variance Trade-off
What is the Bias-Variance Trade-off?
📌 Bias and Variance Explained Simply:
✅ High Bias (Underfitting) → The model is too simple and misses patterns in the data.
✅ High Variance (Overfitting) → The model is too complex and memorizes training data instead
of learning general patterns.
✅ Ideal Model → Balanced bias and variance, making good predictions on both seen and
unseen data.
Generalization & Bias-Variance Trade-off
🔹 Example:
📊 Predicting House Prices
A high-bias model (like a straight line) may miss important factors (house size, location).
A high-variance model (overly complex) might memorize noise (temporary price fluctuations)
and fail on new houses.
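A short sketch of this trade-off, fitting a degree-1 (high bias) and a degree-15 (high variance) polynomial to synthetic noisy data; the data-generating function and degrees are illustrative choices:

```python
# Under/overfitting demo: degree-1 (high bias) vs. degree-15 (high
# variance) polynomial fits on synthetic noisy data. Values are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, (80, 1))
y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(0, 0.2, 80)  # noisy pattern

X_tr, X_te, y_tr, y_te = train_test_split(x, y, random_state=0)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(f"degree {degree:>2}: "
          f"train MSE = {mean_squared_error(y_tr, model.predict(X_tr)):.3f}, "
          f"test MSE = {mean_squared_error(y_te, model.predict(X_te)):.3f}")
```

Expect the degree-1 model to show high error everywhere (underfitting) and the degree-15 model to show low train error but noticeably higher test error (overfitting).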