Unit-3-ML
What is ML?
• As per the 1959 definition of Arthur Samuel, machine
learning can be defined as the process of inputting data
to computer systems in such a way that the computer
learns to process and perform the activity in the future
without being explicitly programmed or being fed
similar or extra data.
• He developed a checkers-playing program
• Robert Nealey, the self-proclaimed checkers master,
played the game against an IBM 7094 computer in 1962,
and he lost to the computer
• Definition:
• Machine learning is a field of artificial intelligence that allows
systems to learn and improve from experience without being
explicitly programmed
Types of ML
• Supervised Machine Learning
• Learn from the given right answers
• Maps the input (x) onto output (y)
Types of supervised Machine learning
Algorithms: Regression and Classification
• Regression
• Regression algorithms are used if there is a
relationship between the input variable and the
output variable. It is used for the prediction of
continuous variables, such as Weather forecasting,
Market Trends, etc. Below are some popular
Regression algorithms which come under
supervised learning:
Types of supervised Machine learning
Algorithms:
Classification
Classification predicts Categories
Classifiers predict from a small, finite set of possible outputs
(categories); they do not output arbitrary continuous values such as 0.5 or 0.7
Types of supervised Machine learning
Algorithms:
• Classification algorithms are used when the output
variable is categorical, which means there are two or more
classes such as Yes-No, Male-Female, True-False, etc.
Popular classification algorithms include the following
(a minimal sketch follows the list):
• K-NN
• Decision Trees
• Logistic Regression
• Support vector Machines
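A minimal classification sketch with scikit-learn, for illustration only; the built-in iris dataset and the choice of logistic regression are assumptions, not part of these notes:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Labeled data: X holds the inputs, y the "right answers" (class labels)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)  # one of the classifiers listed above
clf.fit(X_train, y_train)                # learn the mapping x -> y
print(clf.predict(X_test[:5]))           # outputs discrete categories, not values like 0.5
print(clf.score(X_test, y_test))         # fraction of correct predictions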
Unsupervised Machine Learning
•Unsupervised learning is helpful for finding useful
insights from unlabeled data. Common unsupervised
algorithms include (see the sketch after this list):
• K-means clustering
• Hierarchical clustering
• Anomaly detection
• Neural Networks
• Principal Component Analysis
• Independent Component Analysis
• Apriori algorithm
• Singular value decomposition
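As an illustrative sketch (the toy points are made up), K-means groups unlabeled data into clusters without being given any right answers:

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])      # no labels provided
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignments discovered from the data
print(kmeans.cluster_centers_)  # centroid of each group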
Supervised Vs Unsupervised Learning

| Supervised Learning | Unsupervised Learning |
| Supervised learning algorithms are trained using labeled data. | Unsupervised learning algorithms are trained using unlabeled data. |
| The model takes direct feedback to check if it is predicting the correct output or not. | The model does not take any feedback. |
| The model predicts the output. | The model finds the hidden patterns in data. |
| Input data is provided to the model along with the output. | Only input data is provided to the model. |
| The goal is to train the model so that it can predict the output when it is given new data. | The goal is to find the hidden patterns and useful insights from the unknown dataset. |
| Needs supervision to train the model. | Does not need any supervision to train the model. |
| Produces an accurate result. | May give a less accurate result as compared to supervised learning. |
| Not close to true Artificial Intelligence, as we first train the model for each datum and only then can it predict the correct output. | Closer to true Artificial Intelligence, as it learns similarly to how a child learns daily routine things from experience. |
| Includes algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, Multi-class Classification, Decision Tree, Bayesian Logic, etc. | Includes algorithms such as Clustering (e.g., K-means) and the Apriori algorithm. |
Reinforcement Learning (Learns from
Mistakes)
• RL falls between supervised and unsupervised learning
• Unlike supervised learning, where the feedback provided to the agent
is the correct set of actions for performing a task, reinforcement learning
uses rewards and punishments as signals for positive and negative
behavior
• As compared to unsupervised learning, reinforcement learning differs
in terms of goals. While the goal in unsupervised learning is to find
similarities and differences between data points, in reinforcement
learning the goal is to find a suitable action model that maximizes the
total cumulative reward of the agent.
[Figure: action-reward feedback loop between a generic RL agent and its environment]
• The most commonly used RL algorithms are (a sketch of the shared
update idea follows):
• Deep Q Networks
• SARSA (State-Action-Reward-State-Action)
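As a minimal sketch of the reward-driven idea behind these algorithms, the SARSA update below adjusts a table of action values from one (state, action, reward, next state, next action) transition; the tiny state space and the numbers are hypothetical:

import numpy as np

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))  # action-value table, one entry per (state, action)
alpha, gamma = 0.1, 0.9              # learning rate and discount factor

# One hypothetical transition: in state 0 the agent took action 1,
# received reward 1.0, landed in state 2, and chose action 0 next.
s, a, r, s_next, a_next = 0, 1, 1.0, 2, 0
Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])  # SARSA update rule
print(Q)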
Importance of ML
• To find relationships in data
• Helps to make data-driven decisions
• Prediction of future outcomes
Applications
Steps in ML
• Identify the Problem statement
• Data Collection
• Data Cleaning
• Building ML Models
• Improving ML models
Data Collection
• This step includes the below tasks:
• Identify various data sources
• Collect data
• Integrate the data obtained from different sources
• By performing the above tasks, we get a coherent set of data, also
called a dataset, which will be used in further steps.
Data preparation
• Data exploration:
It is used to understand the nature of data that we have to work with.
We need to understand the characteristics, format, and quality of
data.
A better understanding of data leads to an effective outcome. In this
step we look for correlations, general trends, and outliers (see the
pandas sketch below).
• Data pre-processing:
Now the next step is preprocessing of data for its analysis.
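A sketch of the exploration step above with pandas; the file name is a placeholder for whatever dataset is being prepared:

import pandas as pd

df = pd.read_csv("data.csv")        # placeholder dataset
print(df.head())                    # format and a sample of the data
print(df.describe())                # summary statistics; extreme max/min values hint at outliers
print(df.corr(numeric_only=True))   # pairwise correlations between numeric columns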
Data Wrangling
• Data wrangling is the process of cleaning and converting raw data into a useable
format. It is the process of cleaning the data, selecting the variable to use, and
transforming the data in a proper format to make it more suitable for analysis in
the next step.
• In real-world applications, collected data may have various issues, including:
• Missing Values
• Duplicate data
• Invalid data
• Noise
• So, we use various filtering techniques to clean the data.
• It is mandatory to detect and remove the above issues because they can
negatively affect the quality of the outcome.
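A sketch of the filtering techniques mentioned above, using pandas; the file and column names are hypothetical:

import pandas as pd

df = pd.read_csv("data.csv")                       # placeholder dataset
df = df.drop_duplicates()                          # duplicate data
df = df.dropna(subset=["label"])                   # rows missing the target (hypothetical column)
df["age"] = df["age"].fillna(df["age"].median())   # impute missing values (hypothetical column)
df = df[df["age"].between(0, 120)]                 # remove invalid values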
Data Analysis
• Now the cleaned and prepared data is passed on to the analysis step.
This step involves:
• Selection of analytical techniques
• Building models
• Review the result
• We select machine learning techniques such
as Classification, Regression, Cluster analysis, Association, etc., then
build the model using the prepared data and evaluate the model.
Train Model
We use datasets to train the model using various machine learning
algorithms. Training a model is required so that it can understand the
various patterns, rules, and features.
Test Model
• Testing the model determines the percentage accuracy of the model
as per the requirements of the project or problem (a minimal sketch follows).
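A sketch of the train/test cycle, assuming scikit-learn and its built-in iris data for illustration:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = KNeighborsClassifier().fit(X_train, y_train)   # train on one part of the data
y_pred = model.predict(X_test)                         # test on unseen data
print(f"accuracy: {accuracy_score(y_test, y_pred) * 100:.1f}%")  # percentage accuracy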
Deployment
• If the above-prepared model is producing an accurate result as per
our requirement with acceptable speed, then we deploy the model in
the real system.
• But before deploying the project, we will check whether it is
improving its performance using available data or not.
• The deployment phase is similar to making the final report for a
project.
Regression applications
• Evaluating trends and sales estimates
• Analyse pricing elasticity
• Assess risk in an insurance company
• Sports analysis
• Predicting age of a person
• Predicting house price based on area
• Predict the number of copies of a music album that will sell next month
Linear Regression Model
Types of Regression Models
• Simple (1 explanatory variable): Linear or Non-Linear
• Multiple (2+ explanatory variables): Linear or Non-Linear
Linear Regression Model
Relationship between variables is a linear function:

$y = \beta_0 + \beta_1 x + \varepsilon$

• y: dependent (response) variable
• x: independent (explanatory) variable
• $\beta_1$: slope = change in y / change in x
• $\beta_0$: y-intercept
[Figure: the line of means plotted against x]
Linear Regression Model
y yi = 0 + 1 xi + i Observed
value
i = Random error
E ( y ) = 0 + 1 x
x
Observed value
Sample Linear Regression Model
[Scatterplot of sample (x, y) data with the fitted sample regression line]
Thinking Challenge
[Scatterplot of sample (x, y) data]
Least Squares
• 'Best fit' means the differences between the actual y values and the
predicted ŷ values are a minimum
• But positive differences offset negative ones, so we work with squared
differences:

$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$

• Least Squares minimizes the Sum of the Squared Errors (SSE)
Least Squares Graphically

LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$

[Figure: four data points with their vertical deviations $\hat{\varepsilon}_i$ from the fitted line]
Slope:

$\hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$

Intercept: $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

The required sums are conveniently organized in a table:

| xᵢ | yᵢ | xᵢ² | yᵢ² | xᵢyᵢ |
| x₁ | y₁ | x₁² | y₁² | x₁y₁ |
| x₂ | y₂ | x₂² | y₂² | x₂y₂ |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| xₙ | yₙ | xₙ² | yₙ² | xₙyₙ |
| Σxᵢ | Σyᵢ | Σxᵢ² | Σyᵢ² | Σxᵢyᵢ |
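A small sketch computing the slope and intercept directly from these column sums (plain Python, no libraries assumed):

def least_squares(x, y):
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)  # SSxy / SSxx
    b0 = sum_y / n - b1 * sum_x / n                                # ybar - b1 * xbar
    return b0, b1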
Interpretation of Coefficients
1. Slope ($\hat{\beta}_1$)
• Estimated y changes by $\hat{\beta}_1$ for each 1-unit increase in x
- If $\hat{\beta}_1$ = 2, then Sales (y) is expected to increase by 2
for each 1-unit increase in Advertising (x)
2. Y-Intercept ($\hat{\beta}_0$)
• Average value of y when x = 0
- If $\hat{\beta}_0$ = 4, then average Sales (y) is expected to be
4 when Advertising (x) is 0
Least Squares Example
You’re a marketing analyst for Hasbro Toys.
You gather the following data:
Ad $ Sales (Units)
1 1
2 1
3 2
4 2
5 4
Find the least squares line relating
sales and advertising.
Scattergram
Sales vs. Advertising
[Scatterplot: Sales (units) against Advertising ($)]
Parameter Estimation Solution Table

| xᵢ | yᵢ | xᵢ² | yᵢ² | xᵢyᵢ |
| 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 4 | 1 | 2 |
| 3 | 2 | 9 | 4 | 6 |
| 4 | 2 | 16 | 4 | 8 |
| 5 | 4 | 25 | 16 | 20 |
| Σ = 15 | Σ = 10 | Σ = 55 | Σ = 26 | Σ = 37 |
Parameter Estimation Solution

$\hat{\beta}_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{37 - \frac{(15)(10)}{5}}{55 - \frac{(15)^2}{5}} = \frac{7}{10} = .70$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 2 - (.70)(3) = -.1$

$\hat{y} = -.1 + .7x$
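Checking the hand calculation with the least_squares sketch defined earlier:

x = [1, 2, 3, 4, 5]         # Advertising
y = [1, 1, 2, 2, 4]         # Sales
print(least_squares(x, y))  # (-0.1, 0.7), i.e. y^ = -.1 + .7x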
Parameter Estimation Computer Output
[Regression software output showing the parameter estimates]

$\hat{y} = -.1 + .7x$
Coefficient Interpretation Solution
1. Slope ($\hat{\beta}_1$)
• Sales Volume (y) is expected to increase by .7
units for each $1 increase in Advertising (x)
2. Y-Intercept ($\hat{\beta}_0$)
• Average value of Sales Volume (y) is -.1 units
when Advertising (x) is 0
- Difficult to explain to a marketing manager
- We expect some sales without advertising
Regression Line Fitted to the Data

$\hat{y} = -.1 + .7x$

[Scatterplot: Sales vs. Advertising with the fitted line]
Least Squares Thinking Challenge
You're an economist for the county cooperative.
You gather the following data:

| Fertilizer (lb.) | Yield (lb.) |
| 4 | 3.0 |
| 6 | 5.5 |
| 10 | 6.5 |
| 12 | 9.0 |

Find the least squares line relating crop yield to fertilizer.
[Scatterplot: Yield (lb.) vs. Fertilizer (lb.)]
Parameter Estimation Solution Table*

| xᵢ | yᵢ | xᵢ² | yᵢ² | xᵢyᵢ |
| 4 | 3.0 | 16 | 9.00 | 12 |
| 6 | 5.5 | 36 | 30.25 | 33 |
| 10 | 6.5 | 100 | 42.25 | 65 |
| 12 | 9.0 | 144 | 81.00 | 108 |
| Σ = 32 | Σ = 24.0 | Σ = 296 | Σ = 162.50 | Σ = 218 |
Parameter Estimation Solution*

$\hat{\beta}_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{218 - \frac{(32)(24)}{4}}{296 - \frac{(32)^2}{4}} = \frac{26}{40} = .65$

$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 6 - (.65)(8) = .80$

$\hat{y} = .8 + .65x$
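An independent check with NumPy (a degree-1 polyfit returns the slope, then the intercept):

import numpy as np

x = np.array([4, 6, 10, 12], dtype=float)
y = np.array([3.0, 5.5, 6.5, 9.0])
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)   # ~0.65 and ~0.80, matching y^ = .8 + .65x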
Coefficient Interpretation Solution*
1. Slope ($\hat{\beta}_1$)
• Crop Yield (y) is expected to increase by .65 lb. for
each 1 lb. increase in Fertilizer (x)
2. Y-Intercept ($\hat{\beta}_0$)
• Average Crop Yield (y) is expected to be .8 lb.
when no Fertilizer (x) is used
Regression Line Fitted to the Data*

$\hat{y} = .8 + .65x$

[Scatterplot: Yield (lb.) vs. Fertilizer (lb.) with the fitted line]
Comments on coefficients
• The expression for the regression coefficient or slope is

$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$

• Thus, for a feature x and a target variable y, the regression coefficient
is given by $\frac{\sigma_{xy}}{\sigma_{xx}}$ (the covariance between x and y divided by the variance of x)
Zero-mean data
• Intercept: $a = \frac{\sum y - b \sum x}{n} = \bar{y} - b\bar{x}$
• From this expression, it is seen that the regression line passes through $(\bar{x}, \bar{y})$
• Adding a constant to all x-values (a translation) will affect only the
intercept, not the regression coefficient (the slope of the line does not
change)
• If we zero-centre the x-values by subtracting $\bar{x}$, the intercept becomes $a = \bar{y}$
# Fitting a linear regression with scikit-learn (X: 2-D feature array, y: targets)
from sklearn.linear_model import LinearRegression
model = LinearRegression().fit(X, y)
r_sq = model.score(X, y)  # coefficient of determination (R^2)
print(f"coefficient of determination: {r_sq}")
c = model.intercept_  # intercept a
m = model.coef_       # slope(s) b
print(f"intercept: {model.intercept_}")
print(f"coefficients: {model.coef_}")
y_pred = model.predict(X)
print(f"predicted response:\n{y_pred}")
Feature Scaling
• Feature scaling in Machine Learning is a method used to normalize or
standardize the range of independent variables or features of data.
Example data:

| CTRP | Spend | Revenue |
| 133 | 111600 | 1197576 |
| 111 | 104400 | 1053648 |
| 129 | 97200 | 1124172 |
| 117 | 79200 | 987144 |
| 130 | 126000 | 1283616 |
| 154 | 108000 | 1295100 |

• Standardization
• Standardization is another scaling method where the values are centered
around the mean with a unit standard deviation. This means that the mean
of the attribute becomes zero and the resultant distribution has a unit
standard deviation:

$x' = \frac{x - \mu}{\sigma}$, where µ is the mean of the feature values and
σ is the standard deviation of the feature values.

• Values don't fall in a fixed range
• Useful if data contains outliers
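A sketch that standardizes the Spend column from the table above with NumPy:

import numpy as np

spend = np.array([111600, 104400, 97200, 79200, 126000, 108000], dtype=float)
spend_std = (spend - spend.mean()) / spend.std()   # x' = (x - mu) / sigma
print(spend_std)
print(round(spend_std.mean(), 10), spend_std.std())  # mean ~0, standard deviation 1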
| Normalization | Standardization |
| Rescales values to a range between 0 and 1 | Centers data around the mean and scales to a standard deviation of 1 |
| Useful when the distribution of the data is unknown or not Gaussian | Useful when the distribution of the data is Gaussian or unknown |
| Sensitive to outliers | Less sensitive to outliers |
| Retains the shape of the original distribution | Changes the shape of the original distribution |
| May not preserve the relationships between the data points | Preserves the relationships between the data points |
| Equation: (x − min)/(max − min) | Equation: (x − mean)/standard deviation |
However, the choice between normalization and standardization depends on the problem and the
machine learning algorithm you are using.
There is no hard and fast rule to tell you when to normalize or standardize your data.
You can always start by fitting your model to raw, normalized, and standardized data and comparing
the performance for the best results.
It is good practice to fit the scaler on the training data and then use it to transform the testing data. This
avoids any data leakage during the model testing process.
Also, scaling of target values is generally not required.
Python Functions for Normalization
# data normalization with sklearn
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Output is an array that needs to be converted into a dataframe
X_train_scaled_df = pd.DataFrame(X_train_scaled, columns=X_train.columns)
Python Functions for Standardization
# data standardization with sklearn
import pandas as pd
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
# Output is an array that needs to be converted into a dataframe
X_train_scaled_df = pd.DataFrame(X_train_scaled, columns=X_train.columns)
# To get back data in the original format
X_train_inv = scaler.inverse_transform(X_train_scaled)
X_train_inv = pd.DataFrame(X_train_inv, columns=X_train.columns)
Logistic Regression
KNN Algorithm - Finding K-Nearest Neighbors
• It’s a type of supervised ML algorithm which can be used for both
classification as well as regression predictive problems.
• However, it is mainly used for classification predictive problems in
industry.
• The following two properties define KNN well:
• Lazy learning algorithm − KNN is a lazy learning algorithm because it does not
have a specialized training phase; it stores all the data and uses it at
classification time.
• Non-parametric learning algorithm − KNN is also a non-parametric learning
algorithm because it doesn't assume anything about the underlying data.
• There is only memorization of the training data, no actual learning.
KNN algorithm
• The K-NN working can be explained on the basis of the below algorithm:
• Step-1: Select the number K of neighbors
• Step-2: Calculate the Euclidean distance from the new data point to all the
training points
• Step-3: Take the K nearest neighbors as per the calculated Euclidean
distances
• Step-4: Among these K neighbors, count the number of data points in
each category
• Step-5: Assign the new data point to the category for which the number
of neighbors is maximum
• Step-6: Our model is ready (see the sketch below)
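A from-scratch sketch of the steps above; the toy training points and the new point are made up:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=5):
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Step-2: Euclidean distances
    nearest = np.argsort(dists)[:k]                   # Step-3: the K nearest neighbors
    votes = Counter(y_train[i] for i in nearest)      # Step-4: count points per category
    return votes.most_common(1)[0][0]                 # Step-5: majority category wins

X_train = np.array([[1, 1], [2, 1], [1, 2], [8, 8], [9, 8]])
y_train = np.array(["A", "A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([2, 2]), k=3))  # -> "A"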
KNN - Example
• Suppose we have a new data point and we need to put it in the required category.
• Firstly, we will choose the number of neighbors; here we choose k = 5.
• Next, we will calculate the Euclidean distance between the new point and the
existing data points. The Euclidean distance is the distance between two points,
which we have already studied in geometry.
• By calculating the Euclidean distance, we get the nearest neighbors: three
nearest neighbors in category A and two nearest neighbors in category B.
Hence the new data point is assigned to category A.
Support Vector Machines

Note: These slides adapt material by Andrew Moore; feel free to modify them for your
own needs. PowerPoint originals are
available. If you make use of a significant
portion of these slides in your own lecture,
please include this message, or the following
link to the source repository of Andrew's
tutorials:
http://www.cs.cmu.edu/~awm/tutorials .
Comments and corrections gratefully
received.
Thanks:
Andrew Moore (CMU)
and
Martin Law (Michigan State University)
History of SVM
• SVM is related to statistical learning theory [3]
• SVM was first introduced in 1992 [1]
• SVM becomes popular because of its success in handwritten digit
recognition
• 1.1% test error rate for SVM. This is the same as the error rates of a carefully
constructed neural network, LeNet 4.
• See Section 5.11 in [2] or the discussion in [3] for details
• SVM is now regarded as an important example of “kernel methods”,
one of the key areas in machine learning
• Note: the meaning of “kernel” is different from the “kernel” function for
Parzen windows
[1] B.E. Boser et al. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on
Computational Learning Theory 5 144-152, Pittsburgh, 1992.
[2] L. Bottou et al. Comparison of classifier methods: a case study in handwritten digit recognition. Proceedings of the 12th
IAPR International Conference on Pattern Recognition, vol. 2, pp. 77-82.
[3] V. Vapnik. The Nature of Statistical Learning Theory. 2nd edition, Springer, 1999.
Introduction to SVM
Definition of SVM
• Support Vector Machine or SVM is one of the most
popular Supervised Learning algorithms, which is used for
Classification as well as Regression problems. However,
primarily, it is used for Classification problems in Machine
Learning.
Significance of SVM
• The goal of the SVM algorithm is to create the best line or
decision boundary that can segregate n-dimensional space
into classes so that we can easily put the new data point in
the correct category in the future. This best decision
boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in
creating the hyperplane. These extreme cases are called
support vectors, and hence the algorithm is termed
Support Vector Machine.
[Diagram: two categories separated by a decision boundary (hyperplane)]
• Hyperplane and Support Vectors in the SVM algorithm:
• Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in
n-dimensional space, but we need to find the best decision boundary that helps to
classify the data points. This best boundary is known as the hyperplane of SVM.
• The dimensions of the hyperplane depend on the number of features present in the
dataset: if there are 2 features, the hyperplane is a straight line, and if there are
3 features, the hyperplane is a 2-dimensional plane.
• We always create the hyperplane that has a maximum margin, i.e., the maximum
distance between the hyperplane and the nearest data points of either class.
• Support Vectors:
• The data points or vectors that are closest to the hyperplane and which affect the
position of the hyperplane are termed Support Vectors. Since these vectors support the
hyperplane, they are called support vectors.
• Linear SVM: Linear SVM is used for linearly separable data. If a
dataset can be classified into two classes by a single straight line,
the data is termed linearly separable, and the classifier used is
called a Linear SVM classifier.
• Non-linear SVM: Non-Linear SVM is used for non-linearly separable
data. If a dataset cannot be classified by a straight line, the data is
termed non-linear, and the classifier used is called a Non-linear
SVM classifier.
How do Linear Classifiers work? Estimation:

[Diagram: input x passes through f to produce the estimated label y_est]

f(x, w, b) = sign(w · x + b)

• w: weight vector
• x: data vector
• The labels +1 and −1 denote the two classes
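A tiny sketch of this estimator; the weight vector, bias, and points are illustrative:

import numpy as np

w = np.array([1.0, -1.0])    # weight vector
b = 0.5                      # bias / scale value
X = np.array([[2.0, 1.0],    # should land on the +1 side
              [0.0, 3.0]])   # should land on the -1 side
print(np.sign(X @ w + b))    # f(x, w, b) = sign(w . x + b) -> [ 1. -1.]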
Classifier Margin

f(x, w, b) = sign(w · x + b)

Define the margin of a linear classifier as the
width that the boundary could be increased by
before hitting a datapoint.

[Diagram: linear boundary between the +1 and −1 points with the margin band shown]
Maximum Margin

f(x, w, b) = sign(w · x + b)

The maximum margin linear classifier is the linear
classifier with the, um, maximum margin.
This is the simplest kind of SVM (called an LSVM):
the Linear SVM.

Support Vectors are those datapoints that the
margin pushes up against.

[Diagram: maximum-margin boundary with the support vectors lying on the margin]
f(x, w, b) = sign(w · x + b)

• x: data vector
• w: normal vector to the decision boundary
• b: scale (bias) value
• For the point-to-line distance formula, see
http://mathworld.wolfram.com/Point-LineDistance2-Dimensional.html
• In our case, the decision boundary is w1*x1 + w2*x2 + b = 0
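A sketch of the point-to-hyperplane distance |w · x + b| / ‖w‖ that this construction relies on; the numbers are illustrative:

import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the line w1*x1 + w2*x2 + b = 0
b = -5.0
x = np.array([2.0, 1.0])
dist = abs(w @ x + b) / np.linalg.norm(w)   # |w . x + b| / ||w||
print(dist)   # |6 + 4 - 5| / 5 = 1.0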
Estimate the Margin

[Diagram: Class 1 (−1) and Class 2 (+1) points separated by the boundary w · x + b = 0, with margin width m]

• The decision boundary is w · x + b = 0, where w is the normal vector and b a scale value.
• The margin m is the width between the closest points of the two classes, measured
perpendicular to the boundary. Since the support vectors satisfy y(w · x + b) = 1,
each lies at distance 1/‖w‖ from the boundary, so m = 2/‖w‖.
Finding the Decision Boundary
• Let {x1, ..., xn} be our data set and let yi ∈ {1, −1} be the class label of xi
• The decision boundary should classify all points correctly
• To see this: when yi = −1, we wish (w · xi + b) ≤ −1; when yi = 1, we wish
(w · xi + b) ≥ 1; together, yi(w · xi + b) ≥ 1 for all i. For support vectors,
yi(w · xi + b) = 1.
• The decision boundary can be found by solving the following constrained
optimization problem (maximizing the margin 2/‖w‖ is equivalent to minimizing ‖w‖²):

Minimize $\frac{1}{2}\|w\|^2$ subject to $y_i(w \cdot x_i + b) \ge 1$ for all i
A Geometrical Interpretation

[Diagram: Class 1 and Class 2 points with the maximum-margin boundary; each point is labeled with its Lagrange multiplier: α1 = 0.8, α6 = 1.4, α8 = 0.6, and α2 = α3 = α4 = α5 = α7 = α9 = α10 = 0]

• Only the support vectors (the points lying on the margin) have nonzero
multipliers αi; all other points have αi = 0.
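A sketch connecting this picture to scikit-learn: fitting a linear SVC on made-up, linearly separable points exposes the support vectors and their dual coefficients (yᵢ · αᵢ), which are nonzero only for the support vectors:

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin
print(clf.support_vectors_)                  # the points the margin pushes up against
print(clf.dual_coef_)                        # y_i * alpha_i for each support vector
print(clf.coef_, clf.intercept_)             # w and b of the separating hyperplane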