ML - Unit 2
Pallavi Shukla
Assistant Professor
UCER
Regression
• Regression analysis is a statistical method for modelling the relationship
between a dependent (target) variable and one or more independent
(predictor) variables.
• It helps us understand how the value of the dependent variable
changes with respect to a given independent variable when the other
independent variables are held fixed.
• Regression searches for relationships among variables.
• For example, you can observe several employees of a company
and try to understand how their salaries depend on features such
as experience, level of education, role, the city they work in, and so on.
Regression
• In regression, we plot a line or curve between the variables that best fits the
given datapoints.
• Using this plot, the machine learning model can make predictions about
the data.
• In simple words, "Regression shows a line or curve that passes through
all the datapoints on target-predictor graph in such a way that the
vertical distance between the datapoints and the regression line is
minimum."
• The distance between datapoints and line tells whether a model has
captured a strong relationship or not.
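As an illustration (not part of the slides), here is a minimal sketch of fitting such a line with NumPy's least-squares polynomial fit; the experience/salary numbers are made up for demonstration:

```python
import numpy as np

# Made-up toy data: years of experience vs. salary (in lakhs).
experience = np.array([1, 2, 3, 4, 5, 6], dtype=float)
salary = np.array([3.0, 4.1, 5.2, 5.9, 7.1, 8.0])

# Fit a degree-1 polynomial (a straight line) by least squares,
# i.e. minimise the vertical distances described above.
slope, intercept = np.polyfit(experience, salary, deg=1)

# Use the fitted line to predict the salary for 7 years of experience.
print(slope * 7 + intercept)
```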
Examples
Ex: max f(θ) for f(θ) = sin θ is 1. It means sin θ has a maximum value of 1.
Ex: arg max f(θ) for f(θ) = sin θ is 90°. It means sin θ attains its maximum value at θ = 90°.
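A quick numerical check of the distinction, sketched with NumPy:

```python
import numpy as np

theta = np.linspace(0, np.pi, 181)           # angles from 0 to 180 degrees, in radians
values = np.sin(theta)

print(np.max(values))                        # max: the largest value of sin, 1.0
print(np.degrees(theta[np.argmax(values)]))  # arg max: where it occurs, 90.0 degrees
```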
BRUTE FORCE BAYESIAN CONCEPT LEARNING
• Also called Brute Force Algorithm.
• Bayes' theorem: P(h|D) = [P(D|h) · P(h)] / P(D)
• h_MAP = arg max_{h ∈ H} P(h|D)
• Let P(h) = 1 / |H| for all h in H.
• h = a single hypothesis, H = the set of all hypotheses
• H = {h1, h2, h3, …, hn}
• Now, P(h) = Probability of hypothesis (h)
• P(D|h) = 1 if d_i = h(x_i) for every training example, and 0 otherwise.
P(D|h) = conditional probability of the data D given the hypothesis h
d_i = target value of the i-th training example
x_i = the i-th input instance
P(h|D) = [1 · (1/|H|)] / P(D)
• But P(D) = |VS_{H,D}| / |H|
• Now, putting this value in the above equation:
• P(h|D) = 1 / |VS_{H,D}|
• where VS_{H,D} is the version space of H with respect to D, i.e. the subset of
hypotheses in H that are consistent with D.
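A small sketch of the idea under the stated assumptions (uniform prior, 0/1 likelihood); the threshold hypothesis space and the training pairs below are made up for illustration:

```python
from fractions import Fraction

# Hypothetical toy hypothesis space: h_t(x) = 1 if x >= t, else 0, for thresholds t = 0..5.
H = [lambda x, t=t: int(x >= t) for t in range(6)]

# Training data D: pairs (x_i, d_i).
D = [(1, 0), (3, 1), (4, 1)]

# P(D|h) = 1 if h is consistent with every example, else 0.
def likelihood(h):
    return int(all(h(x) == d for x, d in D))

# Version space: the hypotheses consistent with D.
VS = [h for h in H if likelihood(h)]

# Posterior: P(h|D) = 1/|VS| for consistent hypotheses, 0 otherwise.
for t, h in enumerate(H):
    print(f"h_{t}: P(h|D) = {Fraction(likelihood(h), len(VS))}")
```

Here the examples rule out every threshold except t = 2 and t = 3, so each of those two hypotheses gets posterior 1/2 and all others get 0.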
BAYES OPTIMAL CLASSIFIER
• It is a “Probabilistic Model” which makes the most probable prediction for a new
example.
• Equation: v_OB = arg max_{v_j ∈ V} Σ_{h_i ∈ H} P(v_j | h_i) · P(h_i | D),
where V is the set of possible classifications.
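A sketch of this weighted vote, using the standard textbook illustration of three hypotheses with posteriors 0.4, 0.3 and 0.3; the names h1..h3 are placeholders:

```python
# Posteriors P(h|D) over three hypotheses, and each hypothesis's prediction
# for a new example.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": "+", "h2": "-", "h3": "-"}

# Bayes optimal prediction: arg max over classes v of sum_i P(v|h_i) * P(h_i|D).
votes = {}
for h, p in posteriors.items():
    v = predictions[h]
    votes[v] = votes.get(v, 0.0) + p

print(votes)                      # {'+': 0.4, '-': 0.6}
print(max(votes, key=votes.get))  # '-' wins, even though h1 is the MAP hypothesis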
• Question: We are given a weather dataset with two columns: one holds
the weather condition (Outlook) and the other records whether the
player went out to play.
Find the probability of the player going out to play on a sunny day.
Index   Outlook    Play
0       Rainy      Yes
1       Sunny      Yes
2       Overcast   Yes
3       Overcast   Yes
4       Sunny      No
5       Rainy      Yes
6       Sunny      Yes
7       Overcast   Yes
8       Rainy      No
9       Sunny      No
10      Sunny      Yes
11      Rainy      No
12      Overcast   Yes
13      Overcast   Yes
Solution: Frequency Table

Weather    Yes   No
Overcast     5    0
Rainy        2    2
Sunny        3    2
Total       10    4
Make Likelihood Table:

Weather    P(Weather|Yes)   P(Weather|No)   P(Weather)
Overcast   5/10             0/4             5/14
Rainy      2/10             2/4             4/14
Sunny      3/10             2/4             5/14

By Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) · P(Yes) / P(Sunny) = (3/10 × 10/14) / (5/14) = 3/5 = 0.6

So the probability that the player goes out to play on a sunny day is 0.6 (60%).
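The same computation, sketched in plain Python directly from the table above:

```python
from collections import Counter

# The dataset from the slide: (outlook, play) pairs for rows 0..13.
data = [("Rainy","Yes"),("Sunny","Yes"),("Overcast","Yes"),("Overcast","Yes"),
        ("Sunny","No"),("Rainy","Yes"),("Sunny","Yes"),("Overcast","Yes"),
        ("Rainy","No"),("Sunny","No"),("Sunny","Yes"),("Rainy","No"),
        ("Overcast","Yes"),("Overcast","Yes")]

n = len(data)
play = Counter(p for _, p in data)                   # {'Yes': 10, 'No': 4}
sunny = Counter(p for o, p in data if o == "Sunny")  # {'Yes': 3, 'No': 2}

# Bayes' theorem: P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
p_sunny_given_yes = sunny["Yes"] / play["Yes"]   # 3/10
p_yes = play["Yes"] / n                          # 10/14
p_sunny = (sunny["Yes"] + sunny["No"]) / n       # 5/14

print(p_sunny_given_yes * p_yes / p_sunny)       # ≈ 0.6
```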
SUPPORT VECTOR MACHINE (SVM)
• An SVM classifier is a frontier that best segregates the two classes
(a hyperplane or line).
Support Vector Machine Terminology
1. Hyperplane: The hyperplane is the decision boundary used to
separate the data points of different classes in a feature space. For
linear classification it is a linear equation, i.e. w·x + b = 0.
2. Support Vectors: Support vectors are the data points closest to
the hyperplane; they play a critical role in deciding the
hyperplane and the margin.
3. Margin: The margin is the distance between the support vectors and the
hyperplane. The main objective of the support vector machine
algorithm is to maximize the margin; a wider margin indicates
better classification performance.
Support Vector Machine Terminology
4. Kernel: A kernel is a mathematical function used in SVM to map the
original input data points into a high-dimensional feature space, so that the
hyperplane can be found easily even if the data points are not linearly
separable in the original input space. Some common kernel functions are
linear, polynomial, radial basis function (RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane, or hard-margin hyperplane,
is a hyperplane that separates the data points of the different categories
without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM
permits a soft-margin technique. The soft-margin formulation introduces a
slack variable for each data point, which relaxes the strict margin
requirement and permits some misclassifications or violations. It finds a
compromise between widening the margin and reducing violations.
Types of SVM
• Linear SVM
• Non-linear SVM
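A minimal sketch of both types, assuming scikit-learn is available; make_moons just supplies a toy two-class dataset that is not linearly separable:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy two-class data that a straight line cannot separate cleanly.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# Linear SVM: a straight-line boundary; C controls the soft margin.
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)

# Non-linear SVM: the RBF kernel lets the boundary curve around the classes.
rbf_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("linear accuracy:", linear_svm.score(X, y))
print("rbf accuracy:", rbf_svm.score(X, y))
```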
KERNEL
- A kernel is a mathematical function used in SVM to map the original input
data points into a high-dimensional feature space, so that the hyperplane can be
found easily even if the data points are not linearly separable in the original
input space.
- Some common kernel functions are linear, polynomial, radial basis
function (RBF), and sigmoid.
• Kernels are used to solve non-linear problems with a linear classifier.
• The remarkable thing about kernels is that they let us move to higher
dimensions and still perform the calculations cheaply.
• Using kernels we can even go up to an infinite number of dimensions
(as with the RBF kernel).
• For some problems, no separating hyperplane exists in the original input
space; one may only appear once we move to higher dimensions, and forming
it there directly is expensive.
• A kernel helps to form the hyperplane in the higher dimension without
raising the computational complexity.
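A small sketch of this idea (the "kernel trick") for a degree-2 polynomial kernel: the kernel value equals a dot product in an explicit higher-dimensional feature space, computed without ever building that space. The feature map phi below is one standard choice for 2-D inputs:

```python
import numpy as np

# Explicit degree-2 feature map for 2-D inputs: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

# Kernel trick: (x . y)^2 equals the dot product of phi(x) and phi(y),
# but never constructs the 3-D feature space explicitly.
print(np.dot(x, y) ** 2)       # 121.0
print(np.dot(phi(x), phi(y)))  # 121.0 (up to floating-point rounding)
```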
Characteristics of Kernel Function
Linear Kernel
• The linear kernel is the simplest and most commonly used kernel function;
it defines the dot product between the input vectors in the original feature space.
• The linear kernel can be defined as K(x, y) = x · y, where x and y are the
input feature vectors.
• When using a linear kernel in an SVM, the decision boundary is a linear
hyperplane that separates the different classes in the feature space.
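A one-step check of the definition: under the linear kernel, the Gram (kernel) matrix of a dataset X is simply the matrix of pairwise dot products, X Xᵀ:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Linear kernel: K(x, y) = x . y, so the full kernel matrix is X @ X.T.
K = X @ X.T
print(K)
```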
Polynomial Kernel
• The polynomial kernel raises the dot product of the inputs to a power,
K(x, y) = (x · y + c)^d, which lets the SVM learn curved decision boundaries.
Laplace Kernel
• This kernel is in close relation with the Gaussian (RBF) kernel,
K(x, y) = exp(−‖x − y‖² / 2σ²); the only difference is that the square of
the norm is removed:
• K(x, y) = exp(−‖x − y‖ / σ)
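A side-by-side sketch of the two radial kernels, showing that the Laplace kernel simply drops the square of the norm:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian (RBF): uses the *squared* Euclidean norm.
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * sigma ** 2))

def laplace_kernel(x, y, sigma=1.0):
    # Laplace: identical form, but the square of the norm is removed.
    return np.exp(-np.linalg.norm(x - y) / sigma)

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(gaussian_kernel(x, y))  # exp(-25/2), since ||x - y|| = 5
print(laplace_kernel(x, y))   # exp(-5)
```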