
ML Mod 2

ml

Uploaded by

neha1831sewani
Copyright
© All Rights Reserved

Module 02:

Linear Regression

Gini index

Calculate the Gini index of the last column, i.e. the decision column.

Then find the Gini index for the money column.

The feature with the least Gini value, i.e. Parents, becomes the root node.
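The Gini steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming categorical features and the usual definition of Gini impurity (1 minus the sum of squared class proportions). The toy rows, the function names `gini_impurity` and `weighted_gini`, and the values in `features` are hypothetical and are not the table used in these notes.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def weighted_gini(feature_values, labels):
    """Weighted Gini impurity of the labels after splitting on one feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / total * gini_impurity(g) for g in groups.values())


# Hypothetical toy data (assumption: NOT the table from the notes).
decision = ["yes", "yes", "no", "no", "yes", "no"]
features = {
    "money": ["rich", "poor", "rich", "poor", "rich", "poor"],
    "parents": ["yes", "yes", "no", "no", "yes", "no"],
}

# Step 1: Gini of the decision column itself.
decision_gini = gini_impurity(decision)

# Step 2: weighted Gini for each candidate feature.
scores = {name: weighted_gini(col, decision) for name, col in features.items()}

# Step 3: the feature with the least Gini value becomes the root node.
root = min(scores, key=scores.get)
print(decision_gini, scores, root)
```

In this toy data the `parents` column separates the decisions perfectly, so it has the smallest weighted Gini and is chosen as the root.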

Confusion matrix
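As a rough illustration of how a confusion matrix is built for a binary problem, the sketch below counts true positives, false positives, false negatives and true negatives from two label lists. The function name `confusion_matrix_counts` and the sample labels are hypothetical and chosen only for this example.

```python
def confusion_matrix_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive and actual != positive:
            fp += 1  # predicted positive, actually negative (false alarm)
        elif predicted != positive and actual == positive:
            fn += 1  # predicted negative, actually positive (miss)
        else:
            tn += 1  # predicted negative, actually negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix_counts(y_true, y_pred)
print(cm)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
```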

Linear regression is a supervised machine learning algorithm for regression.
Regression models a target prediction value based on independent variables.
It is mostly used for finding the relationship between variables and for
forecasting. Regression models differ in the kind of relationship they assume
between the dependent and independent variables and in the number of
independent variables used. Logistic regression, by contrast, is a supervised
classification algorithm. In a classification problem, the target variable
(or output), y, can take only discrete values for a given set of features
(or inputs), X.

Comparison of Linear Regression and Logistic Regression:

1. Linear Regression: Linear regression is a supervised regression model.
   Logistic Regression: Logistic regression is a supervised classification
   model.

2. Linear Regression: Equation of linear regression:
   $y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_i x_i$
   Here, y = response variable, x_i = ith predictor variable, a_i = average
   effect on y as x_i increases by 1.
   Logistic Regression: Equation of logistic regression:
   $y(x) = \dfrac{e^{a_0 + a_1 x_1 + a_2 x_2 + \dots + a_i x_i}}{1 + e^{a_0 + a_1 x_1 + a_2 x_2 + \dots + a_i x_i}}$
   Here, y = response variable, x_i = ith predictor variable, a_i =
   coefficient of the ith predictor (the change in the log-odds of y as x_i
   increases by 1).

3. Linear Regression: We predict a continuous numeric value.
   Logistic Regression: We predict the value as 1 or 0.

4. Linear Regression: No activation function is used.
   Logistic Regression: An activation function is used to convert the linear
   regression equation into the logistic regression equation.

5. Linear Regression: No threshold value is needed.
   Logistic Regression: A threshold value is added.

6. Linear Regression: We calculate the Root Mean Square Error (RMSE) to
   evaluate the fit and guide the next weight update.
   Logistic Regression: We use a classification measure such as precision to
   evaluate the fit and guide the next weight update.

7. Linear Regression: The dependent variable should be numeric, and the
   response variable is continuous.
   Logistic Regression: The dependent variable consists of only two
   categories. Logistic regression estimates the odds of the outcome of the
   dependent variable given a set of quantitative or categorical independent
   variables.

8. Linear Regression: It is based on least squares estimation.
   Logistic Regression: It is based on maximum likelihood estimation.

9. Linear Regression: When we plot the training data, a straight line can be
   drawn that passes as close as possible to most of the points.
   Logistic Regression: Any change in the coefficient leads to a change in
   both the direction and the steepness of the logistic function. Positive
   slopes result in an S-shaped curve and negative slopes result in a
   Z-shaped curve.

10. Linear Regression: Linear regression is used to estimate the dependent
    variable when the independent variables change. For example, predicting
    the price of houses.
    Logistic Regression: Logistic regression is used to calculate the
    probability of an event. For example, classifying whether tissue is
    benign or malignant.

11. Linear Regression: Linear regression assumes a normal (Gaussian)
    distribution of the dependent variable.
    Logistic Regression: Logistic regression assumes a binomial distribution
    of the dependent variable.

12. Linear Regression: Applications of linear regression:
    • Financial risk assessment
    • Business insights
    • Market analysis
    Logistic Regression: Applications of logistic regression:
    • Medicine
    • Credit scoring
    • Hotel Booking
    • Gaming
    • Text editing
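The two equations and the threshold step from the comparison can be sketched as follows. This is an illustrative example only: the coefficient values, the input vector and the function names are hypothetical, and the 0.5 threshold is a common default rather than something fixed by these notes.

```python
import math


def linear_predict(coeffs, x):
    """Linear regression output: y = a0 + a1*x1 + a2*x2 + ...

    coeffs[0] is the intercept a0; coeffs[1:] pair up with the features in x.
    """
    return coeffs[0] + sum(a * xi for a, xi in zip(coeffs[1:], x))


def logistic_predict_proba(coeffs, x):
    """Logistic regression output: e^z / (1 + e^z), with z the linear output.

    Written in the algebraically equivalent form 1 / (1 + e^(-z)).
    """
    z = linear_predict(coeffs, x)
    return 1.0 / (1.0 + math.exp(-z))


def logistic_classify(coeffs, x, threshold=0.5):
    """Apply a threshold to the probability to obtain a 0/1 prediction."""
    return 1 if logistic_predict_proba(coeffs, x) >= threshold else 0


# Hypothetical coefficients [a0, a1, a2] and one input [x1, x2].
coeffs = [0.5, 2.0, -1.0]
x = [1.0, 3.0]

y_linear = linear_predict(coeffs, x)        # continuous value
p_logistic = logistic_predict_proba(coeffs, x)  # probability in (0, 1)
label = logistic_classify(coeffs, x)        # 0 or 1 after thresholding
print(y_linear, p_logistic, label)
```

The same linear combination feeds both models; logistic regression only adds the activation function and the threshold on top of it.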

Which performance measure?

Accuracy is a great measure, but only when you have symmetric datasets (the
false negative and false positive counts are close) and false negatives and
false positives have similar costs.
If the costs of false positives and false negatives are different, then F1 is
your savior. F1 is best if you have an uneven class distribution.
Precision is how sure you are of your true positives, whilst recall is how
sure you are that you are not missing any positives.
Choose Recall if false positives are far more acceptable than false
negatives. In other words, if the occurrence of false negatives is
intolerable, you'd rather get some extra false positives (false alarms) than
miss some positives, like in our diabetes example. You'd rather get some
healthy people labeled diabetic than leave a diabetic person labeled healthy.
Choose Precision if you want to be more confident of your true positives, for
example with spam emails. You'd rather have some spam emails in your inbox
than some regular emails in your spam box. So the email company wants to be
extra sure that email Y is spam before they put it in the spam box and you
never get to see it.
Choose Specificity if you want to cover all true negatives, meaning you don't
want any false alarms (false positives). For example, suppose you're running
a drug test in which all people who test positive will immediately go to
jail. You don't want anyone drug-free going to jail, so false positives here
are intolerable.
● An accuracy of 90% means that 1 of every 10 labels is incorrect and 9 are
correct.
● A precision of 80% means that, on average, 2 of every 10 people labeled
diabetic by our program are healthy and 8 are diabetic.
● A recall of 70% means that 3 of every 10 people who are actually diabetic
are missed by our program and 7 are labeled as diabetic.
● A specificity of 60% means that 4 of every 10 people who are actually
healthy are mislabeled as diabetic and 6 are correctly labeled as healthy.
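The measures discussed above can all be computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `classification_metrics` and the counts are hypothetical, chosen so that recall is 70% and specificity is 60% as in the bullets above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common binary classification measures from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of everything labeled positive, how much is truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything truly positive, how much was found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: of everything truly negative, how much was labeled negative.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts: 100 diabetic people (70 found, 30 missed) and
# 100 healthy people (60 correctly labeled, 40 false alarms).
metrics = classification_metrics(tp=70, fp=40, fn=30, tn=60)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, recall is 0.70 and specificity is 0.60, while precision is lower (about 0.64) because of the 40 false alarms.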

Often, we choose model accuracy to evaluate the model. It is a
popular choice because it is very easy to understand and explain.
Accuracy coincides well with the general aim of building a
classification model, i.e. to predict the class of new
observations accurately.
Accuracy might not be the best model evaluation metric every time.
It can convey the health of a model well only when all the classes
have similar prevalence in the data.
Say we were predicting whether an asteroid will hit the earth.
If our model says NO every time, it will be highly accurate, but it
would not be of much value to us. The number of asteroids that will
hit the earth is very low, but missing even one of them might prove
very costly. When the class distribution is imbalanced, accuracy is
not a good model evaluation metric.
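The asteroid example can be made concrete with a small sketch. The numbers are hypothetical: 1000 observed asteroids, of which exactly one actually hits the earth, and a model that always answers NO (0).

```python
# Hypothetical data: 1 = asteroid hits the earth, 0 = it does not.
y_true = [1] + [0] * 999
y_pred = [0] * 1000  # a model that says NO every time

# Accuracy: fraction of labels that match.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall: fraction of real hits that the model caught.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.999
print(f"recall   = {recall:.3f}")    # recall   = 0.000
```

The model is 99.9% accurate yet misses the only asteroid that matters, which is why recall is the more informative measure here.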

LOGISTIC REGRESSION

Find entropy:
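A minimal sketch of the entropy calculation, assuming the usual Shannon entropy in bits (base-2 logarithm) over the class proportions of a label column. The function name `entropy` and the 9 "yes" / 5 "no" label counts are hypothetical and are not taken from the worked example in these notes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k)) over class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical label column with 9 "yes" and 5 "no" rows.
labels = ["yes"] * 9 + ["no"] * 5
h = entropy(labels)
print(f"entropy = {h:.3f}")  # entropy = 0.940
```

Entropy is 0 when every row has the same label and 1 bit when two classes are split evenly.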
