Dr Nisha Arora
Logistic Regression using SPSS
Object-wise Analysis
Steps to select an appropriate statistical test
• Define the objective of the study clearly.
• Define the level of measurement (metric/non-metric) of each variable to be included in the analysis.
Selecting the appropriate technique
Bivariate techniques
Explanatory Variable (IDV) | Response Variable (DV): Metric | Response Variable (DV): Non-metric
Metric | Regression | Logistic Regression / LDA
Non-metric | Dummy Var Reg. / Hypothesis Test* | Chi-square test
Make sure to check all assumptions before applying any statistical technique.
Selecting the appropriate technique
Explanatory Variable(s) (IDVs) | One DV: Metric | One DV: Non-metric | More than one DV: Metric
One IDV, Metric | Simple Regression | Binary/Multinomial (Logistic) Reg | Path Analysis
One IDV, Non-metric | t test/Anova | Chi Square Test | Manova
More than one IDV, All Metric | Multiple Reg | Multiple Logit Reg/Multiple Multinomial | Path Analysis
More than one IDV, All Non-metric | n-way Anova | Complex Crosstab/Log-linear analysis | n-way Manova
More than one IDV, Mixed | n-way Ancova/Dummy var | Multiple Logit Reg/Multiple Multinomial | n-way Mancova
Selecting the appropriate Technique
Binary (Binomial) Logistic Regression
Multinomial Logistic Regression
Ordinal Logistic Regression
Poisson Regression
Types of Logistic Regression
Binary
• Response has only two possible outcomes.
• E.g.: Spam or Not
Multinomial
• Three or more categories without ordering.
• E.g.: Predicting which food is preferred (Veg, Non-Veg, Vegan)
Ordinal
• Three or more categories with ordering.
• E.g.: Movie rating from 1 to 5
Prediction or Classification?
Types of Classification Problems
Multi-Label Classification
Multi-Class Classification
Binary Classification
Classification Problem
• Predicting in advance whether a product launch will be successful or not
• An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent
• Benign or malignant tumor
• Spam detection
• Movie genre classification
Box-Tidwell Test
• In the model, include the interaction between each continuous predictor and its natural log.
• If such an interaction is significant, the linearity-in-the-logit assumption has been violated.
• If any interaction is significant, try adding powers of the predictor to the model (that is, going polynomial).
Caution:
• The test is not very robust because it is affected by sample size.
• With large samples, a just-significant interaction should not be a major concern.
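To make the first bullet concrete, here is a minimal sketch of how the extra Box-Tidwell term might be built before the model is fitted. The helper name `box_tidwell_term` and the `age` values are hypothetical, and the fitting step itself is left to SPSS or any other statistics package.

```python
import math

def box_tidwell_term(values):
    """Return x * ln(x) for each value of a continuous predictor.

    The natural log is undefined for x <= 0, so such data must be shifted
    or handled separately before building the term.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Tidwell terms need strictly positive values")
    return [v * math.log(v) for v in values]

# Hypothetical continuous predictor (for example, age in years).
age = [23.0, 35.0, 47.0, 59.0]
age_ln_age = box_tidwell_term(age)

# Each row carries the original predictor and its x*ln(x) term;
# both columns would be entered into the logistic regression together.
design_rows = list(zip(age, age_ln_age))
```

If the coefficient of the `age_ln_age` column turns out to be significant in the fitted model, that is the signal the slide describes.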
Assumptions of Logit Regression
• Binary response variable with mutually exclusive and exhaustive categories.
• One or more predictor variable(s).
• Independent observations.
• Linear relationship between the continuous independent variable(s) and the logit transformation of the dependent variable.
• This assumption can be tested with the Box-Tidwell test: include in the model the interactions between the continuous predictors and their logs. If such an interaction is significant, the assumption has been violated.
What about collinearity, perfect collinearity, and multicollinearity?
https://stats.stackexchange.com/a/432543/79100
More Discussion on Multicollinearity
• What happens when you have multicollinearity?
Multicollinearity is not as deleterious for prediction, but it may affect the significance of individual variables.
https://stats.stackexchange.com/questions/168622/why-is-multicollinearity-not-checked-in-modern-statistics-machine-learning
• Can you safely ignore multicollinearity?
https://statisticalhorizons.com/multicollinearity
• How to handle multicollinearity?
https://www.researchgate.net/post/how_to_deal_with_multicolinearity#view=580ef132ed99e1c1046fcf01
• Why not use the stepwise method?
http://www.philender.com/courses/linearmodels/notes4/swprobs.html
http://www.danielezrajohnson.com/stepwise.pdf
Assumptions: More Considerations
• Logistic regression typically requires a large sample size because it uses maximum likelihood estimation. [Maximum likelihood estimates are less powerful at low sample sizes than ordinary least squares.]
• Keep in mind that when the outcome is rare, a logit model can be difficult to estimate even if the overall dataset is large.
• Empty cells or small cells: check for empty or small cells by doing a crosstab between categorical predictors and the outcome variable. If a cell has very few cases (a small cell), the model may become unstable or might not run at all.
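The crosstab check for empty or small cells can be sketched in a few lines. The data, the helper names, and the `min_count=5` cut-off are all assumptions made for illustration; the slides do not state a specific minimum cell size.

```python
from collections import Counter

def crosstab(predictor, outcome):
    """Count cases for every (predictor level, outcome) combination."""
    return Counter(zip(predictor, outcome))

def small_cells(predictor, outcome, min_count=5):
    """List cells with fewer than min_count cases, including empty cells."""
    counts = crosstab(predictor, outcome)
    levels = sorted(set(predictor))
    outcomes = sorted(set(outcome))
    return [
        (level, out, counts[(level, out)])
        for level in levels
        for out in outcomes
        if counts[(level, out)] < min_count
    ]

# Hypothetical categorical predictor (region) and binary outcome.
region = ["A"] * 12 + ["B"] * 10 + ["C"] * 6
outcome = [0] * 6 + [1] * 6 + [0] * 7 + [1] * 3 + [0] * 6

flagged = small_cells(region, outcome)
# flagged holds ("B", 1, 3) and ("C", 1, 0): one small cell and one empty cell
```

In SPSS the same check is done with a Crosstabs table; the point of the sketch is only that empty cells must be looked for explicitly, because they do not appear in raw counts.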
Why can’t we use Linear Regression for Classification Problems?
Why not Linear Regression?
What is Logistic Regression?
The logistic regression curve is called the “sigmoid curve”, also known as the S-curve.
How to decide whether the value is 0 or 1 from this curve? Set a threshold.
How to set a threshold?
• Default: 0.5
• Based on group sizes (as we do in LDA)
• Based on performance evaluation metrics using cross-validation
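A small sketch of how a threshold turns sigmoid outputs into class labels. The logit values and the alternative threshold of 0.7 are made up for illustration only.

```python
import math

def sigmoid(z):
    """Map a linear predictor z (the logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(probability, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0

# Hypothetical logit values, chosen only for illustration.
logits = [-2.0, -0.1, 0.0, 0.4, 3.0]
probabilities = [sigmoid(z) for z in logits]

default_labels = [classify(p) for p in probabilities]                 # threshold 0.5
strict_labels = [classify(p, threshold=0.7) for p in probabilities]   # threshold 0.7
```

Raising the threshold moves borderline cases from class 1 to class 0, which is why the choice of cut-off changes sensitivity and specificity.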
Logistic Regression Equation
Rather than modeling the response Y directly, logistic regression models the probability that Y belongs to a particular category.
• P(Y = 1 | X), written P(X), can take values from 0 to 1.
$$\log\left(\frac{P(X)}{1 - P(X)}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_n X_n$$
How to interpret?
Logistic Regression Equation
• Alternatively, we can write
$$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}$$
• Or
$$P(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}}$$
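The three forms of the equation (log odds, odds, probability) describe the same model. The sketch below checks this numerically for one predictor; the coefficient values `beta0`, `beta1` and the input `x1` are hypothetical.

```python
import math

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

def odds_from_linear(z):
    """Odds implied by the linear predictor z."""
    return math.exp(z)

def prob_from_linear(z):
    """Probability implied by the linear predictor z."""
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients and predictor value.
beta0, beta1 = -1.5, 0.8
x1 = 2.0

z = beta0 + beta1 * x1          # linear predictor (the logit)
p = prob_from_linear(z)         # probability form
odds = p / (1.0 - p)            # odds computed from the probability
```

Here `logit(p)` recovers `z` and `odds` equals `odds_from_linear(z)` up to floating-point error, which is the equivalence the slide states.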
Understanding the Odds
Exp(B) represents the ratio-change in the odds of the event of interest for a one-unit
change in the predictor.
$$\frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}$$
Odds & Odds Ratio
Understanding Odds
Logit = log(Odds) = log(p / (1 - p))
= log(probability of event happening / probability of event not happening)
$$\text{Odds Ratio (OR)} = \frac{\text{Odds in favor of event } Y \mid X = X_0 + 1}{\text{Odds in favor of event } Y \mid X = X_0}$$
Interpreting the coefficients
Response: default [Y/N]. Predictor: [account] balance.
Estimated coefficients of the logistic regression model that predicts the probability of default using balance.
A one-unit increase in balance is associated with an increase in the log odds of default by 0.0055 units or, equivalently, a change in the odds by a factor of exp(0.0055), i.e., about 1.0055.
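The conversion from coefficient to odds ratio is a single exponentiation. The sketch below uses the slide's coefficient of 0.0055; the 100-unit comparison is an added illustration, not something taken from the slides.

```python
import math

beta_balance = 0.0055  # coefficient for balance, as given on the slide

# Odds ratio for a one-unit increase in balance.
odds_ratio = math.exp(beta_balance)

# Illustration only: odds ratio for a 100-unit increase in balance.
# Coefficients add on the log-odds scale, so the effect multiplies on the odds scale.
odds_ratio_per_100 = math.exp(beta_balance * 100)
```

`odds_ratio` is about 1.0055 and `odds_ratio_per_100` is about 1.73, so a per-unit effect that looks tiny can still be large over a realistic range of the predictor.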
Interpreting the coefficients
Probability of default for an individual with a balance of $1,000 is
$$\frac{e^{-10.6513 + 0.0055 \times 1000}}{1 + e^{-10.6513 + 0.0055 \times 1000}} = 0.00576 = 0.576\%$$
Probability of default for an individual with a balance of $2,000 is
$$\frac{e^{-10.6513 + 0.0055 \times 2000}}{1 + e^{-10.6513 + 0.0055 \times 2000}} = 0.586 = 58.6\%$$
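The two probabilities can be reproduced directly from the fitted equation. This sketch assumes an intercept of -10.6513 and a slope of 0.0055, the values that reproduce the results stated on the slide; the function name is hypothetical.

```python
import math

INTERCEPT = -10.6513   # assumed sign: negative, consistent with the stated results
SLOPE = 0.0055         # coefficient for balance

def default_probability(balance):
    """Predicted probability of default for a given account balance."""
    z = INTERCEPT + SLOPE * balance
    return math.exp(z) / (1.0 + math.exp(z))

p_1000 = default_probability(1000)   # about 0.00576, i.e. 0.576%
p_2000 = default_probability(2000)   # about 0.586, i.e. 58.6%
```

Doubling the balance raises the predicted probability roughly a hundredfold, because the model is linear on the log-odds scale and not on the probability scale.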
Parameter Estimation
Maximum Likelihood Estimation
The objective is not to estimate the logit “correctly” but to classify better.
The parameters should take values that produce a score [probability p] that allows a good cutoff, meaning that this score should be high for one class and low for the other.
If P(Y_i = 1 | X_i) = P(X_i) = P_i, then the likelihood of observation i is
$$L_i = P_i^{Y_i} (1 - P_i)^{1 - Y_i}$$
The goal is to maximize the collective form of this function over all observations:
$$\max L = \max \prod_{i=1}^{n} L_i$$
Log Likelihood Function
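Taking the log of the likelihood turns the product over observations into a sum, which is easier to maximize. The sketch below writes that log-likelihood for one predictor and maximizes it with plain gradient ascent. The data, the learning rate, and the step count are illustrative assumptions, and statistical packages such as SPSS use faster iterative routines than this one.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, xs, ys):
    """Sum over observations of Y*log(P) + (1 - Y)*log(1 - P)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(xs, ys, learning_rate=0.1, steps=5000):
    """Gradient ascent on the log-likelihood (illustrative, not optimized)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        b0 += learning_rate * sum(residuals) / n
        b1 += learning_rate * sum(r * x for r, x in zip(residuals, xs)) / n
    return b0, b1

# Hypothetical data with overlapping classes, so the estimates stay finite.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

b0_hat, b1_hat = fit(xs, ys)
ll_start = log_likelihood(0.0, 0.0, xs, ys)
ll_fitted = log_likelihood(b0_hat, b1_hat, xs, ys)
```

The fitted log-likelihood is higher than the starting one, and the slope is positive because larger `x` values go with more events in this made-up data.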
Let’s See It In Action
Note Points
• For a standard logistic regression, ignore the Previous and Next buttons because they are for sequential (hierarchical) logistic regression.
• Keep the Method: option at its default value, Enter.
• "Enter" is the name SPSS Statistics gives to standard regression analysis.
• SPSS Statistics requires you to define all the categorical predictors in the logistic regression model. It does not do this automatically.
• By default, SPSS Statistics selects the last category (numerically) as the reference category.
• If we change the method from Enter to Forward: Wald, only the significant coefficients are included in the logistic regression equation. This gives a more parsimonious model, but stepwise selection has known problems (see the links on the stepwise method in the multicollinearity discussion).
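To show what "last category as the reference" means in practice, here is a rough sketch of indicator (dummy) coding. The function `indicator_code` and the `education` values are hypothetical; this mimics the idea and is not SPSS's own implementation.

```python
def indicator_code(values, reference=None):
    """Return (kept_levels, rows) of 0/1 indicators, omitting the reference level.

    If no reference is given, the numerically last level is used,
    mirroring the default described in the note above.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[-1]
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows

# Hypothetical categorical predictor coded 1, 2, 3.
education = [1, 2, 3, 2, 1, 3]
columns, coded = indicator_code(education)
```

Cases in category 3 get all zeros, so each reported coefficient compares one of the other categories with category 3.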
https://statistics.laerd.com/
Interpretation of SPSS Output
We do not report this
Omnibus Test Output
Using ROC to find Optimal Cut-Off
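One common way to pick a cut-off from a ROC curve is to maximize Youden's J (sensitivity + specificity - 1). The slides do not say which criterion was used, so treat the choice of J, the helper names, and the data below as assumptions.

```python
def roc_points(actual, scores):
    """Return (threshold, TPR, FPR) for every distinct score used as a cut-off."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = []
    for threshold in sorted(set(scores), reverse=True):
        predicted = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
        fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
        points.append((threshold, tp / positives, fp / negatives))
    return points

def best_cutoff(actual, scores):
    """Pick the threshold with the largest Youden's J = TPR - FPR."""
    return max(roc_points(actual, scores), key=lambda point: point[1] - point[2])

# Hypothetical outcomes and predicted probabilities.
actual = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.85, 0.95]

cutoff, tpr, fpr = best_cutoff(actual, scores)
```

With this data the selected cut-off is 0.60, where four of the five events are caught and no non-event is flagged. Other criteria (for example, cost-weighted ones) can give a different cut-off.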
How to report the results from SPSS
A logistic regression was performed to ascertain the effects of x, y, and gender on the likelihood that participants have the event (positive response).
1. The logistic regression model was statistically significant, χ2(df) = 28.605, p < 0.05 [Omnibus test].
2. A non-significant result (p = 0.78) of the Hosmer-Lemeshow test indicates good model fit.
3. The pseudo R2 measures of explained variation are 56.4% (Cox & Snell R2) and 67.8% (Nagelkerke R2). [For validation data, pseudo R2 …]
How to report the results from SPSS
1. The model correctly classified 81.0% of cases (model accuracy) for the training data set and 76% of cases for the validation data set.
[The data set was randomly divided into training and validation sets, with 70% of observations in the training set and the rest in the validation set.]
2. The model specificity at the cut-off value
3. The model sensitivity at the cut-off value
4. The ROC curve was used to optimize the cut-off point.
How to report the results from SPSS
Report the results from the "Variables in the Equation" table, including which of the predictor variables were statistically significant and what predictions can be made based on the odds ratios. E.g.,
Males had 6.02 times the odds of the event compared with females.
Increasing x was associated with an increased likelihood of the event, but increasing y was associated with a reduction in the likelihood of the event.
How to report the results from SPSS
Box-Tidwell (1962) Test:
Source: Andy Field
Model Evaluation
Evaluation Metrics
• AIC
• Null & Residual Deviance
• Accuracy & Misclassification Error
• Sensitivity & Specificity
• ROC & AUC
• Precision & Recall
• Lift & Gain
• KS Statistic
• F Scores
• FDR & FOR
• FPR & FNR
• Hosmer-Lemeshow Test
• Customized function specific to the business requirement
https://learnerworld.tumblr.com/
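Several of the metrics in the list come straight from the four cells of a confusion matrix. The sketch below computes them for made-up labels; it assumes both classes occur in the actual and predicted labels, since otherwise some ratios divide by zero.

```python
def confusion_counts(actual, predicted):
    """Return (TP, TN, FP, FN) for binary labels coded 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn)   # also called recall or TPR
    specificity = tn / (tn + fp)   # also called TNR
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_error": (fp + fn) / total,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

# Hypothetical actual and predicted classes.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

metrics = classification_metrics(actual, predicted)
```

For this data accuracy is 0.70, sensitivity 0.75, and specificity about 0.67, which shows how a single accuracy figure can hide different error rates for the two classes.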
How to choose appropriate evaluation metrics?
ROC Curve Applet
https://kennis-research.shinyapps.io/ROC-Curves/
Other Considerations
• Categorical Predictors
• Accuracy Paradox
• Balanced, Unbalanced & Rare Event Data
• Complete or Quasi-separation
• Pseudo R2 Measures
• Multinomial and Ordinal Logistic Regression
Homoskedasticity is not an assumption in logistic regression.
Pseudo R2 Measures
Evaluation Metrics
• Efron’s R2
• McFadden’s R2
• McFadden’s Adjusted R2
• Cox & Snell R2
• Nagelkerke / Cragg & Uhler’s R2
• McKelvey & Zavoina R2
• Count R2
• Adjusted Count R2
• https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/
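Three of the listed measures can be computed from the log-likelihoods of the null (intercept-only) model and the fitted model. The sketch below uses their standard definitions; the log-likelihood values and the sample size are hypothetical.

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """McFadden, Cox & Snell and Nagelkerke pseudo R2 from log-likelihoods.

    ll_null  : log-likelihood of the intercept-only model
    ll_model : log-likelihood of the fitted model
    n        : number of observations
    """
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox & Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * ll_null / n))
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}

# Hypothetical log-likelihoods for a sample of 100 cases.
r2 = pseudo_r2(ll_null=-69.0, ll_model=-50.0, n=100)
```

Nagelkerke's value is always at least as large as Cox & Snell's for the same model, which matches the pattern in the reporting example (56.4% versus 67.8%).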
References
• Field, A. P. (2013). Discovering statistics using IBM SPSS Statistics: and sex and drugs and rock 'n' roll (4th ed.). London: Sage Publications.
• Field, A. P., Miles, J. N. V., & Field, Z. C. (2012). Discovering statistics using R: and sex and drugs and rock 'n' roll. London: Sage Publications.
• Field, A. P., & Miles, J. N. V. (2010). Discovering statistics using SAS: and sex and drugs and rock 'n' roll. London: Sage Publications.
• Kothari, C. R. (2004). Research methodology: methods & techniques. New Age publications.
My Interesting answers/posts
To understand the results of logistic regression or other classifiers
https://learnerworld.tumblr.com/post/152327498485/enjoystatisticswithmebinaryclassifierperformance
Hypothesis testing in layman’s terms
https://learnerworld.tumblr.com/search/hypothesis
Understanding mediation effect
https://learnerworld.tumblr.com/post/146541892120/mediation-effectenjoystatisticswtihme
My Interesting answers/posts
Dependence vs Correlation
https://www.quora.com/What-is-the-difference-between-dependence-and-correlation/answer/Nisha-Arora-9
Collinearity & Correlation
https://www.quora.com/In-statistics-what-is-the-difference-between-collinearity-and-correlation/answer/Nisha-Arora-9
My Expertise
Technical Topics:
• Python for Data Science or Data Analysis
• R Programming
• Data Visualization & Storytelling
• Machine Learning/Data Science
• Statistics [for researchers/Data Science practitioners/university students]: theory/mathematical proofs/application based/using interactive tools/playing with data using some software
• Data Analysis using SPSS
• Mathematics [Don't want to write too much, but it depends on the requirement]
• Excel [basic to intermediate/tools for data analysis/operations research/operations management/specific course for academicians, etc.]
To know more about these, click here
My Expertise
Non-technical Topics:
• Interactive pedagogical tools/web resources
• The art of effective use of Information & Communication Tools (ICT)
• Tools/platforms for hosting online lectures/meetings/live sessions
• Effective Googling for finding the right resources (books/research papers/answers)
• Leveraging online research communities, Q/A sites, groups, and meet-ups to dive deep into a particular topic of interest
• Bridging the gap between industry & academia
• Creating a personal brand by leveraging the power of social media
• Getting smart with MS Office (Word, Excel, PowerPoint, etc.)
• Learning Google products (mentioned in the slide)
• Learning how to learn
• Learning how to teach
• Note Taking
• Effective Communication & Presentation
Dr.aroranisha@gmail.com
Follow me
http://stats.stackexchange.com/users/79100/nisha-arora
http://stackoverflow.com/users/5114585/nisha-arora
https://www.researchgate.net/profile/Nisha_Arora2/contributions
https://www.quora.com/profile/Nisha-Arora-9
http://learnerworld.tumblr.com/
https://www.slideshare.net/NishaArora1
https://www.youtube.com/channel/UCniyhvrD_8AM2jXki3eEErw/videos
https://www.linkedin.com/in/drnishaarora/
Any other topic which you want to hear/learn from me?
Feel free to leave a comment on my YouTube or mail me at
dr.aroranisha@gmail.com
Thank You
References
http://machinelearningmastery.com/
https://www.analyticsvidhya.com/
http://www.analyticbridge.com/
http://www.datasciencecentral.com/
https://www.kaggle.com/
http://stats.stackexchange.com
http://datascience.stackexchange.com/
https://www.researchgate.net
https://www.quora.com
https://github.com/
Reach Out to Me
http://stats.stackexchange.com/users/79100/nisha-arora
https://www.researchgate.net/profile/Nisha_Arora2/contributions
https://www.quora.com/profile/Nisha-Arora-9
http://learnerworld.tumblr.com/
nishaarora4@gmail.com
Thank You

7. logistics regression using spss

  • 1. Dr Nisha Arora Logistic Regression using SPSS
  • 2. 2
  • 3. Object-wise Analysis 4 Steps to select appropriate statistical test  Define clearly the objective of the study  Define the level of measurement (metric/non-metric) of each variable to be included in the analysis.
  • 4. 5
  • 5. Selecting the appropriate technique 10 Bivariate techniques Response Variable (DV) Explanatory Variable (IDV) Metric Non-metric Metric Regression Logistic Regression/ LDA Non-metric Dummy Var Reg./ Hypothesis Test* Chi-square test Make sure to check all assumptions before applying any statistical technique.
  • 6. Selecting the appropriate technique 12 Response Variable(s) (DVs) One DV More than one DV Explanatory Variable(s) (IDVs) One IDV Metric Non-metric Metric Metric Simple Regression Binary/Multi Nominal (Logistic) Reg Path Analysis Non-metric t test/Anova Chi Square Test Manova More than one IDV All Metric Multiple Reg Multiple Logit Reg/Multiple Multinominal Path Analysis All Non- metric n – way Anova Complex Crosstab/ Log-linear analysis n – way Manova Mixed n – way Ancova/Dumm y var Multiple Logit Reg/Multiple Multinominal n– way Mancova
  • 7. Selecting the appropriate Technique 13 Binary (Binomial) Logistic Regression Multi-Nominal Logistic Regression Ordinal Logistic regression Poisson Regression
  • 8. • Response has only two 2 possible outcomes. • E.g.: Spam or Not Binary • Three or more categories without ordering. • E.g.: Predicting which food is preferred more (Veg, Non-Veg, Vegan) Multinominal • Three or more categories with ordering. • E.g.: Movie rating from 1 to 5 Ordinal 14 Types of Logistic Regression
  • 10. 16 Types of Classification Problems Multi-Label Classification Multi-Class Classification Binary Classification
  • 11. 17  To predict in advance whether a product launch will be successful or not  An online banking service must be able to determine whether or not a transaction being performed on the site is fraudulent  Benign or malignant tumor  Spam detection  Movies genres classification Classification Problem
  • 12. Box-Tidwell Test  In the model, include interactions between the continuous predictors and their logs.  If such an interaction is significant, then the assumption has been violated.  If any interaction is significant, try adding to the model powers of the predictor (that is, going polynomial) Caution:  Not a very robust test as it gets affected by sample size.  You should not be very concerned with a just significant interaction when sample sizes are large.
  • 13. Assumptions of Logit Regression  Binary response variable with mutually exclusive and exhaustive categories.  One or more predictor variable(s)  Independent Observations  linear relationship between continuous independent variable(s) and the logit transformation of the dependent variable  This assumption can be tested by using Box-Tidwell Test.  including in the model interactions between the continuous predictors and their logs. If such an interaction is significant, then the assumption has been violated.
  • 14. Assumptions of Logit Regression  Binary response variable with mutually exclusive and exhaustive categories.  One or more predictor variable(s)  Independent Observations  linear relationship between continuous independent variable(s) and the logit transformation of the dependent variable What about Co-linearity, Perfect co-linearity, and Multi-co-linearity? https://stats.stackexchange.com/a/432543/79100
  • 15. More Discussion on Multi-Colinearity • What happens when you’ve Multi-Colinearity? Multicollinearity isn't as deleterious for prediction but may affect variable’s Significance https://stats.stackexchange.com/questions/168622/why-is-multicollinearity-not- checked-in-modern-statistics-machine-learning • Can you safely ignore Multi-Colineairty? https://statisticalhorizons.com/multicollinearity • How to handle Multi-colinearity? https://www.researchgate.net/post/how_to_deal_with_multicolinearity#view=580e f132ed99e1c1046fcf01 • Why not to use STEP_WISE method? http://www.philender.com/courses/linearmodels/notes4/swprobs.html http://www.danielezrajohnson.com/stepwise.pdf
  • 16. 22
  • 17. Assumptions _ More Considerations  Logistic regression typically requires a large sample size because they use maximum likelihood estimation techniques. [maximum likelihood estimates are less powerful at low sample sizes than ordinary least square].  It is also important to keep in mind that when the outcome is rare, even if the overall dataset is large, it can be difficult to estimate a logit model.  Empty cells or small cells: You should check for empty or small cells by doing a crosstab between categorical predictors and the outcome variable. If a cell has very few cases (a small cell), the model may become unstable or it might not run at all.
  • 18. 26 Why can’t we use Linear Regression for Classification Problems?
  • 19. Why not Linear Regression?
  • 20. 29
  • 21. What is Logistic Regression? The Logistic Regression Curve is called as “Sigmoid Curve”, also known as S-Curve How to decide whether the value is 0 or 1 from this curve? Set a threshold
  • 22.  Default - 0.5  Based on group sizes (as we do in LDA)  Based on performance evaluation matrix using cross validation 31 How to set a threshold?
  • 23. Logistic Regression Equation Rather than modeling this response Y directly, Logistic regression models the probability that Y belongs to a particular category.  P(Y =1 | X) or P(X) can take values from 0 to 1 n n X X X P X P                 ... ) ( 1 ) ( log 1 1 0 How to interpret ?
  • 25. Logistic Regression Equation  Alternatively, we can write \( \frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n} \)  Or \( P(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}}{1 + e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n}} \)
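The three forms of the equation (log odds, odds, probability) are inverses of one another. The sketch below checks this numerically; the function names and the example probability are illustrative choices.

```python
import math

def odds(p):
    """Odds in favor of the event: P / (1 - P)."""
    return p / (1 - p)

def logit(p):
    """Log odds, the left-hand side of the logistic regression equation."""
    return math.log(odds(p))

def inverse_logit(z):
    """Recover P from the linear predictor z: e^z / (1 + e^z)."""
    return math.exp(z) / (1 + math.exp(z))

p = 0.8
z = logit(p)                  # log(0.8 / 0.2) = log(4)
recovered = inverse_logit(z)  # back to roughly 0.8
```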
  • 27. Understanding the Odds Exp(B) represents the ratio-change in the odds of the event of interest for a one-unit change in the predictor. \( \frac{P(X)}{1 - P(X)} = e^{\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n} \)
  • 28. Odds & Odds Ratio
  • 29. Understanding Odds Logit = log(odds) = log(p / (1 − p)) = log(probability of event happening / probability of event not happening) Odds Ratio (OR) = \( \frac{\text{Odds in favor of event } Y \mid X = X_0 + 1}{\text{Odds in favor of event } Y \mid X = X_0} \)
  • 30. Interpreting the coefficients Response: default [Y/N] Predictor: [account] balance Estimated coefficients of the logistic regression model that predicts the probability of default using balance. A one-unit increase in balance is associated with an increase in the log odds of default by 0.0055 units, or equivalently with the odds of default being multiplied by exp(0.0055), i.e., about 1.0055.
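The odds interpretation can be checked with a few lines of arithmetic. The coefficient 0.0055 comes from the slide; the function name and the 100-unit example are added here for illustration.

```python
import math

B1 = 0.0055  # coefficient of balance, as given on the slide

def odds_multiplier(delta, coefficient=B1):
    """Factor by which the odds change when the predictor rises by delta units."""
    return math.exp(coefficient * delta)

per_unit = odds_multiplier(1)       # about 1.0055
per_hundred = odds_multiplier(100)  # exp(0.55), about 1.73
```

Because the effect is multiplicative on the odds, a 100-unit increase multiplies the odds by exp(0.55) rather than by 100 times 0.0055.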
  • 31. Interpreting the coefficients Probability of default for an individual with a balance of $1,000 is \( \frac{e^{-10.6513 + 0.0055 \times 1000}}{1 + e^{-10.6513 + 0.0055 \times 1000}} = 0.00576 = 0.576\% \) Probability of default for an individual with a balance of $2,000 is \( \frac{e^{-10.6513 + 0.0055 \times 2000}}{1 + e^{-10.6513 + 0.0055 \times 2000}} = 0.586 = 58.6\% \)
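The two probabilities can be reproduced directly. The sketch assumes the intercept is −10.6513 (the negative sign is needed to obtain the 0.576% and 58.6% shown on the slide) and the slope is 0.0055; the function name is hypothetical.

```python
import math

B0 = -10.6513  # intercept (sign assumed negative, consistent with the results)
B1 = 0.0055    # coefficient of balance

def default_probability(balance):
    """P(default | balance) from the fitted logistic model."""
    z = B0 + B1 * balance
    return math.exp(z) / (1 + math.exp(z))

p_1000 = default_probability(1000)  # roughly 0.00576
p_2000 = default_probability(2000)  # roughly 0.586
```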
  • 33. Maximum Likelihood Estimation The objective: not to “correctly” estimate the logit, but to classify better. The parameters should take values that produce a score [the probabilities p] for which a good cutoff exists, meaning the score should be high for one class and low for the other. If P(Yi = 1 | Xi) = P(Xi) = Pi, then \( L_i = P_i^{Y_i} (1 - P_i)^{1 - Y_i} \) The parameters are chosen to maximize the joint form of this function over all observations: \( \max L = \max \prod_{i=1}^{n} L_i \)
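To make the maximization concrete, the sketch below fits a one-predictor model by plain gradient ascent on the log of the likelihood. This is an illustration only: SPSS uses its own iterative algorithm, and the data, learning rate and step count here are assumptions.

```python
import math

def predict(b0, b1, x):
    """P(Y = 1 | x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0, b1, xs, ys):
    """Sum over observations of log(P_i^Y_i * (1 - P_i)^(1 - Y_i))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = predict(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def fit(xs, ys, rate=0.1, steps=5000):
    """Gradient ascent on the average log-likelihood, starting from zero."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [y - predict(b0, b1, x) for x, y in zip(xs, ys)]
        b0 += rate * sum(errors) / n
        b1 += rate * sum(e * x for e, x in zip(errors, xs)) / n
    return b0, b1

# Hypothetical data with overlapping classes, so a finite maximum exists.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit(xs, ys)
```

Maximizing the log-likelihood gives the same parameters as maximizing the product itself, and it is numerically safer because it avoids multiplying many small probabilities.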
  • 35. Let’s See It In Action
  • 36. Note Points  For a standard logistic regression you should ignore the Previous and Next buttons because they are for sequential (hierarchical) logistic regression.  The Method: option needs to be kept at the default value, which is Enter Method.  The "Enter" method is the name given by SPSS Statistics to standard regression analysis.  SPSS Statistics requires you to define all the categorical predictor values in the logistic regression model. It does not do this automatically.  The default behaviour in SPSS Statistics is for the last category (numerically) to be selected as the reference category.  If we change the method from Enter to Forward: Wald the quality of the logistic regression improves. Now only the significant coefficients are included in the logistic regression equation. 47 https://statistics.laerd.com/
  • 42. We do not report this
  • 53. Using ROC to find Optimal Cut-Off
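One common way to pick a cut-off from the ROC curve is to maximize Youden's J (sensitivity minus false positive rate). The slides do not say which criterion was used, so treat the criterion, the function names and the data below as assumptions.

```python
def rates(actual, probs, cutoff):
    """True positive rate and false positive rate at a given cutoff."""
    tp = sum(1 for y, p in zip(actual, probs) if y == 1 and p >= cutoff)
    fp = sum(1 for y, p in zip(actual, probs) if y == 0 and p >= cutoff)
    positives = sum(actual)
    negatives = len(actual) - positives
    return tp / positives, fp / negatives

def best_cutoff(actual, probs):
    """Try each observed probability as a cutoff and keep the one with the largest J."""
    def youden(cutoff):
        tpr, fpr = rates(actual, probs, cutoff)
        return tpr - fpr
    best = max(sorted(set(probs)), key=youden)
    return best, youden(best)

# Hypothetical outcomes and predicted probabilities.
actual = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.65, 0.7, 0.8, 0.9]
cutoff, j_value = best_cutoff(actual, probs)
```

Each candidate cutoff corresponds to one point on the ROC curve; Youden's J selects the point farthest above the diagonal.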
  • 54. How to report the results SPSS A logistic regression was performed to ascertain the effects of x, y, and gender on the likelihood that participants have the event (positive response). 1. The logistic regression model was statistically significant, χ²(df) = 28.605, p < 0.05 [Omnibus test] 2. A non-significant result (p = 0.78) of the Hosmer-Lemeshow test is an indicator of good model fit. 3. The pseudo R² measures of explained variation are 56.4% (Cox & Snell R²) and 67.8% (Nagelkerke R²) [For validation data, pseudo R² …]
  • 55. How to report the results SPSS 1. The model correctly classified 81.0% of cases (model accuracy) for the training data set and 76% of cases for the validation data set. [The data set was randomly divided into training and validation sets, with 70% of the observations in the training set and the rest in the validation set.] 2. The model specificity and 3. sensitivity at the chosen cut-off value. 4. The ROC curve was used to optimize the cut-off point.
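The accuracy, sensitivity and specificity reported above all come from the classification table. A minimal sketch on hypothetical labels (the function names are illustrative):

```python
def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, fp, tn, fn

def classification_summary(actual, predicted):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical observed and predicted classes for ten cases.
actual = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1, 0, 1]
summary = classification_summary(actual, predicted)
```

Running the same function on the training and validation sets separately gives the two accuracy figures that the report compares.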
  • 56. How to report the results SPSS Report the results from the "Variables in the Equation" table, including which of the predictor variables were statistically significant and what predictions can be made based on the odds ratios. E.g., males had 6.02 times higher odds of the event than females. Increasing x was associated with an increase in the likelihood of the event, but increasing y was associated with a reduction in the likelihood of the event.
  • 57. How to report the results SPSS Box-Tidwell (1962) Test:
  • 60. Evaluation Matrices  AIC  Null & Residual Deviance  Accuracy & Misclassification Error  Sensitivity & Specificity  ROC & AUC  Precision & Recall  Lift & Gain  KS Statistics  F Scores  FDR & FOR  FPR & FNR  Hosmer Lemeshow Test  Customized function specific to business requirement https://learnerworld.tumblr.com/ How to choose appropriate evaluation matrices?
  • 62. Other Considerations  Categorical Predictors  Accuracy Paradox  Balanced, Unbalanced & Rare Event Data  Complete or Quasi-separation  Pseudo R² Measures  Multinomial and Ordinal Logistic Regression Homoskedasticity is not an assumption in logistic regression
  • 64. Evaluation Matrices  Efron’s R2  McFadden’s R2  McFadden’s Adjusted R2  Cox & Snell R2  Nagelkerke / Cragg & Uhler’s R2  McKelvey & Zavoina R2  Count R2  Adjusted Count R2  https://stats.idre.ucla.edu/other/mult-pkg/faq/general/faq-what-are- pseudo-r-squareds/
  • 66. My Interesting answers/posts To understand results of logistic regression or other classifiers https://learnerworld.tumblr.com/post/152327498485/enjoystatisticswithmebinaryclassifierperformance Hypothesis testing in layman’s terms https://learnerworld.tumblr.com/search/hypothesis Understanding mediation effect https://learnerworld.tumblr.com/post/146541892120/mediation-effectenjoystatisticswtihme
  • 67. My Interesting answers/posts Dependence vs Correlation https://www.quora.com/What-is-the-difference-between-dependence-and-correlation/answer/Nisha-Arora-9 Collinearity & Correlation https://www.quora.com/In-statistics-what-is-the-difference-between-collinearity-and-correlation/answer/Nisha-Arora-9
  • 75. Reach Out to Me http://stats.stackexchange.com/users/79100/nisha-arora https://www.researchgate.net/profile/Nisha_Arora2/contributions https://www.quora.com/profile/Nisha-Arora-9 http://learnerworld.tumblr.com/ [email protected]