Binary Logistic Regression
Prof Sami Abdo Radman
• Why use logistic regression?
• 1. Description - to measure the strength of the association between the outcome and factors of interest
• 2. Adjustment - for covariates/confounders
• 3. Predictors - to determine the important risk factors affecting the outcome
• 4. Prediction - to estimate the probability of the outcome for new cases (equation → probability)
• Logistic regression predicts the probability of having the outcome (0-100%)
• Logistic regression detects the relation between Y and X (predicts the change in Y when X changes)
• It predicts the probability of Y for a given X
Assumptions
• Dependent= categorical (dichotomous)(binary)
• Independent = categorical or continuous
Coding of the dependent
Dependent variable:
• Yes = 1 (yes should have the bigger code)
• No = 0
• SPSS will predict the value 1 (yes)
• If Yes = 1 and No = 2, SPSS will predict 2 (no)
• i.e., SPSS predicts the category with the bigger code
E.g., to predict systolic BP ≥ 180:
• Dependent = SBP ≥ 180 (yes, no)
• Independent =
• Race (Chinese, Indians, Malay, others)
• Smoking (yes, no)
• Age (continuous)
• Analyze → Regression → Binary Logistic
• Enter the dependent variable
• Enter the independent variables
• Click "Categorical" to define the categorical variables and select the reference category (the reference category will be coded 0 in the output)
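What the "Categorical" dialog does is dummy-code each k-category predictor into k - 1 indicator variables, with the chosen reference category coded as all zeros. A minimal Python sketch of that coding (the function name and labels are illustrative, not SPSS output):

```python
def dummy_code(value, categories, reference, label):
    """Dummy-code one categorical value; the reference category maps to all zeros."""
    others = [c for c in categories if c != reference]
    return {f"{label}({i + 1})": int(value == c) for i, c in enumerate(others)}

races = ["Chinese", "Indians", "Malay", "others"]
print(dummy_code("Indians", races, "Chinese", "Race"))  # {'Race(1)': 1, 'Race(2)': 0, 'Race(3)': 0}
print(dummy_code("Chinese", races, "Chinese", "Race"))  # reference group: all zeros
```

With Chinese as the reference, an Indian subject gets Race(1) = 1 and the other dummies 0, matching the Race(1)/Race(2)/Race(3) terms that appear in the SPSS output.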
Output
• For each independent categorical variable we must select one category as a reference group:
• Race: reference = Chinese = 0
• Smoking: reference = No = 0
• Don't forget: the given OR is the adjusted OR
• Exp(B) = OR
• B = beta coefficient
• Wald = identifies the most important variable in the model (here age is the most important predictor: Wald = 11.007)
• A smoker compared to a non-smoker is 9.9 (95% CI 1.4 to 68.4) times more likely to have SBP ≥ 180.
• Or: a smoker is 9.9 (95% CI 1.4 to 68.4) times more likely to have SBP ≥ 180 compared to a non-smoker.
• Or: the odds of having SBP ≥ 180 are 9.9 times greater for smokers compared to non-smokers.
• Since age is a quantitative (numerical) variable, an increase of one year in age gives a 23.3% (95% CI 8.9% to 39.5%) increase in the odds of having SBP ≥ 180: percent change = (Exp(B) for age - 1) × 100.
The best interpretation
• Controlling for the other explanatory variables, an increase of one year in age gives a 23.3% increase in the odds of having SBP ≥ 180 ((Exp(B) - 1) × 100).
• Controlling for the other explanatory variables, a smoker compared to a non-smoker is 9.9 times more likely to have SBP ≥ 180.
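The percent-change rule for a continuous predictor can be checked directly. Using the age coefficient B = 0.209 from the prediction equation later in these slides, (Exp(B) - 1) × 100 gives roughly the 23% figure reported (small rounding differences aside):

```python
import math

def pct_change_in_odds(b):
    """Percent change in odds per one-unit increase in a continuous predictor."""
    return (math.exp(b) - 1) * 100

print(round(pct_change_in_odds(0.209), 1))  # ~23.2 (slides round to 23.3%)
```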
• Interpretation of OR:
• The exposed have n times the likelihood of having the outcome compared to the non-exposed
• The exposed are n times more likely to have the outcome compared to the non-exposed
• If the OR is less than 1 (protective), for example OR = 0.32:
• You can change the reference group, or divide 1 by the OR (1/OR)
• Example: male compared to female, OR = 2
• Female compared to male, OR = 1/2 = 0.5
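Swapping the reference group is the same as taking the reciprocal of the OR, a quick sketch:

```python
def flip_reference(odds_ratio):
    """OR after swapping the exposed and reference groups = 1 / OR."""
    return 1 / odds_ratio

print(flip_reference(2.0))              # male vs female OR = 2 -> female vs male OR = 0.5
print(round(flip_reference(0.32), 2))   # protective OR 0.32 reads as ~3.12 the other way round
```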
Total model is significant (p < 0.001)
If the overall model is significant, this means there is at least one significant variable in the model (omnibus test, chi-square χ²); i.e., at least one beta is not equal to zero.
R square (pseudo R square)
• The Nagelkerke R square shows that about 50.6% of the variation in the outcome variable (SBP ≥ 180) is explained by this logistic model.
• The Cox and Snell R square shows that 35% of the variation in the outcome variable (SBP ≥ 180) is explained by this logistic model.
• So 35% to 51% of the variation in the outcome variable (SBP ≥ 180) is explained by this logistic model.
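For reference, the two pseudo-R-square formulas can be written out from the null and fitted model log-likelihoods. The values below are hypothetical, not the SPSS output behind the slides' 35%/50.6% figures:

```python
import math

def cox_snell_r2(ll_null, ll_model, n):
    """Cox & Snell R^2 = 1 - exp(2 * (LL_null - LL_model) / n)."""
    return 1 - math.exp(2 * (ll_null - ll_model) / n)

def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke rescales Cox & Snell so its maximum possible value is 1."""
    return cox_snell_r2(ll_null, ll_model, n) / (1 - math.exp(2 * ll_null / n))

# Hypothetical log-likelihoods for a model fitted on n = 55 subjects
cs = cox_snell_r2(-34.4, -16.8, 55)
nk = nagelkerke_r2(-34.4, -16.8, 55)
print(round(cs, 3), round(nk, 3))  # Nagelkerke is always >= Cox & Snell
```

This is why SPSS reports Nagelkerke above Cox and Snell: the Cox and Snell statistic cannot reach 1 for a binary outcome, and Nagelkerke corrects for that.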
Classification table (classification accuracy, calibration)
• The overall accuracy of this model in predicting subjects having SBP ≥ 180 is 85.5%. (To be a good model, accuracy should be > 50%.)
• The sensitivity is 9/15 = 60%
• The specificity is 38/40 = 95%
• Positive predictive value (PPV) = 9/11 = 81.8%
• Negative predictive value (NPV) = 38/44 = 86.4%
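The fractions on this slide imply a 2×2 classification table with TP = 9, FN = 6, FP = 2, TN = 38 (n = 55); the metrics follow directly from those counts:

```python
tp, fn, fp, tn = 9, 6, 2, 38  # counts recovered from the slide's fractions

sensitivity = tp / (tp + fn)                 # 9/15  = 0.60
specificity = tn / (tn + fp)                 # 38/40 = 0.95
ppv = tp / (tp + fp)                         # 9/11  ~ 0.818
npv = tn / (tn + fn)                         # 38/44 ~ 0.864
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 47/55 ~ 0.855

print(f"accuracy = {accuracy:.1%}")  # accuracy = 85.5%
```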
• Hosmer-Lemeshow goodness of fit
• A p value > 0.05 is expected (non-significant = good fit)
• The model fits the data because p = 0.555
Multicollinearity (if high → the model is not acceptable)
• 1. Check the SE: if SE > 5 → multicollinearity (some books say > 2)
• 2. Check the correlation matrix
• If there is multicollinearity between two variables in the model, we should remove one of them (the one with the higher SE),
• or remove the one which is not significant,
• or remove the one which is less important (if both are significant).
Conclusion
• The logistic regression model was statistically significant, χ² = 35.1, p < .001.
• The model explained 51.0% (Nagelkerke R²) of the variance in SBP ≥ 180.
• The model correctly classified 85.5% of cases.
• The model fits the data: Hosmer-Lemeshow goodness of fit p > 0.05.
• Increasing age was associated with an increased likelihood of SBP ≥ 180: an increase of one year in age gives a 23.3% increase in the odds of having SBP ≥ 180.
• A smoker compared to a non-smoker is 9.9 times more likely to have SBP ≥ 180.
• Race is not a significant predictor.
Predicting equation (to predict probability)
First calculate z (the logit), then apply the exponential function:
• z = -14.462 + 0.209 × Age + 2.292 × Smoker(1) + 0.640 × Race(1) + 1.303 × Race(2) - 0.097 × Race(3)
• e denotes the exponential function
• https://www.medcalc.org/manual/exp_function.php
• z is the log odds (the logit transformation)
• For example, a 45-year-old non-smoking Chinese subject:
• Smoker(1) = 0
• Race(1) = Race(2) = Race(3) = 0, so
• z = -14.462 + 0.209 × 45 = -5.057
• -z = -(-5.057) = 5.057
• e^(-z) = 157.1
• Probability = 1/(1 + e^(-z))
• Probability = 1/(1 + 157.1) = 0.006
• Prob(SBP ≥ 180) = 0.006 (0.6%): it is very unlikely that this subject has SBP ≥ 180
• In general, a probability of less than 0.50 is considered unlikely
• Probability ranges from 0 to 1
• Another example: a 65-year-old Indian smoker:
• Smoker(1) = 1, Race(1) = 1, Race(2) = Race(3) = 0
• z = -14.462 + 0.209 × 65 + 2.292 × 1 + 0.640 × 1 = 2.055
• -z = -2.055
• e^(-z) = 0.128
• Probability = 1/(1 + 0.128) = 0.89
• Prob(SBP ≥ 180) = 0.89 = 89% → it is very likely that this subject has SBP ≥ 180
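Both worked examples can be reproduced in a few lines using the coefficients from the slide's equation:

```python
import math

def predict_prob(age, smoker, race1, race2, race3):
    """Probability of SBP >= 180 using the fitted coefficients from the slides."""
    z = (-14.462 + 0.209 * age + 2.292 * smoker
         + 0.640 * race1 + 1.303 * race2 - 0.097 * race3)
    return 1 / (1 + math.exp(-z))  # the logistic (sigmoid) transformation of z

# 45-year-old non-smoking Chinese subject (reference race: all dummies = 0)
print(round(predict_prob(45, 0, 0, 0, 0), 3))  # 0.006 -> very unlikely
# 65-year-old Indian smoker (Race(1) = 1)
print(round(predict_prob(65, 1, 1, 0, 0), 2))  # 0.89 -> very likely
```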
Hypothesis of logistic regression
• H0: β1 = β2 = β3 = ... = βn = 0
• H1: at least one regression coefficient is not equal to zero
• The function used in logistic regression is the sigmoid (logistic) function
• It is used to create probabilities:
• Probability ≥ 0.5 → class = 1
• Probability < 0.5 → class = 0
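The sigmoid function and the 0.5 classification cutoff can be sketched directly:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def classify(z, threshold=0.5):
    """Class = 1 when the predicted probability reaches the threshold."""
    return 1 if sigmoid(z) >= threshold else 0

print(sigmoid(0))        # 0.5 (z = 0 is the decision boundary)
print(classify(2.055))   # 1: probability ~0.89 -> predicted to have the outcome
print(classify(-5.057))  # 0: probability ~0.006 -> predicted not to
```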
Sample size for logistic regression
• Enter technique → 5-10 subjects for each variable
• Backward/ forward → 20 subjects for each variable
Comparison to linear regression
• Linear regression predicts continuous numbers
• Logistic regression helps us predict whether a person will have the outcome or not (probability from 0 to 1)
• Probability ≥ 0.5 → has the outcome
• Probability < 0.5 → does not have the outcome
• Logistic regression is used to classify samples
In linear regression
• The line is fitted using least squares
In logistic regression
• Unlike linear regression, which outputs continuous values, logistic regression transforms its output using the logistic sigmoid function to return a probability value: the probability of having the outcome
References
• https://statistics.laerd.com/spss-tutorials/binomial-logistic-
regression-using-spss-statistics.php#procedure
• Check the file sent by email
SPSS example
• To predict Ischemic heart disease
• By
• Age
• Residency
• Level of education
• Diabetes status
Notes
• Binary logistic regression does not make any assumptions of normality, linearity, or homogeneity of variance for the independent variables.
• Sample size needed: for each variable entered into the model we need at least 10 participants (cases), preferably 20.
• For example, if we have 5 variables in the model we need at least 50 participants (sample size = 50).
Table 5. Bivariate Analysis: Association between Breast Self
Examination (BSE) and Socio-Demographic Variables
Table 6. Results of the Binary Multiple Regression Predicting
BSE in a Sample of Urban Women in Shah Alam, Malaysia (n=
222)
Predictors of smoking among university
students