
Binary Logistic Regression

Prof. Sami Abdo Radman


• Why?
• 1. Descriptive - to estimate the strength of the association between the outcome and factors of interest
• 2. Adjustment - to adjust for covariates/confounders
• 3. Predictors - to determine the important risk factors affecting the outcome
• 4. Prediction - to quantify the risk for new cases (equation) → probability

• Logistic regression predicts the probability of having the outcome (0-100%)
• Logistic regression quantifies the relation between y and x (predicts the change in y when x changes)
• It predicts the probability of y for a given x
Assumptions

• Dependent = categorical (dichotomous/binary)
• Independent = categorical or continuous
Coding of the dependent
Dependent variable:
• Yes = 1 (the outcome of interest should have the larger code)
• No = 0
• SPSS will predict the value 1 (yes)
• If yes = 1 and no = 2, SPSS will predict 2 (no)
• i.e., SPSS predicts the category with the larger code
E.g. to predict systolic BP ≥ 180:
• Dependent = SBP ≥ 180 (yes, no)
• Independent =
• Race (Chinese, Indian, Malay, others)
• Smoking (yes, no)
• Age (continuous)
• Analyze → Regression → Binary Logistic
• Enter the dependent
• Enter the independents
• Click Categorical to define the categorical variables and select the reference category (the reference category will be coded 0 in the output); see the sketch below for the same model outside SPSS
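For readers who want to reproduce these steps outside SPSS, here is a minimal sketch in Python using statsmodels. The data file and column names ("bp_data.csv", "sbp180", etc.) are hypothetical stand-ins for the example above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data frame: one row per subject, with columns
# 'sbp180' (1 = SBP >= 180, 0 = otherwise), 'age', 'smoking', 'race'.
df = pd.read_csv("bp_data.csv")

# C(...) marks a categorical predictor; Treatment(reference=...) sets the
# reference category (coded 0 in the output), as in the SPSS Categorical dialog.
model = smf.logit(
    "sbp180 ~ age"
    " + C(smoking, Treatment(reference='no'))"
    " + C(race, Treatment(reference='Chinese'))",
    data=df,
)
result = model.fit()

print(result.summary())           # B, SE, z (Wald chi-square = z**2), p-values
print(np.exp(result.params))      # Exp(B) = adjusted odds ratios
print(np.exp(result.conf_int()))  # 95% CIs for the odds ratios
```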
Output
For each independent categorical variable we must select one category as the reference group:

Race: reference = Chinese = 0

Smoking: reference = no = 0
Don't forget: the OR given in the output is the adjusted OR.

• Exp(B) = OR
• B = beta coefficient
• Wald = identifies the most important variable in the model (here age is the most important predictor: Wald = 11.007)

• A smoker compared to a non-smoker is 9.9 (95% CI 1.4 to 68.4) times more likely to have SBP ≥ 180.
• Or: a smoker is 9.9 (95% CI 1.4 to 68.4) times more likely to have SBP ≥ 180 compared to a non-smoker.
• Or: the odds of having SBP ≥ 180 are 9.9 times greater for smokers compared to non-smokers.
• Since age is a quantitative (numerical) variable, an increase of one year in age gives a 23.3% (95% CI 8.9% to 39.5%) increase in the odds of having SBP ≥ 180: (Exp(B) - 1) × 100%.
The best interpretation
• Controlling for the other explanatory variables, an increase of one year in age gives a 23.3% increase in the odds of having SBP ≥ 180: (Exp(B) - 1) × 100%.
• Controlling for the other explanatory variables, a smoker compared to a non-smoker is 9.9 times more likely to have SBP ≥ 180.
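As a worked check of that percentage formula, using the Exp(B) for age implied by the 23.3% figure above (Exp(B) = 1.233):

```latex
\[
\%\ \text{change in odds} = (\mathrm{Exp}(B) - 1) \times 100\%
  = (1.233 - 1) \times 100\% = 23.3\%
\]
```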
• Interpretation of OR:
• The exposed have n times the likelihood of having the outcome compared to the non-exposed
• The exposed are n times more likely to have the outcome compared to the non-exposed
• If the OR is less than 1 (protective), for example OR = 0.32:
• You can change the reference group, or divide 1 by the OR (1/OR)

• Example: male compared to female, OR = 2
• Female compared to male, OR = 1/2 = 0.5
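In symbols, swapping the comparison direction simply inverts the odds ratio:

```latex
\[
\mathrm{OR}_{\text{female vs. male}}
  = \frac{1}{\mathrm{OR}_{\text{male vs. female}}}
  = \frac{1}{2} = 0.5
\]
```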
Total model is significant (p < 0.001)
If the overall model is significant (omnibus test, chi-square χ²), this means that at least one variable in the model is significant, i.e., at least one beta coefficient is not equal to zero.
R square (pseudo R square)
• The Nagelkerke R square shows that about 50.6% of the variation in the outcome variable (SBP ≥ 180) is explained by this logistic model.
• The Cox and Snell R square shows that 35% of the variation in the outcome variable (SBP ≥ 180) is explained by this logistic model.
• So 35% to 51% of the variation in the outcome variable (SBP ≥ 180) is explained by this logistic model.
Classification table
Classification accuracy (calibration)

• The overall accuracy of this model in predicting subjects with SBP ≥ 180 is 85.5% (to be a good model it should be > 50%)
• Sensitivity = 9/15 = 60%
• Specificity = 38/40 = 95%
• Positive predictive value (PPV) = 9/11 = 81.8%
• Negative predictive value (NPV) = 38/44 = 86.4%
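These metrics all come from the four cells of the classification table. A minimal sketch in Python, using the cell counts implied by the ratios above (TP = 9, FN = 6, TN = 38, FP = 2):

```python
# Classification-table metrics from the 2x2 counts reported above.
TP, FN = 9, 6    # subjects with SBP >= 180: predicted yes / predicted no
TN, FP = 38, 2   # subjects with SBP < 180:  predicted no / predicted yes

accuracy    = (TP + TN) / (TP + TN + FP + FN)  # 47/55 = 85.5%
sensitivity = TP / (TP + FN)                   # 9/15  = 60%
specificity = TN / (TN + FP)                   # 38/40 = 95%
ppv         = TP / (TP + FP)                   # 9/11  = 81.8%
npv         = TN / (TN + FN)                   # 38/44 = 86.4%

print(f"accuracy={accuracy:.1%}, sensitivity={sensitivity:.1%}, "
      f"specificity={specificity:.1%}, PPV={ppv:.1%}, NPV={npv:.1%}")
```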

Hosmer-Lemeshow goodness of fit
• A p value > 0.05 is expected (indicates a good fit)
• The model fits the data because p = 0.555
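SPSS reports this test directly; statsmodels does not ship one, so here is a minimal sketch of the standard decile-of-risk version, assuming `y` (0/1 outcomes) and `p` (predicted probabilities) come from a fitted model:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow goodness-of-fit test with `groups` risk groups."""
    d = pd.DataFrame({"y": np.asarray(y), "p": np.asarray(p)})
    # Split cases into groups by deciles of predicted probability.
    d["decile"] = pd.qcut(d["p"], q=groups, duplicates="drop")
    g = d.groupby("decile", observed=True)
    obs = g["y"].sum()   # observed events per group
    exp = g["p"].sum()   # expected events per group
    n = g["y"].count()
    # Pearson chi-square over events and non-events in each group.
    stat = (
        (obs - exp) ** 2 / exp
        + ((n - obs) - (n - exp)) ** 2 / (n - exp)
    ).sum()
    dof = len(n) - 2
    return stat, chi2.sf(stat, dof)

# Usage with a fitted statsmodels model (see the sketch earlier):
# stat, pval = hosmer_lemeshow(result.model.endog, result.predict())
# A p-value > 0.05 (e.g. 0.555 here) means the model fits the data.
```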


Multicollinearity (if high → not accepted)
• 1. Check the SE of each coefficient
• If SE > 5 → multicollinearity (some books say > 2)
• 2. Check the correlation matrix
• If there is multicollinearity between two variables in the model, we should remove one of them (the one with the higher SE)
• Or we can remove the one which is not significant
• Or we can remove the one which is less important (if both are significant); both checks are sketched below
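A minimal sketch of both checks in Python, assuming `result` is the fitted statsmodels model from the earlier sketch and `X` is a pandas data frame of the numeric predictors:

```python
# 1. Flag coefficients with suspiciously large standard errors.
big_se = result.bse[result.bse > 5]   # some books use > 2
print("Possible multicollinearity:", list(big_se.index))

# 2. Inspect the correlation matrix of the predictors.
print(X.corr().round(2))              # look for pairs near +/-1
```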
Conclusion
• The logistic regression model was statistically significant, χ² = 35.1, p < .001.
• The model explained 51.0% (Nagelkerke R²) of the variance in SBP ≥ 180.
• The model correctly classified 85.5% of cases.
• The model fit the data: Hosmer-Lemeshow goodness of fit p > 0.05.
• Increasing age was associated with an increased likelihood of having SBP ≥ 180: an increase of one year in age gives a 23.3% increase in the odds of having SBP ≥ 180.
• A smoker compared to a non-smoker is 9.9 times more likely to have SBP ≥ 180.
• Race is not a significant predictor.
Predicting equation (to predict probability)
First calculate z (the logit), then apply the exponential function.
• z = -14.462 + 0.209 × Age + 2.292 × Smoker(1) + 0.640 × Race(1) + 1.303 × Race(2) - 0.097 × Race(3)

• e denotes the exponential function: https://www.medcalc.org/manual/exp_function.php

• z is the log odds (the logit transformation)

• For example, take a 45-year-old non-smoking Chinese subject:
• then Smoker(1) = 0
• Race(1) = Race(2) = Race(3) = 0, and
• z = -14.462 + 0.209 × 45 = -5.057
• -z = -(-5.057) = 5.057
• e^(-z) = 157.1

• Probability = 1/(1 + e^(-z))

• Probability = 1/(1 + 157.1) = 0.006
• So Prob(SBP ≥ 180) = 0.006 (0.6%): it is very unlikely that this subject has SBP ≥ 180
• In general, a probability of less than 0.50 is considered unlikely
• Probability ranges from 0 to 1
• Another example: a 65-year-old Indian smoker,
• then Smoker(1) = 1, Race(1) = 1, Race(2) = Race(3) = 0
• z = -14.462 + 0.209 × 65 + 2.292 × 1 + 0.64 × 1 = 2.055
• -z = -2.055
• e^(-z) = 0.128
• Probability = 1/(1 + 0.128) = 0.89
• So Prob(SBP ≥ 180) = 0.89 = 89% → it is very likely that this subject has SBP ≥ 180
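Both worked examples can be checked with a short Python function built from the fitted coefficients above (the function name and argument order are just illustrative):

```python
import math

def prob_sbp180(age, smoker, race1, race2, race3):
    """Predicted probability of SBP >= 180 from the fitted equation.

    smoker and race1..race3 are the 0/1 dummy codes used above
    (all race dummies 0 = Chinese reference group).
    """
    z = (-14.462 + 0.209 * age + 2.292 * smoker
         + 0.640 * race1 + 1.303 * race2 - 0.097 * race3)
    return 1 / (1 + math.exp(-z))  # logistic (sigmoid) function

print(prob_sbp180(45, 0, 0, 0, 0))  # 45-year-old non-smoking Chinese: ~0.006
print(prob_sbp180(65, 1, 1, 0, 0))  # 65-year-old Indian smoker:       ~0.89
```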
Hypothesis of logistic regression
• H0: β1 = β2 = β3 = ... = βn = 0
• H1: at least one regression coefficient is not equal to zero

• The function used in logistic regression is the sigmoid (logistic) function, applied to the logit z
• It is used to create probabilities:

Probability ≥ 0.5 → class = 1
Probability < 0.5 → class = 0
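In symbols, the sigmoid maps the logit z to a probability, and the 0.5 cut-off assigns the class:

```latex
\[
p = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\hat{y} =
\begin{cases}
1 & \text{if } p \ge 0.5\\
0 & \text{if } p < 0.5
\end{cases}
\]
```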
Sample size for logistic regression
• Enter technique → 5-10 subjects for each variable
• Backward/ forward → 20 subjects for each variable
Comparison to linear regression

• Linear regression predicts continuous numbers
• Logistic regression helps us predict whether the person will have the outcome or not (probability from 0 to 1)
• Probability ≥ 0.5 → has the outcome
• Probability < 0.5 → will not have the outcome

• Logistic regression is used to classify the sample



In linear regression
• The line is fitted using least squares
In logistic regression
• Unlike linear regression, which outputs continuous numeric values, logistic regression transforms its output using the logistic sigmoid function to return a probability value

Probability of having the outcome


References

• https://statistics.laerd.com/spss-tutorials/binomial-logistic-regression-using-spss-statistics.php#procedure

• Check the file sent by email


SPSS example
• To predict Ischemic heart disease
• By
• Age
• Residency
• Level of education
• Diabetes status
Notes
• Binary logistic regression does not make any assumptions of normality, linearity, or homogeneity of variance for the independent variables.

Sample size needed: for each variable entered into the model we need at least 10 participants (cases), preferably 20.
For example, if we have 5 variables in the model we need at least 50 participants (sample size = 50).
Table 5. Bivariate Analysis: Association between Breast Self-Examination (BSE) and Socio-Demographic Variables

Table 6. Results of the Binary Multiple Regression Predicting BSE in a Sample of Urban Women in Shah Alam, Malaysia (n = 222)
Predictors of smoking among university
students
