Logistic Regression: 1 Applied Methods in Biostatistics - Week 2 2019
Logistic regression
Generalization of the linear regression model
05‐11‐2020
In many studies the outcome of interest is the presence or absence of some condition.
Examples:
smoking status
responding to a treatment
presence or absence of cancer
survival status of a subject after a surgery: dead or alive
having myocardial infarction or CHD: yes/no
Success ('yes'/1) and failure ('no'/0) are often used as generic terms for the two possible
responses.
The interest is in quantifying the risk or the odds of success, i.e. of the occurrence of some event of
interest.
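Risk and odds describe the same quantity on different scales: odds = p / (1 - p). The following is a minimal sketch of the conversion in both directions; the function names are hypothetical and the numbers are illustrative only, not taken from the lecture data.

```python
def odds_from_risk(p: float) -> float:
    """Convert a risk (probability) p with 0 <= p < 1 to odds p / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("risk must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def risk_from_odds(odds: float) -> float:
    """Convert non-negative odds back to a risk: odds / (1 + odds)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# Illustrative value only: a risk of 0.20 corresponds to odds of 1 to 4.
print(odds_from_risk(0.20))
print(risk_from_odds(0.25))
```

For rare events the two scales nearly coincide; for common events the odds are noticeably larger than the risk.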
Studies
• Case-control / Cross sectional
• Cohort: cumulative incidence rate
Effect estimation
• exp(β1) = OR1 = effect of variable 1, expressed as an odds ratio
• Stata: the logistic command reports the effect estimates exp(β1) directly
Prediction:
• Best model for predicting the risk p of disease for new cases
• Stata: the logit command reports the parameter estimates of β0, β1, β2, …
• Rule of thumb: at least 10 cases and 10 controls for each independent variable in the model
Estimation:
Interpretation of the coefficients
Interpretation of the coefficients is similar to linear regression. Because the model is linear on the logit scale,
the coefficients have an analogous interpretation on the logit (log-odds) scale.
\[
\operatorname{logit}\big(\pi_{NI}(\text{Xray}, \text{Size}, \text{Age})\big) = \beta_0 + \beta_1\,\text{Xray} + \beta_2\,\text{Size} + \beta_3\,\text{Age}
\]
Binary exposure: comparing a positive X-ray finding (Xray = 1) with a negative finding (Xray = 0), with Size and Age
held fixed.
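\[
OR_{\text{Xray}}
= \frac{\operatorname{odds}(NI \mid \text{Xray}=1, \text{Size}, \text{Age})}{\operatorname{odds}(NI \mid \text{Xray}=0, \text{Size}, \text{Age})}
= \frac{\exp(\beta_0 + \beta_1 \cdot 1 + \beta_2\,\text{Size} + \beta_3\,\text{Age})}{\exp(\beta_0 + \beta_2\,\text{Size} + \beta_3\,\text{Age})}
= e^{\beta_1}
\]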
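The cancellation in the odds ratio can be checked numerically. The sketch below uses hypothetical coefficient values, chosen only for illustration and not estimated from the lecture data, and shows that the ratio of odds for Xray = 1 versus Xray = 0 equals exp(β1) whatever values Size and Age are fixed at.

```python
import math

# Hypothetical coefficients for illustration only (not estimates from the slides).
BETA = {"intercept": -3.0, "xray": 1.5, "size": 1.2, "age": -0.05}


def linear_predictor(xray: int, size: int, age: float) -> float:
    """Log odds: beta0 + beta1*Xray + beta2*Size + beta3*Age."""
    return (BETA["intercept"] + BETA["xray"] * xray
            + BETA["size"] * size + BETA["age"] * age)


def odds(xray: int, size: int, age: float) -> float:
    """Odds of the outcome: the exponential of the log odds."""
    return math.exp(linear_predictor(xray, size, age))


def odds_ratio_xray(size: int, age: float) -> float:
    """Odds ratio for Xray = 1 versus Xray = 0 at fixed Size and Age."""
    return odds(1, size, age) / odds(0, size, age)


# The ratio is the same for every choice of the fixed covariates.
for size, age in [(0, 50.0), (1, 65.0)]:
    print(size, age, odds_ratio_xray(size, age), math.exp(BETA["xray"]))
```

The intercept and the Size and Age terms appear in both numerator and denominator, so only β1 survives.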
Inferences - Wald-test
Which factors had a significant effect on the dependent variable adjusted for all the other
independent variables?
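Hypotheses: \( H_0: \beta_i = 0 \) vs. \( H_1: \beta_i \neq 0 \)
Test statistic (N(0,1)-distributed under \( H_0 \)):
\[
Z_i = \frac{\hat{\beta}_i}{\operatorname{se}(\hat{\beta}_i)} \sim N(0,1)
\]
with \( Z_i^2 \sim \chi^2 \)-distributed, degrees of freedom = 1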
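A minimal sketch of the Wald test computation, assuming an estimate and its standard error are already available from fitted output. The function name and the input values are hypothetical; the two-sided p-value uses the standard normal distribution via the complementary error function.

```python
import math


def wald_test(beta_hat: float, se: float) -> tuple[float, float, float]:
    """Return (z, chi2, two-sided p-value) for H0: beta = 0."""
    if se <= 0.0:
        raise ValueError("standard error must be positive")
    z = beta_hat / se
    chi2 = z * z  # chi-square statistic with 1 degree of freedom
    # P(|Z| >= |z|) for Z ~ N(0, 1) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi2, p_value


# Hypothetical estimate and standard error, for illustration only.
z, chi2, p = wald_test(beta_hat=0.98, se=0.5)
print(z, chi2, p)
```

Testing z against N(0,1) and z squared against the chi-square distribution with 1 degree of freedom gives the same p-value.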
The idea behind maximum likelihood estimation: determine the parameters that maximize the probability
(likelihood) of the sample data.
From a statistical point of view, the method of maximum likelihood is considered
robust and yields estimators with good statistical properties.
It also provides an efficient method for quantifying uncertainty through confidence bounds.
Although the methodology for maximum likelihood estimation is simple, the
implementation is mathematically intense. With today's computing power,
however, this complexity is not a big obstacle.
Maximizing the likelihood function L(ϑ) is equivalent to maximizing the
log-likelihood function l(ϑ), because the logarithm is strictly increasing.
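To illustrate what maximizing the log-likelihood can look like in practice, here is a minimal sketch for a model with an intercept and one predictor. It assumes Newton-Raphson iterations, which is one common choice; the slides do not say which algorithm the software uses. The toy data and all function names are invented for illustration.

```python
import math


def log_likelihood(b0, b1, xs, ys):
    """Bernoulli log-likelihood: sum of y*eta - log(1 + exp(eta))."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        total += y * eta - math.log1p(math.exp(eta))
    return total


def fit_newton(xs, ys, max_iter=50, tol=1e-10):
    """Maximize the log-likelihood over (b0, b1) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient (score) components
        a = b = d = 0.0        # information matrix [[a, b], [b, d]]
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            a += w
            b += w * x
            d += w * x * x
        det = a * d - b * b
        # Solve the 2x2 system (information matrix) * step = gradient.
        step0 = (d * g0 - b * g1) / det
        step1 = (a * g1 - b * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Invented toy data: binary exposure x and binary outcome y.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0_hat, b1_hat = fit_newton(xs, ys)
print(b0_hat, b1_hat, log_likelihood(b0_hat, b1_hat, xs, ys))
```

With a single binary predictor the fitted probabilities reproduce the observed proportions in each exposure group, which gives a simple check on the result.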
Prediction
The logistic regression approach is suitable for predicting the success probability, or outcome risk,
for new cases as a function of their exposures.
Example: Prostatic cancer
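A minimal sketch of prediction for a new case: the linear predictor is computed from the covariates and mapped to a probability with the inverse logit. The covariate names follow the Xray, Size and Age model; the coefficient values are hypothetical placeholders, not estimates from the prostatic cancer data.

```python
import math

# Hypothetical coefficients for illustration only (not fitted values).
INTERCEPT = -3.0
COEFS = {"Xray": 1.5, "Size": 1.2, "Age": -0.05}


def predict_risk(covariates: dict[str, float]) -> float:
    """Predicted probability p = 1 / (1 + exp(-(beta0 + sum of beta_j * x_j)))."""
    eta = INTERCEPT + sum(COEFS[name] * covariates[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-eta))


# A hypothetical new case with a positive X-ray finding and a large tumour.
new_case = {"Xray": 1, "Size": 1, "Age": 60.0}
print(predict_risk(new_case))
```

The inverse logit always returns a value strictly between 0 and 1, so the prediction can be read directly as a risk.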