Logistic Regression: 1 Applied Methods in Biostatistics - Week 2 2019

1. Logistic regression is used to model binary outcome variables and extends linear regression to non-normally distributed outcomes. It is applied to outcomes such as disease presence/absence.
2. The logistic regression model relates the log-odds of the outcome (logit) to the predictor variables. It allows estimation of odds ratios to quantify the effect of predictors on the outcome.
3. An example uses logistic regression to predict lymph node metastasis in prostate cancer patients based on age, serum acid level, x-ray results, tumor size, and grade. The odds ratios estimated from the model quantify the effect of each predictor on metastasis risk.

Uploaded by

IuliaOpris

05‐11‐2020

Logistic regression

1 Applied Methods in Biostatistics - Week 2 2019

Generalization of the linear regression model

In many practical situations the linear regression model is inadequate.

For example, when
• the outcome has two possible responses (binary data), or
• the outcome represents count data (non-negative integers),
→ it makes no sense to model the outcome as normally distributed.

Generalized linear models (GLMs) are an extension of linear regression:
regression models for non-normally distributed outcome variables.


Binary outcome variable

In many studies the outcome of interest is the presence or absence of some condition.
Examples:
• smoking status
• response to a treatment
• presence or absence of cancer
• survival status of a subject after surgery: dead or alive
• myocardial infarction or CHD: yes/no
→ "success" ('yes'/1) and "failure" ('no'/0) are often used as generic terms for the two possible responses
→ the interest is in quantifying the risk or odds of success, i.e. of the occurrence of some event of interest

3 Applied Methods in Biostatistics - Week 2

Example: Prostatic cancer


A study of 53 prostate cancer patients. Before surgery, two continuous exposure variables (age,
serum acid phosphatase) and three categorical (binary) exposure variables (X-ray, tumour size,
tumour grade) were measured. The patients then had surgery (laparotomy) to determine whether
there was nodal involvement, i.e. lymph node metastases (NI = 1), or not (NI = 0), in order to
adapt the treatment regimen to the patient.

Pat   NI   Age   Acid   Xray    Size      Grade
                        (pos)   (large)   (serious)
  1    0    66   0.48     0       0         0
  2    0    68   0.56     0       0         0
...
 52    1    64   0.89     1       1         0
 53    1    68   1.26     1       1         1
Brown, B.W. (1980)

Risk outcome: odds

Studies
• Case-control / cross-sectional
• Cohort: cumulative incidence rate

Simple (exploratory) inference
• Confidence intervals & hypothesis tests
• Comparing risks between exposed/unexposed groups
• Test for association (two or more groups)
• Chi-square tests / Fisher's exact test


Logistic regression model

The model is based on:

• Relationship
• logit(p) = log(p/(1-p)) = log-odds(p) = β0 + β1x1 + β2x2 + … + βkxk
• I.e.: p is not linear in the βs, but logit(p) is linear
• Data from a binomial distribution
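
The logit relationship can be sketched in a few lines of Python. This is a minimal illustration, not part of the lecture: the function names and the example coefficients are assumptions chosen only to show how a linear predictor on the log-odds scale maps back to a probability.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta: float) -> float:
    """Probability that corresponds to a log-odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))

def predicted_probability(betas, xs):
    """p for the linear predictor beta0 + beta1*x1 + ... + betak*xk."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], xs))
    return inv_logit(eta)

# Hypothetical coefficients and covariate values, for illustration only.
example_betas = [-1.0, 0.5, 2.0]
example_xs = [2.0, 0.0]
# Linear predictor: -1.0 + 0.5 * 2.0 + 2.0 * 0.0 = 0, i.e. even odds.
example_p = predicted_probability(example_betas, example_xs)
print(f"p = {example_p:.3f}")
```

The linear predictor can take any real value, while the inverse logit always returns a value strictly between 0 and 1, which is why the model is linear in the βs only on the logit scale.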

Inference similar to the linear model
• Allows many categorical & numerical independent variables

Estimation & inference: by computer


Purposes of logistic regression

Effect estimation
• exp(β1) = OR1 = effect of variable x1
• Stata: logistic calculates the effect estimates exp(β1) directly!

Prediction:
• Best model for predicting risk p of disease for new cases
• Stata: logit calculates parameter estimates of β0, β1, β2, …
• Rule of thumb: at least 10 cases and 10 controls for each independent variable in the model


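Estimation:
Interpretation of the coefficients
Interpretation of the coefficients is similar to linear regression. However, since it is the logit that is linear in the
predictors, the coefficients have the analogous interpretation on the logit (log-odds) scale.
Logit (πNI(Xray, Size, Age)) = β0 + β1Xray + β2 Size + β3 Age

Binary exposure (comparing X-ray findings, 1 = positive vs. 0 = negative, with Size and Age held fixed):

\[
OR_{Xray} = \frac{\mathrm{odds}(NI = 1 \mid Xray = 1, Size, Age)}{\mathrm{odds}(NI = 1 \mid Xray = 0, Size, Age)}
= \frac{e^{\beta_0 + \beta_1 + \beta_2 Size + \beta_3 Age}}{e^{\beta_0 + \beta_2 Size + \beta_3 Age}} = e^{\beta_1}
\]

Continuous exposure variable (a one-year increase in Age, with Xray and Size held fixed):

\[
OR_{Age} = \frac{\mathrm{odds}(NI = 1 \mid Xray, Size, Age + 1)}{\mathrm{odds}(NI = 1 \mid Xray, Size, Age)}
= \frac{e^{\beta_0 + \beta_1 Xray + \beta_2 Size + \beta_3 (Age + 1)}}{e^{\beta_0 + \beta_1 Xray + \beta_2 Size + \beta_3 Age}} = e^{\beta_3}
\]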
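
A short numerical check can make the cancellation concrete. The sketch below uses hypothetical coefficient values (assumptions, not estimates from the study) and shows that the odds ratio for Xray, and for a one-year increase in Age, does not depend on the values at which the other covariates are held fixed.

```python
import math

# Hypothetical coefficient values, chosen only for illustration
# (order: intercept, Xray, Size, Age).
B0, B_XRAY, B_SIZE, B_AGE = 1.5, 2.0, 1.5, -0.05

def odds(xray: int, size: int, age: float) -> float:
    """Odds of the outcome: exp of the linear predictor."""
    return math.exp(B0 + B_XRAY * xray + B_SIZE * size + B_AGE * age)

def or_xray(size: int, age: float) -> float:
    """Odds ratio Xray = 1 vs. Xray = 0 at fixed Size and Age."""
    return odds(1, size, age) / odds(0, size, age)

def or_age(xray: int, size: int, age: float) -> float:
    """Odds ratio for Age + 1 vs. Age at fixed Xray and Size."""
    return odds(xray, size, age + 1) / odds(xray, size, age)

# Arbitrary covariate patterns: the ratio is the same for each of them.
for size, age in [(0, 50), (1, 68)]:
    print(f"Size={size}, Age={age}: OR_Xray = {or_xray(size, age):.4f}")
print(f"exp(beta_Xray) = {math.exp(B_XRAY):.4f}")
print(f"OR_Age = {or_age(1, 0, 60):.4f}, exp(beta_Age) = {math.exp(B_AGE):.4f}")
```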


Estimation Example (1):


Model: Logit (πNI(Xray, Size, Age)) = Log odds (NI = 1 | Xray, Size, Age)
                                    = β0 + β1 Xray + β2 Size + β3 Age

. logit NI Xray Size Age

Iteration 0: log likelihood = -35.126076
Iteration 1: log likelihood = -26.176433
Iteration 2: log likelihood = -26.042916
Iteration 3: log likelihood = -26.04263
Iteration 4: log likelihood = -26.04263

Logistic regression                          Number of obs =     53
                                             LR chi2(3)    =  18.17
                                             Prob > chi2   = 0.0004
Log likelihood = -26.04263                   Pseudo R2     = 0.2586
-----------------------------------------------------------------------
    NI |     Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------+---------------------------------------------------------------
  Xray |  2.175658    .7644116    2.85    0.004    .6774385    3.673877
  Size |  1.596897    .7079243    2.26    0.024    .2093913    2.984403
   Age | -.0604558     .054447   -1.11    0.267     -.16717    .0462584
 _cons |  1.518419     3.22939    0.47    0.638   -4.811069    7.847908
-----------------------------------------------------------------------


Estimation Example (2):


Model: Logit (πNI(Xray, Size, Age)) = Log odds (NI = 1 | Xray, Size, Age)
                                    = β0 + β1 Xray + β2 Size + β3 Age

. logistic NI Xray Size Age

Logistic regression                          Number of obs =     53
                                             LR chi2(3)    =  18.17
                                             Prob > chi2   = 0.0004
Log likelihood = -26.04263                   Pseudo R2     = 0.2586
------------------------------------------------------------------------
    NI | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------+----------------------------------------------------------------
  Xray |   8.807976    6.732919    2.85    0.004    1.968828    39.40437
  Size |   4.937689     3.49551    2.26    0.024    1.232927     19.7747
   Age |   .9413353    .0512529   -1.11    0.267    .8460557    1.047345
------------------------------------------------------------------------
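
The link between the two Stata tables can be checked by hand. The sketch below assumes the coefficients, standard errors and confidence limits printed by `logit` in Estimation Example (1) and maps them to the odds ratio scale; the function name is an illustrative assumption. The odds ratio standard error uses the delta method, and the confidence interval is transformed endpoint by endpoint.

```python
import math

# Coefficient, standard error and 95% confidence limits from the `logit` output.
LOGIT_OUTPUT = {
    "Xray": (2.175658, 0.7644116, 0.6774385, 3.673877),
    "Size": (1.596897, 0.7079243, 0.2093913, 2.984403),
    "Age": (-0.0604558, 0.054447, -0.16717, 0.0462584),
}

def to_odds_ratio_scale(coef, se, lower, upper):
    """Map a log-odds estimate and its interval to the odds ratio scale."""
    odds_ratio = math.exp(coef)
    # Delta-method standard error of exp(coef): exp(coef) * se(coef).
    se_or = odds_ratio * se
    return odds_ratio, se_or, math.exp(lower), math.exp(upper)

for name, values in LOGIT_OUTPUT.items():
    odds_ratio, se_or, lo, hi = to_odds_ratio_scale(*values)
    print(f"{name:>4}: OR = {odds_ratio:.4f}, SE = {se_or:.4f}, "
          f"95% CI = [{lo:.4f}, {hi:.4f}]")
```

Note that the z statistics and p-values are identical in both tables: the test is carried out on the coefficient scale, and only the presentation of the estimate changes.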


Inferences - Testing overall regression


Hypotheses: H0 : β1 = β2 = . . . = βn = 0
(e.g., Xray, Size and Age have no predictive value for nodal involvement in prostatic cancer)

Likelihood ratio (LR) statistic compares two models

1. minimal model = logistic regression model under H0
2. full model = logistic regression model taking account of (all) the exposure variables of interest
→ for each model the maximized likelihood L is calculated:
1. Lm := L(β̂0) for the minimal model
2. Lf := L(β̂0, β̂1, β̂2, …, β̂n) for the full model

Likelihood ratio statistic

\[
LR = 2\{\log(L_f) - \log(L_m)\} = 2\log\left(\frac{L_f}{L_m}\right)
\]

Under H0, LR is approximately chi-square distributed with n degrees of freedom, one per coefficient tested.
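
As a worked illustration, the LR statistic can be recomputed from the log likelihoods printed in the Stata output of Estimation Example (1). The sketch assumes that the iteration 0 log likelihood of `logit` corresponds to the minimal (constant-only) model; the helper function is an illustrative assumption and uses a closed form of the chi-square tail probability that holds only for 3 degrees of freedom.

```python
import math

# Log likelihoods as printed in the Stata output: iteration 0 is taken as the
# minimal (constant-only) model, the final iteration as the full model.
LL_MINIMAL = -35.126076
LL_FULL = -26.04263
DF = 3  # coefficients set to zero under H0: Xray, Size, Age

def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form valid only for df = 3:
    P(X > x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

lr = 2.0 * (LL_FULL - LL_MINIMAL)
p_value = chi2_sf_3df(lr)
pseudo_r2 = 1.0 - LL_FULL / LL_MINIMAL  # McFadden's pseudo R-squared

print(f"LR chi2({DF}) = {lr:.2f}")
print(f"Prob > chi2 = {p_value:.4f}")
print(f"Pseudo R2 = {pseudo_r2:.4f}")
```

For other degrees of freedom a statistics library would normally be used instead of the closed form.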


Estimation Example (overall test):


Model: Logit (πNI(Xray, Size, Age)) = Log odds (NI = 1 | Xray, Size, Age)
                                    = β0 + β1 Xray + β2 Size + β3 Age

. logistic NI Xray Size Age

Logistic regression                          Number of obs =     53
                                             LR chi2(3)    =  18.17
                                             Prob > chi2   = 0.0004
Log likelihood = -26.04263                   Pseudo R2     = 0.2586
------------------------------------------------------------------------
    NI | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------+----------------------------------------------------------------
  Xray |   8.807976    6.732919    2.85    0.004    1.968828    39.40437
  Size |   4.937689     3.49551    2.26    0.024    1.232927     19.7747
   Age |   .9413353    .0512529   -1.11    0.267    .8460557    1.047345
------------------------------------------------------------------------


Inferences - Wald-test
Which factors had a significant effect on the dependent variable, adjusted for all the other
independent variables?

• Hypotheses: H0: βi = 0 vs. H1: βi ≠ 0

• Test statistic:

\[
Z = \frac{\hat{\beta}_i}{se(\hat{\beta}_i)} \sim N(0,1)\text{-distributed}
\]

with Z² ~ χ²-distributed, degree of freedom = 1
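
The Wald statistics can be recomputed from the coefficients and standard errors printed by `logit` in Estimation Example (1). The sketch below is a minimal illustration using only the Python standard library; the function name and the hard-coded normal quantile are assumptions made for the example.

```python
import math

# Coefficients and standard errors copied from the `logit` output.
ESTIMATES = {
    "Xray": (2.175658, 0.7644116),
    "Size": (1.596897, 0.7079243),
    "Age": (-0.0604558, 0.054447),
    "_cons": (1.518419, 3.22939),
}
Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution

def wald_test(coef: float, se: float):
    """Return (z, two-sided p-value, 95% CI lower, 95% CI upper)."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p, coef - Z_975 * se, coef + Z_975 * se

for name, (coef, se) in ESTIMATES.items():
    z, p, lo, hi = wald_test(coef, se)
    print(f"{name:>6}: z = {z:5.2f}, P>|z| = {p:.3f}, "
          f"95% CI = [{lo:.4f}, {hi:.4f}]")
```

Squaring z gives the equivalent chi-square statistic with one degree of freedom, which leads to the same p-value.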

Estimation Example (Wald test):


Model: Logit (πNI(Xray, Size, Age)) = Log odds (NI = 1 | Xray, Size, Age)
                                    = β0 + β1 Xray + β2 Size + β3 Age

. logistic NI Xray Size Age

Logistic regression                          Number of obs =     53
                                             LR chi2(3)    =  18.17
                                             Prob > chi2   = 0.0004
Log likelihood = -26.04263                   Pseudo R2     = 0.2586
------------------------------------------------------------------------
    NI | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------+----------------------------------------------------------------
  Xray |   8.807976    6.732919    2.85    0.004    1.968828    39.40437
  Size |   4.937689     3.49551    2.26    0.024    1.232927     19.7747
   Age |   .9413353    .0512529   -1.11    0.267    .8460557    1.047345
------------------------------------------------------------------------


Maximum likelihood estimation

The idea: determine the parameter values that maximize the probability
(likelihood) of the observed sample data.
From a statistical point of view, the method of maximum likelihood is considered
robust and yields estimators with good statistical properties.
It also offers an efficient method for quantifying uncertainty through confidence bounds.
Although the principle of maximum likelihood estimation is simple, the
implementation is mathematically intensive. With today's computing power,
however, this complexity is not a big obstacle.
Maximizing the likelihood function L(ϑ) is equivalent to maximizing the
log-likelihood function l(ϑ), because the logarithm is strictly increasing.
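
To show what the computer does, here is a minimal sketch of maximum likelihood estimation by Newton-Raphson, one common way to maximize the log likelihood of a logistic model. It is not claimed to be Stata's exact implementation. The toy data are hypothetical (not the prostate cancer data), and all function and variable names are assumptions made for this example.

```python
import math

# Hypothetical toy data: one binary exposure x and a binary outcome y
# for eight subjects.
X = [0, 0, 0, 0, 1, 1, 1, 1]
Y = [0, 0, 0, 1, 0, 1, 1, 1]

def prob(b0: float, b1: float, x: float) -> float:
    """P(y = 1 | x) under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def log_likelihood(b0: float, b1: float) -> float:
    """Bernoulli log likelihood l(b0, b1) of the toy data."""
    total = 0.0
    for x, y in zip(X, Y):
        p = prob(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_newton_raphson(max_iter: int = 25, tol: float = 1e-10):
    """Maximize the log likelihood; return (b0, b1, log-likelihood history)."""
    b0, b1 = 0.0, 0.0
    history = [log_likelihood(b0, b1)]
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score vector (gradient of l)
        i00 = i01 = i11 = 0.0  # information matrix (negative Hessian), 2 x 2
        for x, y in zip(X, Y):
            p = prob(b0, b1, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 system (information * step = score).
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        history.append(log_likelihood(b0, b1))
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1, history

b0_hat, b1_hat, ll_history = fit_newton_raphson()
for i, ll in enumerate(ll_history):
    print(f"Iteration {i}: log likelihood = {ll:.6f}")
print(f"b0 = {b0_hat:.6f}, b1 = {b1_hat:.6f}, OR = {math.exp(b1_hat):.3f}")
```

With a single binary exposure the estimates can be checked by hand: the observed odds are 1/3 in the unexposed group and 3 in the exposed group, so the fitted intercept is log(1/3) and the fitted slope is log(9). The printed iteration log plays the same role as the iteration lines in the Stata `logit` output.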


Estimation Example (ML estimation):


Model: Logit (πNI(Xray, Size, Age)) = Log odds (NI = 1 | Xray, Size, Age)
                                    = β0 + β1 Xray + β2 Size + β3 Age

. logistic NI Xray Size Age

Logistic regression                          Number of obs =     53
                                             LR chi2(3)    =  18.17
                                             Prob > chi2   = 0.0004
Log likelihood = -26.04263                   Pseudo R2     = 0.2586
------------------------------------------------------------------------
    NI | Odds Ratio   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------+----------------------------------------------------------------
  Xray |   8.807976    6.732919    2.85    0.004    1.968828    39.40437
  Size |   4.937689     3.49551    2.26    0.024    1.232927     19.7747
   Age |   .9413353    .0512529   -1.11    0.267    .8460557    1.047345
------------------------------------------------------------------------


Prediction
The logistic regression approach is suitable for predicting the success probability, i.e. the outcome risk,
for new cases as a function of their exposures.
Example: Prostatic cancer

\[
\operatorname{logit} \pi_{NI}(Xray, Size, Age) = \beta_0 + \beta_1 Xray + \beta_2 Size + \beta_3 Age
= 1.52 + 2.18\,Xray + 1.60\,Size - 0.06\,Age
\]

Xray   Size   Age   logit(πNI)   πNI = P(NI = 1)
  0      0     68      -2.56         0.072
  1      0     68      -0.38         0.406
  0      1     51       0.06         0.515
  1      1     57       1.88         0.868
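
The prediction step can be sketched as follows, assuming the rounded coefficients shown on this slide. The function name is an illustrative assumption; the covariate patterns are the four rows of the table.

```python
import math

# Rounded coefficients as used on the prediction slide
# (order: intercept, Xray, Size, Age).
B0, B_XRAY, B_SIZE, B_AGE = 1.52, 2.18, 1.60, -0.06

def predict(xray: int, size: int, age: float):
    """Return (logit, probability of nodal involvement) for one patient."""
    eta = B0 + B_XRAY * xray + B_SIZE * size + B_AGE * age
    return eta, 1.0 / (1.0 + math.exp(-eta))

# Covariate patterns (Xray, Size, Age) from the table.
PATIENTS = [(0, 0, 68), (1, 0, 68), (0, 1, 51), (1, 1, 57)]

for xray, size, age in PATIENTS:
    eta, p = predict(xray, size, age)
    print(f"Xray={xray} Size={size} Age={age}: "
          f"logit = {eta:.2f}, P(NI = 1) = {p:.3f}")
```

Because the coefficients are rounded to two decimals, predictions from the full-precision Stata estimates would differ slightly.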
