Stepwise Logistic Regression in R

Stepwise logistic regression is used to select variables for a logistic regression model. It uses the Akaike information criterion (AIC) to evaluate models, penalizing those with more parameters or poorer fit. The document shows an example using stepwise regression on a dataset to predict low birth weight. Both backwards selection and forward selection are performed, ultimately arriving at the same best model containing the variables ptl, lwt, ht, racefac, smoke, and ui.

Uploaded by

Juan Pablo Aguilar

Stepwise Logistic Regression with R

Akaike information criterion: AIC = 2k - 2 log L

= 2k + Deviance, where k = number of parameters
(for a 0/1 response, the deviance equals -2 log L)

Smaller values are better

Penalizes models with lots of parameters
Penalizes models with poor fit
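
The identity AIC = 2k + Deviance can be checked on a fitted model. The sketch below is an illustration under stated assumptions, not the handout's own setup: the variable names in the transcript match the birthwt data in the MASS package, so it uses that data set, and it rebuilds racefac as a factor with assumed labels White/Black/Other. The transcript calls glm() without a data argument, so its variables were presumably already in the workspace.

```r
# Assumption: the handout's data correspond to MASS::birthwt; racefac is
# reconstructed here and its labels are a guess based on the printed output.
library(MASS)

bw <- birthwt
bw$racefac <- factor(bw$race, levels = 1:3,
                     labels = c("White", "Black", "Other"))

fullmod <- glm(low ~ age + lwt + racefac + smoke + ptl + ht + ui + ftv,
               family = binomial, data = bw)

k <- length(coef(fullmod))            # number of estimated parameters
aic_by_hand <- 2 * k + deviance(fullmod)

# For a 0/1 response the deviance equals -2 log L, so both forms agree.
aic_from_loglik <- 2 * k - 2 * as.numeric(logLik(fullmod))

print(c(k = k, by_hand = aic_by_hand, from_loglik = aic_from_loglik,
        builtin = AIC(fullmod)))
```
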
> fullmod = glm(low ~ age+lwt+racefac+smoke+ptl+ht+ui+ftv,family=binomial)
> summary(fullmod)

Call:
glm(formula = low ~ age + lwt + racefac + smoke + ptl + ht +
ui + ftv, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.8946 -0.8212 -0.5316 0.9818 2.2125

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.480623   1.196888   0.402  0.68801
age          -0.029549   0.037031  -0.798  0.42489
lwt          -0.015424   0.006919  -2.229  0.02580 *
racefacBlack  1.272260   0.527357   2.413  0.01584 *
racefacOther  0.880496   0.440778   1.998  0.04576 *
smoke         0.938846   0.402147   2.335  0.01957 *
ptl           0.543337   0.345403   1.573  0.11571
ht            1.863303   0.697533   2.671  0.00756 **
ui            0.767648   0.459318   1.671  0.09467 .
ftv           0.065302   0.172394   0.379  0.70484
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 234.67 on 188 degrees of freedom

Residual deviance: 201.28 on 179 degrees of freedom
AIC: 221.28

Number of Fisher Scoring iterations: 4

> nothing <- glm(low ~ 1,family=binomial)

> summary(nothing)

Call:
glm(formula = low ~ 1, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max
-0.8651 -0.8651 -0.8651 1.5259 1.5259

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -0.790      0.157  -5.033 4.84e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 234.67 on 188 degrees of freedom
Residual deviance: 234.67 on 188 degrees of freedom
AIC: 236.67

Number of Fisher Scoring iterations: 4
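
The intercept-only fit can be checked by hand from the printed numbers. The inverse logit of the intercept is the overall fitted probability of low birth weight, and with n = 189 observations (188 null degrees of freedom plus one) that probability reproduces the null deviance up to rounding. This is a small arithmetic sketch that uses only values shown in the output above; the variable names are illustrative.

```r
b0 <- -0.790                 # printed intercept of the null model
n  <- 188 + 1                # null degrees of freedom + 1

p_hat <- plogis(b0)          # inverse logit: fitted P(low = 1)

# Deviance of the constant-probability model for 0/1 data. This uses the fact
# that the fitted probability equals the observed proportion of low = 1.
null_dev <- -2 * n * (p_hat * log(p_hat) + (1 - p_hat) * log(1 - p_hat))
null_aic <- null_dev + 2 * 1 # one parameter (the intercept)

print(round(c(p_hat = p_hat, null_dev = null_dev, null_aic = null_aic), 2))
```
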

> # Here was the chosen model from earlier

> redmod1 = glm(low ~ lwt+racefac+smoke+ptl+ht,family=binomial)
>
> backwards = step(fullmod) # Backwards selection is the default
Start: AIC= 221.28
low ~ age + lwt + racefac + smoke + ptl + ht + ui + ftv

Df Deviance AIC
- ftv 1 201.43 219.43
- age 1 201.93 219.93
<none> 201.28 221.28
- ptl 1 203.83 221.83
- ui 1 204.03 222.03
- racefac 2 208.75 224.75
- lwt 1 206.80 224.80
- smoke 1 206.91 224.91
- ht 1 208.81 226.81

Step: AIC= 219.43

low ~ age + lwt + racefac + smoke + ptl + ht + ui

Df Deviance AIC
- age 1 201.99 217.99
<none> 201.43 219.43
- ptl 1 203.95 219.95
- ui 1 204.11 220.11
- racefac 2 208.77 222.77
- lwt 1 206.81 222.81
- smoke 1 206.92 222.92
- ht 1 208.81 224.81

Step: AIC= 217.99

low ~ lwt + racefac + smoke + ptl + ht + ui

Df Deviance AIC
<none> 201.99 217.99
- ptl 1 204.22 218.22
- ui 1 204.90 218.90
- smoke 1 207.73 221.73
- lwt 1 208.11 222.11
- racefac 2 210.31 222.31
- ht 1 209.46 223.46

> 217.99-201.99
[1] 16
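
The difference of 16 computed above is the penalty term 2k: the selected model has k = 8 coefficients (intercept, lwt, two racefac dummies, smoke, ptl, ht, ui). The same bookkeeping reproduces the rows of the first backward step, where dropping a term with Df degrees of freedom leaves 10 - Df parameters. A short arithmetic sketch using numbers copied from the output; the data frame and column names are only illustrative.

```r
# Rows copied from the first step() table (the full model has 10 parameters)
step1 <- data.frame(
  term     = c("ftv", "age", "ptl", "ui", "racefac", "lwt", "smoke", "ht"),
  Df       = c(1, 1, 1, 1, 2, 1, 1, 1),
  Deviance = c(201.43, 201.93, 203.83, 204.03, 208.75, 206.80, 206.91, 208.81)
)

k_full <- 10
step1$AIC <- step1$Deviance + 2 * (k_full - step1$Df)
print(step1)

# Penalty for the final backward model: 2 * 8 coefficients
k_final <- 8
penalty_matches <- isTRUE(all.equal(217.99 - 201.99, 2 * k_final))
print(penalty_matches)
```
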

> # backwards = step(fullmod,trace=0) would suppress step by step output.

> formula(backwards)
low ~ lwt + racefac + smoke + ptl + ht + ui

> summary(backwards)

Call:
glm(formula = low ~ lwt + racefac + smoke + ptl + ht + ui, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.9049 -0.8124 -0.5241 0.9483 2.1812

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.086550   0.951760  -0.091  0.92754
lwt          -0.015905   0.006855  -2.320  0.02033 *
racefacBlack  1.325719   0.522243   2.539  0.01113 *
racefacOther  0.897078   0.433881   2.068  0.03868 *
smoke         0.938727   0.398717   2.354  0.01855 *
ptl           0.503215   0.341231   1.475  0.14029
ht            1.855042   0.695118   2.669  0.00762 **
ui            0.785698   0.456441   1.721  0.08519 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 234.67 on 188 degrees of freedom

Residual deviance: 201.99 on 181 degrees of freedom
AIC: 217.99

Number of Fisher Scoring iterations: 4

> # I would be inclined to drop ptl

> back2 = glm(low ~ lwt + racefac + smoke + ht + ui,family=binomial)
> summary(back2)

Call:
glm(formula = low ~ lwt + racefac + smoke + ht + ui, family = binomial)

Deviance Residuals:
Min 1Q Median 3Q Max
-1.7396 -0.8322 -0.5359 0.9873 2.1692

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.056276   0.937853   0.060  0.95215
lwt          -0.016732   0.006803  -2.459  0.01392 *
racefacBlack  1.324562   0.521464   2.540  0.01108 *
racefacOther  0.926197   0.430386   2.152  0.03140 *
smoke         1.035831   0.392558   2.639  0.00832 **
ht            1.871416   0.690902   2.709  0.00676 **
ui            0.904974   0.447553   2.022  0.04317 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 234.67 on 188 degrees of freedom

Residual deviance: 204.22 on 182 degrees of freedom
AIC: 218.22

Number of Fisher Scoring iterations: 4
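
Dropping ptl can also be judged from the two residual deviances printed above rather than from the Wald z test. The sketch below is plain arithmetic on those printed values (201.99 for the model chosen by step(), 204.22 for back2). The likelihood ratio comparison is an added illustration, not something the transcript runs.

```r
dev_with_ptl    <- 201.99   # residual deviance, model chosen by step()
dev_without_ptl <- 204.22   # residual deviance, back2

lr_stat <- dev_without_ptl - dev_with_ptl       # change in deviance, 1 df
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

aic_change <- 218.22 - 217.99                   # equals lr_stat - 2

print(c(lr_stat = lr_stat, p_value = p_value, aic_change = aic_change))
```

Removing ptl raises the AIC by only 0.23, so step() keeps the term, while the likelihood ratio p-value stays above 0.05. The two criteria answer slightly different questions, which is why the judgment call in the next comment is reasonable.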

> redmod1$deviance; back2$deviance
[1] 204.8977
[1] 204.2166
> # back2 may be slightly "better," but I like redmod1 more.
> # Why? Because ptl is easier to assess than ui
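
redmod1 and back2 are not nested, but each has seven coefficients, so their AIC values differ by exactly the difference in deviance. A quick check with the printed deviances (a sketch; the variable names are illustrative):

```r
dev_redmod1 <- 204.8977   # lwt + racefac + smoke + ptl + ht
dev_back2   <- 204.2166   # lwt + racefac + smoke + ht + ui

# intercept + lwt + 2 racefac dummies + smoke + ht + (ptl or ui)
k <- 7

aic_redmod1 <- dev_redmod1 + 2 * k
aic_back2   <- dev_back2   + 2 * k
aic_gap     <- aic_redmod1 - aic_back2

print(c(redmod1 = aic_redmod1, back2 = aic_back2, gap = aic_gap))
```

A gap of about 0.68 is small, which is consistent with choosing between the two models on practical grounds such as ease of measurement.
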
>
> forwards = step(nothing,
+ scope=list(lower=formula(nothing),upper=formula(fullmod)), direction="forward")
Start: AIC= 236.67
low ~ 1

Df Deviance AIC
+ ptl 1 227.89 231.89
+ lwt 1 228.69 232.69
+ ui 1 229.60 233.60
+ smoke 1 229.81 233.81
+ ht 1 230.65 234.65
+ racefac 2 229.66 235.66
+ age 1 231.91 235.91
<none> 234.67 236.67
+ ftv 1 233.90 237.90

Step: AIC= 231.89

low ~ ptl

Df Deviance AIC
+ lwt 1 223.41 229.41
+ ht 1 223.58 229.58
+ age 1 224.27 230.27
+ racefac 2 222.53 230.53
+ smoke 1 224.78 230.78
+ ui 1 224.89 230.89
<none> 227.89 231.89
+ ftv 1 227.30 233.30

Step: AIC= 229.41

low ~ ptl + lwt

Df Deviance AIC
+ ht 1 215.96 223.96
+ racefac 2 217.68 227.68
+ smoke 1 220.54 228.54
+ age 1 221.05 229.05
+ ui 1 221.23 229.23
<none> 223.41 229.41
+ ftv 1 223.12 231.12

Step: AIC= 223.96

low ~ ptl + lwt + ht

Df Deviance AIC
+ racefac 2 210.85 222.85
+ ui 1 213.01 223.01
+ smoke 1 213.15 223.15
<none> 215.96 223.96
+ age 1 214.01 224.01
+ ftv 1 215.84 225.84

Step: AIC= 222.85

low ~ ptl + lwt + ht + racefac

Df Deviance AIC
+ smoke 1 204.90 218.90
+ ui 1 207.73 221.73
<none> 210.85 222.85
+ age 1 209.81 223.81
+ ftv 1 210.82 224.82

Step: AIC= 218.9

low ~ ptl + lwt + ht + racefac + smoke

Df Deviance AIC
+ ui 1 201.99 217.99
<none> 204.90 218.90
+ age 1 204.11 220.11
+ ftv 1 204.88 220.88

Step: AIC= 217.99

low ~ ptl + lwt + ht + racefac + smoke + ui

Df Deviance AIC
<none> 201.99 217.99
+ age 1 201.43 219.43
+ ftv 1 201.93 219.93
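
What step() does in the forward direction can be written out as a loop: at each stage fit every model that adds one remaining term, keep the addition with the lowest AIC, and stop when no addition lowers the AIC. The sketch below is a simplified re-implementation for illustration, not the source of step(). It assumes the MASS birthwt data and a reconstructed racefac factor, since the transcript does not show how its data were loaded.

```r
# Assumption: data correspond to MASS::birthwt; racefac labels are a guess.
library(MASS)

bw <- birthwt
bw$racefac <- factor(bw$race, levels = 1:3,
                     labels = c("White", "Black", "Other"))

candidates <- c("age", "lwt", "racefac", "smoke", "ptl", "ht", "ui", "ftv")
selected   <- character(0)
current    <- glm(low ~ 1, family = binomial, data = bw)

repeat {
  remaining <- setdiff(candidates, selected)
  if (length(remaining) == 0) break

  # AIC of each model that adds one more term to the current model
  trial_aic <- sapply(remaining, function(term) {
    AIC(update(current, as.formula(paste(". ~ . +", term))))
  })

  if (min(trial_aic) >= AIC(current)) break   # no addition improves AIC

  best     <- names(which.min(trial_aic))
  selected <- c(selected, best)
  current  <- update(current, as.formula(paste(". ~ . +", best)))
}

print(selected)
print(AIC(current))

# Comparison with the built-in procedure on the same data
forwards <- step(glm(low ~ 1, family = binomial, data = bw),
                 scope = list(lower = ~ 1, upper = reformulate(candidates)),
                 direction = "forward", trace = 0)
print(formula(forwards))
```

If the assumed data match the handout's, the order of entry should agree with the transcript above: ptl, lwt, ht, racefac, smoke, ui.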

> formula(redmod1)
low ~ lwt + racefac + smoke + ptl + ht
> formula(backwards)
low ~ lwt + racefac + smoke + ptl + ht + ui
> formula(forwards)
low ~ ptl + lwt + ht + racefac + smoke + ui
> bothways =
+ step(nothing, list(lower=formula(nothing),upper=formula(fullmod)),
+ direction="both",trace=0)
> formula(forwards)
low ~ ptl + lwt + ht + racefac + smoke + ui
> formula(bothways)
low ~ ptl + lwt + ht + racefac + smoke + ui
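
The formulas returned by the searches list their terms in different orders, so comparing them as text is unreliable. One way to confirm that they describe the same model is to compare term labels as sets. The helper name same_terms below is hypothetical, and the formulas are copied from the output above.

```r
f_redmod1   <- low ~ lwt + racefac + smoke + ptl + ht
f_backwards <- low ~ lwt + racefac + smoke + ptl + ht + ui
f_forwards  <- low ~ ptl + lwt + ht + racefac + smoke + ui

# Hypothetical helper: TRUE when two formulas contain the same right-hand
# side terms, regardless of order (the response is not compared).
same_terms <- function(f1, f2) {
  setequal(attr(terms(f1), "term.labels"),
           attr(terms(f2), "term.labels"))
}

print(same_terms(f_backwards, f_forwards))   # order differs, terms agree
print(same_terms(f_redmod1, f_backwards))    # redmod1 lacks one term

# Terms in the stepwise model that are missing from redmod1
extra_terms <- setdiff(attr(terms(f_backwards), "term.labels"),
                       attr(terms(f_redmod1), "term.labels"))
print(extra_terms)
```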
