Model Selection
Contents

■ A new example
■ A multiple linear regression model
■ Some candidate linear regression models
■ Coefficients of multiple linear determination
■ Adjusted R²
■ Comparing R²_a for models with the same number of parameters
■ Best subset selection - exhaustive search based on R²_a
■ Overall best model selected according to R²_a
■ Best subset selection - exhaustive search based on AIC
■ Best subset selection - exhaustive search based on SBC/BIC
■ Overall best model selected according to SBC/BIC
■ Comparison between AIC and SBC/BIC
A new example
In a study evaluating which characteristics of a house affect its sale price, the following information has been
collected on a sample of 24 houses sold in a given area during a given year:
■ Y: Sale price of the house (thousands of dollars)
■ X1: Taxes (local, county, school - thousands of dollars)
■ X2: Number of bathrooms
■ X3: Lot size (thousands of square feet)
■ X4: Living space (thousands of square feet)
■ X5: Number of garage stalls
■ X6: Number of bedrooms
■ X7: Age of the house (years)
■ X8: Number of fireplaces
Variance inflation factors for the full model with all eight regressors:

        X1      X2      X3      X4      X5      X6      X7      X8
VIF_k   4.7235  2.5938  2.4376  3.8278  1.8197  2.8489  2.3201  1.4963
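These variance inflation factors can be reproduced by fitting the full model in R. A minimal sketch, assuming the 24 observations are stored in a data frame called houses with columns Y and X1, ..., X8 (the name is hypothetical):

## Full model with all eight regressors
full_model <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = houses)

## Variance inflation factors, one per regressor (vif() is in the 'car' package)
library(car)
vif(full_model)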
Some candidate linear regression models
Model   R formula     Regressors
M1      Y~X3          Lot size
M2      Y~X4          Living space
M3      Y~X4+X5       Living space, number of garage stalls
M4      Y~X2+X5       Number of bathrooms, number of garage stalls
M5      Y~X2+X4+X6    Number of bathrooms, living space, number of bedrooms
M6      Y~X3+X4+X6    Lot size, living space, number of bedrooms
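The six candidate models can be fitted in one pass; a small sketch, again assuming the hypothetical data frame houses:

## Candidate model formulas M1-M6
candidates <- list(
  M1 = Y ~ X3,
  M2 = Y ~ X4,
  M3 = Y ~ X4 + X5,
  M4 = Y ~ X2 + X5,
  M5 = Y ~ X2 + X4 + X6,
  M6 = Y ~ X3 + X4 + X6
)

## Fit each candidate model by least squares
fits <- lapply(candidates, lm, data = houses)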
Adjusted R²
Model   R formula     R²      R²_a
M1      Y~X3          0.4194  0.3930
M2      Y~X4          0.5009  0.4782
M3      Y~X4+X5       0.5503  0.5075
M4      Y~X2+X5       0.6001  0.5620
M5      Y~X2+X4+X6    0.6058  0.5467
M6      Y~X3+X4+X6    0.5947  0.5339
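The R² and R²_a columns of this table can be extracted from the fitted objects; continuing the sketch above:

## R-squared and adjusted R-squared of each candidate model
sapply(fits, function(fit) {
  s <- summary(fit)
  c(R2 = s$r.squared, R2_a = s$adj.r.squared)
})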
Note that, when comparing the R²_a values for models with the same number of regressors:
■ M2 is better than M1
■ M4 is better than M3
■ M5 is (slightly) better than M6
Furthermore:
■ according to R²_a, M4 is better than M2 and M5
For two models M1 and M2 with the same number of parameters p, comparing the adjusted coefficients of determination amounts to comparing the error sums of squares:

\[
R^2_a(M_1) > R^2_a(M_2)
\;\Leftrightarrow\; 1 - \frac{SSE(M_1)}{SSTO} \cdot \frac{n-1}{n-p} \;>\; 1 - \frac{SSE(M_2)}{SSTO} \cdot \frac{n-1}{n-p}
\;\Leftrightarrow\; \frac{SSE(M_1)}{SSTO} \cdot \frac{n-1}{n-p} \;<\; \frac{SSE(M_2)}{SSTO} \cdot \frac{n-1}{n-p}
\;\Leftrightarrow\; SSE(M_1) < SSE(M_2)
\]
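The table values can be checked directly from the relation R²_a = 1 − (1 − R²)(n − 1)/(n − p), where p counts the intercept; for example, for M1 and M2 (one regressor each, so p = 2, with n = 24):

n <- 24
adj_r2 <- function(r2, p) 1 - (1 - r2) * (n - 1) / (n - p)
adj_r2(0.4194, p = 2)   # M1: 0.3930
adj_r2(0.5009, p = 2)   # M2: 0.4782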
Best subset selection - exhaustive search based on R²_a
p   Best model (R formula)        R²      R²_a
1   Y~1                           0       0
2   Y~X1                          0.7637  0.7530
3   Y~X1+X2                       0.7981  0.7788
4   Y~X1+X2+X8                    0.8112  0.7829
5   Y~X1+X2+X5+X6                 0.8321  0.7968
6   Y~X1+X2+X5+X6+X8              0.8396  0.7950
7   Y~X1+X2+X4+X5+X6+X8           0.8456  0.7911
8   Y~X1+X2+X4+X5+X6+X7+X8        0.8497  0.7839
9   Y~.                           0.8506  0.7710

[Figure: R² and R²_a of the best model of each size, plotted against p]
■ Note that the sequence of best models is not necessarily a nested sequence
(the best model for p = 4 is not nested in the best model for p = 5)
■ According to R²_a, the overall best model is Y~X1+X2+X5+X6

        X1      X2      X5      X6
VIF_k   2.1389  2.0124  1.7127  1.6611
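An exhaustive search of this kind is typically run with the leaps package; a minimal sketch, assuming the hypothetical data frame houses contains only Y and the eight regressors:

library(leaps)

## Best model of each size among all subsets of the eight regressors
search <- regsubsets(Y ~ ., data = houses, nvmax = 8, method = "exhaustive")
s <- summary(search)

## R-squared and adjusted R-squared of the best model of each size
cbind(R2 = s$rsq, R2_a = s$adjr2)

## Model with the largest adjusted R-squared
## (for these data it should be Y~X1+X2+X5+X6, as reported above)
coef(search, which.max(s$adjr2))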
Best subset selection - exhaustive search based on AIC
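A criterion such as AIC can be handled with the same exhaustive strategy, by fitting every subset of regressors and recording the criterion for each fit; a sketch under the same assumption about the houses data frame:

regressors <- paste0("X", 1:8)

## All non-empty subsets of the eight regressors
subsets <- unlist(lapply(1:8, function(k) combn(regressors, k, simplify = FALSE)),
                  recursive = FALSE)

## AIC of the least-squares fit for each subset; stats::AIC() differs from
## n*ln(SSE/n) + 2p only by an additive constant that is the same for every
## model fitted to the same n observations, so the ranking is unchanged
aics <- sapply(subsets, function(vars)
  AIC(lm(reformulate(vars, response = "Y"), data = houses)))

## Regressors of the model with the smallest AIC
subsets[[which.min(aics)]]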
Overall best model selected according to SBC/BIC
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   10.1120      2.9961   3.3750    0.0029
X1             2.7170      0.4911   5.5320    0.0000
X2             6.0985      3.2271   1.8898    0.0727
Residual standard error: 2.828 on 21 degrees of freedom
Multiple R-squared: 0.7981, Adjusted R-squared: 0.7788
F-statistic: 41.5 on 2 and 21 DF, p-value: 5.067e-08
        X1      X2
VIF_k   1.7366  1.7366
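A sketch of how this output could be reproduced for the selected model (again with the hypothetical houses data frame):

best_sbc <- lm(Y ~ X1 + X2, data = houses)
summary(best_sbc)     # coefficient table, R-squared and F-statistic shown above
BIC(best_sbc)         # SBC/BIC of the selected model
car::vif(best_sbc)    # variance inflation factors for X1 and X2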
Comparison between AIC and SBC/BIC

For a model with p parameters and error sum of squares SSE_p, the two criteria are

\[
AIC_p = n \ln\!\left(\frac{SSE_p}{n}\right) + 2p,
\qquad
SBC_p / BIC_p = n \ln\!\left(\frac{SSE_p}{n}\right) + p \ln(n)
\]

[Figure: the AIC penalty weight (2) and the SBC/BIC penalty weight (ln n) as functions of n]
When n > 8, ln(n) > 2, so SBC/BIC gives more weight to the parsimony term. Thus, SBC/BIC tends to select
multiple linear regression models that are simpler (with fewer regressors) than the ones selected by AIC.
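In R, extractAIC() computes exactly n·ln(SSE_p/n) + k·p for an lm fit, so the two criteria correspond to the penalty weights k = 2 and k = ln(n); a small sketch of the comparison (houses is the hypothetical data frame used above):

fit <- lm(Y ~ X1 + X2, data = houses)
n <- nrow(houses)             # 24 here, so log(n) is about 3.18 > 2

extractAIC(fit, k = 2)        # AIC: penalty weight 2 per parameter
extractAIC(fit, k = log(n))   # SBC/BIC: penalty weight ln(n) per parameter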