Model Selection
Contents

■ A new example
■ A multiple linear regression model
■ Some candidate linear regression models
■ Coefficients of multiple linear determination
■ Adjusted R²
■ Comparing R²_a for models with the same number of parameters
■ Best subset selection - exhaustive search based on R²_a
■ Overall best model selected according to R²_a
■ Best subset selection - exhaustive search based on AIC
■ Best subset selection - exhaustive search based on SBC/BIC
■ Overall best model selected according to SBC/BIC
■ Comparison between AIC and SBC/BIC
A new example
In a study evaluating which characteristics of a house affect its sale price, the following information has been
collected on a sample of 24 houses sold in a given area during a given year:
■ Y: Sale price of the house (thousands of dollars)
■ X1: Taxes (local, county, school - thousands of dollars)
■ X2: Number of bathrooms
■ X3: Lot size (thousands of square feet)
■ X4: Living space (thousands of square feet)
■ X5: Number of garage stalls
■ X6: Number of bedrooms
■ X7: Age of the house (years)
■ X8: Number of fireplaces
Variance inflation factors for the full model with all eight regressors:

        X1      X2      X3      X4      X5      X6      X7      X8
VIF_k   4.7235  2.5938  2.4376  3.8278  1.8197  2.8489  2.3201  1.4963
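These variance inflation factors can be reproduced by fitting the full model in R. A minimal sketch, assuming the 24 observations are stored in a data frame called houses with columns Y and X1, ..., X8 (the name is hypothetical):

## Full model with all eight regressors
full_model <- lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8, data = houses)

## Variance inflation factors, one per regressor (vif() is in the 'car' package)
library(car)
vif(full_model)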
Some candidate linear regression models
Model   R formula     Regressors
M1      Y~X3          Lot size
M2      Y~X4          Living space
M3      Y~X4+X5       Living space, number of garage stalls
M4      Y~X2+X5       Number of bathrooms, number of garage stalls
M5      Y~X2+X4+X6    Number of bathrooms, living space, number of bedrooms
M6      Y~X3+X4+X6    Lot size, living space, number of bedrooms
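The six candidate models can be fitted in one pass; a small sketch, again assuming the hypothetical data frame houses:

## Candidate model formulas M1-M6
candidates <- list(
  M1 = Y ~ X3,
  M2 = Y ~ X4,
  M3 = Y ~ X4 + X5,
  M4 = Y ~ X2 + X5,
  M5 = Y ~ X2 + X4 + X6,
  M6 = Y ~ X3 + X4 + X6
)

## Fit each candidate model by least squares
fits <- lapply(candidates, lm, data = houses)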
Adjusted R²
Model   R formula     R²      R²_a
M1      Y~X3          0.4194  0.3930
M2      Y~X4          0.5009  0.4782
M3      Y~X4+X5       0.5503  0.5075
M4      Y~X2+X5       0.6001  0.5620
M5      Y~X2+X4+X6    0.6058  0.5467
M6      Y~X3+X4+X6    0.5947  0.5339
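The R² and R²_a columns of this table can be extracted from the fitted objects; continuing the sketch above:

## R-squared and adjusted R-squared of each candidate model
sapply(fits, function(fit) {
  s <- summary(fit)
  c(R2 = s$r.squared, R2_a = s$adj.r.squared)
})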
Note that, when comparing the R²_a values for models with the same number of regressors:
■ M2 is better than M1
■ M4 is better than M3
■ M5 is (slightly) better than M6
Furthermore:
■ according to R²_a, M4 is better than M2 and M5
For two models M1 and M2 with the same number of parameters p, comparing the adjusted coefficients of determination amounts to comparing the error sums of squares:

\[
R^2_a(M_1) > R^2_a(M_2)
\;\Leftrightarrow\; 1 - \frac{SSE(M_1)}{SSTO} \cdot \frac{n-1}{n-p} \;>\; 1 - \frac{SSE(M_2)}{SSTO} \cdot \frac{n-1}{n-p}
\;\Leftrightarrow\; \frac{SSE(M_1)}{SSTO} \cdot \frac{n-1}{n-p} \;<\; \frac{SSE(M_2)}{SSTO} \cdot \frac{n-1}{n-p}
\;\Leftrightarrow\; SSE(M_1) < SSE(M_2)
\]
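The table values can be checked directly from the relation R²_a = 1 − (1 − R²)(n − 1)/(n − p), where p counts the intercept; for example, for M1 and M2 (one regressor each, so p = 2, with n = 24):

n <- 24
adj_r2 <- function(r2, p) 1 - (1 - r2) * (n - 1) / (n - p)
adj_r2(0.4194, p = 2)   # M1: 0.3930
adj_r2(0.5009, p = 2)   # M2: 0.4782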
Best subset selection - exhaustive search based on R²_a
p   Best model (R formula)        R²      R²_a
1   Y~1                           0       0
2   Y~X1                          0.7637  0.7530
3   Y~X1+X2                       0.7981  0.7788
4   Y~X1+X2+X8                    0.8112  0.7829
5   Y~X1+X2+X5+X6                 0.8321  0.7968
6   Y~X1+X2+X5+X6+X8              0.8396  0.7950
7   Y~X1+X2+X4+X5+X6+X8           0.8456  0.7911
8   Y~X1+X2+X4+X5+X6+X7+X8        0.8497  0.7839
9   Y~.                           0.8506  0.7710

[Figure: R² and R²_a of the best model of each size, plotted against p]
■ Note that the sequence of best models is not necessarily a nested sequence
(the best model for p = 4 is not nested in the best model for p = 5)
■ According to R²_a, the overall best model is Y~X1+X2+X5+X6

        X1      X2      X5      X6
VIF_k   2.1389  2.0124  1.7127  1.6611
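An exhaustive search of this kind is typically run with the leaps package; a minimal sketch, assuming the hypothetical data frame houses contains only Y and the eight regressors:

library(leaps)

## Best model of each size among all subsets of the eight regressors
search <- regsubsets(Y ~ ., data = houses, nvmax = 8, method = "exhaustive")
s <- summary(search)

## R-squared and adjusted R-squared of the best model of each size
cbind(R2 = s$rsq, R2_a = s$adjr2)

## Model with the largest adjusted R-squared
## (for these data it should be Y~X1+X2+X5+X6, as reported above)
coef(search, which.max(s$adjr2))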
Best subset selection - exhaustive search based on AIC
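A criterion such as AIC can be handled with the same exhaustive strategy, by fitting every subset of regressors and recording the criterion for each fit; a sketch under the same assumption about the houses data frame:

regressors <- paste0("X", 1:8)

## All non-empty subsets of the eight regressors
subsets <- unlist(lapply(1:8, function(k) combn(regressors, k, simplify = FALSE)),
                  recursive = FALSE)

## AIC of the least-squares fit for each subset; stats::AIC() differs from
## n*ln(SSE/n) + 2p only by an additive constant that is the same for every
## model fitted to the same n observations, so the ranking is unchanged
aics <- sapply(subsets, function(vars)
  AIC(lm(reformulate(vars, response = "Y"), data = houses)))

## Regressors of the model with the smallest AIC
subsets[[which.min(aics)]]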
Overall best model selected according to SBC/BIC
Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   10.1120      2.9961   3.3750    0.0029
X1             2.7170      0.4911   5.5320    0.0000
X2             6.0985      3.2271   1.8898    0.0727
Residual standard error: 2.828 on 21 degrees of freedom
Multiple R-squared: 0.7981, Adjusted R-squared: 0.7788
F-statistic: 41.5 on 2 and 21 DF, p-value: 5.067e-08
        X1      X2
VIF_k   1.7366  1.7366
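A sketch of how this output could be reproduced for the selected model (again with the hypothetical houses data frame):

best_sbc <- lm(Y ~ X1 + X2, data = houses)
summary(best_sbc)     # coefficient table, R-squared and F-statistic shown above
BIC(best_sbc)         # SBC/BIC of the selected model
car::vif(best_sbc)    # variance inflation factors for X1 and X2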
Comparison between AIC and SBC/BIC

For a model with p parameters and error sum of squares SSE_p, the two criteria are

\[
AIC_p = n \ln\!\left(\frac{SSE_p}{n}\right) + 2p,
\qquad
SBC_p / BIC_p = n \ln\!\left(\frac{SSE_p}{n}\right) + p \ln(n)
\]

[Figure: the AIC penalty weight (2) and the SBC/BIC penalty weight (ln n) as functions of n]
When n > 8, ln(n) > 2, so SBC/BIC gives more weight to the parsimony term. Thus, SBC/BIC tends to select
multiple linear regression models that are simpler (with fewer regressors) than the ones selected by AIC.
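In R, extractAIC() computes exactly n·ln(SSE_p/n) + k·p for an lm fit, so the two criteria correspond to the penalty weights k = 2 and k = ln(n); a small sketch of the comparison (houses is the hypothetical data frame used above):

fit <- lm(Y ~ X1 + X2, data = houses)
n <- nrow(houses)             # 24 here, so log(n) is about 3.18 > 2

extractAIC(fit, k = 2)        # AIC: penalty weight 2 per parameter
extractAIC(fit, k = log(n))   # SBC/BIC: penalty weight ln(n) per parameter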