
HW2 Applied Questions

魏羿晖 2020200732

1 Problem 6

1.1 a

setwd("D:/files/study/ISL")
auto = read.csv("Auto.csv", header = T, na.strings = "?")
auto = na.omit(auto)

library(MASS)
library(ISLR)
lm.fit = lm(mpg~horsepower, data = auto)
summary(lm.fit)

##
## Call:
## lm(formula = mpg ~ horsepower, data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.5710 -3.2592 -0.3435 2.7630 16.9240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.935861 0.717499 55.66 <2e-16 ***
## horsepower -0.157845 0.006446 -24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.906 on 390 degrees of freedom
## Multiple R-squared: 0.6059, Adjusted R-squared: 0.6049
## F-statistic: 599.7 on 1 and 390 DF, p-value: < 2.2e-16
predict(lm.fit, data.frame(horsepower=c(98)), interval="confidence")

## fit lwr upr
## 1 24.46708 23.97308 24.96108
predict(lm.fit, data.frame(horsepower=c(98)), interval="prediction")

## fit lwr upr
## 1 24.46708 14.8094 34.12476

The negative coefficient on horsepower indicates a negative relationship between the predictor and the response. The corresponding F-statistic is far greater than 1 and the p-value is essentially zero, so the relationship between predictor and response is highly significant. R-squared is 0.6059, meaning that the linear regression on horsepower explains 60.59% of the variability in mpg. The predicted value, confidence interval, and prediction interval at horsepower = 98 are shown in the output above.
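The predicted value at horsepower = 98 can be checked by hand from the fitted line. A quick check in Python, using the rounded coefficients printed in the summary above (so the result matches the R `predict()` output only up to rounding):

```python
# Fitted simple regression: mpg = b0 + b1 * horsepower
# Coefficients taken from the summary(lm.fit) output above (rounded).
b0 = 39.935861   # intercept
b1 = -0.157845   # slope on horsepower

pred = b0 + b1 * 98
print(round(pred, 3))  # close to the 24.46708 reported by predict()
```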

1.2 b

plot(auto$horsepower, auto$mpg)
abline(lm.fit)

[Figure: scatter plot of auto$mpg against auto$horsepower with the fitted regression line]

1.3 c

par(mfrow = c(2,2))
plot(lm.fit)

[Figure: diagnostic plots of lm.fit (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage)]

The fitted line in the residual plot is U-shaped, so the residuals are correlated with the fitted values, indicating nonlinearity in the data. The Scale-Location plot suggests that the residuals may have non-constant variance.

2 Problem 7

2.1 a

auto$name = as.factor(auto$name)
pairs(auto)

[Figure: scatterplot matrix from pairs(auto)]

2.2 b

cor(subset(auto, select = -name))

## mpg cylinders displacement horsepower weight
## mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442
## cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273
## displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944
## horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377
## weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000
## acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392
## year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199
## origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054
## acceleration year origin
## mpg 0.4233285 0.5805410 0.5652088
## cylinders -0.5046834 -0.3456474 -0.5689316
## displacement -0.5438005 -0.3698552 -0.6145351
## horsepower -0.6891955 -0.4163615 -0.4551715
## weight -0.4168392 -0.3091199 -0.5850054
## acceleration 1.0000000 0.2903161 0.2127458
## year 0.2903161 1.0000000 0.1815277
## origin 0.2127458 0.1815277 1.0000000

2.3 c

lm.fit1 = lm(mpg~.-name, data = auto)
summary(lm.fit1)

##
## Call:
## lm(formula = mpg ~ . - name, data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.5903 -2.1565 -0.1169 1.8690 13.0604
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.218435 4.644294 -3.707 0.00024 ***
## cylinders -0.493376 0.323282 -1.526 0.12780
## displacement 0.019896 0.007515 2.647 0.00844 **
## horsepower -0.016951 0.013787 -1.230 0.21963
## weight -0.006474 0.000652 -9.929 < 2e-16 ***
## acceleration 0.080576 0.098845 0.815 0.41548
## year 0.750773 0.050973 14.729 < 2e-16 ***
## origin 1.426141 0.278136 5.127 4.67e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.328 on 384 degrees of freedom
## Multiple R-squared: 0.8215, Adjusted R-squared: 0.8182
## F-statistic: 252.4 on 7 and 384 DF, p-value: < 2.2e-16

The F-statistic is far greater than 1, so we reject the null hypothesis that all regression coefficients are zero: there is a relationship between the predictors and the response. displacement, weight, year, and origin have significant relationships with the response. The coefficient on year is 0.750773, meaning that, holding the other variables fixed, each one-year increase in model year is associated with an increase of about 0.75 mpg.
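To see how the full-model coefficients combine into a fitted value, here is a hand computation for a hypothetical car; the input values (4 cylinders, 100 displacement, 90 horsepower, 2500 weight, 15 acceleration, year 76, origin 1) are made up for illustration, while the coefficients are the rounded ones from the summary(lm.fit1) output above:

```python
# Coefficients from summary(lm.fit1) above (rounded).
coefs = {
    "(Intercept)": -17.218435,
    "cylinders": -0.493376,
    "displacement": 0.019896,
    "horsepower": -0.016951,
    "weight": -0.006474,
    "acceleration": 0.080576,
    "year": 0.750773,
    "origin": 1.426141,
}
# Hypothetical car (values chosen for illustration only).
car = {"cylinders": 4, "displacement": 100, "horsepower": 90,
       "weight": 2500, "acceleration": 15, "year": 76, "origin": 1}

mpg_hat = coefs["(Intercept)"] + sum(coefs[k] * v for k, v in car.items())
print(round(mpg_hat, 2))  # about 24.78 mpg; the same car one model year newer gains ~0.75 mpg
```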

2.4 d

par(mfrow = c(2,2))
plot(lm.fit1)

[Figure: diagnostic plots of lm.fit1 (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage)]

plot(predict(lm.fit1), rstudent(lm.fit1))
[Figure: studentized residuals rstudent(lm.fit1) plotted against predict(lm.fit1)]

The fitted line in the residual plot is still somewhat U-shaped, so some correlation between the residuals and the fitted values remains. Observation 14 is a high-leverage point. There are unusually large outliers, since some points have studentized residuals greater than 3.

2.5 e

lm.fit2 = lm(mpg~cylinders*displacement, data = auto)
lm.fit3 = lm(mpg~weight + displacement + displacement:weight, data = auto)
summary(lm.fit2)

##
## Call:
## lm(formula = mpg ~ cylinders * displacement, data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.0432 -2.4308 -0.2263 2.2048 20.9051
##

## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.22040 2.34712 20.545 < 2e-16 ***
## cylinders -2.41838 0.53456 -4.524 8.08e-06 ***
## displacement -0.13436 0.01615 -8.321 1.50e-15 ***
## cylinders:displacement 0.01182 0.00207 5.711 2.24e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.454 on 388 degrees of freedom
## Multiple R-squared: 0.6769, Adjusted R-squared: 0.6744
## F-statistic: 271 on 3 and 388 DF, p-value: < 2.2e-16
summary(lm.fit3)

##
## Call:
## lm(formula = mpg ~ weight + displacement + displacement:weight,
## data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.8664 -2.4801 -0.3355 1.8071 17.9429
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.372e+01 1.940e+00 27.697 < 2e-16 ***
## weight -8.931e-03 8.474e-04 -10.539 < 2e-16 ***
## displacement -7.831e-02 1.131e-02 -6.922 1.85e-11 ***
## weight:displacement 1.744e-05 2.789e-06 6.253 1.06e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.097 on 388 degrees of freedom
## Multiple R-squared: 0.7265, Adjusted R-squared: 0.7244
## F-statistic: 343.6 on 3 and 388 DF, p-value: < 2.2e-16

For the two pairs of variables with the largest correlation coefficients, we fit linear regressions with interaction terms; in both models the interaction is statistically significant.
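The interaction coefficient in lm.fit2 means that the slope on displacement depends on cylinders. A small sketch of this, using the rounded coefficients from summary(lm.fit2) above:

```python
# From summary(lm.fit2): mpg = 48.22040 - 2.41838*cyl - 0.13436*disp + 0.01182*cyl*disp
# The marginal effect of displacement is therefore -0.13436 + 0.01182 * cylinders.
def disp_slope(cylinders):
    return -0.13436 + 0.01182 * cylinders

print(round(disp_slope(4), 5))  # -0.08708: strongly negative for 4-cylinder cars
print(round(disp_slope(8), 5))  # -0.0398: much weaker for 8-cylinder cars
```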

2.6 f

lm.fit4 = lm(mpg~log(displacement) + sqrt(weight) + year + I(year^2), data = auto)
summary(lm.fit4)

##
## Call:
## lm(formula = mpg ~ log(displacement) + sqrt(weight) + year +
## I(year^2), data = auto)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.9700 -1.8228 -0.0033 1.6218 13.5212
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 434.93957 75.22099 5.782 1.52e-08 ***
## log(displacement) -3.42454 0.90388 -3.789 0.000176 ***
## sqrt(weight) -0.51231 0.06243 -8.206 3.43e-15 ***
## year -10.40999 1.97815 -5.262 2.36e-07 ***
## I(year^2) 0.07339 0.01301 5.641 3.26e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.123 on 387 degrees of freedom
## Multiple R-squared: 0.8416, Adjusted R-squared: 0.8399
## F-statistic: 513.9 on 4 and 387 DF, p-value: < 2.2e-16

par(mfrow=c(2,2))
plot(lm.fit4)

[Figure: diagnostic plots of lm.fit4 (Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage)]

Taking the statistically significant predictors from lm.fit1, the regression remains statistically significant after applying various transformations. The points in the residual plot show almost no pattern, suggesting the model fits the data well. The leverage values in the leverage plot are all small. The Normal Q-Q plot, however, suggests that the residuals may not be normally distributed.

3 Problem 8

3.1 a

summary(Weekly)

## Year Lag1 Lag2 Lag3
## Min. :1990 Min. :-18.1950 Min. :-18.1950 Min. :-18.1950
## 1st Qu.:1995 1st Qu.: -1.1540 1st Qu.: -1.1540 1st Qu.: -1.1580
## Median :2000 Median : 0.2410 Median : 0.2410 Median : 0.2410
## Mean :2000 Mean : 0.1506 Mean : 0.1511 Mean : 0.1472
## 3rd Qu.:2005 3rd Qu.: 1.4050 3rd Qu.: 1.4090 3rd Qu.: 1.4090
## Max. :2010 Max. : 12.0260 Max. : 12.0260 Max. : 12.0260
## Lag4 Lag5 Volume Today
## Min. :-18.1950 Min. :-18.1950 Min. :0.08747 Min. :-18.1950
## 1st Qu.: -1.1580 1st Qu.: -1.1660 1st Qu.:0.33202 1st Qu.: -1.1540
## Median : 0.2380 Median : 0.2340 Median :1.00268 Median : 0.2410
## Mean : 0.1458 Mean : 0.1399 Mean :1.57462 Mean : 0.1499
## 3rd Qu.: 1.4090 3rd Qu.: 1.4050 3rd Qu.:2.05373 3rd Qu.: 1.4050
## Max. : 12.0260 Max. : 12.0260 Max. :9.32821 Max. : 12.0260
## Direction
## Down:484
## Up :605
##
##
##
##
cor(Weekly[,-9])

## Year Lag1 Lag2 Lag3 Lag4
## Year 1.00000000 -0.032289274 -0.03339001 -0.03000649 -0.031127923
## Lag1 -0.03228927 1.000000000 -0.07485305 0.05863568 -0.071273876
## Lag2 -0.03339001 -0.074853051 1.00000000 -0.07572091 0.058381535
## Lag3 -0.03000649 0.058635682 -0.07572091 1.00000000 -0.075395865
## Lag4 -0.03112792 -0.071273876 0.05838153 -0.07539587 1.000000000
## Lag5 -0.03051910 -0.008183096 -0.07249948 0.06065717 -0.075675027
## Volume 0.84194162 -0.064951313 -0.08551314 -0.06928771 -0.061074617
## Today -0.03245989 -0.075031842 0.05916672 -0.07124364 -0.007825873
## Lag5 Volume Today
## Year -0.030519101 0.84194162 -0.032459894
## Lag1 -0.008183096 -0.06495131 -0.075031842
## Lag2 -0.072499482 -0.08551314 0.059166717
## Lag3 0.060657175 -0.06928771 -0.071243639
## Lag4 -0.075675027 -0.06107462 -0.007825873
## Lag5 1.000000000 -0.05851741 0.011012698
## Volume -0.058517414 1.00000000 -0.033077783
## Today 0.011012698 -0.03307778 1.000000000
pairs(Weekly[,-9])

[Figure: scatterplot matrix from pairs(Weekly[,-9])]

The correlation matrix and the scatterplot matrix show that the correlations between most pairs of predictors are close to zero: the returns of previous weeks have little correlation with the current week's return. The only exception is Year and Volume; Volume grows steadily as Year increases.

3.2 b

glm.fit = glm(Direction~Lag1+Lag2+Lag3+Lag4+Lag5+Volume, data = Weekly, family = binomial)
summary(glm.fit)

##
## Call:
## glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 +
## Volume, family = binomial, data = Weekly)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6949 -1.2565 0.9913 1.0849 1.4579
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.26686 0.08593 3.106 0.0019 **
## Lag1 -0.04127 0.02641 -1.563 0.1181
## Lag2 0.05844 0.02686 2.175 0.0296 *
## Lag3 -0.01606 0.02666 -0.602 0.5469
## Lag4 -0.02779 0.02646 -1.050 0.2937
## Lag5 -0.01447 0.02638 -0.549 0.5833
## Volume -0.02274 0.03690 -0.616 0.5377
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1496.2 on 1088 degrees of freedom
## Residual deviance: 1486.4 on 1082 degrees of freedom
## AIC: 1500.4
##
## Number of Fisher Scoring iterations: 4

Lag2 is a statistically significant predictor; its p-value is 0.0296.
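Since this is a logistic model, the Lag2 coefficient acts on the log-odds. Exponentiating it (a quick Python check with the rounded estimate from the output above) gives the multiplicative effect on the odds of an Up week:

```python
import math

beta_lag2 = 0.05844  # coefficient on Lag2 from summary(glm.fit) above
odds_ratio = math.exp(beta_lag2)
print(round(odds_ratio, 4))  # ~1.0602: each unit of Lag2 raises the odds of "Up" by about 6%
```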

3.3 c

glm.probs = predict(glm.fit, type = "response")
glm.pred = rep("Down",length(glm.probs))
glm.pred[glm.probs > .5] = "Up"

table(glm.pred, Weekly$Direction)

##
## glm.pred Down Up
## Down 54 48
## Up 430 557
mean(glm.pred == Weekly$Direction)

## [1] 0.5610652

When the model predicts that the market will go down, it is correct 54/(54+48) = 52.9% of the time; when it predicts that the market will go up, it is correct 557/(557+430) = 56.4% of the time. The overall accuracy is about 56.11%.
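These percentages come straight from the confusion matrix. As an arithmetic check (Python, using the table entries from the output above):

```python
# Confusion matrix from table(glm.pred, Weekly$Direction) above:
#              true Down  true Up
# pred Down        54        48
# pred Up         430       557
down_correct = 54 / (54 + 48)      # accuracy when predicting "Down"
up_correct = 557 / (557 + 430)     # accuracy when predicting "Up"
overall = (54 + 557) / (54 + 48 + 430 + 557)

print(round(down_correct, 3), round(up_correct, 3), round(overall, 4))
# 0.529 0.564 0.5611
```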

3.4 d Logistic regression

tr = (Weekly$Year < 2009)
te = Weekly[!tr,]
glm.fit1 = glm(Direction~Lag2, data = Weekly, family = binomial, subset = tr)
summary(glm.fit1)
summary(glm.fit1)

##
## Call:
## glm(formula = Direction ~ Lag2, family = binomial, data = Weekly,
## subset = tr)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.536 -1.264 1.021 1.091 1.368
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.20326 0.06428 3.162 0.00157 **
## Lag2 0.05810 0.02870 2.024 0.04298 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1354.7 on 984 degrees of freedom
## Residual deviance: 1350.5 on 983 degrees of freedom
## AIC: 1354.5
##
## Number of Fisher Scoring iterations: 4
glm.probs1 = predict(glm.fit1, te, type = "response")
glm.pred1 = rep("Down",length(glm.probs1))
glm.pred1[glm.probs1 > .5] = "Up"
table(glm.pred1, Weekly$Direction[!tr])

##
## glm.pred1 Down Up
## Down 9 5
## Up 34 56
mean(glm.pred1 == Weekly$Direction[!tr])

## [1] 0.625

3.5 e LDA

lda.fit = lda(Direction~Lag2, data = Weekly, subset = tr)
summary(lda.fit)

## Length Class Mode
## prior 2 -none- numeric
## counts 2 -none- numeric
## means 2 -none- numeric
## scaling 1 -none- numeric
## lev 2 -none- character
## svd 1 -none- numeric
## N 1 -none- numeric
## call 4 -none- call
## terms 3 terms call
## xlevels 0 -none- list
lda.pred = predict (lda.fit, te)
names(lda.pred)

## [1] "class" "posterior" "x"
lda.class = lda.pred$class
table(lda.class, Weekly$Direction[!tr])

##
## lda.class Down Up
## Down 9 5
## Up 34 56
mean(lda.class == Weekly$Direction[!tr])

## [1] 0.625

3.6 f QDA

qda.fit = qda(Direction~Lag2, data = Weekly, subset = tr)
summary(qda.fit)

## Length Class Mode
## prior 2 -none- numeric
## counts 2 -none- numeric
## means 2 -none- numeric
## scaling 2 -none- numeric
## ldet 2 -none- numeric
## lev 2 -none- character
## N 1 -none- numeric
## call 4 -none- call
## terms 3 terms call
## xlevels 0 -none- list
qda.pred = predict (qda.fit, te)
names(qda.pred)

## [1] "class" "posterior"
qda.class = qda.pred$class
table(qda.class, Weekly$Direction[!tr])

##
## qda.class Down Up
## Down 0 0
## Up 43 61
mean(qda.class == Weekly$Direction[!tr])

## [1] 0.5865385

3.7 g KNN: K = 1

library(class)
train.X = as.matrix(Weekly$Lag2[tr])
test.X = as.matrix(Weekly$Lag2[!tr])
train.Direction = Weekly$Direction[tr]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Direction, k = 1)
table(knn.pred, Weekly$Direction[!tr])

##
## knn.pred Down Up
## Down 21 30
## Up 22 31
mean(knn.pred == Weekly$Direction[!tr])

## [1] 0.5

In summary, logistic regression and LDA achieve the highest overall test accuracy, both at 0.625, so these two methods have given the best results so far.

3.8 i

3.8.1 KNN: K = 3

library(class)
train.X = as.matrix(Weekly$Lag2[tr])
test.X = as.matrix(Weekly$Lag2[!tr])
train.Direction = Weekly$Direction[tr]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Direction, k = 3)
table(knn.pred, Weekly$Direction[!tr])

##
## knn.pred Down Up
## Down 16 20
## Up 27 41
mean(knn.pred == Weekly$Direction[!tr])

## [1] 0.5480769

3.8.2 KNN: K = 4

library(class)
train.X = as.matrix(Weekly$Lag2[tr])
test.X = as.matrix(Weekly$Lag2[!tr])
train.Direction = Weekly$Direction[tr]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Direction, k = 4)
table(knn.pred, Weekly$Direction[!tr])

##
## knn.pred Down Up
## Down 20 17
## Up 23 44
mean(knn.pred == Weekly$Direction[!tr])

## [1] 0.6153846

3.8.3 KNN: K = 5

library(class)
train.X = as.matrix(Weekly$Lag2[tr])
test.X = as.matrix(Weekly$Lag2[!tr])
train.Direction = Weekly$Direction[tr]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Direction, k = 5)
table(knn.pred, Weekly$Direction[!tr])

##
## knn.pred Down Up
## Down 16 21
## Up 27 40
mean(knn.pred == Weekly$Direction[!tr])

## [1] 0.5384615

3.8.4 KNN: K = 10

library(class)
train.X = as.matrix(Weekly$Lag2[tr])
test.X = as.matrix(Weekly$Lag2[!tr])
train.Direction = Weekly$Direction[tr]
set.seed(1)
knn.pred = knn(train.X, test.X, train.Direction, k = 10)
table(knn.pred, Weekly$Direction[!tr])

##
## knn.pred Down Up
## Down 17 21
## Up 26 40
mean(knn.pred == Weekly$Direction[!tr])

## [1] 0.5480769

Varying K in the KNN classifier, we find that K = 4 gives the highest overall test accuracy, about 0.615. Because in part (b) every predictor other than Lag1 and Lag2 has a high p-value, we use the combination of Lag1 and Lag2 in the other models below.

3.8.5 Logistic regression with adjusted predictors

tr = (Weekly$Year < 2009)
te = Weekly[!tr,]
glm.fit1 = glm(Direction~Lag1*Lag2, data = Weekly, family = binomial, subset = tr)
summary(glm.fit1)

##
## Call:
## glm(formula = Direction ~ Lag1 * Lag2, family = binomial, data = Weekly,
## subset = tr)

##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.573 -1.259 1.003 1.086 1.596
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.211419 0.064589 3.273 0.00106 **
## Lag1 -0.051505 0.030727 -1.676 0.09370 .
## Lag2 0.053471 0.029193 1.832 0.06700 .
## Lag1:Lag2 0.001921 0.007460 0.257 0.79680
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1354.7 on 984 degrees of freedom
## Residual deviance: 1346.9 on 981 degrees of freedom
## AIC: 1354.9
##
## Number of Fisher Scoring iterations: 4
glm.probs1 = predict(glm.fit1, te, type = "response")
glm.pred1 = rep("Down",length(glm.probs1))
glm.pred1[glm.probs1 > .5] = "Up"
table(glm.pred1, Weekly$Direction[!tr])

##
## glm.pred1 Down Up
## Down 7 8
## Up 36 53
mean(glm.pred1 == Weekly$Direction[!tr])

## [1] 0.5769231

3.8.6 LDA with adjusted predictors

lda.fit = lda(Direction~Lag1*Lag2, data = Weekly, subset = tr)
summary(lda.fit)

## Length Class Mode
## prior 2 -none- numeric
## counts 2 -none- numeric
## means 6 -none- numeric
## scaling 3 -none- numeric
## lev 2 -none- character
## svd 1 -none- numeric
## N 1 -none- numeric
## call 4 -none- call
## terms 3 terms call
## xlevels 0 -none- list
lda.pred = predict (lda.fit, te)
names(lda.pred)

## [1] "class" "posterior" "x"
lda.class = lda.pred$class
table(lda.class, Weekly$Direction[!tr])

##
## lda.class Down Up
## Down 7 8
## Up 36 53
mean(lda.class == Weekly$Direction[!tr])

## [1] 0.5769231

3.8.7 QDA with adjusted predictors

qda.fit = qda(Direction~Lag1*Lag2, data = Weekly, subset = tr)
summary(qda.fit)

## Length Class Mode
## prior 2 -none- numeric
## counts 2 -none- numeric
## means 6 -none- numeric
## scaling 18 -none- numeric
## ldet 2 -none- numeric
## lev 2 -none- character
## N 1 -none- numeric
## call 4 -none- call
## terms 3 terms call
## xlevels 0 -none- list
qda.pred = predict (qda.fit, te)
names(qda.pred)

## [1] "class" "posterior"
qda.class = qda.pred$class
table(qda.class, Weekly$Direction[!tr])

##
## qda.class Down Up
## Down 23 36
## Up 20 25
mean(qda.class == Weekly$Direction[!tr])

## [1] 0.4615385

Compared with the other adjusted methods, KNN with K = 4 still achieves the highest overall test accuracy.
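Collecting the held-out (2009-2010) accuracies reported in this section makes the comparison explicit (a small Python sketch; the numbers are copied from the outputs above, and note that the KNN model uses only Lag2 while the others use Lag1*Lag2):

```python
# Test-set accuracies of the adjusted models, from the outputs above.
adjusted = {
    "logistic (Lag1*Lag2)": 0.5769231,
    "LDA (Lag1*Lag2)": 0.5769231,
    "QDA (Lag1*Lag2)": 0.4615385,
    "KNN, K = 4 (Lag2)": 0.6153846,
}
best = max(adjusted, key=adjusted.get)
print(best, adjusted[best])  # KNN with K = 4 remains the most accurate
```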
