W3 - Linear Regression
Some definitions
Typical questions:
- What is the intercept? The intercept is 31.6: the expected exam_grade of someone who did
not study at all (x = study_time = 0).
- What is the slope? The slope is 1.83 and represents the effect of study_time on exam_grade.
In this case, for every additional hour you study, your grade increases by 1.83 points on average.
- What is the expected grade of a student who studied 10 hours? -> Substitute x = 10 into the equation:
ŷ = 31.6 + 1.83 × 10 = 49.9
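The substitution can be checked with a few lines of R. This is only a sketch: it hard-codes the intercept (31.6) and slope (1.83) quoted above, and the function name predict_grade is a hypothetical helper, not something from the course material.

```r
# Intercept and slope as quoted in the notes
b0 <- 31.6
b1 <- 1.83

# Hypothetical helper: expected exam_grade for a given study_time (hours)
predict_grade <- function(study_time) {
  b0 + b1 * study_time
}

predict_grade(0)   # 31.6, the intercept
predict_grade(10)  # 49.9, expected grade after 10 hours of study
```

With a fitted lm object, predict() would do the same job; the hand-written function just makes the arithmetic visible.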
To do: Extra example: Suppose the regression equation is ŷ (Price) = 25 + 11.50 X, where X is the size of the room in m^2. Imagine your
friend found what looks like a very good deal on a room in Enschede. The room is 14 m^2 and the actual
rent is 250 euros. Do you think this is really a good deal? If so, why? By how much?
ŷ = 25 + 11.50(14) = 186 euros is the expected rent; 250 euros is the observed rent. The room costs
250 − 186 = 64 euros more than the model predicts, so it is not a good deal.
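The comparison between observed and expected rent is a residual (observed minus predicted). A minimal sketch in R, assuming the equation above with X measured in m^2 and the price in euros; expected_rent is a hypothetical helper name:

```r
# Hypothetical helper: expected rent for a room of a given size (m^2)
expected_rent <- function(size_m2) {
  25 + 11.50 * size_m2
}

observed <- 250                 # what the friend actually pays
expected <- expected_rent(14)   # 186, what the model predicts
residual <- observed - expected # 64: positive, so the room is more expensive than expected

expected
residual
```

A positive residual means the observation lies above the regression line; a good deal would show up as a negative residual.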
Testing whether the relationship is significant (data from Unit 531): We are interested in the
relationship between weight and height. Is it true that the taller you are, the more you weigh? Suppose you expect
a positive relationship.
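Linear equation: ŷ (weight) = b0 + b1 × height
Hypothesis: H0: β1 = 0 (no relationship) versus H1: β1 > 0 (positive relationship), where β1 is the population slope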
R output:
Call:
lm(formula = weight ~ height, data = .)
Residuals:
    Min      1Q  Median      3Q     Max
-36.700  -9.362  -1.771   7.162  63.421

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -68.54888    3.37147  -20.33   <2e-16 ***
height        0.84342    0.01938   43.52   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Test statistic: t = b1 / SE(b1) = 0.84342 / 0.01938 = 43.52
- Compare the p-value with alpha (0.05): if the p-value is smaller than alpha, reject H0; otherwise, do not reject H0.
- In our example the p-value (< 2e-16) is below 0.05, so we reject H0 and we can “accept” H1, which is that there is a
significant positive effect of height on weight.
- What is the slope and what does it mean in this context? For every cm taller, your weight
increases on average by 0.84 (in the units in which weight is measured).
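The test statistic and the decision rule can be reproduced from the numbers in the coefficient table. The sketch below only reuses the printed estimate, standard error and p-value bound for height; it does not refit the model, and the variable names are illustrative.

```r
# Values copied from the "height" row of the R output
b1    <- 0.84342   # estimated slope
se_b1 <- 0.01938   # standard error of the slope

# t statistic: estimate divided by its standard error
t_stat <- b1 / se_b1
round(t_stat, 2)   # 43.52, matching the "t value" column

# Decision rule at alpha = 0.05; R only reports that the p-value is below 2e-16
alpha     <- 0.05
p_upper   <- 2e-16
reject_h0 <- p_upper < alpha
reject_h0          # TRUE: reject H0
```

Note that the p-value in the Pr(>|t|) column is two-sided; with a p-value this small the conclusion is the same for the one-sided hypothesis.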
Same R output as above.
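New questions: What is the 95% confidence interval for the slope, and does it lead to the same conclusion?
By hand: b1 ± t* × SE(b1) ≈ 0.84342 ± 1.96 × 0.01938 = [0.805 ; 0.881] (taking t* ≈ 1.96, which assumes a large sample)
Using R: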
                  2.5 %      97.5 %
(Intercept) -75.1583964 -61.9393557
height        0.8054237   0.8814085
NB: You can also test the null hypothesis using your confidence interval:
- If the 95% C.I. does NOT include zero -> Reject H0
- If the 95% C.I. DOES include zero -> Don’t reject H0
- In our example, since 0 is not within the interval [0.805 ; 0.881], we can reject H0 and say that
the effect of height on weight is significant.
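The interval for the slope can be approximated by hand from the coefficient table. The sketch below assumes a critical value of 1.96, which is only appropriate for a large sample; the residual degrees of freedom are not shown in the output above, so the exact t quantile that R uses is not reproduced here. Variable names are illustrative.

```r
# Values copied from the "height" row of the R output
b1    <- 0.84342
se_b1 <- 0.01938

# Assumption: large sample, so the 97.5% t quantile is close to 1.96
t_crit <- 1.96

lower <- b1 - t_crit * se_b1
upper <- b1 + t_crit * se_b1
round(c(lower, upper), 3)   # 0.805 0.881, agreeing with R's interval to three decimals

# Decision via the interval: reject H0 if zero lies outside it
contains_zero <- lower <= 0 && upper >= 0
contains_zero               # FALSE: zero is outside the interval, so reject H0
```

The two-column table with 2.5 % and 97.5 % headers is the format produced by confint() applied to a fitted lm object, which uses the exact t quantile instead of 1.96.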