W3 - Linear Regression

Uploaded by z13612909240

W3 (Units 530-531) – Regression

Some definitions

Linear equation for the population (using Greek letters): $Y = \beta_0 + \beta_1 X_i$

In the example below: $\text{Exam grade} = \beta_0 + \beta_1 \cdot \text{Study time}_i$

Least squares equation (using sample data, you can fill in the numbers): $\hat{y} = b_0 + b_1 X_1$

b0: Intercept, the value of $\hat{y}$ where the line crosses the Y-axis (x = 0)
b1: Slope of the variable X1
X1: Independent variable (X-axis) -> Study_time
Y: Dependent variable (Y-axis) -> Exam_grade

How to read R output:

$\hat{y} = b_0 + b_1 X_1$, so -> $\hat{y}(\text{grade}) = 31.6 + 1.83 \cdot X(\text{Study time})$
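As a rough sketch of where such numbers come from, the R snippet below fits a simple regression with lm() and pulls out b0 and b1. The data frame `grades` and all of its values are invented here purely for illustration (they are not the course data), so the fitted coefficients will not equal 31.6 and 1.83.

```r
# Hypothetical toy data, invented for illustration only (not the course data)
grades <- data.frame(
  study_time = c(0, 2, 4, 6, 8, 10, 12),
  exam_grade = c(30, 37, 38, 44, 45, 51, 53)
)

# Fit the least squares line: exam_grade = b0 + b1 * study_time
model <- lm(exam_grade ~ study_time, data = grades)

b0 <- unname(coef(model)["(Intercept)"])  # intercept
b1 <- unname(coef(model)["study_time"])   # slope

# Print the fitted equation in the same form as the notes
cat("y-hat =", round(b0, 2), "+", round(b1, 2), "* study_time\n")
```

Calling summary(model) on the fitted object prints the full table of estimates, standard errors, t values and p-values.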

Typical questions:

- What is the intercept? The intercept is 31.6, which is the expected exam_grade of someone who did not study at all (x = study_time = 0).

- What is the slope? The slope is 1.83 and represents the effect of study_time on exam_grade. In this case, for every additional hour you study, your grade increases on average by 1.83 points.

- What is the expected grade of a student who studied 10 hours? -> Substitute X in the equation:

$\hat{y} = 31.6 + 1.83 X_1$ -> $\hat{y} = 31.6 + 1.83 \cdot 10 = 49.9$


- Suppose the residual for a student who studied 7 hours is -2 points. What is the real/observed grade of this student? -> Substitute X in the regression equation, then add the residual (observed = predicted + residual, so a negative residual is subtracted).

$y = 31.6 + 1.83 \cdot 7 - 2 = 42.41$
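The two calculations above can be checked with a few lines of R. This is only a minimal sketch that hard-codes the coefficients 31.6 and 1.83 from the notes; the variable names are chosen here for illustration.

```r
b0 <- 31.6   # intercept from the R output
b1 <- 1.83   # slope from the R output

# Expected grade after 10 hours of study
predicted_10 <- b0 + b1 * 10

# Observed grade = predicted grade + residual (7 hours, residual of -2)
residual_7 <- -2
observed_7 <- (b0 + b1 * 7) + residual_7

print(predicted_10)  # 49.9
print(observed_7)    # 42.41
```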
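To do: Extra example: Let's say the regression equation is $\hat{y}(\text{Price}) = 25 + 11.50 X$, where X is the size of the room in m^2. Imagine your friend found what looks like a very good deal for a room in Enschede. The room is 14 m^2 and the actual rent is 250 euros. Do you think this is really a good deal? If so, why? By how much?

$\hat{y} = 25 + 11.50 \cdot 14 = 186$ euros is the expected rent, while 250 euros is the observed rent. The residual is 250 - 186 = +64, so this is not a good deal: your friend pays 64 euros more than expected for a room of this size.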
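The same comparison can be written as a residual (observed minus expected). The sketch below simply plugs in the numbers from the example; a positive residual means the room costs more than the regression line predicts. Variable names are illustrative.

```r
b0 <- 25       # intercept of the price equation
b1 <- 11.50    # slope: euros per extra square metre
size_m2 <- 14
observed_rent <- 250

expected_rent <- b0 + b1 * size_m2          # rent predicted by the line
residual <- observed_rent - expected_rent   # positive: above the line

print(expected_rent)  # 186
print(residual)       # 64
```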

Testing whether the relationship is significant (Data from Unit 531): We are interested in the relationship between weight and height: the taller you are, the more you weigh? Suppose you expect a positive relationship.

Linear equation:

$Y = \beta_0 + \beta_1 X_i$ → $\text{Weight} = \beta_0 + \beta_1 \cdot \text{Height}_i$

Hypothesis:

- H0: $\beta_1 = 0$ (there is no significant effect)
- H1: $\beta_1 \neq 0$, or $\beta_1 > 0$ (if you expect a positive relationship)

R output:

Call:
lm(formula = weight ~ height, data = .)

Residuals:
    Min      1Q  Median      3Q     Max
-36.700  -9.362  -1.771   7.162  63.421

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -68.54888    3.37147  -20.33   <2e-16 ***
height        0.84342    0.01938   43.52   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.35 on 5120 degrees of freedom
Multiple R-squared: 0.27, Adjusted R-squared: 0.2699
F-statistic: 1894 on 1 and 5120 DF, p-value: < 2.2e-16

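Test statistic: $t = \dfrac{b_1}{SE(b_1)} = \dfrac{0.84342}{0.01938} = 43.52$ (use the unrounded values from the output; the rounded 0.843 / 0.019 gives a slightly different number)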

Conclusion about the hypothesis: Is the effect of height on weight significant?

- Compare the p-value with alpha (0.05): if p-value < alpha -> reject H0; if p-value > alpha -> do not reject H0.
- In our example the p-value is < 0.05, so we reject H0 and can "accept" H1: there is a significant positive effect of height on weight.
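The test statistic and the decision can be reproduced from the estimate and standard error in the output. The following is a sketch only: the numbers are typed in by hand rather than read from a fitted model, and the variable names are made up for illustration. pt() is the cumulative distribution function of the t distribution in base R.

```r
b1    <- 0.84342   # slope estimate from the output
se_b1 <- 0.01938   # its standard error
df    <- 5120      # residual degrees of freedom
alpha <- 0.05

t_stat <- b1 / se_b1   # about 43.52

# Two-sided p-value, for H1: beta1 != 0
p_two_sided <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

# One-sided p-value, for H1: beta1 > 0
p_one_sided <- pt(t_stat, df = df, lower.tail = FALSE)

reject_h0 <- p_two_sided < alpha
print(round(t_stat, 2))  # 43.52
print(reject_h0)         # TRUE
```

The p-value here is far too small to display meaningfully, which is why R reports it as <2e-16.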

Typical questions: (to do)

- Write the linear equation: $\hat{y}(\text{Weight}) = b_0 + b_1 x(\text{Height})$ -> $\hat{y} = -68.55 + 0.84X$

- Suppose someone who is 180 cm tall has a residual of +10. What is his real/observed weight?
    o First: substitute X in the formula: $\hat{y} = -68.55 + 0.84 \cdot 180 = 82.65$.
    o Second: add the residual (subtract it if it is negative): $y = \hat{y} + 10 = 82.65 + 10 = 92.65$.

- What is the slope and what does it mean in this context? The slope is 0.84: for every additional cm of height, weight increases on average by 0.84 (in the units of weight).

Same R output:

Call:
lm(formula = weight ~ height, data = .)

Residuals:
    Min      1Q  Median      3Q     Max
-36.700  -9.362  -1.771   7.162  63.421

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -68.54888    3.37147  -20.33   <2e-16 ***
height        0.84342    0.01938   43.52   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 13.35 on 5120 degrees of freedom
Multiple R-squared: 0.27, Adjusted R-squared: 0.2699
F-statistic: 1894 on 1 and 5120 DF, p-value: < 2.2e-16

New questions:

95% Confidence interval for the slope

By hand:

- C.I. = Estimate ± Margin of error
- C.I. = $b_1 \pm t^{*} \cdot SE(b_1)$
    o We know b1 and SE(b1)
    o We can assume $t^{*} \approx 1.96$ since our sample size is large enough (n = 5122)
    o More precisely, you can find it with R:
    o critical_t <- qt(p = 0.05/2, df = 5120, lower.tail = FALSE), which gives 1.9604
- C.I. = 0.84342 ± 1.9604 × 0.01938 = [0.805; 0.881] (use the unrounded estimate and standard error to reproduce these bounds)
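The by-hand steps above can be put together in one short R sketch. The estimate and standard error are copied from the output by hand, and the variable names other than critical_t are illustrative.

```r
b1    <- 0.84342   # slope estimate from the output
se_b1 <- 0.01938   # its standard error
df    <- 5120      # residual degrees of freedom (n - 2)

# Critical value for a 95% interval: upper 2.5% point of the t distribution
critical_t <- qt(p = 0.05 / 2, df = df, lower.tail = FALSE)  # about 1.9604

margin <- critical_t * se_b1
ci <- c(lower = b1 - margin, upper = b1 + margin)

print(round(ci, 3))  # roughly 0.805 and 0.881
```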

Using R:

confint(L_model1, level = 0.95)

                  2.5 %      97.5 %
(Intercept) -75.1583964 -61.9393557
height        0.8054237   0.8814085

NB: You can also test the null hypothesis using your confidence interval:
- If the 95% C.I. does NOT include zero -> Reject H0
- If the 95% C.I. DOES include zero -> Don't reject H0
- In our example, since 0 is not within the interval [0.805; 0.881], we can reject H0 and say that the effect of height on weight is significant.
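The rule in this note amounts to checking whether zero lies between the two bounds. A minimal sketch, with the slope bounds typed in from the confint() output and illustrative variable names:

```r
# Interval reported by confint() for the slope of height
ci_slope <- c(lower = 0.8054237, upper = 0.8814085)

# H0: beta1 = 0 is rejected at the 5% level when 0 lies outside the 95% interval
contains_zero <- ci_slope[["lower"]] <= 0 && ci_slope[["upper"]] >= 0
reject_h0 <- !contains_zero

print(reject_h0)  # TRUE
```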
