07 Linear Regression Jan30
Chen)
Lecture 7
January 30, 2024
Linear Regression
THESE SLIDES ARE PROVIDED AS A COURTESY AND STUDY AID FOR YOUR PERSONAL USE.
DO NOT REPOST OR REDISTRIBUTE ANY PART OF THESE SLIDES WITHOUT YOUR INSTRUCTOR’S PERMISSION.
Today’s Topics
• Interpreting and using r values
• Calculating the regression line
The equation for calculating Pearson r using z-scores is
OR…
Don’t forget the square root!
Pagano, p. 133
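The formulas on this slide did not survive as text. As a reference sketch only, the standard z-score form and raw-score (computational) form of Pearson r are written out below. The notation is an assumption rather than a copy of the slide, and the z-score form assumes the standard deviations were computed with N − 1. Note that the square root in the second form covers the entire denominator.

```latex
r = \frac{\sum z_X z_Y}{N - 1}
\qquad\text{or}\qquad
r = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}
         {\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]
                \left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}
```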
In general, for relationships found in the behavioural sciences:
If r is…               Interpretation
Equal to 0             No relationship
Between 0 and .10      Trivial
Between .10 and .30    Small to medium
Between .30 and .50    Medium to large
Leading zeros (before the decimal point) are not used in APA style when reporting Pearson r values.
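A minimal Python sketch of the two ideas on this slide: dropping the leading zero when reporting r, and looking up the interpretation. The function names are hypothetical, and two choices are assumptions not stated on the slide: the table is applied to the absolute value of r, and each boundary value is placed in the higher band.

```python
def format_r_apa(r, decimals=2):
    """Format r without a leading zero, e.g. 0.73 -> '.73'."""
    rounded = round(r, decimals)
    sign = "-" if rounded < 0 else ""
    text = f"{abs(rounded):.{decimals}f}"
    if text.startswith("0"):
        text = text[1:]  # ".73" instead of "0.73"
    return sign + text


def interpret_r(r):
    """Look up the interpretation of r using only the rows shown in the table."""
    size = abs(r)  # assumption: the table describes the magnitude of r
    if size == 0:
        return "No relationship"
    if size < 0.10:
        return "Trivial"
    if size < 0.30:
        return "Small to medium"
    if size < 0.50:
        return "Medium to large"
    return "Not covered by the rows shown above"


print(format_r_apa(0.73), interpret_r(0.73))
print(format_r_apa(-0.25), interpret_r(-0.25))
```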
A strong positive relationship (r = .73) exists
between the variables X and Y. This relationship
could exist because:
A. X causes Y
B. Y causes X
C. A third variable causes both X and Y
D. Any of the above
E. None of the above
https://www.statology.org/correlation-does-not-imply-causation-examples
https://dayoftheshirt.com/shirts/157048/correlation-does-not-imply-causation-snorgtees
https://xkcd.com/552/
https://thequalityadvisor.blogspot.com/2016/03/correlation-does-not-equal-causation.html
https://www.cnn.com/2014/09/04/health/no-sleep-brain-size/index.html
https://www.cbsnews.com/news/sugar-rush-to-prison-study-says-lots-of-candy-could-lead-to-violence/
Correlation ≠ Causation, but…
Sometimes we care more about prediction than
causation.
For example, if you want to know how hot it is in
Vancouver, it’s more useful to know…
Pagano, p. 124
Y = bX + a
a = Y intercept
b = slope
Find a and b.
A. 500; 1000
B. 0.40; 500
C. 0; 500
D. 500; 0.40
E. None of the above
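As a small illustration of reading a and b off a perfectly linear relationship, the sketch below recovers the slope and Y intercept from two points on a line. The points are hypothetical and are not taken from the figure in Pagano; `line_through` is an illustrative helper name.

```python
def line_through(p1, p2):
    """Return (a, b) for the line Y = bX + a passing through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)  # slope: change in Y per unit change in X
    a = y1 - b * x1            # Y intercept: value of Y when X = 0
    return a, b


# Hypothetical points (X, Y) that lie on the same straight line.
a, b = line_through((1, 5), (3, 9))
print(f"a = {a}, b = {b}")
```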
Pagano, p. 124
• Most variables in psychology will not be
perfectly correlated.
• However, as long as the relationship is linear, a
“line of best fit” can still be calculated.
• This line can help us make predictions about Yi
given Xi.
Pagano, p. 131
Which regression line would give the smallest
errors when predicting Yi given Xi?
A B C
or in technical language…
Most of the variability in the number of fingers
people have can be accounted for by…
A. Genetics
B. Environmental factors
C. I know you’re trying to trick me, but I’m not sure
how.
r² = proportion of the variability of Y accounted for by X
r      Proportion of variability explained
.10    .01
.20    .04
.30    .09
Pagano, p.140
(Same information as previous slide but expressed in percentages)
In a sample of students, height and weight are correlated with r = .65.
What percentage of the variability in weight is accounted for by height in this sample?
A. .65
B. .42
C. 65.00
D. 42.25
Pagano, p.140
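The arithmetic behind r² can be sketched in a few lines of Python: square r to get the proportion of variability accounted for, then multiply by 100 for a percentage. The function name is hypothetical; the r values are the ones that appear in the slides.

```python
def percent_variability_explained(r):
    """Percentage of the variability in Y accounted for by X, i.e. r squared times 100."""
    return (r ** 2) * 100


for r in (0.10, 0.20, 0.30, 0.65):
    proportion = r ** 2
    percent = percent_variability_explained(r)
    print(f"r = {r:.2f}  r^2 = {proportion:.4f}  ({percent:.2f}%)")
```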
Pearson r is used for describing linear relationships
when X and Y are both measured on interval or ratio scales.
If the relationship is curvilinear, the correlation
coefficient eta (η) can be used to describe the
strength of the relationship.
Other linear correlation coefficients:
– If one or both variables are measured on an
ordinal scale, the Spearman rank order correlation
coefficient rho (r_s) can be used
– If one of the variables is interval or ratio and the
other is dichotomous, the biserial correlation
coefficient (r_b) can be used
– If both variables are dichotomous, the phi
coefficient (Φ) can be used
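One way to see how the Spearman coefficient relates to Pearson r: it is Pearson r computed on the ranks of the scores rather than on the scores themselves. The sketch below uses hypothetical data and illustrative helper names, and gives tied scores the average of their ranks.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from deviation scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank scores from 1 (smallest); tied scores share the average of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        positions = [i + 1 for i, o in enumerate(ordered) if o == v]
        rank_of[v] = sum(positions) / len(positions)
    return [rank_of[v] for v in values]


def spearman_rho(xs, ys):
    """Spearman rho: Pearson r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


# Hypothetical data: Y always increases with X, but not along a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(f"Pearson r    = {pearson_r(xs, ys):.4f}")
print(f"Spearman rho = {spearman_rho(xs, ys):.4f}")
```

Because the ranks of X and Y agree perfectly here, rho reaches its maximum even though Pearson r on the raw scores does not.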
Today’s Topics
• Interpreting and using r values
• Calculating the regression line
Let’s say we want to use IQ to predict GPA.
How do we calculate the line of best fit?
Pagano, p. 161
Finding the line of best fit
Pagano, p. 162
We want to find a line that minimizes the total error (deviation of each
of the points from the line).
Same logic as for calculating variance – square the deviations first so
that the positive and negative values don’t just cancel each other out.
Least-squares regression line
The least-squares regression line is the
prediction line that minimizes the total error of
prediction, according to the least-squares
criterion of minimizing ∑(Y − Y′)²
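To make the least-squares criterion concrete, the sketch below computes ∑(Y − Y′)² for three candidate lines and picks the one with the smallest total. The data and the candidate lines A, B, and C are made up for illustration; they are not the lines pictured in the slides.

```python
def sum_squared_errors(xs, ys, b, a):
    """Total squared prediction error, the sum of (Y - Y')^2, for the line Y' = bX + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data and candidate lines given as (slope b, intercept a).
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
candidates = {"A": (1.0, 1.0), "B": (1.9, 0.0), "C": (3.0, -2.0)}

errors = {name: sum_squared_errors(xs, ys, b, a) for name, (b, a) in candidates.items()}
best = min(errors, key=errors.get)

for name, sse in errors.items():
    print(f"Line {name}: sum of squared errors = {sse:.2f}")
print("Smallest total squared error: line", best)
```

Squaring matters here: summing the raw deviations would let positive and negative errors cancel, so a poorly fitting line could appear to have almost no error.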
Least-squares regression line
Like any line, the regression line can be defined
with an equation in the general form
Y = bX + a
Let’s first calculate b_Y (the slope)
Pagano, p. 163
Least-squares regression line
Calculate b_Y with one of these two formulas; one of them is
for use when r, s_Y, and s_X have already been calculated.
Pagano, p. 163, p.173
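The two formulas themselves did not survive as text. As a hedged reference, the standard raw-score form of the slope and the form based on r and the two standard deviations are shown below; the notation is an assumption and may differ from the slide.

```latex
b_Y = \frac{\sum XY - \dfrac{(\sum X)(\sum Y)}{N}}
           {\sum X^2 - \dfrac{(\sum X)^2}{N}}
\qquad\text{or}\qquad
b_Y = r\,\frac{s_Y}{s_X}
```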
To calculate b_Y from raw data:
Use the same strategy that we used for calculating Pearson r
Step 1: Calculate X², Y², and XY, as needed, for all raw values
Step 2: Calculate the sum (Σ) for all columns
Step 3: You’re ready to use your formula!
Pagano, p. 134
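The three steps can be sketched in Python as follows. This assumes the standard raw-score slope formula, b_Y = [∑XY − (∑X)(∑Y)/N] / [∑X² − (∑X)²/N], which needs the X² and XY columns but not Y². The data and the function name are hypothetical.

```python
def slope_from_raw(xs, ys):
    """Least-squares slope b_Y computed from raw scores."""
    n = len(xs)

    # Step 1: build the extra columns needed by the formula.
    x_squared = [x * x for x in xs]
    xy = [x * y for x, y in zip(xs, ys)]

    # Step 2: sum every column.
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2, sum_xy = sum(x_squared), sum(xy)

    # Step 3: plug the sums into the formula.
    numerator = sum_xy - (sum_x * sum_y) / n
    denominator = sum_x2 - (sum_x ** 2) / n
    return numerator / denominator


# Hypothetical raw scores.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
print(f"b_Y = {slope_from_raw(xs, ys):.4f}")
```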
Least-squares regression line
a_Y = Ȳ − b_Y X̄, where
X̄ = ∑X / N
Ȳ = ∑Y / N
b_Y = already calculated!
Least-squares regression line
For your assignments and exams, report regression constants
(a_Y and b_Y) to 4 decimal places.
Pagano, p. 164
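A short sketch of the last step, assuming the usual intercept formula a_Y = Ȳ − b_Y X̄: once the slope is known, the intercept follows from the two means, and the finished line can be used for prediction. The data are hypothetical, the slope value 0.8 is the least-squares slope for those data, and the function names are illustrative. Constants are printed to 4 decimal places.

```python
def intercept(xs, ys, b_y):
    """Y intercept a_Y = mean(Y) - b_Y * mean(X)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return mean_y - b_y * mean_x


def predict(x, b_y, a_y):
    """Predicted Y for a given X using Y' = b_Y * X + a_Y."""
    return b_y * x + a_y


# Hypothetical raw scores; 0.8 is the least-squares slope for these data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]
b_y = 0.8
a_y = intercept(xs, ys, b_y)

print(f"b_Y = {b_y:.4f}, a_Y = {a_y:.4f}")
print(f"Predicted Y for X = 6: {predict(6, b_y, a_y):.4f}")
```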
Recommended Homework
Problems at end of textbook Ch. 7 (p. 179-182):
– 1-9, 13
– For extra practice: 10-11, 14, 16-18