07 Linear Regression Jan30

The document discusses linear regression and correlation. It provides information on interpreting Pearson's r values, calculating the regression line and slope, and how r values indicate the proportion of variability in the outcome (Y) that is accounted for by the predictor (X). Pearson's r is best used for describing linear relationships between two interval/ratio variables, while eta (η) can describe strength of curvilinear relationships.

Uploaded by Vanessa Wong

PSYC 218 006 (Dr. Chen)
Lecture 7
January 30, 2024
Linear Regression

THESE SLIDES ARE PROVIDED AS A COURTESY AND STUDY AID FOR YOUR PERSONAL USE.
DO NOT REPOST OR REDISTRIBUTE ANY PART OF THESE SLIDES WITHOUT YOUR INSTRUCTOR’S PERMISSION.
Today’s Topics
• Interpreting and using r values
• Calculating the regression line

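The equation for calculating Pearson r using z-scores is

$$r = \frac{\sum z_X z_Y}{N - 1}$$

OR… (computational formula using raw scores)

$$r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$$

Don't forget the square root!

Pagano, p. 133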
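A minimal Python sketch that computes r on small hypothetical data (the lists `x` and `y` are made up) with both the z-score formula r = Σz_Xz_Y / (N − 1) and the raw-score formula, to show they give the same value. The function names are illustrative. The z-scores use the sample standard deviation, which is what pairs with the N − 1 denominator.

```python
import math
import statistics

def pearson_r_z(x, y):
    """r = sum(z_X * z_Y) / (N - 1), with z-scores based on the sample SD."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    z_products = (((xi - mean_x) / s_x) * ((yi - mean_y) / s_y) for xi, yi in zip(x, y))
    return sum(z_products) / (n - 1)

def pearson_r_raw(x, y):
    """r from raw scores, using the sums of X, Y, X^2, Y^2 and XY."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    numerator = sum_xy - sum_x * sum_y / n
    # Don't forget the square root in the denominator!
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

x = [2, 4, 5, 7, 9]    # hypothetical X scores
y = [3, 6, 5, 9, 10]   # hypothetical Y scores
print(round(pearson_r_z(x, y), 4), round(pearson_r_raw(x, y), 4))   # the two values match
```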
In general, for relationships found in the behavioural sciences:

If r (ignoring its sign) is…   Interpretation
Equal to 0                     No relationship
Between 0 and .10              Trivial
Between .10 and .30            Small to medium
Between .30 and .50            Medium to large
Greater than .50               Large to very large

Leading zeros (before the decimal point) are not used in APA style when reporting Pearson r values.
A strong positive relationship (r = .73) exists
between the variables X and Y. This relationship
could exist because:

A. X causes Y
B. Y causes X
C. A third variable causes both X and Y
D. Any of the above
E. None of the above

Answer: D. Any of the above
https://www.statology.org/correlation-does-not-imply-causation-examples

https://dayoftheshirt.com/shirts/157048/correlation-does-not-imply-causation-snorgtees
https://xkcd.com/552/

https://thequalityadvisor.blogspot.com/2016/03/correlation-does-not-equal-causation.html

https://www.cnn.com/2014/09/04/health/no-sleep-brain-size/index.html
https://www.cbsnews.com/news/sugar-rush-to-prison-study-says-lots-of-candy-could-lead-to-violence/
https://www.cnn.com/2014/09/04/health/no-sleep-brain-size/index.html
Correlation ≠ Causation, but…
Sometimes we care more about prediction than
causation.
For example, if you want to know how hot it is in Vancouver, it's more useful to know whether it's July or December (calendar month does not cause heat, but is highly correlated with temperature) than whether it's sunny or cloudy right now (sunshine does cause heat, but is only moderately correlated with temperature).
Correlation and Prediction
Let’s start with an example of two perfectly correlated
variables. How would you predict Yi given Xi?

Pagano, p. 124
Y = bX + a

a = Y intercept
b = slope

Find a and b.

A. 500; 1000
B. 0.40; 500
C. 0; 500
D. 500; 0.40
E. None of the above

Answer: D. a = 500; b = 0.40 (Pagano, p. 124)

For calculating the slope, $b = \dfrac{\text{Rise}}{\text{Run}}$:
"Rise" = 900 − 500 = 400
"Run" = 1000 − 0 = 1000
$b = 400 / 1000 = 0.40$
Pagano, p. 124
Y = 0.4000X + 500.0000

You can use this formula to predict Y from any given value of X.

Regression constants ($a_Y$ and $b_Y$) should be reported to 4 decimal places. This helps retain accuracy in your final answer when using the equation to predict Y.
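The two-point slope calculation and the prediction step can be sketched in Python, assuming the two points read off the graph are (0, 500) and (1000, 900), as the rise and run imply. The helper names `line_through` and `predict`, and the example X value of 250, are illustrative only.

```python
def line_through(p1, p2):
    """Slope b (rise / run) and Y intercept a of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    b = (y2 - y1) / (x2 - x1)   # rise / run
    a = y1 - b * x1             # solve Y = bX + a for a
    return b, a

def predict(x, b, a):
    """Predicted Y for a given X on the line Y = bX + a."""
    return b * x + a

b, a = line_through((0, 500), (1000, 900))
print(f"Y = {b:.4f}X + {a:.4f}")   # Y = 0.4000X + 500.0000
print(predict(250, b, a))          # predicted Y for a hypothetical X of 250
```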
Pagano, p. 124
• Most variables in psychology will not be
perfectly correlated.
• However, as long as the relationship is linear, a
“line of best fit” can still be calculated.
• This line can help us make predictions about Yi
given Xi.

Pagano, p. 131
Which regression line would give the smallest
errors when predicting Yi given Xi?

A B C

Note that the relationship whose regression line gives the smallest errors also has the highest r value.
Pearson r tells us how helpful the regression line will be in predicting Yi given Xi.
• When r = 1 (or −1), the regression line will produce perfect predictions (no errors).
• When r = 0, the regression line will not help at all in predicting Yi.
• For r values between 0 and 1 (ignoring the sign), the regression line will produce moderate errors, which shrink as r moves closer to 1.
Pearson r also tells us the extent to which differences in Y can be explained (mathematically) by differences in X.

Or, in technical language: Pearson r also tells us something about how much of the variability in Y is accounted for by (the variability in) X.
Example: A large cheese pizza costs $20. Each additional topping costs $2.

Prices with 0, 1, 2, and 3 toppings: $20, $22, $24, $26

The number of pizza toppings accounts for 100% of the differences (variability) in pizza prices… but NOT 100% of the total price.
Pagano, p. 124
Merchandise sold ($) accounts for 100% of the differences (variability) in salary, but not 100% of the total salary.
Most of the variability in the number of fingers
people have can be accounted for by…
A. Genetics
B. Environmental factors
C. I know you’re trying to trick me, but I’m not sure
how.

Answer: B. Environmental factors

Although genes determine the average (modal) number of fingers that humans have, environmental factors (injuries, accidents) account for most of the variability.
“Proportion of variability accounted for” is a
statement about a correlation.
– It is not necessarily a statement about a causal
relationship.
– It is a statement about variability (differences
between values in a dataset), not average values.

r² = proportion of the variability of Y accounted for by X

See p. 137-139 of your textbook for a derivation. (But, you won't be tested on how this formula is derived.)

r       Proportion of variability explained (r²)
.10     .01
.20     .04
.30     .09
.40     .16
.50     .25
.60     .36
.70     .49
.80     .64
.90     .81
1.00    1.00
Pagano, p.140
(Same information as previous slide
but expressed in percentages)

In a sample of students, height and weight are correlated with r = .65. What percentage of the variability in weight is accounted for by height in this sample?

A. .65
B. .42
C. 65.00
D. 42.25
Answer: D. 42.25 (r² = .65² = .4225, so height accounts for 42.25% of the variability in weight)
Pearson r is used for describing linear
relationships, when X and Y are both measured
on interval or ratio scales

If the relationship is curvilinear, the correlation
coefficient eta (η) can be used to describe the
strength of the relationship

Other linear correlation coefficients:
– If one or both variables are measured on an ordinal scale, the Spearman rank order correlation coefficient rho ($r_s$) can be used
– If one of the variables is interval or ratio and the other is dichotomous, the biserial correlation coefficient ($r_b$) can be used
– If both variables are dichotomous, the phi coefficient (Φ) can be used

SPSS calculates the correlation coefficients $r_b$ or Φ automatically using modified versions of the formula for Pearson r
You do NOT need to know how to calculate the Spearman rank order correlation coefficient (p. 141), or the other types of correlation coefficients, by hand.

You should be able to recognize these other correlation coefficients, understand when they're used, generate them using SPSS (for your next assignment), and report them properly.
Today’s Topics
• Interpreting and using r values
• Calculating the regression line

Let's say we want to use IQ to predict GPA.

The line of best fit is the one that minimizes the overall prediction error for GPA.

How do we calculate the line of best fit?
Pagano, p. 161
Finding the line of best fit

Pagano, p. 162
We want to find a line that minimizes the total error (deviation of each of the points from the line).
Same logic as for calculating variance – square the deviations first so that the positive and negative values don't just cancel each other out.
Least-squares regression line
The least-squares regression line is the prediction line that minimizes the total error of prediction, according to the least-squares criterion of $\sum (Y - Y')^2$.

For any linear relationship, there is only one line that will minimize $\sum (Y - Y')^2$.
Least-squares regression line
Like any line, the regression line can be defined with an equation in the general form
Y = bX + a

Let's first calculate $b_Y$.
Pagano, p. 163
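Least-squares regression line
Calculate $b_Y$ with one of these two formulas:

$$b_Y = r\,\frac{s_Y}{s_X}$$

Use when r, $s_Y$, and $s_X$ have already been calculated.

OR

$$b_Y = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sum X^2 - \frac{(\sum X)^2}{N}}$$

Use with raw data.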
Pagano, p. 163, p.173
To calculate $b_Y$ from raw data:
Use the same strategy that we used for calculating Pearson r

Step 1: Calculate X², Y², and XY, as needed, for all raw values
Step 2: Calculate the sum (Σ) for all columns
Step 3: You're ready to use your formula!
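The three steps map directly onto code. This is a minimal sketch with hypothetical data and illustrative variable names. Y² is included because the same table feeds Pearson r, although $b_Y$ itself only needs the sums of X, Y, X², and XY.

```python
x = [2, 4, 5, 7, 9]    # hypothetical X scores
y = [3, 6, 5, 9, 10]   # hypothetical Y scores
n = len(x)

# Step 1: compute X^2, Y^2 and XY for every pair of raw scores
rows = [(xi, yi, xi ** 2, yi ** 2, xi * yi) for xi, yi in zip(x, y)]

# Step 2: sum each column
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(col) for col in zip(*rows))

# Step 3: plug the sums into the raw-score formula for b_Y
ss_x = sum_x2 - sum_x ** 2 / n
b_y = (sum_xy - sum_x * sum_y / n) / ss_x
print(f"b_Y = {b_y:.4f}")   # b_Y = 1.0205
```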
Pagano, p. 134
Least-squares regression line

After calculating $b_Y$, we can calculate $a_Y$:

$$a_Y = \bar{Y} - b_Y\bar{X}$$
Least-squares regression line

$\bar{X} = \dfrac{\sum X}{N}$

$\bar{Y} = \dfrac{\sum Y}{N}$

Already calculated!

*Practice on your own: textbook p.164-169*


Least-squares regression line

Now plug $b_Y$ and $a_Y$ back into your formula: $Y' = b_Y X + a_Y$
Least-squares regression line
For your assignments and exams, report regression constants ($a_Y$ and $b_Y$) to 4 decimal places.

…and you have your regression line!

Pagano, p. 164
Recommended Homework
Problems at end of textbook Ch. 7 (p. 179-182):
– 1-9, 13
– For extra practice: 10-11, 14, 16-18
