
PSYC2012 Module 12 Correlation and Regression

Lecture notes

Uploaded by

thea.eveml

Correlation evaluates the linear association between two “continuous” variables.
- Population denotation:
o $\rho_{XY}$
- Sample-based estimate of $\rho_{XY}$:
o $r_{XY}$
- We are looking at the Pearson product-moment correlation coefficient.

- The correlation coefficient captures the straight-line relation between two variables.

Example:
- 12 participants were measured on spirituality and longevity.
o The Pearson product-moment correlation between spirituality and longevity was .59.
HOW TO CALCULATE:
- Sum of products:
o First step: how far, and in what direction, does each score deviate from the mean on each variable?
 $X - \bar{X}$
 $Y - \bar{Y}$
o Population denotation:
 $X - \mu_X$
 $Y - \mu_Y$
- A positive correlation suggests that individuals who score high on X (above the X mean, $X > \bar{X}$) also score high on Y (above the Y mean, $Y > \bar{Y}$).
o So if $(X - \bar{X}) > 0$, we would expect $(Y - \bar{Y}) > 0$.
 Likewise, those below $\bar{X}$ should also be below $\bar{Y}$.
o So for a positive correlation, the product of these deviation scores should be positive:
 $(X - \bar{X})(Y - \bar{Y}) > 0$
- Sum of products:
o $SP_{XY} = \sum (X - \bar{X})(Y - \bar{Y})$
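A minimal Python sketch of the sum of products, computed directly from the deviation scores. The five pairs of scores are hypothetical (not the spirituality/longevity data), and `sum_of_products` is an illustrative helper name.

```python
def sum_of_products(xs, ys):
    """SP_XY: sum over people of (X - mean of X) * (Y - mean of Y)."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same number of scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Multiply each person's paired deviation scores, then add them up
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

# Hypothetical scores for five people
x = [2, 4, 6, 8, 10]
y = [1, 3, 2, 5, 4]
sp = sum_of_products(x, y)
print(sp)  # 16.0
```

Most products here are positive (high X paired with high Y), so SP is positive.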

Sum of products increases with N:
- We don't want this for the correlation coefficient. We want to factor out sample size, so that a data set with N = 12 shows the same SIZE of association as the same data set with every point duplicated (N = 24).
o Factor this in: COVARIANCE.
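To see why SP grows with N, this sketch duplicates every point in a small hypothetical data set. The means do not change, so every deviation product appears twice and SP doubles, even though the pattern of association is identical.

```python
def sum_of_products(xs, ys):
    """SP_XY = sum of (X - mean X) * (Y - mean Y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

x = [2, 4, 6, 8, 10]  # hypothetical scores
y = [1, 3, 2, 5, 4]

sp_original = sum_of_products(x, y)            # N = 5
sp_duplicated = sum_of_products(x * 2, y * 2)  # N = 10: list * 2 repeats every point
print(sp_original, sp_duplicated)              # 16.0 32.0
```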

Covariance:
- $\widehat{cov}_{XY} = \dfrac{SP_{XY}}{N - 1}$
- This is the sample-based estimate of the population covariance, $cov_{XY} = \dfrac{SP_{XY}}{N}$.
o As sample size increases, $SP_{XY}$ increases, but the population covariance remains the same.
o As sample size increases, the (sample) estimate of covariance gets closer to the population covariance.
- PROBLEM:
o Covariance changes with the scale of either variable (because $SP_{XY}$ changes when the scale changes).
 E.g., changing the scale from milliseconds to seconds shouldn't change the association between that variable (time) and the other.
 But covariance does change, which is an issue.
o Solution:
 Standardise the covariance. When we do this, we get the correlation coefficient.
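A sketch of the scale problem, using hypothetical reaction times (in ms) and error counts. Converting ms to seconds divides every X deviation by 1000. As a result, $SP_{XY}$ and the covariance ($SP_{XY}/(N-1)$) shrink by the same factor, even though the relation between the variables has not changed.

```python
def covariance(xs, ys):
    """Sample-based estimate of the covariance: SP_XY / (N - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sp / (n - 1)

# Hypothetical data: reaction times in milliseconds and number of errors
rt_ms = [450, 520, 610, 480, 700]
errors = [2, 3, 5, 2, 6]
rt_s = [t / 1000 for t in rt_ms]  # the same times expressed in seconds

cov_ms = covariance(rt_ms, errors)
cov_s = covariance(rt_s, errors)
print(cov_ms, cov_s)  # the second value is 1000 times smaller
```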

Correlation coefficient:
- Divide the covariance by the product of the (estimated) standard deviations of X and Y.
o Population:
 $\rho_{XY} = \dfrac{cov_{XY}}{\sigma_X \sigma_Y}$
o The sample-based estimate is:
 $r_{XY} = \dfrac{\widehat{cov}_{XY}}{\hat{\sigma}_X \hat{\sigma}_Y}$
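- HOW TO CALCULATE:
o $r_{XY} = \dfrac{\widehat{cov}_{XY}}{\hat{\sigma}_X \hat{\sigma}_Y}$
 To find $\widehat{cov}_{XY} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$
 To find $\hat{\sigma}_X = \sqrt{\dfrac{\sum (X - \bar{X})^2}{N - 1}}$
 To find $\hat{\sigma}_Y = \sqrt{\dfrac{\sum (Y - \bar{Y})^2}{N - 1}}$
o Substituting these in (the $N - 1$ terms cancel):
 $r_{XY} = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}} = \dfrac{SP_{XY}}{\sqrt{SS_X SS_Y}}$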
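A sketch of $r_{XY} = SP_{XY} / \sqrt{SS_X SS_Y}$ in Python, reusing hypothetical reaction-time data. Unlike the covariance, r is unchanged when the times are converted from ms to seconds. The `ValueError` reflects the requirement that both variables vary: if $SS_X$ or $SS_Y$ is 0, r is undefined. `pearson_r` is an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """r_XY = SP_XY / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        # No variability on one variable: division by zero, so r is undefined
        raise ValueError("SS_X or SS_Y is 0, so r is undefined")
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data: reaction times (ms) and number of errors
rt_ms = [450, 520, 610, 480, 700]
errors = [2, 3, 5, 2, 6]
rt_s = [t / 1000 for t in rt_ms]

r_ms = pearson_r(rt_ms, errors)
r_s = pearson_r(rt_s, errors)
print(round(r_ms, 4), round(r_s, 4))  # identical: r ignores the change of scale
```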
Properties of a correlation coefficient:
- Ranges from -1 to +1.
- Assumes a linear relation between X and Y.
- There must be some variability on both X and Y. If everyone has the same score on a variable, $SS_X$ (or $SS_Y$) will be 0 and r cannot be calculated.
o Range restriction: if variance is restricted (by restricting the range of possible values on one variable), its correlation with another variable is likely to be reduced.
 E.g., an easy test where everyone gets 100%.
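Range restriction on one hypothetical data set: Y tracks X closely across the full range of X. When only people scoring 7 or above on X are kept, r drops noticeably. This shows what usually happens when range is restricted. It is a single constructed example, not a guarantee for every data set.

```python
import math

def pearson_r(xs, ys):
    """r_XY = SP_XY / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores across the full range of X (1 to 10)
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2, 1, 4, 3, 6, 5, 8, 7, 10, 9]
r_full = pearson_r(x, y)

# Keep only people scoring 7 or above on X (restricted range)
restricted = [(xi, yi) for xi, yi in zip(x, y) if xi >= 7]
x_r = [xi for xi, _ in restricted]
y_r = [yi for _, yi in restricted]
r_restricted = pearson_r(x_r, y_r)

print(round(r_full, 3), round(r_restricted, 3))  # 0.939 0.6
```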

Making inferences about $\rho_{XY}$ from $r_{XY}$

- Correlation between spirituality and longevity:
o $r_{XY} = 0.5938$
o $\rho_{XY} = ?$
- $H_0: \rho_{XY} = 0$
- $H_a: \rho_{XY} \neq 0$
o We can attach a p-value to a correlation (given its sample size). Using software, we would find in this case that p = .042.
 Conclude: those who live longer tend to have significantly higher levels of spirituality (r = .59, N = 12), p = .042.
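The notes do not say which test the software runs. A common choice, assumed here, is the t-test of $H_0: \rho_{XY} = 0$, with $t = r\sqrt{N-2}/\sqrt{1-r^2}$ on $N - 2$ degrees of freedom. To stay dependency-free, this sketch gets p by numerically integrating the t density (Simpson's rule) rather than calling a statistics library. With r = .5938 and N = 12 it gives t(10) ≈ 2.33 and p ≈ .042, consistent with the value above.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=10_000):
    """P(|T| > |t|), via Simpson's rule on the area from 0 to |t| (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1
        total += weight * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 1 - 2 * area   # density is symmetric, so this covers both tails

def correlation_t_test(r, n):
    """Test H0: rho = 0 using t = r * sqrt(N - 2) / sqrt(1 - r^2), df = N - 2."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df, two_tailed_p(t, df)

t, df, p = correlation_t_test(0.5938, 12)
print(f"t({df}) = {t:.2f}, p = {p:.3f}")
```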

- The sampling distribution of $r_{XY}$ becomes increasingly skewed as $\rho_{XY}$ approaches the extremes (due to the boundaries at -1 and +1).
o How do we deal with this skew?

Fisher z transformation:
- Use a non-linear transformation to make the distribution of $r_{XY}$ (approximately) normal.
- After transforming both r and ρ, we calculate a z statistic:
o $z = \dfrac{z'_r - z'_\rho}{\sqrt{\frac{1}{N - 3}}}$
- Example:
o Our null hypothesis value is $\rho_{XY} = 0$.
o Our sample correlation was $r_{XY} = 0.59$.
 We need to find transformed values for both of these.
o $z = \dfrac{z'_r - z'_\rho}{\sqrt{\frac{1}{N - 3}}}$
o $z'_r$ is the transformed value for the observed r:
 $r_{XY} = .59 \rightarrow z'_r = 0.678$
o $z'_\rho$ is the transformed value for the hypothesised ρ:
 $\rho_{XY} = 0 \rightarrow z'_\rho = 0$
o $\sqrt{\frac{1}{N - 3}}$ is the standard error of $z'_r$ (N = 12).

o Then it is just a z-test:
 $z = \dfrac{z'_r - z'_\rho}{\sqrt{\frac{1}{N - 3}}} = \dfrac{0.678 - 0}{\sqrt{\frac{1}{12 - 3}}} = \dfrac{0.678 - 0}{0.3333} = 2.03$
o Go to z tables and find the associated p-value.
o This is an observed z value. How do we evaluate it?
 Use z tables: α = .05, two-tailed, gives a critical z of ±1.96.
- The observed z is more extreme than the critical z, so reject $H_0$.
o Conclusion:
 Those who live longer tend to have significantly higher levels of spirituality (r = .59, N = 12), z = 2.03, p < .05.
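The same test as a short Python sketch. It assumes the standard form of Fisher's transformation, $z' = \tfrac{1}{2}\ln\frac{1+r}{1-r}$, which equals `math.atanh(r)`; the notes use the transformation without stating its formula. Instead of a z table, the two-tailed p comes from the standard normal via `math.erfc`.

```python
import math

def fisher_z(r):
    """Fisher's z' = 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    return math.atanh(r)

def fisher_z_test(r, n, rho0=0.0):
    """z = (z'_r - z'_rho) / sqrt(1 / (N - 3)), with a two-tailed normal p-value."""
    se = math.sqrt(1 / (n - 3))                # standard error of z'_r
    z = (fisher_z(r) - fisher_z(rho0)) / se
    p = math.erfc(abs(z) / math.sqrt(2))       # equals 2 * (1 - Phi(|z|))
    return z, p

z, p = fisher_z_test(0.59, 12)
print(f"z'_r = {fisher_z(0.59):.3f}, z = {z:.2f}, p = {p:.3f}")
```

Because |z| ≈ 2.03 exceeds 1.96, p falls just under .05, matching the table-based decision.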
Confidence intervals:
- $(1 - \alpha) \times 100\%\ \text{C.I.} = z'_r \pm z_c \sqrt{\frac{1}{N - 3}}$
o NOTE: This formula produces confidence intervals on Fisher's z' scale. YOU MUST REMEMBER to transform the upper and lower limits back to r.
- $z'_r$ is the transformed value for the observed r:
o $r_{XY} = 0.59 \rightarrow z'_r = 0.678$
- $z_c$ is the critical z for the desired level of confidence:
o $z_c = \pm 1.96$ for a 95% CI.
- $\sqrt{\frac{1}{N - 3}}$ is the standard error of $z'_r$ (= 0.3333 for N = 12).
o Example:
 95% CI = 0.678 ± 1.96 × 0.3333, giving $0.025 < z'_\rho < 1.331$.
 NEED TO TRANSFORM BACK TO r…
o Use the z' table (transformation of r) to convert each limit:
 $.025 < \rho_{XY} < .87$

Regression:
- We looked at correlation as a method of examining a bivariate
association (association between two variables).
o We can get more information by instead using a linear
regression.
- Simple linear regression:
o Coefficient of determination: correlation coefficient squared (r-
2
)
 r2 refers to the proportion of variability in Y that can be
accounted for (or predicted/explained) given knowledge
about scores on X (or vice versa).

o
- Example:
o The correlation between pain interference and depression is .72. What proportion of variability in depression is accounted for by pain interference?
 r = .723 (.72 to two decimal places).
 $r^2 = (.723)^2 = .523$. (Squaring the rounded value .72 would give .518.)
o This can be interpreted as a percentage:
 About 52.3% of the variability in depression is accounted for by pain interference.
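The arithmetic as a quick check. It takes r = .723, the standardised coefficient reported later in these notes, which equals r in simple regression. It also shows why squaring the two-decimal .72 gives a slightly different answer.

```python
r = 0.723                    # pain interference-depression correlation (reported as .72)
r_squared = r ** 2
r_squared_rounded = 0.72 ** 2  # squaring the two-decimal value instead

print(f"r^2 = {r_squared:.3f} ({r_squared:.1%} of variability in depression)")
print(f"(.72)^2 = {r_squared_rounded:.4f}")
```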

- Correlation is useful for providing a standardised estimate of the
linear relation between two continuous variables:
o BUT if we want to more specifically describe the linear relation
OR to make explicit predictions about Y using X, we use
regression analysis.
Simple linear regression RESTRICTIONS:
- 1. We only examine one independent variable (this is the “simple”).
- 2. We only examine straight-line relations (this is the “linear”).
o Use the general equation of a straight line:
 $Y = a + bX$
 Where a is the y-‘intercept’.
 b is the ‘slope’: the change in Y as X increases by 1.
The regression model:
- Incorporates errors in prediction ($e_i$) into the general equation for a straight line:
o $Y_i = a + bX_i + e_i$
 In the population.
o $Y_i = \hat{a} + \hat{b}X_i + e_i$
 Is the sample-based model.
 The subscript i refers to an individual: individual 1, 2, 3, etc.
- $Y_i$ is the actual score on the dependent variable for the ith person.
- $X_i$ is the actual score on the predictor (or independent variable) for the ith person.
- $e_i$ is the error when predicting the score on the dependent variable for the ith person.
o $\hat{Y}_i = \hat{a} + \hat{b}X_i$
 $\hat{Y}_i$ is the predicted dependent variable score for a person with a given score $X_i$ on the independent variable.
 $e_i = Y_i - \hat{Y}_i$
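The notes do not show how $\hat{a}$ and $\hat{b}$ are obtained. This sketch uses the standard least-squares formulas $\hat{b} = SP_{XY}/SS_X$ and $\hat{a} = \bar{Y} - \hat{b}\bar{X}$ on hypothetical data, then computes $\hat{Y}_i$ and the residuals $e_i = Y_i - \hat{Y}_i$. The least-squares residuals always sum to zero.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates: b_hat = SP_XY / SS_X, a_hat = mean(Y) - b_hat * mean(X)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b_hat = sp / ss_x
    a_hat = mean_y - b_hat * mean_x
    return a_hat, b_hat

x = [2, 4, 6, 8, 10]  # hypothetical predictor scores
y = [1, 3, 2, 5, 4]   # hypothetical dependent-variable scores
a_hat, b_hat = fit_simple_regression(x, y)

y_pred = [a_hat + b_hat * xi for xi in x]           # Y-hat_i for each person
residuals = [yi - yp for yi, yp in zip(y, y_pred)]  # e_i = Y_i - Y-hat_i
print(round(a_hat, 3), round(b_hat, 3))             # 0.6 0.4
print([round(e, 2) for e in residuals])             # [-0.4, 0.8, -1.0, 1.2, -0.6]
```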
- Example:
o $\hat{Y}_i = \hat{a} + \hat{b}X_i$
o $\hat{a} = -6.222$ (intercept)
o $\hat{b} = 3.662$ (slope)
 $\hat{Y}_i = -6.222 + 3.662X_i$
 Predicted depression score = -6.222 + 3.662 × pain interference score
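The reported coefficients can be wrapped in a small prediction function. The pain interference scores passed in below are hypothetical, since the notes do not give the scale's range.

```python
A_HAT = -6.222  # intercept reported in the example
B_HAT = 3.662   # slope reported in the example

def predict_depression(pain_interference):
    """Predicted depression score = a_hat + b_hat * pain interference score."""
    return A_HAT + B_HAT * pain_interference

for score in [2, 5, 8]:  # hypothetical pain interference scores
    print(score, round(predict_depression(score), 3))
```

Each extra unit of pain interference adds 3.662 to the prediction, which is exactly what the slope means.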
- Sample conclusion using simple linear regression results:
o It appears that higher pain interference is associated with
higher depression scores, such that depression score is
predicted to increase by 3.662 points for every additional pain
interference unit, b = 3.66, t(23) = 5.02, p <.001.
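As a consistency check not present in the notes: in simple linear regression, the t-test of $\hat{b}$ is identical to the t-test of r, $t = r\sqrt{N-2}/\sqrt{1-r^2}$. Plugging in r = .723 (the standardised coefficient reported in these notes) and df = 23 reproduces the reported t(23) = 5.02.

```python
import math

r = 0.723  # standardised coefficient, equal to r in simple regression
df = 23    # from t(23) in the write-up, i.e. N - 2 = 23
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
print(f"t({df}) = {t:.2f}")  # t(23) = 5.02
```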
- Standardised regression coefficient:
o Usually denoted $\hat{\beta}$
 = predicted number of standard deviations change in Y for a 1 standard deviation increase in X.
 In simple linear regression, $\hat{\beta} = r_{XY}$.
o Standardised regression equation:
 $\hat{z}_Y = \hat{\beta} \times z_X$
 E.g., if I am 1.5 standard deviations below the mean on pain interference (i.e., $z_X = -1.5$), what depression score am I predicted to have?
o $\hat{z}_{depression} = .723 \times -1.5 = -1.08$
o i.e., predicted to be about 1.08 standard deviations below the mean depression score.
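A sketch checking $\hat{\beta} = r_{XY}$ on hypothetical data. It assumes the standard conversion $\hat{\beta} = \hat{b} \times \hat{\sigma}_X / \hat{\sigma}_Y$, which is not stated in the notes. It then applies $\hat{z}_Y = \hat{\beta} \times z_X$ to the pain interference example.

```python
import math

def sd(values):
    """Sample standard deviation (N - 1 in the denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def slope_and_r(xs, ys):
    """Return (b_hat, r) computed from SP_XY, SS_X and SS_Y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / ss_x, sp / math.sqrt(ss_x * ss_y)

x = [2, 4, 6, 8, 10]  # hypothetical scores
y = [1, 3, 2, 5, 4]
b_hat, r = slope_and_r(x, y)
beta_hat = b_hat * sd(x) / sd(y)        # slope re-expressed in SD units
print(round(beta_hat, 3), round(r, 3))  # 0.8 0.8

# Standardised prediction from the pain interference example
z_pred = 0.723 * -1.5
print(round(z_pred, 2))  # -1.08
```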
- Unstandardised vs standardised?
o Unstandardised: as pain interference increases by 1 point, depression is predicted to increase by 3.662 points.
o Standardised: as pain interference increases by 1 standard deviation, depression is predicted to increase by 0.723 standard deviations.
 Should we interpret unstandardised (b) or standardised (β)?
- There is NO hard-and-fast rule.
o Guidelines:
 When there's more than one predictor in the model, β for different predictors can be directly compared, but b cannot.
 If there is uncertainty about the meaning of a “unit increase” on the predictor, it is safer to interpret β.
 Otherwise, interpreting b is preferable, as it allows interpretations to be expressed in terms of the original units of the dependent variable and predictor.

Partitioning variance:
- We take the variance of the dependent variable and examine how much of it is explained and how much is unexplained.
o Why do scores on our dependent variable vary? Usually because of the independent variables in our statistical model, plus whatever is left over (within-group variability/error).
 In regression we call this left-over variance residual variance.

- SST = SSP + SSR, where:
o SST = sums of squares total.
o SSP = sums of squares regression.
o SSR = sums of squares residual.
- The regression line is the ‘line of best fit’ because it minimises errors in prediction: no other straight line through the scatterplot will provide a smaller SSR.
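A sketch of the partition on hypothetical data. It fits the least-squares line ($\hat{b} = SP_{XY}/SS_X$, a standard formula not given in the notes), then checks that SST = SSP + SSR. It also tries a few arbitrary alternative lines to illustrate that each leaves a larger SSR.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope (b = SP_XY / SS_X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    b = sp / ss_x
    return my - b * mx, b

def ss_residual(xs, ys, a, b):
    """SSR for any candidate line Y-hat = a + bX."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

x = [2, 4, 6, 8, 10]  # hypothetical scores
y = [1, 3, 2, 5, 4]
a, b = fit_line(x, y)
mean_y = sum(y) / len(y)
y_hat = [a + b * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)      # total
ssp = sum((yh - mean_y) ** 2 for yh in y_hat)  # regression
ssr = ss_residual(x, y, a, b)                  # residual
print(round(sst, 6), round(ssp, 6), round(ssr, 6))  # 10.0 6.4 3.6

# Some other straight lines: each leaves a larger SSR than the fitted line
for a_alt, b_alt in [(a + 0.5, b), (a, b + 0.1), (0.0, 0.5)]:
    print(a_alt, b_alt, round(ss_residual(x, y, a_alt, b_alt), 3))
```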

Effect size:
- What proportion of variability in depression is accounted for by pain
interference?
o $R^2 = \dfrac{SSP}{SST}$ (in simple linear regression this equals $r^2$)
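A short check on hypothetical data that $R^2 = SSP/SST$ matches the squared correlation in simple linear regression. $SS_Y$ plays the role of SST, and the line is the least-squares fit.

```python
import math

x = [2, 4, 6, 8, 10]  # hypothetical scores
y = [1, 3, 2, 5, 4]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
ss_x = sum((xi - mx) ** 2 for xi in x)
sst = sum((yi - my) ** 2 for yi in y)  # SS_Y is the total sum of squares

b = sp / ss_x
a = my - b * mx
ssp = sum((a + b * xi - my) ** 2 for xi in x)  # regression sum of squares

r_squared_from_ss = ssp / sst
r = sp / math.sqrt(ss_x * sst)
print(round(r_squared_from_ss, 3), round(r ** 2, 3))  # 0.64 0.64
```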
