
Correlation Analysis

This document discusses measures of relationship, specifically correlation analysis. Correlation analysis examines the linear relationship between two variables. It describes the nature (positive or negative), strength (weak, moderate, strong), and significance of the relationship. A scatter plot can show the pattern of relationships between paired variables. Pearson's correlation coefficient (r) quantifies the strength and direction of the linear relationship between two continuous variables. Calculating r involves determining the covariance relative to the variances. The value of r ranges from -1 to 1, with values farther from 0 indicating a stronger linear relationship.

Measures of Relationship

Gabino P. Petilos, Ph.D.


Education Supervisor II
Commission on Higher Education VIII
Athletic Road, Tacloban City
Instead of comparing populations, a researcher may be interested in finding out whether two variables are related. For instance, he may be interested in knowing whether:

• mental ability is related to school performance;

• work performance of employees is significantly related to their level of morale;

• study habits are significantly related to grades in mathematics; or

• religion is related to political affiliation.

In all these problems, the researcher can
choose a research design that will help him
establish such relationships. He can study
the pattern of values of the paired variables
under investigation. The statistical
technique that will help the researcher
establish the relationship between the
paired variables is called correlation
analysis.
Correlation analysis is concerned with the linear relationship between two variables. It aims to:

• describe the nature of the relationship between the variables (whether positive or negative);

• describe the strength of the linear association between the two variables (whether weak, moderate, or strong); and

• establish the significance of the relationship between the variables.
• If data for the entire population are analyzed, there is no need to establish the significance of the obtained nature and strength of the relationship between the variables.

• However, if a sample is used, the significance of the relationship must be established to find out whether there is evidence that the variables are in fact related in the population from which the sample was obtained.
POPULATION: ρ = ? (correlation between X and Y)

SAMPLE: r = ? (correlation between X and Y)

Here, r is the estimate of ρ.

• A positive relationship between two variables occurs when an increase in the value of one variable corresponds to an increase in the value of the other (or, equivalently, when a decrease in one corresponds to a decrease in the other).

• A negative relationship between two variables exists if high scores on one variable tend to be accompanied by low scores on the other, and conversely.
• For example, it has been shown that IQ and academic performance are positively related.

• This means that a person with a high IQ tends to perform well academically in school and, in turn, good academic performance is usually associated with a high IQ.
Examples of variables that are negatively related are:

• Academic achievement and hours per week of watching TV

• Time spent in typing practice and number of typing errors

• Absenteeism rate and job satisfaction

• The nature and strength of the linear correlation between variables may be described using a scatter plot.

• A scatter plot, or scatter diagram, is a graphical device used to determine the nature of the relationship between two variables and the strength of their correlation.





[Figure: scatter plots illustrating PERFECT POSITIVE, HIGH (STRONG) POSITIVE, MODERATE POSITIVE, LOW (WEAK) POSITIVE, PERFECT NEGATIVE, HIGH NEGATIVE, MODERATE NEGATIVE, LOW (WEAK) NEGATIVE, and ZERO CORRELATION (two panels)]
COMMON MEASURES OF CORRELATION

Measurement Scale    Measurement Scale    Measure
of Variable X        of Variable Y
Interval/Ratio       Interval/Ratio       Pearson’s r
Ordinal              Ordinal              Spearman’s Rho

CHI-SQUARED BASED MEASURES OF CORRELATION

Measurement Scale         Measurement Scale         Measure
of Variable X             of Variable Y
Nominal (2 categories)    Nominal (2 categories)    Phi Coefficient
Nominal (r categories)    Nominal (r categories)    Contingency Coefficient
Nominal (r categories)    Nominal (c categories)    Cramer’s V
PROPORTIONAL REDUCTION IN ERROR (PRE) BASED MEASURES OF CORRELATION

Measurement Scale      Measurement Scale      Measure
of Variable X          of Variable Y
Nominal (unordered)    Nominal (unordered)    Lambda
Nominal (ordered)      Nominal (ordered)      Gamma
OTHER MEASURES OF CORRELATION

Measurement Scale            Measurement Scale       Measure
of Variable X                of Variable Y
Categorical, 2 categories    Interval/Ratio          Point-Biserial
(independent variable)       (dependent variable)    (Pearson’s r)
Categorical, 3 or more       Interval/Ratio          Eta Correlation
categories                   (dependent variable)
(independent variable)
PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT
Assumptions to be satisfied before one can validly
use the Pearson’s r:
1. Both variables x and y must be measured in at least
the interval scale;
2. Observations are sampled from a bivariate normal
distribution; and
3. The variables are linearly related.

[Figures: Bivariate Normal Distribution; Linearly Related Variables]
Illustration: The table below shows experimental data for the observed pairs (x, y).

x    2    3    7    4    6    8    5
y    3    5    8    5    7    10   4

[Figure: Scatter Plot of the Data]


WORKSHEET:

x    y    x²    y²     xy
2    3    4     9      6
3    5    9     25     15
7    8    49    64     56
4    5    16    25     20
6    7    36    49     42
8    10   64    100    80
5    4    25    16     20

Σx = 35    Σy = 42    Σx² = 203    Σy² = 288    Σxy = 239
COMPUTATION OF r:

$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}
$$

$$
r = \frac{(7)(239) - (35)(42)}{\sqrt{\left[7(203) - 35^2\right]\left[7(288) - 42^2\right]}}
  = \frac{(7)(239) - (35)(42)}{\sqrt{(1421 - 1225)(2016 - 1764)}}
  = 0.91
$$
To run Pearson’s r in SPSS, click “Analyze”, “Correlate”, “Bivariate”. The window at the left appears.

Transfer the variables to be correlated to the other box by highlighting them and clicking the arrow button. Click “OK”.

SPSS Outputs:

From the computer output, the two variables are highly correlated (r = .91) and this correlation coefficient is highly significant (p = .004).
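
The hand computation of r can be cross-checked with a short script. This is a minimal sketch in plain Python (standard library only); the function name `pearson_r` is illustrative and not part of the original slides. It uses the seven (x, y) pairs as they appear in the worksheet, where the last pair is (5, 4).

```python
import math

# Data as used in the worksheet (last pair is (5, 4)).
x = [2, 3, 7, 4, 6, 8, 5]
y = [3, 5, 8, 5, 7, 10, 4]

def pearson_r(x, y):
    """Pearson's r via the computational (raw-score) formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(x, y)
print(round(r, 2))  # 0.91
```

The intermediate sums inside the function are the same column totals as in the worksheet (35, 42, 203, 288, 239).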
• It is often difficult to decide what value of r indicates a low, moderate, or high degree of correlation.

• The decision is based on the absolute size of r. In general, values close to 1.0 in absolute value indicate a high or strong correlation between the two variables.

• On the other hand, values close to 0 indicate a low or weak correlation, while values around 0.5 in absolute value indicate a moderate correlation.
Some books, however, offer a table for interpreting the values of r. The table below, for instance, can be found in Best & Kahn (1989)*.

Coefficient      Interpretation
0 to ±.20        Negligible
±.20 to ±.40     Low
±.40 to ±.60     Moderate
±.60 to ±.80     Substantial
±.80 to ±1.0     High to Very High

*John W. Best & James V. Kahn, Research in Education, 1989
• The correlation coefficient may also be interpreted using the concept of the coefficient of determination.

• The coefficient of determination is defined as the square of r, or r².

• This value measures the proportion of variability in one variable that can be attributed to the variation of the other variable, and vice versa.

• Thus, if r = 0.91, then r² = 0.8281 or 82.81%, which means that 82.81% of the variance in one variable is accounted for by the variation of the other variable, and vice versa.
TEST OF SIGNIFICANCE (t-test):

$$
t = r\sqrt{\frac{n - 2}{1 - r^2}}
$$

d.f. = n - 2
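
As an illustration, the test statistic can be evaluated for the earlier Pearson example (r = 0.91, n = 7). This is a minimal sketch; the helper name `t_statistic` is an assumption for illustration only. The resulting t would then be compared with the tabular t value for n - 2 = 5 degrees of freedom (2.571 for a two-tailed test at the .05 level).

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

n = 7
t = t_statistic(0.91, n)
df = n - 2
print(round(t, 2), df)  # 4.91 5
```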
The Spearman's Rank Order Correlation (rS)

• Spearman's measure of correlation is used when both variables are measured on at least the ordinal scale.

• This measure makes no assumption about the normality of the distribution of the data.
$$
r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)}
$$

where:

d is the difference between the ranks of paired data;

n is the number of paired cases/data.


Example:

x     y     Rank(x)   Rank(y)   d     d²
86    87    2         2         0     0
78    80    5         4         1     1
79    78    4         5         -1    1
85    86    3         3         0     0
87    90    1         1         0     0

n = 5, Σd² = 2

$$
r_s = 1 - \frac{6(2)}{5(25 - 1)} = 1 - \frac{12}{120} = 1 - \frac{1}{10} = 1 - 0.1 = 0.90
$$
To run Spearman’s Rho in SPSS, click “Analyze”, “Correlate”, “Bivariate”, then uncheck “Pearson” and check “Spearman”.

Transfer the variables to be correlated to the other box by highlighting them and clicking the arrow button. Click “OK”.

SPSS Outputs:

From the computer output, the two variables are highly correlated (rs = .90) and this correlation coefficient is significant (p = .037).
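
The ranking and the rank-difference formula can be sketched as follows. This is an illustrative example in plain Python, not the SPSS procedure; the helper names are hypothetical, and the ranking helper assumes there are no tied values (true for this example). It also evaluates the t statistic used for r_s, t = r_s·sqrt((n − 2)/(1 − r_s²)) with n − 2 degrees of freedom.

```python
import math

x = [86, 78, 79, 85, 87]
y = [87, 80, 78, 86, 90]

def ranks_descending(values):
    """Rank 1 = highest value; assumes no ties (true for this example)."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = difference in ranks."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(ranks_descending(x), ranks_descending(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

rs = spearman_rs(x, y)
t = rs * math.sqrt((len(x) - 2) / (1 - rs ** 2))  # d.f. = n - 2 = 3
print(round(rs, 2), round(t, 2))
```

With tied observations, average ranks would be needed instead of this simple ranking.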
TEST OF SIGNIFICANCE (t-test):

$$
t = r_s\sqrt{\frac{n - 2}{1 - r_s^2}}
$$

d.f. = n - 2

PHI COEFFICIENT

$$
\phi = \sqrt{\frac{\chi^2}{N}}
$$

Where: χ² is the computed Chi-Square Statistic
N = Grand Total

Example: Is Research Productivity Independent of Educational Qualification?

Educational      Engaged in Research?
Qualification    Yes    No     Total
Ph.D.            9      23     32
Non-Ph.D.        88     118    206
Total            97     141    238
$$
\chi^2 = \frac{238\left[(9)(118) - (88)(23)\right]^2}{(97)(141)(206)(32)} = \frac{238(-962)^2}{90158784} = 2.443
$$

$$
\phi = \sqrt{\frac{\chi^2}{N}} = \sqrt{\frac{2.443}{238}} = \sqrt{0.010264601} = .10
$$
SHORT CUT FORMULA FOR PHI-COEFFICIENT

Educational      Engaged in Research?
Qualification    Yes       No         Total
Ph.D.            A = 9     B = 23     G = 32
Non-Ph.D.        C = 88    D = 118    H = 206
Total            E = 97    F = 141    N = 238

$$
\text{Phi} = \frac{BC - AD}{\sqrt{EFGH}} = \frac{(88)(23) - (9)(118)}{\sqrt{(97)(141)(32)(206)}} = .10
$$
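
Both routes to the Phi coefficient can be checked numerically. The sketch below is illustrative (plain Python, variable names chosen here); it computes the 2 × 2 chi-square without a continuity correction, which is the version that reproduces the value 2.443, and then the short-cut formula.

```python
import math

# 2 x 2 table: rows = Ph.D., Non-Ph.D.; columns = engaged in research (Yes, No)
a, b = 9, 23
c, d = 88, 118

n = a + b + c + d
row1, row2 = a + b, c + d      # G, H
col1, col2 = a + c, b + d      # E, F

# Chi-square for a 2 x 2 table (no continuity correction)
chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
phi_from_chi2 = math.sqrt(chi2 / n)

# Short-cut formula: (BC - AD) / sqrt(E * F * G * H)
phi_shortcut = (b * c - a * d) / math.sqrt(col1 * col2 * row1 * row2)

print(round(chi2, 3), round(phi_from_chi2, 2), round(phi_shortcut, 2))  # 2.443 0.1 0.1
```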

[Figures: Raw data; Summarized Data]

To run the Phi coefficient in SPSS, click “Analyze”, “Descriptive Statistics”, “Crosstabs” and you will be shown the window at the left.

Transfer the row variable (Engaged in Research) and column variable (Educational Qualification).
Click “Statistics”, check “Phi and Cramer’s V”; click “Continue”.

Since the data are summarized, make sure that the “Weight Cases” command is “on” before clicking “OK”.

SPSS Outputs:
$$
C_o = \sqrt{\frac{\chi^2}{\chi^2 + N}}
$$

Where: χ² is the computed Chi-Square Statistic and N = Grand Total

INTEREST        SOCIAL CLASS
IN SPORTS       WORKING      MIDDLE       UPPER        TOTAL
HIGH            12 (17.6)    45 (30.6)    7 (15.8)     64
MODERATE        24 (23.4)    40 (40.7)    21 (20.9)    85
LOW             21 (16.0)    14 (27.7)    23 (14.3)    58
TOTAL           57           99           51           207

(Expected frequencies are shown in parentheses.)
EXAMPLE: Contingency Coefficient

Research Question:
Is interest in sports related to social class?

Null Hypothesis:
There is no significant relationship
between interest in sports and social class.

Alternative Hypothesis:
There is a significant relationship between
interest in sports and social class.
$$
\chi^2 = \frac{(12 - 17.6)^2}{17.6} + \frac{(45 - 30.6)^2}{30.6} + \frac{(7 - 15.8)^2}{15.8} + \frac{(24 - 23.4)^2}{23.4} + \frac{(40 - 40.7)^2}{40.7} + \frac{(21 - 20.9)^2}{20.9} + \frac{(21 - 16)^2}{16} + \frac{(14 - 27.7)^2}{27.7} + \frac{(23 - 14.3)^2}{14.3} = 27.160
$$
At the .05 level of significance and d.f. = (3-1)(3-1) = 4, the tabular value is 9.49. Since the computed chi-square exceeds the tabular value, the null hypothesis is rejected. We conclude that interest in sports and social class are significantly related.
To estimate the strength of association or relationship, we use the contingency coefficient.

Note that χ² = 27.160 and N = 207. Hence, we have

$$
C_o = \sqrt{\frac{\chi^2}{\chi^2 + N}} = \sqrt{\frac{27.160}{27.160 + 207}} = \sqrt{\frac{27.160}{234.16}} = \sqrt{0.116} = 0.34
$$

The value 0.34 is interpreted as weak. We conclude that there is a weak relationship between interest in sports and social class (Co = 0.34) and the relationship is significant based on the computed Chi-square value of 27.160.
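
The chi-square value and the contingency coefficient can be reproduced from the observed frequencies alone. The following is a minimal sketch in plain Python; the function name `chi_square` is illustrative. Expected frequencies are computed as (row total × column total) / N rather than taken from the rounded values in the table.

```python
import math

# Observed frequencies; columns = WORKING, MIDDLE, UPPER
observed = [
    [12, 45, 7],    # HIGH
    [24, 40, 21],   # MODERATE
    [21, 14, 23],   # LOW
]

def chi_square(table):
    """Return (chi-square, grand total) for a contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n

chi2, n = chi_square(observed)
contingency = math.sqrt(chi2 / (chi2 + n))
print(round(chi2, 2), round(contingency, 2))  # 27.16 0.34
```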
Running “Contingency Coefficient” is the same as running the Phi coefficient. Instead of checking “Phi and Cramer’s V”, check “Contingency Coefficient”.

SPSS Outputs (Chi-Square Value and Contingency Coefficient):

Hence, there is a weak relationship between interest in sports and social class (Co = 0.34) and the relationship is highly significant, χ² = 27.16, p < .001.
CRAMER’S COEFFICIENT

$$
C_r = \sqrt{\frac{\chi^2}{N(L - 1)}}
$$

χ² is the computed Chi-square value;
N is the grand total in the contingency table; and
L is either the number of rows or the number of columns, whichever is smaller.
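
The formula translates directly into a small helper. This is a sketch only; the function name `cramers_coefficient` is hypothetical, and the call below simply reuses the chi-square value and grand total from the 3 × 3 sports-interest table (χ² = 27.160, N = 207) as an illustration of the arithmetic.

```python
import math

def cramers_coefficient(chi2, n, n_rows, n_cols):
    """Cr = sqrt(chi2 / (N * (L - 1))), where L = min(rows, columns)."""
    smaller = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (smaller - 1)))

# Illustration only: values reused from the sports-interest example.
cr = cramers_coefficient(27.160, 207, 3, 3)
print(round(cr, 2))
```

For a table with only two rows or two columns, L − 1 = 1 and the coefficient reduces to sqrt(χ²/N), the same form as the Phi coefficient.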
Sample Data

Political             Year in College
Liberalism            Freshman   Sophomore   Junior   Senior   Total
Very Liberal          2          3           7        10       22
Moderately Liberal    8          12          9        7        36
Not Liberal           10         5           4        3        22
Total                 20         20          20       20       80
EXAMPLE: Cramer’s Coefficient
Research Question:
Is political liberalism related to years in
college?

Null Hypothesis:
There is no significant relationship between
level of political liberalism and years in
college.

Alternative Hypothesis:
There is a significant relationship between
level of political liberalism and years in college
SPSS Outputs:

Hence, based on the results of the analysis, there is a low relationship between level of political liberalism and year in college (Cr = √(15.149 / (80 × 2)) = .31) and the relationship is significant, χ² = 15.149, p = .019.
POINT-BISERIAL CORRELATION COEFFICIENT

$$
r_{pb} = \frac{\bar{y}_1 - \bar{y}_0}{sd_y}\sqrt{\frac{n_0 n_1}{n(n - 1)}}
$$

ȳ0 = mean of group coded 0
ȳ1 = mean of group coded 1
sdy = s.d. of all scores combined
n0 = sample size of group coded 0
n1 = sample size of group coded 1
n = n0 + n1
No Parental Support    With Parental Support
(Coded 0)              (Coded 1)
6                      8
7                      7
5                      9
4                      8
                       8
                       7
mean = 5.5             mean = 7.83
n = 4                  n = 6

ntotal = 10
sdy = 1.524

$$
r_{pb} = \frac{7.83 - 5.5}{1.524}\sqrt{\frac{(4)(6)}{10(10 - 1)}} = 0.7895 \approx 0.79
$$
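
The point-biserial computation can be verified from the raw scores. The sketch below uses only Python's standard library; the function name `point_biserial` is illustrative. It assumes sdy is the sample standard deviation (n − 1 denominator) of all scores combined, which is what reproduces the value 1.524.

```python
import math
import statistics

group0 = [6, 7, 5, 4]          # no parental support (coded 0)
group1 = [8, 7, 9, 8, 8, 7]    # with parental support (coded 1)

def point_biserial(group0, group1):
    """r_pb = (mean1 - mean0) / sd_y * sqrt(n0 * n1 / (n * (n - 1)))."""
    n0, n1 = len(group0), len(group1)
    n = n0 + n1
    sd_y = statistics.stdev(group0 + group1)  # sample s.d. of all scores
    mean_diff = statistics.mean(group1) - statistics.mean(group0)
    return (mean_diff / sd_y) * math.sqrt(n0 * n1 / (n * (n - 1)))

r_pb = point_biserial(group0, group1)
print(round(r_pb, 2))  # 0.79
```

The unrounded result differs from 0.7895 in the fourth decimal place only because the hand computation uses the rounded mean 7.83.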
To run point-biserial using SPSS, first recast the data
as follows
Parental Support Score
0 6
0 7
0 5
0 4
1 8
1 7
1 9
1 8
1 8
1 7
TEST OF SIGNIFICANCE (t-test):

$$
t = r_{pb}\sqrt{\frac{n - 2}{1 - r_{pb}^2}}
$$

d.f. = n - 2
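
As a worked illustration, applying this statistic to the parental-support example (r_pb = 0.79, n = 10) gives approximately:

$$
t = 0.79\sqrt{\frac{10 - 2}{1 - 0.79^2}} = 0.79\sqrt{\frac{8}{0.3759}} \approx 0.79(4.613) \approx 3.64, \qquad d.f. = 10 - 2 = 8
$$

This value would then be compared with the tabular t for 8 degrees of freedom (2.306 for a two-tailed test at the .05 level).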
EXAMPLE: ETA CORRELATION

Research Question:
Does teachers’ emotional exhaustion vary across
categories of marital status?

Null Hypothesis:
There is no significant difference in the level of
emotional exhaustion among teachers with different
marital status.

Alternative Hypothesis:
There is a significant difference in the level of
emotional exhaustion among teachers with different
marital status.
single        married       widowed/separated
X1            X2            X3
34            34            13
38            22            21
35            30            28
31            16            13
34            30            14
34            14            16
34            22            25
27            30            24
34            35            19
34            14            14
35            23            17
34            35            14
              31            16
              30            25
              29            17
                            31
                            26
                            18
                            15

n1 = 12       n2 = 15       n3 = 19
ΣX1 = 404     ΣX2 = 395     ΣX3 = 366
ΣX1² = 13676  ΣX2² = 11153  ΣX3² = 7614
X̄1 = 33.7     X̄2 = 26.3     X̄3 = 19.3
S1² = 6.79    S2² = 53.67   S3² = 31.32
COMPUTATION OF ETA CORRELATION COEFFICIENT

Source     Sum of Squares    df    Mean Square    Computed F
Between    1548.42           2     774.21         23.9559
Within     1389.68           43    32.32
Total      2938.10           45

F.05(2, 43) = 3.215

$$
\text{ETA} = \sqrt{\frac{SSB}{SST}} = \sqrt{\frac{1548.42}{2938.10}} = \sqrt{0.527} = 0.73
$$
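
The sums of squares, the F ratio, and Eta can be recomputed from the raw scores. This is an illustrative sketch in plain Python (the variable names are chosen here, not taken from the slides); it follows the usual one-way ANOVA decomposition SST = SSB + SSW.

```python
import math

groups = {
    "single": [34, 38, 35, 31, 34, 34, 34, 27, 34, 34, 35, 34],
    "married": [34, 22, 30, 16, 30, 14, 22, 30, 35, 14, 23, 35, 31, 30, 29],
    "widowed/separated": [13, 21, 28, 13, 14, 16, 25, 24, 19, 14, 17, 14,
                          16, 25, 17, 31, 26, 18, 15],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = sum(all_scores) / len(all_scores)

# Total and between-groups sums of squares; within = total - between
sst = sum((score - grand_mean) ** 2 for score in all_scores)
ssb = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in groups.values())
ssw = sst - ssb

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_ratio = (ssb / df_between) / (ssw / df_within)
eta = math.sqrt(ssb / sst)

print(round(ssb, 2), round(ssw, 2), round(f_ratio, 2), round(eta, 2))
```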
To compute the Eta correlation using SPSS: click “Analyze”, “Compare Means”, “Means”…
Transfer the Dependent Variable and Independent Variable by clicking the arrow button. Click “Options”, check “ANOVA table and eta”, click “Continue”, then click “OK”.

SPSS Outputs:

Based on the SPSS output, there is a substantial correlation between marital status and emotional exhaustion among teachers (Eta = 0.73) and the relationship is highly significant, F = 23.956, p < 0.001.
Gamma is used when the two variables A and B are categorical and the categories can be meaningfully ordered, i.e.,

variable A has k categories A1, A2, A3, …, Ak, where A1 < A2 < … < Ak;

variable B has r categories and the categories are ordered in a similar manner, i.e., B1 < B2 < … < Br.
THE DATA ARE CAST INTO A CONTINGENCY TABLE
AS FOLLOWS:

A1 A2 . . . Ak Total
B1 n11 n12 . . . n1k R1
B2 n21 n22 . . . n2k R2
. . . . .
. . . . .
Br nr1 nr2 . . . nrk Rr
Total C1 C2 . . . Ck N
COMPUTATION OF GAMMA G:

$$
G = \frac{P - Q}{P + Q}
$$

WHERE: P is the number of concordant pairs;
Q is the number of discordant pairs.
COMPUTATION OF P:

$$
P = \sum_{i,j} n_{ij} N^{+}_{ij}
$$

where $N^{+}_{ij}$ is the sum of the frequencies in all cells of the contingency table that lie below and to the right of cell (i, j).
COMPUTATION OF Q:

$$
Q = \sum_{i,j} n_{ij} N^{-}_{ij}
$$

where $N^{-}_{ij}$ is the sum of the frequencies in all cells of the contingency table that lie below and to the left of cell (i, j).
Example: Political Liberalism by Year in College

Political             Year in College
Liberalism            Freshman   Sophomore   Junior   Senior   Total
Very Liberal          2          3           7        10       22
Moderately Liberal    8          12          9        7        36
Not Liberal           10         5           4        3        22
Total                 20         20          20       20       80
Example: The same contingency table, but with Political Liberalism arranged from lowest to highest

Political             Year in College
Liberalism            Freshman   Sophomore   Junior   Senior   Total
Not Liberal           10         5           4        3        22
Moderately Liberal    8          12          9        7        36
Very Liberal          2          3           7        10       22
Total                 20         20          20       20       80
Computation of Gamma G (concordant pairs, from the table with Political Liberalism arranged from lowest to highest):

$$
\begin{aligned}
P &= 10(12 + 9 + 7 + 3 + 7 + 10) + 5(9 + 7 + 7 + 10) + 4(7 + 10) + 8(3 + 7 + 10) + 12(7 + 10) + 9(10) \\
  &= 10(48) + 5(33) + 4(17) + 8(20) + 12(17) + 9(10) \\
  &= 480 + 165 + 68 + 160 + 204 + 90 = 1167
\end{aligned}
$$
Computation of Gamma G (discordant pairs, from the same table):

$$
\begin{aligned}
Q &= 3(8 + 12 + 9 + 2 + 3 + 7) + 4(8 + 12 + 2 + 3) + 5(8 + 2) + 7(2 + 3 + 7) + 9(2 + 3) + 12(2) \\
  &= 3(41) + 4(25) + 5(10) + 7(12) + 9(5) + 12(2) \\
  &= 123 + 100 + 50 + 84 + 45 + 24 = 426
\end{aligned}
$$
P Q
G
PQ

1167  426 741


 
1167  426 1593
 0.465  0.47
SPSS Outputs:
Test of Significance of G:

H0: G = 0, i.e., the variables are not significantly related.

Test Statistic:

$$
z = G\sqrt{\frac{P + Q}{N(1 - G^2)}}
$$
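
The counting of concordant and discordant pairs, Gamma, and the z statistic can be sketched in a few lines. This example is illustrative only (plain Python; the function name `concordant_discordant` is an assumption). It uses the political liberalism table with rows arranged from lowest to highest.

```python
import math

# Rows: Not Liberal, Moderately Liberal, Very Liberal (lowest to highest)
# Columns: Freshman, Sophomore, Junior, Senior
table = [
    [10, 5, 4, 3],
    [8, 12, 9, 7],
    [2, 3, 7, 10],
]

def concordant_discordant(table):
    """Return (P, Q): each cell times the cells below-right / below-left."""
    p = q = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):                  # rows below (i, j)
                p += table[i][j] * sum(table[k][j + 1:])    # cells to the right
                q += table[i][j] * sum(table[k][:j])        # cells to the left
    return p, q

p, q = concordant_discordant(table)
n = sum(map(sum, table))
gamma = (p - q) / (p + q)
z = gamma * math.sqrt((p + q) / (n * (1 - gamma ** 2)))
print(p, q, round(gamma, 3), round(z, 2))  # 1167 426 0.465 2.34
```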
Lambda is used when the two variables A and B are categorical and the categories cannot be meaningfully ordered.

Example: A = Marital Status (Single, Married, Widowed)

B = Employment Status (Employed, Not employed)
THE DATA ARE CAST INTO A CONTINGENCY
TABLE AS FOLLOWS:

A1 A2 . . . Ak Total
B1 n11 n12 . . . n1k R1
B2 n21 n22 . . . n2k R2
. . . . .
. . . . .
Br nr1 nr2 . . . nrk Rr
Total C1 C2 . . . Ck N
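COMPUTATION OF LAMBDA λ:

$$
\lambda = \frac{\sum \text{maximum frequency}(A) - \text{maximum frequency}(B)}{N - \text{maximum frequency}(B)}
$$

WHERE: A is the assumed independent variable;
B is the assumed dependent variable.

Here Σ maximum frequency(A) is the sum of the largest cell frequency within each category of A, and maximum frequency(B) is the largest marginal total of B.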
Example: Employment Status by Marital Status

Employment      Marital Status
Status          Never Married   Married   Divorced   Widowed   Total
Employed        21              60        11         6         98
Not Employed    14              65        4          19        102
Total           35              125       15         25        200
Computation of Lambda:

Treating Marital Status as Independent Variable:

$$
\lambda = \frac{(21 + 65 + 11 + 19) - 102}{200 - 102} = \frac{14}{98} = 0.14
$$
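
The Lambda computation can be expressed as a short function. This is a minimal sketch in plain Python with an illustrative function name; it assumes the columns of the table are the categories of the independent variable (marital status) and the rows are the categories of the dependent variable (employment status).

```python
# Rows: Employed, Not Employed (dependent variable B)
# Columns: Never Married, Married, Divorced, Widowed (independent variable A)
table = [
    [21, 60, 11, 6],
    [14, 65, 4, 19],
]

def lambda_coefficient(table):
    """Lambda with columns as the independent and rows as the dependent variable."""
    n = sum(map(sum, table))
    sum_of_column_maxima = sum(max(col) for col in zip(*table))  # 21 + 65 + 11 + 19
    largest_row_total = max(sum(row) for row in table)           # 102
    return (sum_of_column_maxima - largest_row_total) / (n - largest_row_total)

lam = lambda_coefficient(table)
print(round(lam, 2))  # 0.14
```

Lambda is asymmetric: treating employment status as the independent variable instead would require transposing the table before calling the function.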
SPSS Outputs:
Thank You!
EXERCISE: The following random sample gives the number of hours of study (X) and the score in a statistics examination (Y) for 8 students:

X    3    3    4    5    6    6    7    8
Y    45   60   60   70   75   80   75   85

1. Make a scatter plot and describe the nature of the relationship between X and Y.
2. Compute the value of r and interpret the result.
X     Y      X^2    Y^2      XY
3     45     9      2025     135
3     60     9      3600     180
4     60     16     3600     240
5     70     25     4900     350
6     75     36     5625     450
6     80     36     6400     480
7     75     49     5625     525
8     85     64     7225     680
Σ:
42    550    244    39000    3040
1. Scatter Diagram

[Figure: scatter diagram of Score (vertical axis) against Study Hours (horizontal axis)]

2. r = .913 or 0.91 (very high positive correlation)
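
The answer to item 2 can be checked with the deviation form of Pearson's r (covariance relative to the two variances), which is algebraically equivalent to the raw-score formula used earlier. This is an illustrative sketch; the function name `pearson_r_deviation` is not from the original slides.

```python
import math

hours = [3, 3, 4, 5, 6, 6, 7, 8]
scores = [45, 60, 60, 70, 75, 80, 75, 85]

def pearson_r_deviation(x, y):
    """Pearson's r from sums of squared deviations and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r_deviation(hours, scores)
print(round(r, 3))  # 0.913
```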
