Pearson’s R
Presented by Caringal, Catada, and Delerio
The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation. It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.
TYPES OF CORRELATION
Positive correlation
When one variable changes, the other variable
changes in the same direction.
Pearson correlation coefficient (r): between 0 and 1
EXAMPLE:
Baby length & weight:
The longer the baby, the heavier their weight.
No correlation
There is no relationship between the variables.
Pearson correlation coefficient (r): 0
EXAMPLE:
Car price & width of windshield wipers:
The price of a car is not related to the width of its windshield
wipers.
Negative correlation
When one variable changes, the other variable
changes in the opposite direction.
Pearson correlation coefficient (r): between 0 and –1
EXAMPLE:
Elevation & air pressure:
The higher the elevation, the lower the air pressure.
What is the Pearson correlation coefficient?
The Pearson correlation coefficient (r) is the
most widely used correlation coefficient
and is known by many names:
Pearson’s r
Bivariate correlation
Pearson product-moment correlation
coefficient (PPMCC)
The correlation coefficient
The Pearson correlation coefficient is a descriptive
statistic, meaning that it summarizes the
characteristics of a dataset. Specifically, it describes
the strength and direction of the linear relationship
between two quantitative variables.
Although interpretations of the relationship strength
(also known as effect size) vary between disciplines,
the table below gives general rules of thumb:
Pearson correlation coefficient (r) value    Strength    Direction
Greater than .5                              Strong      Positive
Between .3 and .5                            Moderate    Positive
Between 0 and .3                             Weak        Positive
0                                            None        None
Between 0 and –.3                            Weak        Negative
Between –.3 and –.5                          Moderate    Negative
Less than –.5                                Strong      Negative
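The rule-of-thumb table can be expressed as a small Python function. This is a sketch: the function name `describe_r` is hypothetical, and because the table does not say where exactly .3 and .5 fall, it assumes that .3 ≤ |r| ≤ .5 counts as moderate.

```python
def describe_r(r: float) -> tuple[str, str]:
    """Return (strength, direction) for Pearson's r using the rule-of-thumb table.

    Assumption: the table leaves the exact boundaries open, so here
    |r| > .5 is strong, .3 <= |r| <= .5 is moderate, and 0 < |r| < .3 is weak.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("Pearson's r must lie between -1 and 1")
    if r == 0:
        return ("none", "none")
    direction = "positive" if r > 0 else "negative"
    size = abs(r)  # strength depends only on the magnitude of r
    if size > 0.5:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return (strength, direction)


print(describe_r(0.47))   # ('moderate', 'positive')
print(describe_r(-0.91))  # ('strong', 'negative')
```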
Visualizing the Pearson correlation coefficient
Another way to think of the Pearson correlation
coefficient (r) is as a measure of how close the
observations are to a line of best fit.
The Pearson correlation coefficient also tells you
whether the slope of the line of best fit is negative or
positive. When the slope is negative, r is negative.
When the slope is positive, r is positive.
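The sign of r always matches the slope of the least-squares line of best fit because both are computed from the same numerator, the sum of cross-products of deviations from the means (Sxy below), while their denominators are always positive. A minimal Python sketch, assuming an ordinary least-squares line and using the share prices from Example #2 later in this presentation; the helper name `slope_and_r` is hypothetical.

```python
import math


def slope_and_r(xs, ys):
    """Return the least-squares slope and Pearson's r; both share the numerator Sxy."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # always >= 0
    syy = sum((y - mean_y) ** 2 for y in ys)  # always >= 0
    slope = sxy / sxx                 # slope of the line of best fit
    r = sxy / math.sqrt(sxx * syy)    # Pearson correlation coefficient
    return slope, r


# Share prices from Example #2: both values come out negative
slope, r = slope_and_r([45, 50, 53, 58, 60], [9, 8, 8, 7, 5])
print(round(slope, 3), round(r, 3))
```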
When r is 1 or –1, all the points fall exactly on the line
of best fit:
When r is greater than .5 or less than –.5, the points
are close to the line of best fit:
When r is between 0 and .3 or between 0 and –.3,
the points are far from the line of best fit:
When r is 0, a line of best fit is not helpful in
describing the relationship between the variables:
When to use the Pearson correlation coefficient
The Pearson correlation coefficient (r) is one of
several correlation coefficients that you need to
choose between when you want to measure a
correlation. The Pearson correlation coefficient is a
good choice when all of the following are true:
Both variables are quantitative: You will need to use a different method
if either of the variables is qualitative.
The variables are normally distributed: You can create a histogram of
each variable to verify whether the distributions are approximately
normal. It’s not a problem if the variables are a little non-normal.
The data have no outliers: Outliers are observations that don’t follow the
same patterns as the rest of the data. A scatterplot is one way to check
for outliers—look for points that are far away from the others.
The relationship is linear: “Linear” means that the relationship between
the two variables can be described reasonably well by a straight line.
You can use a scatterplot to check whether the relationship between
two variables is linear.
Pearson vs. Spearman’s rank correlation coefficients
Spearman’s rank correlation coefficient is another
widely used correlation coefficient. It’s a better
choice than the Pearson correlation coefficient when
one or more of the following is true:
The variables are ordinal.
The variables aren’t normally distributed.
The data includes outliers.
The relationship between the variables is non-linear and monotonic.
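One common way to compute Spearman’s rank correlation coefficient is to apply the Pearson formula to the ranks of the data, giving tied values the average of the ranks they occupy. The Python sketch below assumes that definition and uses hypothetical data where y = x³, a relationship that is monotonic but not linear: Spearman’s coefficient comes out as 1, while Pearson’s r is noticeably lower (about .93). The function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    # Spearman's coefficient: Pearson's r computed on the ranks
    return pearson(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 3 for x in xs]  # hypothetical data: monotonic but non-linear
print(round(pearson(xs, ys), 3), round(spearman(xs, ys), 3))
```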
Calculating the Pearson correlation coefficient
Below is a formula for calculating the Pearson correlation coefficient (r):
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
r = Pearson correlation coefficient
n = number of pairs of scores
∑xy = sum of the products of paired scores
∑x = sum of the x scores
∑y = sum of the y scores
∑x² = sum of the squared x scores
∑y² = sum of the squared y scores
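A direct Python translation of the formula, using the sums defined above. It is a sketch that assumes the inputs are two equal-length sequences of numbers; the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums n, Σx, Σy, Σxy, Σx², Σy²."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    n = len(xs)                                  # number of pairs of scores
    sum_x = sum(xs)                              # Σx
    sum_y = sum(ys)                              # Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Σxy
    sum_x2 = sum(x * x for x in xs)              # Σx²
    sum_y2 = sum(y * y for y in ys)              # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator
```

For example, `pearson_r([45, 50, 53, 58, 60], [9, 8, 8, 7, 5])` returns about –0.909 for the share prices in Example #2. On Python 3.10 or later, `statistics.correlation` from the standard library can serve as a cross-check.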
EXAMPLE
Imagine that you’re studying the relationship between newborns’ weight and length. You have the weights and lengths of the 10 babies born last month at your local hospital. After you convert the imperial measurements to metric, you enter the data in a table:
WEIGHT (kg) LENGTH (cm)
3.63 53.1
3.02 49.7
3.82 48.4
3.42 54.2
3.59 54.9
2.87 43.7
3.03 47.2
3.46 45.2
3.36 54.4
3.3 50.4
Step 1: Calculate the sums of x and y
Start by renaming the variables to “x” and “y.” It doesn’t matter
which variable is called x and which is called y—the formula will
give the same answer either way.
Next, add up the values of x and y. (In the formula, this step is
indicated by the Σ symbol, which means “take the sum of.”)
Step 2: Calculate x² and y² and their sums
Create two new columns that contain the squares of x and y. Take the sums of the new columns.
x       y       x²                   y²
3.63    53.1    (3.63)² = 13.18      (53.1)² = 2819.6
3.02    49.7    9.12                 2470.1
3.82    48.4    14.59                2342.6
3.42    54.2    11.7                 2937.6
3.59    54.9    12.89                3014
2.87    43.7    8.24                 1909.7
3.03    47.2    9.18                 2227.8
3.46    45.2    11.97                2043
3.36    54.4    11.29                2959.4
3.3     50.4    10.89                2540.2
                Σx² = 113.05         Σy² = 25264
Step 3: Calculate the cross product and its sum
In a final column, multiply together x and y (this is called the cross product). Take the sum of the new column.
x       y       x²                   y²                   xy (x*y)
3.63    53.1    (3.63)² = 13.18      (53.1)² = 2819.6     3.63*53.1 = 192.8
3.02    49.7    9.12                 2470.1               150.1
3.82    48.4    14.59                2342.6               184.9
3.42    54.2    11.7                 2937.6               185.4
3.59    54.9    12.89                3014                 197.1
2.87    43.7    8.24                 1909.7               125.4
3.03    47.2    9.18                 2227.8               143
3.46    45.2    11.97                2043                 156.4
3.36    54.4    11.29                2959.4               182.8
3.3     50.4    10.89                2540.2               166.3
                                                          Σxy = 1684.2
Step 4: Calculate r
Use the formula and the numbers you calculated in the
previous steps to find r.
n = 10
∑x = 33.5
∑y = 501.2
∑x² = 113.05
∑y² = 25264
∑xy = 1684.2
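The four calculation steps can be checked in Python from the raw data. This sketch assumes the first weight is 3.63 kg, the value used in Steps 2 and 3 and the one consistent with Σx = 33.5. Because it keeps full precision, it gives Σx² ≈ 113.04 and Σxy ≈ 1684.1 rather than the rounded 113.05 and 1684.2, and r ≈ .47. Substituting the rounded sums into the formula instead gives about .48, so it is safer to round only at the end.

```python
import math

# Newborn data from the table: x = weight (kg), y = length (cm)
weights = [3.63, 3.02, 3.82, 3.42, 3.59, 2.87, 3.03, 3.46, 3.36, 3.3]
lengths = [53.1, 49.7, 48.4, 54.2, 54.9, 43.7, 47.2, 45.2, 54.4, 50.4]

n = len(weights)
sum_x = sum(weights)                                   # Step 1: Σx
sum_y = sum(lengths)                                   # Step 1: Σy
sum_x2 = sum(x * x for x in weights)                   # Step 2: Σx²
sum_y2 = sum(y * y for y in lengths)                   # Step 2: Σy²
sum_xy = sum(x * y for x, y in zip(weights, lengths))  # Step 3: Σxy

# Step 4: plug the unrounded sums into the formula for r
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(f"Σx={sum_x:.1f} Σy={sum_y:.1f} Σx²={sum_x2:.2f} Σy²={sum_y2:.1f} Σxy={sum_xy:.1f}")
print(f"r = {r:.2f}")
```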
Testing for the significance of the Pearson correlation coefficient
The Pearson correlation coefficient can also be used to test whether the
relationship between two variables is significant.
The Pearson correlation of the sample is r. It is an estimate of rho (ρ), the
Pearson correlation of the population. Knowing r and n (the sample size),
we can infer whether ρ is significantly different from 0.
Null hypothesis (H0): ρ = 0
Alternative hypothesis (Ha): ρ ≠ 0
To test the hypotheses, you can either use software such as R or Stata, or you
can follow the four steps below.
Step 1: Calculate the t-value
Calculate the t value (a test statistic) using this formula:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
Example:
The weight and length of 10 newborns have a Pearson
correlation coefficient of .47. Since we know that n = 10 and
r = .47, we can calculate the t value:
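Substituting n = 10 and r = .47 into the t formula:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{0.47\sqrt{10-2}}{\sqrt{1-0.47^2}}
  = \frac{0.47\sqrt{8}}{\sqrt{0.7791}}
  \approx \frac{1.3294}{0.8827}
  \approx 1.506
```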
Step 2: Find the critical value of t
You can find the critical value of t (t*) in a t table. To use the table, you
need to know three things:
The degrees of freedom (df): For Pearson correlation tests, the
formula is df = n – 2.
Significance level (α): By convention, the significance level is
usually .05.
One-tailed or two-tailed: Most often, two-tailed is an appropriate
choice for correlations.
Example: For a two-tailed test of significance at α = .05 and df = 8, the
critical value of t (t*) is 2.306.
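Step 2 can be sketched in Python as a lookup in a small excerpt of a t table. The dictionary below holds the standard two-tailed critical values at α = .05 for df = 1 to 10 only; the names `T_CRIT_TWO_TAILED_05` and `critical_t` are hypothetical.

```python
# Two-tailed critical values of t at alpha = .05, excerpted from a standard t table
T_CRIT_TWO_TAILED_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
}


def critical_t(n_pairs: int) -> float:
    """Look up t* for a Pearson correlation test (df = n - 2, alpha = .05, two-tailed)."""
    df = n_pairs - 2
    if df not in T_CRIT_TWO_TAILED_05:
        raise KeyError(f"df = {df} is outside this small table; use a full t table")
    return T_CRIT_TWO_TAILED_05[df]


print(critical_t(10))  # 2.306 (df = 8)
print(critical_t(5))   # 3.182 (df = 3)
```

If SciPy is available, `scipy.stats.t.ppf(1 - 0.05 / 2, df)` returns the same two-tailed critical value for any df, which avoids the table's limited range.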
Step 3: Compare the t value to the critical value
Determine if the absolute t value is greater than the critical value of t.
“Absolute” means that if the t value is negative you should ignore the minus
sign.
Example: Comparing the t value to the critical value of t (t*)
t = 1.506
(t*) = 2.306
The t value is less than the critical value of t.
Step 4: Decide whether to reject the null hypothesis
If the t value is greater than the critical value, then the relationship is
statistically significant (p < α). The data allows you to reject the null
hypothesis and provides support for the alternative hypothesis.
If the t value is less than the critical value, then the relationship is not
statistically significant (p > α). The data doesn’t allow you to reject
the null hypothesis and doesn’t provide support for the alternative
hypothesis.
Example: Deciding whether to reject the null hypothesis
For the correlation between weight and length in a sample of 10
newborns, the t value is less than the critical value of t. Therefore, we
don’t reject the null hypothesis that the Pearson correlation
coefficient of the population (ρ) is 0. There is no significant
relationship between weight and length (p > .05).
(Note that a sample size of 10 is very small. It’s possible that you
would find a significant relationship if you increased the sample size.)
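Steps 1, 3, and 4 combine into a short decision rule. This Python sketch takes the critical value t* as an input (looked up in a t table as in Step 2) rather than computing it; the function names `t_statistic` and `reject_null` are hypothetical.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """Step 1: t = r * sqrt(n - 2) / sqrt(1 - r**2), used to test H0: rho = 0."""
    if not -1.0 < r < 1.0:
        raise ValueError("t is undefined when |r| = 1 (or r is outside -1..1)")
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 is positive")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def reject_null(r: float, n: int, t_crit: float) -> bool:
    """Steps 3-4: reject H0 when the absolute t value is greater than t*."""
    return abs(t_statistic(r, n)) > t_crit


# Newborn example: r = .47, n = 10, t* = 2.306 (two-tailed, alpha = .05, df = 8)
t = t_statistic(0.47, 10)
print(round(t, 3), reject_null(0.47, 10, 2.306))  # 1.506 False
```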
Example #2
There are two stocks, A and B. A saleslady wants to study the
relationship between the two stocks of 5 makeup products.
Their share prices on particular days are as follows:
Step 1: Calculate the sum of x and y
Σx = 45 + 50 + 53 + 58 + 60
Σx = 266
Σy = 9 + 8 + 8 + 7 + 5
Σy = 37
Step 2: Calculate x² and y² and their sums
x y x² y²
45 9 (45)² = 2025 (9)² = 81
50 8 2500 64
53 8 2809 64
58 7 3364 49
60 5 3600 25
Σx² = 2025 + 2500 + 2809 + 3364 + 3600
Σx² = 14298
Σy² = 81 + 64 + 64 + 49 + 25
Σy² = 283
Step 3: Calculate the cross product and its sum
x y x² y² xy(x*y)
45 9 (45)² = 2025 (9)² = 81 45*9= 405
50 8 2500 64 400
53 8 2809 64 424
58 7 3364 49 406
60 5 3600 25 300
Σxy = 405 + 400 + 424 + 406 + 300
Σxy = 1935
Step 4: Calculate r
n=5
∑x = 266
∑y = 37
∑x² = 14298
∑y² = 283
∑xy = 1935
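Substituting these sums into the formula for r gives a strong negative correlation (rounded only at the last step):

```latex
r = \frac{5(1935) - (266)(37)}{\sqrt{\left[5(14298) - 266^2\right]\left[5(283) - 37^2\right]}}
  = \frac{9675 - 9842}{\sqrt{(71490 - 70756)(1415 - 1369)}}
  = \frac{-167}{\sqrt{734 \times 46}}
  = \frac{-167}{\sqrt{33764}}
  \approx -0.91
```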
Testing for the significance of the Pearson correlation coefficient
Step 1: Calculate the t-value
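With n = 5 and the sums listed under Step 4 above, r ≈ –.9088 (about –.91). Keeping four decimals and substituting into the t formula gives the following; rounding r to –.91 first would give about –3.80 instead.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}
  = \frac{-0.9088\sqrt{5-2}}{\sqrt{1-(-0.9088)^2}}
  \approx \frac{-1.5741}{0.4172}
  \approx -3.77
```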
Step 2: Find the critical value of t
df = n – 2 = 5 – 2 = 3
α = .05
Critical value: 3.182
Step 3: Compare the t value to the critical value
t = –3.77 (absolute value 3.77)
(t*) = 3.182
The absolute t value is greater than the critical value of t.
Step 4: Decide whether to reject the null hypothesis
For the correlation between stock A and stock B in a
sample of 5 makeup products, the absolute t value is greater
than the critical value of t. Therefore, we reject the null
hypothesis that the Pearson correlation coefficient of
the population (ρ) is 0. There is a significant relationship
between stock A and stock B (p < .05).