L 10 Correlation

The document discusses correlation, focusing on the relationship between two or more variables through bivariate data analysis. It explains the correlation coefficient, its types (positive, negative, and zero correlation), and provides examples of each type. Additionally, it covers the importance of correlation in statistics, including Pearson's and Spearman's rank correlation coefficients.

Uploaded by

abdullahallmahmud000

Correlation

DR. K.M. EARFAN ALI

PROFESSOR
DEPARTMENT OF STATISTICS
Introduction
 In the previous two topics, we concentrated
entirely on distributions and measures of one
variable;
 but in reality, we normally collect data on
several items at once. We are interested in
links, or relationships, between the different
variables (or, sometimes, between variables
and attributes).
Bivariate Data
Definition: When two variables are observed on the same units so that
the relationship between them can be studied, the data are called
bivariate quantitative data.
Example
 the number of fish and their feeds

 fish production technique used

 weather conditions prevailing

 the surface temperature of the farm areas

 the effect of other farmers operating nearby.

Scatter Plots and Correlation
 A scatter plot (or scatter diagram) is used to
show the relationship between two variables
 Correlation analysis is used to measure
strength of the association (linear
relationship) between two variables
– Only concerned with strength of the
relationship
– No causal effect is implied
Correlation Coefficient
The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol 𝑟 represents the sample correlation coefficient.
The formula for 𝑟 is
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}.$$
The range of the correlation coefficient is −1 to +1. If 𝑥 and 𝑦 have a strong positive linear correlation, 𝑟 is close to +1.
If 𝑥 and 𝑦 have a strong negative linear correlation, 𝑟 is close to −1. If there is no linear correlation or a weak linear correlation, 𝑟 is close to 0.
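As a minimal sketch of the computational formula for 𝑟, the following Python function evaluates it from the raw sums. The function name `pearson_r` and the three small data sets are hypothetical illustrations, not part of the lecture.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient from the raw sums (computational formula)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Hypothetical illustrative data (not from the lecture).
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # points on an upward line
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # points on a downward line
r_flat = pearson_r(x, [2, 1, 3, 1, 2])   # no linear trend
print(r_up, r_down, r_flat)
```

The three calls illustrate the limiting cases described above: 𝑟 = +1, 𝑟 = −1, and 𝑟 = 0.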
Scatter Plot Examples

[Figure: scatter plots of y against x in three groups: strong relationships, weak relationships, and no relationship]
Scatter Plot Characteristics
 Rectangular coordinates
 Two quantitative variables
 One variable is called independent (X) and the
second is called dependent (Y)
 Points are not joined
 No frequency table
Scatter diagram
It is the simplest diagrammatic representation of bivariate data. For the
bivariate distribution (𝑥𝑖, 𝑦𝑖), 𝑖 = 1, 2, …, 𝑛, if the values of the
variables 𝑋 and 𝑌 are plotted along the 𝑋-axis and 𝑌-axis respectively in the
𝑥𝑦-plane, the diagram of dots so obtained is known as a scatter diagram.
Definition
 Correlation is the study of statistical
relationship between two or more variables.
 In other words, correlation is the degree or
intensity of association or inter-relationship
between two (or more) variables.
 The correlation is a measure of how close the
relationship between 𝑥 and 𝑦 is to a straight
line.
Karl Pearson’s Correlation Coefficient
 A measure of intensity or degree of linear
relationship between two variables is called
coefficient of correlation. Correlation is
measured by the coefficient of correlation
which is denoted by ρ.
 It is also called Pearson's correlation or
product moment correlation coefficient.
 It measures the nature and strength of the relationship between
two variables of the quantitative type.
Mathematical definition
 If 𝑥 and 𝑦 are two random variables of a bivariate population, then the
correlation coefficient between these variables is denoted by 𝜌𝑥𝑦 or 𝜌,
and that between the variables 𝑥 and 𝑦 of a sample is denoted by 𝑟𝑥𝑦 or 𝑟.
 The sign of 𝑟 denotes the nature of the association, while the magnitude
of 𝑟 denotes the strength of the association.
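$$r_{xy} = \frac{\operatorname{Cov}(x, y)}{\sqrt{v(x) \times v(y)}} \quad \text{(theoretical formula)}$$
$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \quad \text{(mathematical formula)}$$
$$= \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \dfrac{(\sum x)^2}{n}\right]\left[\sum y^2 - \dfrac{(\sum y)^2}{n}\right]}} \quad \text{(calculated formula)}$$
$$r_{xy} = \frac{SP(x, y)}{\sqrt{SS(x) \times SS(y)}}$$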
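The forms above are algebraically equivalent. The following Python sketch checks this numerically on a small hypothetical data set. The function names are illustrative, and the covariance form assumes the divisor 𝑛 is used for both the covariance and the variances (any common divisor cancels).

```python
from math import sqrt, isclose

def r_theoretical(xs, ys):
    """Cov(x, y) / sqrt(v(x) * v(y)), using divisor n throughout (an assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

def r_deviations(xs, ys):
    """Sum of products of deviations over the root of the sums of squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / sqrt(ss_x * ss_y)

def r_calculated(xs, ys):
    """Calculated formula built from raw sums only."""
    n = len(xs)
    sp = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return sp / sqrt(ss_x * ss_y)

# Hypothetical illustrative data (not from the lecture).
x = [2, 4, 5, 7, 9]
y = [1, 3, 4, 8, 9]
values = (r_theoretical(x, y), r_deviations(x, y), r_calculated(x, y))
print(values)
```

The calculated form is usually the most convenient by hand, because it needs only the five sums Σx, Σy, Σxy, Σx², and Σy².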
Types of correlation
Mainly, there are three types of correlation.
Depending on its extent and direction, there are five
types of correlation. Each type is
described mathematically and graphically below:
 Positive Correlation
• i) Perfect Positive Correlation
• ii) Partial Positive Correlation
 Negative Correlation
• i) Perfect Negative Correlation
• ii) Partial Negative Correlation
 Zero Correlation
Perfect Positive correlation

[Fig 1: Perfect positive correlation (r = +1), Y axis against X axis]
Perfect Positive correlation
 If the two variables always deviate in the same direction and by a
constant amount, i.e., if a one-unit increase in one variable results in a
corresponding fixed increase in the other variable, the correlation is said
to be perfect positive.
 In this case, the two variables, denoted by X and Y, are
directly proportional and fully correlated with
each other. The correlation coefficient is r = +1.
 That is, both variables rise or fall in the same
proportion.
Example
Perfect correlations are not found in nature, but
some relationships approach that extent, such
as height and weight, age and height, and age and
weight of cattle up to a certain age.

𝑋 varies directly and proportionately with 𝑌 (𝑋 ∝ 𝑌). If all the data points lie exactly on an
upward-sloping line, then 𝑟 will be +1 (Figure 1).
Perfect Negative Correlation

[Fig 2: Perfect negative correlation (r = -1), Y axis against X axis]
Perfect Negative correlation
If the two variables constantly deviate in opposite directions by a
constant amount, i.e., if a one-unit increase in one variable results in a
corresponding fixed decrease in the other variable, the correlation is said
to be perfect negative.
Example
 Perfect negative correlations are not found in
nature, but some relationships approach that extent,
such as mean weekly temperature and
number of colds in winter;

 pressure and volume of a gas at a particular
temperature, etc. X varies inversely as Y (X ∝ 1/Y).
Partial positive & negative correlation

[Fig 4: Partial positive correlation (0 < r < 1). Fig 5: Partial negative correlation (-1 < r < 0). Both panels: Y axis against X axis]
Partial positive correlation
If the two variables deviate in the same
direction, i.e., if an increase (or decrease) in
one variable results in a corresponding increase
(or decrease) in the other variable, the correlation is
said to be partial or moderately positive.
In this case, the coefficient 𝑟 lies strictly between 0 and +1,
i.e., 0 < 𝑟 < 1.
Example:
1. Infant mortality rate and overcrowding
2. Temperature and pulse rate;
3. Age and weight of fishes;
4. Plasma volume in ml and total circulating
albumin in gm.
5. Prices and supply of fish feed
6. Feed and yield of fish
Partial negative correlation
 If the two variables constantly deviate in the
opposite direction
 i.e., if increase (or decrease) in one variable
results in corresponding decrease (or increase)
in the other variable, correlation is said to be
inverse or negative.
Example
 Age and vital capacity in adult cattle;
 Income and infant mortality rate of cow;
 Rainfall and grass
In such a moderately negative correlation, the
scatter diagram will be of the same type, but the
imaginary mean line will rise from the extreme
values of one variable, as in the following figure.
Example
The following data give the number of boats operating and the weight
of fish caught. The scatter diagram of weight of fish against number
of boats follows the table.
Number of boats    Weight of fish (in kg)
67                 120
69                 125
85                 140
83                 160
74                 130
81                 180
97                 150
92                 140
114                200
85                 130
Total: 847         1475
[Figure: Scatter diagram of weight of fish (in kg) against number of boats, shown first as points only and then with a fitted linear trend line]
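The lecture shows only the scatter diagram for these data. As an illustrative check, the sample correlation coefficient for the ten pairs in the table can be computed with the SP/SS form of the formula. The variable names below are assumptions made for this sketch.

```python
from math import sqrt

# Data from the table above.
boats = [67, 69, 85, 83, 74, 81, 97, 92, 114, 85]
fish_kg = [120, 125, 140, 160, 130, 180, 150, 140, 200, 130]

n = len(boats)
sum_x, sum_y = sum(boats), sum(fish_kg)  # column totals: 847 and 1475

# Corrected sum of products and corrected sums of squares.
sp = sum(x * y for x, y in zip(boats, fish_kg)) - sum_x * sum_y / n
ss_x = sum(x * x for x in boats) - sum_x ** 2 / n
ss_y = sum(y * y for y in fish_kg) - sum_y ** 2 / n

r = sp / sqrt(ss_x * ss_y)
print(round(r, 2))  # 0.74
```

A value of about 0.74 is a partial positive correlation: more boats tend to go with a heavier catch, but the points do not lie exactly on a line.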
No relation
Uncorrelated or Zero Correlation:

[Fig 3: Zero correlation (r = 0)]

No or Zero Correlation
If there is no relationship between the two
variables, such that the value of one variable
changes while the other shows no corresponding change,
it is called no or zero correlation.
Example
 There is no correlation between a fish height
and the amount they earn.
 Height and pulse rate of fish;
Assumptions
 The variables concerned are linearly related,
i.e., plotting them on graph paper would give a
straight line.
 There exists a cause-and-effect relationship
between the related variables.
 A large number of independent causes
operate on both the correlated variables so as
to produce a normal distribution.
 Both the variables are random.
 Since the variables are independent, there
exists regression of one variable on the other.
Properties of Correlation Coefficient
1. The correlation coefficient is independent of
change of origin and scale.
2. The value of the correlation coefficient lies
between -1 and +1, i.e., -1 ≤ r ≤ +1.
3. The correlation coefficient is the geometric
mean of the two regression coefficients.
4. The correlation coefficient is symmetric with
respect to the dependence of the variables.
5. The value of the correlation coefficient is very
much influenced by large items, if they are
present in the data.
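Properties 1, 3, and 4 can be checked numerically. The sketch below uses hypothetical data and arbitrary shift and scale constants. It assumes positive scale factors (a negative factor on one variable flips the sign of 𝑟) and takes the sign of the geometric mean from the regression coefficients.

```python
from math import sqrt, copysign, isclose

def corrected_sums(xs, ys):
    """Return SP(x, y), SS(x), SS(y) computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp, ss_x, ss_y

def pearson_r(xs, ys):
    sp, ss_x, ss_y = corrected_sums(xs, ys)
    return sp / sqrt(ss_x * ss_y)

# Hypothetical illustrative data (not from the lecture).
x = [2, 4, 5, 7, 9]
y = [1, 3, 4, 8, 9]
r = pearson_r(x, y)

# Property 1: change of origin and (positive) scale leaves r unchanged.
u = [(v - 5) / 2 for v in x]
w = [(v - 4) / 3 for v in y]
r_transformed = pearson_r(u, w)

# Property 4: r is symmetric in the two variables.
r_swapped = pearson_r(y, x)

# Property 3: r is the geometric mean of the two regression coefficients.
sp, ss_x, ss_y = corrected_sums(x, y)
b_yx = sp / ss_x  # regression coefficient of y on x
b_xy = sp / ss_y  # regression coefficient of x on y
r_geometric = copysign(sqrt(b_yx * b_xy), sp)
print(r, r_transformed, r_swapped, r_geometric)
```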
Necessity of Studying Correlation
1. The Pearson correlation coefficient is used
for assessing the linear (straight line)
association between an 𝑋 and a 𝑌 variable,
and requires interval or ratio measurement.

2. The symbol for the sample correlation coefficient
is 𝑟, which is the sample estimate of 𝜌 that can
be obtained from a sample of pairs (𝑋, 𝑌) of
values of 𝑋 and 𝑌.
3. The correlation varies from negative one to
positive one (– 1 ≤ 𝑟 ≤ +1).
4. Correlation of +1 or –1 refers to a perfect
positive or negative 𝑋 , 𝑌 relationship,
respectively. Data falling exactly on a straight
line indicates that |𝑟| = 1.
Interpret r
 The value of 𝑟 ranges between -1 and +1.
 The value of 𝑟 denotes the strength of the
association, as illustrated by the following
scale.
-1 to -0.75: strong, indirect
-0.75 to -0.25: intermediate, indirect
-0.25 to 0: weak, indirect
0: no relation
0 to 0.25: weak, direct
0.25 to 0.75: intermediate, direct
0.75 to 1: strong, direct
-1 or +1: perfect correlation
Interpret r
 If 𝑟 is very close to +1, we say there is a strong
positive correlation: 𝑦 increases as 𝑥 increases, and the
relationship is close to a straight line.
 If 𝑟 is close to -1, there is a strong negative
correlation: 𝑦 decreases as 𝑥 increases.
 When 𝑟 is close to zero (either positive or
negative) there is very little relationship
between the two variables.
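The interpretation scale can be written as a small lookup. This is only a sketch: the function name `describe_r` is hypothetical, the cut-offs 0.25 and 0.75 are taken from the scale in this lecture, and assigning a boundary value to the stronger category is an assumption.

```python
def describe_r(r):
    """Describe a correlation coefficient using the 0.25 / 0.75 cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no relation"
    direction = "direct" if r > 0 else "indirect"
    size = abs(r)
    if size == 1:
        return f"perfect {direction}"
    if size >= 0.75:       # boundary treated as strong (assumption)
        strength = "strong"
    elif size >= 0.25:     # boundary treated as intermediate (assumption)
        strength = "intermediate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

for value in (0.82, 0.5, -0.1, 0.0, -1.0):
    print(value, describe_r(value))
```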
Spearman’s rank correlation
 Sometimes we come across statistical series in
which the variables under consideration are
not capable of quantitative measurement but
can be arranged in serial order.

 This happens when we are dealing with
qualitative characteristics (attributes) such as
honesty, beauty, character, morality, etc.
 Let the random variables X and Y denote the
ranks of the individuals in the characteristics A
and B respectively.
 If we assume that there is no tie, i.e., if no two
individuals get the same rank in a
characteristic, then, obviously, X and Y assume
numerical values ranging from 1 to N.
Spearman Rank Correlation Coefficient (rs)
1. It is a non-parametric measure of correlation.
2. This procedure makes use of the two sets of
ranks that may be assigned to the sample values
of X and Y.
3. Spearman's rank correlation coefficient can be
computed in the following cases:
a) Both variables are quantitative.
b) Both variables are qualitative ordinal.
c) One variable is quantitative and the other is
qualitative ordinal.
Example
Calculate Spearman’s rank correlation
coefficient between advertisement cost and
sales of fish from the following data:
Advertisement cost ('000 Tk.): 39 65 62 90 82 75 25 98 36 78
Sales (lakhs Tk.):             47 53 58 86 62 68 60 91 51 84
Solution:
Let X denote the advertisement cost ('000 Tk.)
and Y denote the sales (lakhs Tk.).
𝑿𝒊    𝒀𝒊    Rank of 𝑿𝒊 (𝒙𝒊)    Rank of 𝒀𝒊 (𝒚𝒊)    𝒅𝒊 = 𝒙𝒊 - 𝒚𝒊    𝒅𝒊²
39    47    8                 10                -2             4
65    53    6                 8                 -2             4
62    58    7                 7                 0              0
90    86    2                 2                 0              0
82    62    3                 5                 -2             4
75    68    5                 4                 1              1
25    60    10                6                 4              16
98    91    1                 1                 0              0
36    51    9                 9                 0              0
78    84    4                 3                 1              1
Total                                           0              30
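Here 𝑛 = 10 and the sum of the squared rank differences is 30, so
$$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 30}{10 \times 99} = 0.82$$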
• The value of 𝑟𝑠 denotes the magnitude and
nature of association giving the same
interpretation as simple 𝑟.
Comment:
There is a strong direct correlation between
advertisement cost and sales.
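The worked solution can be reproduced with a short Python sketch. The helper `ranks_desc` is a hypothetical name. It gives rank 1 to the largest value, as in the solution table, and assumes no tied values.

```python
# Data from the example above.
adv_cost = [39, 65, 62, 90, 82, 75, 25, 98, 36, 78]  # '000 Tk.
sales = [47, 53, 58, 86, 62, 68, 60, 91, 51, 84]     # lakhs Tk.

def ranks_desc(values):
    """Rank 1 goes to the largest value; assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

rank_x = ranks_desc(adv_cost)
rank_y = ranks_desc(sales)

# Rank differences and the sum of their squares.
d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d2 = sum(di * di for di in d)

n = len(d)
r_s = 1 - 6 * sum_d2 / (n * (n * n - 1))
print(sum_d2, round(r_s, 2))  # 30 0.82
```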
Problem
A psychologist wanted to compare two methods,
𝐴 and 𝐵, of teaching. He selected a random sample of
22 students.

He grouped them into 11 pairs so that the students
in a pair had approximately equal scores in an
intelligence test.

In each pair, one student was taught by method A
and the other by method B, and both were examined after the
course.
The marks obtained by them are as follows.
Pair: 1 2 3 4 5 6 7 8 9 10 11
A: 24 29 19 14 30 19 27 30 20 28 11
B: 37 35 16 26 23 27 19 20 16 11 21
Solutions
A     B     Rank of A (RA)    Rank of B (RB)    D = RA - RB    D²
24    37    6                 1                 5              25
29    35    3                 2                 1              1
19    16    8.5               9.5               -1             1
14    26    10                4                 6              36
30    23    1.5               5                 -3.5           12.25
19    27    8.5               3                 5.5            30.25
27    19    5                 8                 -3             9
30    20    1.5               7                 -5.5           30.25
20    16    7                 9.5               -2.5           6.25
28    11    4                 11                -7             49
11    21    11                6                 5              25
In the A series the values 19 and 30 each occur twice,
and in the B series 16 occurs twice. Tied values share the
average of the ranks they would otherwise occupy.
Apply the following formula:
$$r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
The value of 𝑟𝑠 denotes the magnitude and
nature of association, giving the same
interpretation as simple 𝑟.
Comment:
There is an indirect weak correlation between
the marks obtained under method A and method B.
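The lecture does not show the arithmetic for this problem, so the following Python sketch is offered as one way to carry it out. It assigns average ranks to tied marks and applies the formula as stated. It also evaluates a commonly taught tie-corrected variant, which adds m(m² − 1)/12 to the sum of squared differences for each group of m tied values. That correction is an assumption here and does not appear in the source.

```python
from collections import Counter

# Marks from the problem above.
marks_a = [24, 29, 19, 14, 30, 19, 27, 30, 20, 28, 11]
marks_b = [37, 35, 16, 26, 23, 27, 19, 20, 16, 11, 21]

def average_ranks_desc(values):
    """Rank 1 goes to the largest value; tied values share their average rank."""
    positions = {}
    for position, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

rank_a = average_ranks_desc(marks_a)
rank_b = average_ranks_desc(marks_b)
sum_d2 = sum((ra - rb) ** 2 for ra, rb in zip(rank_a, rank_b))

n = len(marks_a)
denominator = n * (n * n - 1)

# Formula exactly as stated in the lecture (no tie correction).
r_s_simple = 1 - 6 * sum_d2 / denominator

# Tie-corrected variant (assumption, not in the source).
tie_term = sum(m * (m * m - 1) / 12
               for series in (marks_a, marks_b)
               for m in Counter(series).values() if m > 1)
r_s_tied = 1 - 6 * (sum_d2 + tie_term) / denominator

print(sum_d2, round(r_s_simple, 4), round(r_s_tied, 4))
# approximately -0.0227 (as stated) and -0.0295 (tie-corrected)
```

Both versions give a coefficient slightly below zero, so the choice of formula does not change the interpretation for these data.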
Uses of correlation
1. It is used in physical and social sciences.
2. It is useful for economists to study the
relationship between variables like price, quantity,
etc. Businessmen estimate costs, sales, price, etc.
using correlation.
3. It is helpful in measuring the degree of
relationship between the variables like income
and expenditure, price and supply, supply and
demand etc.
4. Sampling error can be calculated.
5. It is the basis for the concept of regression.
