Unit-1 Correlation and Regression
4. CORRELATION ANALYSIS
“When the relationship is of a quantitative nature, the appropriate statistical tool for discovering
the existence of relation and measuring the intensity of relationship is known as correlation”
—CROXTON AND COWDEN
LEARNING OBJECTIVES
The statistical techniques discussed so far deal with only one variable. In many research
situations one has to consider two variables simultaneously to know whether they are related
linearly and, if so, what type of relationship exists between them. This leads to bivariate (two
variables) data analysis, namely correlation analysis. If two quantities vary in such a way that
movements (upward or downward) in one are accompanied by movements (upward or downward)
in the other, these quantities are said to be co-related or correlated.
The correlation concept will help to answer the following types of questions.
• Whether study time in hours is related with marks scored in the examination?
• Is it worth spending on advertisement for the promotion of sales?
• Whether a woman’s age and her systolic blood pressure are related?
• Is age of husband and age of wife related?
• Whether price of a commodity and demand related?
• Is there any relationship between rainfall and production of rice?
This chapter investigates the type and strength of the relationship that exists between two variables.
Progressive development in the methods of science and philosophy has been characterised by a
growing knowledge of such relationships.
In this chapter, we study simple correlation only; multiple correlation and partial correlation,
involving three or more variables, will be studied in higher classes.
The correlation between two variables may be of the following types:
1. Positive correlation
2. Negative correlation
3. Uncorrelated

1) Positive Correlation:
If the values of the two variables move in the same direction, the correlation is said to be positive.
In other words, if one variable increases, the other variable (on an average) also increases, or if one
variable decreases, the other variable (on an average) also decreases.
For example,
i) Income and savings
ii) Marks in Mathematics and marks in Statistics (i.e., a direct relationship pattern exists).
[Illustrations: the height of the lift rises or falls according as the height of the goods rises or falls; the starting position of writing depends on the height of the writer.]
2) Negative Correlation:
If the values of the two variables move in opposite directions, i.e., if one variable increases, the other variable (on an average) decreases, the correlation is said to be negative.
For example,
i) Price and demand
ii) Unemployment and purchasing power
3) Uncorrelated:
The variables are said to be uncorrelated if a change in one variable is not associated with any
consistent linear change in the other, that is, if the two variables do not vary together linearly.
Here r = 0.
Important note: Uncorrelated does not imply independence. Do not interpret r = 0 as the two
variables being independent; interpret it as the absence of any specific linear pattern, while a
non-linear relationship may still exist.
4) Perfect Positive Correlation
If the values of x and y increase or decrease proportionately then they are said to have
perfect positive correlation.
5) Perfect Negative Correlation
If the values of x and y increase or decrease proportionately but in opposite directions, then they are said to have perfect negative correlation.
The purpose of correlation analysis is to find whether a linear relationship exists between
the variables. However, the method of calculating the correlation coefficient depends on the type of
measurement scale, namely, ratio scale, ordinal scale or nominal scale.
1) Positive correlation
If the plotted points in the plane form a band showing a rising trend from the lower left-hand
corner to the upper right-hand corner, the two variables are positively correlated. In this case 0 < r < 1.
2) Negative correlation
If the plotted points in the plane form a band showing a falling trend from the upper left-hand
corner to the lower right-hand corner, the two variables are negatively correlated. In this case -1 < r < 0.
3) Uncorrelated
If the plotted points spread all over the plane without any trend, the two variables are
uncorrelated. In this case r = 0.
4) Perfect positive correlation
If all the plotted points lie on a straight line rising from the lower left-hand corner to the
upper right-hand corner, the two variables have perfect positive correlation. In this case r = +1.
5) Perfect negative correlation
If all the plotted points lie on a straight line falling from the upper left-hand corner to the
lower right-hand corner, the two variables have perfect negative correlation. In this case r = -1.
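The scatter diagrams described above can be reproduced for practice. Below is a minimal sketch (not part of the original text) in Python with matplotlib; the three data sets are hypothetical and generated only to show a rising band, a falling band and a patternless cloud.

```python
# Hypothetical data illustrating positive, negative and zero correlation
# on scatter diagrams (a sketch, not from the textbook).
import random
import matplotlib.pyplot as plt

random.seed(1)
x = [i + random.uniform(-1, 1) for i in range(30)]
y_pos = [xi + random.uniform(-3, 3) for xi in x]        # rising band: 0 < r < 1
y_neg = [30 - xi + random.uniform(-3, 3) for xi in x]   # falling band: -1 < r < 0
y_none = [random.uniform(0, 30) for _ in x]             # no linear pattern: r near 0

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
titles = ["Positive correlation", "Negative correlation", "Uncorrelated"]
for ax, y, title in zip(axes, [y_pos, y_neg, y_none], titles):
    ax.scatter(x, y)
    ax.set_title(title)
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
plt.tight_layout()
plt.show()
```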
4.4.2 Properties
1. The correlation coefficient between X and Y is same as the correlation coefficient between Y and
X (i.e, rxy = ryx ).
2. The correlation coefficient is free from the units of measurement of X and Y.
3. The correlation coefficient is unaffected by change of scale and origin.
Thus, if $u_i = \dfrac{x_i - A}{c}$ and $v_i = \dfrac{y_i - B}{d}$ with $c \neq 0$ and $d \neq 0$, $i = 1, 2, \ldots, n$, then

$r = \dfrac{n\sum_{i=1}^{n} u_i v_i - \left(\sum_{i=1}^{n} u_i\right)\left(\sum_{i=1}^{n} v_i\right)}{\sqrt{n\sum_{i=1}^{n} u_i^2 - \left(\sum_{i=1}^{n} u_i\right)^2}\;\sqrt{n\sum_{i=1}^{n} v_i^2 - \left(\sum_{i=1}^{n} v_i\right)^2}}$
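The invariance to a change of origin and scale can be checked numerically. The sketch below (not part of the original text) computes Pearson's r for the data of Example 4.1, which follows, both directly and after shifting the origins to A = 68, B = 69 and rescaling by positive constants; both give the same value.

```python
# Pearson's r is unchanged by u = (x - A)/c and v = (y - B)/d with c > 0, d > 0.
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    syy = sum(yi * yi for yi in y)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

x = [65, 66, 67, 67, 68, 69, 70, 72]   # heights of fathers (Example 4.1)
y = [67, 68, 65, 68, 72, 72, 69, 71]   # heights of sons

u = [(xi - 68) / 2 for xi in x]        # change of origin A = 68 and scale c = 2
v = [(yi - 69) / 3 for yi in y]        # change of origin B = 69 and scale d = 3

print(round(pearson_r(x, y), 3))       # 0.603
print(round(pearson_r(u, v), 3))       # 0.603 -- unchanged
```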
Example 4.1
The following data gives the heights(in inches) of father and his eldest son. Compute the
correlation coefficient between the heights of fathers and sons using Karl Pearson’s method.
Height of father 65 66 67 67 68 69 70 72
Height of son 67 68 65 68 72 72 69 71
Solution:

$r = \dfrac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\;\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$
Calculation
xi yi xi² yi² xiyi
65 67 4225 4489 4355
66 68 4356 4624 4488
67 65 4489 4225 4355
67 68 4489 4624 4556
68 72 4624 5184 4896
69 72 4761 5184 4968
70 69 4900 4761 4830
72 71 5184 5041 5112
Total 544 552 37028 38132 37560

$r = \dfrac{8(37560) - (544)(552)}{\sqrt{8(37028) - (544)^2}\;\sqrt{8(38132) - (552)^2}} = \dfrac{300480 - 300288}{\sqrt{288}\;\sqrt{352}} = \dfrac{192}{318.4} = 0.603$

Heights of father and son are positively correlated. It means that, on the average, if fathers are
tall then their sons will probably be tall, and if fathers are short, their sons will probably be short.
Short-cut method
Let A = 68 , B = 69, c = 1 and d = 1
xi yi ui = (xi – A)/c = xi – 68 vi = (yi – B)/d = yi – 69 ui² vi² uivi
65 67 -3 -2 9 4 6
66 68 -2 -1 4 1 2
67 65 -1 -4 1 16 4
67 68 -1 -1 1 1 1
68 72 0 3 0 9 0
69 72 1 3 1 9 3
70 69 2 0 4 0 0
72 71 4 2 16 4 8
Total 0 0 36 44 24
$r = \dfrac{n\sum u_i v_i - \left(\sum u_i\right)\left(\sum v_i\right)}{\sqrt{n\sum u_i^2 - \left(\sum u_i\right)^2}\;\sqrt{n\sum v_i^2 - \left(\sum v_i\right)^2}}$

$r = \dfrac{8 \times 24 - 0 \times 0}{\sqrt{8 \times 36 - 0}\;\sqrt{8 \times 44 - 0}} = \dfrac{192}{\sqrt{288}\;\sqrt{352}} = 0.603$
Note: The correlation coefficient computed by using direct method and short-cut method is the same.
Example 4.2
The following are the marks scored by 7 students in two tests in a subject. Calculate
coefficient of correlation from the following data and interpret.
Marks in test-1 12 9 8 10 11 13 7
Marks in test-2 14 8 6 9 11 12 3
Solution:
Let x denote marks in test-1 and y denote marks in test-2.
xi yi xi² yi² xiyi
12 14 144 196 168
9 8 81 64 72
8 6 64 36 48
10 9 100 81 90
11 11 121 121 121
13 12 169 144 156
7 3 49 9 21
Total 70 63 728 651 676
$r = \dfrac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\;\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$

Here n = 7, $\sum x_i = 70$, $\sum y_i = 63$, $\sum x_i^2 = 728$, $\sum y_i^2 = 651$ and $\sum x_i y_i = 676$.

$r = \dfrac{7(676) - (70)(63)}{\sqrt{7(728) - (70)^2}\;\sqrt{7(651) - (63)^2}} = \dfrac{4732 - 4410}{\sqrt{5096 - 4900}\;\sqrt{4557 - 3969}} = \dfrac{322}{14 \times 24.25} = \dfrac{322}{339.5} = 0.95$

Interpretation: Marks in test-1 and marks in test-2 are highly positively correlated; students who score high marks in test-1 also tend to score high marks in test-2.
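For readers who wish to verify such computations, the following check (not part of the textbook) reproduces Example 4.2 with NumPy.

```python
# Cross-check of Example 4.2 using NumPy's built-in correlation matrix.
import numpy as np

test1 = np.array([12, 9, 8, 10, 11, 13, 7])
test2 = np.array([14, 8, 6, 9, 11, 12, 3])

r = np.corrcoef(test1, test2)[0, 1]   # off-diagonal entry is r
print(round(r, 2))                    # 0.95
```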
Correlation does not imply a causal relationship; a high correlation between two variables does
not mean that a change in one variable causes a change in the other.
NOTE
1. Uncorrelated : Uncorrelated (r = 0) implies no ‘linear relationship’. But there may exist non-
linear relationship (curvilinear relationship).
Example: Age and health care are related. Children and elderly people need much more health
care than middle aged persons as seen from the following graph.
[Figure: health care needs plotted against age form a U-shaped curve, high in childhood and old age and low for middle-aged adults.]
However, if we compute the linear correlation r for such data, it may be zero implying
age and health care are uncorrelated, but non-linear correlation is present.
2. Spurious Correlation : The word ‘spurious’ from Latin means ‘false’ or ‘illegitimate’. Spurious
correlation means an association extracted from correlation coefficient that may not exist in reality.
Spearman's rank correlation coefficient is given by

$\rho = 1 - \dfrac{6\sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$

where $D_i = R_{1i} - R_{2i}$ is the difference between the ranks of the i-th pair.
Interpretation
Spearman’s rank correlation coefficient is a statistical measure of the strength of a
monotonic (increasing/decreasing) relationship between paired data. Its interpretation is similar
to that of Pearson’s. That is, the closer to the ±1 means the stronger the monotonic relationship.
0.01 to 0.19: “Very Weak Agreement” (-0.01) to (-0.19): “Very Weak Disagreement”
0.80 to 1.0: “Very Strong Agreement” (-0.80) to (-1.0): “Very Strong Disagreement”
Example 4.3
Two referees in a flower beauty competition rank the 10 types of flowers as follows:
Referee A 1 6 5 10 3 2 4 9 7 8
Referee B 6 4 9 8 1 2 3 10 5 7
Use the rank correlation coefficient and find out what degree of agreement is between the
referees.
Solution:
Here n = 10, and the differences between the paired ranks, $D_i = R_{1i} - R_{2i}$, give $\sum_{i=1}^{n} D_i^2 = 60$.

$\rho = 1 - \dfrac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 60}{10(10^2 - 1)} = 1 - \dfrac{360}{990} = 1 - 0.364 = 0.636$
Interpretation: Degree of agreement between the referees ‘A’ and ‘B’ is 0.636 and they have “strong
agreement” in evaluating the competitors.
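The result can be cross-checked (this check is not part of the textbook) with SciPy, which computes the same rank correlation:

```python
# Cross-check of Example 4.3: Spearman's rho for the two referees' ranks.
from scipy.stats import spearmanr

referee_a = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
referee_b = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

rho, p_value = spearmanr(referee_a, referee_b)
print(round(rho, 3))   # 0.636
```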
Example 4.4
Calculate the Spearman’s rank correlation coefficient for the following data.
Candidates 1 2 3 4 5
Marks in Tamil 75 40 52 65 60
Marks in English 25 42 35 29 33
Solution:
Ranking the marks in each subject (rank 1 for the highest mark) and taking the differences $D_i$ of the paired ranks gives $\sum D_i^2 = 40$ with n = 5.

$\rho = 1 - \dfrac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 40}{5(5^2 - 1)} = 1 - \dfrac{240}{120} = 1 - 2 = -1$
Interpretation: This perfect negative rank correlation (-1) indicates that the scorings in the two
subjects totally disagree. The student who is best in Tamil is weakest in English, and vice versa.
Example 4.5
Quotations of index numbers of equity share prices of a certain joint stock company and
the prices of preference shares are given below.
Years 2013 2014 2015 2016 2017 2018 2019
Equity shares 97.5 99.4 98.6 96.2 95.1 98.4 97.1
Preference shares 75.1 75.9 77.1 78.2 79 74.6 76.2
Using the method of rank correlation determine the relationship between equity shares
and preference shares prices.
Solution:
Ranking the equity share prices and the preference share prices separately and taking the rank differences gives $\sum D_i^2 = 90$ with n = 7.

$\rho = 1 - \dfrac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \dfrac{6 \times 90}{7(7^2 - 1)} = 1 - \dfrac{540}{336} = 1 - 1.6071 = -0.6071$
Interpretation: There is a negative rank correlation between equity share prices and preference
share prices; the two series show strong disagreement.
In the case of repeated (tied) ranks, the formula is

$\rho = 1 - \dfrac{6\left[\sum D_i^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)}$

where $m_i$ is the number of repetitions of the i-th rank.
Example 4.6
Compute the rank correlation coefficient for the following data of the marks obtained by
8 students in the Commerce and Mathematics.
Marks in Commerce 15 20 28 12 40 60 20 80
Marks in Mathematics 40 30 50 30 20 10 30 60
Solution:

$\rho = 1 - \dfrac{6\left[\sum D_i^2 + \frac{1}{12}(m_1^3 - m_1) + \frac{1}{12}(m_2^3 - m_2) + \cdots\right]}{n(n^2 - 1)}$
Repetitions of ranks
In Commerce (X), the mark 20 is repeated two times, corresponding to ranks 3 and 4. Therefore, the
average rank 3.5 is assigned to ranks 3 and 4, with m1 = 2.
In Mathematics (Y), 30 is repeated three times corresponding to ranks 3, 4 and 5. Therefore,
4 is assigned for ranks 3,4 and 5 with m2=3.
Assigning these average ranks to the tied marks and computing the rank differences gives $\sum D_i^2 = 81.5$. Therefore,

$\rho = 1 - \dfrac{6\left[81.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right]}{8(8^2 - 1)} = 1 - \dfrac{6\left[81.5 + 0.5 + 2\right]}{504} = 1 - \dfrac{504}{504} = 0$
Interpretation: Marks in Commerce and Mathematics are uncorrelated.
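As a check (not from the textbook), SciPy's spearmanr assigns average ranks to tied observations, the same device used above, and it reproduces ρ = 0 for this data:

```python
# Cross-check of Example 4.6: Spearman's rho with repeated (tied) marks.
from scipy.stats import spearmanr

commerce = [15, 20, 28, 12, 40, 60, 20, 80]
maths = [40, 30, 50, 30, 20, 10, 30, 60]

rho, p_value = spearmanr(commerce, maths)
print(round(rho, 3))   # 0.0 -- the marks are uncorrelated
```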
Yule's coefficient of association:

$Q = \dfrac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$
Note 1: The usage of the symbol α is not to be confused with level of significance.
Note 2: (AB): Number with attributes AB etc.
This coefficient ranges from –1 to +1. The values between –1 and 0 indicate inverse
relationship (association) between the attributes. The values between 0 and +1 indicate direct
relationship (association) between the attributes.
Example 4.7
Out of 1800 candidates appeared for a competitive examination 625 were successful; 300 had
attended a coaching class and of these 180 came out successful. Test for the association of attributes
attending the coaching class and success in the examination.
Solution:
N = 1800
A: Success in examination α: No success in examination
B: Attended the coaching class β: Not attended the coaching class
(A) = 625, (B) = 300, (AB) = 180
B β Total
A 180 445 625
α 120 1055 1175
Total 300 1500 N = 1800

$Q = \dfrac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} = \dfrac{180 \times 1055 - 445 \times 120}{180 \times 1055 + 445 \times 120} = \dfrac{189900 - 53400}{189900 + 53400} = \dfrac{136500}{243300} = 0.56$

Since Q = 0.56 > 0, there is a positive (direct) association between attending the coaching class and success in the examination.
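A small helper (not part of the original text) computes Yule's Q for any 2 × 2 table of attribute frequencies; applied to this example it reproduces the value above.

```python
# Yule's coefficient of association for a 2x2 table of attribute frequencies.
def yules_q(n_ab, n_a_beta, n_alpha_b, n_alpha_beta):
    # Q = [(AB)(alpha beta) - (A beta)(alpha B)] / [(AB)(alpha beta) + (A beta)(alpha B)]
    num = n_ab * n_alpha_beta - n_a_beta * n_alpha_b
    den = n_ab * n_alpha_beta + n_a_beta * n_alpha_b
    return num / den

# Example 4.7: (AB) = 180, (A beta) = 445, (alpha B) = 120, (alpha beta) = 1055
print(round(yules_q(180, 445, 120, 1055), 3))   # 0.561 -- positive association
```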
Remark: Consistency in the data using contingency table may be found as under.
Construct a 2 × 2 contingency table for the given information. If at least one of the cell
frequencies is negative then there is inconsistency in the given data.
Example 4.8
Verify whether the given data: N = 100, (A) = 75, (B) = 60 and (AB) = 15 is consistent.
Solution:
The given information is presented in the following contingency table.
B β Total
A 15 60 75
α 45 -20 25
Total 60 40 N = 100

Since the cell frequency (αβ) = –20 is negative, the given data is inconsistent.
POINTS TO REMEMBER
Correlation study is about finding the linear relationship between two variables.
Correlation is not causation. Sometimes the correlation may be spurious.
Correlation coefficient lies between –1 and +1.
Pearson's correlation coefficient provides both the type and the intensity of the relationship,
for data measured on the ratio scale.
Spearman’s correlation measures the relationship between the two ordinal variables.
Yule’s coefficient of Association measures the association between two dichotomous
attributes.
8. If ∑D² = 0, the rank correlation coefficient is ______.
45. Find the Karl Pearson’s coefficient of correlation for the following data.
Wages 100 101 102 102 100 99 97 98 96 95
Cost of living 98 99 99 97 95 92 95 94 90 91
How are the wages and cost of living correlated?
46. Calculate the Karl Pearson’s correlation coefficient between the marks (out of 10) in statistics
and mathematics of 6 students.
Student 1 2 3 4 5 6
Statistics 7 4 6 9 3 8
Mathematics 8 5 4 8 3 6
48. Calculate the Spearman’s rank correlation coefficient between price and supply from the
following data.
Price 4 6 8 10 12 14 16 18
Supply 10 15 20 25 30 35 40 45
49. A random sample of 5 college students is selected and their marks in Tamil and English are
found to be:
Tamil 85 60 73 40 90
English 93 75 65 50 80
Calculate Spearman’s rank correlation coefficient.
50. Calculate Spearman’s coefficient of rank correlation for the following data.
x 53 98 95 81 75 71 59 55
y 47 25 32 37 30 40 39 45
51. Calculate the coefficient of correlation for the following data using ranks.
Mark in Tamil 29 24 25 27 30 31
Mark in English 29 19 30 33 37 36
52. From the following data calculate the rank correlation coefficient.
x 49 34 41 10 17 17 66 25 17 58
y 14 14 25 7 16 5 21 10 7 20
Yule’s coefficient
53. Can vaccination be regarded as a preventive measure for Hepatitis B from the data given below?
Of 1500 persons in a locality, 400 were attacked by Hepatitis B; 750 had been vaccinated, and among
them only 75 were attacked.
III 38. n = 10
40. r = 0.85
41. (αβ ) = −50 , The given data is inconsistent
45. r = 0.847 wages and cost of living are highly positively correlated.
46. r = 0.8081. Statistics and mathematics marks are highly positively correlated.
47. ρ = 0.8929 price of tea and coffee are highly positively correlated.
49. ρ = 0.8
52. ρ = +0.733
54. There is a positive association between not attacked and not vaccinated.
5. REGRESSION ANALYSIS
Francis Galton (1822-1911) was born in a wealthy family. The youngest of nine
children, he appeared as an intelligent child. Galton’s progress in education was
not smooth. He dabbled in medicine and then studied Mathematics at Cambridge.
In fact he subsequently freely acknowledged his weakness in formal Mathematics,
but this weakness was compensated by an exceptional ability to understand the
meaning of data. Many statistical terms, which are in current usage were coined
by Galton. For example, correlation is due to him, as is regression, and he was the
originator of terms and concepts such as quartile, decile and percentile, and of the use of median as
the midpoint of a distribution.
The concept of regression comes from genetics and was popularized by Sir Francis Galton
during the late 19th century with the publication of "Regression towards Mediocrity in Hereditary Stature".
Galton observed that extreme characteristics (e.g., height) in parents are not passed on completely to
their offspring. An examination of publications of Sir Francis Galton and Karl Pearson revealed that
Galton's work on inherited characteristics of sweet peas led to the initial conceptualization of linear
regression. Subsequent efforts by Galton and Pearson brought many techniques of multiple regression
and the product-moment correlation coefficient.
LEARNING OBJECTIVES
Introduction
The correlation coefficient is a useful statistical tool for describing the type (positive or
negative or uncorrelated) and intensity of the linear relationship (such as moderate or high) between
two variables. But it fails to give a mathematical functional relationship for prediction purposes.
Regression analysis is a vital statistical method for obtaining functional relationship between a
dependent variable and one or more independent variables. More specifically, regression analysis
helps one to understand how the typical value of the dependent variable (or ‘response variable’)
changes when any one of the independent variables (regressor(s) or predictor(s)) is varied, while
the other independent variables are held fixed. It helps to determine the impact of changes in
the value(s) of the independent variable(s) upon changes in the value of the dependent variable.
Regression analysis is widely used for prediction.
Types of ‘Regression’
Based on the kind of relationship between the dependent variable and the set of independent
variable(s), there arise two broad categories of regression, viz., linear regression and non-linear regression.
If the relationship is linear and there is only one independent variable, then the regression
is called simple linear regression. On the other hand, if the relationship is linear and the
number of independent variables is two or more, then the regression is called multiple linear
regression. If the relationship between the dependent variable and the independent variable(s) is
not linear, then the regression is called non-linear regression.
NOTE
There are many reasons for the presence of the error term in the linear regression model. It is also known as measurement error. In some situations, it indicates the presence of several variables other than the present set of regressors.

[Figure: the independent variable X is related to the dependent variable Y through the regression line Y = a + bX + e, the error term e accounting for the scatter about the line.]
The general form of the simple linear regression equation is Y = a + bX + e, where 'X' is the
independent variable, 'Y' is the dependent variable, 'a' is the intercept, 'b' is the slope of the line and
'e' is the error term. This equation can be used to estimate the value of the response variable (Y)
based on the given values of the predictor variable (X) within its domain.
Before going for further study, the following points are to be kept in mind.
• Both the independent and dependent variables must be measured at the interval scale.
• There must be linear relationship between independent and dependent variables.
• Linear regression is very sensitive to outliers (extreme observations). Outliers can distort the
regression line severely and, consequently, the estimated values of Y.
The method of least squares helps us to find the values of unknowns ‘a’ and ‘b’ in such a
way that the following two conditions are satisfied:
• Sum of the residuals is zero. That is, $\sum_{i=1}^{n} (y_i - \hat{y}_i) = 0$.
• Sum of the squares of the residuals, $E(a, b) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, is the least.

i.e., $E(a, b) = \sum_{i=1}^{n} (y_i - a - bx_i)^2$.

Differentiating E(a, b) partially with respect to 'a' and 'b' and equating each derivative to zero,

$\dfrac{\partial E(a,b)}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - bx_i) = 0$

$\dfrac{\partial E(a,b)}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - a - bx_i) = 0$

These give

$na + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$

$a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$
These equations are popularly known as normal equations. Solving these equations for ‘a’
and ‘b’ yield the estimates â and b̂ .
$\hat{a} = \bar{y} - \hat{b}\bar{x}$

and

$\hat{b} = \dfrac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y}}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$
It may be seen that in the estimate of ‘b’, the numerator and denominator are respectively
the sample covariance between X and Y, and the sample variance of X. Hence, the estimate of ‘b’
may be expressed as
$\hat{b} = \dfrac{Cov(X, Y)}{V(X)}$
Further, it may be noted that for notational convenience the denominator of $\hat{b}$ above is
mentioned as the variance of X. But the definition of sample variance remains valid as defined in
Chapter 1, that is, $\dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
From Chapter 4, the above estimate can be expressed using, rXY , Pearson’s coefficient of the
simple correlation between X and Y, as
$\hat{b} = r_{XY}\,\dfrac{SD(Y)}{SD(X)}$
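A minimal sketch (not in the textbook) of these estimates in Python follows; the data used are those of Exercise 37 at the end of the chapter, for which y = 2 + x exactly.

```python
# Least squares estimates: b-hat = Cov(X, Y) / V(X), a-hat = y-bar - b-hat * x-bar.
def least_squares_line(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov_xy = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar
    var_x = sum(xi * xi for xi in x) / n - x_bar ** 2
    b_hat = cov_xy / var_x
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

x = [1, 2, 3, 4, 5]
y = [3, 4, 5, 6, 7]                 # Exercise 37 data: y = 2 + x
print(least_squares_line(x, y))     # (2.0, 1.0)
```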
Example 5.1
Construct the simple linear regression equation of Y on X if n = 7, $\sum x_i = 113$, $\sum x_i^2 = 1983$, $\sum y_i = 182$ and $\sum x_i y_i = 3186$.
Solution:
The simple linear regression equation of Y on X to be fitted for given data is of the form
$\hat{Y} = a + bx$   (1)
The values of ‘a’ and ‘b’ have to be estimated from the sample data solving the following
normal equations.
$na + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$   (2)

$a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$   (3)
Substituting the given sample information in (2) and (3), the above equations can be
expressed as
7 a + 113 b = 182 (4)
113 a + 1983 b = 3186 (5)
(4) ×113 ⇒ 791 a + 12769 b = 20566
(5) ×7 ⇒ 791 a + 13881 b = 22302
(−) (−) (−)
−1112 b = −1736
1736
⇒b = = 1.56
1112
b = 1.56
Substituting this in (4) it follows that,
7 a + 113 × 1.56 = 182
7 a + 176.28 = 182
7 a = 182 – 176.28
= 5.72
Hence, a = 0.82

Therefore, the fitted simple linear regression equation of Y on X is $\hat{Y} = 0.82 + 1.56x$.
Example 5.2
Fit a simple linear regression equation of productivity (Y) on man-hours (X) for the following data.

Man-hours 3.6 4.8 7.2 6.9 10.7 6.1 7.9 9.5 5.4
Productivity (in units) 9.3 10.2 11.5 12 18.6 13.2 10.8 22.7 12.7
Solution:
The simple linear regression equation to be fitted for the given data is
$\hat{Y} = a + bx$

Here, the estimates of a and b can be calculated using their least squares estimates

$\hat{a} = \bar{y} - \hat{b}\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} y_i - \hat{b}\,\dfrac{1}{n}\sum_{i=1}^{n} x_i$

$\hat{b} = \dfrac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - (\bar{x} \times \bar{y})}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$

or equivalently

$\hat{b} = \dfrac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$
From the given data, the following calculations are made with n=9
Man-hours (xi) Productivity (yi) xi² xiyi
3.6 9.3 12.96 33.48
4.8 10.2 23.04 48.96
7.2 11.5 51.84 82.8
6.9 12 47.61 82.8
10.7 18.6 114.49 199.02
6.1 13.2 37.21 80.52
7.9 10.8 62.41 85.32
9.5 22.7 90.25 215.65
5.4 12.7 29.16 66.42
Total $\sum x_i = 62.1$  $\sum y_i = 121$  $\sum x_i^2 = 468.97$  $\sum x_i y_i = 894.97$
$\hat{b} = \dfrac{9(894.97) - (62.1)(121)}{9(468.97) - (62.1)^2} = \dfrac{8054.73 - 7514.1}{4220.73 - 3856.41} = \dfrac{540.63}{364.32} = 1.48$

Thus, $\hat{b} = 1.48$.
Now $\hat{a}$ can be calculated using $\hat{b}$ as

$\hat{a} = \dfrac{121}{9} - 1.48 \times \dfrac{62.1}{9} = 13.44 - 10.21 = 3.23$

Hence, $\hat{a} = 3.23$.
Therefore, the required simple linear regression equation fitted to the given data is

$\hat{Y} = 3.23 + 1.48x$
It should be noted that the value of Y can be estimated using the above fitted equation for
the values of x in its range i.e., 3.6 to 10.7.
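A quick cross-check of this fit (not part of the textbook) with NumPy's polyfit follows; the small difference in the intercept arises because the hand computation above rounds b-hat to 1.48 before finding a-hat.

```python
# Cross-check of Example 5.2 with NumPy, and a prediction inside the data range.
import numpy as np

man_hours = np.array([3.6, 4.8, 7.2, 6.9, 10.7, 6.1, 7.9, 9.5, 5.4])
productivity = np.array([9.3, 10.2, 11.5, 12, 18.6, 13.2, 10.8, 22.7, 12.7])

b_hat, a_hat = np.polyfit(man_hours, productivity, 1)   # slope, then intercept
print(round(a_hat, 2), round(b_hat, 2))                 # about 3.21 and 1.48

x_new = 8.0                                             # inside the range 3.6 to 10.7
print(round(a_hat + b_hat * x_new, 2))                  # estimated productivity
```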
In the estimated simple linear regression equation of Y on X,

$\hat{Y} = \hat{a} + \hat{b}x = (\bar{y} - \hat{b}\bar{x}) + \hat{b}x$

that is,

$\hat{Y} - \bar{y} = \hat{b}(x - \bar{x})$
It shows that the simple linear regression equation of Y on X has the slope b̂ and the
corresponding straight line passes through the point of averages ( x , y ) . The above representation
of straight line is popularly known in the field of Coordinate Geometry as ‘Slope-Point form’. The
above form can be applied in fitting the regression equation for given regression coefficient b̂
and the averages x and y .
As mentioned in Section 5.3, there may be two simple linear regression equations for each
X and Y. Since the regression coefficients of these regression equations are different, it is essential
to distinguish the coefficients with different symbols. The regression coefficient of the simple
linear regression equation of Y on X may be denoted as bYX and the regression coefficient of the
simple linear regression equation of X on Y may be denoted as bXY.
$b_{XY} = \dfrac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\bar{y}}{\frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2}$

and the corresponding regression equation of X on Y is $\hat{X} - \bar{x} = b_{XY}(y - \bar{y})$.

Also, the relationships between Karl Pearson's coefficient of correlation and the regression coefficients are

$b_{XY} = r_{XY}\,\dfrac{SD(X)}{SD(Y)}$  and  $b_{YX} = r_{XY}\,\dfrac{SD(Y)}{SD(X)}$
1. The correlation coefficient is the geometric mean of the two regression coefficients, that is, $r_{XY} = \pm\sqrt{b_{YX}\,b_{XY}}$, taking the common sign of the regression coefficients.
2. It is clear from property 1 that both regression coefficients must have the same sign, i.e., either both are positive or both are negative.
3. If one of the regression coefficients is greater than unity, the other must be less than unity.
4. The correlation coefficient will have the same sign as that of the regression coefficients.
5. Arithmetic mean of the regression coefficients is greater than the correlation coefficient.
$\dfrac{b_{XY} + b_{YX}}{2} \geq r_{XY}$
6. Regression coefficients are independent of the change of origin but not of scale.
3. The angle between the two regression lines is $\tan^{-1}\left(\dfrac{m_1 - m_2}{1 + m_1 m_2}\right)$, where m1 and m2 are the
slopes of the regression lines X on Y and Y on X respectively.
4. The angle between the regression lines indicates the degree of dependence between the variables.
5. The two regression lines intersect at the point $(\bar{X}, \bar{Y})$.
Example 5.3
For the following data, find the regression equation of Y on X and estimate the likely demand (Y) when X = 25.

x 12 14 15 14 18 17
y 42 40 45 47 39 45
Solution:
xi ui = xi – 15 ui² yi vi = yi – 43 vi² uivi
12 -3 9 42 -1 1 3
14 -1 1 40 -3 9 3
15 0 0 45 2 4 0
14 -1 1 47 4 16 -4
18 3 9 39 -4 16 -12
17 2 4 45 2 4 4
Total 90 0 24 258 0 50 -6
$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{90}{6} = 15$,  $\bar{y} = \dfrac{\sum y_i}{n} = \dfrac{258}{6} = 43$

Since c = d = 1,

$b_{YX} = b_{VU} = \dfrac{n\sum u_i v_i - \left(\sum u_i\right)\left(\sum v_i\right)}{n\sum u_i^2 - \left(\sum u_i\right)^2} = \dfrac{6(-6) - 0 \times 0}{6(24) - 0^2} = \dfrac{-36}{144} = -0.25$

Since $\bar{u} = \bar{v} = 0$, the intercept $\bar{v} - b_{VU}\bar{u} = 0$, so the regression line of V on U is $\hat{V} = b_{VU}\,u = -0.25u$. In terms of the original variables,

$\hat{Y} - 43 = -0.25(x - 15)$, i.e., $\hat{Y} = 46.75 - 0.25x$

When x = 25, the likely demand is $\hat{Y} = 46.75 - 0.25 \times 25 = 40.5$.
Example 5.4
The following data gives the experience of machine operators and their performance
ratings as given by the number of good parts turned out per 50 pieces.
Operators 1 2 3 4 5 6 7 8
Experience (X) 8 11 7 10 12 5 4 6
Ratings (Y) 11 30 25 44 38 25 20 27
Obtain the regression equations and estimate the ratings corresponding to the experience
x=15.
Solution:
xi yi xiyi xi² yi²
8 11 88 64 121
11 30 330 121 900
7 25 175 49 625
10 44 440 100 1936
12 38 456 144 1444
5 25 125 25 625
4 20 80 16 400
6 27 162 36 729
Total 63 220 1856 555 6780
Regression equation of Y on X: $\hat{Y} - \bar{y} = b_{YX}(x - \bar{x})$

$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{63}{8} = 7.875$,  $\bar{y} = \dfrac{\sum y_i}{n} = \dfrac{220}{8} = 27.5$

$b_{YX} = \dfrac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \dfrac{8(1856) - (63)(220)}{8(555) - (63)^2} = \dfrac{14848 - 13860}{4440 - 3969} = \dfrac{988}{471} = 2.098$
$\hat{Y} - 27.5 = 2.098(x - 7.875)$
$\hat{Y} - 27.5 = 2.098x - 16.52$
$\hat{Y} = 2.098x + 10.98$

When x = 15,
$\hat{Y} = 2.098 \times 15 + 10.98 = 31.47 + 10.98 = 42.45$
Regression equation of X on Y: $\hat{X} - \bar{x} = b_{XY}(y - \bar{y})$

$b_{XY} = \dfrac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum y_i^2 - \left(\sum y_i\right)^2} = \dfrac{8(1856) - (63)(220)}{8(6780) - (220)^2} = \dfrac{14848 - 13860}{54240 - 48400} = \dfrac{988}{5840} = 0.169$

$\hat{X} - 7.875 = 0.169(y - 27.5)$
$\hat{X} = 0.169y - 4.648 + 7.875$
$\hat{X} = 0.169y + 3.227$
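The two coefficients of this example can also be used to verify the properties stated earlier. The check below (not part of the textbook) confirms that both coefficients share the sign of r, that r² equals bYX·bXY, and that their arithmetic mean is not less than r.

```python
# Numerical check of the regression-coefficient properties using Example 5.4 data.
import numpy as np

x = np.array([8, 11, 7, 10, 12, 5, 4, 6])        # experience
y = np.array([11, 30, 25, 44, 38, 25, 20, 27])   # ratings

cov_xy = np.cov(x, y, bias=True)[0, 1]           # population covariance
b_yx = cov_xy / np.var(x)
b_xy = cov_xy / np.var(y)
r = np.corrcoef(x, y)[0, 1]

print(round(b_yx, 3), round(b_xy, 3))                # 2.098 and 0.169
print(round(r, 3), round(np.sqrt(b_yx * b_xy), 3))   # both 0.596
print((b_yx + b_xy) / 2 >= r)                        # True
```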
Example 5.5
A random sample of 5 school students is selected and their marks in Statistics and Accountancy are found to be:

Statistics 85 60 73 40 90
Accountancy 93 75 65 50 80

Obtain the two regression equations.
Solution:
The two regression lines are:
Regression equation of Y on X: $\hat{Y} - \bar{y} = b_{YX}(x - \bar{x})$
Regression equation of X on Y: $\hat{X} - \bar{x} = b_{XY}(y - \bar{y})$
xi yi ui = xi – 60 vi = yi – 75 uivi ui² vi²
85 93 25 18 450 625 324
60 75 0 0 0 0 0
73 65 13 -10 -130 169 100
40 50 -20 -25 500 400 625
90 80 30 5 150 900 25
Total 348 363 48 -12 970 2094 1074

$\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{348}{5} = 69.6$,  $\bar{y} = \dfrac{\sum y_i}{n} = \dfrac{363}{5} = 72.6$
Since the means are not integers and the numbers are large, we take deviations from the origins
A = 60 for x and B = 75 for y and then solve the problem.
Calculation of bYX

$b_{YX} = b_{VU} = \dfrac{n\sum u_i v_i - \left(\sum u_i\right)\left(\sum v_i\right)}{n\sum u_i^2 - \left(\sum u_i\right)^2} = \dfrac{5(970) - (48)(-12)}{5(2094) - (48)^2} = \dfrac{4850 + 576}{10470 - 2304} = \dfrac{5426}{8166} = 0.664$

$b_{YX} = b_{VU} = 0.664$
$\hat{Y} - 72.6 = 0.664(x - 69.6)$
$\hat{Y} - 72.6 = 0.664x - 46.214$
$\hat{Y} = 0.664x + 26.386$
Regression equation of X on Y,
X x bXY y y
^
Calculation of bXY

$b_{XY} = b_{UV} = \dfrac{n\sum u_i v_i - \left(\sum u_i\right)\left(\sum v_i\right)}{n\sum v_i^2 - \left(\sum v_i\right)^2} = \dfrac{5(970) - (48)(-12)}{5(1074) - (-12)^2} = \dfrac{4850 + 576}{5370 - 144} = \dfrac{5426}{5226} = 1.038$

$b_{XY} = b_{UV} = 1.038$
$\hat{X} - 69.6 = 1.038(y - 72.6)$
$\hat{X} - 69.6 = 1.038y - 75.359$
$\hat{X} = 1.038y - 5.759$
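A short check (not part of the original text) that the change of origin used above leaves both regression coefficients unchanged:

```python
# Example 5.5: shifting the origin (u = x - 60, v = y - 75) does not change bYX or bXY.
import numpy as np

x = np.array([85, 60, 73, 40, 90])   # marks in Statistics
y = np.array([93, 75, 65, 50, 80])   # marks in Accountancy
u, v = x - 60, y - 75

def pcov(a, b):
    return np.cov(a, b, bias=True)[0, 1]   # population covariance

print(round(pcov(x, y) / np.var(x), 3), round(pcov(u, v) / np.var(u), 3))  # bYX = 0.664 twice
print(round(pcov(x, y) / np.var(y), 3), round(pcov(u, v) / np.var(v), 3))  # bXY = 1.038 twice
```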
Example 5.6
Solution:
The regression coefficient of Y on X is bYX = –1.5
The regression coefficient of X on Y is bXY = 0.6
The two regression coefficients are of opposite signs, which contradicts the property that both
regression coefficients must have the same sign. So the given equations cannot be regression lines.
Example 5.7

Mean S.D.
Yield of wheat (kg per unit area) 10 8
Annual rainfall (inches) 8 2
Solution:
Let us denote the dependent variable yield by Y and the independent variable rainfall by X.
Regression equation of Y on X is given by
$\hat{Y} - \bar{y} = r_{XY}\,\dfrac{SD(Y)}{SD(X)}\,(x - \bar{x})$
Example 5.8
For 50 students of a class, the regression equation of marks in Statistics (X) on marks in
Accountancy (Y) is 3Y – 5X + 180 = 0. The mean marks in Accountancy is 50 and the variance of
marks in Statistics is 16/25 of the variance of marks in Accountancy. Find the mean marks in
Statistics and the coefficient of correlation between marks in the two subjects.
Solution:
We are given that:
n = 50, regression equation of X on Y: 3Y – 5X + 180 = 0,
$\bar{y} = 50$, $V(X) = \dfrac{16}{25}V(Y)$, and V(Y) = 25.

We have to find (i) $\bar{x}$ and (ii) $r_{XY}$.
(i) Calculation of $\bar{x}$

Since $(\bar{x}, \bar{y})$ is the point of intersection of the two regression lines, it lies on the regression
line 3Y – 5X + 180 = 0. Hence,

$3\bar{y} - 5\bar{x} + 180 = 0$
$3(50) - 5\bar{x} + 180 = 0$
$5\bar{x} = 150 + 180 = 330$
$\bar{x} = \dfrac{330}{5} = 66$
(ii) Calculation of the coefficient of correlation

$3Y - 5X + 180 = 0 \Rightarrow 5X = 3Y + 180 \Rightarrow X = 36 + 0.6\,Y$

Hence, $b_{XY} = 0.6$.

Since $b_{XY} = r_{XY}\,\dfrac{SD(X)}{SD(Y)}$,

$r_{XY}^2 = b_{XY}^2\,\dfrac{V(Y)}{V(X)} = 0.36 \times \dfrac{V(Y)}{V(X)}$   (1)

Given that V(Y) = 25 and $V(X) = \dfrac{16}{25}V(Y) = \dfrac{16}{25} \times 25 = 16$.

Substituting in (1),

$r_{XY}^2 = 0.36 \times \dfrac{25}{16} = 0.5625$, so $r_{XY} = 0.75$

(the positive square root is taken since $b_{XY}$ is positive).
Example 5.9
If the two regression coefficients are $b_{YX} = \dfrac{5}{6}$ and $b_{XY} = \dfrac{9}{20}$, what would be the value of $r_{XY}$?

Solution:
The correlation coefficient $r_{XY} = \pm\sqrt{b_{YX}\,b_{XY}} = \sqrt{\dfrac{5}{6} \times \dfrac{9}{20}} = \sqrt{0.375} = 0.61$

Since both $b_{YX}$ and $b_{XY}$ are positive, the correlation coefficient between X and Y is positive, and
hence the positive square root is taken.
Correlation vs Regression

1. Correlation: indicates only the nature and extent of the linear relationship.
   Regression: studies the impact of the independent variable on the dependent variable; it is used for prediction.

2. Correlation: if the linear correlation coefficient is positive / negative, then the two variables are positively / negatively correlated.
   Regression: if the regression coefficient is positive, then for every unit increase in x, the corresponding average increase in y is bYX. Similarly, if the regression coefficient is negative, then for every unit increase in x, the corresponding average decrease in y is bYX.

3. Correlation: one of the variables can be taken as x and the other as y.
   Regression: care must be taken in the choice of the independent and dependent variables. We cannot arbitrarily assign x as the independent variable and y as the dependent variable.

4. Correlation: it is symmetric in x and y, i.e., rXY = rYX.
   Regression: it is not symmetric in x and y; bXY and bYX have different meanings and interpretations.
POINTS TO REMEMBER
There are several types of regression: simple linear regression, multiple linear regression
and non-linear regression.
In simple linear regression there are two linear regression lines: Y on X and X on Y.
In the linear regression equation Y = a + bX + e, 'X' is the independent variable, 'Y' is the
dependent variable, 'a' is the intercept, 'b' is the slope of the line and 'e' is the error term.
Both regression lines pass through the point $(\bar{X}, \bar{Y})$.
The "method of least squares" gives the line of best fit.
Both regression coefficients have the same sign, either positive or negative.
The sign of the regression coefficients and the sign of the correlation coefficient are the
same.
4. In the regression equation X = a + bY + e, 'b' is
a) correlation coefficient of Y on X b) correlation coefficient of X on Y
c) regression coefficient of Y on X d) regression coefficient of X on Y
5. bYX =
a) $r_{XY}\dfrac{SD(X)}{SD(Y)}$  b) $r_{XY}\dfrac{SD(Y)}{SD(X)}$  c) $\dfrac{SD(X)}{SD(Y)}$  d) $\dfrac{SD(Y)}{SD(X)}$
a) cov(X, Y) b) SD(X)
c) correlation coefficient d) coefficient of variance
10. Regression analysis helps in establishing a functional relationship between ______ variables.
a) 2 or more variables b) 2 variables
c) 3 variables d) none of these
13. If the two lines of regression are perpendicular to each other then rXY =
a) 0 b) 1 c) –1 d) 0.5
c) $\tan^{-1}\left(\dfrac{m_1 - m_2}{1 + m_1 m_2}\right)$  d) none of the above
16. bXY =
a) $r_{XY}\dfrac{SD(Y)}{SD(X)}$  b) $r_{XY}\dfrac{SD(X)}{SD(Y)}$  c) $r_{XY}\,SD(X)\,SD(Y)$  d) $\dfrac{1}{b_{YX}}$
17. Regression equation of X on Y is
a) Y = a + bYX x + e b) Y = bXY x + a + e
c) X = a + bXY y + e d) X = bYX y + a + e
18. For the regression equation $2\hat{Y} = 0.605x + 351.58$, the regression coefficient of Y on X is
a) Y = 8 + 0.7 X b) X = 8 + 0.7 Y
c) Y = 0.7 + 8 X d) X = 0.7 + 8 Y
36. Given $\bar{x} = 90$, $\bar{y} = 70$, bXY = 1.36 and bYX = 0.61. When y = 50, find the most probable value of X.
37. Compute the two regression equations from the following data.
x 1 2 3 4 5
y 3 4 5 6 7
If x = 3.5, what will be the value of $\hat{Y}$?
43. The following table shows the age (X) and systolic blood pressure (Y) of 8 persons.
Age (X) 56 42 60 50 54 49 39 45
Blood pressure (Y) 160 130 125 135 145 115 140 120
Fit a simple linear regression model, Y on X and estimate the blood pressure of a person of
60 years.
44. Find the regression equation of X on Y given that n = 5, ∑x = 30, ∑y = 40, ∑xy = 214, ∑x2 = 220,
∑y2 = 340.
45. Given the following data, estimate the marks in statistics obtained by a student who has
scored 60 marks in English.
Mean of marks in Statistics = 80, Mean of marks in English = 50, S.D of marks in Statistics =
15, S.D of marks in English = 10 and Coefficient of correlation = 0.4.
46. Find the linear regression equation of percentage worms (Y) on size of the crop (X) based on
the following seven observations.
47. In a correlation analysis, between production (X) and price of a commodity (Y) we get the
following details.
Variance of X = 36.
The regression equations are:
12X – 15Y + 99 = 0 and 60 X – 27 Y =321
Calculate (a) The average value of X and Y.
(b) Coefficient of correlation between X and Y.