
Unit-1 Correlation and Regression

1. The document discusses correlation analysis, which examines the relationship between two variables. It introduces Karl Pearson and Charles Spearman, who made important contributions to the development of correlation coefficients. 2. There are different types of correlation - positive correlation means the variables increase or decrease together, while negative correlation means they change in opposite directions. 3. The document covers the definition and uses of correlation analysis. It aims to help students understand correlation coefficients and how to calculate and interpret them to analyze bivariate data.

Uploaded by

Shyam Sundar
Copyright
© © All Rights Reserved

CHAPTER 4

CORRELATION ANALYSIS

Karl Pearson (1857-1936) was an English mathematician and biostatistician. He founded
the world’s first university statistics department at University College, London in 1911.
The linear correlation coefficient is also called the Pearson product moment correlation
coefficient. It was developed by Karl Pearson from a related idea of Francis Galton (see
Regression analysis for Galton’s contribution). It was the first of the correlation measures
to be developed and is commonly used.

Charles Edward Spearman (1863-1945) was an English psychologist. After serving 15 years
in the Army, he began a PhD in Experimental Psychology and obtained his degree in 1906.
Spearman was strongly influenced by the work of Galton and developed rank correlation
in 1904. He also pioneered factor analysis in statistics.

“When the relationship is of a quantitative nature, the appropriate statistical tool for discovering
the existence of relation and measuring the intensity of relationship is known as correlation”
—CROXTON AND COWDEN

LEARNING OBJECTIVES

The student will be able to

• learn the meaning, definition and the uses of correlation.
• identify the types of correlation.
• understand correlation coefficient for different types of measurement scales.
• differentiate different types of correlation using scatter diagram.
• calculate Karl Pearson’s coefficient of correlation, Spearman’s rank correlation coefficient
and Yule’s coefficient of association.
• interpret the given data with the help of coefficient of correlation.

12th Std Statistics 106

12th_Statistics_EM_Unit_4.indd 106 3/4/2019 1:36:35 PM


Introduction

“Figure as far as you can, then add judgment”

The statistical techniques discussed so far deal with only one variable. In many research
situations one has to consider two variables simultaneously to know whether they are linearly
related and, if so, what type of relationship exists between them. This leads to bivariate
(two-variable) data analysis, namely correlation analysis. If two quantities vary in such a
way that movements (upward or downward) in one are accompanied by movements (upward
or downward) in the other, these quantities are said to be co-related or correlated.
The correlation concept helps to answer questions of the following types.
• Is study time in hours related to marks scored in the examination?
• Is it worth spending on advertisement for the promotion of sales?
• Are a woman’s age and her systolic blood pressure related?
• Are the age of husband and the age of wife related?
• Are the price of a commodity and its demand related?
• Is there any relationship between rainfall and production of rice?

4.1 DEFINITION OF CORRELATION


Correlation is a statistical measure which helps in analyzing the interdependence of two
or more variables. In this chapter the dependence between only two variables is considered.
1. A.M. Tuttle defines correlation as:
“An analysis of the co-variation of two or more variables is usually called correlation”
2. Ya-kun-chou defines correlation as:
“The attempts to determine the degree of relationship between variables”.
Correlation analysis is the process of studying the strength of the relationship between
two related variables. A high correlation means that the variables have a strong linear relationship
with each other, while a low correlation means that the variables are hardly linearly related. The
type and intensity of correlation are measured through correlation analysis. The measure of
correlation is the correlation coefficient or correlation index. It is a pure number, free from the
units of measurement.
Uses of correlation

• Investigates the type and strength of the relationship that exists between the two variables.
• Progressive development in the methods of science and philosophy has been characterized by
the rich knowledge of relationship.

4.2 TYPES OF CORRELATION


1. Simple (Linear) correlation (2 variables only): The correlation between the given two variables.
It is denoted by $r_{xy}$
2. Partial correlation (more than 2 variables): The correlation between any two variables while
removing the effect of other variables. It is denoted by $r_{xy.z\ldots}$



3. Multiple correlation (more than 2 variables): The correlation between a group of variables and
a variable which is not included in that group. It is denoted by $R_{y.(xz\ldots)}$

In this chapter, we study simple correlation only. Multiple correlation and partial correlation,
which involve three or more variables, will be studied in higher classes.

4.2.1 Simple correlation or Linear correlation


Here, we are dealing with data involving two related variables and generally we assign a
symbol ‘x ’ to scores of one variable and symbol ‘y ’ to scores of the other variable. There are five
types in simple correlation. They are
1. Positive correlation (Direct correlation)

2. Negative correlation (Inverse correlation)

3. Uncorrelated

4. Perfect positive correlation

5. Perfect negative correlation


1) Positive correlation (Direct correlation)

The variables are said to be positively correlated if larger values of x are associated with
larger values of y and smaller values of x are associated with smaller values of y. In other
words, if both the variables vary in the same direction then the correlation is said to be
positive: if one variable increases, the other variable (on an average) also increases, and if one
variable decreases, the other variable (on an average) also decreases.

[Figure: Positive or Direct Correlation. X and Y rise together or fall together; things move in the same direction.]

For example,
i) Income and savings
ii) Marks in Mathematics and Marks in Statistics. (i.e.,Direct relationship pattern exists).

[Figure: two illustrations of positive correlation. (i) X: height of goods, Y: height position of the lift; the height of the lift increases or decreases as the height of goods increases or decreases. (ii) The starting position of writing depends on the height of the writer.]



2) Negative correlation (Inverse correlation)

The variables are said to be negatively correlated if smaller values of x are associated with
larger values of y or larger values of x are associated with smaller values of y. That is, variables
varying in opposite directions are said to be negatively correlated. In other words, if one
variable increases the other variable decreases and vice versa.

[Figure: Negative or Inverse relationship. One variable goes down while the other goes up; things move in opposite directions.]

For example,
i) Price and demand
ii) Unemployment and purchasing power
3) Uncorrelated:

The variables are said to be uncorrelated if smaller values of x are associated with both smaller
and larger values of y, and larger values of x are likewise associated with both larger and smaller
values of y. If the two variables do not associate linearly, they are said to be uncorrelated. Here r = 0.
Important note: Uncorrelated does not imply independence. That is, r = 0 should not be
interpreted as “the two variables are independent”; it means only that no specific linear pattern
exists, although there may be a non-linear relationship.
4) Perfect Positive Correlation

If the values of x and y increase or decrease proportionately then they are said to have
perfect positive correlation.
5) Perfect Negative Correlation

If x increases and y decreases proportionately or if x decreases and y increases


proportionately, then they are said to have perfect negative correlation.
Correlation Analysis

The purpose of correlation analysis is to find the existence of linear relationship between
the variables. However, the method of calculating correlation coefficient depends on the types of
measurement scale, namely, ratio scale or ordinal scale or nominal scale.



Statistical tool selection

Methods to find correlation

1. Scatter diagram
2. Karl Pearson’s product moment correlation coefficient: ‘r’
3. Spearman’s Rank correlation coefficient: ‘ρ’
4. Yule’s coefficient of Association: ‘Q’

NOTE
For higher order dimension of nominal or categorical variables in a contingency table, use
chi-square test for independence of attributes. (Refer Chapter 2)

4.3 SCATTER DIAGRAM


A scatter diagram is the simplest diagrammatic representation of bivariate data. One
variable is represented along the X-axis and the other variable along the Y-axis. Each pair of
values is plotted as a point on the two dimensional graph. The diagram of points so obtained is
known as a scatter diagram. The direction of flow of the points shows the type of correlation that
exists between the two given variables.
1) Positive correlation

If the plotted points in the plane form a band and they show a rising trend from the lower
left hand corner to the upper right hand corner, the two variables are positively correlated.
In this case 0 < r < 1

2) Negative correlation

If the plotted points in the plane form a band and they show a falling trend from the upper
left hand corner to the lower right hand corner, the two variables are negatively correlated.
In this case -1 < r < 0

3) Uncorrelated

If the plotted points spread over the plane then the two variables are uncorrelated.
In this case r = 0

4) Perfect positive correlation

If all the plotted points lie on a straight line from the lower left hand corner to the upper
right hand corner then the two variables have perfect positive correlation.
In this case r = +1



5) Perfect Negative correlation

If all the plotted points lie on a straight line falling from the upper left hand corner to the
lower right hand corner, the two variables have perfect negative correlation. In this case r = -1

4.3.1 Merits and Demerits of scatter diagram


Merits
• It is a simple and non-mathematical method of studying correlation between the variables.
• It is not influenced by the extreme items
• It is the first step in investigating the relationship between two variables.
• It gives a rough idea at a glance whether there is a positive correlation, negative correlation or
uncorrelated.
Demerits
• We get an idea about the direction of correlation but we cannot establish the exact strength of
correlation between the variables.
• No mathematical formula is involved.

4.4 KARL PEARSON’S CORRELATION COEFFICIENT


When there exists some relationship between two measurable variables, we compute the
degree of relationship using the correlation coefficient.
Co-variance
Let (X,Y) be a bivariate normal random variable where V(X) and V(Y) exist. Then, the
covariance between X and Y is defined as
cov(X,Y) = E[(X-E(X))(Y-E(Y))] = E(XY) – E(X)E(Y)
If $(x_i, y_i)$, i = 1, 2, ..., n is a set of n realisations of (X,Y), then the sample covariance between
X and Y can be calculated from
$$\operatorname{cov}(X,Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \frac{1}{n}\sum_{i=1}^{n}x_i y_i - \bar{x}\,\bar{y}$$
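The two forms of the sample covariance (mean product of deviations, and mean of products minus product of means) can be checked numerically. The following is a minimal sketch; the function name `sample_covariance` and the data are hypothetical, chosen only for illustration.

```python
def sample_covariance(xs, ys):
    """Sample covariance with divisor n, computed in two equivalent ways."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Definition: mean of the products of deviations from the means.
    by_deviations = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    # Shortcut: mean of the products minus the product of the means.
    by_shortcut = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    return by_deviations, by_shortcut


# Hypothetical data (not from the text): study hours and marks.
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
cov_dev, cov_short = sample_covariance(hours, marks)
print(cov_dev, cov_short)  # both equal 1.2 up to floating-point rounding
```

The shortcut form needs only the running totals of x, y and xy, which is why the worked examples in this chapter tabulate those columns.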

4.4.1 Karl Pearson’s coefficient of correlation


When X and Y are linearly related and (X,Y) has a bivariate normal distribution, the
co-efficient of correlation between X and Y is defined as
$$r(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sqrt{V(X)\,V(Y)}}$$
This is also called the product moment correlation co-efficient, which was defined by Karl Pearson.
Based on a given set of n paired observations $(x_i, y_i)$, i = 1, 2, ..., n the sample correlation
co-efficient between X and Y can be calculated from
$$r(X,Y) = \frac{\frac{1}{n}\sum_{i=1}^{n}x_i y_i - \bar{x}\,\bar{y}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}x_i^2 - \bar{x}^2}\;\sqrt{\frac{1}{n}\sum_{i=1}^{n}y_i^2 - \bar{y}^2}}$$



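or, equivalently
$$r(X,Y) = \frac{n\sum_{i=1}^{n}x_i y_i - \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{\sqrt{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}\;\sqrt{n\sum_{i=1}^{n}y_i^2 - \left(\sum_{i=1}^{n}y_i\right)^2}}$$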

4.4.2 Properties
1. The correlation coefficient between X and Y is the same as the correlation coefficient between Y and
X (i.e., $r_{xy} = r_{yx}$).
2. The correlation coefficient is free from the units of measurements of X and Y
3. The correlation coefficient is unaffected by change of scale and origin.
Thus, if $u_i = \frac{x_i - A}{c}$ and $v_i = \frac{y_i - B}{d}$ with c > 0 and d > 0 (more generally, c and d
non-zero with the same sign), i = 1, 2, ..., n, then
$$r = \frac{n\sum_{i=1}^{n}u_i v_i - \sum_{i=1}^{n}u_i \sum_{i=1}^{n}v_i}{\sqrt{n\sum_{i=1}^{n}u_i^2 - \left(\sum_{i=1}^{n}u_i\right)^2}\;\sqrt{n\sum_{i=1}^{n}v_i^2 - \left(\sum_{i=1}^{n}v_i\right)^2}}$$
where A and B are arbitrary values.


Remark 1: If the widths between the values of the variables are not equal then take c = 1 and d = 1.
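Property 3 can be verified directly from the definitions. The short derivation below is a sketch that uses only the substitution $u_i = (x_i - A)/c$, $v_i = (y_i - B)/d$ stated above; it also shows why the signs of c and d matter.

```latex
\begin{aligned}
u_i - \bar{u} &= \frac{x_i - \bar{x}}{c}, \qquad v_i - \bar{v} = \frac{y_i - \bar{y}}{d} \\
\operatorname{cov}(u,v) &= \frac{\operatorname{cov}(x,y)}{cd}, \qquad
\sqrt{V(u)\,V(v)} = \frac{\sqrt{V(x)\,V(y)}}{|c|\,|d|} \\
r_{uv} &= \frac{\operatorname{cov}(u,v)}{\sqrt{V(u)\,V(v)}} = \frac{|c|\,|d|}{cd}\; r_{xy}
\end{aligned}
```

Hence $r_{uv} = r_{xy}$ when c and d have the same sign, and $r_{uv} = -r_{xy}$ when their signs differ. The constants A and B cancel in the deviations and play no role.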
Interpretation
The correlation coefficient lies between -1 and +1. i.e. -1 ≤ r ≤ 1
• A positive value of ‘r’ indicates positive correlation.
• A negative value of ‘r’ indicates negative correlation
• If r = +1, then the correlation is perfect positive
• If r = –1, then the correlation is perfect negative.
• If r = 0, then the variables are uncorrelated.
• If r ≥ 0.7 then the correlation will be of higher degree. In interpretation we use the
adjective ‘highly’
• If X and Y are independent, then rxy = 0. However the converse need not be true.

Example 4.1
The following data gives the heights(in inches) of father and his eldest son. Compute the
correlation coefficient between the heights of fathers and sons using Karl Pearson’s method.

Height of father 65 66 67 67 68 69 70 72
Height of son 67 68 65 68 72 72 69 71



Solution:
Let x denote height of father and y denote height of son. The data is on the ratio scale.
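We use Karl Pearson’s method.
$$r = \frac{n\sum_{i=1}^{n}x_i y_i - \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{\sqrt{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}\;\sqrt{n\sum_{i=1}^{n}y_i^2 - \left(\sum_{i=1}^{n}y_i\right)^2}}$$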
Calculation
        $x_i$    $y_i$    $x_i^2$    $y_i^2$    $x_i y_i$
        65     67     4225     4489     4355
        66     68     4356     4624     4488
        67     65     4489     4225     4355
        67     68     4489     4624     4556
        68     72     4624     5184     4896
        69     72     4761     5184     4968
        70     69     4900     4761     4830
        72     71     5184     5041     5112
Total   544    552    37028    38132    37560

$$r = \frac{8 \times 37560 - 544 \times 552}{\sqrt{8 \times 37028 - (544)^2}\;\sqrt{8 \times 38132 - (552)^2}} = 0.603$$

Heights of father and son are positively correlated. It means that, on the average, if fathers are
tall then sons will probably be tall, and if fathers are short then sons will probably be short.
Short-cut method
Let A = 68 , B = 69, c = 1 and d = 1
$x_i$    $y_i$    $u_i = (x_i - A)/c = x_i - 68$    $v_i = (y_i - B)/d = y_i - 69$    $u_i^2$    $v_i^2$    $u_i v_i$
65     67     -3     -2      9      4     6
66     68     -2     -1      4      1     2
67     65     -1     -4      1     16     4
67     68     -1     -1      1      1     1
68     72      0      3      0      9     0
69     72      1      3      1      9     3
70     69      2      0      4      0     0
72     71      4      2     16      4     8
Total          0      0     36     44    24

$$r = \frac{n\sum_{i=1}^{n}u_i v_i - \sum_{i=1}^{n}u_i \sum_{i=1}^{n}v_i}{\sqrt{n\sum_{i=1}^{n}u_i^2 - \left(\sum_{i=1}^{n}u_i\right)^2}\;\sqrt{n\sum_{i=1}^{n}v_i^2 - \left(\sum_{i=1}^{n}v_i\right)^2}}$$



$$r = \frac{8 \times 24 - 0 \times 0}{\sqrt{8 \times 36 - (0)^2}\;\sqrt{8 \times 44 - (0)^2}} = \frac{8 \times 24}{\sqrt{8 \times 36}\;\sqrt{8 \times 44}} = 0.603$$
Note: The correlation coefficient computed by using direct method and short-cut method is the same.
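The agreement between the direct and short-cut methods in Example 4.1 can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `pearson_r` is an assumption made for illustration, and it implements the computational formula used in the solution above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from raw sums (the computational formula)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Data of Example 4.1.
father = [65, 66, 67, 67, 68, 69, 70, 72]
son = [67, 68, 65, 68, 72, 72, 69, 71]

# Direct method.
r_direct = pearson_r(father, son)

# Short-cut method: shift the origin to A = 68 and B = 69 (c = d = 1).
u = [x - 68 for x in father]
v = [y - 69 for y in son]
r_shortcut = pearson_r(u, v)

print(round(r_direct, 3), round(r_shortcut, 3))  # 0.603 0.603
```

Shifting the origin only makes the hand arithmetic smaller; the numerator (192) and the two terms under the square roots (288 and 352) are identical in both methods.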

Example 4.2
The following are the marks scored by 7 students in two tests in a subject. Calculate
coefficient of correlation from the following data and interpret.

Marks in test-1 12 9 8 10 11 13 7
Marks in test-2 14 8 6 9 11 12 3

Solution:
Let x denote marks in test-1 and y denote marks in test-2.
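        $x_i$    $y_i$    $x_i^2$    $y_i^2$    $x_i y_i$
        12     14     144     196     168
         9      8      81      64      72
         8      6      64      36      48
        10      9     100      81      90
        11     11     121     121     121
        13     12     169     144     156
         7      3      49       9      21
Total   70     63     728     651     676

$$r = \frac{n\sum_{i=1}^{n}x_i y_i - \sum_{i=1}^{n}x_i \sum_{i=1}^{n}y_i}{\sqrt{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}\;\sqrt{n\sum_{i=1}^{n}y_i^2 - \left(\sum_{i=1}^{n}y_i\right)^2}}$$
Here $\sum x_i = 70$, $\sum x_i^2 = 728$, $\sum x_i y_i = 676$, $\sum y_i = 63$, $\sum y_i^2 = 651$ and n = 7.
$$\begin{aligned}
r &= \frac{7 \times 676 - 70 \times 63}{\sqrt{7 \times 728 - 70^2}\;\sqrt{7 \times 651 - 63^2}} \\
  &= \frac{4732 - 4410}{\sqrt{5096 - 4900}\;\sqrt{7 \times 651 - 3969}} \\
  &= \frac{322}{\sqrt{196}\;\sqrt{588}} = \frac{322}{14 \times 24.25} = \frac{322}{339.5} = 0.95
\end{aligned}$$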



There is a high positive correlation between test-1 and test-2. That is, those who perform
well in test-1 tend to perform well in test-2 and those who perform poorly in test-1 tend to
perform poorly in test-2.
The students can also verify the results by using the shortcut method.

4.4.3 Limitations of Correlation


Although correlation is a powerful tool, there are some limitations in using it:

1. Outliers (extreme observations) strongly influence the correlation coefficient. If we see
outliers in our data, we should be careful about the conclusions we draw from the value
of r. The outliers may be dropped before the calculation for a meaningful conclusion.

2. Correlation does not imply a causal relationship. That is, a correlation between two variables
does not by itself show that a change in one variable causes a change in the other.

NOTE
1. Uncorrelated : Uncorrelated (r = 0) implies no ‘linear relationship’. But there may exist non-
linear relationship (curvilinear relationship).

Example: Age and health care are related. Children and elderly people need much more health
care than middle aged persons, as seen from the following graph.

[Figure: health care (vertical axis) plotted against age (horizontal axis); the need is high for Child and Old and low for Adult.]

However, if we compute the linear correlation r for such data, it may be zero implying
age and health care are uncorrelated, but non-linear correlation is present.
2. Spurious Correlation : The word ‘spurious’ from Latin means ‘false’ or ‘illegitimate’. Spurious
correlation means an association extracted from correlation coefficient that may not exist in reality.
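The age and health care note above (r close to zero despite a clear non-linear relationship) can be illustrated numerically. The sketch below uses hypothetical figures, not data from the text: the health care need is taken to be the squared distance of the age from 40, which gives a U-shaped pattern. The helper name `pearson_r` is also an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical U-shaped data: need is lowest at age 40 and rises on both sides.
age = [10, 20, 30, 40, 50, 60, 70]
care = [(a - 40) ** 2 for a in age]

r = pearson_r(age, care)
print(r)  # zero: care is fully determined by age, yet there is no linear trend
```

Here the variables are perfectly dependent, but the positive and negative products of deviations cancel, so the linear correlation vanishes. This is the sense in which uncorrelated does not imply independent.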



4.5 SPEARMAN’S RANK CORRELATION COEFFICIENT
If the data are in ordinal scale then Spearman’s rank correlation coefficient is used. It is
denoted by the Greek letter ρ (rho).
Spearman’s correlation can also be calculated for subjective data, such as competition
scores. The data can be ranked from low to high or high to low by assigning ranks.
Spearman’s rank correlation coefficient is given by the formula
$$\rho = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n\left(n^2 - 1\right)}$$
where $D_i = R_{1i} - R_{2i}$
$R_{1i}$ = rank of i in the first set of data
$R_{2i}$ = rank of i in the second set of data and
n = number of pairs of observations

Interpretation
Spearman’s rank correlation coefficient is a statistical measure of the strength of a
monotonic (increasing/decreasing) relationship between paired data. Its interpretation is similar
to that of Pearson’s. That is, the closer the value is to ±1, the stronger the monotonic relationship.

Positive Range                              Negative Range
0.01 to 0.19: “Very Weak Agreement”         (-0.01) to (-0.19): “Very Weak Disagreement”
0.20 to 0.39: “Weak Agreement”              (-0.20) to (-0.39): “Weak Disagreement”
0.40 to 0.59: “Moderate Agreement”          (-0.40) to (-0.59): “Moderate Disagreement”
0.60 to 0.79: “Strong Agreement”            (-0.60) to (-0.79): “Strong Disagreement”
0.80 to 1.0: “Very Strong Agreement”        (-0.80) to (-1.0): “Very Strong Disagreement”

Example 4.3
Two referees in a flower beauty competition rank the 10 types of flowers as follows:

Referee A 1 6 5 10 3 2 4 9 7 8
Referee B 6 4 9 8 1 2 3 10 5 7

Use the rank correlation coefficient and find out what degree of agreement is between the
referees.



Solution:

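Rank by 1st referee ($R_{1i}$)    Rank by 2nd referee ($R_{2i}$)    $D_i = R_{1i} - R_{2i}$    $D_i^2$
 1      6     -5     25
 6      4      2      4
 5      9     -4     16
10      8      2      4
 3      1      2      4
 2      2      0      0
 4      3      1      1
 9     10     -1      1
 7      5      2      4
 8      7      1      1
Total  $\sum_{i=1}^{n} D_i^2 = 60$

Here n = 10 and $\sum_{i=1}^{n} D_i^2 = 60$
$$\rho = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 60}{10\left(10^2 - 1\right)} = 1 - \frac{360}{10 \times 99} = 1 - \frac{360}{990} = 0.636$$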

Interpretation: Degree of agreement between the referees ‘A’ and ‘B’ is 0.636 and they have “strong
agreement” in evaluating the competitors.
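The calculation in Example 4.3 can be checked with a short Python sketch. The function name `spearman_rho` is an assumption made for illustration; it applies the formula of Section 4.5 directly to two lists of ranks and assumes there are no tied ranks.

```python
def spearman_rho(ranks_1, ranks_2):
    """Spearman's rho from two lists of ranks (no ties assumed)."""
    n = len(ranks_1)
    # Sum of squared rank differences, i.e. the D_i^2 column of the table.
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Ranks given by the two referees in Example 4.3.
referee_a = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
referee_b = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

rho = spearman_rho(referee_a, referee_b)
print(round(rho, 3))  # 0.636
```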

Example 4.4
Calculate the Spearman’s rank correlation coefficient for the following data.

Candidates 1 2 3 4 5

Marks in Tamil 75 40 52 65 60

Marks in English 25 42 35 29 33



Solution:

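Tamil                       English
Marks    Rank ($R_{1i}$)    Marks    Rank ($R_{2i}$)    $D_i = R_{1i} - R_{2i}$    $D_i^2$
75       1                  25       5                  -4     16
40       5                  42       1                   4     16
52       4                  35       2                   2      4
65       2                  29       4                  -2      4
60       3                  33       3                   0      0
Total                                                          40

$\sum_{i=1}^{n} D_i^2 = 40$ and n = 5
$$\rho = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 40}{5\left(5^2 - 1\right)} = 1 - \frac{240}{5 \times 24} = 1 - 2 = -1$$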

Interpretation: This perfect negative rank correlation (-1) indicates that the rankings in the two
subjects totally disagree. The student who is best in Tamil is weakest in English and vice-versa.
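In Example 4.4 the ranks are not given and must first be assigned from the marks. The sketch below shows that step, ranking from high to low as in the solution, and then applies the rank correlation formula. The function names are assumptions made for illustration, and the ranking helper assumes there are no tied marks.

```python
def rank_descending(values):
    """Rank 1 goes to the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(ranks_1, ranks_2):
    """Spearman's rho from two lists of ranks (no ties assumed)."""
    n = len(ranks_1)
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Marks of the five candidates in Example 4.4.
tamil = [75, 40, 52, 65, 60]
english = [25, 42, 35, 29, 33]

r1 = rank_descending(tamil)    # [1, 5, 4, 2, 3]
r2 = rank_descending(english)  # [5, 1, 2, 4, 3]
rho = spearman_rho(r1, r2)
print(r1, r2, rho)  # rho is -1.0
```

Each English rank equals 6 minus the corresponding Tamil rank, so the two orderings are exact reverses of each other, which is what a coefficient of -1 expresses.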

Example 4.5
Quotations of index numbers of equity share prices of a certain joint stock company and
the prices of preference shares are given below.
Years 2013 2014 2015 2016 2017 2008 2009
Equity shares 97.5 99.4 98.6 96.2 95.1 98.4 97.1
Preference shares 75.1 75.9 77.1 78.2 79 74.6 76.2
Using the method of rank correlation determine the relationship between equity shares
and preference shares prices.

Solution:

Equity shares    Preference shares    $R_{1i}$    $R_{2i}$    $D_i = R_{1i} - R_{2i}$    $D_i^2$
97.5     75.1     4     6     -2      4
99.4     75.9     1     5     -4     16
98.6     77.1     2     3     -1      1
96.2     78.2     6     2      4     16
95.1     79.0     7     1      6     36
98.4     74.6     3     7     -4     16
97.1     76.2     5     4      1      1
Total  $\sum_{i=1}^{n} D_i^2 = 90$



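Here $\sum_{i=1}^{n} D_i^2 = 90$ and n = 7.

Rank correlation coefficient is
$$\rho = 1 - \frac{6\sum_{i=1}^{n} D_i^2}{n\left(n^2 - 1\right)} = 1 - \frac{6 \times 90}{7\left(7^2 - 1\right)} = 1 - \frac{540}{7 \times 48} = 1 - \frac{540}{336} = 1 - 1.6071 = -0.6071$$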

Interpretation: There is a negative correlation between equity shares and preference share prices.
There is a strong disagreement between equity shares and preference share prices.

4.5.1 Repeated ranks


When two or more items have equal values (i.e., a tie) it is difficult to give ranks to them.
In such cases the items are given the average of the ranks they would have received. For example,
if two individuals are placed in the 8th place, they are each given the rank $\frac{8+9}{2} = 8.5$, which is
the common rank to be assigned, and the next rank will be 10; and if three are ranked equal at the
8th place, they are each given the rank $\frac{8+9+10}{3} = 9$, which is the common rank to be assigned,
and the next rank will be 11.
In this case, a different formula is used when there is more than one item having the same
value.
$$\rho = 1 - 6\left[\frac{\sum D_i^2 + \frac{1}{12}\left(m_1^3 - m_1\right) + \frac{1}{12}\left(m_2^3 - m_2\right) + \ldots}{n\left(n^2 - 1\right)}\right]$$
where $m_i$ is the number of repetitions of the ith rank

Example 4.6
Compute the rank correlation coefficient for the following data of the marks obtained by
8 students in the Commerce and Mathematics.

Marks in Commerce 15 20 28 12 40 60 20 80
Marks in Mathematics 40 30 50 30 20 10 30 60



Solution:
Marks in Commerce (X)    Rank ($R_{1i}$)    Marks in Mathematics (Y)    Rank ($R_{2i}$)    $D_i = R_{1i} - R_{2i}$    $D_i^2$
15     2       40     6     -4       16
20     3.5     30     4     -0.5      0.25
28     5       50     7     -2        4
12     1       30     4     -3        9
40     6       20     2      4       16
60     7       10     1      6       36
20     3.5     30     4     -0.5      0.25
80     8       60     8      0        0
Total  $\sum D^2 = 81.5$

$$\rho = 1 - 6\left[\frac{\sum D_i^2 + \frac{1}{12}\left(m_1^3 - m_1\right) + \frac{1}{12}\left(m_2^3 - m_2\right) + \ldots}{n\left(n^2 - 1\right)}\right]$$
Repetitions of ranks
In Commerce (X), 20 is repeated two times corresponding to ranks 3 and 4. Therefore, 3.5
is assigned for ranks 3 and 4 with $m_1 = 2$.
In Mathematics (Y), 30 is repeated three times corresponding to ranks 3, 4 and 5. Therefore,
4 is assigned for ranks 3, 4 and 5 with $m_2 = 3$.
Therefore,
$$\rho = 1 - 6\left[\frac{81.5 + \frac{1}{12}\left(2^3 - 2\right) + \frac{1}{12}\left(3^3 - 3\right)}{8\left(8^2 - 1\right)}\right] = 1 - \frac{6\,[81.5 + 0.5 + 2]}{504} = 1 - \frac{504}{504} = 0$$
Interpretation: Marks in Commerce and Mathematics are uncorrelated
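The average-rank rule and the tie correction of Section 4.5.1 can be sketched in Python and checked against Example 4.6. All function names below are assumptions made for illustration; ranks are assigned from low to high, as in the solution above.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; tied values share the average of the ranks they occupy."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # first rank position occupied by v
        m = ordered.count(v)           # number of items tied at this value
        rank_of[v] = first + (m - 1) / 2
    return [rank_of[v] for v in values]


def tie_correction(values):
    """Sum of (m^3 - m)/12 over every value repeated m > 1 times."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)


def spearman_rho_with_ties(xs, ys):
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


# Marks of Example 4.6.
commerce = [15, 20, 28, 12, 40, 60, 20, 80]
maths = [40, 30, 50, 30, 20, 10, 30, 60]

print(average_ranks(commerce))  # [2.0, 3.5, 5.0, 1.0, 6.0, 7.0, 3.5, 8.0]
print(average_ranks(maths))     # [6.0, 4.0, 7.0, 4.0, 2.0, 1.0, 4.0, 8.0]
rho = spearman_rho_with_ties(commerce, maths)
print(rho)  # 0.0
```

The correction terms are 0.5 for the pair of 20s in Commerce and 2 for the three 30s in Mathematics, matching the hand calculation.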

4.6 YULE’S COEFFICIENT OF ASSOCIATION


This measure is used to know the existence of relationship between the two attributes
A and B (binary complementary variables). Examples of attributes are drinking, smoking,
blindness, honesty, etc.
Udny Yule (1871 – 1951) was a British statistician. He was educated at Winchester College
and at University College London. After a year doing research in experimental physics, he
returned to University College in 1893 to work as a demonstrator for Karl Pearson. Pearson was
beginning to work in statistics and Yule followed him into this new field. Yule was a prolific
writer, was active in the Royal Statistical Society, received its Guy Medal in Gold in 1911, and
served as its President in 1924–26. The concept of Association is due to him.


Coefficient of Association

Yule’s Coefficient of Association measures the strength and direction of association.
“Association” means that the attributes have some degree of agreement.

2×2 Contingency Table
                    Attribute B
Attribute A     Yes (B)     No (β)      Total
Yes (A)         (AB)        (Aβ)        (A)
No (α)          (αB)        (αβ)        (α)
Total           (B)         (β)         N

Yule’s coefficient:
$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$
Note 1: The usage of the symbol α is not to be confused with level of significance.
Note 2: (AB): Number with attributes AB etc.
This coefficient ranges from –1 to +1. The values between –1 and 0 indicate inverse
relationship (association) between the attributes. The values between 0 and +1 indicate direct
relationship (association) between the attributes.

Example 4.7
Out of 1800 candidates who appeared for a competitive examination, 625 were successful; 300 had
attended a coaching class and of these 180 came out successful. Test for the association of the
attributes attending the coaching class and success in the examination.

Solution:

N = 1800
A: Success in examination α: No success in examination
B: Attended the coaching class β: Not attended the coaching class
(A) = 625, (B) = 300, (AB) = 180

B β Total
A 180 445 625
α 120 1055 1175
Total 300 1500 N = 1800



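Yule’s coefficient:
$$\begin{aligned}
Q &= \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)} \\
  &= \frac{180 \times 1055 - 445 \times 120}{180 \times 1055 + 445 \times 120} \\
  &= \frac{189900 - 53400}{189900 + 53400} = \frac{136500}{243300} \\
  &= 0.561 > 0
\end{aligned}$$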
Interpretation: There is a positive association between success in examination and attending
coaching classes. Coaching class is useful for success in examination.
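The arithmetic of Example 4.7 can be reproduced with a small Python sketch. The function name `yule_q` and its parameter names are assumptions made for illustration; the four arguments are the cell frequencies (AB), (Aβ), (αB) and (αβ) of the 2 × 2 table.

```python
def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association from the four cell frequencies."""
    concordant = ab * alpha_beta      # (AB)(alpha beta)
    discordant = a_beta * alpha_b     # (A beta)(alpha B)
    return (concordant - discordant) / (concordant + discordant)


# Cell frequencies of Example 4.7.
q = yule_q(ab=180, a_beta=445, alpha_b=120, alpha_beta=1055)
print(round(q, 3))  # 0.561
```

Swapping the two rows (or the two columns) of the table reverses the sign of Q, which is why the labelling of the attributes has to be stated along with the value.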

Remark: Consistency in the data using contingency table may be found as under.
Construct a 2 × 2 contingency table for the given information. If at least one of the cell
frequencies is negative then there is inconsistency in the given data.

Example 4.8
Verify whether the given data: N = 100, (A) = 75, (B) = 60 and (AB) = 15 is consistent.

Solution:
The given information is presented in the following contingency table.
B β Total
A 15 60 75
α 45 -20 25
Total 60 40 N = 100

Notice that (αβ) = −20


Interpretation: Since one of the cell frequencies is negative, the given data is “Inconsistent”.
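The consistency check described in the Remark amounts to filling in the remaining cells of the 2 × 2 table from N, (A), (B) and (AB) and looking for a negative frequency. A minimal Python sketch follows; the function and key names are assumptions made for illustration.

```python
def contingency_cells(n, a, b, ab):
    """Fill the 2 x 2 table from the totals N, (A), (B) and the cell (AB)."""
    return {
        "AB": ab,
        "A_beta": a - ab,              # (A) - (AB)
        "alpha_B": b - ab,             # (B) - (AB)
        "alpha_beta": n - a - b + ab,  # N - (A) - (B) + (AB)
    }


def is_consistent(n, a, b, ab):
    """Data are consistent only if no cell frequency is negative."""
    return all(count >= 0 for count in contingency_cells(n, a, b, ab).values())


# Example 4.8: one cell is negative, so the data are inconsistent.
cells = contingency_cells(n=100, a=75, b=60, ab=15)
print(cells["alpha_beta"], is_consistent(100, 75, 60, 15))  # -20 False

# Example 4.7: all cells are non-negative.
print(is_consistent(1800, 625, 300, 180))  # True
```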

POINTS TO REMEMBER

• Correlation study is about finding the linear relationship between two variables.
Correlation is not causation. Sometimes the correlation may be spurious.
• Correlation coefficient lies between –1 and +1.
• Pearson’s correlation coefficient provides the type of relationship and intensity of
relationship, for the data in ratio scale measure.
• Spearman’s correlation measures the relationship between the two ordinal variables.
• Yule’s coefficient of Association measures the association between two dichotomous
attributes.



EXERCISE 4

I. Choose the best answer.


1. The statistical device which helps in analyzing the co-variation of two or
more variables is
(a) variance (b) probability
(c) correlation coefficient (d) coefficient of skewness
2. “The attempts to determine the degree of relationship between variables is correlation” is the
definition given by
(a) A.M. Tuttle (b) Ya-Kun-Chou
(c) A.L. Bowley (d) Croxton and Cowden
3. If the two variables do not have linear relationship between them then they are said to have
(a) positive correlation (b) negative correlation
(c) uncorrelated (d) spurious correlation
4. If all the plotted points lie on a straight line falling from upper left hand corner to lower right
hand corner then it is called
(a) perfect positive correlation (b) perfect negative correlation
(c) positive correlation (d) negative correlation
5. If r = +1, then the correlation is called
(a) perfect positive correlation (b) perfect negative correlation
(c) positive correlation (d) negative correlation
6. The correlation coefficient lies in the interval
(a) -1 ≤ r ≤ 0 (b) –1 < r < 1 (c) 0 ≤ r ≤ 1 (d) –1 ≤ r ≤ 1

7. Rank correlation coefficient is given by


n
n n n
6 D 2
6 D 2
6 D 2 6 Di3
(a) i (b) i (c) i (d)
1 i 1
1 i 1
1 i 1 1 i 1

n n
3
n n
3
n n
3 
n n 1 2

8. If $\sum D^2 = 0$, rank correlation is

(a) 0 (b) 1 (c) 0.5 (d) –1

9. Rank correlation was developed by


(a) Pearson (b) Spearman (c) Yule (d) Fisher
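10. Product moment coefficient of correlation is
(a) $r = \frac{\sigma_x \sigma_y}{\operatorname{cov}(x, y)}$ (b) $r = \sigma_x \sigma_y$ (c) $r = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$ (d) $r = \frac{\operatorname{cov}(x, y)}{\sigma_{xy}}$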



11. The purpose of the study of _________ is to identify the factors of influence and try to control
them for better performance.
(a) mean (b) correlation (c) standard deviation (d) skewness
12. The height and weight of a group of persons will have _______ correlation.
(a) positive (b) negative
(c) zero (d) both positive and negative
13. ______ correlation studies the association of two variables with ordinal scale.
(a) A.M. Tuttle rank (b) Croxton and Cowden rank
(c) Karl Pearson’s rank (d) Spearman’s rank.
14. ______ presents a graphic description of quantitative relation between two series of facts.
(a) scatter diagram (b) bar diagram (c) pareto diagram (d) pie diagram
15. ______ measures the degree of relationship between two variables.
(a) standard deviation (b) correlation coefficient
(c) moment (d) median
16. The correlation coefficient of x and y is symmetric. Hence
(a) $r_{xy} = r_{yx}$ (b) $r_{xy} > r_{yx}$ (c) $r_{xy} < r_{yx}$ (d) $r_{xy} \neq r_{yx}$

17. If cov (x, y) = 0 then its interpretation is


(a) x and y are positively correlated (b) x and y are negatively correlated
(c) x and y are uncorrelated (d) x and y are independent
18. Rank correlation is useful to study data in ______ scale.
(a) ratio (b) ordinal (c) nominal (d) ratio and nominal
19. If r = 0 then cov(x, y) is
(a) 0 (b) +1 (c) -1 (d) α

20. If cov(x, y) = $\sigma_x \sigma_y$ then


(a) r = 0 (b) r = –1 (c) r = +1 (d) r = α

II. Give very short answer to the following questions.


21. What is correlation?
22. Write the definition of correlation by A.M. Tuttle.
23. What are the different types of correlation?
24. What are the types of simple correlation?
25. What do you mean by uncorrelated?
26. What do you understand by spurious correlation?



27. What is scatter diagram?
28. Define co-variance.
29. Define rank correlation.
30. If ∑D² = 0, what is your conclusion regarding Spearman’s rank correlation coefficient?
31. Give an example for (i) positive correlation
(ii) negative correlation (iii) no correlation
32. What is the value of ‘r’ when two variables are uncorrelated?
33. When the correlation coefficient is +1, state your interpretation.

III. Give short answer to the following questions.


34. Write any three uses of correlation.
35. Define Karl Pearson’s coefficient of correlation.
36. How do you interpret the coefficient of correlation which lies between 0 and +1?
37. Write down any three properties of correlation.
38. If the rank correlation coefficient r = 0.8 and ∑D² = 33, find n.
39. Write any three merits of scatter diagram.
40. Given that cov(x, y) = 18.6, variance of x = 20.2, variance of y = 23.7. Find r.
41. Test the consistency of the following data with the symbols having their usual meaning.
N = 1000, (A) = 600, (B) = 500, (AB) = 50.

IV. Give detailed answer to the following questions.


42. Explain different types of correlation.
43. Explain scatter diagram.
44. Calculate the Karl Pearson’s coefficient of correlation for the following data and interpret.
x 9 8 7 6 5 4 3 2 1
y 15 16 14 13 11 12 10 8 9

45. Find the Karl Pearson’s coefficient of correlation for the following data.
Wages 100 101 102 102 100 99 97 98 96 95
Cost of living 98 99 99 97 95 92 95 94 90 91
How are the wages and cost of living correlated?

46. Calculate the Karl Pearson’s correlation coefficient between the marks (out of 10) in statistics
and mathematics of 6 students.
Student 1 2 3 4 5 6
Statistics 7 4 6 9 3 8
Mathematics 8 5 4 8 3 6



47. In a marketing survey the prices of tea and prices of coffee in a town based on quality was
found as shown below. Find the rank correlation between prices of tea and prices of coffee.
Price of tea 88 90 95 70 60 75 50
Price of coffee 120 134 150 115 110 140 100

48. Calculate the Spearman’s rank correlation coefficient between price and supply from the
following data.
Price 4 6 8 10 12 14 16 18
Supply 10 15 20 25 30 35 40 45
49. A random sample of 5 college students is selected and their marks in Tamil and English are
found to be:
Tamil 85 60 73 40 90
English 93 75 65 50 80
Calculate Spearman’s rank correlation coefficient.

50. Calculate Spearman’s coefficient of rank correlation for the following data.
x 53 98 95 81 75 71 59 55
y 47 25 32 37 30 40 39 45

51. Calculate the coefficient of correlation for the following data using ranks.

Mark in Tamil 29 24 25 27 30 31

Mark in English 29 19 30 33 37 36

52. From the following data calculate the rank correlation coefficient.
x 49 34 41 10 17 17 66 25 17 58
y 14 14 25 7 16 5 21 10 7 20

Yule’s coefficient
53. Can vaccination be regarded as a preventive measure against Hepatitis B, from the data given below?
Of 1500 persons in a locality, 400 were attacked by Hepatitis B. 750 had been vaccinated. Among
them, only 75 were attacked.



ANSWERS
I 1. (c) 2. (b) 3. (c) 4. (b) 5. (a)
6. (d) 7. (b) 8. (b) 9. (b) 10. (b)
11. (b) 12. (a) 13. (d) 14. (a) 15. (b)
16. (a) 17. (c) 18. (b) 19. (a) 20. (c)
II 30. r = 1

III 38. n = 10
40. r = 0.85
41. (αβ ) = −50 , The given data is inconsistent

IV 44. r = 0.95 it is highly positively correlated

45. r = 0.847 wages and cost of living are highly positively correlated.

46. r = 0.8081. Statistics and mathematics marks are highly positively correlated.

47. ρ = 0.8929 price of tea and coffee are highly positively correlated.

48. ρ = 1 (perfect positive correlation)

49. ρ = 0.8

50. ρ = -0.905 x and y are highly negatively correlated.

51. ρ = -0.78 marks in Tamil and English are negatively correlated.

52. ρ = +0.733

53. There is a negative association between attacked and vaccinated.

54. There is a positive association between not attacked and not vaccinated.

55. Hence vaccination can be regarded as a preventive measure of Hepatitis B.



CHAPTER 5

REGRESSION ANALYSIS

Francis Galton (1822-1911) was born into a wealthy family. The youngest of nine
children, he showed early signs of intelligence, although his progress in education was
not smooth. He dabbled in medicine and then studied Mathematics at Cambridge.
He later freely acknowledged his weakness in formal Mathematics,
but this weakness was compensated by an exceptional ability to understand the
meaning of data. Many statistical terms in current usage were coined
by Galton. For example, correlation is due to him, as is regression, and he was the
originator of terms and concepts such as quartile, decile and percentile, and of the use of the median as
the midpoint of a distribution.
The concept of regression comes from genetics and was popularized by Sir Francis Galton
during the late 19th century with the publication of ‘Regression towards Mediocrity in Hereditary Stature’.
Galton observed that extreme characteristics (e.g., height) in parents are not passed on completely to
their offspring. An examination of publications of Sir Francis Galton and Karl Pearson revealed that
Galton's work on inherited characteristics of sweet peas led to the initial conceptualization of linear
regression. Subsequent efforts by Galton and Pearson brought many techniques of multiple regression
and the product-moment correlation coefficient.

LEARNING OBJECTIVES

The student will be able to


• know the concept of regression, its types and their uses.
• fit the best line of regression by applying the method of least squares.
• calculate the regression coefficient and interpret the same.
• know the uses of regression coefficients.
• distinguish between correlation analysis and regression analysis.

Introduction
The correlation coefficient is a useful statistical tool for describing the type (positive,
negative or uncorrelated) and intensity (such as moderate or high) of the linear relationship between
two variables. But it fails to give a mathematical functional relationship for prediction purposes.
Regression analysis is a vital statistical method for obtaining the functional relationship between a
dependent variable and one or more independent variables. More specifically, regression analysis
helps one to understand how the typical value of the dependent variable (or ‘response variable’)
changes when any one of the independent variables (regressors or predictors) is varied, while
the other independent variables are held fixed. It helps to determine the impact of changes in
the value(s) of the independent variable(s) upon changes in the value of the dependent variable.
Regression analysis is widely used for prediction.



5.1 DEFINITION
Regression analysis is a statistical method of determining the mathematical functional
relationship connecting independent variable(s) and a dependent variable.

Types of ‘Regression’
Based on the kind of relationship between the dependent variable and the set of independent
variable(s), there arise two broad categories of regression, viz., linear regression and non-linear regression.
If the relationship is linear and there is only one independent variable, then the regression
is called simple linear regression. On the other hand, if the relationship is linear and the
number of independent variables is two or more, then the regression is called multiple linear
regression. If the relationship between the dependent variable and the independent variable(s) is
not linear, then the regression is called non-linear regression.

5.1.1 Simple Linear Regression


It is one of the most widely known modelling techniques. In this technique, the dependent
variable is continuous, the independent variable can be continuous or discrete, and the nature of the
relationship is linear. This relationship can be expressed using a straight line equation (linear
regression) that best approximates all the individual data points.
Simple linear regression establishes a relationship between a dependent variable (Y) and
one independent variable (X) using a best fitted straight line (also known as regression line).

[Figure: the independent variable X and the error term together determine the dependent variable Y. Regression line: Y = a + bX + e]

NOTE
There are many reasons for the presence of the error term in the
linear regression model. It is also known as measurement error. In
some situations, it indicates the presence of several variables other
than the present set of regressors.

The general form of the simple linear regression equation is Y = a + bX + e, where ‘X’ is the
independent variable, ‘Y’ is the dependent variable, ‘a’ is the intercept, ‘b’ is the slope of the line and ‘e’ is the
error term. This equation can be used to estimate the value of the response variable (Y) based on the
given values of the predictor variable (X) within its domain.

5.1.2 Multiple Linear Regression


In the case of several independent variables, regression analysis also allows us to compare
the effects of independent variables measured on different scales, such as the effect of price
changes and the number of promotional activities.
Multiple linear regression uses two or more independent variables to estimate the value(s)
of the response variable (Y).



The general form of the multiple linear regression equation is
$$Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + \dots + b_t X_t + e$$
Here, Y represents the dependent (response) variable, $X_i$ represents the ith independent variable
(regressor), a and $b_i$ are the regression coefficients and e is the error term.

[Figure: several independent variables X jointly determining the dependent variable Y]

Suppose that price of a product (Y) depends mainly upon three
promotional activities such as discount ($X_1$), instalment scheme ($X_2$)
and free installation ($X_3$). If the price of the product has linear relationship with each promotional
activity, then the relationship among Y and $X_1$, $X_2$ and $X_3$ may be expressed using the above
general form as
$$Y = a + b_1 X_1 + b_2 X_2 + b_3 X_3 + e.$$
These benefits help market researchers / data analysts / data scientists to eliminate and evaluate the
best set of variables to be used for building regression models for predictive purposes.

5.1.3 Non-Linear Regression


If the regression is not linear and is in some other form, then the regression is said to be
non-linear regression. Some of the non-linear relationships are displayed below.

5.2 USES OF REGRESSION


NOTE
Multiple linear regression and curvilinear relationships (non-linear regression) are out of the
syllabus. Basic information about them is given here for enhancing the knowledge.

Benefits of using regression analysis are as follows:
1. It indicates the significant mathematical relationship between the independent variable (X) and
the dependent variable (Y), i.e., model construction.
2. It indicates the strength of impact (b) of the
independent variable on the dependent variable.
3. It is used to estimate (interpolate) the value of
the response variable for different values of the
independent variable from its range in the given data. It means that extrapolation of the
dependent variable is not generally permissible.



4. In the case of several independent variables, regression analysis is a way of mathematically
sorting out which of those variables indeed have an impact. It answers the questions: Which
independent variable matters most? Which can we ignore? How do those independent
variables interact with each other?

5.3 WHY ARE THERE TWO REGRESSION LINES?


There may exist two regression lines in certain circumstances. When the variables X and Y
are interchangeable with regard to cause and effect, one can consider X as the independent variable and
Y as the dependent variable, or Y as the independent variable and X as the dependent variable. As a result,
we have (1) the regression line of Y on X and (2) the regression line of X on Y.
Both are valid regression lines. But we must judiciously select the one regression equation
which is suitable to the given environment.
Note: If only X causes Y, then there is only one regression line, that of Y on X.

5.3.1 Simple Linear Regression


In the general form of the simple linear regression equation of Y on X
$$Y = a + bX + e$$
the constants ‘a’ and ‘b’ are generally called the regression coefficients.
The coefficient ‘b’ represents the rate of change in the value of the mean of Y due to
every unit change in the value of X. When the range of X includes ‘0’, then the intercept ‘a’ is
E(Y|X = 0). If the range of X does not include ‘0’, then ‘a’ does not have a practical interpretation.
If $(x_i, y_i)$, i = 1, 2, ..., n is a set of n pairs of observations made on (X, Y), then fitting of
the above regression equation means finding the estimates $\hat{a}$ and $\hat{b}$ for ‘a’ and ‘b’ respectively.
These estimates are determined based on the following general assumptions:
i) the relationship between Y and X is linear (approximately).
ii) the error term ‘e’ is a random variable with mean zero.
iii) the error term ‘e’ has constant variance.
There are other assumptions on ‘e’, which are not required at this level of study.

Before going for further study, the following points are to be kept in mind.
• Both the independent and dependent variables must be measured at the interval scale.
• There must be linear relationship between independent and dependent variables.
• Linear regression is very sensitive to outliers (extreme observations). Outliers can shift the
regression line considerably and, with it, the estimated values of Y.

Meaning of line of “best fit”


Based on the assumption (ii), the response variable Y is also a random variable with mean
E(Y|X=x) = a + bx



In regression analysis, the main objective is finding the line of best fit, which provides the
fitted equation of Y on X.
The line of ‘best fit‘ is the line (straight line equation) which minimizes the error in the
estimation of the dependent variable Y, for any specified value of the independent variable X
from its range.
The regression equation E(Y|X=x) = a +bx represents a family of straight lines for different
values of the coefficients ‘a’ and ‘b’. The problem is to determine the estimates of ‘a’ and ‘b’ by
minimizing the error in the estimation of Y so that the line is a best fit. This requires finding
suitable values of the estimates of ‘a’ and ‘b’.

5.4 METHOD OF LEAST SQUARES


In most of the cases, the data points do not fall on a straight line (not highly correlated),
thus leading to a possibility of depicting the relationship between the two variables using several
different lines. Selection of each line may lead to a situation where the line will be closer to some
points and farther from other points. We cannot decide which line can provide best fit to the data.
Method of least squares can be used to determine the line of best fit in such cases. It
determines the line of best fit for given observed data by minimizing the sum of the squares of
the vertical deviations from each data point to the line.

5.4.1 Method of Least Squares


To obtain the estimates of the coefficients ‘a’ and ‘b’, the least squares method minimizes
the sum of squares of residuals. The residual for the ith data point, $e_i$, is defined as the difference
between the observed value of the response variable, $y_i$, and the estimate of the response variable,
$\hat{y}_i$, and is identified as the error associated with the data, i.e., $e_i = y_i - \hat{y}_i$, i = 1, 2, ..., n.

The method of least squares helps us to find the values of unknowns ‘a’ and ‘b’ in such a
way that the following two conditions are satisfied:
• Sum of the residuals is zero. That is, $\sum_{i=1}^{n} (y_i - \hat{y}_i) = 0$.
• Sum of the squares of the residuals, $E(a, b) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, is the least.

5.4.2 Fitting of Simple Linear Regression Equation


The method of least squares can be applied to determine the estimates of ‘a’ and ‘b’ in the
simple linear regression equation using the given data $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ by minimizing
$$E(a, b) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
i.e.,
$$E(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2.$$
Here, $\hat{y}_i = a + b x_i$ is the expected (estimated) value of the response variable for given $x_i$.

[Figure: Simple Linear Regression Model. An observed value $y_i = a + b x_i + \text{Error}$ is plotted against X together with the regression line $\hat{y}_i = a + b x_i$; the vertical gap between the observed value and the line is the error.]



It is obvious that if the expected value ($\hat{y}_i$) is close to the observed value ($y_i$), the residual will
be small. Since the magnitude of the residual is determined by the values of ‘a’ and ‘b’, estimates
of these coefficients are obtained by minimizing the sum of the squared residuals, E(a,b).
Differentiating E(a, b) with respect to ‘a’ and ‘b’ and equating the derivatives to zero gives a
set of two equations as described below:

$$\frac{\partial E(a, b)}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0$$

$$\frac{\partial E(a, b)}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0$$

These give

$$na + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

$$a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

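These equations are popularly known as normal equations. Solving these equations for ‘a’
and ‘b’ yields the estimates $\hat{a}$ and $\hat{b}$:
$$\hat{a} = \bar{y} - \hat{b}\,\bar{x}$$
and
$$\hat{b} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$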

It may be seen that in the estimate of ‘b’, the numerator and denominator are respectively
the sample covariance between X and Y, and the sample variance of X. Hence, the estimate of ‘b’
may be expressed as
$$\hat{b} = \frac{\mathrm{Cov}(X, Y)}{V(X)}$$

Further, it may be noted that for notational convenience the denominator of $\hat{b}$ above is
mentioned as variance of X. But, the definition of sample variance remains valid as defined in
Chapter I, that is, $\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.

From Chapter 4, the above estimate can be expressed using $r_{XY}$, Pearson’s coefficient of the
simple correlation between X and Y, as
$$\hat{b} = r_{XY}\,\frac{SD(Y)}{SD(X)}.$$
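
The formulas above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names and the five data points are hypothetical, chosen only for illustration. It computes the estimates from the covariance and variance (both with divisor n), confirms that the residuals of the fitted line sum to zero, and confirms that the slope equals r × SD(Y)/SD(X).

```python
import math


def least_squares_line(xs, ys):
    """Return (a_hat, b_hat) for the fitted line y = a_hat + b_hat * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariance and variance with divisor n, as in the formula for b-hat.
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2
    b_hat = cov_xy / var_x
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat


def pearson_r(xs, ys):
    """Pearson's coefficient of correlation between xs and ys."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical illustration data (not taken from the text).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a_hat, b_hat = least_squares_line(xs, ys)
residuals = [y - (a_hat + b_hat * x) for x, y in zip(xs, ys)]

# Standard deviations with divisor n, to match the covariance above.
n = len(xs)
sd_x = math.sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)
sd_y = math.sqrt(sum(y * y for y in ys) / n - (sum(ys) / n) ** 2)
slope_from_r = pearson_r(xs, ys) * sd_y / sd_x

print(a_hat, b_hat)       # intercept and slope
print(sum(residuals))     # numerically zero
print(slope_from_r)       # agrees with b_hat up to rounding
```

The same divisor (n) is used for the covariance and both standard deviations; the ratio is unchanged if n − 1 is used throughout instead.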



Important Considerations in the Use of Regression Equation:
1. Regression equation exhibits only the relationship between the respective two variables.
Cause and effect study shall not be carried out using regression analysis.
2. The regression equation is fitted to the given values of the independent variable. Hence, the
fitted equation can be used for prediction purpose corresponding to the values of the regressor
within its range. Interpolation of values of the response variable may be done corresponding
to the values of the regressor from its range only. Results obtained by extrapolation
cannot be interpreted reliably.

Example 5.1
Construct the simple linear regression equation of Y on X if n = 7, $\sum_{i=1}^{n} x_i = 113$, $\sum_{i=1}^{n} x_i^2 = 1983$,
$\sum_{i=1}^{n} y_i = 182$ and $\sum_{i=1}^{n} x_i y_i = 3186$.

Solution:
The simple linear regression equation of Y on X to be fitted for given data is of the form
$$\hat{Y} = a + bx \qquad (1)$$
The values of ‘a’ and ‘b’ have to be estimated from the sample data solving the following
normal equations.
$$na + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \qquad (2)$$
$$a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \qquad (3)$$
Substituting the given sample information in (2) and (3), the above equations can be
expressed as
7a + 113b = 182 (4)
113a + 1983b = 3186 (5)
(4) × 113 ⇒ 791a + 12769b = 20566
(5) × 7 ⇒ 791a + 13881b = 22302
Subtracting the second equation from the first,
−1112b = −1736
⇒ b = 1736/1112 = 1.56
Substituting this in (4), it follows that
7a + 113 × 1.56 = 182
7a + 176.28 = 182
7a = 182 − 176.28 = 5.72
Hence, a = 0.82
Therefore, the fitted regression equation of Y on X is $\hat{Y} = 0.82 + 1.56x$.
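
As a cross-check on the elimination above, the two normal equations can also be solved directly from the given sums. This is a small Python sketch; the function name `solve_normal_equations` is an assumption introduced for illustration, not something defined in the text.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_xx, sum_xy):
    """Solve  n*a + b*sum_x = sum_y  and  a*sum_x + b*sum_xx = sum_xy  for (a, b)."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


# Sums given in Example 5.1.
a, b = solve_normal_equations(n=7, sum_x=113, sum_y=182, sum_xx=1983, sum_xy=3186)
print(round(b, 2))  # slope
print(round(a, 2))  # intercept, computed from the unrounded slope
```

The intercept printed here uses the unrounded slope, so it can differ in the second decimal place from a hand calculation that rounds b to 1.56 before substituting.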



Example 5.2
Number of man-hours and the corresponding productivity (in units) are furnished below.
Fit a simple linear regression equation $\hat{Y} = a + bx$ applying the method of least squares.

Man-hours 3.6 4.8 7.2 6.9 10.7 6.1 7.9 9.5 5.4
Productivity (in units) 9.3 10.2 11.5 12 18.6 13.2 10.8 22.7 12.7

Solution:
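The simple linear regression equation to be fitted for the given data is
$$\hat{Y} = a + bx$$
Here, the estimates of a and b can be calculated using their least squares estimates
$$\hat{a} = \bar{y} - \hat{b}\,\bar{x}, \quad \text{i.e.,} \quad \hat{a} = \frac{1}{n}\sum_{i=1}^{n} y_i - \hat{b}\,\frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\hat{b} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - (\bar{x} \times \bar{y})}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$
or equivalently
$$\hat{b} = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i \times \sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$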

From the given data, the following calculations are made with n=9

Man-hours (x_i)   Productivity (y_i)   x_i²     x_i y_i
3.6               9.3                  12.96    33.48
4.8               10.2                 23.04    48.96
7.2               11.5                 51.84    82.8
6.9               12                   47.61    82.8
10.7              18.6                 114.49   199.02
6.1               13.2                 37.21    80.52
7.9               10.8                 62.41    85.32
9.5               22.7                 90.25    215.65
5.4               12.7                 29.16    68.58
Total: ∑x_i = 62.1, ∑y_i = 121, ∑x_i² = 468.97, ∑x_i y_i = 897.13

Substituting the column totals in the respective places in the expressions for the estimates $\hat{a}$ and $\hat{b}$,
their values can be calculated as follows:
$$\hat{b} = \frac{(9 \times 897.13) - (62.1 \times 121)}{(9 \times 468.97) - (62.1)^2} = \frac{8074.17 - 7514.10}{4220.73 - 3856.41} = \frac{560.07}{364.32}$$

Thus, $\hat{b} = 1.54$.
Now $\hat{a}$ can be calculated using $\hat{b}$ as
$$\hat{a} = \frac{121}{9} - \left(1.54 \times \frac{62.1}{9}\right) = 13.44 - 10.63$$
Hence, $\hat{a} = 2.81$
Therefore, the required simple linear regression equation fitted to the given data is
$$\hat{Y} = 2.81 + 1.54x$$
It should be noted that the value of Y can be estimated using the above fitted equation for
the values of x in its range i.e., 3.6 to 10.7.


In the estimated simple linear regression equation of Y on X
$$\hat{Y} = \hat{a} + \hat{b}x$$
we can substitute the estimate $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$. Then, the regression equation becomes
$$\hat{Y} = \bar{y} - \hat{b}\,\bar{x} + \hat{b}x$$
$$\hat{Y} - \bar{y} = \hat{b}(x - \bar{x})$$

It shows that the simple linear regression equation of Y on X has the slope $\hat{b}$ and the
corresponding straight line passes through the point of averages $(\bar{x}, \bar{y})$. The above representation
of a straight line is popularly known in the field of Coordinate Geometry as ‘Slope-Point form’. The
above form can be applied in fitting the regression equation for given regression coefficient $\hat{b}$
and the averages $\bar{x}$ and $\bar{y}$.
As mentioned in Section 5.3, there may be two simple linear regression equations for each
X and Y. Since the regression coefficients of these regression equations are different, it is essential
to distinguish the coefficients with different symbols. The regression coefficient of the simple
linear regression equation of Y on X may be denoted as $b_{YX}$ and the regression coefficient of the
simple linear regression equation of X on Y may be denoted as $b_{XY}$.



Using the same argument as for fitting the regression equation of Y on X, we have the simple
linear regression equation of X on Y with best fit as
$$\hat{X} = \hat{c} + b_{XY}\,y$$
where $\hat{c} = \bar{x} - b_{XY}\,\bar{y}$ and
$$b_{XY} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2}$$

The slope-point form of this equation is
$$\hat{X} - \bar{x} = b_{XY}(y - \bar{y}).$$
Also, the relationships between Karl Pearson’s coefficient of correlation and the
regression coefficients are
$$b_{XY} = r_{XY}\,\frac{SD(X)}{SD(Y)} \quad \text{and} \quad b_{YX} = r_{XY}\,\frac{SD(Y)}{SD(X)}.$$

5.5 PROPERTIES OF REGRESSION COEFFICIENTS


1. Correlation coefficient is the geometric mean between the regression coefficients.
$$r_{XY} = \pm\sqrt{b_{XY} \times b_{YX}}$$
2. It is clear from property 1 that both regression coefficients must have the same sign,
i.e., either both are positive or both are negative.
3. If one of the regression coefficients is greater than unity, the other must be less than unity.
4. The correlation coefficient will have the same sign as that of the regression coefficients.
5. The arithmetic mean of the regression coefficients is, in magnitude, greater than or equal to the correlation coefficient.
$$\left|\frac{b_{XY} + b_{YX}}{2}\right| \ge |r_{XY}|$$
6. Regression coefficients are independent of the change of origin but not of scale.
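
The properties listed above can be verified on a small data set. The sketch below is illustrative only: the data and the helper name `regression_coefficients` are hypothetical and do not come from the text. It checks the geometric mean relation (property 1), the arithmetic mean inequality in magnitude (property 5), and the behaviour under change of origin and scale (property 6).

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    return numerator / (n * sxx - sx ** 2), numerator / (n * syy - sy ** 2)


# Hypothetical illustration data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, b_xy = regression_coefficients(xs, ys)

# Property 1: r is the geometric mean of the coefficients, carrying their common sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Property 5: the arithmetic mean is at least as large as r in magnitude.
mean_ge_r = abs((b_yx + b_xy) / 2) >= abs(r)

# Property 6: shifting the origin leaves the coefficients unchanged ...
shifted = regression_coefficients([x - 10 for x in xs], [y + 3 for y in ys])
# ... but rescaling x by a factor of 2 halves b_yx and doubles b_xy.
scaled = regression_coefficients([2 * x for x in xs], ys)

print(b_yx, b_xy, r)
print(mean_ge_r)
print(shifted, scaled)
```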

Properties of regression equation


1. If r = 0, the variables are uncorrelated and the lines of regression become perpendicular to each
other.
2. If r = ±1, the two lines of regression coincide.

3. Angle between the two regression lines is $\theta = \tan^{-1}\left(\frac{m_1 - m_2}{1 + m_1 m_2}\right)$ where $m_1$ and $m_2$ are the
slopes of regression lines X on Y and Y on X respectively.
4. The angle between the regression lines indicates the degree of dependence between the variables.
5. The regression lines intersect at $(\bar{x}, \bar{y})$.
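
The angle formula can be evaluated without dividing by zero when r = 0. In the sketch below (an illustration under stated assumptions, with a hypothetical function name), the slopes in the x-y plane are taken as m₂ = b_YX for the line of Y on X and m₁ = 1/b_XY for the line of X on Y. Substituting these into the formula gives tan θ = (1 − b_YX·b_XY)/(b_YX + b_XY), which is computed with `atan2` so that the perpendicular case is handled cleanly.

```python
import math


def angle_between_regression_lines(b_yx, b_xy):
    """Acute angle (in degrees) between the two regression lines.

    Uses tan(theta) = |1 - b_yx * b_xy| / |b_yx + b_xy|, which follows from the
    slope formula with m1 = 1 / b_xy and m2 = b_yx.
    """
    cross = 1.0 - b_yx * b_xy
    dot = b_yx + b_xy
    return math.degrees(math.atan2(abs(cross), abs(dot)))


print(angle_between_regression_lines(0.0, 0.0))  # r = 0: lines are perpendicular
print(angle_between_regression_lines(0.5, 2.0))  # r = 1: lines coincide
print(angle_between_regression_lines(0.6, 1.0))  # intermediate case
```

The smaller the angle, the closer the product b_YX × b_XY = r² is to 1, which is the sense in which the angle indicates the degree of dependence.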



Example 5.3
Calculate the two regression equations of X on Y and Y on X from the data given below,
taking deviations from actual means of X and Y.

x 12 14 15 14 18 17
y 42 40 45 47 39 45
Estimate the likely demand when X = 25.

Solution:

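x_i    u_i = x_i – 15   u_i²   y_i   v_i = y_i – 43   v_i²   u_i v_i
12     -3               9      42    -1               1      3
14     -1               1      40    -3               9      3
15     0                0      45    2                4      0
14     -1               1      47    4                16     -4
18     3                9      39    -4               16     -12
17     2                4      45    2                4      4
Total  90   0           24     258   0                50     -6

$$\bar{x} = \frac{\sum_{i=1}^{6} x_i}{6} = \frac{90}{6} = 15, \qquad \bar{y} = \frac{\sum_{i=1}^{6} y_i}{6} = \frac{258}{6} = 43$$

The regression line of U on V is computed as under
$$\hat{b}_{UV} = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{n\sum v_i^2 - \left(\sum v_i\right)^2} = \frac{6(-6)}{6 \times 50} = -0.12$$
$$\hat{a} = \bar{u} - \hat{b}_{UV}\,\bar{v} = 0$$
Hence, the regression line of U on V is $\hat{U} = \hat{b}_{UV}\,v + \hat{a} = -0.12v$

Thus, the regression line of X on Y is (X – 15) = –0.12(y – 43)

Similarly, the regression coefficient of V on U is
$$\hat{b}_{VU} = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{n\sum u_i^2 - \left(\sum u_i\right)^2} = \frac{6(-6)}{6 \times 24} = -0.25$$

Thus, the regression line of Y on X is (Y – 43) = –0.25(x – 15)

When x = 25, y – 43 = –0.25 (25 – 15)

y = 40.5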
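
The deviation method used in this example can be reproduced in a few lines of Python. This is a sketch using the data of Example 5.3; the variable and function names are chosen here for illustration only.

```python
# Data of Example 5.3.
xs = [12, 14, 15, 14, 18, 17]
ys = [42, 40, 45, 47, 39, 45]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Deviations from the actual means.
us = [x - x_bar for x in xs]
vs = [y - y_bar for y in ys]

sum_uv = sum(u * v for u, v in zip(us, vs))
sum_uu = sum(u * u for u in us)
sum_vv = sum(v * v for v in vs)

b_xy = sum_uv / sum_vv  # regression coefficient of X on Y
b_yx = sum_uv / sum_uu  # regression coefficient of Y on X


def predict_y(x):
    """Estimate Y from the regression line of Y on X."""
    return y_bar + b_yx * (x - x_bar)


def predict_x(y):
    """Estimate X from the regression line of X on Y."""
    return x_bar + b_xy * (y - y_bar)


print(b_xy, b_yx)
print(predict_y(25))
```

Because the deviations are taken from the actual means, the sums of u and v are zero and each coefficient reduces to a simple ratio of sums.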



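Important Note: If the means $\bar{x}$ and $\bar{y}$ are not integers, then the above method is tedious and time
consuming for calculating $b_{YX}$ and $b_{XY}$. The following modified formulae are easier for calculation.
$$b_{YX} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$

$$b_{XY} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}$$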

Example 5.4
The following data gives the experience of machine operators and their performance
ratings as given by the number of good parts turned out per 50 pieces.

Operators 1 2 3 4 5 6 7 8
Experience (X) 8 11 7 10 12 5 4 6
Ratings (Y) 11 30 25 44 38 25 20 27
Obtain the regression equations and estimate the ratings corresponding to the experience
x=15.
Solution:

x_i    y_i    x_i y_i   x_i²   y_i²
8      11     88        64     121
11     30     330       121    900
7      25     175       49     625
10     44     440       100    1936
12     38     456       144    1444
5      25     125       25     625
4      20     80        16     400
6      27     162       36     729
Total  63     220       1856   555    6780

Regression equation of Y on X,
$$\hat{Y} - \bar{y} = b_{YX}(x - \bar{x})$$
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{63}{8} = 7.875$$
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{220}{8} = 27.5$$



The above two means are decimals, so for simplicity we use the following formula to compute $b_{YX}$.
$$b_{YX} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
$$= \frac{8 \times 1856 - 63 \times 220}{8 \times 555 - 63 \times 63} = \frac{14848 - 13860}{4440 - 3969} = \frac{988}{471}$$
$b_{YX}$ = 2.098

The regression equation of Y on X,
$$\hat{Y} - \bar{y} = b_{YX}(x - \bar{x})$$
$\hat{Y}$ – 27.5 = 2.098 (x – 7.875)
$\hat{Y}$ – 27.5 = 2.098x – 16.52
$\hat{Y}$ = 2.098x + 10.98
When x = 15,
$\hat{Y}$ = 2.098 × 15 + 10.98
$\hat{Y}$ = 31.47 + 10.98
= 42.45
Regression equation of X on Y,
$$\hat{X} - \bar{x} = b_{XY}(y - \bar{y})$$
$$b_{XY} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}$$
$$= \frac{8 \times 1856 - 63 \times 220}{8 \times 6780 - 220 \times 220} = \frac{14848 - 13860}{54240 - 48400} = \frac{988}{5840}$$
$b_{XY}$ = 0.169

The regression equation of X on Y,
$\hat{X}$ – 7.875 = 0.169 (y – 27.5)
$\hat{X}$ – 7.875 = 0.169y – 0.169 × 27.5
$\hat{X}$ = 0.169y + 3.222
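
The raw-sum formulae used in this example translate directly into code. The following Python sketch uses the data of Example 5.4; the function name `fit_both_lines` and the dictionary keys are assumptions made for illustration.

```python
def fit_both_lines(xs, ys):
    """Fit both regression lines using the raw-sum formulae for b_YX and b_XY."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    b_yx = numerator / (n * sxx - sx ** 2)
    b_xy = numerator / (n * syy - sy ** 2)
    x_bar, y_bar = sx / n, sy / n
    return {
        "b_yx": b_yx,
        "a_yx": y_bar - b_yx * x_bar,  # intercept of the line of Y on X
        "b_xy": b_xy,
        "c_xy": x_bar - b_xy * y_bar,  # intercept of the line of X on Y
    }


# Data of Example 5.4.
experience = [8, 11, 7, 10, 12, 5, 4, 6]
ratings = [11, 30, 25, 44, 38, 25, 20, 27]

fit = fit_both_lines(experience, ratings)
rating_at_15 = fit["a_yx"] + fit["b_yx"] * 15

print(round(fit["b_yx"], 3), round(fit["a_yx"], 2))
print(round(fit["b_xy"], 3), round(fit["c_xy"], 3))
print(round(rating_at_15, 2))
```

Note that x = 15 lies outside the observed range of experience (4 to 12), so this estimate is an extrapolation and should be read with the caution stated earlier in the chapter.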

Example 5.5
A random sample of 5 school students is selected and their marks in statistics and
accountancy are found to be

Statistics 85 60 73 40 90
Accountancy 93 75 65 50 80

Find the two regression lines.

Solution:
The two regression lines are:
Regression equation of Y on X,
$$\hat{Y} - \bar{y} = b_{YX}(x - \bar{x})$$

Regression equation of X on Y,
$$\hat{X} - \bar{x} = b_{XY}(y - \bar{y})$$

x_i      y_i      u_i = x_i – A   v_i = y_i – B   u_i v_i   u_i²   v_i²
                  = x_i – 60      = y_i – 75
85       93       25              18              450       625    324
60 (A)   75 (B)   0               0               0         0      0
73       65       13              –10             –130      169    100
40       50       –20             –25             500       400    625
90       80       30              5               150       900    25
Total    348      363   48        –12             970       2094   1074

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{348}{5} = 69.6$$
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{363}{5} = 72.6$$
Since the mean values are decimals rather than integers and the numbers are big, we take
assumed origins A = 60 for x and B = 75 for y and then solve the problem.



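Regression equation of Y on X,
$$\hat{Y} - \bar{y} = b_{YX}(x - \bar{x})$$

Calculation of $b_{YX}$
$$b_{YX} = b_{VU} = \frac{n\sum_{i=1}^{n} u_i v_i - \sum_{i=1}^{n} u_i \sum_{i=1}^{n} v_i}{n\sum_{i=1}^{n} u_i^2 - \left(\sum_{i=1}^{n} u_i\right)^2}$$
$$= \frac{5 \times 970 - 48 \times (-12)}{5 \times 2094 - (48)^2} = \frac{4850 + 576}{10470 - 2304} = \frac{5426}{8166} = 0.664$$
$b_{YX} = b_{VU}$ = 0.664
$\hat{Y}$ – 72.6 = 0.664 (x – 69.6)
$\hat{Y}$ – 72.6 = 0.664x – 46.214
$\hat{Y}$ = 0.664x + 26.386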
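Regression equation of X on Y,
$$\hat{X} - \bar{x} = b_{XY}(y - \bar{y})$$

Calculation of $b_{XY}$
$$b_{XY} = b_{UV} = \frac{n\sum_{i=1}^{n} u_i v_i - \sum_{i=1}^{n} u_i \sum_{i=1}^{n} v_i}{n\sum_{i=1}^{n} v_i^2 - \left(\sum_{i=1}^{n} v_i\right)^2}$$
$$= \frac{5 \times 970 - 48 \times (-12)}{5 \times 1074 - (-12)^2} = \frac{4850 + 576}{5370 - 144} = \frac{5426}{5226}$$
$b_{UV}$ = 1.038
$\hat{X}$ – 69.6 = 1.038 (y – 72.6)
$\hat{X}$ – 69.6 = 1.038y – 75.359
$\hat{X}$ = 1.038y – 5.759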
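
The shortcut used in this example rests on the fact that regression coefficients do not change when the origin is shifted. The Python sketch below, using the data of Example 5.5 and an illustrative helper name, computes the coefficients once from the raw marks and once from the deviations u = x − 60 and v = y − 75, and shows that the two agree.

```python
def slopes(xs, ys):
    """Return (b_yx, b_xy) from the raw-sum formulae."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    numerator = n * sum(x * y for x, y in zip(xs, ys)) - sx * sy
    b_yx = numerator / (n * sum(x * x for x in xs) - sx ** 2)
    b_xy = numerator / (n * sum(y * y for y in ys) - sy ** 2)
    return b_yx, b_xy


# Data of Example 5.5.
statistics_marks = [85, 60, 73, 40, 90]
accountancy_marks = [93, 75, 65, 50, 80]

# Deviations from the assumed origins A = 60 and B = 75.
us = [x - 60 for x in statistics_marks]
vs = [y - 75 for y in accountancy_marks]

raw = slopes(statistics_marks, accountancy_marks)
shifted = slopes(us, vs)

print(sum(us), sum(vs))  # column totals of u and v
print(raw)
print(shifted)           # same coefficients as from the raw marks
```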



Example 5.6
Is there any mistake in the data provided about the two regression lines
Y = −1.5 X + 7, and X = 0.6 Y + 9? Give reasons.

Solution:
The regression coefficient of Y on X is bYX = –1.5
The regression coefficient of X on Y is bXY = 0.6
The two regression coefficients have different signs, which is a contradiction. So the given
equations cannot be regression lines.
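
The reasoning in this example can be turned into a simple consistency check. The sketch below is a hypothetical helper, not part of the text. It tests two necessary conditions that follow from the properties of regression coefficients: the two coefficients must not have opposite signs, and their product, which equals r², cannot exceed 1.

```python
def could_be_regression_coefficients(b_yx, b_xy):
    """Check necessary conditions for a pair (b_yx, b_xy) to be regression coefficients."""
    if b_yx * b_xy < 0:
        # Opposite signs: both coefficients must share the sign of r.
        return False
    # The product equals r squared, which cannot exceed 1.
    return b_yx * b_xy <= 1.0


print(could_be_regression_coefficients(-1.5, 0.6))     # pair from Example 5.6
print(could_be_regression_coefficients(2.098, 0.169))  # pair from Example 5.4
```

Passing the check does not prove that a pair of equations are regression lines; failing it shows that they cannot be.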

Example 5.7

                                    Mean   S.D.
Yield of wheat (kg per unit area)   10     8
Annual rainfall (inches)            8      2

Correlation coefficient: 0.5

Estimate the yield when rainfall is 9 inches.

Solution:
Let us denote the dependent variable yield by Y and the independent variable rainfall by X.
Regression equation of Y on X is given by
$$Y - \bar{y} = r_{XY}\,\frac{SD(Y)}{SD(X)}\,(x - \bar{x})$$

$\bar{x}$ = 8, SD(X) = 2, $\bar{y}$ = 10, SD(Y) = 8, $r_{XY}$ = 0.5

$$Y - 10 = 0.5 \times \frac{8}{2}\,(x - 8) = 2(x - 8)$$
When x = 9,
Y – 10 = 2 (9 – 8)
Y = 2 + 10
= 12 kg (per unit area)
Corresponding to the annual rainfall of 9 inches, the expected yield is 12 kg (per unit area).
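
When only the means, standard deviations and the correlation coefficient are known, the estimate can be computed as follows. This is a short Python sketch with an illustrative function name, applied to the figures of Example 5.7.

```python
def predict_from_summary(x, mean_x, mean_y, sd_x, sd_y, r):
    """Estimate Y at x from the line  Y - mean_y = r * (sd_y / sd_x) * (x - mean_x)."""
    b_yx = r * sd_y / sd_x
    return mean_y + b_yx * (x - mean_x)


# Example 5.7: rainfall X (mean 8, S.D. 2), yield Y (mean 10, S.D. 8), r = 0.5.
expected_yield = predict_from_summary(9, mean_x=8, mean_y=10, sd_x=2, sd_y=8, r=0.5)
print(expected_yield)  # 12.0
```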

Example 5.8
For 50 students of a class, the regression equation of marks in Statistics (X) on marks in
Accountancy (Y) is 3Y – 5X + 180 = 0. The mean mark in Accountancy is 50 and the variance of
marks in Statistics is 16/25 of the variance of marks in Accountancy.
Find the mean marks in Statistics and the coefficient of correlation between marks in the
two subjects when the variance of Y is 25.

Solution:
We are given that:
n = 50, regression equation of X on Y as 3Y – 5X + 180 = 0,
$\bar{y} = 50$, $V(X) = \frac{16}{25}V(Y)$, and V(Y) = 25.
We have to find (i) $\bar{x}$ and (ii) $r_{XY}$
(i) Calculation for $\bar{x}$
Since $(\bar{x}, \bar{y})$ is the point of intersection of the two regression lines, it lies on the regression
line 3Y – 5X + 180 = 0
Hence, $3\bar{y} - 5\bar{x} + 180 = 0$
$3(50) - 5\bar{x} + 180 = 0$
$5\bar{x} = 180 + 150 = 330$
$\bar{x} = \frac{330}{5} = 66$
(ii) Calculation for coefficient of correlation.
3Y – 5X + 180 = 0
5X = 180 + 3Y
X = 36 + 0.6Y
$b_{XY}$ = 0.6

Also $b_{XY} = r_{XY}\,\frac{SD(X)}{SD(Y)}$

$0.6 = r_{XY}\,\frac{SD(X)}{SD(Y)}$
$r_{XY} = \frac{0.6 \times SD(Y)}{SD(X)}$
$$r_{XY}^2 = 0.36 \times \frac{V(Y)}{V(X)} \qquad (1)$$
Given that:
V(Y) = 25
$V(X) = \frac{16}{25}V(Y) = \frac{16}{25} \times 25$
V(X) = 16



Substituting in (1) we have
$$r_{XY}^2 = \frac{0.36 \times 25}{16}$$

$$r_{XY} = \sqrt{\frac{0.36 \times 25}{16}} = 0.75$$

Example 5.9
If two regression coefficients are $b_{YX} = \frac{5}{6}$ and $b_{XY} = \frac{9}{20}$, what would be the value of $r_{XY}$?
Solution:
The correlation coefficient $r_{XY} = \pm\sqrt{b_{YX} \times b_{XY}}$
$$= \sqrt{\frac{5}{6} \times \frac{9}{20}} = \sqrt{0.375} = 0.612$$
Since both the signs in $b_{YX}$ and $b_{XY}$ are positive, the correlation coefficient between X and Y is
positive.
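
The geometric mean relation used here is easy to evaluate in code. The sketch below uses an illustrative function name; it takes the square root of the product of the two regression coefficients and attaches their common sign.

```python
import math


def correlation_from_regression_coefficients(b_yx, b_xy):
    """Return r as the signed geometric mean of the two regression coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    # r carries the common sign of the two coefficients.
    return math.copysign(math.sqrt(product), b_yx)


r = correlation_from_regression_coefficients(5 / 6, 9 / 20)
print(round(r, 3))
```

Note that the product 5/6 × 9/20 = 0.375 is r², so the correlation coefficient is its square root.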

Example 5.10
Given that $b_{YX} = -\frac{18}{7}$ and $b_{XY} = -\frac{5}{6}$. Find r.

NOTE
The sign of the correlation coefficient will be the same as the sign of the
regression coefficients.

Solution:

rXY   bYX  bXY 


18 5 15
   = = –0.553.
7 6 7
Since both the signs in bYX and bXY are negative, correlation coefficient between X and Y
is negative.



5.6 DIFFERENCE BETWEEN CORRELATION AND REGRESSION

1. Correlation: It indicates only the nature and extent of the linear relationship.
   Regression: It is the study of the impact of the independent variable on the dependent
   variable. It is used for predictions.
2. Correlation: If the linear correlation coefficient is positive / negative, then the two
   variables are positively / negatively correlated.
   Regression: If the regression coefficient is positive, then for every unit increase in x, the
   corresponding average increase in y is bYX. Similarly, if the regression coefficient is
   negative, then for every unit increase in x, the corresponding average decrease in y is |bYX|.
3. Correlation: One of the variables can be taken as x and the other one can be taken as the
   variable y.
   Regression: Care must be taken in the choice of independent variable and dependent variable.
   We cannot arbitrarily assign x as the independent variable and y as the dependent variable.
4. Correlation: It is symmetric in x and y, i.e., rXY = rYX.
   Regression: It is not symmetric in x and y; that is, bXY and bYX have different meanings and
   interpretations.
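The contrast in the last row can be checked numerically. The sketch below, in plain Python, computes r and both regression coefficients from deviations about the means; the helper names `pearson_r` and `regression_coefficient` are illustrative assumptions, and the five data pairs are the ones given in Exercise 42.

```python
import math


def _deviation_sums(x, y):
    """Return (S_xy, S_xx, S_yy): sums of products of deviations from the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy, s_xx, s_yy


def pearson_r(x, y):
    """Correlation coefficient: the same value whichever variable comes first."""
    s_xy, s_xx, s_yy = _deviation_sums(x, y)
    return s_xy / math.sqrt(s_xx * s_yy)


def regression_coefficient(dependent, independent):
    """Slope of the regression of `dependent` on `independent`."""
    s_xy, s_ii, _ = _deviation_sums(independent, dependent)
    return s_xy / s_ii


x = [10, 12, 13, 17, 18]
y = [5, 6, 7, 9, 13]

b_yx = regression_coefficient(y, x)  # Y on X
b_xy = regression_coefficient(x, y)  # X on Y
print(pearson_r(x, y), pearson_r(y, x))  # identical values
print(b_yx, b_xy)                        # two different slopes
```

Swapping the arguments leaves `pearson_r` unchanged but changes `regression_coefficient`, which is why the choice of dependent variable matters only for regression.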

POINTS TO REMEMBER

• There are several types of regression: simple linear regression, multiple linear
regression and non-linear regression.
• In simple linear regression there are two regression lines: Y on X and X on Y.
• In the linear regression line Y = a + bX + e, ‘X’ is the independent variable, ‘Y’ is the
dependent variable, ‘a’ is the intercept, ‘b’ is the slope of the line and ‘e’ is the error term.
• Both regression lines pass through the point $(\bar{X}, \bar{Y})$.
• The “method of least squares” gives the line of best fit.
• Both regression coefficients have the same sign, either positive or negative.
• The sign of the regression coefficients and the sign of the correlation coefficient are the
same.
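The least-squares point and the mean-point property listed above can be illustrated together. The sketch below solves the two normal equations of the line Y = a + bX, namely ΣY = na + bΣX and ΣXY = aΣX + bΣX², and then checks that the fitted line passes through the point of means. The function name `fit_line_from_sums` and the five data pairs are made up for illustration; they do not come from the text.

```python
def fit_line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve the normal equations of Y = a + bX:
        sum_y  = n * a     + b * sum_x
        sum_xy = a * sum_x + b * sum_x2
    and return (a, b).
    """
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y - b * sum_x) / n
    return a, b


# Illustrative data (an assumption, not taken from the text).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

a, b = fit_line_from_sums(
    n,
    sum(x),
    sum(y),
    sum(p * q for p, q in zip(x, y)),
    sum(p * p for p in x),
)

mean_x = sum(x) / n
mean_y = sum(y) / n
# The fitted value at the mean of X equals the mean of Y.
print(a, b, a + b * mean_x, mean_y)
```

The first normal equation, divided by n, reads ȳ = a + b·x̄, which is exactly the statement that the line passes through the point of means.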



EXERCISE 5

I. Choose the best answer.


1. ______ is widely used for prediction
a) regression analysis b) correlation analysis
c) analysis of variance d) analysis of covariance

2. Linear regression analysis can be classified into


a) 4 types b) 3 types c) 2 types d) none of the above

3. The linear equation Y = a + bx is called the regression equation of


a) X on Y b) Y on X c) between X and Y d) ‘a’ on ‘b’

4. In the regression equation X = a + by + e, ‘b’ is the
a) correlation coefficient of Y on X b) correlation coefficient of X on Y
c) regression coefficient of Y on X d) regression coefficient of X on Y

5. $b_{YX}$ =
a) $r_{XY} \frac{SD(X)}{SD(Y)}$ b) $r_{XY} \frac{SD(Y)}{SD(X)}$ c) $\frac{SD(X)}{SD(Y)}$ d) $\frac{SD(Y)}{SD(X)}$

6. If bXY > 1 then bYX is


a) 1 b) 0 c) > 1 d) < 1

7. In the regression equation $\hat{Y} - \bar{y} = r_{XY} \frac{SD(Y)}{SD(X)} (x - \bar{x})$, the quantity $r_{XY} \frac{SD(Y)}{SD(X)}$ is
a) bYX b) bXY c) rXY d) cov(X,Y)

8. Using the regression coefficients we can calculate

a) cov(X, Y) b) SD(X)
c) correlation coefficient d) coefficient of variance

9. Arithmetic mean of the regression coefficients bXY and bYX is


a) > rXY b) ≥ rXY c) ≤ rXY d) < rXY

10. Regression analysis helps in establishing a functional relationship between ______ variables.
a) 2 or more variables b) 2 variables
c) 3 variables d) none of these

11. _____ is the Father of mental tests

a) R.A. Fisher b) Croxton and Cowden


c) Francis Galton d) A.L. Bowley



12. Correlation coefficient is the _______ between the regression coefficients

a) arithmetic mean b) geometric mean


c) harmonic mean d) none of the above

13. If the two lines of regression are perpendicular to each other then rXY =

a) 0 b) 1 c) –1 d) 0.5

14. If the two regression lines are parallel then

a) rXY = 0 b) rXY = +1 c) rXY = –1 d) rXY = ± 1

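15. Angle between the two regression lines is

a) $\tan^{-1}\left(\frac{m_1 \ [?]\ m_2}{1 \ [?]\ m_1 m_2}\right)$ b) $\tan^{-1}\left(\frac{m_1 \ [?]\ m_2}{1 \ [?]\ m_1 m_2}\right)$

c) $\tan^{-1}\left(\frac{m_1 - m_2}{1 + m_1 m_2}\right)$ d) none of the above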

16. $b_{XY}$ =

a) $r_{XY} \frac{SD(Y)}{SD(X)}$ b) $r_{XY} \frac{SD(X)}{SD(Y)}$
c) $r_{XY}\, SD(X)\, SD(Y)$ d) $\frac{1}{b_{YX}}$
17. Regression equation of X on Y is

a) Y = a + bYX x + e b) Y = bXY x + a + e
c) X = a + bXY y + e d) X = bYX y + a + e
18. For the regression equation $2\hat{Y} = 0.605x + 351.58$, the regression coefficient of Y on X is

a) bXY = 0.3025 b) bXY = 0.605

c) bYX = 175.79 d) bYX = 351.58

19. If bXY = 0.7 and ‘a’ = 8 then the regression equation of X on Y is

a) Y = 8 + 0.7 X b) X = 8 + 0.7 Y
c) Y = 0.7 + 8 X d) X = 0.7 + 8 Y

20. The regression lines intersect at

a) $(\bar{X}, \bar{Y})$ b) (X, Y) c) (0, 0) d) (1, 1)



II. Give very short answer to the following questions.
21. Define regression.
22. What are the types of regression?
23. Write the two simple linear regression equations.
24. Write the two simple linear regression coefficients.
25. If the regression coefficient of X on Y is 16 and the regression coefficient of Y on X is 4, then
find the correlation coefficient.
26. Find the standard deviation of Y given that V(X) is 36, bXY = 0.8, rXY = 0.5.

III. Give short answer to the following questions.


27. Define simple linear and multiple linear regressions
28. Distinguish between linear and non-linear regression.
29. Write the regression equation of X on Y and its normal equations.
30. Write the regression equation of Y on X and its normal equations.
31. Write any three properties of regression.
32. Write any three uses of regression.
33. Write any three differences between correlation and regression.
34. If the regression equations are $\hat{X} = 64 - 0.95y$ and $\hat{Y} = 7.25 - 0.95x$, then find the correlation
coefficient.
35. Given the following lines of regression.
8X – 10Y + 66 = 0 and 40X – 18Y = 214. Find the mean values of X and Y.

36. Given $\bar{x} = 90$, $\bar{y} = 70$, bXY = 1.36 and bYX = 0.61, find the most probable value of X when y = 50.

37. Compute the two regression equations from the following data.

x 1 2 3 4 5
y 3 4 5 6 7
If x = 3.5, what will be the value of $\hat{Y}$?

IV. Give detailed answer to the following questions.


38. Write in detail the properties of regression.
39. Explain in detail the uses of regression analysis.
40. Distinguish between correlation and regression.
41. Interpret the result for the given information. A simple regression line is fitted for a data set
and its intercept and slope respectively are 2 and 3. Construct the linear regression of the
form Y = a + bx and offer your interpretation for ‘a’ and ‘b’. If X is increased from 1 to 2, what
is the increase in Y value. Further if X is increased from 2 to 5 what would be the increase
in Y. Demonstrate your answer mathematically.



42. Using the method of least squares, calculate the regression equations of X on Y and Y on X from
the following data and estimate X when Y is 16.
x 10 12 13 17 18
y 5 6 7 9 13
Also determine the value of correlation coefficient.

43. The following table shows the age (X) and systolic blood pressure (Y) of 8 persons.

Age (X) 56 42 60 50 54 49 39 45
Blood pressure (Y) 160 130 125 135 145 115 140 120

Fit a simple linear regression model, Y on X and estimate the blood pressure of a person of
60 years.

44. Find the regression equation of X on Y given that n = 5, ∑x = 30, ∑y = 40, ∑xy = 214, ∑x² = 220,
∑y² = 340.

45. Given the following data, estimate the marks in statistics obtained by a student who has
scored 60 marks in English.
Mean of marks in Statistics = 80, Mean of marks in English = 50, S.D of marks in Statistics =
15, S.D of marks in English = 10 and Coefficient of correlation = 0.4.

46. Find the linear regression equation of percentage worms (Y) on size of the crop (X) based on
the following seven observations.

Size of the crop (X) 16 15 11 27 39 22 20


Percentage worms (Y) 24 25 34 40 35 20 23

47. In a correlation analysis, between production (X) and price of a commodity (Y) we get the
following details.
Variance of X = 36.
The regression equations are:
12X – 15Y + 99 = 0 and 60 X – 27 Y =321
Calculate (a) The average value of X and Y.
(b) Coefficient of correlation between X and Y.



ANSWERS
I. 1. a) 2. c) 3. b) 4. d) 5. b)
6. d) 7. a) 8. c) 9. b) 10. a)
11. c) 12. b) 13. a) 14. d) 15. c)
16. b) 17. c) 18. a) 19. b) 20. a)

II. 25) rXY = 8


26) SD(Y) = 3.75

III. 34) rXY = –0.95


35) X = 13, Y = 17
36) When Y = 50, $\hat{X}$ = 62.8
37) Regression equation of X on Y: $\hat{X}$ = Y – 2
Regression equation of Y on X: $\hat{Y}$ = X + 2
When X = 3.5, $\hat{Y}$ = 5.5

IV. 41) (1) If X increases by 1 unit then Y increases by 3 units


(2) If X increases by 3 units then Y increases by 9 units
42) (1) Regression equation of X on Y is X = Y + 6; when Y = 16, X = 22
(2) Regression equation of Y on X is Y = 0.87X – 4.17
(3) bxy = 1, byx = 0.87, r = 0.93
43) Y = 0.45 X + 111.53, Y = 138.53 when age is 60 years.
44) a = 16.4, b = –1.3
Regression equation of X on Y is : X = 16.4 – 1.3Y
45) X = 86 when Y = 60
46) Y = 0.32 X + 21.84
47) (a) Mean of X = 13 and mean of Y = 17. (b) r = 0.6
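
Answers 35 and 47 both rest on two steps: the means are the point where the two regression lines cross, and the lines are labelled "Y on X" and "X on Y" in the only way that keeps the product of the regression coefficients at or below 1. A minimal Python sketch of both steps follows. Each line is passed as a tuple (A, B, C) standing for AX + BY + C = 0; the function names are hypothetical, and the sketch assumes every X and Y coefficient is non-zero.

```python
import math


def intersection(line1, line2):
    """Solve A1*X + B1*Y + C1 = 0 and A2*X + B2*Y + C2 = 0 by Cramer's rule."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the lines are parallel or coincide")
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return x, y


def correlation_from_lines(line1, line2):
    """Try both labellings and keep the one with 0 <= b_YX * b_XY <= 1."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = -y_on_x[0] / y_on_x[1]  # slope when the line is solved for Y
        b_xy = -x_on_y[1] / x_on_y[0]  # slope when the line is solved for X
        product = b_yx * b_xy
        if 0 <= product <= 1:
            # r carries the common sign of the regression coefficients.
            return math.copysign(math.sqrt(product), b_yx)
    raise ValueError("no valid labelling of the two lines")


# Exercise 47: 12X - 15Y + 99 = 0 and 60X - 27Y - 321 = 0.
first = (12, -15, 99)
second = (60, -27, -321)
mean_x, mean_y = intersection(first, second)
r = correlation_from_lines(first, second)
print(mean_x, mean_y, round(r, 4))
```

The two candidate products are reciprocals of each other, so at most one labelling can pass the check unless the product is exactly 1.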
