Correlation

Uploaded by

2002rohanjha

BUSINESS STATISTICS:

Text and Problems


With Introduction to Business Analytics

Dr. N. D. VOHRA
Chapter 12

Correlation analysis
INTRODUCTION

Classification of statistical data
• One variable: Univariate
• More than one variable:
❑ Bivariate (two variables)
❑ Multivariate (more than two variables)
INTRODUCTION

• For a study of correlation and regression analysis, we consider
bivariate and multivariate data.
• Correlation analysis: Concerned with discovering and measuring the
degree of co-variation of the variables involved.
• Regression analysis: Analysis of the nature of the relationship,
with a view to estimating the values of one variable on the basis of
given values of the other variable(s).
CORRELATION ANALYSIS

• Bivariate Data: When two variables move in sympathy with each other,
so that changes in one variable are associated with changes in the other
variable in the same or in the opposite direction, they are said to be
correlated.
• When the variables move in the same direction, the correlation is said
to be positive; when they move in opposite directions, the correlation
is said to be negative.
• Remember that the direction of movement describes a general tendency
only. In positive correlation, a higher value of one variable need not
be accompanied by a higher value of the other in every single pair.
DIRECTION AND DEGREES OF
CORRELATION

DIRECTION
• POSITIVE: Higher values of one variable are associated with higher
values of the other variable, and lower values with lower values.
• NEGATIVE: Higher values of one variable are associated with lower
values of the other variable, and lower values with higher values.

DEGREE
• Perfect Correlation
• High Degree Correlation
• Moderate Degree Correlation
• Low Degree Correlation
• No Correlation
CORRELATION

• Linear and Non-linear Relationship
In a set of bivariate data, if the pairs of values, when plotted on a
graph, fall on or close to a straight line, the correlation is linear.
If they do not, the correlation is nonlinear.
• Simple, Multiple and Partial Correlations
The correlation is said to be simple when we deal with bivariate
data. When three or more variables are involved, so that we are
dealing with multivariate data sets, the correlation between
variables is multiple or partial.
SIMPLE CORRELATION

• In this case, pairs of values are given.
• The variables are arbitrarily designated as X and Y and we seek to
determine whether the two are correlated.
• If they are correlated, we also ask what the degree and direction of
the correlation are.
• An idea about the correlation can be had by showing the data on
a scatter diagram.
• To draw a scatter diagram, plot the values of the two variables on the
two axes of a graph, one on the X-axis and the other on the Y-axis.
• The various pairs of values are shown by means of dots.
GRAPHIC ANALYSIS OF CORRELATION:
SCATTER DIAGRAM

• Moving to the right along the X-axis, if the dots are found to
lie higher and higher on the graph, the correlation between the
variables is positive. If instead they lie lower and lower, the
correlation is negative.
• If all the dots lie on a straight line, sloping upward or
downward, the correlation is said to be perfect. The
correlation is positive or negative according to whether the line
slopes upward or downward.
• If the dots do not fall exactly on a line but are very close to
one, there is a high degree of correlation.
• The more scattered the dots, the smaller the degree of correlation
between the variables.
• There is no correlation between the variables when
❑ the dots are so scattered that there is no clear direction of their slope, or
❑ the dots fall on a line that is parallel to the X-axis or the Y-axis.
A line parallel to the X-axis implies that the variable Y is not responsive to
changes in X, whereas a line parallel to the Y-axis implies that X is not
sensitive to changes in Y.
Hence there is no correlation in either case.
SOME SELECTED SCATTER DIAGRAMS
EXAMPLE

At National Company, newly recruited salesmen are given training,
followed by an aptitude test, before they are put on the job.
The following data, collected by the sales manager of the company, show the
aptitude test scores and the sales made in the first quarter of
employment by a total of 10 salesmen.
Plot these data on a graph as a scatter diagram and establish whether
correlation exists between the test scores and sales.

Salesman:          1   2   3   4   5   6   7   8   9  10
Test scores:      18  20  21  22  27  27  28  29  29  29
Sales (Rs '000):  23  27  29  28  28  31  35  30  36  33
SOLUTION
[Figure: scatter diagram of the example data, with a line drawn through the points]
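As a rough stand-in for the plotted diagram, the sketch below prints the ten pairs as a text grid, with test scores running left to right and sales from top (highest) to bottom. The function name `text_scatter` and the grid layout are illustrative choices, not part of the original solution.

```python
# Data from the example: aptitude test scores and first-quarter sales.
test_scores = [18, 20, 21, 22, 27, 27, 28, 29, 29, 29]
sales = [23, 27, 29, 28, 28, 31, 35, 30, 36, 33]  # Rs '000


def text_scatter(xs, ys):
    """Return text rows, one per Y value (highest first), with '*' at each point."""
    points = set(zip(xs, ys))
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):
        cells = "".join(
            "*" if (x, y) in points else "."
            for x in range(min(xs), max(xs) + 1)
        )
        rows.append(f"{y:>3} |{cells}")
    return rows


diagram = text_scatter(test_scores, sales)
print("\n".join(diagram))
```

The marks drift from the lower left to the upper right, which is the pattern described earlier for positive correlation.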
KARL PEARSON’S
COEFFICIENT OF CORRELATION



• This coefficient may assume negative as well as positive values, and its
value always lies between −1 and +1.
• A negative sign of the correlation coefficient implies negative
correlation between the variables, and a positive sign implies positive
correlation.
• Ignoring the sign, the closer the coefficient is to zero, the smaller
the degree of correlation; the closer it is to 1, the higher the degree
of correlation.
• However, the correlation coefficient should always be interpreted taking
into account the sample size.
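For reference, the product-moment coefficient is usually written as below. The notation is the standard one and is assumed here: X̄ and Ȳ are the means of the two series, and σ_X and σ_Y their standard deviations.

```latex
r = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
  = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^2 \; \sum (Y - \bar{Y})^2}}
```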
AN EXAMPLE


CALCULATION OF
COEFFICIENT OF CORRELATION
By Measuring Deviations From Mean Values
CALCULATION OF
COEFFICIENT OF CORRELATION
By Measuring Deviations From Assumed Mean Values
CALCULATION OF
COEFFICIENT OF CORRELATION
Without Measuring Deviations
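The three calculation routes named above can be sketched with a single function, since each is the same formula applied to deviations from a different origin. This is a minimal illustration on the salesmen data from the earlier example; the function name and the assumed means 27 and 31 are arbitrary choices made for the sketch.

```python
from math import sqrt

X = [18, 20, 21, 22, 27, 27, 28, 29, 29, 29]  # test scores
Y = [23, 27, 29, 28, 28, 31, 35, 30, 36, 33]  # sales, Rs '000
N = len(X)


def r_from_deviations(xs, ys, a, b):
    """Pearson's r using deviations dx = X - a and dy = Y - b from origins a, b."""
    dx = [x - a for x in xs]
    dy = [y - b for y in ys]
    n = len(xs)
    num = n * sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy)
    den = sqrt(
        (n * sum(p * p for p in dx) - sum(dx) ** 2)
        * (n * sum(q * q for q in dy) - sum(dy) ** 2)
    )
    return num / den


# 1. Deviations from the actual means (the deviations then sum to zero).
r_mean = r_from_deviations(X, Y, sum(X) / N, sum(Y) / N)
# 2. Deviations from assumed means (27 and 31 are arbitrary).
r_assumed = r_from_deviations(X, Y, 27, 31)
# 3. Without deviations: origins at zero, so the raw X and Y values are used.
r_raw = r_from_deviations(X, Y, 0, 0)

print(round(r_mean, 4), round(r_assumed, 4), round(r_raw, 4))
```

All three routes give the same value, about 0.82 for these data, so the choice among them is a matter of arithmetic convenience.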
ASSUMPTIONS OF THE COEFFICIENT
OF CORRELATION, r

• Linear Relationship: The product-moment coefficient of
correlation assumes essentially that the relationship between the
variables is linear in nature.
• Normality: A further assumption is that a large number of
independent factors operate on each of the variables being
correlated in such a way that each of them is normally distributed.
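A small sketch of why the linearity assumption matters: in the made-up series below, Y is completely determined by X (Y = X²), yet r comes out as zero because the relationship is not linear. The data and the helper name `pearson_r` are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # an exact, but curved, relationship

r_curve = pearson_r(xs, ys)
print(r_curve)
```

A value of r near zero therefore rules out a linear relationship only, not every kind of relationship.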
PROPERTIES OF THE
COEFFICIENT OF CORRELATION, r

• Karl Pearson’s coefficient of correlation is a pure number
and is independent of the units in which the original data are
expressed.
• As indicated earlier, the value of the coefficient of correlation
lies between −1 and +1.
• The coefficient of correlation is independent of the change of
origin and scale of the data. Thus, if a constant is added
to/subtracted from one or both variable values, or if all values are
multiplied or divided by a positive constant, it will have no effect on
the value of the coefficient. (Multiplying one variable by a negative
constant reverses the sign of r but not its magnitude.)
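The independence from origin and scale can be checked numerically. The sketch below reuses the salesmen data from the earlier example; the particular shifts and multipliers (25, 5, 2, 100) are arbitrary values picked for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


X = [18, 20, 21, 22, 27, 27, 28, 29, 29, 29]
Y = [23, 27, 29, 28, 28, 31, 35, 30, 36, 33]

r_original = pearson_r(X, Y)

# Change of origin and scale with positive multipliers: r is unchanged.
U = [(x - 25) / 5 for x in X]
V = [2 * y + 100 for y in Y]
r_transformed = pearson_r(U, V)

# A negative multiplier on one variable flips the sign only.
W = [-y for y in Y]
r_flipped = pearson_r(X, W)

print(round(r_original, 4), round(r_transformed, 4), round(r_flipped, 4))
```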
COEFFICIENT OF DETERMINATION

• It measures how much of the variation in one variable is explained by
variation in the other variable.
• It is numerically equal to the square of the coefficient of
correlation, r².
• An r² equal to 0.64 implies that 64 percent of the variation in one
variable is explained by variation in the other variable.
• When the variables are perfectly correlated, so that r = 1 (or −1),
r² = 1, which implies that all changes in one variable are explained
by changes in the other variable.
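As a quick numerical illustration, the sketch below squares r for the case quoted above (r = 0.8) and for the salesmen data from the earlier example. The variable names are illustrative.

```python
X = [18, 20, 21, 22, 27, 27, 28, 29, 29, 29]  # test scores
Y = [23, 27, 29, 28, 28, 31, 35, 30, 36, 33]  # sales, Rs '000
n = len(X)
mean_x, mean_y = sum(X) / n, sum(Y) / n

# Sums of squared deviations and of cross products about the means.
sxx = sum((x - mean_x) ** 2 for x in X)
syy = sum((y - mean_y) ** 2 for y in Y)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))

# r squared can be computed directly, without taking a square root first.
r_squared = sxy ** 2 / (sxx * syy)

print(round(0.8 ** 2, 2))   # the r = 0.8 case from the text
print(round(r_squared, 4))  # share of variation explained in the example data
```

For the example data roughly two-thirds of the variation in sales is associated with variation in test scores, even though r itself is about 0.82.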
COEFFICIENT OF CORRELATION
AND ITS LIMITATIONS

• First, too much importance should not be given to coefficients of
correlation obtained from small data sets, as they may lead to
erroneous conclusions.
• In any case, it is always advisable to interpret the value of a given
correlation coefficient using the probable error.
• Secondly, it should be clearly understood that while a
cause-and-effect relationship between two variables would result in
a correlation between them, the reverse is not true: correlation does
not by itself establish causation.
• Further, sometimes a high correlation may be found between the
variables due to chance alone.
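The probable error mentioned above is conventionally computed as 0.6745 (1 − r²)/√N. The sketch below applies that formula, together with the common rule of thumb that r is treated as significant when it exceeds six times its probable error; both the formula and the rule are assumed here from standard textbook usage, and r = 0.8176 with N = 10 corresponds to the salesmen example.

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of r for n pairs: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


r, n = 0.8176, 10
pe = probable_error(r, n)

# Rule of thumb (an assumption of this sketch): significant if |r| > 6 * PE.
is_significant = abs(r) > 6 * pe

print(round(pe, 4), is_significant)
```

With only 10 pairs the probable error is comparatively large, which is why small samples call for caution even when r looks high.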
RANK CORRELATION

• Rank correlation is calculated essentially where the variables under
consideration cannot be quantified, being measured on an ordinal scale.
• However, it can be calculated even where the variables are
objectively quantifiable.
• This is done by ranking the given data on the basis of the values
involved.
• The rank correlation coefficient also lies between −1 and +1.
• The presence of extreme observations in the data does not distort
the value of the rank correlation coefficient.
CHARLES SPEARMAN’S
COEFFICIENT OF RANK CORRELATION, rs
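For reference, Spearman's coefficient for untied ranks is usually written as below. The notation is assumed: D is the difference between the two ranks of each pair and N is the number of pairs.

```latex
r_s = 1 - \frac{6 \sum D^2}{N (N^2 - 1)}
```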


TIED RANKS

• While ranking, it may sometimes not be possible to
distinguish clearly between adjacent units.
• The ranks are said to be tied in such a case.
• Similarly, in quantitatively expressed data, tied ranks are
experienced when equal values appear in a given series.
• The problem is resolved by assigning the average of the
ranks involved to each of them.
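The averaging of tied ranks can be sketched as follows. The adjustment of (m³ − m)/12 for each group of m tied values is the usual textbook correction and is assumed here, as are the function names and the convention that the highest value receives rank 1.

```python
def average_ranks(values):
    """Rank values with 1 = highest; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # best rank occupied by this value
        count = ordered.count(v)               # number of items sharing the value
        ranks.append(first + (count - 1) / 2)  # mean of first .. first + count - 1
    return ranks


def spearman_with_ties(xs, ys):
    """Rank correlation with (m^3 - m)/12 added to sum(D^2) per tied group."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = 0.0
    for series in (xs, ys):
        for v in set(series):
            m = series.count(v)
            correction += (m ** 3 - m) / 12    # zero when the value is untied
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


print(average_ranks([10, 20, 20, 30]))

# Salesmen data from the earlier example; both series contain ties.
X = [18, 20, 21, 22, 27, 27, 28, 29, 29, 29]
Y = [23, 27, 29, 28, 28, 31, 35, 30, 36, 33]
rs_example = spearman_with_ties(X, Y)
print(round(rs_example, 4))
```

In the short list, the two values of 20 would occupy ranks 2 and 3, so each receives 2.5.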
MULTIPLE AND PARTIAL
CORRELATION
• When the data involve two variables, the correlation between the variables
is called simple correlation.
• When they involve more than two variables, we study multiple and
partial correlations.
• In such data, there are two or more independent variables which affect a
dependent variable.
• Multiple correlation is used to study the joint or cumulative effect of all the
given independent variables on the dependent variable.
• Partial correlation is the correlation between one independent variable
and the dependent variable, with the effect of the other independent
variable(s) held constant statistically.
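For three variables, both measures can be obtained from the three simple correlation coefficients. The sketch below uses the standard first-order formulas, with variable 1 taken as the dependent variable; the function names and the input values 0.8, 0.6 and 0.5 are hypothetical and chosen only for illustration.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """r12.3: correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def multiple_r(r12, r13, r23):
    """R1.23: joint correlation of variable 1 with variables 2 and 3."""
    return sqrt((r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2))


# Hypothetical simple correlations among three variables.
r12, r13, r23 = 0.8, 0.6, 0.5

r12_3 = partial_r(r12, r13, r23)
R1_23 = multiple_r(r12, r13, r23)

print(round(r12_3, 4), round(R1_23, 4))
```

With these inputs the multiple correlation is at least as large as either simple correlation with the dependent variable, as expected when a second independent variable is added.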
END OF CHAPTER 12
