Unit 2 - (A) Correlation & Regression

This document provides an introduction to correlation and regression. It defines correlation as the relationship between two variables where a change in one variable is associated with changes in the other. Correlation can be positive if variables change in the same direction, or negative if they change in opposite directions. Methods for studying correlation include scatter plots, Pearson's correlation coefficient, Spearman's rank correlation, and concurrent deviation. Correlation indicates an association but does not prove causation between variables.

Uploaded by

saumya


F.Y.B.B.A.
Semester – I
Unit – II
Correlation and Regression
➢ Introduction:
So far we have studied series in which each item takes a value of a single variable. For such
series we calculated measures of central tendency and measures of dispersion for the purposes of
comparison and analysis. There are, however, series in which each item takes values of two or more
variables. For example, if the heights and weights of a group of persons are measured, each member
of the group has two values, one relating to height and the other to weight. Such a distribution is
known as a bivariate distribution.
Sometimes the values of the variables so obtained appear to be interrelated. In the two series
relating to heights and weights, it may be observed that weight tends to increase with height, so
that tall people are, on average, heavier than short people. Similarly, if data are collected on the
prices of a commodity and the quantities sold at those prices, two series are obtained, and again we
are likely to find a relationship: as the price of the commodity rises, the quantity sold tends to
fall. We can thus conclude that there is some relationship between price and demand. Such
relationships can be found in many types of series, for example, price and supply, heights and
weights of persons, prices of sugar and sugarcane, ages of husbands and wives, etc.
So, we can say that “The term correlation (or co-variation) indicates the relationship between
two such variables in which, with changes in the values of one variable, the values of the other
variable also change.” Thus correlation is a statistical tool for studying the relationship between
two variables. For a correlation to be meaningful there should be some plausible connection between
the two phenomena. Where no such connection exists, an observed correlation may be spurious and
should be interpreted with caution.

➢ Types of correlation:
1) By direction of change (Positive and Negative)
Positive Correlation: If the values of two related variables deviate in the same direction, i.e. if
an increase (or decrease) in one variable is accompanied by an increase (or decrease) in the
corresponding value of the other variable, the correlation is called a Positive Correlation. For
example, height and weight of human beings, demand and supply, and amount of rainfall and yield of
crop have positive correlation.

Negative Correlation: If the values of two related variables deviate in opposite directions, i.e. if
an increase (or decrease) in one variable is accompanied by a decrease (or increase) in the
corresponding value of the other variable, the correlation is called a Negative Correlation. For
example, price and demand of a commodity, and temperature and sales of woolen clothes have negative
correlation.

2) Linear and Non-Linear Correlation:

When the amount of change in one variable tends to bear a constant ratio to the amount of change in
the other variable, the correlation is called a Linear Correlation. In such a case, if the values of
the variables are plotted on graph paper, a straight line is obtained. When the amount of change in
one variable does not tend to bear a constant ratio to the amount of change in the other variable,
the correlation is called a Non-Linear (or Curvilinear) Correlation. In such a situation, if the
values of the variables are plotted on graph paper, a curve is obtained.
3) By number of variables under study (Simple, Partial and Multiple correlation)
Correlation can be simple, partial or multiple.
When we study the relationship between only two variables, it is called simple correlation. For
example, if the two variables are volume of sale and price of an item, the correlation between them
is simple.
When more than two variables are involved in a study, the correlation can be either multiple or
partial. Partial correlation may be defined as the correlation between one dependent variable and
one independent variable while keeping the effect of the other independent variables constant. For
example, with three variables, volume of sale, expenditure on advertisement and price of the item,
the correlation between volume of sale and advertisement expense, keeping the effect of price
constant, is a partial correlation. Multiple correlation may be defined as the correlation between
one dependent variable and all the other independent variables taken together. For example, multiple
correlation is the study of the joint effect of price and advertisement expenditure on volume of
sale.
➢ Correlation and causation:
Correlation analysis gives us an idea of the degree and direction of the relationship between the
two variables under study. However, it does not tell us anything about the cause and effect
relationship between the variables. In a bivariate distribution, if the variables have a cause and
effect relationship, they tend to vary in sympathy with each other and there is usually a high
degree of correlation between them. Thus causation generally implies correlation, but the
converse is not true. A high degree of correlation between variables may also be due to the
following reasons.
1. Mutual dependence: The variables under study may influence each other, e.g. the price of a
commodity and its demand. Here it is very difficult to isolate the exact cause from the
effect.
2. Both variables being influenced by the same external factors: A high degree of correlation
between two variables may be observed because a third variable, or a number of variables,
affects each of them, e.g. a high degree of correlation between the yields of two crops,
say rice and potato, due to the effect of factors such as weather conditions, fertilizer used,
irrigation facilities, etc., on each of them.
3. Pure chance: A small randomly selected sample from a bivariate distribution may show a
fairly high degree of correlation even though the variables are not correlated in the
population. Such correlation is called nonsense (spurious) correlation, e.g. the
correlation between the size of the shoe and the intelligence of a group of individuals.
➢ Methods to study correlation:
1. Scatter diagram.
2. Karl Pearson’s method (product moment correlation coefficient)
3. Spearman’s Rank correlation method
4. Concurrent deviation method

1. Scatter Diagram Method

It is the simplest method of studying the correlation between two variables. The values of one
variable are taken on the x-axis and the values of the other on the y-axis, and each pair of values
is plotted as a point on graph paper. The diagram so obtained is called a scatter diagram or dot
diagram.

• If the plotted dots lie on a straight line rising from the lower left-hand corner to the upper
right-hand corner, the correlation is said to be perfect positive correlation.

• If the plotted dots lie on a straight line falling from the upper left-hand corner to the lower
right-hand corner, the correlation is said to be perfect negative correlation.

• If the plotted dots fall in a narrow band showing a rising tendency from the lower left-hand corner
to the upper right-hand corner, there is a high degree of positive correlation. As the band
becomes wider the degree of correlation becomes lower, and it is called low degree positive
correlation.

• If the plotted dots fall in a narrow band showing a falling tendency from the upper left-hand
corner to the lower right-hand corner, there is a high degree of negative correlation. As
the band becomes wider the degree of correlation becomes lower, and it is called low degree
negative correlation.
• If the dots are widely scattered in a haphazard manner, it indicates no correlation between the
two study variables.

[Figure: scatter diagrams illustrating perfect positive correlation, high degree positive
correlation, low degree positive correlation, perfect negative correlation, high degree negative
correlation, low degree negative correlation, no correlation, and non-linear correlation.]

The scatter diagram helps to visualize the relationship between two related variables, but it does
not enable us to measure the degree to which the variables are linearly related.
2. Karl Pearson’s method
Karl Pearson’s method is the most widely used method of measuring the relationship between two
variables. The coefficient is based on the following assumptions:
(i) There is a linear relationship between the two variables, which means that a straight line would
be obtained if the observed data were plotted on a graph.
(ii) The two variables are causally related, which means that one of the variables is independent
and the other one is dependent.
(iii) A large number of independent causes operate on both variables so as to produce a
normal distribution.
Bivariate table without frequency (for n pairs):

X | x1   x2   ...   xn-1   xn
Y | y1   y2   ...   yn-1   yn

Bivariate table with frequency (for N pairs):

If in a bivariate distribution the data are fairly large, they may be summarized in the form of a
two-way table. Here, for each variable, the values are grouped into various classes (not necessarily
the same for both variables), keeping in view the same considerations as in the case of a
univariate distribution. For example, if there are m classes for the X-series and n classes for the
Y-series, then there will be m*n cells in the two-way table. By going through the different
pairs of values (x, y) and using tally marks we can find the frequency for each cell and thus
obtain the so-called bivariate frequency table shown below.

BIVARIATE FREQUENCY TABLE

Y series classes |      X series classes (midpoints)      | Total of frequencies
(midpoints)      |  x1    x2    x3    . . .    xm         | of Y
-----------------+----------------------------------------+---------------------
y1               |                                        |
y2               |                                        |
.                |                fxy                     |        fy
.                |                                        |
yn               |                                        |
-----------------+----------------------------------------+---------------------
Total of         |                                        |
frequencies of X |                fx                      |        N

Here fxy is the frequency of the pair (x, y), fx and fy are the marginal frequencies of x and y,
and N is the total frequency.

Karl Pearson’s Correlation coefficient: It measures the degree of correlation between two
variables. It is denoted by r_xy or r, the measure of correlation between the two variables x
and y. It can be written as

$$r = \frac{\mathrm{cov}(x, y)}{\sigma_x \, \sigma_y}$$

Where, for data without frequency (n pairs),

$$\bar{x} = \frac{\sum x}{n}, \qquad \bar{y} = \frac{\sum y}{n}$$

$$\mathrm{cov}(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y}$$

$$\sigma_x^2 = \frac{\sum x^2}{n} - \bar{x}^2, \qquad \sigma_y^2 = \frac{\sum y^2}{n} - \bar{y}^2$$

and, for data with frequency (N pairs),

$$\bar{x} = \frac{\sum f_x x}{N}, \qquad \bar{y} = \frac{\sum f_y y}{N}$$

$$\mathrm{cov}(x, y) = \frac{\sum f_{xy}\, xy}{N} - \bar{x}\,\bar{y}$$

$$\sigma_x^2 = \frac{\sum f_x x^2}{N} - \bar{x}^2, \qquad \sigma_y^2 = \frac{\sum f_y y^2}{N} - \bar{y}^2$$

Interpretation:
If r_xy = +1, there is perfect positive correlation between variables x and y.
If r_xy = −1, there is perfect negative correlation between variables x and y.
If r_xy lies between 0 and 1, there is positive correlation between variables x and y.
If r_xy lies between −1 and 0, there is negative correlation between variables x and y.
If r_xy = 0, there is no linear correlation between variables x and y.
If the correlation coefficient is close to +1, there is a strong positive relationship.
If the correlation coefficient is close to −1, there is a strong negative relationship.
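
As a rough illustration of the definition r = cov(x, y) / (σx σy) for data without frequency, the following Python sketch computes r from the covariance and the two variances. The function name `pearson_r` is an illustrative choice, and the sample values are the advertisement expense and units sold figures from Exercise 1 of this unit.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r = cov(x, y) / (sigma_x * sigma_y) for ungrouped data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # cov(x, y) = (sum of xy) / n - mean_x * mean_y
    cov_xy = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    # variance = (sum of squares) / n - mean^2
    var_x = sum(x * x for x in xs) / n - mean_x ** 2
    var_y = sum(y * y for y in ys) / n - mean_y ** 2
    return cov_xy / math.sqrt(var_x * var_y)

# Data taken from Exercise 1 (ad expense in '000 Rs., units sold in lakhs).
ad_expense = [14, 21, 26, 22, 15, 19]
units_sold = [31, 37, 50, 45, 33, 39]

r = pearson_r(ad_expense, units_sold)
print(f"r = {r:.4f}")  # close to +1, i.e. a strong positive relationship
```

Because the result lies close to +1, it would be read as a strong positive relationship under the interpretation rules above.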

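Formulas:
(a) For ungrouped bivariate data (without frequency)

$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

$$r_{xy} = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}$$

$$r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

(b) For grouped bivariate data (frequency data), with N the total frequency

$$r_{xy} = \frac{N\sum f_{xy}\,xy - \sum f_x x \sum f_y y}{\sqrt{\left[N\sum f_x x^2 - \left(\sum f_x x\right)^2\right]\left[N\sum f_y y^2 - \left(\sum f_y y\right)^2\right]}}$$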

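➢ Properties of Correlation Coefficient:
(i) Karl Pearson’s correlation coefficient lies between -1 and +1, i.e. -1 ≤ r ≤ +1.
(ii) The correlation coefficient is independent of change of origin and scale. Its magnitude is
unchanged, and its sign is reversed only when exactly one of the two variables is multiplied by
a negative constant.
(iii) Two independent variables are uncorrelated, i.e. r_xy = 0 for independent variables, but the
converse is not true: r_xy = 0 does not imply that the variables are independent.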
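
The properties above can be checked numerically. The sketch below is only an illustration: the helper `pearson_r` and the small data set are hypothetical choices, not part of the notes, while the transformations U = 2X + 6, V = 3Y - 15 and V = -3Y + 15 are the ones used in Exercise 8.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r in deviation-from-mean form."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (hypothetical values, not from the notes).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
r_xy = pearson_r(x, y)

# Property (ii): change of origin and scale, U = 2X + 6 and V = 3Y - 15.
u = [2 * xi + 6 for xi in x]
v = [3 * yi - 15 for yi in y]
r_uv = pearson_r(u, v)

# A negative scale factor on one variable only reverses the sign of r.
v_neg = [-3 * yi + 15 for yi in y]
r_uv_neg = pearson_r(u, v_neg)

# Property (iii): y = x^2 is completely determined by x, yet r = 0
# because the relationship is not linear.
xs = [-2, -1, 0, 1, 2]
ys = [xi ** 2 for xi in xs]
r_parabola = pearson_r(xs, ys)

print(r_xy, r_uv, r_uv_neg, r_parabola)
```

The last case is the standard counterexample for the converse of property (iii): zero correlation, but the variables are clearly dependent.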

Rank Correlation:

In statistics, a rank correlation is any of several statistics that measure an ordinal association—the relationship
between rankings of different ordinal variables or different rankings of the same variable, where a "ranking" is
the assignment of the ordering labels "first", "second", "third", etc. to different observations of a particular
variable. A rank correlation coefficient measures the degree of similarity between two rankings, and can be used
to assess the significance of the relation between them.
If, for example, one variable is the identity of a college basketball program and another variable is the identity of
a college football program, one could test for a relationship between the poll rankings of the two types of program:
do colleges with a higher-ranked basketball program tend to have a higher-ranked football program? A rank
correlation coefficient can measure that relationship, and the measure of significance of the rank correlation
coefficient can show whether the measured relationship is small enough to likely be a coincidence.
If there is only one variable, the identity of a college football program, but it is subject to two different poll
rankings (say, one by coaches and one by sportswriters), then the similarity of the two different polls' rankings
can be measured with a rank correlation coefficient.

The Spearman correlation coefficient, rs, can take values from +1 to -1.

A rs of +1 indicates a perfect association of ranks, a rs of zero indicates no association between ranks and
a rs of -1 indicates a perfect negative association of ranks.

The closer rs is to zero, the weaker the association between the ranks.

An example of calculating Spearman's correlation

To calculate a Spearman rank-order correlation on data without any ties we will use the following data:

Exam Marks

English   56  75  45  71  62  64  58  80  76  61
Maths     66  70  40  60  65  56  59  77  67  63

We then complete the following table:

English (mark)   Maths (mark)   Rank (English)   Rank (Maths)   d   d^2
56               66             9                4              5   25
75               70             3                2              1   1
45               40             10               10             0   0
71               60             4                7              3   9
62               65             6                5              1   1
64               56             5                9              4   16
58               59             8                8              0   0
80               77             1                1              0   0
76               67             2                3              1   1
61               63             7                6              1   1

Where d = difference between ranks (taken without sign) and d^2 = difference squared.

We then calculate the following:

$$\sum d^2 = 25 + 1 + 0 + 9 + 1 + 16 + 0 + 0 + 1 + 1 = 54$$

We then substitute this into the main equation with the other information as follows:

$$\rho = 1 - \frac{6\sum d^2}{n(n^2 - 1)} = 1 - \frac{6 \times 54}{10(10^2 - 1)} = 1 - \frac{324}{990} \approx 0.67$$

as n = 10. Hence, we have a ρ (or rs) of 0.67. This indicates a strong positive relationship between the ranks
individuals obtained in the maths and English exams. That is, the higher you ranked in maths, the higher you
ranked in English also, and vice versa.
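
The worked example can be reproduced with a short Python sketch. The helper names `ranks_descending` and `spearman_rho` are illustrative, and the ranking helper assumes there are no tied marks, as in the table above.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes no tied values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Return (rho, sum of d^2) using rho = 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rank_x = ranks_descending(xs)
    rank_y = ranks_descending(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return rho, sum_d2

# Exam marks from the example table.
english = [56, 75, 45, 71, 62, 64, 58, 80, 76, 61]
maths = [66, 70, 40, 60, 65, 56, 59, 77, 67, 63]

rho, sum_d2 = spearman_rho(english, maths)
print(sum_d2, round(rho, 2))  # 54 0.67
```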

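Tie Case in Rank Correlation:

When two or more observations have the same value, each of them is given the average of the ranks
they would otherwise occupy, and a correction term m(m^2 - 1)/12 is added to Σd^2 for every group
of m tied values:

$$r_s = 1 - \frac{6\left\{\sum d^2 + \dfrac{m(m^2 - 1)}{12} + \dots\right\}}{n(n^2 - 1)}$$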

Regression Analysis:

➢ Regression Analysis: It is the mathematical measure of the average relationship between
two or more variables in terms of the original units of the data.
➢ Dependent Variable (Regressed or Explained Variable): The variable whose value is to
be predicted.
➢ Independent Variable (Regressor or Predictor or Explanatory Variable): The variable
which influences the values or is used for prediction.
➢ Simple Linear Regression: It is the technique for estimating the unknown value of the
dependent variable from the known value of the independent variable.
➢ Regression Lines:
In the case of two variables X and Y, we have two regression lines: the regression line of X on Y
and the regression line of Y on X. The regression line of Y on X gives the most probable values of
Y for given values of X, and the regression line of X on Y gives the most probable values of X for
given values of Y. However, when there is either perfect positive or perfect negative correlation
between the two variables, the two regression lines coincide, i.e. we have only one line. The
farther the two regression lines are from each other, the lower the degree of correlation, and the
nearer they are to each other, the higher the degree of correlation. If the variables are
independent, the correlation coefficient (r) is zero and the lines of regression are perpendicular.
It should be noted that the regression lines cut each other at the point of the averages of X and
Y, i.e. if from the point where the two regression lines intersect a perpendicular is drawn to the
X-axis, we get the mean value of X, and if from the same point a horizontal line is drawn to the
Y-axis, we get the mean value of Y.
➢ Regression equations: The regression equations, also known as estimating equations, are
algebraic expressions of the regression lines. There are two regression equations: the regression
equation of X on Y is used to describe the variation in the values of X for given changes in Y, and
the regression equation of Y on X is used to describe the variation in the values of Y for given
changes in X.
• Regression Equation of Y on X:
The regression equation of Y on X is expressed as follows:
Y = a + bX
For example, with a = 500 and b = 100, Y = 500 + 100X gives
Y = 500, 600, 700, 800, 900 for X = 0, 1, 2, 3, 4.

It may be noted that in this equation ‘Y’ is the dependent variable and ‘X’ is the independent
variable.
‘a’ is the Y-intercept and
‘b’ is the slope of the line; it represents the change in the Y variable for a unit change in the
X variable.

The values of the numerical constants ‘a’ and ‘b’ are obtained by fitting the line of best fit,
which is based on the principle of least squares. Under this principle we minimize the sum of
squares of the deviations (errors of estimate), where each deviation is the difference between an
observed value of the variable and the corresponding value estimated by the line of best fit.
Thus the line of regression of Y on X is written as

$$(y - \bar{y}) = b_{yx}\,(x - \bar{x})$$

where $b_{yx} = r_{xy}\,\dfrac{\sigma_y}{\sigma_x}$ is called the regression coefficient of y on x.

• Line of Regression of X on Y:

X = c + dY

$$(x - \bar{x}) = b_{xy}\,(y - \bar{y})$$

where $d = b_{xy} = r_{xy}\,\dfrac{\sigma_x}{\sigma_y}$ is called the regression coefficient of x on y.

➢ Regression coefficient: It gives the rate of change of the dependent variable when the
independent variable changes by one unit. It is also called the slope of the line,
i.e. b_yx measures the change in variable y when x changes by one unit,
and b_xy measures the change in variable x when y changes by one unit.
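➢ Formulas:
(a) For ungrouped bivariate data (without frequency)

$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} \quad \text{and} \quad b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$

$$b_{xy} = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{\sum y^2 - n\bar{y}^2} \quad \text{and} \quad b_{yx} = \frac{\sum xy - n\,\bar{x}\,\bar{y}}{\sum x^2 - n\bar{x}^2}$$

$$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2} \quad \text{and} \quad b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$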
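
To show how the regression coefficient formulas are used in practice, the sketch below fits the line of Y on X with the raw-sums form. The function names are illustrative, and the data are the research expense (X) and annual profit (Y) figures from Exercise 13, so the printed values should be treated as a check on hand calculation rather than as part of the original notes.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) using the raw-sums formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    return b_yx, b_xy

# Exercise 13 data, both in '000 Rs.
research = [5, 11, 4, 5, 3, 2]
profit = [31, 40, 30, 34, 25, 20]

b_yx, b_xy = regression_coefficients(research, profit)
mean_x = sum(research) / len(research)
mean_y = sum(profit) / len(profit)

# Line of Y on X: (y - mean_y) = b_yx * (x - mean_x), i.e. Y = a + b_yx * X
a = mean_y - b_yx * mean_x

def predict_y(x):
    """Estimate Y from the regression line of Y on X."""
    return a + b_yx * x

print(b_yx, a, predict_y(7))
```

Note that a research expense of Rs. 7,000 corresponds to X = 7 because the data are recorded in thousands.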

➢ Properties of Regression coefficients:

1. The correlation coefficient is the geometric mean of the two regression coefficients, i.e.
$r_{xy} = \pm\sqrt{b_{yx} \cdot b_{xy}}$

2. If one regression coefficient is greater than one, then the other regression coefficient must be
less than one, i.e. $b_{xy} \cdot b_{yx} \le 1$
3. The signs of both regression coefficients and of the correlation coefficient are ALWAYS the same.
4. The arithmetic mean of the regression coefficients is, in absolute value, greater than or equal
to the correlation coefficient, i.e. $\frac{1}{2}\left|b_{xy} + b_{yx}\right| \ge |r|$
5. Regression coefficients are independent of change of origin but not of scale.
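
A quick numerical check of these properties, offered as a sketch only: it uses the summary values given in Exercise 15 (r = 0.6, σx = 2.5, σy = 20) together with the relations b_yx = r σy/σx and b_xy = r σx/σy. The variable names are illustrative.

```python
import math

# Summary values from Exercise 15 (height x, weight y).
r = 0.6
sigma_x, sigma_y = 2.5, 20.0

b_yx = r * sigma_y / sigma_x   # regression coefficient of y on x
b_xy = r * sigma_x / sigma_y   # regression coefficient of x on y

# Property 1: r is the geometric mean of the two coefficients.
geometric_mean = math.sqrt(b_yx * b_xy)

# Property 2: the product cannot exceed 1, so at most one coefficient exceeds 1.
product = b_yx * b_xy

# Property 4: the arithmetic mean is at least |r| in absolute value.
arithmetic_mean = (b_yx + b_xy) / 2

print(b_yx, b_xy, geometric_mean, product, arithmetic_mean)
```

Here b_yx is well above one while b_xy is well below one, which is exactly the situation property 2 describes.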
➢ Remarks:
1. The two lines of regression intersect at the point of the mean values of the variables X and Y,
i.e. the point of intersection is $(\bar{X}, \bar{Y})$.

For example, the lines x + 3y = 10 and y = 5 + 0x intersect at x = -5, y = 5, so the means are
(-5, 5).

Similarly, for the data
X: 4, 5, 6, 7, 8, 9, 10 (mean 7)
Y: 10, 20, 30, 40, 50, 60, 70 (mean 40)
the regression lines pass through the point (7, 40).

2. When the two regression lines are perpendicular to each other, there is no correlation between
the two study variables, i.e. r_xy = 0.

3. When the two regression lines coincide, there is perfect correlation between the two study
variables, i.e. r_xy = ±1.
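
Since the regression lines meet at the point of means, the means can be recovered by solving the two equations simultaneously. The sketch below does this with Cramer's rule. The function name `intersection` and the (a, b, c) encoding of a line a·x + b·y + c = 0 are illustrative assumptions. The first pair of lines is the small example x + 3y = 10, y = 5, and the second pair is the one quoted in item (xv) of the fill-in-the-blanks exercise.

```python
def intersection(line1, line2):
    """Solve a1*x + b1*y + c1 = 0 and a2*x + b2*y + c2 = 0 by Cramer's rule."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or coincident")
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return x, y

# x + 3y = 10 and y = 5, written as a*x + b*y + c = 0.
print(intersection((1, 3, -10), (0, 1, -5)))      # (-5.0, 5.0)

# 4X - 5Y + 33 = 0 (Y on X) and 20X - 9Y - 107 = 0 (X on Y).
mean_x, mean_y = intersection((4, -5, 33), (20, -9, -107))
print(mean_x, mean_y)                              # 13.0 17.0
```

When the lines coincide (perfect correlation) the determinant is zero and the point of means cannot be recovered this way, which is why the sketch raises an error in that case.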

➢ Uses of Regression analysis:

1. To get the functional relationship between the dependent variable and one or more independent
variables.
2. To provide estimates of the values of the dependent variable from values of the independent
variable.
3. To obtain a measure of the error involved in using the regression line as a basis for estimation.
4. Using the regression coefficients we can calculate the correlation coefficient.

The regression model can be written as Y = a + bX + ε, and the error (residual) of an estimate is
e = Y − Ŷ, where Ŷ is the value estimated by the regression line.

➢ Coefficient of Determination

It is useful to measure the strength of the relationship. This is done by calculating the coefficient
of determination R^2, which gives the ratio of the explained variance to the total variance. For a
model with only one independent variable, the coefficient of determination is the square of the
coefficient of correlation, i.e. r^2. Thus,

$$\text{Coefficient of determination} = r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$$

Examples:
Demand:      r = 0.4, r^2 = 0.4 × 0.4 = 0.16 = 16%
Rainfall:    r = 0.7, r^2 = 0.7 × 0.7 = 0.49 = 49%
Temperature: r = 0.9, r^2 = 0.9 × 0.9 = 0.81 = 81%, leaving 19% unexplained

Remark: The relation R^2 = r^2 is true for models with only one independent variable.

For example, if R^2 has a value of 0.6483, this means that 64.83% of the variation in y is explained
by the regression model. The remaining 35.17% is unexplained, i.e. due to error.

In general, the higher the value of R^2, the better the model fits the data.
R^2 = 1: Perfect match between the line and the data points.
R^2 = 0: There is no linear relationship between x and y.
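
The ratio of explained to total variation can be computed directly and compared with r^2. The following sketch does so for the research expense and annual profit data of Exercise 13. The variable names are illustrative, and the fit is an ordinary least-squares line of Y on X with a single independent variable, which is the case in which R^2 equals r^2.

```python
research = [5, 11, 4, 5, 3, 2]      # X, from Exercise 13
profit = [31, 40, 30, 34, 25, 20]   # Y, from Exercise 13

n = len(research)
mean_x = sum(research) / n
mean_y = sum(profit) / n

# Least-squares line of Y on X.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(research, profit))
s_xx = sum((x - mean_x) ** 2 for x in research)
s_yy = sum((y - mean_y) ** 2 for y in profit)        # total variation
b_yx = s_xy / s_xx
a = mean_y - b_yx * mean_x

fitted = [a + b_yx * x for x in research]
explained = sum((f - mean_y) ** 2 for f in fitted)   # explained variation
unexplained = sum((y - f) ** 2 for y, f in zip(profit, fitted))

r_squared = explained / s_yy
r = s_xy / (s_xx * s_yy) ** 0.5

print(round(r_squared, 4), round(r ** 2, 4))
```

The explained and unexplained parts add up to the total variation, so 1 − R^2 is the share of variation left to error.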
➢ Correlation Analysis Vs. Regression Analysis

1. Correlation: Correlation literally means the relationship between two or more variables, in
   which movements in one tend to be accompanied by corresponding movements in the others.
   Regression: The literal meaning of regression is stepping back or returning to the average
   value; it is a mathematical measure expressing the average relationship between the variables.

2. Correlation: The correlation coefficient r_xy between two variables x and y is a measure of the
   direction and degree of the linear relationship between the two variables, which is mutual. It
   is symmetric, i.e. r_xy = r_yx.
   Regression: Regression analysis is used to establish the functional relationship between
   variables and to predict or estimate the value of the dependent variable for any given value of
   the independent variable. Hence regression coefficients are not symmetric, i.e. b_yx ≠ b_xy.

3. Correlation: Correlation need not imply a cause and effect relationship between the variables
   under study.
   Regression: Regression analysis clearly indicates the cause and effect relationship between
   variables. The variable corresponding to the cause is taken as the independent variable and the
   variable corresponding to the effect is taken as the dependent variable.

4. Correlation: The correlation coefficient r_xy is a relative measure of the linear relationship
   between X and Y, i.e. it is independent of the unit of measurement. It is a pure number lying
   between -1 and +1.
   Regression: The regression coefficients b_yx and b_xy are absolute measures representing the
   change in the value of the dependent variable for a unit change in the independent variable.

5. Correlation: There may be nonsense correlation between two variables which is due to pure
   chance, e.g. the correlation between the size of the shoe and the intelligence of a group of
   individuals.
   Regression: There is no such thing as nonsense regression.

6. Correlation: Correlation analysis is confined to the study of the linear relationship between
   the variables and therefore has limited applications.
   Regression: Regression analysis has much wider application as it studies linear as well as
   non-linear relationships between variables.

Exercise
Correlation
1. The following data refer to advertisement expense and number of units sold in the last six months.
Ad. expense (in ‘000 Rs.)      14 21 26 22 15 19
No. of units sold (in lakhs)   31 37 50 45 33 39
Calculate the correlation coefficient and comment on the result. Also draw a scatter diagram and
interpret it. Estimate the value of advertisement expense for 40 lakh units sold.
2. To study the effectiveness of an advertisement, a survey is conducted by calling people at random
and asking the number of advertisements read or seen in a week and the number of items purchased
in that week.
Ads seen/read            5 10 4 0 2 7 3 6
No. of items purchased   10 12 5 2 1 3 4 8
Calculate the correlation coefficient and comment on the result.

3. From the following data, find out the correlation coefficient between heights of fathers and sons.
Heights of fathers(inches) 65 66 67 67 68 69 70 72
Heights of sons(inches) 67 68 65 68 72 72 69 71
4. Compute Karl Pearson’s coefficient of correlation in the following series relating to cost of living
and wages.
Wages (Rs.) 100 101 102 100 99 98 97 98 96 95
Cost of living 98 99 99 97 95 92 95 94 90 91
5. A prognostic test in Mathematics was given to 10 students who were about to begin a course in
Statistics. The scores (X) in this test were examined in relation to the scores (Y) in the final
examination in Statistics. The following results were obtained:
∑x = 71, ∑y = 70, ∑x² = 555, ∑y² = 526, and ∑xy = 527.
Find the coefficient of correlation between x and y.
6. Calculate the correlation coefficient from the following results:
N = 10, ∑(x − 14)² = 180, ∑(y − 15)² = 215, and ∑(x − 14)(y − 15) = 60.

7. The coefficient of correlation between X and Y is 0.32 and their covariance is 7.86.
The variance of X is 10. Find the standard deviation of Y.

Solution: r = 0.32, cov(x, y) = 7.86, var(x) = 10, so σx = √10 = 3.162.

$$r = \frac{\mathrm{cov}(x, y)}{\sigma_x \, \sigma_y}$$

0.32 = 7.86 / (3.162 × σy)

σy = 7.86 / (3.162 × 0.32) = 7.768

8. If the coefficient of correlation between X and Y is -0.92, find the coefficient of correlation
between (i) U = 2X + 6 and V = 3Y − 15, (ii) U = 2X + 6 and V = −3Y + 15,
(iii) U = −2X + 6 and V = −3Y + 15.

Solution: r is unaffected by change of origin, and its sign is reversed only when the two scale
factors have opposite signs.
(i) r(u, v) = −0.92   (ii) r(u, v) = −(−0.92) = 0.92   (iii) r(u, v) = −0.92

9. From the following data, compute the coefficient of correlation and interpret it.
                                                          x      y
No. of pairs of observations                              15     15
Arithmetic mean                                           25     18
Standard deviation                                        3.01   3.03
Sum of squares of deviations from mean                    136    138
Sum of products of deviations of x and y from
their respective means                                       122

Solution:

$$r_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} = \frac{122}{\sqrt{136}\,\sqrt{138}} = 0.89$$

10. The following table gives bivariate frequency distribution of age and marks of 100 students in a test.

Marks Age (in years)

18 19 20 21
10-20 4 2 2 -
20-30 5 4 6 4
30-40 6 8 10 11
40-50 4 4 6 8
50-60 - 2 4 4
60-70 - 2 3 1
Calculate the correlation coefficient.

11. Calculate the coefficient of correlation and interpret it.

Sales Advertising Expenditure
revenue 5-15 15-25 25-35 35-45
75-125 3 4 4 8
125-175 8 6 5 7
175-225 2 2 3 4
225-275 2 3 2 2

12. Following is the distribution of students according to their heights and weights:
Height (in inches)   Weight x (in lbs.)
                     90-100 100-110 110-120 120-130
50-55 4 7 5 2
55-60 6 10 7 4
60-65 6 12 10 7
65-70 3 8 6 3
Find out the correlation coefficient between height and weight.

Regression
13. Given the following information:
Year 1999 2000 2001 2002 2003 2004
Research expense (in ‘000 Rs.) 5 11 4 5 3 2
(X)
Annual Profit ( in ‘000 Rs.) (Y) 31 40 30 34 25 20
(i) Develop the estimating equation that best describes the given data (regression equation of Y on X).
(ii) Estimate the annual profit when the research expense is Rs. 7,000.
(iii) How much of the variation in the annual profits (Y) is explained by the variation in the
research expenditure (X)? (Coefficient of determination, r².)
14. From the following data of the age of husband and the age of wife, form two regression lines.
Calculate the husband’s age when wife’s age is 16. Calculate wife’s age when husband’s age is 25.
Husband’s age   36 23 27 28 28 29 30 31 33 35
Wife’s age      29 18 20 22 27 21 29 27 29 28
15. Given the following results for the height (x) and weight (y) in appropriate units of 1000 students.
Mean of X = 68, mean of y = 150, σx =2.5, σy =20, and r=0.6.
Obtain the equations of two regression lines. Estimate height of a student whose weight 200 units
and also estimate weight of a student whose height is 60 units.
16. Find the regression equation showing the regression of capacity utilization on production from
the following data.
                               Average   Standard deviation
Production (in lakh units)     35.6      10.5
Capacity utilization (in %)    84.8      8.5
r = 0.62
Estimate the production when capacity utilization is 70%.
17. To find out what relationship exists between unemployment and suicide attempts, a sociologist
surveyed twelve cities and obtained the following data.
City                                          1   2   3   4   5   6   7   8   9   10  11   12
Unemployment rate (percent)                   7.3 6.4 6.2 5.5 6.4 4.7 5.8 7.9 6.7 9.6 10.3 7.2
No. of suicide attempts per 1000 residents    22  17  9   8   12  5   7   19  13  29  33   18
(i) Develop the estimating equation that best describes the given data.
(ii) Estimate attempted suicide rate when unemployment rate happens to be 6%.
(iii) Calculate coefficient of determination and interpret it.
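18. The equations of two regression lines between two variables are 2x − 3y = 0 and
4y − 5x − 8 = 0.
(i) Identify which of the two can be called the regression of y on x and which of x on y.
(ii) Find the mean of x and the mean of y.
(iii) Find the coefficient of correlation between x and y.

Solution: Suppose 2x − 3y = 0 is the regression equation of x on y (X = c + dY). Then x = (3/2)y
and b_xy = 3/2 = 1.5. In that case 4y − 5x − 8 = 0 is the regression equation of y on x
(Y = a + bX), so y = (5/4)x + 2 and b_yx = 5/4 = 1.25. The product b_xy × b_yx = 1.875 is greater
than 1, which is not possible, so this assumption is wrong.
Actual regression coefficients: 2x − 3y = 0 is the regression of y on x, giving b_yx = 2/3 = 0.6667,
and 4y − 5x − 8 = 0 is the regression of x on y, giving b_xy = 4/5 = 0.8. Hence
r = ±√(b_xy × b_yx) = +√0.5333 ≈ 0.73, with the positive sign because both coefficients are positive.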
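
The trial-and-check reasoning in this solution can be written as a small routine: try each assignment of the two lines and keep the one whose regression coefficients have a product between 0 and 1. This is only a sketch. The function names and the (a, b, c) encoding of a line a·x + b·y + c = 0 are illustrative assumptions.

```python
import math

def slopes(line):
    """For a*x + b*y + c = 0 return (b_yx if read as y on x, b_xy if read as x on y)."""
    a, b, _ = line
    return -a / b, -b / a

def identify(line1, line2):
    """Return (y_on_x_line, b_yx, b_xy, r) for the admissible assignment."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        b_yx = slopes(y_on_x)[0]
        b_xy = slopes(x_on_y)[1]
        product = b_yx * b_xy
        if 0 <= product <= 1:
            # r takes the common sign of the two regression coefficients.
            r = math.copysign(math.sqrt(product), b_yx)
            return y_on_x, b_yx, b_xy, r
    raise ValueError("no admissible assignment of the two lines")

line_a = (2, -3, 0)     # 2x - 3y = 0
line_b = (-5, 4, -8)    # 4y - 5x - 8 = 0

y_on_x, b_yx, b_xy, r = identify(line_a, line_b)
print(y_on_x, round(b_yx, 4), round(b_xy, 4), round(r, 4))
```

The check relies on the property that the product of the two regression coefficients equals r² and therefore cannot exceed 1.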

19. Find the regression equation of x on y and the coefficient of correlation from the following data.
∑x = 60, ∑y = 40, ∑x² = 4160, ∑y² = 1720, ∑xy = 1150 and N = 10.
20. From the following data, find out the probable yield when the rainfall is 29”.
Rainfall Yield
Mean 25” 40 units per hectare
Standard deviation 3” 6 units per hectare
Correlation coefficient between rainfall and production = 0.8
21. The following are the two regression equations. Find the correlation coefficient and mean of the
variables. If s.d. of x is 1.2 then find variance of y.
8x - 10y + 61 = 0 and 40x -18 y – 2/4.
22. A student obtained the following two regression equations. Do you agree with him?
6x = 15Y + 21 and 21X + 14 Y=56
23. Calculate lines of regressions from the following data.
Sales Advertising Expenditure
revenue 5-15 15-25 25-35 35-45
75-125 3 4 4 8
125-175 8 6 5 7
175-225 2 2 3 4
225-275 2 3 2 2
24. A Business Statistics student has taken a random sample of starting salaries and college
grade-point averages of some recently graduated friends, to check whether good grades in college
are important for earning a good salary. The data are as follows:
Starting salary ($ thousand)   36  30  30  24  27  33  21  27
Grade-point average            4.0 3.0 3.5 2.0 3.0 3.5 2.5 2.5
(i) Plot the scatter diagram and interpret it.
(ii) Develop the estimating equation that best describes these data.
(iii) Predict the starting salary for a student having a grade-point average of 3.5.

25. Fill in the blanks.

(i) If the variables X and Y are independent, the value of regression coefficient
is________.
(ii) The signature property in regression means that the sign of b xy, byx and rxy are
________.
(iii) The property of
1
2
(b + b )  r is known as ________.
xy yx

(iv) r xy
= b xb
yx xy
is known as the ________ property.
(v) If r = 1, the relation between bxy and byx is ________.
(vi) If the regression coefficient bxy ≥ 1, then byx is ________.
(vii) The paired values plotted on a graph marked by points leads to a ________
diagram.
(viii) The independent variables in regression equation are often called ________
variables.
(ix) The measure of change in the dependent variable corresponding to a unit change
in the independent variable is called ________.
(x) If each value of both the variables X and Y is divided by 5, then byx from coded
values will be ________ as byx.
(xi) The range of Pearson’s coefficient of correlation is ________.
(xii) Product moment correlation is called ________.
(xiii) If simple correlation coefficient is zero then regression coefficient is equal to
________.
(xiv) If the regression line of Y on X is 2Y = 3X-6, the estimated value of Y for given
value of X=10 is ________.
(xv) If the lines of regression of Y on X is 4X - 5Y + 33 = 0 and of X on Y is
20X - 9Y - 107 = 0, the mean values of X and Y are ________.
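Item (xv) uses the fact that the two regression lines intersect at the point of means (X̄, Ȳ). A minimal sketch of that step, solving the two equations simultaneously by Cramer's rule, is given below; the helper name solve_2x2 is hypothetical.

```python
def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the lines are parallel or identical")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# 4X - 5Y + 33 = 0   ->  4X - 5Y = -33
# 20X - 9Y - 107 = 0 ->  20X - 9Y = 107
mean_x, mean_y = solve_2x2(4, -5, -33, 20, -9, 107)
print(f"mean of X = {mean_x}, mean of Y = {mean_y}")
```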
