
Mathematics

Correlation & Regression


Table of Contents

1. Introduction.
2. Covariance.
3. Correlation.
4. Rank Correlation.
5. Linear regression.
6. Equations of lines of regression.
7. Angle between two lines of regression.
8. Important points about regression coefficients $b_{xy}$ and $b_{yx}$.
9. Standard error and Probable error.

1. Introduction.

“If it is proved true that in a large number of instances two variables tend always to fluctuate in the
same or in opposite directions, we consider that the fact is established and that a relationship exists. This
relationship is called correlation.”

(1) Univariate distribution: These are the distributions in which there is only one variable such as the
heights of the students of a class.

(2) Bivariate distribution: A distribution involving two discrete variables is called a bivariate distribution.
For example, the heights and the weights of the students of a class in a school.

(3) Bivariate frequency distribution: Let x and y be two variables. Suppose x takes the values $x_1, x_2, \ldots, x_n$ and y takes the values $y_1, y_2, \ldots, y_n$; then we record our observations in the form of ordered pairs $(x_i, y_j)$, where $1 \le i \le n$, $1 \le j \le n$. If a certain pair occurs $f_{ij}$ times, we say that its frequency is $f_{ij}$.
The function which assigns the frequencies $f_{ij}$ to the pairs $(x_i, y_j)$ is known as a bivariate frequency distribution.

2. Covariance.

Let $(x_i, y_i);\ i = 1, 2, \ldots, n$ be a bivariate distribution, where $x_1, x_2, \ldots, x_n$ are the values of the variable x and $y_1, y_2, \ldots, y_n$ those of y. Then the covariance Cov(x, y) between x and y is given by

$$\mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \quad\text{or}\quad \mathrm{Cov}(x, y) = \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y},$$

where $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n} y_i$ are the means of the variables x and y respectively.

Covariance is not affected by the change of origin, but it is affected by the change of scale.
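
Both forms of the covariance formula are algebraically identical; here is a minimal Python sketch (hypothetical data, population convention of dividing by n) verifying this, together with the change-of-origin property:

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
n = len(x)

# (1/n) * sum (x_i - mean(x)) (y_i - mean(y))
cov_deviation = np.sum((x - x.mean()) * (y - y.mean())) / n
# (1/n) * sum x_i y_i - mean(x) * mean(y)
cov_product = np.sum(x * y) / n - x.mean() * y.mean()
assert np.isclose(cov_deviation, cov_product)

# Change of origin (shifting x by a constant) leaves Cov(x, y) unchanged;
# change of scale (multiplying x by a constant) does not.
x_shift = x + 10
assert np.isclose(np.sum((x_shift - x_shift.mean()) * (y - y.mean())) / n,
                  cov_deviation)
print(cov_deviation)  # 1.2 for this data
```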

3. Correlation.

The relationship between two variables such that a change in one variable results in a positive or
negative change in the other variable is known as correlation.

(1) Types of correlation

(i) Perfect correlation: If the two variables vary in such a manner that their ratio is always constant, then
the correlation is said to be perfect.

(ii) Positive or direct correlation: If an increase or decrease in one variable corresponds to an increase
or decrease in the other, the correlation is said to be positive.

(iii) Negative or indirect correlation: If an increase or decrease in one variable corresponds to a decrease or increase in the other, the correlation is said to be negative.

(2) Karl Pearson's coefficient of correlation: The correlation coefficient $r(x, y)$ between two variables x and y is given by

$$r(x, y) = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{\mathrm{Cov}(x, y)}{\sigma_x\,\sigma_y}$$

or

$$r(x, y) = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}.$$

In deviation form,

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \sum dy^2}}.$$

(3) Modified formula: $r = \dfrac{\sum dx\,dy - \dfrac{\sum dx \sum dy}{n}}{\sqrt{\left(\sum dx^2 - \dfrac{\left(\sum dx\right)^2}{n}\right)\left(\sum dy^2 - \dfrac{\left(\sum dy\right)^2}{n}\right)}}$, where $dx = x - \bar{x}$, $dy = y - \bar{y}$ (the same formula holds when the deviations are taken from assumed means, which is how it is used in practice).

Also $r_{xy} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x\,\sigma_y} = \dfrac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}}$.

4. Rank Correlation.

Let us suppose that a group of n individuals is arranged in order of merit or proficiency in possession of two characteristics A and B.
These ranks in the two characteristics will, in general, be different.
For example, if we consider the relation between intelligence and beauty, it is not necessary that a beautiful individual is also intelligent.

Rank Correlation: $\rho = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$, which is Spearman's formula for the rank correlation coefficient,

where $\sum d^2$ = sum of the squares of the differences of the two ranks and n is the number of pairs of observations.

Note: We always have $\sum d_i = \sum (x_i - y_i) = \sum x_i - \sum y_i = n\bar{x} - n\bar{y} = 0$, since $\bar{x} = \bar{y}$ (each set of ranks is a permutation of $1, 2, \ldots, n$).
If all d's are zero, then $r = 1$, which shows that there is perfect rank correlation between the variables, and which is the maximum value of r.
If, however, some values of $x_i$ are equal (tied), then the coefficient of rank correlation is given by

$$r = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}\sum\left(m^3 - m\right)\right\}}{n(n^2 - 1)},$$

where m is the number of times a particular $x_i$ is repeated.
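
A Python sketch of the tie-corrected formula (hypothetical data; `scipy.stats.rankdata` assigns tied values their average rank). The notes state the correction term for repeated x-values; in practice the term $\frac{1}{12}\sum(m^3 - m)$ is accumulated over the tied groups of both rankings, which is what this sketch assumes:

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical paired observations; x contains one tie (30 appears twice)
x = np.array([10.0, 20.0, 30.0, 30.0, 50.0, 60.0])
y = np.array([12.0, 25.0, 33.0, 41.0, 42.0, 70.0])

rx, ry = rankdata(x), rankdata(y)   # ties share the average rank
n = len(x)
d2 = np.sum((rx - ry) ** 2)         # sum of d^2, d = difference of ranks

def tie_correction(ranks):
    """(1/12) * sum of (m^3 - m) over every group of m tied values."""
    _, counts = np.unique(ranks, return_counts=True)
    return np.sum(counts.astype(float) ** 3 - counts) / 12.0

T = tie_correction(rx) + tie_correction(ry)
rho = 1 - 6 * (d2 + T) / (n * (n**2 - 1))
print(rho)  # approximately 0.9714 for this data
```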

Positive and Negative rank correlation coefficients

Let r be the rank correlation coefficient. Then:

• $r > 0$: if the rank of one characteristic is high, then that of the other is also high, and if the rank of one is low, then that of the other is also low. E.g., if the two characteristics are height and weight of persons, then $r > 0$ means that tall persons are also heavy.

• $r = 1$: there is perfect correlation in the two characteristics, i.e., every individual gets the same rank in both characteristics. Here the ranks are of the type $(1, 1), (2, 2), \ldots, (n, n)$.

• $r < 0$: if the rank of one characteristic is high, then that of the other is low, and if the rank of one is low, then that of the other is high. E.g., if the two characteristics are richness and slimness of persons, then $r < 0$ means that rich persons are not slim.

• $r = -1$: there is perfect negative correlation in the two characteristics, i.e., an individual getting the highest rank in one characteristic gets the lowest rank in the second. Here the ranks in the two characteristics in a group of n individuals are of the type $(1, n), (2, n - 1), \ldots, (n, 1)$.

• $r = 0$: no relation can be established between the two characteristics.

Important Tips

• If $r = 0$, the variables x and y are said to be uncorrelated (independent variables are uncorrelated, but $r = 0$ alone does not imply independence).
• If $r = -1$, the correlation is said to be negative and perfect.
• If $r = +1$, the correlation is said to be positive and perfect.
• Correlation is a pure number and hence unitless.
• The correlation coefficient is not affected by change of origin and scale.
• If two variates are connected by the linear relation $x + y = K$ (a constant), then x and y are in perfect indirect correlation. Here $r = -1$.
• If x, y are two independent variables, then $\rho(x + y, x - y) = \dfrac{\sigma_x^2 - \sigma_y^2}{\sigma_x^2 + \sigma_y^2}$.
• $r(x, y) = \dfrac{\sum u_i v_i - \frac{1}{n}\sum u_i \sum v_i}{\sqrt{\sum u_i^2 - \frac{1}{n}\left(\sum u_i\right)^2}\,\sqrt{\sum v_i^2 - \frac{1}{n}\left(\sum v_i\right)^2}}$, where $u_i = x_i - A$, $v_i = y_i - B$.

Regression

5. Linear Regression.

If a relation between two variates x and y exists, then the dots of the scatter diagram will be more or less concentrated around a curve which is called the curve of regression. If this curve is a straight line, it is known as the line of regression, and the regression is called linear regression.
Line of regression: The line of regression is the straight line which, in the least-squares sense, gives the best fit to the given frequency distribution.

6. Equations of lines of Regression.

(1) Regression line of y on x: If the value of x is known, then the value of y can be estimated as

$$y - \bar{y} = \frac{\mathrm{Cov}(x, y)}{\sigma_x^2}(x - \bar{x}) \quad\text{or}\quad y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}).$$

(2) Regression line of x on y: It estimates x for a given value of y as

$$x - \bar{x} = \frac{\mathrm{Cov}(x, y)}{\sigma_y^2}(y - \bar{y}) \quad\text{or}\quad x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y}).$$

(3) Regression coefficients: (i) The regression coefficient of y on x is $b_{yx} = \dfrac{r\,\sigma_y}{\sigma_x} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_x^2}$.

(ii) The regression coefficient of x on y is $b_{xy} = \dfrac{r\,\sigma_x}{\sigma_y} = \dfrac{\mathrm{Cov}(x, y)}{\sigma_y^2}$.
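
A minimal sketch (hypothetical data, population convention) computing both regression coefficients from the formulas above, predicting y from the line of y on x, and checking that the geometric mean of the coefficients reproduces r:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

cov = np.mean(x * y) - x.mean() * y.mean()
b_yx = cov / x.var()   # Cov(x, y) / var(x): regression coefficient of y on x
b_xy = cov / y.var()   # Cov(x, y) / var(y): regression coefficient of x on y

# Line of y on x: y - mean(y) = b_yx (x - mean(x)); estimate y at x = 3.5
y_hat = y.mean() + b_yx * (3.5 - x.mean())

# r is the geometric mean of the two regression coefficients
r = np.sqrt(b_yx * b_xy)
print(b_yx, b_xy, y_hat, r)   # 0.6, 1.0, 4.3, approximately 0.7746
```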

7. Angle between Two lines of Regression.

The equations of the two lines of regression are $y - \bar{y} = b_{yx}(x - \bar{x})$ and $x - \bar{x} = b_{xy}(y - \bar{y})$.

We have $m_1$ = slope of the line of regression of y on x $= b_{yx} = r\dfrac{\sigma_y}{\sigma_x}$,

and $m_2$ = slope of the line of regression of x on y $= \dfrac{1}{b_{xy}} = \dfrac{\sigma_y}{r\,\sigma_x}$.

$$\tan\theta = \left|\frac{m_2 - m_1}{1 + m_1 m_2}\right| = \left|\frac{\dfrac{\sigma_y}{r\,\sigma_x} - \dfrac{r\,\sigma_y}{\sigma_x}}{1 + \dfrac{r\,\sigma_y}{\sigma_x}\cdot\dfrac{\sigma_y}{r\,\sigma_x}}\right| = \left|\frac{(1 - r^2)\,\sigma_x\,\sigma_y}{r\,(\sigma_x^2 + \sigma_y^2)}\right|.$$

Here the positive sign gives the acute angle $\theta$, because $r^2 \le 1$ and $\sigma_x$, $\sigma_y$ are positive.

$$\therefore \quad \tan\theta = \frac{1 - r^2}{r}\cdot\frac{\sigma_x\,\sigma_y}{\sigma_x^2 + \sigma_y^2} \qquad\ldots(i)$$

Note: If $r = 0$, from (i) we conclude $\tan\theta = \infty$, i.e. $\theta = \pi/2$: the two regression lines are at right angles.
If $r = \pm 1$, then $\tan\theta = 0$, i.e. $\theta = 0$ (since $\theta$ is acute): the two regression lines coincide.
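
A sketch verifying formula (i) against the slopes $m_1$ and $m_2$ computed directly (hypothetical data; absolute values are taken so that $\theta$ comes out as the acute angle):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

sx, sy = x.std(), y.std()
cov = np.mean(x * y) - x.mean() * y.mean()
r = cov / (sx * sy)

m1 = r * sy / sx                    # slope of the line of y on x
m2 = sy / (r * sx)                  # slope of the line of x on y
tan_theta_slopes = abs((m2 - m1) / (1 + m1 * m2))
tan_theta_formula = (1 - r**2) / abs(r) * (sx * sy) / (sx**2 + sy**2)

assert np.isclose(tan_theta_slopes, tan_theta_formula)
print(np.degrees(np.arctan(tan_theta_formula)))  # acute angle in degrees
```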

8. Important points about Regression coefficients $b_{xy}$ and $b_{yx}$.

(1) $r = \sqrt{b_{yx}\,b_{xy}}$, i.e. the coefficient of correlation is the geometric mean of the regression coefficients.

(2) If $b_{yx} > 1$, then $b_{xy} < 1$; i.e., if one of the regression coefficients is greater than unity, the other will be less than unity (since $b_{yx}\,b_{xy} = r^2 \le 1$).

(3) If the correlation between the variables is not perfect, then the regression lines intersect at $(\bar{x}, \bar{y})$.

(4) $b_{yx}$ is the slope of the regression line of y on x, and $\dfrac{1}{b_{xy}}$ is the slope of the regression line of x on y.

(5) $b_{yx} + b_{xy} \ge 2\sqrt{b_{yx}\,b_{xy}}$, i.e. $b_{yx} + b_{xy} \ge 2r$ (for positive coefficients): the arithmetic mean of the regression coefficients is not less than the correlation coefficient.
(6) Regression coefficients are independent of change of origin but not of scale.

(7) The product of the gradients of the lines of regression is $\dfrac{\sigma_y^2}{\sigma_x^2}$, since $m_1 m_2 = b_{yx}\cdot\dfrac{1}{b_{xy}} = \dfrac{\sigma_y^2}{\sigma_x^2}$.

(8) If both the lines of regression coincide, then the correlation is perfect and linear ($r = \pm 1$).

(9) If both $b_{yx}$ and $b_{xy}$ are positive, then r will be positive; if both $b_{yx}$ and $b_{xy}$ are negative, then r will be negative.

Important Tips

• If $r = 0$, then $\tan\theta$ is not defined, i.e. $\theta = \dfrac{\pi}{2}$. Thus the regression lines are perpendicular.
• If $r = +1$ or $-1$, then $\tan\theta = 0$, i.e. $\theta = 0$. Thus the regression lines are coincident.
• If the regression lines are $y = ax + b$ and $x = cy + d$, then $\bar{x} = \dfrac{bc + d}{1 - ac}$ and $\bar{y} = \dfrac{ad + b}{1 - ac}$ (see the sketch after this list).
• If $b_{yx}$, $b_{xy}$ and r are all positive, then $\dfrac{1}{2}(b_{xy} + b_{yx}) \ge r$; if they are all negative, then $\dfrac{1}{2}(b_{xy} + b_{yx}) \le r$.
• Correlation measures the degree of the relationship between variables, while regression studies the cause-and-effect relationship between them, so that the value of one variable can be estimated from the other.
• If the line of regression of y on x makes an angle $\theta$ with the positive direction of the x-axis, then $\tan\theta = b_{yx}$.
• If the line of regression of x on y makes an angle $\theta$ with the positive direction of the x-axis, then $\cot\theta = b_{xy}$.
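
A worked sketch of the third tip above, with hypothetical regression lines: the means are recovered as the intersection point, and r follows from $r^2 = b_{yx}\,b_{xy} = ac$ (one checks $ac \le 1$ to confirm the lines have been identified correctly):

```python
import numpy as np

# Hypothetical lines:  y on x: y = 0.6x + 2.2;  x on y: x = 1.0y - 1.0
a, b = 0.6, 2.2
c, d = 1.0, -1.0
assert a * c <= 1   # b_yx * b_xy = r^2 <= 1, so the identification is consistent

x_bar = (b * c + d) / (1 - a * c)   # solve the pair of lines simultaneously
y_bar = (a * d + b) / (1 - a * c)
r = np.sqrt(a * c)                  # both coefficients positive, so r > 0
print(x_bar, y_bar, r)              # 3.0, 4.0, approximately 0.7746
```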

9. Standard error and Probable error.

(1) Standard error of prediction: The deviation of the predicted value from the observed value is known as the standard error of prediction, and is defined as

$$S_y = \sqrt{\frac{\sum (y - y_p)^2}{n}},$$

where y is the actual value and $y_p$ is the predicted value.
In relation to the coefficient of correlation, it is given by
(i) Standard error of estimate of x: $S_x = \sigma_x\sqrt{1 - r^2}$.
(ii) Standard error of estimate of y: $S_y = \sigma_y\sqrt{1 - r^2}$.

(2) Relation between probable error and standard error: If r is the correlation coefficient in a sample of n pairs of observations, then its standard error is $\mathrm{S.E.}(r) = \dfrac{1 - r^2}{\sqrt{n}}$ and its probable error is $\mathrm{P.E.}(r) = 0.6745\,(\mathrm{S.E.}) = 0.6745\left(\dfrac{1 - r^2}{\sqrt{n}}\right)$. The probable error and the standard error are used for interpreting the coefficient of correlation:
(i) If $r < \mathrm{P.E.}(r)$, there is no evidence of correlation.
(ii) If $r > 6\,\mathrm{P.E.}(r)$, the existence of correlation is certain.
The square of the coefficient of correlation for a bivariate distribution is known as the "coefficient of determination".
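
A small sketch applying these interpretation rules to a hypothetical sample value of r:

```python
import numpy as np

r, n = 0.77, 25                       # hypothetical sample correlation and size

se = (1 - r**2) / np.sqrt(n)          # standard error of r
pe = 0.6745 * se                      # probable error of r

if abs(r) < pe:
    verdict = "no evidence of correlation"
elif abs(r) > 6 * pe:
    verdict = "existence of correlation is certain"
else:
    verdict = "inconclusive"
print(se, pe, verdict)                # about 0.0814, about 0.0549, certain
```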
