LINEAR REGRESSION & CORRELATION ANALYSIS
CHAPTER 6
CORRELATION ANALYSIS
Correlation is a statistical method used to
determine the strength of the relationship
between two variables (X and Y).
The value used to estimate the strength is
called the correlation coefficient.
The symbol for the population correlation
coefficient is ρ (rho).
The symbol for the sample correlation
coefficient is r.
The range of the correlation coefficient is
from -1 to +1.
The positive (+) and negative (-) signs
show the direction of the relationship
between the two variables.
Negative values indicate an inverse
relationship and positive values indicate a
direct relationship.
• Values of -1.00 or +1.00 indicate a perfect
(strongest possible) correlation.
• Values close to 0.0 indicate a weak correlation.
Correlation Coefficient
Refer to Page 254 : Example 7
Table columns: Representative | Sales Calls | Units Sold
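Correlation Coefficient
$$ r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} $$
$$ r = \frac{10(9661) - (199)(408)}{\sqrt{\left[10(4681) - (199)^2\right]\left[10(20510) - (408)^2\right]}} $$
$$ r = 0.924 $$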
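The substitution in this worked example can be checked with a short Python sketch. The six sums (n = 10, ΣX = 199, ΣY = 408, ΣXY = 9661, ΣX² = 4681, ΣY² = 20510) are read off the numbers in the example; the function name `pearson_r_from_sums` is made up for illustration and is not part of any library.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Sample correlation coefficient from summary sums (hypothetical helper)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Sums taken from the sales calls example (X = sales calls, Y = units sold).
r = pearson_r_from_sums(n=10, sum_x=199, sum_y=408,
                        sum_xy=9661, sum_x2=4681, sum_y2=20510)
print(round(r, 3))  # 0.924
```

Working from the sums rather than the raw data mirrors the hand calculation, which is why the raw table is not needed here.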
What does this correlation mean?
First, it is positive, so we see there is a direct
relationship between number of sales calls and
number of units sold.
The value of 0.924 is really close to 1.00, so
the relationship between number of sales calls
and number of units sold is strong.
So we conclude that there is a strong positive
relationship between sales calls and number of
units sold.
THE SIGNIFICANCE OF THE
CORRELATION COEFFICIENT
A test on the value of the correlation coefficient
needs to be done to see whether there is
a significant relationship between the
two variables.
We will test H0 : ρ = 0
H1 : ρ ≠ 0
Step 1 : Hypotheses
H0 : ρ = 0
H1 : ρ ≠ 0
Step 2 : Test Statistic
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.924\sqrt{10-2}}{\sqrt{1-(0.924)^2}} = 6.8345 $$
Step 3 : Critical Value
df = n – 2 = 10 – 2 = 8
α/2 = 0.05/2 = 0.025 (two-tailed test)
CV = $t_{0.05/2,\,8} = 2.306$
Step 4 : Decision
Reject H0 because the test value exceeds the
critical value (6.8345 > 2.306).
Conclusion
There is a significant relationship between
number of sales calls and number of units
sold.
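The four steps of this test can be sketched in Python as follows. The critical value 2.306 is the t-table value quoted in the steps, not something the code computes, and the helper name `correlation_t_statistic` is an illustrative assumption.

```python
import math

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.924, 10
critical_value = 2.306  # t-table value for alpha/2 = 0.025, df = 8 (from the notes)

t_stat = correlation_t_statistic(r, n)

# Two-tailed test: reject H0 (rho = 0) when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > critical_value

print(round(t_stat, 4), reject_h0)  # t is about 6.83, so H0 is rejected
```

Taking the absolute value of t keeps the same decision rule valid for a negative r.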
Coefficient of Determination, r²
The proportion of the total variation in the
dependent variable, Y that is explained by the
variation in the independent variable, X.
In the previous example, r = 0.924.
So r² = (0.924)² = 0.854
85.4% of the variation in units sold is explained by
the variation in sales calls.
A greater value of r² means that the model
is a better predictor of the value of Y.
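A minimal Python sketch of this arithmetic is shown below. The unexplained share, 1 - r², is not stated in the notes; it is included here only as the complement of the explained share.

```python
r = 0.924  # sample correlation coefficient from the previous example

r_squared = r ** 2                     # coefficient of determination
explained_pct = 100 * r_squared        # share of variation in Y explained by X
unexplained_pct = 100 - explained_pct  # share left unexplained by the model

print(f"r^2 = {r_squared:.3f}")
print(f"{explained_pct:.1f}% explained, {unexplained_pct:.1f}% unexplained")
```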
REGRESSION ANALYSIS
Regression is a statistical method used to
describe the nature of the relationship
between variables – that is: positive or
negative, linear or nonlinear.
Regression analysis can also model the
relationship between variables.
The dependent variable is Y and the
independent variable is X.
Analysis of a linear relationship between
Y and X is called simple linear regression
analysis.
If the value of the correlation coefficient is
significant, the next step is to determine
the equation of the regression line, which is
referred to as the “best-fitting” line.
To get the equation, we will use the least
squares method.
Refer to Example 1, Page 241
[Scatter plot: Units Sold (vertical axis, 0 to 80) against Sales Calls (horizontal axis, 0 to 40)]
From the plot, the data seem to be linear,
and the appropriate model to
construct is y = mx + c
Computing the Slope of the
Line and the Y-intercept
Slope:
$$ b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2} $$
Y-intercept:
$$ a = \frac{\sum Y}{n} - b\,\frac{\sum X}{n} $$
Refer to Self Example 2, Page 244
b = 2.1387, a = -1.7601
$$ \hat{Y} = a + bX = -1.7601 + 2.1387X $$
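The slope and intercept formulas can be sketched in Python using summary sums. The sums below are the ones from the sales calls correlation example (n = 10, ΣX = 199, ΣY = 408, ΣXY = 9661, ΣX² = 4681); treating them as the data behind Self Example 2 is an assumption, and `least_squares_from_sums` is a hypothetical helper name.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the least squares line Y-hat = a + b*X."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b

# Assumed sums: X = sales calls, Y = units sold, 10 representatives.
a, b = least_squares_from_sums(n=10, sum_x=199, sum_y=408,
                               sum_xy=9661, sum_x2=4681)

print(round(b, 4))  # 2.1387
print(round(a, 2))  # -1.76
```

With these sums the slope reproduces 2.1387, and the intercept agrees with the quoted value to two decimal places. The last decimals of a depend on how much b is rounded before it is substituted.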
LINEAR REGRESSION EQUATION
a: The intercept is negative, so the line crosses
the y-axis below the origin (0,0).
b: An increase of one call made is associated with
an estimated increase of 2.1387 (≈ 2) units sold.
Estimate the number of units
sold when 20 calls were made
in a month.
$$ \hat{Y} = -1.7601 + 2.1387X $$
$$ \hat{Y} = -1.7601 + 2.1387(20) $$
$$ \hat{Y} = 41.0139 $$
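The same estimate can be obtained with a small Python function. The coefficients are the rounded values from the fitted equation; the function name `predict_units_sold` is illustrative only.

```python
def predict_units_sold(calls, a=-1.7601, b=2.1387):
    """Evaluate the fitted line Y-hat = a + b*X for a given number of calls."""
    return a + b * calls

estimate = predict_units_sold(20)
print(round(estimate, 4))  # 41.0139
```

Since units sold are whole numbers, the estimate would be read as about 41 units.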
Linear Regression by SPSS
Testing the Significance of the Model @
Global Test – using output
Step 1 : H0 : β = 0
H1 : β ≠ 0
Step 2 : p-value = 0.000 (Refer to the ANOVA table)
Step 3 : Compare with α = 0.05
α > p-value
Reject H0
Step 4 : Conclusion
**There is a significant relationship between
advertising expenses and sales revenue.
Testing the Significance of the
Model @ Global Test – manually
Step 1 : H0 : β = 0
H1 : β ≠ 0
Step 2 : Test Value, F = 46.597 (Refer to the ANOVA table)
Step 3 : Critical value,
$F_{0.05,\,1,\,8} = 5.32$
Step 4 : Reject H0 because TV > CV (46.597 > 5.32)
Step 5 : Conclusion
**There is a significant relationship between
advertising expenses and sales revenue.
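In simple linear regression the ANOVA F statistic, with 1 and n - 2 degrees of freedom, equals r²(n - 2)/(1 - r²), which is the square of the t statistic for the correlation. The Python sketch below uses that identity with the sums from the sales calls example. It assumes, without confirmation from the notes, that the ANOVA table was produced from those ten observations. The critical value 5.32 is the F-table value quoted in Step 3.

```python
# Sums from the sales calls example (assumed to be the data behind the ANOVA table).
n = 10
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 199, 408, 9661, 4681, 20510

# r^2 computed directly from the sums, avoiding an intermediate rounded r.
r_squared = (n * sum_xy - sum_x * sum_y) ** 2 / (
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

# Global test statistic with (1, n - 2) degrees of freedom.
f_stat = r_squared * (n - 2) / (1 - r_squared)

critical_value = 5.32  # F-table value for alpha = 0.05, df = (1, 8), from the notes
reject_h0 = f_stat > critical_value

print(round(f_stat, 2), reject_h0)  # F is about 46.6, so H0 is rejected
```

Under that assumption the result is close to the quoted test value of 46.597, and the decision matches Step 4.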