C6 Regression
Recall: Covariance
$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{X})(y_i - \bar{Y})}{n - 1}$$
Correlation coefficient
Pearson's correlation coefficient is the standardized (unitless) covariance:
$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}}$$
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = -1, r = -0.6, r = 0 in the top row and r = +1, r = +0.3, r = 0 in the bottom row]
Linear Correlation
[Figure: scatter plots of Y against X contrasting linear relationships with curvilinear relationships]
Linear Correlation
[Figure: scatter plots of Y against X contrasting strong relationships with weak relationships]
Linear Correlation
[Figure: scatter plot of Y against X showing no relationship]
Linear regression
Prediction
What’s Slope?
A slope of 2 means that every 1-unit increase in X yields a 2-unit increase in the predicted value of Y.
Value for an individual, and its predicted value:
$$y_i = b_0 + b_1 x_i + \text{random error}_i, \qquad \hat{y}_i = b_0 + b_1 x_i$$
[Figure: scatter plot of Man-Hour (Y, 0 to 180) against Lot size (X, 0 to 90), highlighting an individual observation (X_i, Y_i)]
$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - b_0 - b_1 x_i\right)^2$$
Minimise the sum of squared errors.
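Using calculus:
$$b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
or
$$b_1 = r\,\frac{S_y}{S_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$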
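The Fit Parameters
Define sums of squares:
$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
Then
$$b_1 = \frac{S_{xy}}{S_{xx}} = r\,\frac{S_y}{S_x}$$
where $S_x = \sqrt{S_{xx}/(n-1)}$ and $S_y = \sqrt{S_{yy}/(n-1)}$ are the sample standard deviations of x and y.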
Estimation of Mean Response
The fitted regression line can be used to estimate the mean value of y for a given value of x.
Example
The weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.

y (weekly sales)    x (weekly advertising expenditure)
1250                41
1380                54
1425                63
1425                54
1450                48
1300                46
1400                62
1510                61
1575                64
1650                71
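Point Estimation of Mean Response
From the previous table we have:
$$n = 10, \quad \sum x = 564, \quad \sum x^2 = 32604, \quad \sum y = 14365, \quad \sum xy = 818755$$
The least squares estimates of the regression coefficients are:
$$b_1 = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(818755) - (564)(14365)}{10(32604) - (564)^2} = \frac{85690}{7944} \approx 10.8$$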
Regression Standard Error
To estimate the standard error, use
$$s_{y.x}^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2, \qquad s_{y.x} = \sqrt{s_{y.x}^2}$$
Residual plots
[Figure: residuals (about -1 to 0.5) plotted against Degree Days (0 to 60)]
Look for points with no unusual individual observations and no systematic change as x increases.