C6 Regression

The document provides an overview of linear regression analysis, including key concepts such as covariance, correlation coefficients, and the regression line equation. It explains the purpose of regression analysis for description, control, and prediction, and details the process of estimating regression coefficients using the least squares method. Additionally, it discusses the importance of residuals, standard error, and the use of ANOVA for comparing group means.


Linear Regression Analysis

Dr. Linta Rose

[email protected]
Recall: Covariance

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
Correlation coefficient

Pearson's correlation coefficient is standardized covariance (unitless):

$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var}(x)\,\operatorname{var}(y)}}$$
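The covariance and correlation formulas above can be sketched directly in code (a minimal illustration; the data values are made up for the example):

```python
# Sample covariance and Pearson correlation, computed from the definitions above.

def covariance(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sum of products of deviations, divided by n - 1
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    # r = cov(x, y) / sqrt(var(x) * var(y)); note var(x) = cov(x, x)
    return covariance(x, y) / (covariance(x, x) * covariance(y, y)) ** 0.5

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]        # perfectly linear in x
print(pearson_r(x, y))      # exactly linear data gives r = 1.0
```

Because r is standardized, rescaling either variable (changing its units) leaves r unchanged, unlike the covariance.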
Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X illustrating r = -1, r = -0.6, r = 0 (top row) and r = +1, r = +0.3, r = 0 (bottom row)]
Linear Correlation

[Figure: scatter plots contrasting linear relationships with curvilinear relationships]
Linear Correlation

[Figure: scatter plots contrasting strong relationships with weak relationships]
Linear Correlation

[Figure: scatter plot showing no relationship between X and Y]
Linear regression

In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (predictor) variable (X) and the other the dependent (outcome) variable (Y).

Prediction: if you know something about X, this knowledge helps you predict something about Y.
Uses of Regression Analysis

Regression analysis serves three major purposes:
1. Description
2. Control
3. Prediction

These purposes frequently overlap in practice.
What is "Linear"?

Remember Y = mX + B? Here m is the slope and B is the intercept. A slope of 2 means that every 1-unit change in X yields a 2-unit change in Y.
Predicted value for an individual

$$y_i = b_0 + b_1 x_i + \text{random error}_i$$

The fixed part, b_0 + b_1 x_i, lies exactly on the line; the random error follows a normal distribution.

The values of the regression parameters b_0 and b_1 are not known. We estimate them from data.
Regression Line

[Figure: statistical relation between Lot size (x-axis, 0-90) and Man-Hours (y-axis, 0-180); observed points (Xi, Yi) scatter around the fitted regression line]

We will write an estimated regression line based on sample data as

$$\hat{y} = b_0 + b_1 x$$

The method of least squares chooses the values of b_0 and b_1 that minimize the sum of squared errors:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$$
Minimise the sum of squares of errors

Using calculus, solve for b_0 and b_1 to get the position of the line:

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$

or

$$b_1 = r \frac{S_y}{S_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
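The two equivalent formulas for the slope can be checked numerically (a sketch; the data here is the weekly advertising example used later in these slides):

```python
# Least-squares estimates b1 and b0, computed two equivalent ways.

x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]                      # advertising
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]  # sales

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Deviations form: sum of cross-deviations over sum of squared x-deviations
b1 = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
      / sum((xi - mx) ** 2 for xi in x))

# Raw-sums form: (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
b1_alt = ((n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y))
          / (n * sum(xi ** 2 for xi in x) - sum(x) ** 2))

b0 = my - b1 * mx
print(round(b1, 1), round(b0))   # both forms give b1 ≈ 10.8, b0 ≈ 828
```

Both forms agree to floating-point precision; the raw-sums form is convenient when only the summary totals (Σx, Σy, Σx², Σxy) are available.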
The Fit Parameters

The quality of the fit is parameterized by r², the coefficient of determination (the square of the correlation coefficient). The slope can be written in terms of r:

$$b_1 = r \frac{S_y}{S_x}$$
Estimation of Mean Response

The fitted regression line can be used to estimate the mean value of y for a given value of x.

Example: the weekly advertising expenditure (x) and weekly sales (y) are presented in the following table.
y x
1250 41
1380 54
1425 63
1425 54
1450 48
1300 46
1400 62
1510 61
1575 64
1650 71
Point Estimation of Mean Response

From the previous table we have:

n = 10,  Σx = 564,  Σx² = 32604,  Σy = 14365,  Σxy = 818755

The least squares estimates of the regression coefficients are:

$$b_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2} = \frac{10(818755) - (564)(14365)}{10(32604) - (564)^2} = 10.8$$

$$b_0 = 1436.5 - 10.8(56.4) = 828$$
Point Estimation of Mean Response

The estimated regression function is:

$$\hat{y} = 828 + 10.8x \qquad (\text{Sales} = 828 + 10.8 \times \text{Expenditure})$$

This means that if the weekly advertising expenditure is increased by $1, we would expect weekly sales to increase by $10.80.
Point Estimation of Mean Response

Fitted values for the sample data are obtained by substituting the x value into the estimated regression function. For example, if the advertising expenditure is $50, the estimated sales are:

Sales = 828 + 10.8(50) = 1368

This is called the point estimate (forecast) of the mean response (sales).
Residual

The residual is the difference between the observed value y_i and the corresponding fitted value ŷ_i:

$$e_i = y_i - \hat{y}_i$$

Residuals are highly useful for studying whether a given regression model is appropriate for the data at hand.
Example: weekly advertising expenditure

y x y-hat Residual (e)


1250 41 1270.8 -20.8
1380 54 1411.2 -31.2
1425 63 1508.4 -83.4
1425 54 1411.2 13.8
1450 48 1346.4 103.6
1300 46 1324.8 -24.8
1400 62 1497.6 -97.6
1510 61 1486.8 23.2
1575 64 1519.2 55.8
1650 71 1594.8 55.2
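The residual column above can be reproduced in a few lines (a sketch using the rounded fit ŷ = 828 + 10.8x from the slides):

```python
# Residuals e_i = y_i - y_hat_i for the fitted line y_hat = 828 + 10.8x.

x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]

y_hat = [828 + 10.8 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
print([round(e, 1) for e in residuals])
# first row of the table: 1250 - 1270.8 = -20.8
```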
Regression Standard Error

Approximately 95% of the observations should fall within plus/minus 2 standard errors of the regression line, which is also a quick approximation of a 95% prediction interval.

For simple linear regression, the standard error is the square root of the average squared residual:

$$s_{y.x}^2 = \frac{1}{n-2} \sum e_i^2 = \frac{1}{n-2} \sum (y_i - \hat{y}_i)^2, \qquad s_{y.x} = \sqrt{s_{y.x}^2}$$

s_{y.x} estimates the standard deviation of the error term ε in the statistical model for simple linear regression.
The standard error of Y given X is the average variability around the regression
line at any given value of X. It is assumed to be equal at all values of X.

[Figure: regression line with equal spread s_{y.x} around it at every value of X]
Regression Standard Error

y x y-hat Residual (e) square(e)


1250 41 1270.8 -20.8 432.64
1380 54 1411.2 -31.2 973.44
1425 63 1508.4 -83.4 6955.56
1425 54 1411.2 13.8 190.44
1450 48 1346.4 103.6 10732.96
1300 46 1324.8 -24.8 615.04
1400 62 1497.6 -97.6 9525.76
1510 61 1486.8 23.2 538.24
1575 64 1519.2 55.8 3113.64
1650 71 1594.8 55.2 3047.04

ŷ = 828 + 10.8X;  total Σe² = 36124.76

s_{y.x} = √(36124.76 / 8) = 67.198
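The squared-residual total and standard error in the table above can be verified directly (a sketch; n - 2 = 8 degrees of freedom for simple linear regression):

```python
# Regression standard error s_{y.x} = sqrt(SSE / (n - 2)) for the advertising
# example, using the rounded fit y_hat = 828 + 10.8x from the slides.

x = [41, 54, 63, 54, 48, 46, 62, 61, 64, 71]
y = [1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650]

sse = sum((yi - (828 + 10.8 * xi)) ** 2 for xi, yi in zip(x, y))
s = (sse / (len(x) - 2)) ** 0.5
print(round(sse, 2), round(s, 3))   # ≈ 36124.76 and ≈ 67.198
```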
Analysis of Residuals

To examine whether the regression model is appropriate for the data being analyzed, we can check residual plots:
- Plot a histogram of the residuals.
- Plot residuals against the fitted values.
- Plot residuals against the independent variable.
- Plot residuals over time if the data are chronological.
Residual plots

The residuals should have no systematic pattern. The Degree Days residual plot shows a scatter of points with no unusual individual observations and no systematic change as x increases.

[Figure: residuals (about -1 to 1) plotted against Degree Days (0-60), showing random scatter]
Residual plots

The points in this residual plot follow a curved pattern, so a straight line fits poorly.
Residual plots

The points in this plot show more spread for larger values of the explanatory variable x, so prediction will be less accurate when x is large.
ANOVA

Analysis of variance (ANOVA) is a statistical technique used to check whether the means of two or more groups are significantly different from each other. An ANOVA test is a way to find out if survey or experiment results are significant. It compares the samples on the basis of their means.
