
Chap 10 Regression Analysis

The document provides an introduction to regression analysis including describing the relationship between dependent and independent variables, the simple linear regression model, assumptions of linear regression, estimating the regression model and interpreting the slope and intercept. It also discusses correlation analysis and calculating the coefficient of determination.

Uploaded by Linh Nguyễn
Copyright © All Rights Reserved

1. Introduction to regression analysis


Regression analysis
- Describes a relationship between two variables in mathematical terms
- Predicts the value of a dependent variable based on the value of at least one independent variable
- Explains the impact of changes in an independent variable on the dependent variable
- Dependent variable: the variable we wish to explain
- Independent variable: the variable used to explain the dependent variable
Names for y and x in the regression model

| Names for y | Names for x |
|---|---|
| Dependent variable | Independent variables |
| Regressand | Regressors |
| Effect variable | Causal variables |
| Explained variable | Explanatory variables |


Simple Linear Regression Model
- Only one independent variable, x
- The relationship between x and y is described by a linear function
- Changes in y are assumed to be caused by changes in x
Types of Regression Models
[Figure: scatter plots illustrating a positive linear relationship, a negative linear relationship, a non-linear relationship, and no relationship]

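Population Linear Regression
The population regression model:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

where
- $y$ is the dependent variable
- $x$ is the independent variable
- $\beta_0$ is the population y-intercept
- $\beta_1$ is the population slope coefficient
- $\varepsilon$ is the random error term, or residual

$\beta_0 + \beta_1 x$ is the linear component and $\varepsilon$ is the random error component.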
Linear Regression Assumptions
- Error values ($\varepsilon$) are statistically independent
- Error values are normally distributed for any given value of x
- The probability distribution of the errors has constant variance
- The underlying relationship between the x variable and the y variable is linear
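[Figure: population regression line $y = \beta_0 + \beta_1 x + \varepsilon$ with intercept $\beta_0$ and slope $\beta_1$; at a given $x_i$, the random error $\varepsilon_i$ is the vertical distance between the observed value of y and the predicted value of y on the line]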
Estimated Regression Model
The sample regression line provides an estimate of the population regression line:

$$\hat{y}_i = b_0 + b_1 x_i$$

where $\hat{y}_i$ is the estimated (or predicted) y value, $b_0$ is the estimate of the regression intercept, $b_1$ is the estimate of the regression slope, and $x_i$ is the value of the independent variable.

The individual random error terms $e_i$ have a mean of zero.


Least Squares Criterion
$b_0$ and $b_1$ are obtained by finding the values of $b_0$ and $b_1$ that minimize the sum of the squared residuals:

$$\sum e^2 = \sum (y - \hat{y})^2 = \sum \bigl(y - (b_0 + b_1 x)\bigr)^2$$
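The Least Squares Equation
The formulas for $b_1$ and $b_0$ are:

$$b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} \qquad\text{or}\qquad b_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - (\bar{x})^2}$$

and

$$b_0 = \bar{y} - b_1 \bar{x}$$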
Interpretation of the Slope and the Intercept
- $b_0$ is the estimated average value of y when the value of x is zero
- $b_1$ is the estimated change in the average value of y as a result of a one-unit change in x
Example
A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected.
- Dependent variable (y): house price in $1000s
- Independent variable (x): square feet
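Sample Data for House Price Model

| House price in $1000s (y) | Square feet (x) |
|---|---|
| 245 | 1400 |
| 312 | 1600 |
| 279 | 1700 |
| 308 | 1875 |
| 199 | 1100 |
| 219 | 1550 |
| 405 | 2350 |
| 324 | 2450 |
| 319 | 1425 |
| 255 | 1700 |

| y | x | xy | x² | y² |
|---|---|---|---|---|
| 245 | 1400 | 343000 | 1960000 | 60025 |
| 312 | 1600 | 499200 | 2560000 | 97344 |
| 279 | 1700 | 474300 | 2890000 | 77841 |
| 308 | 1875 | 577500 | 3515625 | 94864 |
| 199 | 1100 | 218900 | 1210000 | 39601 |
| 219 | 1550 | 339450 | 2402500 | 47961 |
| 405 | 2350 | 951750 | 5522500 | 164025 |
| 324 | 2450 | 793800 | 6002500 | 104976 |
| 319 | 1425 | 454575 | 2030625 | 101761 |
| 255 | 1700 | 433500 | 2890000 | 65025 |

Column totals: Σy = 2865, Σx = 17150, Σxy = 5085975, Σx² = 30983750, Σy² = 853423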
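Estimate b0 and b1

$$b_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - (\bar{x})^2} = \frac{508597.5 - 1715 \times 286.5}{3098375 - 1715^2} = \frac{17250}{157150} \approx 0.10977$$

$$b_0 = \bar{y} - b_1 \bar{x} = 286.5 - \frac{17250}{157150} \times 1715 \approx 98.2483$$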

The regression equation is:

$$\hat{y} = 98.2483 + 0.1097x$$
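The hand calculation above can be cross-checked with a short script. This is a minimal sketch in standard-library Python; the function name `least_squares` is a hypothetical helper chosen for illustration, and it uses the mean form of the least squares formulas given earlier, $b_1 = (\overline{xy} - \bar{x}\bar{y})/(\overline{x^2} - \bar{x}^2)$ and $b_0 = \bar{y} - b_1\bar{x}$.

```python
def least_squares(x, y):
    """Return (b0, b1) for the simple linear regression y-hat = b0 + b1*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / n
    mean_x2 = sum(xi * xi for xi in x) / n
    # Slope: (mean of xy - mean x * mean y) / (mean of x^2 - (mean x)^2)
    b1 = (mean_xy - mean_x * mean_y) / (mean_x2 - mean_x ** 2)
    # Intercept: the fitted line passes through (mean x, mean y)
    b0 = mean_y - b1 * mean_x
    return b0, b1


# House price example: y = price in $1000s, x = size in square feet
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

b0, b1 = least_squares(sqft, price)
print(f"y-hat = {b0:.4f} + {b1:.5f}x")  # y-hat = 98.2483 + 0.10977x
```

Printing five decimals shows the unrounded slope $17250/157150 \approx 0.10977$; the slides truncate it to 0.1097 in the regression equation.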
Graphical Presentation
[Figure: House price model, scatter plot of house price ($1000s) against square feet with the fitted regression line; intercept = 98.248, slope = 0.10977]

$$\hat{y} = 98.2483 + 0.1097x$$
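Interpretation of the Intercept, b0

$$\hat{y} = 98.2483 + 0.1097x$$

$b_0$ is the estimated average value of y when the value of x is zero. Here it reflects the portion of the house price not explained by square feet, that is, the portion of the house price attributable to factors other than square feet. Because no house in the sample is anywhere near 0 square feet (sizes range from 1100 to 2450), this value should not be read literally as the price of a house with no floor area.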
Interpretation of the Slope Coefficient, b1

$$\hat{y} = 98.2483 + 0.1097x$$

$b_1$ measures the estimated change in the average value of y as a result of a one-unit change in x.
Here, $b_1 = 0.10977$ indicates that the average value of a house increases by 0.10977 × $1000 = $109.77 for each additional square foot of size.
Coefficient of Determination R²
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called R-squared and is denoted

$$R^2 = \frac{RSS}{TSS}, \qquad 0 \le R^2 \le 1$$

where
- $RSS = \sum (\hat{y} - \bar{y})^2$ is the sum of squares explained by regression
- $TSS = \sum (y - \bar{y})^2$ is the total sum of squares

(Some texts use RSS for the residual sum of squares; in this chapter RSS always means the regression, or explained, sum of squares.)
Examples of Approximate R² Values
- R² = 1: perfect linear relationship between x and y. 100% of the variation in y is explained by variation in x.
- 0 < R² < 1: weaker linear relationship between x and y. Some but not all of the variation in y is explained by variation in x.
- R² = 0: no linear relationship between x and y. The value of y does not depend on x (none of the variation in y is explained by variation in x).

[Figure: scatter plots of y against x illustrating each case]
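| y | x | ŷ | (ŷ - ȳ)² | (y - ȳ)² |
|---|---|---|---|---|
| 245 | 1400 | 251.8283 | 1202.127 | 1722.25 |
| 312 | 1600 | 273.7683 | 162.0962 | 650.25 |
| 279 | 1700 | 284.7383 | 3.103587 | 56.25 |
| 308 | 1875 | 303.9358 | 304.0071 | 462.25 |
| 199 | 1100 | 218.9183 | 4567.286 | 7656.25 |
| 219 | 1550 | 268.2833 | 331.8482 | 4556.25 |
| 405 | 2350 | 356.0433 | 4836.271 | 14042.25 |
| 324 | 2450 | 367.0133 | 6482.391 | 1406.25 |
| 319 | 1425 | 254.5708 | 1019.474 | 1056.25 |
| 255 | 1700 | 284.7383 | 3.103587 | 992.25 |

Column totals: Σy = 2865, Σx = 17150, Σŷ = 2863.838, Σ(ŷ - ȳ)² = 18911.71, Σ(y - ȳ)² = 32600.5. Here ȳ = 286.5, and ŷ is computed with b0 = 98.2483 and b1 = 0.1097.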
Coefficient of determination

$$R^2 = \frac{RSS}{TSS} = \frac{18911.71}{32600.5} \approx 0.58$$

Only about 58% of the variation in house price is explained by variation in square feet.
Put another way, 42% of the variation in house price is explained by factors other than square feet.
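The same R² can be reproduced in a few lines. This minimal standard-library Python sketch (variable names chosen for illustration) refits the line with the unrounded slope, so it gives RSS ≈ 18934.93 rather than the 18911.71 in the table, which was computed with the rounded slope 0.1097; both round to R² ≈ 0.58.

```python
# House price example: y = price in $1000s, x = size in square feet
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]

n = len(sqft)
mean_x = sum(sqft) / n
mean_y = sum(price) / n

# Least squares fit in the centred form b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),
# which is algebraically equivalent to the formulas in the text
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sqft, price))
sxx = sum((x - mean_x) ** 2 for x in sqft)
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

fitted = [b0 + b1 * x for x in sqft]
rss = sum((y_hat - mean_y) ** 2 for y_hat in fitted)  # regression (explained) sum of squares
tss = sum((y - mean_y) ** 2 for y in price)           # total sum of squares
r_squared = rss / tss

print(f"RSS = {rss:.2f}, TSS = {tss:.2f}, R^2 = {r_squared:.3f}")
# RSS = 18934.93, TSS = 32600.50, R^2 = 0.581
```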
2. Correlation analysis
Correlation is a technique used to measure the strength of the relationship between two variables. The stronger the correlation, the stronger the relationship and the better the regression line fits the data, and vice versa.
Scatter Plot Examples
[Figure: scatter plots of y against x illustrating a high degree of correlation, a low degree of correlation, and no relationship]
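The correlation coefficient (r)
The correlation coefficient is used to measure the strength of the linear relationship between two variables. The product moment correlation coefficient is calculated using the formula:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}}$$

or, equivalently,

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} \qquad\text{or}\qquad r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sigma_x \sigma_y}$$

where $\sigma_x = \sqrt{\overline{x^2} - (\bar{x})^2}$ and $\sigma_y = \sqrt{\overline{y^2} - (\bar{y})^2}$.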
Note
In the single independent variable case, the coefficient of determination is

$$R^2 = r^2$$

where r is the simple correlation coefficient.
Features of r
- Unit free
- Ranges between -1 and 1
- The closer to -1, the stronger the negative linear relationship
- The closer to 1, the stronger the positive linear relationship
- The closer to 0, the weaker the linear relationship
Examples of Approximate r Values
[Figure: scatter plots of y against x for r = -1, r = -0.6, r = 0, r = +0.3 and r = +1]
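Example calculation

$$r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sqrt{\overline{x^2} - (\bar{x})^2}\,\sqrt{\overline{y^2} - (\bar{y})^2}} = \frac{508597.5 - 1715 \times 286.5}{\sqrt{3098375 - 1715^2}\,\sqrt{85342.3 - 286.5^2}} \approx 0.762$$

The result shows a fairly strong positive correlation between house size and house price.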
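A short script can confirm both the value of r and the identity $R^2 = r^2$ from the note above. This is a minimal standard-library Python sketch; `correlation` is a hypothetical helper that follows the mean form $r = (\overline{xy} - \bar{x}\bar{y})/(\sigma_x\sigma_y)$.

```python
import math

# House price example: y = price in $1000s, x = size in square feet
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]


def correlation(x, y):
    """Product moment correlation r = (mean(xy) - mean(x)mean(y)) / (sigma_x * sigma_y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    # sigma = sqrt(mean of squares - square of mean)
    sigma_x = math.sqrt(sum(a * a for a in x) / n - mean_x ** 2)
    sigma_y = math.sqrt(sum(b * b for b in y) / n - mean_y ** 2)
    return (mean_xy - mean_x * mean_y) / (sigma_x * sigma_y)


r = correlation(sqft, price)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")  # r = 0.762, r^2 = 0.581
```

The squared value matches R² for the exact least squares fit (0.581), which rounds to the 0.58 obtained earlier.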


Example
The data below relate the working experience (years) to the productivity (items/h) of 10 workers in a small firm.

| Working experience (years) | Productivity (items/h) |
|---|---|
| 1 | 2 |
| 3 | 8 |
| 4 | 9 |
| 5 | 15 |
| 6 | 15 |
| 7 | 20 |
| 9 | 23 |
| 12 | 25 |
| 14 | 22 |
| 15 | 36 |
Example calculation

$$\sum x = 76, \qquad \sum x^2 = 782, \qquad \sum y = 175, \qquad \sum y^2 = 3933, \qquad \sum xy = 1722$$
Estimate b0 and b1

$$b_1 = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - (\bar{x})^2} = \frac{172.2 - 7.6 \times 17.5}{78.2 - 7.6^2} \approx 1.918$$
Estimate b0 and b1

$$b_0 = \bar{y} - b_1 \bar{x} = 17.5 - 1.918 \times 7.6 \approx 2.923$$

Linear regression equation:

$$\hat{y} = 2.923 + 1.918x$$

Interpretation of b0 and b1?


Coefficient of determination and correlation coefficient

$$RSS = \sum (\hat{y} - \bar{y})^2 = 751.9312, \qquad TSS = \sum (y - \bar{y})^2 = 870.5$$

$$R^2 = \frac{RSS}{TSS} = \frac{751.9312}{870.5} \approx 0.8637$$
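Coefficient of determination and correlation coefficient

$$r = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\sqrt{\overline{x^2} - (\bar{x})^2}\,\sqrt{\overline{y^2} - (\bar{y})^2}} = \frac{172.2 - 7.6 \times 17.5}{\sqrt{78.2 - 7.6^2}\,\sqrt{393.3 - 17.5^2}} \approx 0.9293$$

or

$$r = \sqrt{R^2} = \sqrt{0.8637} \approx 0.9293$$

taking the positive root because the slope $b_1$ is positive.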
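The whole productivity example (b0, b1, R² and r) can be recomputed from the raw data in one pass. This minimal standard-library Python sketch uses the centred sums $S_{xx}$, $S_{yy}$, $S_{xy}$, which are algebraically equivalent to the formulas on the slides; the variable names are illustrative. Because it keeps the slope unrounded, it gives b0 ≈ 2.925 and R² ≈ 0.8636 instead of the slides' 2.923 and 0.8637, which were computed with b1 rounded to 1.918.

```python
import math

experience = [1, 3, 4, 5, 6, 7, 9, 12, 14, 15]        # x, years
productivity = [2, 8, 9, 15, 15, 20, 23, 25, 22, 36]  # y, items/h

n = len(experience)
mx = sum(experience) / n
my = sum(productivity) / n

# Centred sums of squares and cross-products
sxx = sum((x - mx) ** 2 for x in experience)
syy = sum((y - my) ** 2 for y in productivity)  # equals TSS
sxy = sum((x - mx) * (y - my) for x, y in zip(experience, productivity))

b1 = sxy / sxx
b0 = my - b1 * mx
r_squared = sxy ** 2 / (sxx * syy)  # RSS / TSS for simple regression
r = sxy / math.sqrt(sxx * syy)      # sign follows the sign of the slope

print(f"y-hat = {b0:.3f} + {b1:.3f}x, R^2 = {r_squared:.4f}, r = {r:.4f}")
# y-hat = 2.925 + 1.918x, R^2 = 0.8636, r = 0.9293
```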
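The Multiple Regression Model
Idea: examine the linear relationship between 1 dependent variable (y) and 2 or more independent variables ($x_i$).

Population model:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where $\beta_0$ is the y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon$ is the random error.

Estimated multiple regression model:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

where $\hat{y}$ is the estimated (or predicted) value of y, $b_0$ is the estimated intercept, and $b_1, \ldots, b_k$ are the estimated slope coefficients.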
Estimates b0, b1, b2, …, bk

$$\begin{aligned}
\sum y &= n b_0 + b_1 \sum x_1 + b_2 \sum x_2 + \cdots + b_k \sum x_k \\
\sum x_1 y &= b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 + \cdots + b_k \sum x_1 x_k \\
\sum x_2 y &= b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2 + \cdots + b_k \sum x_2 x_k \\
&\;\;\vdots \\
\sum x_k y &= b_0 \sum x_k + b_1 \sum x_1 x_k + b_2 \sum x_2 x_k + \cdots + b_k \sum x_k^2
\end{aligned}$$
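The normal equations above can be assembled and solved mechanically for any number of independent variables. The following is a minimal standard-library Python sketch, not a production solver: `solve_normal_equations` is a hypothetical helper, Gaussian elimination with partial pivoting is just one of several ways to solve the system, and the small dataset is made up (noise-free, generated from y = 1 + 2x1 + 3x2) purely so that the correct answer is known in advance.

```python
def solve_normal_equations(xs, y):
    """Least squares estimates [b0, b1, ..., bk] obtained from the normal equations.

    xs is a list of k columns (one list of n values per independent variable);
    y is a list of n values of the dependent variable.
    """
    n = len(y)
    cols = [[1.0] * n] + [[float(v) for v in col] for col in xs]  # column of 1s gives b0
    p = len(cols)
    # Row i of the normal equations: sum(col_i * y) = sum over j of b_j * sum(col_i * col_j)
    a = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(p)] for i in range(p)]
    c = [sum(u * v for u, v in zip(cols[i], y)) for i in range(p)]
    m = [a[i] + [c[i]] for i in range(p)]  # augmented matrix [A | c]

    # Gaussian elimination with partial pivoting
    for k in range(p):
        pivot = max(range(k, p), key=lambda row: abs(m[row][k]))
        if abs(m[pivot][k]) < 1e-12:
            raise ValueError("singular system: the x variables may be collinear")
        m[k], m[pivot] = m[pivot], m[k]
        for row in range(k + 1, p):
            factor = m[row][k] / m[k][k]
            for j in range(k, p + 1):
                m[row][j] -= factor * m[k][j]

    # Back substitution
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (m[i][p] - sum(m[i][j] * b[j] for j in range(i + 1, p))) / m[i][i]
    return b


# Hypothetical, noise-free data made up for this illustration: y = 1 + 2*x1 + 3*x2 exactly
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [9, 8, 19, 18, 26]
b = solve_normal_equations([x1, x2], y)
print([round(v, 6) for v in b])  # [1.0, 2.0, 3.0]
```

The first row of the system built here is exactly $\sum y = n b_0 + b_1 \sum x_1 + b_2 \sum x_2$, so the code is a direct transcription of the equations rather than a different method.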
Interpretation of Estimated Coefficients
- Slope ($b_i$): estimates that the average value of y changes by $b_i$ units for each 1-unit increase in $x_i$, holding all other variables constant
- Intercept ($b_0$): the estimated average value of y when all $x_i = 0$
Multiple Regression Model
Two-variable model:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$

[Figure: fitted regression plane over the (x1, x2) axes, showing the slope for variable x1 and the slope for variable x2; for a sample observation with values x1i, x2i, the residual is the vertical distance e = (y - ŷ) between yi and the plane]

The best fit equation, $\hat{y}$, is found by minimizing the sum of squared errors, $\sum e^2$.
Multiple Regression Assumptions
Errors (residuals) from the regression model: $e = (y - \hat{y})$
- The errors are normally distributed
- The mean of the errors is zero
- Errors have a constant variance
- The model errors are independent
Example
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand. Data are collected for 15 weeks.

| Week | Pie sales | Price ($) | Advertising ($100s) |
|---|---|---|---|
| 1 | 350 | 5.50 | 3.3 |
| 2 | 460 | 7.50 | 3.3 |
| 3 | 350 | 8.00 | 3.0 |
| 4 | 430 | 8.00 | 4.5 |
| 5 | 350 | 6.80 | 3.0 |
| 6 | 380 | 7.50 | 4.0 |
| 7 | 430 | 4.50 | 3.0 |
| 8 | 470 | 6.40 | 3.7 |
| 9 | 450 | 7.00 | 3.5 |
| 10 | 490 | 5.00 | 4.0 |
| 11 | 340 | 7.20 | 3.5 |
| 12 | 300 | 7.90 | 3.2 |
| 13 | 440 | 5.90 | 4.0 |
| 14 | 450 | 5.00 | 3.5 |
| 15 | 300 | 7.00 | 2.7 |
Example
- Dependent variable (y): pie sales
- Independent variable 1 (x1): price ($)
- Independent variable 2 (x2): advertising ($100s)

Estimated (predicted) regression equation:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2$$
Estimates b0, b1, b2

$$\begin{aligned}
\sum y &= n b_0 + b_1 \sum x_1 + b_2 \sum x_2 \\
\sum x_1 y &= b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 \\
\sum x_2 y &= b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2
\end{aligned}$$
Example calculation

$$\sum y = 5990, \qquad \sum x_1 = 99.2, \qquad \sum x_2 = 52.2, \qquad \sum y^2 = 2448500$$

$$\sum x_1^2 = 675.26, \qquad \sum x_2^2 = 185, \qquad \sum x_1 x_2 = 345.46, \qquad \sum x_1 y = 39152, \qquad \sum x_2 y = 21087$$
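Example calculation

$$\begin{aligned}
5990 &= 15 b_0 + 99.2 b_1 + 52.2 b_2 \\
39152 &= 99.2 b_0 + 675.26 b_1 + 345.46 b_2 \\
21087 &= 52.2 b_0 + 345.46 b_1 + 185 b_2
\end{aligned}$$

Solving this system gives

$$b_0 \approx 306.526, \qquad b_1 \approx -24.975, \qquad b_2 \approx 74.131$$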
Example calculation
Estimated (predicted) regression equation:

$$\hat{y} = 306.526 - 24.975 x_1 + 74.131 x_2$$

Interpretation of b0, b1, b2?
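One hand-checkable way to solve the three pie-sales equations is Cramer's rule: each coefficient is a ratio of two 3×3 determinants. The sketch below is minimal standard-library Python; `det3` is a hypothetical helper, and the only inputs are the sums listed on the previous slide.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows (cofactor expansion)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Coefficient matrix and right-hand side of the pie-sales normal equations
A = [[15.0, 99.2, 52.2],
     [99.2, 675.26, 345.46],
     [52.2, 345.46, 185.0]]
c = [5990.0, 39152.0, 21087.0]

d = det3(A)
coefficients = []
for col in range(3):
    # Cramer's rule: replace column `col` of A with the right-hand side c
    a_col = [[c[row] if j == col else A[row][j] for j in range(3)] for row in range(3)]
    coefficients.append(det3(a_col) / d)

b0, b1, b2 = coefficients
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")  # b0 = 306.526, b1 = -24.975, b2 = 74.131
```

Cramer's rule is fine for a 3×3 system like this one; for larger k, elimination-based solvers are the usual choice.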


The Multiple Regression Equation

$$\text{Sales} = 306.526 - 24.975(\text{Price}) + 74.131(\text{Advertising})$$

where
- Sales is in number of pies per week
- Price is in $
- Advertising is in $100s

b1 = -24.975: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
b2 = 74.131: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
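Using the Model to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350. Because advertising is measured in $100s, $350 enters the model as 3.5:

$$\begin{aligned}
\text{Sales} &= 306.526 - 24.975(\text{Price}) + 74.131(\text{Advertising}) \\
&= 306.526 - 24.975(5.50) + 74.131(3.5) \\
&= 428.62
\end{aligned}$$

Predicted sales is 428.62 pies.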
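The prediction step can be wrapped in a small function so that the unit conversion for advertising is not forgotten. This is a minimal Python sketch; `predict_pie_sales` is a hypothetical helper that hard-codes the rounded coefficients from the fitted equation.

```python
def predict_pie_sales(price, advertising_dollars):
    """Predicted weekly pie sales from the fitted equation (hypothetical helper)."""
    b0, b1, b2 = 306.526, -24.975, 74.131  # rounded estimates from the fitted equation
    x1 = price                              # price in $
    x2 = advertising_dollars / 100          # the model measures advertising in $100s
    return b0 + b1 * x1 + b2 * x2


print(round(predict_pie_sales(5.50, 350), 2))  # 428.62
```

Passing 350 directly as x2 (instead of 3.5) would inflate the prediction by thousands of pies, which is why the conversion lives inside the function.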
Multiple Coefficient of Determination
Reports the proportion of total variation in y explained by all x variables taken together:

$$R^2 = \frac{RSS}{TSS} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
Multiple correlation (R)
Multiple correlation provides a measure of the overall strength of the relationship between the dependent variable and the independent variables. It is defined as the positive square root of the coefficient of determination:

$$R = \sqrt{R^2}$$
Example calculation

$$RSS = \sum (\hat{y}_i - \bar{y})^2 = 29459.96, \qquad TSS = \sum (y_i - \bar{y})^2 = 56493.33$$

$$R^2 = \frac{29459.96}{56493.33} \approx 0.521$$

Indication?

$$R = \sqrt{R^2} \approx 0.722$$
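The multiple R² and R can be recomputed from the 15 weeks of data and the fitted coefficients. This minimal standard-library Python sketch plugs in the rounded coefficients from the fitted equation (an assumption of the sketch), which reproduces RSS ≈ 29460, R² ≈ 0.521 and R ≈ 0.722.

```python
import math

# Pie sales example (15 weeks)
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]  # $100s

b0, b1, b2 = 306.526, -24.975, 74.131  # rounded estimates from the fitted equation

mean_y = sum(sales) / len(sales)
fitted = [b0 + b1 * x1 + b2 * x2 for x1, x2 in zip(price, advertising)]

rss = sum((y_hat - mean_y) ** 2 for y_hat in fitted)  # regression (explained) sum of squares
tss = sum((y - mean_y) ** 2 for y in sales)           # total sum of squares
r_squared = rss / tss
multiple_r = math.sqrt(r_squared)                     # positive square root

print(f"RSS = {rss:.1f}, TSS = {tss:.2f}, R^2 = {r_squared:.4f}, R = {multiple_r:.3f}")
```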
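Correlation matrix
Provides measures of the strength of the relationship between the dependent variable and each independent variable, and between each pair of independent variables:

|  | y | x1 | x2 |
|---|---|---|---|
| y | 1 |  |  |
| x1 | r_x1y | 1 |  |
| x2 | r_x2y | r_x1x2 | 1 |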
Example calculation

|  | Pie Sales | Price | Advertising |
|---|---|---|---|
| Pie Sales | 1 |  |  |
| Price | -0.44327 | 1 |  |
| Advertising | 0.55632 | 0.03044 | 1 |

- Price vs. Sales: r = -0.44327. There is a negative association between price and sales.
- Advertising vs. Sales: r = 0.55632. There is a positive association between advertising and sales.
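The entries of this correlation matrix are just simple correlation coefficients computed pairwise. A minimal standard-library Python sketch follows; `pearson` is a hypothetical helper using the centred-sum formula for r, and the printout mimics the lower-triangular layout of the table.

```python
import math

# Pie sales example (15 weeks)
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]


def pearson(a, b):
    """Simple (product moment) correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


columns = {"Pie Sales": sales, "Price": price, "Advertising": advertising}
names = list(columns)

# Print the lower triangle, laid out like the correlation matrix in the text
print(" " * 12 + "".join(f"{name:>12}" for name in names))
for i, row_name in enumerate(names):
    cells = "".join(f"{pearson(columns[row_name], columns[col]):>12.5f}" for col in names[: i + 1])
    print(f"{row_name:<12}" + cells)
```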


Exercise 1
The table below shows data on a bank's interest rate and the amount of money granted during 2005-2011:

| Year | Interest rate (%) | Amount of money granted (billion VND) |
|---|---|---|
| 1 | 9.05 | 20.1 |
| 2 | 10.1 | 20.9 |
| 3 | 12.5 | 19.8 |
| 4 | 14.2 | 18.3 |
| 5 | 12 | 17.9 |
| 6 | 11.1 | 19.4 |
| 7 | 10.2 | 21.6 |

- Display the data in a scatter plot.
- Do the bank's interest rate and the amount of money granted have any relationship?
- If yes, use a regression model to describe their relationship.
Exercise 2
A motion picture industry analyst wants to estimate the gross earnings generated by a movie. The estimate will be based on different variables involved in the film production. The independent variables considered are X1 = production cost of the movie (million USD) and X2 = total cost of all promotion activities (million USD). The analyst obtains information on a random sample of 10 Hollywood movies made within the last 5 years. The variable Y is gross earnings, in millions of dollars. The data are given in the following table. Use multiple regression to describe the relationship among these variables.
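| Gross earnings, Y (m. USD) | Production cost, X1 (m. USD) | Promotion cost, X2 (m. USD) |
|---|---|---|
| 72 | 12 | 5 |
| 76 | 11 | 8 |
| 78 | 15 | 6 |
| 70 | 10 | 5 |
| 68 | 11 | 3 |
| 80 | 16 | 9 |
| 82 | 14 | 12 |
| 65 | 8 | 4 |
| 62 | 8 | 3 |
| 90 | 18 | 10 |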
Exercise 3
The data below relate the cost of advertising (VND m) to the number of products sold (1000 units). Prepare the regression model and draw a conclusion about this relationship.

| Cost of advertisement (m. VND) | No. of products sold (1000 units) |
|---|---|
| 1 | 2 |
| 3 | 8 |
| 4 | 9 |
| 5 | 15 |
| 6 | 15 |
| 7 | 20 |
| 9 | 23 |
| 12 | 25 |
| 14 | 22 |
| 15 | 36 |
Exercise 4
A CEO is considering whether his company can use the unemployment rate to estimate the number of products sold. Write down the regression model describing the relationship between these variables and draw a conclusion based on your results.

| Period | Unemployment rate (%) | Number of products sold (1000 units) |
|---|---|---|
| 1 | 1.3 | 10 |
| 2 | 2.0 | 6 |
| 3 | 1.7 | 5 |
| 4 | 1.5 | 12 |
| 5 | 1.6 | 10 |
| 6 | 1.2 | 15 |
| 7 | 1.6 | 5 |
| 8 | 1.4 | 12 |
| 9 | 1.0 | 17 |
| 10 | 1.1 | 20 |
