File4 Session3 Introduction To Regression
Regression
Introduction to linear regression
● Simple linear regression
● Multiple regression
● Identify relationships between the dependent and independent variables
● Introduction to SPSS
Simple linear regression
y = Dependent variable
X = Independent variable
File: dataspss-s4.1
DETERMINING THE EQUATION OF THE REGRESSION LINE

The population regression model is

y_i = β0 + β1·x_i + ε_i

where
ŷ_i = predicted value of y
x_i = value of the independent variable for the ith observation
y_i = actual value of the dependent variable for the ith observation
β1 = population slope
β0 = population intercept
ε_i = error of prediction for the ith observation
Simple Linear Regression Model
[Figure: scatter plot of the simple linear regression model. For a given x_i, the observed value of Y differs from the predicted value on the line by the random error ε_i; the line has intercept β0 and slope β1.]
Sample Regression Function (SRF)

ŷ_i = b0 + b1·x_i

where b0 and b1 are the sample estimates of β0 and β1.
● b0 and b1 are obtained by finding the values that minimize the sum of the squared residuals (i.e., minimize the prediction error). This process is called Least Squares Analysis.
x_i     y_i     x_i − x̄     (x_i − x̄)·y_i     (x_i − x̄)²
1       1       −2           −2                 4
2       1       −1           −1                 1
3       2        0            0                 0
4       2        1            2                 1
5       4        2            8                 4
x̄ = 3   ȳ = 2                Σ = 7             Σ = 10

Since Σ(x_i − x̄)·y_i = Σ(x_i − x̄)(y_i − ȳ), the estimates are
b1 = 7 / 10 = 0.7 and b0 = ȳ − b1·x̄ = 2 − 0.7·3 = −0.1.
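The slope and intercept from the table can be reproduced in a few lines of Python. This is a minimal sketch (variable names are my own, not from the slides):

```python
# Least Squares Analysis for the worked example
# (x = years of experience, y = income in 10 million VND).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar = sum(x) / n  # 3
y_bar = sum(y) / n  # 2

# b1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2) = 7 / 10
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
   / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar  # intercept: 2 - 0.7 * 3 = -0.1

print(b1, b0)
```

The same estimates (b1 = 0.7, b0 = −0.1) appear in the SPSS output discussed below.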
[Figure: scatter plot of the five (x, y) data points from the table.]
Result of estimation by SPSS
Degrees of freedom:
Regression: k
Residual: n − k − 1
Total: n − 1
Meaning of b0 and b1
● Y-intercept (b0)
• The average income (Y) is −0.1 (×10 million VND) when experience (X) is 0 years
● Slope (b1)
• Income (Y) is expected to increase by 0.7 (×10 million VND) for each additional year of experience
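As a quick illustration, the fitted line can be turned into a prediction function. The helper `predict_income` is hypothetical, not part of the slides; values are in 10 million VND:

```python
# Prediction from the estimated line y_hat = b0 + b1 * x,
# with b0 = -0.1 and b1 = 0.7 taken from the example above.
b0, b1 = -0.1, 0.7

def predict_income(years):
    """Predicted income (in 10 million VND) for a given number of experience years."""
    return b0 + b1 * years

print(predict_income(0))   # the intercept: predicted income at 0 years
print(predict_income(5))   # predicted income at 5 years (approx. 3.4)
```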
SSR = Σ(ŷ_i − ȳ)² (regression sum of squares: the variation in Y explained by the regression line)

[Figure: fitted regression line showing ŷ_i and ȳ at a given x_i.]
Result of estimation by SPSS
Coefficient of Determination, r²

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called r-squared and is denoted r²:

r² = SSR / SST

Note: 0 ≤ r² ≤ 1
The Coefficient of Determination r² and the Coefficient of Correlation r

0 ≤ r² ≤ 1
r² = coefficient of determination: measures the % of variation in Y that is explained by the independent variable X in the regression model.

−1 ≤ r ≤ 1
r = coefficient of correlation: measures how strong the linear relationship between X and Y is.
r > 0 if b1 > 0; r < 0 if b1 < 0
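Both quantities can be checked numerically for the example data. A sketch (the names `sst`, `sse`, `r2` are my own):

```python
# r^2 and r for the example data (x = experience, y = income).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
   / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

sst = sum((yi - y_bar) ** 2 for yi in y)                       # total variation
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
r2 = 1 - sse / sst                     # share of variation in y explained by x
r = r2 ** 0.5 if b1 > 0 else -(r2 ** 0.5)  # r takes the sign of b1
print(r2, r)  # roughly 0.817 and 0.904
```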
Examples of Approximate r² Values

r² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.

r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).

[Figures: scatter plots illustrating r² = 1 and r² = 0.]
Examples of Approximate r² Values (continued)

0 < r² < 1: weaker linear relationships between X and Y; part of the variation in Y is explained by variation in X.

[Figure: scatter plots illustrating 0 < r² < 1.]
Standard Error of the Estimate
The standard deviation of the variation of observations around the regression line is estimated by:

s_e = √( SSE / (n − 2) )

where
SSE = error sum of squares = Σ(y_i − ŷ_i)²
n = sample size
Result of estimation by SPSS
Inferences About the Slope
The standard error of the regression slope coefficient (b1) is estimated by:

s_b1 = s_e / √( Σ(x_i − x̄)² )

where s_b1 is the estimate of the standard error of the least squares slope and s_e is the standard error of the estimate.

For the example data: s_b1 = 0.1914854
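A short sketch reproducing s_e, s_b1 and the resulting t statistic from the example data (variable names are my own):

```python
import math

# Standard errors and t statistic for the worked example
# (should match the slides: s_b1 = 0.1914854, t = 3.66).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
   / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

s_e = math.sqrt(sse / (n - 2))                               # std. error of the estimate
s_b1 = s_e / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))   # std. error of the slope
t = b1 / s_b1                                                # t statistic for H0: beta1 = 0
print(round(s_b1, 7), round(t, 2))  # 0.1914854 3.66
```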
Inference about the Slope: t Test

t = (b1 − β1) / s_b1, with d.f. = n − 2
Result of estimation by SPSS
Inferences about the Slope: t Test Example

H0: β1 = 0
H1: β1 ≠ 0

Test statistic: t = 3.66
d.f. = 5 − 2 = 3
α/2 = 0.025, so the critical values are ±3.182 (from the t tables).

Decision: since 3.66 > 3.182, reject H0.
Conclusion: there is sufficient evidence that years of experience affect income.
F Test for Significance

F test statistic:

F = MSR / MSE

where
MSR = SSR / k
MSE = SSE / (n − k − 1)
k = number of independent variables
Result of estimation by SPSS
F Test for Significance Example

H0: β1 = 0
H1: β1 ≠ 0
α = 0.05
df1 = k = 1, df2 = n − k − 1 = 5 − 1 − 1 = 3

Critical value: F0.05 = 10.128
Test statistic: F = MSR / MSE (in simple regression F = t², so F = 3.66² ≈ 13.4)

Decision: since F > 10.128, reject H0 at α = 0.05.
Conclusion: there is sufficient evidence that years of experience affect income.
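The F statistic can be reproduced from the example data. A sketch using my own variable names:

```python
# F test for the example: F = MSR / MSE with k = 1 independent variable
# and n = 5 observations (data from the earlier slides).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
n, k = len(x), 1
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
   / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
ssr = sum(((b0 + b1 * xi) - y_bar) ** 2 for xi in x)           # explained
msr = ssr / k
mse = sse / (n - k - 1)
F = msr / mse        # about 13.36, above the critical F(1, 3) = 10.128
print(F > 10.128)    # True -> reject H0
```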
Introduction to SPSS (file: dataspss-s4.1)
Result of estimation by SPSS
Interpreting the results
● R-squared ranges in value between 0 and 1
● R² = 0: the independent variable does nothing to help explain the variance in y
● R² = 1: all sample points lie exactly on the estimated regression line
● Example: R² = 0.93 implies that the regression equation explains 93% of the variation in the dependent variable
● Multiple regression
● Identify relationships between the dependent and independent variables
● Dummy variables included
● Solutions with SPSS
Linear regression
Measuring Collinearity: Variance Inflationary Factor

The variance inflationary factor VIF_j can be used to measure collinearity:

VIF_j = 1 / (1 − R_j²)

where R_j² is the coefficient of determination obtained by regressing the jth independent variable on all the other independent variables.
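A minimal sketch of the VIF computation for the two-predictor case, where R_j² reduces to the squared correlation between the predictors (illustrative data, not from the slides):

```python
# VIF_j = 1 / (1 - R_j^2). With only two predictors, R_j^2 equals the
# squared correlation between them. Data below is made up for illustration.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
sxx = sum((a - m1) ** 2 for a in x1)
syy = sum((b - m2) ** 2 for b in x2)
r_j_squared = sxy ** 2 / (sxx * syy)   # squared correlation between x1 and x2
vif = 1 / (1 - r_j_squared)            # a common rule of thumb flags VIF above 5 or 10
print(vif)
```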
Section Summary
• Developed the multiple regression model.
• Tested the significance of the multiple regression model.
• Discussed r2, adjusted r2 and overall F test.
• Discussed using residual plots to check model
assumptions.
• Tested individual regression coefficients.
• Used dummy variables.
• Evaluated interaction effects.
• Evaluated collinearity.
Regression and collinearity
Select Statistics to open the multicollinearity checks
Select Collinearity diagnostics
Typical output from SPSS
Dependent variable: Satisfaction