File4 Session3 Introduction To Regression

Uploaded by Đan Anh Pham

Introduction to Regression
Introduction to linear regression
● Simple linear regression
● Multiple regression
● Finding relationships between dependent and
independent variables
● Introduction to SPSS
Simple linear regression

y = β0 + β1x + ε

y = Dependent variable
x = Independent variable
β0 = y-intercept of the line (constant), cuts through the y-axis
β1 = Unknown parameter – slope of the line
ε = Random error component

Terminology for simple
regression
y X
Dependent variable Independent variable
Explained variable Explanatory variable
Response variable Control variable
Predicted variable Predictor variable
Regressand Regressor
Examples

- With a sample from the population:
- (x1, y1), (x2, y2), …, (xn, yn) denote a random sample of
size n from the population
File: dataspss-s4.1
DETERMINING THE EQUATION OF THE
REGRESSION LINE

● Deterministic Regression Model – a mathematical
model that produces an ‘exact’ output for a given
input

● Probabilistic Regression Model – a model that
includes an error term, allowing various output
values to occur for a given value of input

ŷi = predicted value of y
xi = value of the independent variable for the ith observation
yi = actual value of the dependent variable for the ith observation
β1 = population slope
β0 = population intercept
εi = error of prediction for the ith observation
Simple Linear Regression Model

[Figure: scatter of Y against X. At a given Xi, the observed value of Y
differs from the predicted value on the line by the random error εi;
the line has intercept β0 and slope β1.]
Sample Regression
Function (SRF)

ŷi = b0 + b1xi

ŷi = estimated value of Y for observation i
xi = value of X for observation i
b0 = Y-intercept:
the value of Y when X is zero
b1 = slope of the regression line:
the change in Y for a 1-unit change in X

b1 > 0 : line goes up; positive relationship between X and Y
b1 < 0 : line goes down; negative relationship between X and Y
SIMPLE LINEAR REGRESSION MODEL
(sample)

Simple Linear Regression:

yi = b0 + b1xi + ei, where ei is the residual

Sample Regression Function
(SRF) (continued)
● b0 and b1 are obtained by finding the values
that minimize the sum of the squared residuals (minimize the
error). This process is called Least Squares Analysis.

Yi = actual value of Y for observation i
Ŷi = predicted value of Y for observation i
ei = residual (error)
● b0 provides an estimate of β0
● b1 provides an estimate of β1
RESIDUAL ANALYSIS
Simple example

x        y        x − x̄     (x − x̄)y    (x − x̄)²
1        1        -2          -2           4
2        1        -1          -1           1
3        2         0           0           0
4        2         1           2           1
5        4         2           8           4
x̄ = 3   ȳ = 2               Total = 7    Total = 10

X = experience (years)
Y = Income (10 million VND)
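The table's least-squares computation can be sketched in a few lines (Python used only for illustration; the data are the five (x, y) pairs above):

```python
# Least-squares estimates for the worked example:
# b1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)**2)
# b0 = y_bar - b1 * x_bar
x = [1, 2, 3, 4, 5]   # experience (years)
y = [1, 1, 2, 2, 4]   # income (10 million VND)

n = len(x)
x_bar = sum(x) / n                                               # 3.0
y_bar = sum(y) / n                                               # 2.0

sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))   # 7.0
sxx = sum((xi - x_bar) ** 2 for xi in x)                         # 10.0

b1 = sxy / sxx             # slope: 0.7
b0 = y_bar - b1 * x_bar    # intercept: -0.1
print(b0, b1)
```

These reproduce the slide's estimates b0 = −0.1 and b1 = 0.7.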
Fit to the data

[Figure: scatter plot of the five (x, y) points with the fitted
regression line.]
Result of estimation by SPSS

Degrees of freedom:
Regression: k
Residual: n-k-1
Total: n-1
Meaning of b0 and b1
● Y-Intercept (b0)
• The average value of individual income (Y) is
-0.1 (10 million VND) when experience
(X) is 0 years

● Slope (b1)
• Income (Y) is expected to increase by 0.7 (*10 million
VND) for each unit increase in experience (years)

b1 > 0 : line goes up; positive relationship between X and Y (increase)
b1 < 0 : line goes down; negative relationship between X and Y (decrease)
Measures of Variation
Total variation is made up of two parts:

SST = SSR + SSE

Total Sum of Squares (SST): measures the variation of the Yi values
around their mean Ȳ.
Regression Sum of Squares (SSR): explained variation attributable to
the relationship between X and Y.
Error Sum of Squares (SSE): variation attributable to factors other
than the relationship between X and Y.

/* Other notation for SSyy is SST. They are the same!
Measure of Variation: The Sum of Squares (continued)

SSE = Σ(Yi − Ŷi)²
SSyy = Σ(Yi − Ȳ)²
SSR = Σ(Ŷi − Ȳ)²

[Figure: the three deviations shown at a point Xi relative to the
fitted line and the mean Ȳ.]
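Using the fitted line ŷ = −0.1 + 0.7x from the worked example, the three sums of squares can be checked directly (a sketch, not SPSS output):

```python
# Decomposition of variation: SST = SSR + SSE, r2 = SSR / SST
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7                            # estimates from the example
y_bar = sum(y) / len(y)                       # 2.0
y_hat = [b0 + b1 * xi for xi in x]            # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual

r2 = ssr / sst
print(sst, ssr, sse, r2)
```

This gives SST = 6, SSR = 4.9, SSE = 1.1, so SSR + SSE = SST and r² ≈ 0.817.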
Result of estimation by SPSS
Coefficient of Determination, r2
The coefficient of determination is the portion of the total variation in
the dependent variable that is explained by variation in the
independent variable.
The coefficient of determination is also called r-squared and is denoted
as r2:

r2 = SSR / SST

Note: 0 ≤ r2 ≤ 1
The Coefficient of Determination r2
and the Coefficient of Correlation r

0 ≤ r2 ≤ 1
r2 = Coefficient of Determination
Measures the % of variation in Y that is explained by the
independent variable X in the regression model

-1 ≤ r ≤ 1
r = Coefficient of Correlation
Measures how strong the relationship is between X and Y

r > 0 if b1 > 0
r < 0 if b1 < 0
Examples of Approximate r2 values

r2 = 1: perfect linear relationship between X and Y; 100% of the
variation in Y is explained by variation in X.

r2 = 0: no linear relationship between X and Y; the value of Y does
not depend on X (none of the variation in Y is explained by variation
in X).
Examples of Approximate r2 values

0 < r2 < 1: weaker linear relationships between X and Y; some but not
all of the variation in Y is explained by variation in X.
Standard Error of the Estimate
The standard deviation of the variation of observations around the
regression line is estimated by:

Se = √(SSE / (n − 2))

where
SSE = error sum of squares
n = sample size
Result of estimation by SPSS
Inferences About the Slope
The standard error of the regression slope coefficient (b1) is
estimated by:

Sb1 = Se / √(Σ(xi − x̄)²)

where
Sb1 = estimate of the standard error of the least squares slope
Se = standard error of the estimate

Result of estimation by SPSS

Sb1 = 0.1914854
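The two standard-error formulas can be verified against the SPSS value Sb1 = 0.1914854, using SSE = 1.1 and Σ(x − x̄)² = 10 from the worked example (a quick check, not SPSS output):

```python
import math

# S_e  = sqrt(SSE / (n - 2))            -- standard error of the estimate
# S_b1 = S_e / sqrt(sum((x - x_bar)^2)) -- standard error of the slope
sse = 1.1     # error sum of squares from the worked example
n = 5         # sample size
sxx = 10      # sum of squared deviations of x

s_e = math.sqrt(sse / (n - 2))
s_b1 = s_e / math.sqrt(sxx)
print(round(s_e, 4), round(s_b1, 7))
```

This reproduces Se ≈ 0.6055 and Sb1 = 0.1914854, matching the SPSS output.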
Inference about the Slope: t Test

t test for a population slope:

• Is there a linear relationship between X and Y?

Null hypothesis (H0) and alternative hypothesis (H1):
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (linear relationship does exist)

Test statistic, with d.f. = n-2:

t = (b1 − β1) / Sb1

where
b1 = regression slope coefficient
β1 = hypothesized slope
Sb1 = standard error of the slope
Result of estimation by SPSS
Inferences about the Slope: t Test
Example
H0: β1 = 0
H1: β1 ≠ 0

d.f. = 5-2 = 3
α/2 = .025 in each tail
Test statistic: t = 3.66
Critical values: t = ±3.182 (from t tables)

Decision: Reject H0, since t = 3.66 > 3.182.
Conclusion: There is sufficient evidence that experience
affects income.
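The test statistic in this example follows directly from the slope estimate and its standard error (a quick check using the slides' own numbers):

```python
# t statistic for H0: beta1 = 0, using the worked example's estimates
b1 = 0.7              # estimated slope
beta1_h0 = 0.0        # hypothesized slope under H0
s_b1 = 0.1914854      # standard error of the slope (from SPSS)

t = (b1 - beta1_h0) / s_b1
print(round(t, 2))
```

The result, t ≈ 3.66, exceeds the critical value 3.182, so H0 is rejected.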
F Test for Significance
F test statistic:

F = MSR / MSE, where MSR = SSR / k and MSE = SSE / (n - k - 1)

F follows an F distribution with k numerator and (n - k - 1)
denominator degrees of freedom.
k = the number of independent (explanatory) variables in the
regression model.
Result of estimation by SPSS
F Test for Significance Example
df1 = k = 1
df2 = n-k-1 = 5-1-1 = 3
H0: β1 = 0
H1: β1 ≠ 0
α = .05
Critical value: F.05 = 10.128
Test statistic: F ≈ 13.36 (equal to t² = 3.66²)

Conclusion: Reject H0 at α = 0.05, since F > 10.128.
There is sufficient evidence that experience affects income.
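The F statistic here can be recomputed from the sums of squares of the worked example (a sketch; SSR = 4.9 and SSE = 1.1 come from the earlier decomposition):

```python
# F = MSR / MSE = (SSR / k) / (SSE / (n - k - 1))
ssr, sse = 4.9, 1.1   # from the worked example
n, k = 5, 1

msr = ssr / k             # mean square regression
mse = sse / (n - k - 1)   # mean square error
f = msr / mse
print(round(f, 2))
```

This gives F ≈ 13.36, which exceeds the critical value 10.128; it also equals t² for the simple-regression t test, as expected.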
Introduction to SPSS (file: dataspss-s4.1)
Result of estimation by SPSS
Interpreting the results
● R-squared ranges in value between 0 and 1
● R2 = 0: nothing helps explain the variance in y
● R2 = 1: all the sample points lie on the estimated regression line
● Example: R2 = 0.93 implies that the regression equation explains
93% of the variation in the dependent variable

● Sig. (significance): goodness of fit only if Sig. is small enough:
● Sig. < 0.01: significant at 1%, H0 is rejected
● 0.01 ≤ Sig. < 0.05: significant at 5%, H0 is rejected
● 0.05 ≤ Sig. < 0.1: significant at 10%, H0 is rejected
Introduction to multiple
regression

● Multiple regression
● Finding relationships between dependent and
independent variables
● Dummy variables included
● Solutions and SPSS
Linear regression

y = β0 + β1X1 + β2X2 + … + βkXk + ε

y = Dependent (or response) variable
X1, X2, …, Xk = Independent or predictor variables
β0 = y-intercept (constant), cuts through the y-axis
β1, …, βk = Unknown parameters – slopes
ε = Random error component
Terminology for multiple
regression
y X1, x2, …, xk
Dependent variable Independent variable
Explained variable Explanatory variable
Response variable Control variable
Predicted variable Predictor variable
Regressand Regressor
Example
● File: dataspss-s4.2
● Dependent variable ?
● Independent variables
● SPSS program
● Estimate and discuss
Think
● Survey conducted with variables:
● Income
● Age
● Years of working experience
● Education
● Gender
● ………
Think about which are the dependent and the independent
variables.
Regression with dummy
independent variables
● Independent variable: Gender
● 1 = female, 0 = male
● If the estimated coefficient of gender is positive, the
dependent variable is higher for females, other things equal
● If the estimated coefficient of gender is negative, the
dependent variable is higher for males, other things equal.
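With a single binary dummy as the only regressor, the least-squares fit reduces to group means: the intercept is the mean of the 0 (male) group, and the slope is the female − male difference. A sketch with hypothetical income values (not from the session's data file):

```python
# Dummy-variable regression y = b0 + b1*D with D = 1 (female), 0 (male)
# reduces to group means: b0 = male mean, b1 = female mean - male mean.
income = [3, 4, 5, 6]   # hypothetical incomes (10 million VND)
female = [0, 0, 1, 1]   # dummy: 1 = female, 0 = male

male_mean = sum(y for y, d in zip(income, female) if d == 0) / female.count(0)
female_mean = sum(y for y, d in zip(income, female) if d == 1) / female.count(1)

b0 = male_mean                 # intercept: average income for males
b1 = female_mean - male_mean   # positive => higher average for females
print(b0, b1)
```

Here b1 > 0, so, as the slide says, the dependent variable increases with the category coded 1 (female).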
Samples of hypotheses

● An increase in education does not cause a rise
in earnings
● People's earnings are not positively influenced by
their age
● There is no significant relationship between
earnings and gender
Adjusted R2
Adjusted R-square identifies a good regression model once more
variables are added.
The higher the Adjusted R-square, the better the model.

Adjusted R2 = 1 − (1 − R2)(n − 1) / (n − k − 1)

(where n = sample size, k = number of independent variables)

• Helps control the number of independent variables added: it
penalizes the inclusion of unimportant independent variables.
• Adjusted R-square is always less than R-square.
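The adjusted R-square formula can be illustrated with the numbers from the simple worked example (r² ≈ 0.817, n = 5, k = 1); the values are taken from the earlier slides, the computation is only a sketch:

```python
# Adjusted R-square: adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
r2 = 49 / 60   # r2 from the worked example (~0.8167)
n, k = 5, 1

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))
```

Note that the adjusted value (≈ 0.756) is below r² itself, as the slide states.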
Collinearity
Collinearity: High correlation exists among two or more
independent variables.
This means the correlated variables contribute redundant
information to the multiple regression model.
Including two highly correlated independent variables
can adversely affect the regression results.
No new information provided:
• Can lead to unstable coefficients (large standard error
and low t-values).
• Coefficient signs may not match prior expectations.
Some Indications of Strong Collinearity

Incorrect signs on the coefficients.


Large change in the value of a previous coefficient when a
new variable is added to the model.
A previously significant variable becomes non-significant
when a new independent variable is added.
The estimate of the standard deviation of the model increases
when a variable is added to the model.

Measuring Collinearity: Variance
Inflationary Factor
The variance inflationary factor VIFj can be used to measure collinearity:

VIFj = 1 / (1 − R2j)

VIF – PHStat program

where R2j is the coefficient of
multiple determination of
independent variable Xj with all other
X variables.

If VIFj = 1, Xj is uncorrelated with the other Xs.
If VIFj > 10, Xj is highly correlated with the other Xs
(a conservative estimate reduces this to VIFj > 5).
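With only two predictors, R²j reduces to the squared correlation between them, so VIF can be sketched without running a full regression (the x2 values below are made up for illustration):

```python
import math

# VIF_j = 1 / (1 - R_j^2). With two predictors, R_j^2 is simply the
# squared correlation between them.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 3, 5, 4, 6]   # hypothetical second predictor

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / math.sqrt(sum((ai - ma) ** 2 for ai in a)
                           * sum((bi - mb) ** 2 for bi in b))

r2 = corr(x1, x2) ** 2   # R_j^2 for either predictor
vif = 1 / (1 - r2)
print(round(vif, 2))
```

For these data r = 0.9, so VIF ≈ 5.26: above the conservative cut-off of 5 but below 10, i.e. borderline collinearity.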
Section Summary
• Developed the multiple regression model.
• Tested the significance of the multiple regression model.
• Discussed r2, adjusted r2 and overall F test.
• Discussed using residual plots to check model
assumptions.
• Tested individual regression coefficients.
• Used dummy variables.
• Evaluated interaction effects.
• Evaluated collinearity.
Regression and collinearity
Select Statistics to open the
multicollinearity check

Select Collinearity
diagnostics
Typical output from SPSS
Dependent variable: Satisfaction (Hài lòng)

VIF > 10: multicollinearity

Tolerance < 0.1 (i.e. VIF > 10): multicollinearity
Group assignment
● Check database of group assignment
● Develop general regression model (multiple
regression)
● Develop hypotheses
● Test regression model + check collinearity +
write out the estimated regression model
● Present the result of hypothesis testing
● Develop possible solutions and think of solution
ranking
