Multiple Linear Regression & Nonlinear Regression Models

1) Multiple linear regression allows modeling of a response variable's relationship to multiple explanatory variables simultaneously. It extends simple linear regression, which considers only one explanatory variable.
2) The multiple linear regression model estimates coefficients for each explanatory variable while controlling for the effects of the other variables. This allows assessing the independent effect of each predictor on the response variable.
3) Least squares estimation is used to calculate the intercept and slope coefficients by minimizing the sum of squared residuals between the actual and predicted response values. This provides the best-fitting regression plane through multiple dimensions.

Multiple Linear Regression

1
The General Idea

Simple regression considers the relation between a single explanatory variable and a response variable.

2
The General Idea

Multiple regression simultaneously considers the influence of multiple explanatory variables on a response variable Y.

The intent is to look at the independent effect of each variable while "adjusting out" the influence of potential confounders.

Y = β0 + β1X1 + β2X2 + ... + βpXp + ε

3
Regression Modeling

A simple regression model (one independent variable) fits a regression line in 2-dimensional space.

A multiple regression model with two explanatory variables fits a regression plane in 3-dimensional space.

4
Y = β0 + β1X1 + β2X2 + ... + βpXp + ε

(β0 is the intercept, β1, ..., βp are the partial regression coefficients, ε is the residual.)

Partial regression coefficients (slopes): the regression coefficient of X after controlling for (holding all other predictors constant) the influence of the other variables on both X and Y.
5
Multiple Regression Model

The intercept α predicts where the regression plane crosses the Y axis.
The slope for variable X1 (β1) predicts the change in Y per unit X1, holding X2 constant.
The slope for variable X2 (β2) predicts the change in Y per unit X2, holding X1 constant.

6
[Venn diagram: the overlap of X1, X2, and Y shows the common variance in Y explained by X1 and X2, the unique variance explained by X1, the unique variance explained by X2, and the variance NOT explained by X1 and X2.]

7
Simple Regression Model

Regression coefficients are estimated by minimizing ∑residuals² (i.e., the sum of the squared residuals) to derive the fitted model.

The standard error of the regression (sY|x) is based on the squared residuals.

8
Multiple Regression Model

Again, estimates for the multiple slope coefficients are derived by minimizing ∑residuals² to derive the fitted multiple regression model.

Again, the standard error of the regression is based on ∑residuals².

9
Polynomial Model

ŷ = b0 + b1·x + b2·x² + ... + br·x^r

Linear in the parameters, hence a linear model.

A special case of multiple linear regression, obtained by setting x1 = x, x2 = x², ..., xr = x^r.

10
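As a quick illustration of the substitution x1 = x, x2 = x², ..., here is a minimal numpy sketch (the data, coefficient values, and noise level are made up, not taken from the slides) that fits a degree-2 polynomial as an ordinary multiple linear regression:

```python
import numpy as np

# Hypothetical data: y is roughly quadratic in x, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)

r = 2  # degree of the polynomial
# Design matrix with columns 1, x, x^2, ..., x^r: the substitution x1 = x, ..., xr = x^r.
X = np.column_stack([x**j for j in range(r + 1)])

# Ordinary least squares: b minimizes the sum of squared residuals.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b
print("b0, b1, b2 =", b)
```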
Estimating coefficients

11
The matrix algebra of Ordinary Least Squares

Intercept and slopes:

β = (X′X)⁻¹ X′Y,  where X is the n × (k+1) design matrix

    ⎡ 1  x11  x21  ...  xk1 ⎤
X = ⎢ 1  x12  x22  ...  xk2 ⎥
    ⎢ ...                   ⎥
    ⎣ 1  x1n  x2n  ...  xkn ⎦

Predicted values:

Y′ = Xβ

Residuals:

Y − Y′
12
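A minimal numpy sketch of the matrix formulas on this slide, using made-up data; the names X_raw, beta_true, etc. are illustrative only:

```python
import numpy as np

# Hypothetical data set with n observations and k explanatory variables.
rng = np.random.default_rng(1)
n, k = 20, 3
X_raw = rng.normal(size=(n, k))
beta_true = np.array([2.0, 1.0, -0.5, 0.25])   # made-up intercept + slopes
X = np.column_stack([np.ones(n), X_raw])       # prepend the column of 1s (intercept)
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

# beta_hat = (X'X)^(-1) X'Y; np.linalg.solve avoids forming the inverse explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

Y_pred = X @ beta_hat      # predicted values  Y' = X * beta_hat
residuals = Y - Y_pred     # residuals         Y - Y'
```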
Example 12.3

13
Example 12.3

14
Regression Statistics
How good is our model?

SST = ∑(Y − Ȳ)²

SSR = ∑(Y′ − Ȳ)²

SSE = ∑(Y − Y′)²

SST = SSR + SSE

15
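A short sketch, on hypothetical simulated data, of the three sums of squares and the decomposition SST = SSR + SSE:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + 2 predictors
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)      # hypothetical response

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b

SST = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
SSR = np.sum((Y_hat - Y.mean()) ** 2)   # regression (explained) sum of squares
SSE = np.sum((Y - Y_hat) ** 2)          # error (residual) sum of squares
assert np.isclose(SST, SSR + SSE)       # the decomposition holds for a model with intercept
```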
The Regression Picture

ŷi = βxi + α

[Scatter plot: for each observation, A is the distance from yi to the naïve mean ȳ, B is the distance from the fitted value ŷi to ȳ, and C is the distance from yi to the regression line ŷi.]

Least squares estimation gave us the line (β) that minimized C².

∑(yi − ȳ)² = ∑(ŷi − ȳ)² + ∑(ŷi − yi)²
    A²     =     B²     +     C²
 SStotal   =    SSreg   +  SSresidual

R² = SSreg / SStotal

SStotal: total squared distance of observations from the naïve mean of y (total variation).
SSreg: distance from the regression line to the naïve mean of y (variability due to x, the regression).
SSresidual: variance around the regression line (additional variability not explained by x, which is what the least squares method aims to minimize).

16
ANOVA

H0: β1 = β2 = ... = βk = 0
HA: βi ≠ 0 for at least one i!

Source       df       SS     MS                  F          P-value
Regression   k        SSR    MSR = SSR/k         MSR / MSE  P(F)
Residual     n−k−1    SSE    MSE = SSE/(n−k−1)
Total        n−1      SST

If P(F) < α then we know that we get significantly better prediction of Y from the regression model than by just predicting the mean of Y.

ANOVA is used to test the significance of the regression.


17
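A sketch of the ANOVA F test on simulated data; scipy's F distribution supplies the p-value, and all data and coefficients below are made up for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k = 25, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ b
SSR = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)

MSR = SSR / k               # regression mean square, df = k
MSE = SSE / (n - k - 1)     # residual mean square, df = n - k - 1
F = MSR / MSE
p_value = stats.f.sf(F, k, n - k - 1)   # P(F_{k, n-k-1} > F)
print(f"F = {F:.2f}, p = {p_value:.4g}")
```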
If we revisit Example 12.3 and carry out the ANOVA, f = 30.98 and the p-value is less than 0.0001.

How to interpret the result:
The regression is significant.
This model is not the only model that can be used to explain the data.
The model may have been more effective with the inclusion or deletion of variables.

18
Regression Statistics

R² = 1 − SSE/SST = SSR/SST

Coefficient of multiple determination, used to judge the adequacy of the regression model.

Drawback of this concept: one can always increase the value of the coefficient of determination by including more independent variables.

19
Regression Statistics
R²adj = 1 − MSE / [SST/(n−1)] = 1 − [SSE/(n−k−1)] / [SST/(n−1)] = 1 − (1 − R²)·(n−1)/(n−k−1)

n = sample size
k = number of independent variables

Adjusted R² is not biased!
20
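A small helper, as a sketch, for computing R² and adjusted R² from a model's fitted values; the function name r_squared and its interface are illustrative assumptions:

```python
import numpy as np

def r_squared(Y, Y_hat, k):
    """Coefficient of determination and its adjusted version.

    k is the number of independent variables (not counting the intercept).
    """
    n = len(Y)
    SSE = np.sum((Y - Y_hat) ** 2)
    SST = np.sum((Y - np.mean(Y)) ** 2)
    r2 = 1.0 - SSE / SST
    # Equivalent to 1 - (1 - r2) * (n - 1) / (n - k - 1)
    r2_adj = 1.0 - (SSE / (n - k - 1)) / (SST / (n - 1))
    return r2, r2_adj
```

Adding a useless predictor can only raise r2, while r2_adj also charges for the lost degree of freedom, which is the drawback noted on the previous slide.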
Revisit example 12.3

21
Properties of the least squares estimator

Under the model assumption that the random errors ε1, ε2, ..., εn are iid, we have that b0, b1, ..., bk are unbiased estimators of the regression coefficients β0, β1, ..., βk.

The elements of the matrix (X′X)⁻¹σ² display the variances of the estimators on the main diagonal and the covariances on the off-diagonal:

σ²bi = Cii·σ²
σbibj = cov(bi, bj) = Cij·σ²,  i ≠ j
22
Regression Statistics
Standard error for the regression model:

S²e = σ̂²

S²e = SSE / (n − k − 1),  where SSE = ∑(Y − Y′)²

S²e = MSE

23
Hypotheses Tests for Regression Coefficients

H0: βi = βi0
H1: βi ≠ βi0

t(n−k−1) = (bi − βi0) / se(bi) = (bi − βi0) / √(Cii·S²e)

24
Considering the importance of X3 in Example 12.3:

H0: β3 = 0
H1: β3 ≠ 0

We test by using the t-distribution with 9 degrees of freedom.

We cannot reject the null hypothesis: the variable is insignificant in the presence of the other regressors in the model.

25
Confidence Interval on Regression Coefficients

A 100(1−α)% confidence interval for βi:

bi − t(α/2, n−k−1)·√(S²e·Cii)  ≤  βi  ≤  bi + t(α/2, n−k−1)·√(S²e·Cii)

26
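A sketch tying slides 22-26 together on simulated data: standard errors from the diagonal of (X′X)⁻¹, t statistics for H0: βi = 0, and 95% confidence intervals. All numbers and variable names below are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([1.0, 0.5, 0.0, -2.0]) + rng.normal(size=n)   # beta_2 = 0 on purpose

C = np.linalg.inv(X.T @ X)                  # the matrix whose entries are C_ij
b = C @ X.T @ Y
df = n - k - 1
s2 = np.sum((Y - X @ b) ** 2) / df          # S_e^2 = SSE / (n - k - 1)
se_b = np.sqrt(s2 * np.diag(C))             # se(b_i) = sqrt(C_ii * S_e^2)

t_stats = b / se_b                          # tests of H0: beta_i = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df)
t_crit = stats.t.ppf(0.975, df)             # alpha = 0.05
ci = np.column_stack([b - t_crit * se_b, b + t_crit * se_b])   # 95% CI for each beta_i
```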
Hypotheses Tests for Regression Coefficients: F test

Regression sum of squares if one variable, X1, is removed from the regression model:

R(β1 | β2, β3, ..., βk) = SSR(β1, β2, β3, ..., βk) − SSR(β2, β3, ..., βk)

H0: β1 = 0
H1: β1 ≠ 0   (Example 12.3)

f = R(β1 | β2, β3, ..., βk) / S²e

Compare it with f(α; 1, n−k−1).
27
Hypotheses Tests for Regression Coefficients: F test

H0: β1 = β2 = 0
H1: β1 ≠ 0 or β2 ≠ 0

Comparing the test statistic with f(α; 2, n−k−1).

28
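A sketch of the partial F test described above, comparing a full model with a reduced model in which X1 has been dropped; the simulated data and the helper sse() are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def sse(X, Y):
    """Residual sum of squares from an OLS fit of Y on the columns of X."""
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    r = Y - X @ b
    return r @ r

rng = np.random.default_rng(5)
n = 30
X_full = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # 1, X1, X2, X3
Y = X_full @ np.array([1.0, 2.0, -1.0, 0.0]) + rng.normal(size=n)

X_reduced = np.delete(X_full, 1, axis=1)    # drop X1 (column index 1)

q = 1                                       # number of coefficients being tested
df_error = n - X_full.shape[1]              # n - k - 1 for the full model
# R(beta_1 | beta_2, beta_3) = SSE(reduced) - SSE(full) = SSR(full) - SSR(reduced)
extra_ss = sse(X_reduced, Y) - sse(X_full, Y)
F = (extra_ss / q) / (sse(X_full, Y) / df_error)
p_value = stats.f.sf(F, q, df_error)
```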
Confidence Interval on the mean response

T-statistic with n−k−1 degrees of freedom.

A 100(1−α)% confidence interval for the mean response μY|x10, x20, ..., xk0:

ŷ0 − t(α/2, n−k−1)·Se·√(x0′(X′X)⁻¹x0)  ≤  μY|x0  ≤  ŷ0 + t(α/2, n−k−1)·Se·√(x0′(X′X)⁻¹x0)
29
30
Confidence Interval on an observed response

T-statistic with n−k−1 degrees of freedom.

A 100(1−α)% prediction interval for a single observed response y0:

ŷ0 − t(α/2, n−k−1)·Se·√(1 + x0′(X′X)⁻¹x0)  ≤  y0  ≤  ŷ0 + t(α/2, n−k−1)·Se·√(1 + x0′(X′X)⁻¹x0)
31
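A sketch, on made-up data, of the confidence interval for the mean response and the wider prediction interval for a single observed response at a hypothetical point x0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

C = np.linalg.inv(X.T @ X)
b = C @ X.T @ Y
df = n - k - 1
s = np.sqrt(np.sum((Y - X @ b) ** 2) / df)          # S_e

x0 = np.array([1.0, 0.5, -0.3])     # hypothetical point, leading 1 for the intercept
y0_hat = x0 @ b
h0 = x0 @ C @ x0                    # x0'(X'X)^{-1} x0
t_crit = stats.t.ppf(0.975, df)

mean_ci = (y0_hat - t_crit * s * np.sqrt(h0), y0_hat + t_crit * s * np.sqrt(h0))
pred_pi = (y0_hat - t_crit * s * np.sqrt(1 + h0), y0_hat + t_crit * s * np.sqrt(1 + h0))
```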
32
Orthogonality

In a designed experiment the variables Xp and Xq are orthogonal.

The contribution of one individual variable in explaining the variance is then readily given.

33
34
35
Qualitative variables

Qualitative variables provide information on discrete characteristics.

The number of categories taken by a qualitative variable is generally small.

These can be numerical values, but each number denotes an attribute – a characteristic.

A qualitative variable may have several categories:
Two categories: male – female
Three categories: nationality (French, German, Turkish)
More than three categories: sectors (car, chemical, steel, electronic equip., etc.)

36
Qualitative variables
There are several ways to code a qualitative variable with n categories:
Using one categorical variable
Producing n − 1 dummy variables (as sketched below)

A dummy variable is a variable which takes the values 0 or 1. We also call them binary variables, or dichotomous variables.

37
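A minimal sketch of coding a three-category qualitative variable with n − 1 = 2 dummy variables; the categories and values are hypothetical:

```python
import numpy as np

# Hypothetical qualitative variable with three categories.
nationality = np.array(["French", "German", "Turkish", "German", "French"])

# n categories -> n - 1 dummy (0/1) variables; the omitted category is the reference.
categories = ["French", "German", "Turkish"]
dummies = np.column_stack([(nationality == c).astype(int) for c in categories[1:]])
# Columns: is_German, is_Turkish; a "French" observation is coded (0, 0).
print(dummies)
```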
38
Stepwise regression

Avoiding predictors (Xs) that do not contribute significantly to model prediction.

- Forward selection: the ‘best’ predictor variables are entered, one by one.

- Backward elimination: the ‘worst’ predictor variables are eliminated, one by one.

39
Forward selection

Step 1: Do simple linear regressions of y vs. each x variable individually. Select the x variable with the largest value of R². (Suppose it is X1.)
Step 2: Do all possible 2-variable regressions in which one of the two variables is X1. Choose the variable that, when inserted, gives the largest increase in R². (Suppose it is X2.)
Step 3: Do all possible 3-variable regressions in which two of the three variables are X1 and X2. Choose the variable that gives the largest increase in R².
Repeat the process until the most recent variable inserted fails to induce a significant increase in the explained regression. Such an increase can be determined by using an appropriate F-test or t-test. A sketch of this procedure follows below.

40
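A rough sketch of forward selection driven by the increase in R²; the stopping rule min_gain is a crude stand-in for the F-test or t-test mentioned above, and the function names are illustrative assumptions, not the textbook procedure:

```python
import numpy as np

def r2(X, Y):
    """R^2 of an OLS fit of Y on the columns of X (intercept column already included)."""
    b, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ b
    return 1.0 - (resid @ resid) / np.sum((Y - Y.mean()) ** 2)

def forward_select(X_candidates, Y, min_gain=0.01):
    """Greedy forward selection on R^2; min_gain crudely replaces a significance test."""
    n = len(Y)
    selected, remaining = [], list(range(X_candidates.shape[1]))
    current = 0.0
    while remaining:
        # Fit every model that adds one remaining candidate to the selected set.
        scores = {j: r2(np.column_stack([np.ones(n), X_candidates[:, selected + [j]]]), Y)
                  for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] - current < min_gain:   # no worthwhile increase -> stop
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected
```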
41
42
Why use logistic regression?

There are many important research topics for which the dependent variable is "limited."
For example: voting, marketing, and participation data are not continuous or normally distributed.
Binary logistic regression is a type of regression analysis where the dependent variable is a dummy variable: coded 0 (did not vote) or 1 (did vote).

43
The Linear Probability Model

In the ordinary least squares regression:

Y = α + βX + ε,  where Y = (0, 1)

ε is not normally distributed because Y takes on only two values.
The predicted probabilities can be greater than 1 or less than 0.

44
The Logistic Regression Model

The "logit" model solves these problems:

ln[p/(1−p)] = α + βX + e

p is the probability that the event Y occurs, p(Y=1)
p/(1−p) is the "odds ratio"
ln[p/(1−p)] is the log odds ratio, or "logit"

45
More:
The logistic distribution constrains the estimated probabilities to lie between 0 and 1.
The estimated probability is:

p = 1/[1 + exp(−α − βX)]

If you let α + βX = 0, then p = .50.
As α + βX gets really big, p approaches 1.
As α + βX gets really small, p approaches 0.

46
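A tiny sketch of the estimated-probability formula and the limiting behaviour described above; the α, β, and X values are arbitrary:

```python
import numpy as np

def logistic_p(alpha, beta, X):
    """Estimated probability p = 1 / (1 + exp(-(alpha + beta * X)))."""
    return 1.0 / (1.0 + np.exp(-(alpha + beta * X)))

print(logistic_p(0.0, 1.0, 0.0))     # alpha + beta*X = 0   -> p = 0.50
print(logistic_p(0.0, 1.0, 10.0))    # large positive value -> p close to 1
print(logistic_p(0.0, 1.0, -10.0))   # large negative value -> p close to 0
```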
What if β = 0 or infinity?

47
Maximum Likelihood Estimation (MLE)

MLE is a statistical method for estimating the coefficients of a model.
The likelihood function (L) measures the probability of observing the particular set of dependent variable values (p1, p2, ..., pn) that occur in the sample:

L = Prob(p1 * p2 * ... * pn)

The higher the L, the higher the probability of observing the ps in the sample.

48
MLE involves finding the coefficients (α, β) that make the log of the likelihood function (LL < 0) as large as possible.
Or, it finds the coefficients that make −2 times the log of the likelihood function (−2LL) as small as possible.
The maximum likelihood estimates can be solved for by differentiating the log of the likelihood function with respect to α and β and setting the partial derivatives equal to zero.

49
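A sketch of maximum likelihood estimation for the binary logit by numerically minimizing −LL with scipy; the simulated data, true coefficients, and starting values are assumptions for illustration, not the slides' example:

```python
import numpy as np
from scipy.optimize import minimize

# Simulated 0/1 outcomes from a logit model with made-up alpha and beta.
rng = np.random.default_rng(7)
X = rng.normal(size=200)
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * X)))
y = rng.binomial(1, p_true)

def neg_log_likelihood(params):
    """-LL; minimizing this (or equivalently -2LL) maximizes the likelihood."""
    alpha, beta = params
    p = 1.0 / (1.0 + np.exp(-(alpha + beta * X)))
    p = np.clip(p, 1e-12, 1 - 1e-12)           # keep log() away from 0 and 1
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
alpha_hat, beta_hat = result.x
print(alpha_hat, beta_hat)
```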
Interpreting Coefficients

Since ln[p/(1−p)] = α + βX + e, the slope coefficient (β) is interpreted as the rate of change in the "log odds" as X changes ... not very useful.

Since p = 1/[1 + exp(−α − βX)], the marginal effect of a change in X on the probability is:

∂p/∂X = β·p·(1 − p)

50
An interpretation of the logit coefficient which is usually more intuitive is the "odds ratio".
Since [p/(1−p)] = exp(α + βX), exp(β) is the effect of the independent variable on the "odds ratio".

51
