Multiple Linear Regression & Nonlinear Regression Models
The General Idea
Regression Modeling
A simple regression model (one independent variable) fits a regression line in 2-dimensional space.
A multiple regression model with two explanatory variables fits a regression plane in 3-dimensional space.
Y = β0 + β1X1 + β2X2 + … + βpXp + ε
[Venn diagram: overlapping circles for X1 and X2, showing the common variance explained by X1 and X2 and the unique variance explained by each variable alone.]
Multiple Regression Model
Polynomial Model
ŷ = b0 + b1x + b2x² + … + brx^r
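The polynomial model is nonlinear in x but still linear in the coefficients b0, …, br, so ordinary least squares applies. A minimal sketch with hypothetical data, fitting an r = 2 model with numpy:

```python
import numpy as np

# Polynomial regression y-hat = b0 + b1*x + b2*x^2 on hypothetical data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 2.9, 7.2, 13.1, 21.0, 30.8])
b2, b1, b0 = np.polyfit(x, y, deg=2)   # coefficients, highest degree first
print(b0, b1, b2)
```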
Estimating coefficients
The matrix algebra of least squares:
Predicted values: Y′ = Xb
Residuals: Y − Y′
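A minimal numpy sketch of these matrix operations on hypothetical data; the coefficient estimates b come from the normal equations b = (X′X)⁻¹X′Y:

```python
import numpy as np

# Hypothetical data: 6 observations, two explanatory variables.
X1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
X2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Y  = np.array([3.1, 3.9, 6.8, 7.2, 10.1, 10.3])

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones_like(X1), X1, X2])

# Normal equations b = (X'X)^(-1) X'Y, solved without forming the inverse.
b = np.linalg.solve(X.T @ X, X.T @ Y)

Y_pred = X @ b        # predicted values Y' = Xb
resid  = Y - Y_pred   # residuals Y - Y'
print(b, resid)
```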
Example 12.3
Regression Statistics
How good is our model?
SST = Σ(Y − Ȳ)²
SSR = Σ(Y′ − Ȳ)²
SSE = Σ(Y − Y′)²
SST = SSR + SSE
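A short sketch (same hypothetical data as above) that computes the three sums of squares and confirms the decomposition:

```python
import numpy as np

# Hypothetical data; fit by least squares so that SST = SSR + SSE holds.
X = np.column_stack([np.ones(6), [1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]])
Y = np.array([3.1, 3.9, 6.8, 7.2, 10.1, 10.3])
Y_hat = X @ np.linalg.lstsq(X, Y, rcond=None)[0]

SST = np.sum((Y - Y.mean()) ** 2)      # total variation
SSR = np.sum((Y_hat - Y.mean()) ** 2)  # variation explained by the regression
SSE = np.sum((Y - Y_hat) ** 2)         # residual variation
print(SST, SSR + SSE)                  # equal when an intercept is included
```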
The Regression Picture
[Scatter plot: the fitted line ŷi = βxi + α through the data, with three distances marked for a point (xi, yi): A from the observation to the naïve mean ȳ, B from the line to ȳ, and C from the observation to the line.]
Least squares estimation gave us the line (β) that minimized Σ(ŷi − yi)².
Σi (yi − ȳ)² = Σi (ŷi − ȳ)² + Σi (yi − ŷi)²
A² = B² + C², i.e. SStotal = SSreg + SSresidual
SStotal: total squared distance of the observations from the naïve mean of y (total variation).
SSreg: distance from the regression line to the naïve mean of y (variability due to x, the regression).
SSresidual: variance around the regression line (additional variability not explained by x, which the least squares method aims to minimize).
R² = SSreg/SStotal
ANOVA
H0: β1 = β2 = … = βk = 0
HA: βi ≠ 0 for at least one i

Source       df          SS    MS                     F         P-value
Regression   k           SSR   MSR = SSR/k            MSR/MSE   P(F ≥ f)
Error        n − k − 1   SSE   MSE = SSE/(n − k − 1)
Total        n − 1       SST
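A sketch of the overall F test from assumed sums of squares; the values below are placeholders chosen so the error df is 9, matching the Example 12.3 slide later on, not actual results:

```python
from scipy import stats

# Assumed values: n = 13 observations, k = 3 regressors.
SSR, SSE, n, k = 399.45, 26.0, 13, 3

MSR = SSR / k              # mean square for regression
MSE = SSE / (n - k - 1)    # mean square error
F   = MSR / MSE
p   = stats.f.sf(F, k, n - k - 1)   # upper-tail p-value
print(F, p)
```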
Regression Statistics
R² = 1 − SSE/SST = SSR/SST
Coefficient of multiple determination, used to judge the adequacy of the regression model.
Drawback of this concept: one can always increase the value of the coefficient of determination by including more independent variables.
Regression Statistics
R²adj = 1 − [SSE/(n − k − 1)] / [SST/(n − 1)] = 1 − (1 − R²)(n − 1)/(n − k − 1)
n = sample size
k = number of independent variables
Adjusted R² is not biased!
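Both statistics in a few lines of Python; the SSE/SST values below are assumed for illustration:

```python
# R-squared and adjusted R-squared from the sums of squares:
# n observations, k independent variables.
def r_squared(SSE, SST):
    return 1 - SSE / SST

def adj_r_squared(SSE, SST, n, k):
    return 1 - (SSE / (n - k - 1)) / (SST / (n - 1))

R2  = r_squared(26.0, 425.45)            # assumed values
R2a = adj_r_squared(26.0, 425.45, 13, 3)
print(R2, R2a)  # adjusted value is smaller; it penalizes extra regressors
```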
Revisit example 12.3
Properties of the least squares estimator
Under the model assumption that the random errors ε1, ε2, …, εn are iid, we have:
b0, b1, …, bk are unbiased estimators of the regression coefficients β0, β1, …, βk
σ_bibj = cov(bi, bj) = Cij σ², i ≠ j, where Cij is the (i, j) element of (X′X)⁻¹
Regression Statistics
Standard Error for the regression model
Se² = σ̂² = SSE/(n − k − 1), where SSE = Σ(Y − Y′)²
Se² = MSE, so Se = √MSE
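A sketch computing Se and, using the Cij entries of (X′X)⁻¹ from the previous slide, the standard errors of the coefficients (same hypothetical data as earlier):

```python
import numpy as np

# Hypothetical fit reused from the earlier sketches.
X = np.column_stack([np.ones(6), [1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]])
Y = np.array([3.1, 3.9, 6.8, 7.2, 10.1, 10.3])
b = np.linalg.solve(X.T @ X, X.T @ Y)

n, k = X.shape[0], X.shape[1] - 1
SSE  = np.sum((Y - X @ b) ** 2)
S2e  = SSE / (n - k - 1)          # S_e^2 = MSE, the estimate of sigma^2
Se   = np.sqrt(S2e)               # standard error of the regression

C    = np.linalg.inv(X.T @ X)     # C_ij * S_e^2 estimates cov(b_i, b_j)
se_b = np.sqrt(np.diag(C) * S2e)  # standard errors of the coefficients
print(Se, se_b)
```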
Hypotheses Tests for Regression Coefficients
H0: βi = βi0
H1: βi ≠ βi0
t(n − k − 1) = (bi − βi0) / se(bi) = (bi − βi0) / √(Cii Se²)
(In simple regression the denominator reduces to √(Se²/Sxx).)
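A minimal sketch of this t test for H0: βi = 0, with assumed values for bi and se(bi):

```python
from scipy import stats

# Assumed values: coefficient, its standard error, sample size, regressors.
b_i, se_bi, n, k = 1.90, 0.35, 6, 2

t = (b_i - 0.0) / se_bi                    # (b_i - beta_i0) / se(b_i)
p = 2 * stats.t.sf(abs(t), df=n - k - 1)   # two-sided p-value
print(t, p)
```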
Considering the importance of X3 in Example 12.3:
H0: β3 = 0
H1: β3 ≠ 0
We test using the t-distribution with 9 degrees of freedom.
We cannot reject the null hypothesis: the variable is insignificant in the presence of the other regressors in the model.
Confidence Interval on Regression Coefficients
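Consistent with the t statistic on the previous slide, the standard 100(1 − α)% confidence interval for an individual coefficient βi is:

```latex
b_i - t_{\alpha/2,\,n-k-1}\sqrt{C_{ii} S_e^2}
  \;<\; \beta_i \;<\;
b_i + t_{\alpha/2,\,n-k-1}\sqrt{C_{ii} S_e^2}
```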
Hypotheses Tests for Regression Coefficients: F test
H0: β1 = 0
H1: β1 ≠ 0
f = SSR(β1 | β2, β3, …, βk) / Se²
Compare it with f_α(1, n − k − 1) (illustrated with Example 12.3).
Hypotheses Tests for Regression Coefficients: F test
H0: β1 = β2 = 0
H1: β1 ≠ 0, or β2 ≠ 0
f = [SSR(β1, β2 | β3, …, βk) / 2] / Se²
Compare it with f_α(2, n − k − 1).
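A general partial F test as a sketch, comparing a full model against a reduced one; the sums of squares below are assumed placeholders:

```python
from scipy import stats

def partial_f_test(SSE_reduced, SSE_full, df_full, r):
    """Partial F test for dropping r regressors from the full model:
    F = [(SSE_reduced - SSE_full) / r] / [SSE_full / df_full],
    where df_full = n - k - 1 for the full model."""
    F = ((SSE_reduced - SSE_full) / r) / (SSE_full / df_full)
    return F, stats.f.sf(F, r, df_full)

# Assumed SSEs for a full model and a model without beta1, beta2.
print(partial_f_test(SSE_reduced=40.0, SSE_full=26.0, df_full=9, r=2))
```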
Confidence Interval on mean response
T statistic with n − k − 1 degrees of freedom:
T = (ŷ0 − μ_Y|x10,x20,…,xk0) / √(Se² x0′(X′X)⁻¹x0)
A 100(1 − α)% confidence interval for the mean response μ_Y|x10,x20,…,xk0:
ŷ0 − t_α/2 Se √(x0′(X′X)⁻¹x0) < μ_Y|x10,x20,…,xk0 < ŷ0 + t_α/2 Se √(x0′(X′X)⁻¹x0)
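A sketch of this interval at an assumed point x0, reusing the hypothetical fit from the earlier sketches:

```python
import numpy as np
from scipy import stats

# Hypothetical fit reused from earlier sketches.
X = np.column_stack([np.ones(6), [1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]])
Y = np.array([3.1, 3.9, 6.8, 7.2, 10.1, 10.3])
b = np.linalg.solve(X.T @ X, X.T @ Y)
n, k = X.shape[0], X.shape[1] - 1
S2e  = np.sum((Y - X @ b) ** 2) / (n - k - 1)

x0 = np.array([1.0, 3.5, 4.0])           # assumed point, with intercept term
y0_hat = x0 @ b
h0 = x0 @ np.linalg.solve(X.T @ X, x0)   # x0' (X'X)^(-1) x0
t  = stats.t.ppf(0.975, n - k - 1)       # 95% interval
half = t * np.sqrt(S2e * h0)
print(y0_hat - half, y0_hat + half)
```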
Confidence Interval on observed response
T statistic:
T = (ŷ0 − y0) / √(Se² (1 + x0′(X′X)⁻¹x0))
A 100(1 − α)% prediction interval for a single observed response y0:
ŷ0 − t_α/2 Se √(1 + x0′(X′X)⁻¹x0) < y0 < ŷ0 + t_α/2 Se √(1 + x0′(X′X)⁻¹x0)
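The prediction interval differs from the mean-response interval only by the extra 1 under the square root, which accounts for the variance of a single new observation; a sketch with the same assumed data:

```python
import numpy as np
from scipy import stats

# Same hypothetical fit as the mean-response sketch.
X = np.column_stack([np.ones(6), [1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]])
Y = np.array([3.1, 3.9, 6.8, 7.2, 10.1, 10.3])
b = np.linalg.solve(X.T @ X, X.T @ Y)
n, k = X.shape[0], X.shape[1] - 1
S2e  = np.sum((Y - X @ b) ** 2) / (n - k - 1)

x0 = np.array([1.0, 3.5, 4.0])
h0 = x0 @ np.linalg.solve(X.T @ X, x0)
t  = stats.t.ppf(0.975, n - k - 1)
half = t * np.sqrt(S2e * (1 + h0))       # "1 +" makes it a prediction interval
print(x0 @ b - half, x0 @ b + half)
```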
Orthogonality
Qualitative variables
These can be numerical values, but each number denotes an attribute – a characteristic.
Qualitative variables
There are several ways to code qualitative variables with n categories:
- Using one categorical variable
- Producing n − 1 dummy variables
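A sketch of dummy coding for a hypothetical 3-category variable, producing n − 1 = 2 indicator columns:

```python
import numpy as np

# Coding a qualitative variable with 3 categories as 2 dummy variables.
color  = np.array(["red", "blue", "green", "blue", "red"])
levels = ["red", "blue", "green"]          # "red" is the reference category
dummies = np.column_stack([(color == lev).astype(float) for lev in levels[1:]])
print(dummies)   # columns: is_blue, is_green; "red" rows are all zeros
```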
Stepwise regression
- Forward selection: the 'best' predictor variables are entered, one by one (see the sketch under "Forward selection" below).
- Backward elimination: the 'worst' predictor variables are eliminated, one by one.
Forward selection
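A greedy forward-selection sketch on hypothetical data. It ranks candidates by SSE reduction; real implementations typically also apply a stopping rule such as a partial F test or p-value threshold, which is omitted here:

```python
import numpy as np

def forward_selection(candidates, Y, names):
    """Repeatedly add the candidate variable that most reduces SSE."""
    chosen, remaining = [], list(range(len(candidates)))
    n = len(Y)
    while remaining:
        best_sse, best_j = None, None
        for j in remaining:
            cols = [np.ones(n)] + [candidates[i] for i in chosen + [j]]
            X = np.column_stack(cols)
            b = np.linalg.lstsq(X, Y, rcond=None)[0]
            sse = float(np.sum((Y - X @ b) ** 2))
            if best_sse is None or sse < best_sse:
                best_sse, best_j = sse, j
        chosen.append(best_j)
        remaining.remove(best_j)
        print("added", names[best_j], "SSE:", round(best_sse, 3))

# Hypothetical predictors and response.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 3.9, 6.8, 7.2, 10.1, 10.3])
forward_selection([x1, x2], y, ["x1", "x2"])
```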
Why use logistic regression?
The Linear Probability Model
The Logistic Regression Model
ln[p/(1-p)] = α + βX + e
More:
The logistic distribution constrains the
estimated probabilities to lie between 0 and 1.
The estimated probability is:
p = 1/[1 + exp(−α − βX)]
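A sketch of this transformation; note how the probabilities stay strictly inside (0, 1) even for extreme values of X:

```python
import numpy as np

def logistic_prob(alpha, beta, x):
    """p = 1 / (1 + exp(-alpha - beta*x)); always strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-alpha - beta * x))

# Assumed coefficients, purely for illustration.
print(logistic_prob(-1.5, 0.8, np.array([-10.0, 0.0, 2.0, 10.0])))
```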
What if β = 0 or infinity?
Maximum Likelihood Estimation
(MLE)
MLE is a statistical method for estimating the
coefficients of a model.
The likelihood function (L) measures the
probability of observing the particular set of
dependent variable values (p1, p2, ..., pn) that
occur in the sample:
L = Prob(p1 × p2 × … × pn)
The higher the L, the higher the probability of observing the ps in the sample.
MLE involves finding the coefficients (α, β) that make the log of the likelihood function (LL < 0) as large as possible.
Or, it finds the coefficients that make −2 times the log of the likelihood function (−2LL) as small as possible.
The maximum likelihood estimates can be found by differentiating the log of the likelihood function with respect to α, β and setting the partial derivatives equal to zero.
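A sketch of MLE for the logistic model on hypothetical binary data, minimizing −LL numerically with scipy rather than solving the derivative equations by hand:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical binary data for a one-predictor logistic model.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0,   0,   0,   1,   0,   1,   1,   1  ])

def neg_log_likelihood(params):
    alpha, beta = params
    p = 1.0 / (1.0 + np.exp(-alpha - beta * x))
    # LL = sum[y*ln(p) + (1-y)*ln(1-p)]; we minimize -LL.
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(res.x)   # MLEs of (alpha, beta)
```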
Interpreting Coefficients
Since:
ln[p/(1−p)] = α + βX + e
The slope coefficient (β) is interpreted as the
rate of change in the "log odds" as X changes
… not very useful.
Since:
p = 1/[1 + exp(−α − βX)]
An interpretation of the logit coefficient that is usually more intuitive is the "odds ratio".
Since:
p/(1 − p) = exp(α + βX)
exp(β) is the effect of the independent variable on the "odds ratio".
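For example, with an assumed slope of β = 0.8, exp(0.8) ≈ 2.23, so each one-unit increase in X multiplies the odds p/(1 − p) by about 2.23:

```python
import numpy as np

beta = 0.8           # assumed logit slope, for illustration only
print(np.exp(beta))  # odds ratio ~2.23: odds roughly double per unit of X
```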