Properties and LINE Conditions
Further Topics...
1 Four “LINE” conditions of a simple linear regression model
2 Math Formulas of b0 (intercept) and b1 (slope)
3 Properties of b0 and b1
4 Estimation of σ² (population variance)
Simple Linear Regression Model Four “LINE” Conditions
A simple linear regression model for a data set (xi , Yi ) is defined as
Yi = β0 + β1 xi + εi, i = 1, …, n.
Four conditions for a simple linear regression model:
1 The mean of the response, E(Yi ) = β0 + β1 xi is a Linear function of the xi .
2 The errors, εi , are Independent.
3 The errors, εi , at each value of the predictor, xi , are Normally distributed.
4 The errors, εi , at each value of the predictor, xi , have Equal variances (denoted σ²).
We are studying “LINE” in this course.
Least Squares Estimates: b0 (Estimate of β0) and b1 (Estimate of β1)
In the previous lecture, we talked about a data set of 10 students, and we have
heights (h) and weights (w) of the 10 students.
The “best fitting line” is shown in the following plot: the intercept b0 = −266.53 and
the slope b1 = 6.14.
[Figure: scatterplot of Weight versus Height with the fitted line. The vertical dashed line marks x̄ = 69.3 and the horizontal dashed line marks Ȳ = 158.8; labeled points include (63, 127), (64, 121), (73, 181), and (75, 208).]
By differentiation of the least squares criterion

Q = Σᵢ₌₁ⁿ [Yi − (b0 + b1 xi)]²

we can get the two normal equations

Σᵢ₌₁ⁿ (Yi − b0 − b1 xi) = 0

Σᵢ₌₁ⁿ xi (Yi − b0 − b1 xi) = 0
Solving the two equations in the previous slide, we get

b1 = Σᵢ₌₁ⁿ (xi − x̄)(Yi − Ȳ) / Σᵢ₌₁ⁿ (xi − x̄)² = Sxy / Sxx

b0 = Ȳ − b1 x̄
1 Because the formulas for b0 and b1 are derived using the least squares
criterion, the resulting equation
Ŷi = b0 + b1 xi
is often referred to as the “least squares regression line,” or simply the
“least squares line.”
2 Re-arranging the terms in the formula
b0 = Ȳ − b1 x̄,
we can get
Ȳ = b0 + b1 x̄,
which means that the least squares line passes through the point (x̄, Ȳ ).
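The formulas for b1 and b0 can be sketched directly in Python. The numbers below are hypothetical heights and weights for illustration only, not the lecture's 10-student data set.

```python
# Hypothetical (x, Y) data for illustration.
x = [63, 64, 66, 69, 73, 75]
y = [127, 121, 142, 157, 181, 208]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = Sxy / Sxx  and  b0 = Y-bar - b1 * x-bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# The least squares line passes through the point (x-bar, Y-bar).
assert abs((b0 + b1 * x_bar) - y_bar) < 1e-9
```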
Some Notations
We use the notations:
1 Sum of squares for x:

Sxx = Σᵢ₌₁ⁿ (xi − x̄)² = Σᵢ₌₁ⁿ xi² − n x̄²

2 Sum of squares for Y:

Syy = Σᵢ₌₁ⁿ (Yi − Ȳ)² = Σᵢ₌₁ⁿ Yi² − n Ȳ²

3 Cross-product sum of squares:

Sxy = Σᵢ₌₁ⁿ (xi − x̄)(Yi − Ȳ) = Σᵢ₌₁ⁿ xi Yi − n x̄ Ȳ

4 Sample mean for x:

x̄ = (Σᵢ₌₁ⁿ xi) / n

5 Sample mean for Y:

Ȳ = (Σᵢ₌₁ⁿ Yi) / n
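The "shortcut" forms on the right-hand sides of the sums of squares can be checked numerically. A minimal sketch with hypothetical numbers:

```python
# Verify that the definitional and shortcut forms of Sxx, Syy, Sxy agree.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sxx: sum of squared deviations vs. sum of squares minus n * x-bar^2
s_xx_def = sum((xi - x_bar) ** 2 for xi in x)
s_xx_alt = sum(xi ** 2 for xi in x) - n * x_bar ** 2

# Syy: same identity applied to the responses
s_yy_def = sum((yi - y_bar) ** 2 for yi in y)
s_yy_alt = sum(yi ** 2 for yi in y) - n * y_bar ** 2

# Sxy: cross-product form
s_xy_def = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xy_alt = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
```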
What Do b0 and b1 Tell Us?
1 b0 is the predicted response value when x = 0.
1. In the example of 10 students’ height and weight, b0 tells us that a person who
is 0 inches tall is predicted to weigh -267 pounds, which is not meaningful.
2. This happens because we “extrapolated” beyond the “scope of the model”
(the range of the observed x values).
2 b1 is the estimate of the change in mean response value E(Y ) for every
additional one-unit increase in the predictor x.
1. In the example of 10 students’ height and weight, b1 tells us that we predict the
mean weight to increase by 6.14 pounds for every additional one-inch increase
in height.
2. In general, we can expect the mean response to increase or decrease by b1
units for every one unit increase in the predictor x.
Understanding the Slope b1
1 If we study the formula for the slope b1 :

b1 = Σᵢ₌₁ⁿ (xi − x̄)(Yi − Ȳ) / Σᵢ₌₁ⁿ (xi − x̄)²

we see that the denominator is necessarily positive (as long as the xi are not
all equal), since it only involves summing squared terms.
2 Therefore, the sign of the slope b1 is solely determined by the numerator.
3 The numerator tells us, for each data point, to sum up the product of two
distances – the distance of the x value from x̄ (the mean of all of the x values)
and the distance of the Y value from Ȳ (the mean of all of the Y values).
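Since the denominator is never negative, the sign of b1 matches the sign of the numerator Sxy. A quick sketch with made-up data:

```python
def slope(x, y):
    """Least squares slope b1 = Sxy / Sxx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Increasing trend: positive numerator, hence positive slope.
assert slope([1, 2, 3, 4], [2, 4, 5, 8]) > 0
# Decreasing trend: negative numerator, hence negative slope.
assert slope([1, 2, 3, 4], [8, 5, 4, 2]) < 0
```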
When is the Slope b1 > 0?
1 Is the trend in the following plot positive, i.e., as x increases, Y tends to increase?
2 If the trend is positive, then the slope b1 must be positive.
3 The vertical dashed line is x̄. The horizontal dashed line is Ȳ .
[Figure: the same Weight versus Height scatterplot as before. The vertical dashed line marks x̄ = 69.3 and the horizontal dashed line marks Ȳ = 158.8; labeled points include (63, 127), (64, 121), (73, 181), and (75, 208).]
When is the Slope b1 < 0?
1 Is the trend in the following plot negative, i.e., as x increases, Y tends to decrease?
2 If the trend is negative, then the slope b1 must be negative.
3 The vertical dashed line is x̄. The horizontal dashed line is Ȳ .
[Figure: scatterplot titled “Skin Cancer Mortality versus Latitude,” with Mortality (Deaths per 10 million) on the vertical axis and Latitude (at center of state) on the horizontal axis. The vertical dashed line marks x̄ = 39.5 and the horizontal dashed line marks Ȳ = 152.9; labeled points include (33, 219), (34.5, 160), (43, 134), and (44.8, 86).]
Estimation of σ² (Unknown Population Variance)
Why should we care about σ²? One reason is that we want to predict future
responses from an estimated regression line.
We have two thermometer brands, (A) and (B). The predictor is Celsius and the
response is Fahrenheit. Will thermometer brand (A) or brand (B) yield more
precise future predictions?
[Figure: side-by-side scatterplots of Fahrenheit versus Celsius readings for brand (A) and brand (B), with Celsius ranging from 0 to 50 on both panels.]
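The comparison can be mimicked with simulated data: two hypothetical thermometers share the true line F = 32 + 1.8 C but have different error variances, and the one with the smaller σ² scatters less around the line. (The brands, the two sigma values, and the seed below are all made up for illustration.)

```python
import random

random.seed(0)
celsius = list(range(0, 51, 5))

def readings(sigma):
    """Simulated Fahrenheit readings: true line plus N(0, sigma^2) error."""
    return [32 + 1.8 * c + random.gauss(0, sigma) for c in celsius]

brand_a = readings(sigma=0.5)   # small error variance: precise predictions
brand_b = readings(sigma=5.0)   # large error variance: imprecise predictions

# Deviations from the true line; brand A's are much smaller overall.
err_a = [f - (32 + 1.8 * c) for c, f in zip(celsius, brand_a)]
err_b = [f - (32 + 1.8 * c) for c, f in zip(celsius, brand_b)]
```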
Review of Sample Variance
When there is no predictor x, we use Ȳ to estimate E(Y ), and we use the sample
variance s² to estimate σ².
The sample variance:

s² = Σᵢ₌₁ⁿ (Yi − Ȳ)² / (n − 1)
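As a sketch, the sample variance formula in Python, on hypothetical numbers:

```python
# Sample variance with the n - 1 divisor.
y = [4.0, 7.0, 6.0, 5.0, 8.0]
n = len(y)
y_bar = sum(y) / n                                   # 6.0
s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)    # 10 / 4 = 2.5
```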
[Figure: a probability density curve.]
In the simple linear regression setting, there is a predictor x. At each x
value, there is a sub-group of data points, and we use
Ŷi = b0 + b1 xi
to estimate
E(Yi ) = β0 + β1 xi .
[Figure: two scatterplots of College entrance test score versus High school gpa (1.0 to 4.0). The left panel, “Population of 200 Students,” shows all points with the population regression line; the right panel, “Sample of 20 Students,” shows points plotted as + with the sample regression line.]
Mean Square Error MSE in Simple Linear Regression
The mean square error:

MSE = Σᵢ₌₁ⁿ (Yi − Ŷi)² / (n − 2)
1 The numerator again adds up, in squared units, how far each response Yi is
from its estimated mean Ŷi .
2 The denominator divides the sum by n − 2, because we effectively estimate
two parameters: the population intercept β0 and the population slope β1 .
That is, we lose two degrees of freedom.
3 It can be shown that E(MSE) = σ², i.e., MSE is an unbiased estimator of σ².
We can write it as σ̂² = MSE.
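Putting the pieces together, the MSE can be computed from the residuals of a fitted least squares line. The data below are hypothetical:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Least squares estimates: b1 = Sxy / Sxx, b0 = Y-bar - b1 * x-bar.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residuals Yi - Yhat_i; they sum to zero by the first normal equation.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Divide by n - 2 (not n - 1): two parameters were estimated.
mse = sum(e ** 2 for e in residuals) / (n - 2)
sigma_hat = mse ** 0.5    # estimate of sigma itself
```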