Ignacio Cascos Fernández

Departamento de Estadística
Universidad Carlos III de Madrid

Problem sheet. Linear Regression — 2013–2014.

Problem 1. The wind speed is positively correlated with the production rate of a wind farm.
In a sample of size 200, the mean wind speed was 5.5 m/s and the mean production rate was
45% of capacity. If the farm is currently working at 60% of capacity, what value would you
expect for the wind speed? Explain your answer.

□ an approximate value of 5.5;

□ a value greater than 5.5;

□ an approximate value of 5.5 + 1 = 6.5;

□ a value smaller than 5.5.

Problem 2. In a linear regression problem, we only consider lines through the origin of
coordinates and want to determine the one among them that best predicts Y with information
from X,
$$y_i = \alpha x_i + u_i, \qquad i = 1, \dots, n,$$
where the random component satisfies $u_i \sim N(0, \sigma)$.
Find the least squares estimator of α.
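
One possible route to the estimator, given here only as a sketch and not necessarily the intended solution, is to minimise the sum of squared residuals $S(\alpha)$ over $\alpha$. The symbol $S(\alpha)$ is introduced for illustration, and the derivation assumes that at least one $x_i$ is nonzero.

```latex
\begin{aligned}
S(\alpha) &= \sum_{i=1}^{n} (y_i - \alpha x_i)^2, \\
\frac{dS}{d\alpha} &= -2 \sum_{i=1}^{n} x_i (y_i - \alpha x_i) = 0
\;\Longrightarrow\;
\hat\alpha = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}, \\
\frac{d^2 S}{d\alpha^2} &= 2 \sum_{i=1}^{n} x_i^2 > 0 .
\end{aligned}
```

The positive second derivative indicates that the stationary point is a minimum.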

Problem 3. In 1877, Galton studied the relationship between the diameter of a pea and the
mean diameters of its descendants. The measurements that he obtained (inches×100) are
shown in the table below:

Ascendant diameter      21     20     19     18     17     16     15
Descendants' diameter   17.26  17.07  16.37  16.40  16.13  16.17  15.98

a) Determine the regression line that predicts the descendants' diameter as a function of
the parent's diameter.

b) Does the parent's diameter contribute significantly to the regression model? Use
$S(\hat\beta_1) = 0.0386$.

c) Predict the mean diameter of the descendants of a pea whose mean diameter is 19.68.

d) Predict the diameter of the ascendant of a pea whose mean diameter is 16.68.

$$n = 7, \quad \sum_{i=1}^{n} x_i y_i = 2082.72, \quad \sum_{i=1}^{n} x_i = 126, \quad \sum_{i=1}^{n} x_i^2 = 2296,$$
$$\sum_{i=1}^{n} y_i = 115.38, \quad \sum_{i=1}^{n} y_i^2 = 1903.236, \quad t_{5,0.025} = 2.5706.$$
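
The calculations asked for in parts a) to c) could be sketched in Python as follows. This is an illustrative sketch, not the official solution: the variable names are assumptions, the data come from the table in the problem, and the standard error and critical value are the ones stated in the problem.

```python
# Galton's pea data from the table in Problem 3 (inches x 100).
x = [21, 20, 19, 18, 17, 16, 15]                        # ascendant diameter
y = [17.26, 17.07, 16.37, 16.40, 16.13, 16.17, 15.98]   # descendants' mean diameter

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Centred sum of squares and sum of cross-products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta1 = s_xy / s_xx              # least squares slope
beta0 = y_bar - beta1 * x_bar    # least squares intercept

# t-ratio for H0: beta1 = 0, using the standard error stated in part b).
se_beta1 = 0.0386
t_stat = beta1 / se_beta1
t_crit = 2.5706                  # t_{5, 0.025}, as given in the problem

print(f"fitted line: y = {beta0:.4f} + {beta1:.4f} x")
print(f"t = {t_stat:.2f}, reject H0 at the 5% level: {abs(t_stat) > t_crit}")
print(f"prediction at x = 19.68: {beta0 + beta1 * 19.68:.4f}")
```

Part d) is left out on purpose: predicting the ascendant from the descendant calls for a separate regression of x on y, not an inversion of this line.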

Problem 4. a) Consider the regression line y = 3 + 0.2x. If x is increased by one unit, by
how many units will y be increased?
b) Consider the regression line ln y = 3 + 0.2x. If x is increased by one unit, by what
percentage will y be increased?
c) Consider the regression line ln y = 3 + 0.2 ln x. If x is increased by 1%, by what
percentage will y be increased?
d) Consider the regression line y = 3 + 0.2 ln x. If x is increased by 1%, by how many
units will y be increased?
e) We have two different populations with the same variables x and y. For the first
population, we have obtained the regression line ln y = 3.1 + 0.2 ln x, while for the
second, the regression line that we have obtained is ln y = 3 + 0.2 ln x. By what
percentage does y increase for an individual from the second population, with respect
to one from the first population, with the same value of x?
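
The exact changes implied by each specification can be checked numerically. The sketch below is illustrative only: the variable names are assumptions, and it computes exact values where a hand solution would often use the approximations "100 × slope percent" or "slope percent".

```python
import math

slope = 0.2

# a) level-level: a one-unit increase in x changes y by `slope` units.
change_a = slope

# b) log-level: y is multiplied by exp(slope) when x increases by one unit.
pct_b = (math.exp(slope) - 1) * 100

# c) log-log: y is multiplied by 1.01**slope when x increases by 1%.
pct_c = (1.01 ** slope - 1) * 100

# d) level-log: y changes by slope * ln(1.01) units when x increases by 1%.
change_d = slope * math.log(1.01)

# e) same slope, intercepts 3.1 (first) and 3 (second): y2 / y1 = exp(3 - 3.1).
pct_e = (math.exp(3 - 3.1) - 1) * 100

print(f"a) {change_a} units, b) {pct_b:.2f}%, c) {pct_c:.3f}%, "
      f"d) {change_d:.5f} units, e) {pct_e:.2f}%")
```

In part e) the computed change is negative, because the second population has the smaller intercept.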

Problem 5. An electric utility company aims to predict the monthly electric energy
consumption per household ($y$, in kWh) based on the house size ($x$, in m²) by means of a
linear regression model of the type
$$y = \beta_0 + \beta_1 x + u.$$
Use the matrix approach to linear regression to estimate the regression coefficients.

Household size (m²)   Electricity (kWh)
50                    700
80                    900
100                   1000

What is the design matrix of the model $y = \alpha_0 + \alpha_1 x + \alpha_2 x^2 + u$?
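
A minimal sketch of the matrix approach, assuming the usual least squares formula $\hat\beta = (X'X)^{-1}X'y$, is given below in plain Python. The helper functions `transpose`, `matmul`, and `inverse_2x2` are hypothetical names written for this illustration; they are not part of any library.

```python
# Data from the table in Problem 5.
sizes = [50, 80, 100]        # house size (m^2)
energy = [700, 900, 1000]    # consumption (kWh)

# Design matrix X: a column of ones (intercept) and the house size.
X = [[1.0, float(s)] for s in sizes]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    """Invert a 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

Xt = transpose(X)
XtX = matmul(Xt, X)                               # X'X
Xty = matmul(Xt, [[float(v)] for v in energy])    # X'y

beta = matmul(inverse_2x2(XtX), Xty)              # (X'X)^{-1} X'y
beta0, beta1 = beta[0][0], beta[1][0]
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.4f}")

# Design matrix of the quadratic model: columns 1, x, x^2.
X_quadratic = [[1, s, s ** 2] for s in sizes]
print(X_quadratic)
```

With three observations and three parameters, the quadratic model would fit the data exactly, so only its design matrix is built here.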

Problem 6. We have studied the relation between the consumption of pork and some other
variables. Based on a sample of size 200, we have fitted the following multiple linear
regression model:
$$\ln y = -0.243 - 0.562 \ln x_1 + 0.327 \ln x_2 + 0.219 \ln x_3 - 0.217 \ln x_4,$$
with:

$y$ = consumption of pork,
$x_1$ = price of pork,
$x_2$ = price of beef,
$x_3$ = price of chicken,
$x_4$ = family income.
The estimated standard errors of the regression coefficients were:
$$S(\hat\beta_1) = 0.219; \quad S(\hat\beta_2) = 0.161; \quad S(\hat\beta_3) = 0.157; \quad S(\hat\beta_4) = 0.082.$$
a) Which variables contribute significantly to the regression model? ($\alpha = 0.05$)
b) Provide a numerical interpretation of the model coefficients.
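
Part a) comes down to comparing each t-ratio with a critical value. The sketch below is illustrative and rests on one assumption: with 200 − 5 = 195 residual degrees of freedom, the standard normal quantile is used as a stand-in for the Student t quantile, which is slightly larger. The dictionary names are hypothetical.

```python
from statistics import NormalDist

# Coefficients and standard errors as reported in Problem 6.
coefficients = {"ln x1": -0.562, "ln x2": 0.327, "ln x3": 0.219, "ln x4": -0.217}
std_errors = {"ln x1": 0.219, "ln x2": 0.161, "ln x3": 0.157, "ln x4": 0.082}

alpha = 0.05
# Assumption: normal approximation to the t distribution with 195 degrees of freedom.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# t-ratio for H0: beta_j = 0, and the two-sided decision at level alpha.
t_ratios = {name: coefficients[name] / std_errors[name] for name in coefficients}
significant = {name: abs(t) > z_crit for name, t in t_ratios.items()}

for name in coefficients:
    print(f"{name}: t = {t_ratios[name]:.3f}, significant: {significant[name]}")
```

Since the model is log-log, each coefficient reads as an elasticity, which is the starting point for part b).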

Problem 7. The weekly oil consumption (m³) and the mean outside temperature (degrees
Celsius) in a household were observed for 26 weeks before and 18 weeks after installing a
thermal insulator.

Three regression models to predict the oil consumption in terms of the outside temperature
(temp) and the dummy variable (insulator), which takes value 1 before the thermal
insulator was installed and value 0 after it was installed, are presented below:
Model I: 3889 − 861(290) temp, R² = 0.17;
Model II: 4922 + 1795(104) insulator − 368(19) temp, R² = 0.92;
Model III: 4951 + 2263(173) insulator − 249(40) temp + 149(80) insulator × temp,
R² = 0.94.
The numbers in parentheses are the estimated standard errors of the regression coefficients.

a) Does the thermal insulator have any influence on the oil consumption? Explain your
answer.

b) How would you interpret the coefficient of variable insulator × temp in Model III?

c) Which model would you choose? Why?

d) In the chosen model, how much does the oil consumption decrease with the thermal
insulator if the outside temperature is 4 degrees?

Problem 8. We are studying the effect of several diet pills on a group of patients. The
response variable of our experiment is the weight loss (Y) and, as covariates (independent
variables), we have age (A), sex (S), and three different diet pills, P1, P2, and P3. Each
patient follows a treatment with one of these pills.

a) Write the (simplest) multiple linear regression model that quantifies the effect of the
covariates.

b) How would you extend the model from a) in order to detect whether the influence of
age on the weight loss in women is different from the influence of age on the weight
loss in men? Write the corresponding multiple linear regression model.

Problem 9. A group of 360 students took an exam that consisted of 40 True/False questions.
Each of the questions was worth one point and all the students answered each single question.
The number of correct answers ($Y$) and the number of wrong answers ($X$) of each student
were computed. If we fit a simple linear regression model with $Y$ as response variable and
$X$ as independent variable, what are the values of $\hat\beta_0$, $\hat\beta_1$, $R^2$, and $S^2(e)$?
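
Because every student answered all 40 questions, $Y = 40 - X$ holds exactly, and the fit can be checked on made-up data. The scores below are hypothetical, drawn with a fixed seed only to illustrate the point, and the variable names are assumptions.

```python
import random

rng = random.Random(0)                              # fixed seed, illustrative data only
wrong = [rng.randint(0, 40) for _ in range(360)]    # X: hypothetical wrong answers
correct = [40 - w for w in wrong]                   # Y: every question was answered

n = len(wrong)
x_bar = sum(wrong) / n
y_bar = sum(correct) / n
s_xx = sum((x - x_bar) ** 2 for x in wrong)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(wrong, correct))
s_yy = sum((y - y_bar) ** 2 for y in correct)

beta1 = s_xy / s_xx              # least squares slope
beta0 = y_bar - beta1 * x_bar    # least squares intercept

# Residuals, coefficient of determination, and residual variance.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(wrong, correct)]
sse = sum(e ** 2 for e in residuals)
r_squared = 1 - sse / s_yy
residual_variance = sse / (n - 2)

print(f"beta0 = {beta0:.6f}, beta1 = {beta1:.6f}")
print(f"R^2 = {r_squared:.6f}, S^2(e) = {residual_variance:.2e}")
```

Any other set of scores with at least two distinct values of X gives the same fitted line, up to floating-point rounding.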

Problem 10. In 1965, data on the connection between radioactive waste exposure and cancer
mortality was published. The data was collected from 9 counties that were located near an
Atomic Energy Commission facility in Hanford, Washington (USA).
The data give the index of exposure and the cancer mortality rate during 1959-1964
for the nine counties affected. Higher index of exposure values represent higher levels of
contamination. The variable Exposure is the index of exposure and the variable Mortality
is the cancer mortality per 100000 people over a one-year period.
> summary(lm(Mortality~Exposure))
Coefficients:
             Estimate  Std.Error  t value  Pr(>|t|)
(Intercept)   114.716      8.046   14.258  1.98e-06
Exposure        9.231      1.419    6.507  0.000332

Multiple R-Squared: 0.8581, Adjusted R-squared: 0.8378

a) Which is the dependent variable, and which is the independent one? Write the equation of
the regression line.

b) Is there a significant linear relationship between Mortality and Exposure? Provide a
null and alternative hypothesis, a test statistic, a p-value, and a conclusion.

c) What is the expected mortality rate for a county with an exposure index of 3?

d) Interpret the estimated slope of the fitted model.

e) What is the percentage of variability of the response variable explained by the model?

f) If the average Mortality in the nine counties was 157.34, what was the average exposure
index in the nine counties? If it is not possible to compute with the given information,
explain why.
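
The arithmetic behind parts b), c), and f) can be sketched from the printed estimates alone. The sketch is illustrative: the variable names are assumptions, and part f) relies on the standard fact that a least squares line with an intercept passes through the point of sample means.

```python
# Estimates copied from the R summary in Problem 10.
intercept, slope = 114.716, 9.231
se_slope = 1.419

# Part b): t-ratio for H0: slope = 0. The output reports 6.507; recomputing
# from the rounded estimates gives a slightly different value.
t_stat = slope / se_slope

# Part c): expected mortality for an exposure index of 3.
predicted_at_3 = intercept + slope * 3

# Part f): the fitted line passes through (mean exposure, mean mortality).
mean_mortality = 157.34
mean_exposure = (mean_mortality - intercept) / slope

print(f"t = {t_stat:.3f}")
print(f"predicted mortality at exposure 3: {predicted_at_3:.3f}")
print(f"mean exposure index: {mean_exposure:.4f}")
```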

Problem 11. Researchers speculate that the level of a particular type of chemical found in a
patient’s blood affects the size of a hepatocellular carcinoma. Experimenters take a random
sample of 25 patients and both assess the size of their tumours (cm) and test for the levels
of this chemical in their blood (mg/L). The mean chemical level was found to be 45 mg/L.
A simple linear regression is fitted; the R output is below.
> summary(lm(y~x))
Coefficients:
             Estimate  Std.Error  t value  Pr(>|t|)
(Intercept)  10.2981     0.05134
x            -0.15123    0.00987

Multiple R-Squared: 0.895, Adjusted R-squared: 0.874

a) What is the response variable? What is the explanatory variable?

b) Is there evidence, at 5% level, that the chemical level affects tumour size? (provide
null and alternative hypothesis, test statistic, bound for the p-value, and conclusion)

c) Suppose the chemical level in a patient’s blood is 25 mg/L. What do you expect the
tumour size to be?

d) What is the percentage of variability of the response variable explained by the model?

Problem 12. The relationship between forced expiratory volume (FEV), which is measured
in liters, and age, which is measured in years, is evaluated in a random sample of 31 men
between the ages of 20 and 60. A simple linear regression analysis is performed to predict
FEV from age. The following results are published in a paper.

> summary(lm(log(FEV)~log(age)))
Coefficients:
             Estimate  Std.Error
(Intercept)   1.8       0.3
age          -0.05      0.02

a) Given the above results, can you ascertain whether the linear relationship between
FEV and age is significant? Why or why not? Write the regression model with the
estimations of the parameters and specify which is the dependent variable and the
independent one.

b) Interpret the coefficient of age in words.
