Midterm II Preparation Questions
Example Questions
2) Suppose we have two regression models: i) the constrained model, yi = β0 + β1 xi + ui, and ii)
the unconstrained model, yi = γ0 + γ1 xi + γ2 zi + γ3 wi + vi. Compared with the unconstrained
regression, least squares estimation under a restriction (say γ2 = γ3 = 0) will result
in a higher R² if the constraint is true. FALSE
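3) I run a regression of y on x and save the residuals e = y − ŷ. If I find that Cov(x, e) = 0, I
have the right to conclude that the variable x was exogenous in my regression. FALSE (OLS residuals are uncorrelated with x in the sample by construction, so this finding says nothing about exogeneity.)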
4) We will face perfect multicollinearity if we attempt to use both x and x³ as
regressors. FALSE
5) A good regression model will always have an adjusted R² above 0.5. FALSE
7) If we exclude the variable X and run a regression of Y on a column of ones, we will end
up with only the intercept. TRUE
8) The value of R² is the leading indicator for evaluating econometric models. FALSE
9) Including an irrelevant variable in a model has no effect on the unbiasedness of the intercept
and other slope estimators. TRUE
Use this part to answer Q10-Q11. Consider a two-variable regression model, Yi = β̂0 + β̂1 X1i + β̂2 X2i + ûi.
The variance of β̂2 is given by
$$\sigma^2_{\hat\beta_2} = \frac{\sigma^2}{\sum_i (X_{2i} - \bar X_2)^2} \cdot \frac{1}{1 - \rho_{12}^2}$$
where ρ12 is the correlation coefficient between variables 1 and 2.
10) The standard error of β̂2 decreases if the sample variation of X2 is small. FALSE
11) The latter term in the variance of β̂2 is called the variance inflation factor, which is high when the absolute value
of the correlation among regressors is low. FALSE
12) Heteroscedasticity does not alter the unbiasedness and consistency properties of OLS estimators. TRUE
13) The presence of multicollinearity implies that the regression results suffer from omitted variable
bias. FALSE
14) For a given sample, the average daily wage for females is 60 TL while the average daily wage for
males is 74 TL. Consider the following regression model: Yi = β0 + β1 Malei + εi, where Malei = 1 for
males and zero otherwise. Then β1 is −14. FALSE
15) Consider the regression model: Yi = β0 + β1 Xi + εi. Rescaling the independent variable Xi by
multiplying it by a constant does not affect the standard error of the intercept. TRUE
16) Consider the regression model: Yi = β0 + β1 Xi + εi. The slope of the estimated regression line,
β̂1, and the correlation coefficient between X and Y, ρXY, have the same sign. TRUE
17) When the variance of the error terms is homoscedastic, using robust standard errors can cause
bias in the estimated parameters. FALSE
Consider the following three regression models:
Yi = β0 + β1 Xi + ui
Yi = α0 + α1 Xi + α2 Zi + vi
Zi = γ0 + γ1 Xi + κi
where β0 > 0, β1 < 0, α0 > 0, α1 < 0, α2 < 0, γ0 > 0 and γ1 > 0. Then, α1 > β1. TRUE
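The reasoning behind the last statement can be checked numerically. The following is a minimal R sketch, not part of the original question: the data are simulated and every numeric value is a hypothetical choice that merely satisfies the stated sign restrictions. It relies on the fact that, in any sample, the OLS estimates satisfy β̂1 = α̂1 + α̂2 γ̂1, so with α2 < 0 and γ1 > 0 the slope of the short regression lies below α1.

```r
# Simulated illustration; all numeric values below are hypothetical.
set.seed(1)
n <- 500
x <- rnorm(n)
z <- 2 + 0.8 * x + rnorm(n)             # gamma0 > 0, gamma1 > 0
y <- 5 - 1.0 * x - 0.5 * z + rnorm(n)   # alpha0 > 0, alpha1 < 0, alpha2 < 0

short <- lm(y ~ x)       # Y on X only
long  <- lm(y ~ x + z)   # Y on X and Z
aux   <- lm(z ~ x)       # Z on X

beta1  <- unname(coef(short)["x"])
alpha1 <- unname(coef(long)["x"])
alpha2 <- unname(coef(long)["z"])
gamma1 <- unname(coef(aux)["x"])

# Compare the short-regression slope with alpha1 + alpha2 * gamma1
c(beta1 = beta1, implied = alpha1 + alpha2 * gamma1)

# Check the ordering claimed in the question
alpha1 > beta1
```

Changing the sign of either the coefficient on z or the coefficient on x in the line that generates z should reverse the ordering, which is one way to explore how the direction of omitted variable bias depends on α2 and γ1.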
Other Questions
Question: Consider a regression model to explain the salaries of CEOs in terms of annual firm
sales:
salary = β0 + β1 sales + β2 roe + β3 negros + ε
where salary is the CEO’s salary in thousands of TL, sales is the firm’s sales in millions of TL, roe
is the firm’s return on equity, and negros is a dummy variable equal to 1 if the return on the firm’s
stock is negative and 0 otherwise.
a) State whether each of salary, sales, roe, negros, β0, β1, β2 and β3 is a parameter (P), a dependent
variable (DV) or an independent variable (IDV): salary : ..., sales : ..., roe : ..., negros : ..., β0 : ...,
β1 : ..., β2 : ..., β3 : ....
b) Interpret the coefficients. What is the impact of sales on salary? What is the impact of
roe on salary?
c) Do you think that the regression model could suffer from an outlier problem? If it
does, what would be your solution?
d) Your friend claims that roe and negros are likely to be correlated. What happens if the
correlation between those variables is high? What happens if the absolute value of the correlation
coefficient is 1? What happens if the absolute value of the correlation coefficient is 0?
e) Do you think the variance of the error terms (of your regression model) is constant across
the values of the independent variables? Does it create a problem if the variance of the error terms
varies? If it does, how would you solve that problem?
f) Why should I, or should I not, include posros, a dummy variable equal to 1 if the return on the
firm’s stock is positive, in the regression model?
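One way to explore parts (d) and (f) is a small simulation. The R sketch below is an illustration only: the data are simulated, all numeric values are hypothetical, and it assumes that no stock return is exactly zero, so that posros = 1 − negros. Under that assumption the intercept, negros and posros are perfectly collinear, and lm() cannot identify all three coefficients.

```r
# Simulated illustration; all variable values are hypothetical.
set.seed(2)
n <- 200
sales  <- runif(n, 100, 10000)     # firm sales, millions of TL
roe    <- rnorm(n, 15, 8)          # return on equity
ros    <- rnorm(n, 5, 20)          # return on stock (assumed never exactly 0)
negros <- as.integer(ros < 0)      # 1 if the return is negative
posros <- 1L - negros              # complement of negros under that assumption
salary <- 800 + 0.02 * sales + 10 * roe - 100 * negros + rnorm(n, sd = 150)

# Intercept + negros + posros are perfectly collinear (dummy variable trap)
fit_trap <- lm(salary ~ sales + roe + negros + posros)
coef(fit_trap)   # lm() reports NA for the coefficient it cannot identify

# Dropping one of the two dummies restores a full-rank design
fit_ok <- lm(salary ~ sales + roe + negros)
coef(fit_ok)
```

In this sketch the remaining coefficients are the same in both fits, because lm() in effect drops the redundant column; other software may instead stop with an error.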
Question: A team of environmental researchers is undertaking a comprehensive study to explore
the relationship between car usage and air quality in cities of various sizes. The study focuses on
PM2.5 levels, a measure of fine particulate matter that significantly affects health and is often associated
with vehicle emissions. To account for other factors that may affect air quality, the researchers
include additional variables in their analysis. The amount of green space in each city is considered,
recognizing its potential role in air purification and pollution reduction. Moreover, the study examines
the impact of environmental regulations, which vary across cities and can profoundly influence
pollution levels through policy measures. The dataset for the study is comprehensive, encompassing
a broad spectrum of cities, from densely populated metropolises to smaller towns. This diversity
provides a rich context for analyzing how urban scale and characteristics interplay with vehicular
emissions and other factors to affect air quality. The table below provides the coefficient estimates,
with their standard errors in parentheses.
a) What are your expectations regarding the signs of the coefficients for independent variables
in their relation to PM2.5 levels? Do you anticipate a positive or negative relationship, and why?
b) How would you interpret the coefficient of per capita car usage for different models?
c) How do the amount of green space and environmental regulations appear to affect air quality,
based on their coefficients?
d) Examine the t-values for each of the coefficients in your regression models. Which variables
are statistically significant? How do these statistics support or challenge your prior expectations?
e) Aside from the variables already included, what other factors might influence the relationship
between car usage and air quality? Consider aspects like public transportation availability, urban
layout, industrial emissions, or seasonal variations.
R Questions
a) Set the working directory and import the data into the R environment.
b) Check the first 10 observations of the data set to see whether there are any problems
with the data format, missing values, etc.
c) Estimate the following regression model: Yi = β0 + β1 Xi + εi. Store and summarize the
regression output.
d) Suppose that you want to drop the intercept from the regression model in part (c). Estimate
the regression model without an intercept.
e) Suppose that R is not working well and does not report R². Assume that you have stored
the fitted values (of the previous regression model) under the name fitvals. Calculate R²
using the formula
$$R^2 = \frac{ESS}{TSS} = \frac{\sum_i (\hat Y_i - \bar Y)^2}{\sum_i (Y_i - \bar Y)^2}$$
where Ŷi are the fitted values and Ȳ is the sample mean of Y.
f) Calculate the correlation matrix for {X, Z, W, K}.
g) From the correlation matrix in part (f), you observe that the correlation coefficient between
X and K is 0.92. You then decide to drop K to avoid a multicollinearity problem and run the following
multiple regression model: Yi = β0 + β1 Xi + β2 Zi + β3 Wi + εi.
h) Suppose that you store the estimate of the coefficient β2 as CoefZ and its variance, $\sigma^2_{\hat\beta_2}$,
as Sigma2CoefZ. First, calculate the standard error of β̂2. Then, calculate the t-value for β̂2.
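The steps in parts (a) to (h) can be sketched in base R as follows. This is one possible approach under stated assumptions, not a model answer: the file name data.csv, the use of a temporary folder as the working directory, and the simulated data are all hypothetical stand-ins so that the code runs on its own. The fitted values in part (e) are taken from the model in part (c), because the ESS/TSS decomposition requires an intercept.

```r
# Hypothetical setup: simulate a data set and save it so the steps can run.
set.seed(3)
n <- 100
X <- rnorm(n)
K <- X + rnorm(n, sd = 0.3)      # built to be highly correlated with X
Z <- rnorm(n)
W <- rnorm(n)
Y <- 1 + 2 * X - 1 * Z + 0.5 * W + rnorm(n)
work_dir <- tempdir()            # stand-in for your own folder
write.csv(data.frame(Y, X, Z, W, K), file.path(work_dir, "data.csv"),
          row.names = FALSE)

# a) Set the working directory and import the data
setwd(work_dir)
dat <- read.csv("data.csv")

# b) First 10 observations, plus a count of missing values per column
head(dat, 10)
colSums(is.na(dat))

# c) Simple regression with an intercept; store and summarize
fit_c <- lm(Y ~ X, data = dat)
summary(fit_c)

# d) Same regression without an intercept
fit_d <- lm(Y ~ X - 1, data = dat)

# e) R-squared by hand, using fitted values from the model in (c)
fitvals <- fitted(fit_c)
r2 <- sum((fitvals - mean(dat$Y))^2) / sum((dat$Y - mean(dat$Y))^2)

# f) Correlation matrix for X, Z, W, K
cor(dat[, c("X", "Z", "W", "K")])

# g) Multiple regression after dropping K
fit_g <- lm(Y ~ X + Z + W, data = dat)

# h) Standard error and t-value for the coefficient on Z
CoefZ       <- coef(fit_g)["Z"]
Sigma2CoefZ <- vcov(fit_g)["Z", "Z"]
se_Z <- sqrt(Sigma2CoefZ)
t_Z  <- CoefZ / se_Z
```

The hand-computed values can be compared with summary(fit_c)$r.squared and with the "t value" column of summary(fit_g)$coefficients as a sanity check.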