hw3 Spring2024 Solution

Uploaded by

bellance xavier
Econ3005: Solution for Applied Econometrics, Spring 2024

Homework #3
Do not copy and paste the answers from your classmates. Two identical homework
submissions will be treated as cheating. Do not copy and paste the entire output of your
statistical package. Report only the relevant part of the output. Please also submit your
R-script for the empirical part. Please put all your work in one single file and upload via Moodle.

Part 1 Multiple Choice (24 points, 3 each)

Please choose the answer(s) that you think is(are) appropriate.

(Non-graded exercise) A nonlinear function

a. makes little sense, because variables in the real world are related linearly.

b. can be adequately described by a straight line between the dependent variable

and one of the explanatory variables.

c. is a concept that only applies to the case of a single or two explanatory variables

since you cannot draw a line in four dimensions.

d. is a function with a slope that is not constant.

Answer: d

(Non-graded exercise) The binary variable interaction regression

a. can only be applied when there are two binary variables, but not three or more.

b. is the same as testing for differences in means.

c. cannot be used with logarithmic regression functions because ln(0) is not defined.

d. allows the effect of changing one of the binary independent variables to depend on the value of the other binary variable.

Answer: d

(Non-graded exercise) The interpretation of the slope coefficient in the model

Yi = β0 + β1 ln(Xi) + ui is as follows:

a. 1% change in X is associated with a β1 % change in Y.

b. 1% change in X is associated with a change in Y of 0.01β1 .


c. change in X by one unit is associated with a 100β1 % change in Y.

d. change in X by one unit is associated with a β1 change in Y.

Answer: b

1.1 In the regression model Yi = β0 + β1 Xi + β2 Di + β3 (Xi × Di ) + ui , where X is

a continuous variable and D is a binary variable, to test that the two regressions are

identical, you must use the

a. t-statistic separately for β2 = 0, β3 = 0.


b. F-statistic for the joint hypothesis that β0 = 0, β1 = 0.
c. t-statistic separately for β3 = 0.
d. F-statistic for the joint hypothesis that β2 = 0, β3 = 0 .

Answer: d

1.2 To test whether or not the population regression function is linear rather than

a polynomial of order r,

a. check whether the regression R2 for the polynomial regression is higher than that of the linear regression.

b. compare the TSS from both regressions.

c. look at the pattern of the coefficients: if they change from positive to negative to positive, etc., then the polynomial regression should be used.

d. use the test of (r-1) restrictions using the F-statistic.

Answer: d

1.3 The major flaw of the linear probability model is that

a. the actuals can only be 0 and 1, but the predicted are almost always different from that.

b. the regression R2 cannot be used as a measure of fit.

c. people do not always make clear-cut decisions.

d. the predicted values can lie above 1 and below 0.

Answer: d

1.4 The following tools from multiple regression analysis carry over in a meaningful

manner to the probit model, with the exception of the

a. F-statistic.

b. significance test using the t-statistic.

c. 95% confidence interval using 1.96 times the standard error.

d. regression R2.

Answer: d

1.5 When estimating probit and logit models,

a. the t-statistic should still be used for testing a single restriction.

b. you cannot have binary variables as explanatory variables as well.

c. F-statistics should not be used, since the models are nonlinear.

d. it is no longer true that the R̄2 < R2


Answer: a

1.6 In the binary dependent variable model, a predicted value of 0.6 means that

a. the most likely value the dependent variable will take on is 60 percent.

b. given the values for the explanatory variables, there is a 60 percent probability

that the dependent variable will equal one.

c. the model makes little sense, since the dependent variable can only be 0 or 1.

d. given the values for the explanatory variables, there is a 40 percent probability

that the dependent variable will equal one.

Answer: b

1.7 In the expression Pr(Y = 1 | X) = Φ(β0 + β1 X),

a. (β0 + β1 X) plays the role of z in the cumulative standard normal distribution function.

b. β1 cannot be negative since probabilities have to lie between 0 and 1.

c. β0 cannot be negative since probabilities have to lie between 0 and 1.

d. min(β0 + β1 X) > 0 since probabilities have to lie between 0 and 1.

Answer: a

1.8 The following problems could be analyzed using probit and logit estimation with

the exception of whether or not

a. a college student decides to study abroad for one semester.

b. being a female has an effect on earnings.

c. a college student will attend a certain college after being accepted.

d. applicants will default on a loan.

Answer: b

Part 2 Short Questions (29 points in total)

Note: for each sub-question, the answer should not be longer than 7 lines.

(Non-graded exercise) Dr. Qin would like to analyze the return to education and the gender gap. The equation below shows the regression result using the 2005 Current Population Survey. LnEarnings is the logarithm of monthly earnings; educ is years of education; DFemme is a dummy variable equal to 1 if the individual is female; exper is working experience, measured in years; Midwest, South and West are dummy variables indicating the region of residence, with Northeast as the omitted region. Interpret the major results (discuss the estimates for all variables and also address the question that Dr. Qin wants to analyze).

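$$
\begin{aligned}
\widehat{LnEarnings} ={}& \underset{(0.018)}{1.215} + \underset{(0.0011)}{0.0899} \times educ - \underset{(0.022)}{0.521} \times DFemme + \underset{(0.0016)}{0.0180} \times (DFemme \times educ) \\
& + \underset{(0.0008)}{0.0232} \times exper - \underset{(0.000018)}{0.000368} \times exper^2 - \underset{(0.006)}{0.058} \times Midwest - \underset{(0.006)}{0.0078} \times South - \underset{(0.006)}{0.030} \times West
\end{aligned}
$$

$$n = 57{,}863, \qquad \bar{R}^2 = 0.242$$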

Answer: For males, one more year of education is associated with approximately 9% higher earnings, and the estimate is statistically significant at the 1% level. For females, the return to education is slightly higher, approximately 11% (0.0899 + 0.018 = 0.1079). Since the binary variable for females is interacted with the number of years of education, the gender gap depends on the number of years of education. For the typical high school graduate (12 years of education), the gender gap is approximately 30% (-0.521 + 0.018 × 12 = -0.305), while for the typical college graduate (16 years of education) the gender gap narrows to about 23% (-0.521 + 0.018 × 16 = -0.233). Potential experience enters in an inverted U-shape, which is to be expected given the shape of age-earnings profiles and the fact that potential experience depends on the age of the individual. The marginal value of each additional year of potential experience declines until it eventually becomes negative. Northeast is the omitted region, and all other regions have lower (log) earnings, ranging from 0.8% in the South to 5.8% in the Midwest. All coefficients are statistically significant except the one on South, whose t-statistic is only 0.0078/0.006 = 1.3.

(15 points) 2.1 Sports economics typically looks at winning percentages of sports teams as one of various outputs, and estimates production functions by analyzing the relationship between the winning percentage and inputs. In Major League Baseball (MLB), the determinants of winning are quality pitching and batting. The data cover all 30 MLB teams for the 1999 season. Pitching quality is approximated by Team Earned Run Average (teamera), and hitting quality by On Base Plus Slugging Percentage (ops).

Your regression output is:

$$\widehat{Winpct} = \underset{(0.08)}{-0.19} - \underset{(0.008)}{0.099} \times teamera + \underset{(0.126)}{1.49} \times ops, \qquad R^2 = 0.92$$

(a) (5 points) Interpret the regression. Are the results statistically significant and important?

Answer: Lowering the team ERA by one raises the winning percentage by roughly ten percentage points. Increasing the OPS by 0.1 raises the winning percentage by approximately 15 percentage points. The regression explains 92 percent of the variation in winning percentages. Both slope coefficients are statistically significant (t-statistics of about -12.4 and 11.8), and given the small differences in winning percentage across teams, they are also important.

(b) (8 points) There are two leagues in MLB, the American League (AL) and the National League (NL). One major difference is that the pitcher in the AL does not have to bat. Instead there is a designated hitter in the hitting line-up. You are concerned that, as a result, pitching and hitting have a different effect in the AL than in the NL. To test this hypothesis, you allow the AL regression to have a different intercept and different slopes from the NL regression. You therefore create a binary variable for the American League (DAL) and estimate the following specification:

$$
\begin{aligned}
\widehat{Winpct} ={}& \underset{(0.12)}{-0.29} + \underset{(0.24)}{0.10} \times DAL - \underset{(0.008)}{0.100} \times teamera + \underset{(0.018)}{0.008} \times (DAL \times teamera) \\
& + \underset{(0.163)}{1.622} \times ops - \underset{(0.160)}{0.187} \times (DAL \times ops), \qquad R^2 = 0.92
\end{aligned}
$$

How should you interpret the winning percentage for AL and NL? Can you tell the different effect of pitching and hitting between AL and NL? If so, how much?

Answer: For the AL, lowering the team ERA by one raises the winning percentage by 9.2 percentage points (-0.100 + 0.008 = -0.092 per unit of ERA), while the figure for the NL is 10 percentage points. Increasing the OPS by 0.1 raises the winning percentage by 14.35 percentage points for the AL (0.1622 - 0.0187 = 0.1435) but by 16.2 percentage points for the NL. However, neither interaction coefficient is statistically significant (t-statistics of 0.008/0.018 = 0.44 and -0.187/0.160 = -1.17). It is therefore difficult to conclude that pitching and hitting have different effects in the AL and the NL.
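
The league-specific slopes and the interaction t-statistics follow directly from the reported estimates. The sketch below only reproduces that arithmetic; the variable names are illustrative assumptions, and the numbers are copied from the interacted regression.

```python
# Coefficients and standard errors copied from the interacted regression.
b_era, b_era_al = -0.100, 0.008
b_ops, b_ops_al = 1.622, -0.187
se_era_al, se_ops_al = 0.018, 0.160

# NL is the base group (DAL = 0); AL slopes add the interaction coefficients.
slope_era = {"NL": b_era, "AL": b_era + b_era_al}
slope_ops = {"NL": b_ops, "AL": b_ops + b_ops_al}

# t-statistics on the interaction terms test whether the AL slopes differ.
t_era_al = b_era_al / se_era_al
t_ops_al = b_ops_al / se_ops_al

for league in ("NL", "AL"):
    print(f"{league}: ERA slope {slope_era[league]:.3f}, "
          f"OPS slope {slope_ops[league]:.3f}")
print(f"t(DAL x teamera) = {t_era_al:.2f}, t(DAL x ops) = {t_ops_al:.2f}")
```

Both t-statistics are well below 1.96 in absolute value, which is the basis for saying that the interaction terms are individually insignificant.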

(2 points) (c) You remember that sequentially testing the significance of slope coefficients is not the same as testing for their significance simultaneously. Hence you ask your regression package to calculate the F-statistic that all three coefficients involving the binary variable for the AL are zero. Your regression package gives a value of 0.35. Looking at the critical value from the F-table, can you reject the null hypothesis at the 1% level? Should you worry about the small sample size?

Answer: The critical value of the F-statistic with three restrictions is 3.78 at the 1% level. Since 0.35 < 3.78, you cannot reject the null hypothesis that all three coefficients are zero. However, the sample size is small (30 is much smaller than 100), so the F-statistic is not really distributed as $F_{3,\infty}$ and, as a result, inference is problematic here.

2.2 A study analyzed the probability of Major League Baseball (MLB) players to

survive for another season, or, in other words, to play one more season. The re-

searchers had a sample of 4,728 hitters and 3,803 pitchers for the years 1901-1999. All

explanatory variables are standardized. The probit estimation yielded the results as

shown in the table:


Regression                  (1) Hitters   (2) Pitchers
Regression model            probit        probit
constant                    2.010         1.625
                            (0.030)       (0.031)
number of seasons played    -0.058        -0.031
                            (0.004)       (0.005)
performance                 0.794         0.677
                            (0.025)       (0.026)
average performance         0.022         0.100
                            (0.033)       (0.036)

where the limited dependent variable takes on a value of one if the player had one

more season (a minimum of 50 at bats or 25 innings pitched), number of seasons played

is measured in years, performance is the batting average for hitters and the earned run

average for pitchers, and average performance refers to performance over the career.

(Note that all variables are standardized, so that the mean is zero and the variance is 1.)

(4 points) (a) Interpret the two probit equations and calculate survival probabilities

for hitters and pitchers at the sample mean. Why are these so high?

Answer: Note that all variables are standardized, so that the mean is zero. This

results in a survival probability of 0.978 (Φ(2.01) = 0.9778) for hitters and 0.948

(Φ(1.63) = 0.9484) for pitchers. These results are so high because there is a high

probability, in general, for a player to return the following season.

(4 points) (b) Calculate the change in the survival probability for a player who has a very bad year by performing two standard deviations below the average (assume also that this player has been in the majors for many years so that his average performance is hardly affected). How does this change the survival probability when compared to the answer in (a)?

Answer: Since the variables are standardized, this implies a change of -2 in the performance variable. For hitters the survival probability falls to 0.66 (Φ(2.01 − 2 × 0.794) = Φ(0.42) = 0.6628), and for pitchers to 0.61 (Φ(1.625 − 2 × 0.677) = Φ(0.27) = 0.6064). Compared with (a), this is a drop of roughly 0.31 for hitters and 0.34 for pitchers.
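
The probabilities in (a) and (b) can be reproduced without a normal table. The sketch below is a minimal illustration, assuming only the constants and performance coefficients from the probit table; the helper name `phi` and the dictionary layout are hypothetical choices made for this example.

```python
from math import erf, sqrt


def phi(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Probit coefficients copied from the table: (constant, performance).
probit = {"hitters": (2.010, 0.794), "pitchers": (1.625, 0.677)}

results = {}
for group, (const, b_perf) in probit.items():
    p_mean = phi(const)                # all standardized regressors at 0
    p_bad = phi(const - 2 * b_perf)    # performance two s.d. below the mean
    results[group] = (p_mean, p_bad, p_bad - p_mean)
    print(f"{group}: at mean {p_mean:.3f}, bad year {p_bad:.3f}, "
          f"change {p_bad - p_mean:+.3f}")
```

The values differ from the table-based ones only in the third decimal, because the index is not rounded to two decimals before applying Φ.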
(6 points) (c) Since the results seem similar, the researcher could consider combining the two samples. Explain in some detail how this could be done and how you could test the hypothesis that the coefficients are the same.

Answer: After combining the samples for hitters and pitchers, you would allow for a different intercept and different slopes by introducing a binary variable for pitchers, with hitters as the default group. This binary variable would be introduced by itself and interacted with each of the above variables, thereby allowing all coefficients to differ. You could then conduct an F-test for the joint hypothesis that all coefficients involving the binary variable are zero. If the hypothesis cannot be rejected, then there is no statistically significant difference between the coefficients for hitters and pitchers.

Part 3 Long Questions (47 points in total)

Note: for each sub-question, the answer should not be longer than 10 lines.

(32 points) 3.1 Use the data set CollegeDistance.dta and read the description file CollegeDistance_DataDescription.pdf to answer the following questions.

(3 points) (a) Run a regression of ed on dist, female, black, hispanic, dadcoll, momcoll, tuition and report your result. Interpret the coefficient for tuition. Does it make sense?

(3 points) (b) Run a regression of ln(ed) on dist, female, black, hispanic, dadcoll, momcoll, ln(tuition) and report your result. Interpret the coefficient for tuition. Does it make sense? (Note: ln(ed) is the natural logarithm of ed, and ln(tuition) is the natural logarithm of tuition.)

(6 points) (c) Suppose we are interested in the causal effect of tuition on years of education completed. Considering the available variables in the data, which variables might cause omitted variable bias? Justify your answer by both economic logic and empirical evidence (i.e. regressions or tests).

(4 points) (d) After adding the possible omitted variables from (c), what does the coefficient of tuition or ln(tuition) mean?

(6 points) (e) Now we are interested in the effect of dist and parents' education on years of education completed. Generate a dummy variable for those whose fathers are not college graduates (named dadnoncoll) and a dummy variable for those whose mothers are not college graduates (named momnoncoll). Run a regression of ed on dist, female, black, hispanic, dadnoncoll, momnoncoll, tuition and report your result. Interpret the coefficients for dist, dadnoncoll and momnoncoll.

(4 points) (f) Does the effect of dist on ed depend on the father's and the mother's education background? Use regression(s) and test(s) to justify your discussion.

(6 points) (g) Now we are interested in the effect of ethnic group on years of education completed. Based on the regression in (a), how should this effect be interpreted? Does this effect depend on parents' education background? If so, how? Justify your answer by regression(s)/test(s).

Answer: The regression results are reported in Table 1.

Table 1

               Model 1     Model 2     Model 3     Model 4     Model 5     Model 6
(Intercept)    13.530 ***  2.608 ***   2.585 ***   15.189 ***  13.238 ***  13.518 ***
               (0.117)     (0.004)     (0.006)     (0.136)     (0.133)     (0.119)
dist           -0.047 ***  -0.003 ***  -0.003 ***  -0.047 ***  -0.049 ***  -0.047 ***
               (0.013)     (0.001)     (0.001)     (0.013)     (0.015)     (0.013)
female         0.043       0.003       0.005       0.043       0.066       0.045
               (0.056)     (0.004)     (0.004)     (0.056)     (0.056)     (0.056)
black          -0.371 ***  -0.025 ***  -0.019 ***  -0.371 ***  -0.288 ***  -0.356 ***
               (0.069)     (0.005)     (0.005)     (0.069)     (0.070)     (0.075)
hispanic       -0.012      0.001       0.006       -0.012      0.058       0.013
               (0.085)     (0.006)     (0.006)     (0.085)     (0.086)     (0.094)
dadcoll        0.992 ***   0.071 ***   0.061 ***               0.759 ***   1.018 ***
               (0.080)     (0.006)     (0.006)                 (0.100)     (0.094)
momcoll        0.667 ***   0.047 ***   0.042 ***               0.667 ***   0.670 ***
               (0.091)     (0.006)     (0.006)                 (0.118)     (0.109)
tuition        0.149                               0.149       0.124       0.154
               (0.103)                             (0.103)     (0.103)     (0.103)
lntuition                  0.013 **    0.012 **
                           (0.006)     (0.006)
incomehi                               0.029 ***               0.407 ***
                                       (0.005)                 (0.069)
ownhome                                0.018 ***               0.248 ***
                                       (0.005)                 (0.072)
dadnoncoll                                         -0.992 ***
                                                   (0.080)
momnoncoll                                         -0.667 ***
                                                   (0.091)
ddedu                                                          0.069 *
                                                               (0.036)
dmedu                                                          -0.051
                                                               (0.052)
bdedu                                                                      -0.147
                                                                           (0.248)
bmedu                                                                      0.047
                                                                           (0.241)
hdedu                                                                      -0.070
                                                                           (0.237)
hmedu                                                                      -0.134
                                                                           (0.313)
N              3796        3796        3796        3796        3796        3796
Fstatistics                                                    1.9949      0.1854
Pr(>F)                                                         0.1362      0.9461
R2             0.108       0.109       0.121       0.108       0.121       0.109
Standard errors are heteroskedasticity robust. *** p < 0.01; ** p < 0.05; * p < 0.1.

(a) The result is reported in column (1). The coefficient of tuition is 0.149, suggesting that, holding other variables constant, when the average state 4-year college tuition is $1000 higher, years of education completed are 0.149 year higher. This can make sense in reality: high average tuition often reflects a large number of excellent universities, which may lead people to pursue more years of education. However, the estimate is not statistically significant and we cannot reject that the effect is actually zero.

(b) The result is reported in column (2). The coefficient of lntuition is 0.013. Holding other variables constant, when the average state 4-year college tuition increases by 1%, years of education completed increase by 0.013%. The estimate is statistically significant at the 5% level.

(c) Variables indicating the income level might cause omitted variable bias. Two variables in the data proxy for the income level, incomehi and ownhome. Adding the two variables to the regression, we re-estimate it and report the result in column (3). The coefficients of both variables are positive and statistically significant, while the coefficient of lntuition becomes slightly smaller (0.012 versus 0.013) and its significance weakens somewhat, though Table 1 still reports it as significant at the 5% level. This suggests that, without the two income-related variables, the tuition coefficient is biased upward. The economic logic is that tuition fees in rich regions are higher, while households in rich regions tend to invest more in children's education.

(d) In column (3), the coefficient 0.012 means that, holding incomehi, ownhome and the other controls constant, when the average state 4-year college tuition increases by 1%, years of education completed increase by 0.012%. The estimate is statistically significant at the 5% level.

(e) The result is reported in column (4). The coefficient of dist is -0.047, which suggests that if the individual lives 10 miles closer to a 4-year college, his/her years of education completed will be 0.047 year higher. For those whose fathers are not college graduates, years of education completed are 0.99 year lower; for those whose mothers are not college graduates, years of education completed are 0.67 year lower. All these estimates are statistically significant at the 1% level.
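
Because dadnoncoll = 1 − dadcoll and momnoncoll = 1 − momcoll, column (4) is a reparameterization of column (1): the two parental coefficients flip sign and the intercept absorbs both college premiums. The sketch below checks this against the Table 1 estimates; the dictionary names are illustrative assumptions.

```python
# Estimates copied from Table 1.
model1 = {"intercept": 13.530, "dadcoll": 0.992, "momcoll": 0.667}
model4 = {"intercept": 15.189, "dadnoncoll": -0.992, "momnoncoll": -0.667}

# Substituting dadcoll = 1 - dadnoncoll and momcoll = 1 - momnoncoll into
# Model 1 moves both college premiums into the intercept and flips their signs.
implied_intercept = model1["intercept"] + model1["dadcoll"] + model1["momcoll"]
implied_dad = -model1["dadcoll"]
implied_mom = -model1["momcoll"]

print(f"implied intercept: {implied_intercept:.3f} "
      f"(reported {model4['intercept']})")
print(f"implied dadnoncoll: {implied_dad}, implied momnoncoll: {implied_mom}")
```

This is why every other coefficient, the standard errors on the parental dummies and the R2 are identical in columns (1) and (4).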

(f) In column (5), I add the interaction terms between dist and dadcoll (ddedu) and between dist and momcoll (dmedu) to the regression model. (It is totally fine for you to choose any model as the baseline model to add these interaction terms.) As the result shows, only the interaction term between dist and dadcoll is statistically significant, and only at the 10% level, which suggests the impact of distance to a 4-year college might depend on the father's education but not on the mother's. To investigate further, I conduct an F-test of whether the effect of distance to a 4-year college depends on either parent's education, i.e., whether the coefficients of both interaction terms are jointly equal to zero. The p-value is 0.1362, so I cannot reject the null hypothesis that the effect of dist does not depend on parents' education.

(g) By the result of (a), the coefficient of black is -0.371, which means that, holding other factors constant, a black individual completes 0.371 year less education. The coefficient of hispanic (-0.012, standard error 0.085) is not statistically significant. To test whether the effect of ethnic group depends on parents' education, I add four interaction terms to the regression in (a) (you can also use the regression from (b), (c) or (d)): bmedu (black × momcoll), bdedu (black × dadcoll), hdedu (hispanic × dadcoll) and hmedu (hispanic × momcoll). None of the four coefficient estimates is statistically significant. Then I conduct a joint hypothesis test of whether the four coefficients are jointly equal to zero. The F-statistic is 0.185 and the p-value is 0.946 (as reported in the bottom panel of Table 1), so we cannot reject the hypothesis that the effect of ethnic group does not depend on parents' education.

3.2 We study health insurance, health status, age, and employment using a random sample of more than 8000 workers in the United States surveyed in 1996. Please download the data set insurance.dta from Moodle to finish the question. Here is the description of the related variables:

Insured  : health insurance binary variable, 1 = Insured
healthy  : self-reported health status binary variable, 1 = Healthy
selfemp  : employment binary variable, 1 = Self Employed
age      : age in years
age2     : age^2
familysz : family size
male     : sex binary variable, 1 = Male
married  : married binary variable, 1 = Married
deg_nd   : education binary variable, 1 = No degree
deg_ged  : education variable, 1 = GED (High School Equivalent)
deg_hs   : education variable, 1 = High School
deg_ba   : education variable, 1 = Bachelor
deg_ma   : education variable, 1 = Masters
deg_phd  : education variable, 1 = Ph.D.
deg_oth  : education variable, 1 = Other
race_bl  : race binary variable, 1 = Black
race_wht : race binary variable, 1 = White
race_ot  : race binary variable, 1 = Other than Black or White

For the following questions, please use observations from those who report their

health status as healthy only.

Answer: All regression results are presented in the following table

                  Model 1      Model 2      Model 3      Model 4
(Intercept)       0.3391 ***   -0.5272 ***  -0.3157 ***  0.4527 ***
                  (0.0577)     (0.1968)     (0.1125)     (0.0299)
selfemp1          -0.1795 ***  -1.2452 ***  -0.7091 ***  -0.2822 ***
                  (0.0144)     (0.0858)     (0.0492)     (0.0319)
age               0.0097 ***   0.0260 ***   0.0151 ***   0.0036 ***
                  (0.0028)     (0.0032)     (0.0018)     (0.0004)
age2              -0.0001 **
                  (0.0000)
familysz          -0.0183 ***  -0.1041 ***  -0.0595 ***  -0.0183 ***
                  (0.0033)     (0.0208)     (0.0121)     (0.0033)
male              -0.0399 ***  -0.3069 ***  -0.1646 ***  -0.0395 ***
                  (0.0082)     (0.0634)     (0.0355)     (0.0082)
married           0.1441 ***   1.0212 ***   0.5754 ***   0.1348 ***
                  (0.0104)     (0.0731)     (0.0409)     (0.0104)
deg_ged           0.1470 ***   0.6801 ***   0.4106 ***   0.1485 ***
                  (0.0288)     (0.1488)     (0.0877)     (0.0287)
deg_hs            0.2444 ***   1.2873 ***   0.7625 ***   0.2461 ***
                  (0.0169)     (0.0835)     (0.0493)     (0.0168)
deg_ba            0.3072 ***   1.8568 ***   1.0765 ***   0.3123 ***
                  (0.0178)     (0.1126)     (0.0631)     (0.0177)
deg_ma            0.3256 ***   2.2679 ***   1.2812 ***   0.3282 ***
                  (0.0195)     (0.2030)     (0.1029)     (0.0194)
deg_phd           0.3548 ***   2.5232 ***   1.4108 ***   0.3553 ***
                  (0.0270)     (0.3858)     (0.1929)     (0.0272)
deg_oth           0.2819 ***   1.5989 ***   0.9232 ***   0.2853 ***
                  (0.0207)     (0.1456)     (0.0810)     (0.0206)
race_wht1         0.0306 **    0.2192 **
                  (0.0137)     (0.0919)
race_ot1          -0.0248      -0.1751
                  (0.0271)     (0.1784)
reg_ne            -0.0130      -0.1226      -0.0641      -0.0114
                  (0.0116)     (0.1020)     (0.0562)     (0.0116)
reg_so            -0.0449 ***  -0.3857 ***  -0.2093 ***  -0.0443 ***
                  (0.0105)     (0.0879)     (0.0486)     (0.0105)
reg_we            -0.0556 ***  -0.4365 ***  -0.2396 ***  -0.0541 ***
                  (0.0120)     (0.0932)     (0.0520)     (0.0120)
race_wht                                    0.1287 **    0.0292 **
                                            (0.0526)     (0.0137)
race_ot                                     -0.1077      -0.0256
                                            (0.1021)     (0.0271)
selfemp1:married                                         0.1384 ***
                                                         (0.0354)
N                 8173         8173         8173         8173
R2                0.1484                                 0.1503
AIC               6727.8681    6868.5232    6865.1613    6709.6267
BIC               6861.0314    6987.6692    6984.3073    6842.7899
Pseudo R2                      0.2164       0.2170
Standard errors are heteroskedasticity robust. *** p < 0.01; ** p < 0.05; * p < 0.1.

(4 points) (a) Estimate a linear probability model with insured as the depen-

dent variable and the following regressors: selfemp age age2 deg_ged deg_hs deg_ba

deg_ma deg_phd deg_oth race_wh race_ot reg_ne reg_so reg_we male married.
How does health insurance status vary with age? Is there a nonlinear relationship

between the probability of being insured and age?

Answer: The coefficient on the linear term age is positive while the coefficient on the quadratic term age2 is negative. The probability of being insured increases with age, but the effect of an additional year of age on health insurance status declines with age, so the effect is greater for young people than for old people. There is a nonlinear relationship between the probability of being insured and age, since the coefficient on the quadratic term age2 is statistically significant at the 5% level.

(4 points) (b) Now drop the variable age2 and estimate a logit model using the remaining regressors. How does health insurance status vary with age in this model? Are the self-employed less likely to have health insurance than wage earners? How does a white individual differ from a black individual in terms of having insurance? (Note: race_bl is omitted from the regression.)

Answer: The marginal effects are estimated in the following table.

From the marginal effect calculation, if the individual is one year older, the probability of having health insurance is higher by 0.3 percentage points. Compared with wage earners, holding other factors constant, self-employed individuals are 19.8 percentage points less likely to have health insurance. Compared with a black individual, holding other factors constant, a white individual is 3 percentage points more likely to have health insurance. In the logit column of the regression table, the age and self-employment coefficients are statistically significant at the 1% level and the race_wht coefficient at the 5% level.
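
In a logit model the effect of a regressor on the probability depends on where the individual sits on the logistic curve. The sketch below illustrates this by turning the Model 2 coefficients from the regression table into predicted probabilities. The profile is a hypothetical assumption chosen for illustration, not a sample value, and the shortened keys `selfemp` and `race_wht` stand in for the table's `selfemp1` and `race_wht1`.

```python
from math import exp

# Logit coefficients copied from Model 2 of the regression table.
LOGIT = {
    "intercept": -0.5272, "selfemp": -1.2452, "age": 0.0260,
    "familysz": -0.1041, "male": -0.3069, "married": 1.0212,
    "deg_hs": 1.2873, "race_wht": 0.2192,
}


def logistic(index):
    """Logistic CDF that maps the logit index into a probability."""
    return 1.0 / (1.0 + exp(-index))


def prob_insured(person):
    """Predicted probability for a dict of regressor values (missing = 0)."""
    index = LOGIT["intercept"]
    index += sum(LOGIT[name] * value for name, value in person.items())
    return logistic(index)


# Hypothetical profile (an assumption): a 35-year-old married white man with
# a high school degree and a family of three, living in the omitted region.
wage_earner = {"age": 35, "familysz": 3, "male": 1, "married": 1,
               "deg_hs": 1, "race_wht": 1, "selfemp": 0}
self_employed = dict(wage_earner, selfemp=1)

p_wage = prob_insured(wage_earner)
p_self = prob_insured(self_employed)
print(f"wage earner: {p_wage:.3f}, self-employed: {p_self:.3f}, "
      f"difference: {p_self - p_wage:+.3f}")
```

The difference for this one profile need not equal the marginal effect reported in the answer, which summarizes the effect over the sample rather than at a single hypothetical point.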

(4 points) (c) Estimate a probit model using the same regressors as in (b). In terms of having health insurance, how does being self-employed change the outcome for a white individual aged 25? How about a white individual aged 35? Is the effect of self-employment on insurance different for married workers than for unmarried workers?

Answer: The regression outcome is reported in the third column of the regression table, and the following tables present the marginal effects based on the probit model.

For white individuals aged 25, the probability of having health insurance is 22 percentage points lower if they are self-employed, compared with wage earners. For white individuals aged 35, the probability of having health insurance is 20 percentage points lower if they are self-employed, compared with wage earners. For married individuals, the probability of having health insurance is 17.3 percentage points lower if they are self-employed, compared with wage earners; for unmarried individuals it is 23.7 percentage points lower. Therefore, the effect of self-employment on insurance differs by 6.4 percentage points between married and unmarried individuals.

(3 points) (d) Use a linear probability model to answer the question: Is the effect of self-employment on insurance different for married workers than for unmarried workers? Is your answer consistent with the answer in (c)?

Answer: The result is reported in the fourth column of the regression table. The estimate for the interaction term between self-employment and marital status is 0.1384, statistically significant at the 1% level. It suggests that the effect of self-employment on insurance differs by 13.8 percentage points between married and unmarried individuals. The direction agrees with (c), since the self-employment penalty is smaller for married individuals in both models, but the difference is much larger than (c) suggests.
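
The comparison between (c) and (d) comes down to a few additions. The sketch below lays them out, assuming the Model 4 estimates from the regression table and the probit marginal effects quoted in the answer to (c); the dictionary names are illustrative.

```python
# Linear probability model estimates copied from Model 4 of the regression table.
b_selfemp = -0.2822
b_selfemp_married = 0.1384

# Effect of self-employment on the probability of being insured, by marital status.
lpm_effect = {
    "unmarried": b_selfemp,
    "married": b_selfemp + b_selfemp_married,
}

# Probit marginal effects quoted in the answer to (c), for comparison.
probit_effect = {"unmarried": -0.237, "married": -0.173}

for status in ("unmarried", "married"):
    print(f"{status}: LPM {lpm_effect[status]:+.4f}, "
          f"probit {probit_effect[status]:+.3f}")

lpm_gap = lpm_effect["married"] - lpm_effect["unmarried"]
probit_gap = probit_effect["married"] - probit_effect["unmarried"]
print(f"married-unmarried gap: LPM {lpm_gap:+.4f}, probit {probit_gap:+.3f}")
```

Under these numbers the linear probability model puts the self-employment effect at about -0.28 for unmarried and -0.14 for married workers, so its gap is roughly twice the probit gap.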
