AM Sample Questions-Statsthon
The labor union at a factory recently received complaints that women workers were paid
unfairly lower wages than men. To investigate whether gender discrimination existed, the
labor union selected a sample of 100 workers and collected information about their hourly
wage, education and gender. In Table 1, wage is the hourly wage of a worker in dollars; educ
is the total years of education of a worker; female indicates whether a worker is female (=1)
or male (=0).
With the information in Table 1, how can gender inequality be tested? Experts at the labor
union suggested using regression with a dummy variable. In general, the following steps are
involved.
$$wage = \beta_0 + \beta_1\, educ + u \tag{1}$$
In Equation (1), $u$ is the error term; $\beta_0$ and $\beta_1$ are the coefficients to be
estimated. They can be calculated using Equations (2).
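$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(educ_i - \overline{educ})(wage_i - \overline{wage})}{\sum_{i=1}^{n}(educ_i - \overline{educ})^2}, \qquad \hat{\beta}_0 = \overline{wage} - \hat{\beta}_1\,\overline{educ} \tag{2}$$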
In Equations (2), $educ_i$ and $wage_i$ represent the total years of education and the hourly
wage of worker $i$; $\overline{educ}$ and $\overline{wage}$ represent the means of $educ_i$ and
$wage_i$ over all $n = 100$ workers in the sample. Hence, by using data in Table 1, Equation (3)
is obtained.
$$\widehat{wage} = \hat{\beta}_0 + \hat{\beta}_1\, educ \tag{3}$$
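Equations (2) translate directly into code. The following is a minimal sketch in plain Python, assuming the data are two equal-length lists; the six observations are made up for illustration and are not the Table 1 sample, and the function name `ols_simple` is hypothetical.

```python
def ols_simple(educ, wage):
    """Return (beta0_hat, beta1_hat) for wage = beta0 + beta1 * educ + u, per Equations (2)."""
    n = len(educ)
    educ_bar = sum(educ) / n
    wage_bar = sum(wage) / n
    # Numerator: sum of cross-deviations of educ and wage from their means.
    sxy = sum((x - educ_bar) * (y - wage_bar) for x, y in zip(educ, wage))
    # Denominator: sum of squared deviations of educ from its mean.
    sxx = sum((x - educ_bar) ** 2 for x in educ)
    beta1_hat = sxy / sxx
    beta0_hat = wage_bar - beta1_hat * educ_bar
    return beta0_hat, beta1_hat

# Hypothetical data (NOT Table 1): years of education and hourly wage in dollars.
educ = [10, 12, 12, 14, 16, 18]
wage = [4.0, 5.0, 5.5, 6.0, 7.5, 8.0]

b0, b1 = ols_simple(educ, wage)
print(f"wage_hat = {b0:.4f} + {b1:.4f} * educ")
```

Because $\hat{\beta}_0$ is defined through the sample means, the fitted line always passes through the point $(\overline{educ}, \overline{wage})$, and the residuals sum to zero.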
$$wage = \beta_0 + \beta_1\, educ + \delta_0\, female + u \tag{4}$$
In Equation (4), $\delta_0$, the coefficient on female, represents the wage difference directly
resulting from gender, holding education fixed. If $\delta_0 < 0$, female workers earn less than
male workers with the same education; if $\delta_0 > 0$, they earn more. By using data in
Table 1, Equation (5) is obtained.
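$$\widehat{wage} = \hat{\beta}_0 + \hat{\beta}_1\, educ + \hat{\delta}_0\, female \tag{5}$$
$$\begin{aligned} \text{female: } \widehat{wage} &= (\hat{\beta}_0 + \hat{\delta}_0) + \hat{\beta}_1\, educ \\ \text{male: } \widehat{wage} &= \hat{\beta}_0 + \hat{\beta}_1\, educ \end{aligned} \tag{6}$$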
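The dummy-variable model can be reproduced with a least-squares fit in which female enters as an ordinary 0/1 column. The sketch below assumes NumPy and a made-up eight-worker sample (not Table 1). It shows the two fitted lines sharing one slope and differing only in the intercept, by the estimated coefficient on female.

```python
import numpy as np

# Hypothetical sample (NOT Table 1): years of education, female dummy, hourly wage.
educ   = np.array([10, 12, 14, 16, 12, 14, 16, 18], dtype=float)
female = np.array([ 1,  1,  1,  1,  0,  0,  0,  0], dtype=float)
wage   = np.array([4.0, 5.1, 5.9, 7.0, 6.2, 7.0, 8.1, 9.0])

# Design matrix for wage = beta0 + beta1*educ + delta0*female + u.
X = np.column_stack([np.ones_like(educ), educ, female])
coef, *_ = np.linalg.lstsq(X, wage, rcond=None)
beta0, beta1, delta0 = coef  # delta0 is the coefficient on female

# Same slope for both genders; the intercepts differ by delta0.
print(f"female: wage_hat = {beta0 + delta0:.3f} + {beta1:.3f} * educ")
print(f"male:   wage_hat = {beta0:.3f} + {beta1:.3f} * educ")
print(f"coefficient on female = {delta0:.3f}")
```

For this made-up sample the coefficient on female is about -1.11. Holding education fixed, the hypothetical female workers therefore earn about 1.11 dollars per hour less than the male workers.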
You may wonder why the experts did not add another variable, male (=1 if a worker is male;
=0 otherwise), to the base model. It would be redundant: since $male = 1 - female$, male is an
exact linear function of female. If both male and female were included in a model with an
intercept, $male + female = 1$ for every worker, which duplicates the intercept column, so the
coefficients could not be estimated separately (perfect multicollinearity). This is the
so-called dummy variable trap. So, if there are 3 categories of gender (i.e., male, female,
transgender), how many dummy variables should be included? The answer is 2.
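$$female = \begin{cases} 1, & \text{if female} \\ 0, & \text{otherwise} \end{cases} \qquad transgender = \begin{cases} 1, & \text{if transgender} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$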
A third dummy variable would cause perfect multicollinearity. Accordingly, if a worker is
female, then $female = 1$ and $transgender = 0$; if a worker is male, then $female = 0$ and
$transgender = 0$; if a worker is transgender, then $female = 0$ and $transgender = 1$.
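The $k - 1$ rule and the dummy variable trap can be checked numerically. The sketch below assumes NumPy and a made-up six-worker sample (not Table 1). It encodes gender with male as the base category and compares the rank of the design matrix with its number of columns. `make_dummies` is a hypothetical helper, and the dummy name `transgender` is an illustrative label.

```python
import numpy as np

def make_dummies(values, levels, base):
    """Return (names, matrix): one 0/1 column per level except `base`.

    With a base level, k levels give k - 1 columns and the base is coded as all zeros.
    Passing base=None keeps all k columns (used here only to show the trap).
    """
    kept = [lvl for lvl in levels if lvl != base]
    matrix = np.array([[1.0 if v == lvl else 0.0 for lvl in kept] for v in values])
    return kept, matrix

# Hypothetical sample (NOT Table 1).
gender = ["female", "male", "transgender", "female", "male", "transgender"]
educ = np.array([12, 16, 10, 14, 12, 18], dtype=float)
levels = ["male", "female", "transgender"]

# k - 1 = 2 dummies with male as the base: full column rank, estimable.
names, D = make_dummies(gender, levels, base="male")
X_ok = np.column_stack([np.ones(len(gender)), educ, D])

# All k = 3 dummies plus an intercept: the dummies sum to the intercept column.
_, D_all = make_dummies(gender, levels, base=None)
X_trap = np.column_stack([np.ones(len(gender)), educ, D_all])

print(names)                                           # ['female', 'transgender']
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])      # 4 4
print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # 4 5 -> dummy variable trap
```

When the rank is smaller than the number of columns, $X^\top X$ is singular, so the least-squares coefficients have no unique solution.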
$$wage = \beta_0 + \beta_1\, educ + \delta_0\, female + \delta_1\, (female \times educ) + u \tag{8}$$
In Equation (8), $\delta_1$ represents the wage difference resulting from the interaction
between gender and education. In other words, Equation (8) allows the effect of education on
wage to differ between female and male workers. By using data in Table 1, Equation (9) is
obtained.
In Equation (9), none of the estimated coefficients is statistically significant except one;
in particular, the interaction coefficient is not significant. Thus, at this factory, there is
no significant evidence that gender changes the effect of education on wage, and the
interaction term does not need to be included in the model.
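The interaction model can be fitted by adding the product female × educ as an extra column. The minimal sketch below assumes NumPy and made-up data (not Table 1). It estimates the four coefficients by least squares, computes conventional OLS standard errors and t-statistics, and reports the implied education slope for each gender.

```python
import numpy as np

# Hypothetical data (NOT Table 1).
educ   = np.array([10, 12, 14, 16, 18, 10, 12, 14, 16, 18], dtype=float)
female = np.array([ 1,  1,  1,  1,  1,  0,  0,  0,  0,  0], dtype=float)
wage   = np.array([4.1, 4.9, 6.2, 6.8, 8.1, 5.0, 6.1, 6.9, 8.2, 9.0])

# Columns: intercept, educ, female, female*educ (the interaction term).
X = np.column_stack([np.ones_like(educ), educ, female, female * educ])
n, p = X.shape
coef, *_ = np.linalg.lstsq(X, wage, rcond=None)

resid = wage - X @ coef
sigma2 = resid @ resid / (n - p)                        # error-variance estimate
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))  # OLS standard errors
t_stats = coef / se                                     # compare |t| with a t(n - p) critical value

beta0, beta1, delta0, delta1 = coef
print(f"male slope   = {beta1:.3f}")
print(f"female slope = {beta1 + delta1:.3f}")
for name, b, t in zip(["beta0", "beta1", "delta0", "delta1"], coef, t_stats):
    print(f"{name:>6}: {b:8.3f}  t = {t:6.2f}")
```

With 10 observations and 4 coefficients, each t-statistic has 6 degrees of freedom. At the 5% level, a coefficient is significant when |t| exceeds about 2.447. In this made-up sample the female and male slopes are nearly equal, which is the situation in which the interaction term can be dropped.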
To conclude, to test the influence of a categorical factor, a dummy variable needs to be added
to the model as a main factor and/or an interaction term. In this example, gender is a
categorical factor with 2 categories, so the experts at the labor union included one dummy
variable in the base model. If a categorical factor has $k$ categories, then $k - 1$ dummy
variables should be added. Moreover, in practice, more independent (explanatory) variables can
be added to the base model to increase its credibility.
3. Using a sample from another factory, the estimated coefficient on female in Equation (4)
equals -1.5243, and its p-value is 0.000. What does it mean?
A. The hourly wage of male workers is 1.5243 dollars less than that of female workers
on average.
B. The hourly wage of female workers is 1.5243 dollars less than that of male workers
on average.
C. The hourly wage of male workers with higher education is 1.5243 dollars less than
that of female workers on average.
D. The hourly wage of female workers with higher education is 1.5243 dollars less than
that of female workers on average.
4. According to Equation (9) in the material, which of the following statements is correct?
(1) For female workers, ܽ ൌ ꋌ ൌ ͵ ᗈᖥ绐
(2) For male workers, ܽ ൌ͵ ൌ ᗈᖥ绐
(3) For female workers, ܽ ൌ͵ ൌ ᗈᖥ绐
(4) For male workers, ܽ ൌ ꋌ ൌ ͵ ᗈᖥ绐
A. Only (1)
B. Only (3)
C. Both (1) and (2)
D. Both (3) and (4)
5. If the race of workers in the material can be categorized as White, African, Asian,
Hispanic and Others, how many dummy variables should be included in the base model to
investigate the influence of race on workers' wages?
A. 1
B. 5
C. 4
D. 6
9. How do you think the race dummy variable should be added to the base model?
A. Only as main factor
B. Only as interaction term with education
C. Both main factor and interaction term with education
D. Not included into the base model
10. Do you think non-white workers face racial discrimination at the factory?
(1) Yes, because the direct impact of race on wage is significant.
(2) Yes, because the impact of education on wage is larger for white workers.
(3) No, because both lines are increasing monotonically.
A. Only (1)
B. Only (2)
C. Only (3)
D. Both (1) and (2)