
AM Sample Questions-Statsthon

The labor union at a factory investigated potential gender wage inequity by analyzing a sample of 100 workers. Experts built a base regression model of hourly wage predicted by years of education. A dummy variable was then added to indicate female (1) or male (0) gender. The results showed education significantly predicted higher wages but being female predicted wages $1.91 lower than males, indicating unfair treatment of female workers. Adding an interaction term showed gender did not influence the effect of education on wages.

Uploaded by

Saad Hasan

Sample Questions

Statsthon - Academic Marathon in Statistics

How to Identify Wage Inequity with Dummy Variable?

The labor union at a factory recently received complaints that women workers were paid
unfairly lower wages than men. To investigate whether gender discrimination existed, the
labor union selected a sample of 100 workers and collected information about their hourly
wage, education and gender. In Table 5.1, wage is the hourly wage of a worker in dollars;
educ is the total years of education of a worker; female indicates whether a worker is
female (=1) or male (=0).

With the information in Table 5.1, how can gender inequality be tested? Experts at the
labor union suggested using regression with a dummy variable. In general, the following
steps are involved.

(1) Build a Base Model


Before introducing the dummy variable, they built a base model to explain how wage is determined.
Undoubtedly, the wage of a worker can be influenced by many factors. However, to simplify
the model, they used only educ as the independent variable, so the base model was a simple
(one-regressor) linear regression.

$$wage = \beta_0 + \beta_1\,educ + \varepsilon \tag{1}$$

In Equation (1), $\varepsilon$ is the error term, and $\beta_0$ and $\beta_1$ are the coefficients to be
estimated. They can be calculated by using Equations (2).

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (educ_i - \overline{educ})(wage_i - \overline{wage})}{\sum_{i=1}^{n} (educ_i - \overline{educ})^2}, \qquad \hat{\beta}_0 = \overline{wage} - \hat{\beta}_1\,\overline{educ} \tag{2}$$

In Equations (2), $educ_i$ and $wage_i$ represent the total years of education and the hourly wage
of worker $i$. $\overline{educ}$ and $\overline{wage}$ represent the means of educ and wage over all workers in
the sample. Hence, by using the data in Table 5.1, Equation (3) is obtained.

$$\widehat{wage} = \hat{\beta}_0 + 0.9834\,educ \tag{3}$$

In Equation (3), the p-value of the intercept estimate $\hat{\beta}_0$ is 0.221, indicating that $\beta_0$ is not
statistically significant. However, $\hat{\beta}_1 = 0.9834$ and its p-value is 0.000, indicating that $\beta_1$ is
statistically significant. It means that with a one-year increase in education, the hourly wage of
workers increases by 0.9834 dollars on average. Figure 5.1(a) depicts the relationship between
education and wage, which is consistent with Equation (3).
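The closed-form estimates of the intercept and slope in a one-regressor model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `DATA` and `simple_ols` are made up here, and the data are just the first 20 rows of Table 5.1, so the printed estimates are not expected to match the full-sample Equation (3).

```python
# First 20 rows of Table 5.1 as (wage, educ) pairs. This is a subset of the
# sample, so the estimates will differ from the full-sample Equation (3).
DATA = [
    (9.00, 10), (5.50, 12), (3.80, 12), (10.50, 12), (15.00, 12),
    (9.00, 16), (9.57, 12), (15.00, 14), (11.00, 8), (5.00, 12),
    (24.98, 17), (20.40, 17), (25.00, 14), (13.98, 14), (3.50, 12),
    (5.00, 14), (10.00, 16), (15.00, 16), (5.83, 13), (9.10, 13),
]


def simple_ols(pairs):
    """Return (intercept, slope) for wage = intercept + slope * educ."""
    n = len(pairs)
    mean_wage = sum(w for w, _ in pairs) / n
    mean_educ = sum(e for _, e in pairs) / n
    # Slope: co-variation of educ and wage divided by the variation of educ.
    sxy = sum((e - mean_educ) * (w - mean_wage) for w, e in pairs)
    sxx = sum((e - mean_educ) ** 2 for _, e in pairs)
    slope = sxy / sxx
    # Intercept: makes the fitted line pass through the point of means.
    intercept = mean_wage - slope * mean_educ
    return intercept, slope


intercept, slope = simple_ols(DATA)
print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
```

A quick sanity check on any such fit is that the residuals sum to zero and are uncorrelated with educ, which follows directly from the two formulas.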

(2) Add Dummy Variable as Main Factor


After building the base model, experts at the factory decided to include female as a dummy
variable. By definition, a dummy variable is a numeric variable representing categorical data.
But how can it be included in the base model?

$$wage = \beta_0 + \beta_1\,educ + \beta_2\,female + \varepsilon \tag{4}$$

In Equation (4), $\beta_2$ represents the difference in wage directly resulting from the gender
difference. If $\beta_2 < 0$, then being female puts a worker at a wage disadvantage. If $\beta_2 > 0$, the
opposite holds. By using the data in Table 5.1, Equation (5) is obtained.

$$\widehat{wage} = \hat{\beta}_0 + 0.9834\,educ - 1.9103\,female \tag{5}$$

In Equation (5), the p-value of the intercept estimate $\hat{\beta}_0$ is 0.470, indicating that $\beta_0$ is not
statistically significant. However, the p-value of $\hat{\beta}_1$ is 0.000, indicating that $\beta_1$ is statistically
significant. It means that with a one-year increase in education, the hourly wage of workers increases
by 0.9834 dollars on average. $\hat{\beta}_2 = -1.9103$ and its p-value is 0.043, indicating that $\beta_2$ is also
statistically significant at the 0.05 level of significance. It means that the hourly wage of
female workers is 1.9103 dollars less than that of male workers with the same education, on average.
Hence, if male and female workers do the same type and amount of work, it can be inferred that
female workers are paid unfairly lower wages. Equation (5) can also be expressed as:

$$\widehat{wage} = \begin{cases} (\hat{\beta}_0 - 1.9103) + 0.9834\,educ, & female = 1 \\ \hat{\beta}_0 + 0.9834\,educ, & female = 0 \end{cases} \tag{6}$$
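A regression with an intercept, educ and a female dummy can be fitted by solving the normal equations. The sketch below is one way to do that in plain Python, not the method the experts necessarily used; the helper names `solve` and `ols` are invented for illustration, and the data are only the first 20 rows of Table 5.1, so the estimates will not reproduce the full-sample coefficients quoted in the text.

```python
# First 20 rows of Table 5.1 as (wage, educ, female).
ROWS = [
    (9.00, 10, 0), (5.50, 12, 0), (3.80, 12, 1), (10.50, 12, 1), (15.00, 12, 0),
    (9.00, 16, 1), (9.57, 12, 1), (15.00, 14, 0), (11.00, 8, 0), (5.00, 12, 1),
    (24.98, 17, 0), (20.40, 17, 0), (25.00, 14, 0), (13.98, 14, 0), (3.50, 12, 0),
    (5.00, 14, 0), (10.00, 16, 0), (15.00, 16, 0), (5.83, 13, 0), (9.10, 13, 0),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(design, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Design matrix columns: constant, educ, female dummy.
design = [[1.0, educ, female] for _, educ, female in ROWS]
wages = [wage for wage, _, _ in ROWS]
b_const, b_educ, b_female = ols(design, wages)

# At any fixed education level, the predicted female-male gap is the dummy coefficient.
male_pred = b_const + b_educ * 12
female_pred = b_const + b_educ * 12 + b_female
print(f"const = {b_const:.4f}, educ = {b_educ:.4f}, female = {b_female:.4f}")
print(f"gap at educ = 12: {female_pred - male_pred:.4f}")
```

The last two lines illustrate the point of the main-effect dummy: it shifts the fitted line up or down for female workers without changing its slope.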

You may wonder why the experts didn't add another variable male (=1 if a worker is male;
=0 otherwise) to the base model. This would be redundant. As male = 1 - female, male is an
exact linear function of female. If both female and male are included in the base model
together with the intercept, the problem of perfect multicollinearity occurs. This is the
so-called dummy variable trap. So, if there are 3 categories (i.e., male, female, transgender)
in gender, how many dummy variables should be included? The answer is 2.

$$female = \begin{cases} 1, & \text{female} \\ 0, & \text{otherwise} \end{cases} \qquad male = \begin{cases} 1, & \text{male} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

A third dummy variable would cause the problem of multicollinearity. Accordingly, if a worker
is female, then female = 1 and male = 0; if a worker is male, then female = 0 and male = 1;
if a worker is transgender, then female = 0 and male = 0.
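The encoding rule and the trap can both be checked mechanically. The following sketch assumes a hypothetical helper `dummy_columns` and a made-up list of five workers; it encodes three categories with two dummies and then shows why adding a dummy for every category alongside an intercept produces perfectly collinear columns.

```python
def dummy_columns(labels, baseline):
    """Encode labels with k - 1 dummy variables, omitting the baseline category."""
    categories = sorted(set(labels) - {baseline})
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Hypothetical workers, used only to illustrate the encoding.
workers = ["female", "male", "transgender", "female", "male"]

names, rows = dummy_columns(workers, baseline="transgender")
print(names)  # ['female', 'male']
for label, row in zip(workers, rows):
    print(label, dict(zip(names, row)))

# The trap: an intercept column plus one dummy per category. In every row the
# dummies sum to exactly 1, which equals the intercept column, so the columns
# are perfectly collinear and the coefficients cannot be separately estimated.
all_categories = sorted(set(workers))
full = [[1] + [1 if label == c else 0 for c in all_categories] for label in workers]
trapped = all(row[0] == sum(row[1:]) for row in full)
print(trapped)  # True
```

The baseline category (here transgender, matching the text) is the one whose dummies are all zero; its level is absorbed by the intercept.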

(3) Add Dummy Variable into Interaction Term


In Equation (6), female can only influence the intercept of the line in Figure 5.1(a), but not
the slope. In other words, the impact of education on wage is assumed not to depend on gender.
To further investigate the influence of female on the slope, it needs to be added in an
interaction term with educ.

$$wage = \beta_0 + \beta_1\,educ + \beta_2\,female + \beta_3\,female \cdot educ + \varepsilon \tag{8}$$

In Equation (8), $\beta_3$ represents the difference in wage resulting from the interaction between
gender and education. In other words, Equation (8) allows the influence of education on
wage to differ between female and male workers. By using the data in Table 5.1, Equation (9) is
obtained.

$$\widehat{wage} = \hat{\beta}_0 + \hat{\beta}_1\,educ + \hat{\beta}_2\,female + \hat{\beta}_3\,female \cdot educ \tag{9}$$

Unfortunately, in Equation (9), none of the estimated coefficients is statistically significant
except one, and the interaction coefficient $\hat{\beta}_3$ is not the exception. Thus, at this factory, the
impact of education on wage is not influenced by gender, and the interaction term doesn't need to
be included in the model.
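To see how an interaction term changes both the intercept and the slope, the two group-specific lines implied by Equation (8) can be written out. This is a sketch of the algebra only: the symbols $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ (for the intercept, educ, female, and female times educ, in that order) are a naming assumption made here, and no numerical values are implied.

$$
\begin{aligned}
female = 0:\quad & wage = \beta_0 + \beta_1\,educ + \varepsilon \\
female = 1:\quad & wage = (\beta_0 + \beta_2) + (\beta_1 + \beta_3)\,educ + \varepsilon
\end{aligned}
$$

So $\beta_2$ shifts the intercept for female workers and $\beta_3$ shifts the slope. When $\beta_3 = 0$ the two lines are parallel, which is the situation described by Equation (6).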

Figure 5.1 - Impact of Education on Wage (Female vs. Male)


Figure 5.1(b) depicts the relationship separately for female and male workers. The intercept for
female is significantly lower than that for male, but the slopes are not significantly different.
This finding is consistent with Equation (6).

To conclude, in order to test the influence of a categorical factor, a dummy variable needs to be
added into the model as a main factor and/or in an interaction term. In this example, gender is a
categorical factor with 2 categories, so the experts at the factory included one dummy
variable in the base model. If a categorical factor has $k$ categories, then $k - 1$ dummy
variables should be added. Moreover, in practice, more independent variables can be put into the
base model to increase the credibility of the model.

Table 5.1 - Wage, Education and Gender of Workers


No. wage educ female No. wage educ female No. wage educ female
1 9.00 10 0 36 12.65 17 1 71 11.25 12 0
2 5.50 12 0 37 11.71 16 1 72 10.00 10 1
3 3.80 12 1 38 13.00 12 0 73 4.00 13 0
4 10.50 12 1 39 12.50 15 1 74 13.51 18 0
5 15.00 12 0 40 22.50 16 0 75 6.75 12 0
6 9.00 16 1 41 11.25 16 1 76 14.29 14 1
7 9.57 12 1 42 15.38 16 0 77 9.75 15 1
8 15.00 14 0 43 7.70 13 1 78 3.35 12 1
9 11.00 8 0 44 11.84 12 0 79 3.40 8 1
10 5.00 12 1 45 5.00 14 0 80 5.62 14 1
11 24.98 17 0 46 6.50 12 0 81 7.00 12 0
12 20.40 17 0 47 9.00 8 0 82 10.00 16 1
13 25.00 14 0 48 3.35 12 0 83 6.10 10 1
14 13.98 14 0 49 4.50 12 1 84 4.17 12 1
15 3.50 12 0 50 10.53 14 1 85 4.80 12 1
16 5.00 14 0 51 3.50 9 0 86 5.75 12 0
17 10.00 16 0 52 15.00 16 0 87 4.75 12 0
18 15.00 16 0 53 8.50 12 1 88 9.37 17 0
19 5.83 13 0 54 9.17 12 1 89 5.25 12 1
20 9.10 13 0 55 9.60 9 0 90 4.95 9 1
21 11.25 17 1 56 6.88 8 1 91 19.38 16 1
22 13.00 12 0 57 13.95 18 1 92 19.98 12 0
23 8.00 14 1 58 10.00 14 0 93 4.00 12 0
24 4.28 12 1 59 3.75 12 1 94 5.00 12 1
25 7.88 16 0 60 8.00 12 1 95 10.58 14 0
26 6.94 12 1 61 11.67 12 1 96 10.00 12 0
27 24.98 17 1 62 8.89 8 1 97 7.38 14 1
28 10.00 14 0 63 4.00 12 0 98 12.50 15 0
29 13.75 14 0 64 5.50 12 1 99 4.00 16 1
30 5.62 13 1 65 3.35 16 0 100 15.03 12 1
31 14.67 14 0 66 10.81 12 1
32 24.98 16 0 67 7.00 12 0
33 12.00 12 0 68 10.00 13 0
34 15.00 18 0 69 5.40 16 1
35 5.77 16 0 70 7.50 12 1
1. By only using data of the first 20 workers in Table 5.1, please estimate $\beta_0$ and $\beta_1$ (the
intercept and the slope) in Equation (1) again.
A. ൌ and ൌ
B. ꋌൌ and ൌ ͵
C. ൌ and ൌ
D. ൌ and ൌ ͵

2. What is the cause of dummy variable trap?


A. Multicollinearity
B. Heteroscedasticity
C. Autocorrelation
D. All of the above

3. By using a sample from another factory, the estimated coefficient on female in Equation (4)
equals -1.5243 and its p-value is 0.000. What does it mean?
A. The hourly wage of male workers is 1.5243 dollars less than that of female workers
on average.
B. The hourly wage of female workers is 1.5243 dollars less than that of male workers
on average.
C. The hourly wage of male workers with higher education is 1.5243 dollars less than
that of female workers on average.
D. The hourly wage of female workers with higher education is 1.5243 dollars less than
that of female workers on average.

4. According to Equation (9) in the material, which of the following statements is correct?
(1) For female workers, ܽ ൌ ꋌ ൌ ͵ ᗈᖥ绐
(2) For male workers, ܽ ൌ͵ ൌ ᗈᖥ绐
(3) For female workers, ܽ ൌ͵ ൌ ᗈᖥ绐
(4) For male workers, ܽ ൌ ꋌ ൌ ͵ ᗈᖥ绐
A. Only (1)
B. Only (3)
C. Both (1) and (2)
D. Both (3) and (4)

5. If the race of workers in the material could be categorized into White, African, Asian,
Hispanic and Others, how many dummy variables should be included into the base model
in order to investigate the influence of race on workers’ wage?
A. 1
B. 5
C. 4
D. 6

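6. If the nationality of workers in the material could generally be classified as American,
Canadian and Others, how should the dummy variables be defined?
(1) One dummy variable equal to 1 if American and 0 otherwise, and another equal to 1 if
Others and 0 otherwise
(2) One dummy variable equal to 1 if American and 0 otherwise, and another equal to 1 if
Canadian and 0 otherwise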
A. Only (1)
B. Only (2)
C. Both (1) and (2)
D. Neither (1) nor (2)

7. For Equation ܽ ᗈᖥ绐 ܽ ܽ ᗈᖥ绐 ‫ݐ‬


ܽ ‫ݐ‬
‫ݐ‬ ᗈᖥ绐 , ܽ and ‫ݐ‬ . Which of
‫ݐ‬ ‫ݎ‬ ‫ݐ‬ ‫ݎ‬
the following is the average hourly wage for a non-white female?
A. ᗈᖥ绐
B. ᗈᖥ绐
C. ᗈᖥ绐
D. ᗈᖥ绐

Questions 8-10 are based on the following information:


Graph 1 depicts the relationship between education and wage for both white and non-white
workers at a factory.
Graph 1 - Impact of Education on Wage (White vs. Nonwhite)

8. Which of the following is correct about coefficients in the following equations?


ᗈᖥ绐 ‫ݐ‬
ܽ
ᗈᖥ绐 ݅ ݅ ‫ݐ‬
A. ,
B. ,
C. ,
D. ,

9. How do you think the dummy variable white should be added into the base model?
A. Only as main factor
B. Only as interaction term with education
C. Both main factor and interaction term with education
D. Not included into the base model

10. Do you think non-white workers face racial discrimination at the factory?
(1) Yes, because the direct impact of race on wage is significant.
(2) Yes, because the impact of education on wage is larger for white workers.
(3) No, because both lines are increasing monotonically.
A. Only (1)
B. Only (2)
C. Only (3)
D. Both (1) and (2)
