Mathematics (Statistics) Internal Assessment
Investigating differences in salaries between men and women
Research Question:
"Is there a statistically significant difference between the salaries of men and
women in the sample, and what is the magnitude of that difference?"
(Uses a synthetic but realistic dataset for demonstration)
Student: [Student Name]
Candidate Number: [Number]
Date: 2025-10-03
Word count: ~1500 (excluding appendices)
Contents
1. Introduction & Research Question
2. Data source & Sampling
3. Descriptive statistics & visualisation
4. Statistical tests (normality, variance, difference of means)
5. Effect size & Bootstrap CI
6. Regression analysis & diagnostics
7. Sectoral breakdown
8. Conclusion & Evaluation
Appendices: Dataset, Code summary
Introduction & Data
This investigation examines salary differences between men and women using a sample
dataset of employees across six sectors.
Dataset: Synthetic but constructed to mimic realistic salary determinants (experience,
education, sector-specific base pay).
Sample size: 500 employees (250 male, 250 female).
Variables included: Salary (USD), Gender, Age, Education years, Experience years, Sector.
The student should replace this dataset with real data if required by school or IB guidelines.
Descriptive Statistics - Salary (USD)
Overall:
count 500.00
mean 81376.33
std 19810.97
min 24948.93
25% 67808.75
50% 81992.72
75% 95640.57
max 132726.53
By gender:
Gender   count   mean       median     std        min        max
Female   250     78550.22   80269.68   18417.41   24948.93   118480.66
Male     250     84202.45   83910.74   20767.43   29406.45   132726.53
Interpretation:
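- Means and medians give an initial sense of the difference: the male mean exceeds the female mean by $5652.23, and the male median exceeds the female median by $3641.06.
- Standard deviations indicate spread (male 20767.43 vs female 18417.41); check the boxplots and density plots for the shape of each distribution.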
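A minimal pandas sketch of how a group summary like the one above could be produced. It assumes the data sit in a DataFrame with columns named Gender and Salary (these names are assumptions; the original code is not shown), and the toy rows are made up purely to demonstrate the method.

```python
import pandas as pd


def salary_summary_by_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Return count, mean, median, sample std, min and max of Salary per Gender."""
    return df.groupby("Gender")["Salary"].agg(
        ["count", "mean", "median", "std", "min", "max"]
    )


# Made-up rows for illustration only (not the IA dataset)
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "Salary": [50000.0, 70000.0, 60000.0, 90000.0],
})
summary = salary_summary_by_gender(toy)
mean_diff = summary.loc["Male", "mean"] - summary.loc["Female", "mean"]
print(summary)
print(f"Mean difference (Male - Female): {mean_diff:.2f}")
```

Note that pandas' std uses the sample standard deviation (ddof = 1) by default.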
[Figure] Boxplot: Salary by Gender (x-axis: Gender; y-axis: Salary (USD))
[Figure] Kernel density estimate: Salary distributions by Gender (series: Male, Female; x-axis: Salary (USD); y-axis: Density, scale 1e-5)
[Figure] Q-Q plot: Male salaries (sample) (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)
[Figure] Q-Q plot: Female salaries (sample) (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)
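A Q-Q plot pairs each ordered sample value with the corresponding quantile of a normal distribution. The sketch below uses scipy.stats.probplot on simulated salaries (the mean, SD and seed are assumptions, not IA data); points close to the fitted line, with a correlation r near 1, suggest approximate normality.

```python
import numpy as np
from scipy import stats

# Simulated salaries (illustration only, not the IA data)
rng = np.random.default_rng(0)
salaries = rng.normal(loc=80000, scale=19000, size=250)

# probplot pairs each ordered observation with a theoretical normal quantile
# and fits a least-squares line; r near 1 means the points lie close to a line
(theoretical_q, sample_q), (slope, intercept, r) = stats.probplot(salaries, dist="norm")
print(f"slope = {slope:.0f}, intercept = {intercept:.0f}, r = {r:.4f}")
```

Passing plot=plt (with matplotlib.pyplot imported as plt) to stats.probplot draws the Q-Q plot itself.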
[Figure] Histogram: Salary distributions by Gender, all data (series: Male, Female; x-axis: Salary (USD); y-axis: Frequency)
[Figure] Scatter: Salary vs Experience by Gender (series: Male, Female; x-axis: Experience (years); y-axis: Salary (USD))
[Figure] Salary vs Experience with fitted regression line, all data (x-axis: Experience (years); y-axis: Salary (USD))
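The fitted line in a plot like this is a least-squares line of Salary on Experience. A sketch with numpy.polyfit, using hypothetical points that lie exactly on a line so the recovered slope and intercept are easy to check (these points are not IA data):

```python
import numpy as np

# Hypothetical points lying exactly on a line (not the IA data)
experience = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
salary = np.array([45000.0, 60000.0, 75000.0, 90000.0, 105000.0])

# With deg=1, np.polyfit returns the least-squares [slope, intercept]
slope, intercept = np.polyfit(experience, salary, deg=1)
print(f"Salary = {intercept:.0f} + {slope:.0f} * Experience")
```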
Statistical tests and results
Normality (Shapiro-Wilk) on samples (n=200):
- Male: W = 0.9899, p = 0.1728 (p > 0.05: no evidence against normality)
- Female: W = 0.9859, p = 0.0442 (p < 0.05: normality rejected at the 5% level)
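A sketch of how Shapiro-Wilk results like these could be computed with scipy.stats.shapiro. The groups are simulated with n = 200 to mirror the sample size stated above; the means, SDs and seed are assumptions, so the printed W and p will not match the IA's values.

```python
import numpy as np
from scipy import stats

# Simulated groups of n = 200 (illustration only)
rng = np.random.default_rng(1)
groups = {
    "Male": rng.normal(84000, 20000, size=200),
    "Female": rng.normal(78500, 18500, size=200),
}

results = {}
for label, sample in groups.items():
    w, p = stats.shapiro(sample)
    results[label] = (w, p)
    verdict = "reject normality" if p < 0.05 else "no evidence against normality"
    print(f"{label}: W = {w:.4f}, p = {p:.4f} ({verdict} at the 5% level)")
```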
Levene test for equal variances:
- Stat = 2.0867, p = 0.1492. Since p > 0.05, there is no significant evidence that the variances differ (p < 0.05 would suggest unequal variances).
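scipy.stats.levene offers more than one centring option, and the IA does not state which was used, so the choice below is an assumption. The sketch runs on simulated data, not the IA sample.

```python
import numpy as np
from scipy import stats

# Simulated salaries (illustration only)
rng = np.random.default_rng(2)
male = rng.normal(84000, 20000, size=250)
female = rng.normal(78500, 18500, size=250)

# center="median" (SciPy's default) is the Brown-Forsythe variant;
# center="mean" gives Levene's original version of the test
stat, p = stats.levene(male, female, center="median")
print(f"Levene: stat = {stat:.4f}, p = {p:.4f}")
print("Variances differ at the 5% level" if p < 0.05
      else "No significant evidence that the variances differ")
```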
Welch's t-test (difference in means):
- t = 3.2196, p = 0.0014
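Welch's t statistic depends only on each group's mean, standard deviation and size, so the reported result can be checked directly against the "By gender" table with scipy.stats.ttest_ind_from_stats. This assumes the tabulated SDs are sample SDs (ddof = 1, the pandas default); with the raw data, scipy.stats.ttest_ind(male, female, equal_var=False) runs the same test.

```python
from scipy import stats

# Summary statistics from the "By gender" table (assumed to be sample SDs)
res = stats.ttest_ind_from_stats(
    mean1=84202.45, std1=20767.43, nobs1=250,  # Male
    mean2=78550.22, std2=18417.41, nobs2=250,  # Female
    equal_var=False,                           # Welch: no equal-variance assumption
)
print(f"t = {res.statistic:.4f}, p = {res.pvalue:.4f}")
```

With the rounded table values this gives t = 3.2196 and p of about 0.0014, consistent with the reported Welch result.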
Mann-Whitney U test (non-parametric):
- U = 36129.00, p = 0.0025
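A sketch of the Mann-Whitney U test with scipy.stats.mannwhitneyu on simulated data. SciPy reports U for the first sample passed; if male salaries were passed first (an assumption about the original code), the reported U = 36129 lies above its null mean n1 * n2 / 2 = 31250, which indicates male salaries tend to rank higher.

```python
import numpy as np
from scipy import stats

# Simulated salaries (illustration only)
rng = np.random.default_rng(3)
male = rng.normal(84000, 20000, size=250)
female = rng.normal(78500, 18500, size=250)

res = stats.mannwhitneyu(male, female, alternative="two-sided")
n1, n2 = len(male), len(female)
# U is reported for the first sample; under H0 its mean is n1 * n2 / 2
print(f"U = {res.statistic:.2f} (H0 mean {n1 * n2 / 2:.0f}), p = {res.pvalue:.4f}")
```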
Cohen's d (effect size):
- d = 0.288, a small effect (benchmarks: small ~0.2, medium ~0.5, large ~0.8)
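Cohen's d divides the difference in means by a pooled standard deviation, s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2). The pooled-SD definition is an assumption about how d was computed in the IA; a minimal sketch using the table values:

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Values from the "By gender" table: Male first, Female second
d = cohens_d(84202.45, 20767.43, 250, 78550.22, 18417.41, 250)
print(f"d = {d:.3f}")
```

This gives d of about 0.288, matching the reported value. Because both groups have 250 members, the pooled variance here is simply the average of the two variances.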
Bootstrap 95% CI for mean difference (Male - Female):
- 95% CI = [2266.28, 9082.10]
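The IA does not state how the bootstrap was run, so the sketch below assumes the percentile method, resampling each group independently with replacement; the number of resamples (5000), the seed and the simulated data are assumptions.

```python
import numpy as np


def bootstrap_mean_diff_ci(a, b, n_boot=5_000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b); each group is resampled
    independently with replacement, keeping its own sample size."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Each row of idx_a / idx_b indexes one bootstrap resample
    idx_a = rng.integers(0, a.size, size=(n_boot, a.size))
    idx_b = rng.integers(0, b.size, size=(n_boot, b.size))
    diffs = a[idx_a].mean(axis=1) - b[idx_b].mean(axis=1)
    alpha = (1 - level) / 2
    lo, hi = np.quantile(diffs, [alpha, 1 - alpha])
    return lo, hi


# Simulated salaries (illustration only)
rng = np.random.default_rng(4)
male = rng.normal(84000, 20000, size=250)
female = rng.normal(78500, 18500, size=250)
lo, hi = bootstrap_mean_diff_ci(male, female)
print(f"95% CI = [{lo:.2f}, {hi:.2f}]; contains 0: {lo <= 0 <= hi}")
```

If the interval excludes 0, as the IA's interval [2266.28, 9082.10] does, the data are consistent with a non-zero mean difference.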
Multiple regression: Salary ~ Gender + Experience + Education + Sector dummies (reference sector: Education, the sector without a dummy)
Variable               Coefficient   Std Err    t        P>|t|
const                  26725.344     2935.503    9.104   0.0000
Gender_dummy            4962.346      715.441    6.936   0.0000
Experience_years        1441.462       29.833   48.318   0.0000
Education_years          984.447      172.591    5.704   0.0000
Sector_Finance          9808.235     1212.354    8.090   0.0000
Sector_Healthcare       5726.580     1197.051    4.784   0.0000
Sector_Manufacturing    4091.287     1369.163    2.988   0.0029
Sector_Retail          -1459.568     1343.454   -1.086   0.2778
Sector_Technology      10935.071     1173.084    9.322   0.0000
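A NumPy/pandas sketch of this kind of model, fitted on simulated data with known effects (not the IA dataset). Column names and effect sizes are assumptions. pandas.get_dummies with drop_first=True drops the alphabetically first level, which for these sectors is Education, matching the reference category in the table above; the gender dummy appears as Gender_Male rather than Gender_dummy. statsmodels.api.OLS(y, X).fit().summary() would produce the full regression summary instead.

```python
import numpy as np
import pandas as pd


def ols_with_dummies(df, y_col, x_cols, categorical):
    """OLS with an intercept. Categorical columns become 0/1 dummies with the
    first level (alphabetical for strings) dropped as the reference category."""
    X = pd.get_dummies(df[x_cols], columns=categorical, drop_first=True, dtype=float)
    X.insert(0, "const", 1.0)
    Xm = X.to_numpy(dtype=float)
    y = df[y_col].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    resid = y - Xm @ beta
    n, k = Xm.shape
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(Xm.T @ Xm)    # classical OLS covariance matrix
    se = np.sqrt(np.diag(cov))
    return pd.DataFrame({"coef": beta, "std_err": se, "t": beta / se}, index=X.columns)


# Simulated data with known effects (illustration only, not the IA dataset)
rng = np.random.default_rng(5)
n = 400
df = pd.DataFrame({
    "Gender": rng.choice(["Female", "Male"], size=n),
    "Experience_years": rng.uniform(0, 40, size=n),
    "Education_years": rng.uniform(10, 19, size=n),
    "Sector": rng.choice(["Education", "Finance", "Retail"], size=n),
})
sector_effect = df["Sector"].map({"Education": 0.0, "Finance": 10000.0, "Retail": -1500.0})
df["Salary"] = (25000 + 5000 * (df["Gender"] == "Male") + 1400 * df["Experience_years"]
                + 1000 * df["Education_years"] + sector_effect + rng.normal(0, 8000, size=n))

table = ols_with_dummies(df, "Salary",
                         ["Gender", "Experience_years", "Education_years", "Sector"],
                         categorical=["Gender", "Sector"])
print(table.round(3))
```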
Model: ordinary least squares (OLS) regression.
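Variance inflation factors (VIFs):
Feature                VIF
const                  67.82
Gender_dummy           1.01
Experience_years       1.00
Education_years        1.02
Sector_Finance         1.74
Sector_Healthcare      1.76
Sector_Manufacturing   1.51
Sector_Retail          1.54
Sector_Technology      1.85
All predictor VIFs are below 2, so multicollinearity is not a concern; the VIF for the constant term is not interpreted.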
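Each VIF is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. statsmodels provides variance_inflation_factor in statsmodels.stats.outliers_influence; the sketch below computes the standard definition directly with NumPy on simulated predictors (not IA data), assuming the IA's VIFs follow that definition.

```python
import numpy as np


def vif(X):
    """VIF_j = 1 / (1 - R_j^2) for each predictor column j of X, where R_j^2 is
    from regressing column j on the other predictors plus an intercept.
    X should contain the predictors only (no constant column)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        centred = y - y.mean()
        r2 = 1 - (resid @ resid) / (centred @ centred)
        vifs[j] = 1 / (1 - r2)
    return vifs


# Simulated predictors (illustration only): x3 is almost a copy of x1
rng = np.random.default_rng(6)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + 0.3 * rng.normal(size=500)
v = vif(np.column_stack([x1, x2, x3]))
print(np.round(v, 2))
```

The independent predictor x2 gets a VIF close to 1, while the near-duplicates x1 and x3 get large VIFs; the IA's predictors all behave like x2.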
[Figure] Regression diagnostic: Residuals vs Fitted (x-axis: Fitted values; y-axis: Residuals)
[Figure] Q-Q plot of regression residuals (x-axis: Theoretical Quantiles; y-axis: Sample Quantiles)
[Figure] Mean salary by Sector and Gender (grouped bars for Female and Male; sectors: Education, Finance, Manufacturing, Healthcare, Retail, Technology; y-axis: Mean Salary (USD))
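A pandas sketch of the table behind a chart like this: mean salary for each sector and gender, plus the within-sector gap. The column names and toy rows are assumptions made for illustration, not IA data.

```python
import pandas as pd

# Made-up rows for illustration only (not the IA dataset)
toy = pd.DataFrame({
    "Sector": ["Finance", "Finance", "Finance", "Retail", "Retail"],
    "Gender": ["Female", "Male", "Male", "Female", "Male"],
    "Salary": [80000.0, 90000.0, 100000.0, 50000.0, 56000.0],
})
# Rows: sectors; columns: genders; cells: mean salary
means = toy.groupby(["Sector", "Gender"])["Salary"].mean().unstack("Gender")
means["Gap (Male - Female)"] = means["Male"] - means["Female"]
print(means)
```

With matplotlib installed, means[["Female", "Male"]].plot.bar() draws a grouped bar chart of the same table.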
Conclusions & Evaluation
- Sample size: 500. The sample shows an average salary difference (Male - Female) of $5652.23.
- Statistical tests: Welch t-test p = 0.0014 and Mann-Whitney p = 0.0025, both significant at the 5% level.
- Effect size (Cohen's d) = 0.288, indicating a small effect.
- The bootstrap 95% CI for the mean difference, [2266.28, 9082.10], does not include 0, which is consistent with a non-zero difference in mean salaries.
Limitations:
- The dataset is synthetic and omits some real-world controls (e.g. job title seniority, full-time status).
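- Assumptions of tests: independence and representative sampling. Non-normality (female sample, Shapiro-Wilk p = 0.0442) and unequal variances are handled by the Mann-Whitney U test and Welch's t-test, which do not assume normality and equal variances respectively.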
- The regression controls for experience, education and sector, but it cannot prove causation; the remaining gender coefficient ($4962.35) may still partly reflect omitted variables.
Recommendations for student:
- If using a real dataset, report source, sampling method, and anonymity safeguards.
- Consider matched-pair analysis or ANCOVA to control more directly for covariates.
- Discuss ethical and social implications briefly in the evaluation section of the IA.
Appendix: Dataset (first 100 rows)
Index Gender Age Education_years Experience_years Sector Salary
18 Male 42 16.4 21.2 Technology 81020.64
19 Male 38 14.3 16.8 Retail 80203.63
20 Male 22 13.8 4.2 Manufacturing 71773.62
21 Male 25 14.3 6.7 Healthcare 65186.06
22 Male 58 10.2 34.1 Technology 116007.39
23 Male 47 15.5 28.7 Finance 95512.27
24 Male 39 15.6 15.0 Healthcare 74515.99
25 Male 38 14.5 19.3 Manufacturing 84666.19
26 Male 33 17.7 11.6 Education 57778.59
27 Male 22 14.2 0.0 Education 45616.38
28 Male 50 17.8 25.0 Finance 106937.53
29 Male 55 15.3 32.1 Healthcare 90242.22
30 Male 31 12.2 6.3 Education 49533.78
31 Male 49 16.8 27.5 Retail 92387.81
32 Male 35 11.5 17.6 Technology 81464.04
33 Male 48 15.4 31.1 Technology 99395.61
34 Male 58 13.6 33.4 Healthcare 112818.49
35 Male 37 15.7 17.0 Manufacturing 70977.24
36 Male 59 17.7 40.5 Healthcare 100819.62
37 Male 50 14.7 25.0 Technology 89058.62
38 Male 53 15.2 34.7 Technology 115008.24
39 Male 53 13.7 35.0 Finance 109548.22
40 Male 31 13.3 10.5 Finance 82128.16
41 Male 27 14.7 4.4 Technology 56244.1
42 Male 43 14.5 22.5 Technology 86756.16
43 Male 50 15.6 31.0 Technology 105955.38
44 Male 39 13.2 16.1 Finance 72876.8
45 Male 59 13.3 40.4 Manufacturing 105752.43
46 Male 29 12.2 8.7 Finance 52932.97
47 Male 60 13.7 39.2 Technology 123542.32
48 Male 62 14.0 44.0 Manufacturing 104457.68
49 Male 46 15.9 24.0 Technology 87739.81
50 Male 37 14.6 18.2 Technology 75621.88
51 Male 30 12.4 7.6 Manufacturing 67398.9
52 Male 37 14.0 16.5 Healthcare 82327.99
53 Male 47 13.2 23.0 Education 77326.58
54 Male 61 16.7 36.7 Manufacturing 98480.61
55 Male 43 17.5 18.0 Healthcare 87075.27
56 Male 40 15.4 15.3 Retail 87802.54
57 Male 31 15.1 6.3 Retail 29406.45
58 Male 54 14.3 37.6 Healthcare 101149.13
59 Male 56 13.5 29.0 Technology 94504.71
60 Male 58 15.1 41.3 Education 103924.23
61 Male 54 15.5 34.1 Finance 103828.29
62 Male 56 15.5 34.1 Education 93915.22
63 Male 38 17.7 19.2 Retail 78242.19
64 Male 37 18.7 19.3 Manufacturing 76131.66
65 Male 44 17.6 20.5 Retail 76467.8
66 Male 44 17.9 29.3 Manufacturing 83267.47
67 Male 26 12.6 4.1 Technology 64695.09
68 Male 50 15.1 28.9 Technology 106680.09
69 Male 50 14.3 27.2 Finance 76251.15
70 Male 45 15.5 28.6 Manufacturing 92306.83
71 Male 60 15.5 42.2 Finance 132038.57
72 Male 33 15.1 11.5 Healthcare 66138.74
73 Male 25 18.1 2.3 Technology 60590.58
74 Male 52 15.4 31.6 Finance 103650.38
75 Male 26 14.5 9.6 Healthcare 75944.47
76 Male 49 14.9 25.5 Technology 93585.01
77 Male 22 17.0 4.2 Healthcare 56790.79
78 Male 58 15.3 36.7 Healthcare 119943.92
79 Male 34 15.1 7.9 Manufacturing 48811.4
80 Male 43 14.0 23.9 Finance 75487.99
81 Male 22 15.7 1.6 Education 49610.41