Quantitative Research - Assignments
Assignment 2
    2.1
    2.2
Assumptions
Assignment 3
    Question 3.1
    Question 3.2
    Question 3.3
    Question 3.4
    Question 3.5
    Evaluation
Ex. 4
    4.1
Assignment for week 12
    7.1
    7.2
Week 13
    8.1
Week 14
    Assignment 9
        a)
Week 17
    11.1
        1
    11.2
        1
        2
        3
Assignment 2
2.1
Hypothesis testing
1.
In the first one, we are testing for multiple differences in means, which means that
H_0 states that all the group means are equal to each other.
H_1 states that at least two of the means differ from each other.
ANOVA:
One-way ANOVA: The one-way ANOVA compares the means of at least three groups defined by
a single independent variable (factor).
Two-way ANOVA: The two-way ANOVA does the same as the one-way ANOVA, but it also uses
blocks to explain more of the variation.
Two-factor ANOVA: The two-factor ANOVA takes two factors, for example age and income,
and draws statistical conclusions from them. It also looks at the interaction between
the two factors.
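As a minimal sketch of what the one-way ANOVA computes, the F ratio can be written out by hand. The function name and the small data set are made up for illustration and are not taken from the assignment data.

```python
def one_way_anova_f(groups):
    """F ratio for a one-way ANOVA: between-group over within-group mean square."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data: three groups with means 2, 3 and 4.
example_f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(example_f)  # 3.0
```

A large F ratio relative to the critical value speaks against H_0 that all means are equal.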
2.2
Template:
Make the estimated model:
Insert the dependent variable, mean, independent variables and error term, according to
which ANOVA we are asked to create.
In this case it would be: Wage, Mean, Gender, Education, Interaction and error term.
Then set up the hypothesis test, also according to the ANOVA test.
From our normality test, we can see that two categories stand out because they clearly
do not follow a normal distribution; both also have rather low sample sizes. The first is
women with 9 years of schooling, with a sample size of 20, and the second is men with
17 years of schooling, with a sample size of 60.
The other categories do not exactly follow a normal distribution either, but they all
have rather large sample sizes; therefore, we can use the UCLT and conclude that their
group means will be approximately normally distributed.
Now I want to assess the significance level and critical value in order to evaluate our
hypothesis.
First, I will check for equality of variances by dividing the largest variance of the
8 groups by the smallest variance of the 8 groups.
From rows 1 and 8 we get the smallest and largest variance. Their ratio is:
$\frac{2253.9466102}{323.35526316} \approx 6.97$
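The variance check can be sketched in a few lines. The helper name `variance_ratio` is my own; the two variances are the ones quoted for the smallest and largest group.

```python
def variance_ratio(variances):
    """Largest group variance divided by the smallest group variance."""
    return max(variances) / min(variances)

# Smallest and largest variances among the 8 groups, as quoted in the text.
ratio = variance_ratio([323.35526316, 2253.9466102])
print(round(ratio, 2))  # 6.97
```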
Now I want to calculate our critical value. To do this we need the specific test statistic for
our two-factor ANOVA.
At the 5% significance level, we reject the null hypothesis that wage is equal across
the different categories of schooling and gender.
We thereby conclude, at the 5% significance level, that gender and educational level
affect wage.
From this we see that men with 15 and 17 years of education are significantly different
from the other categories, as they alone are represented by the letters B and A,
respectively.
Assumptions
H0 is assumed to be true
Data issues (SRS, independence, trustworthiness)
Assignment 3
Question 3.1
We can thus see that for male = 0 we have a count of 262, which means that 262 women
participated in the survey.
For married / living with a partner, we see that the share is 24% of the entire sample.
What is the average weekly alcohol consumption?
We can see that the mean weekly alcohol consumption in the sample is 6.02.
The most frequent answer to the question on the importance of “Helping the poor” is the
fourth answer option, which corresponds to moderately agreeing. Option four has a
total of 36.55% of the votes.
Question 3.2
The population for this questionnaire could be defined in several ways. The assignment
states that the data is drawn from a questionnaire given to second semester HA, SOC and
Bscb students from Aarhus and Herning. The population could therefore be all students
attending HA, SOC and Bscb at Aarhus Bss and Herning Bss.
Let’s say that is a total of 15,000 students. Our questionnaire has a total of 591
participants, which accounts for about 4% of the “population”. Therefore we could say
that n/N is below 5% and not assume a process, even though a process may be what the
data is actually meant to be used for.
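The sampling fraction can be checked directly. Note that the population size of 15,000 is only the guess made above, not a known figure.

```python
population_size = 15_000  # assumed total number of students (a guess from the text)
sample_size = 591         # number of participants in the questionnaire

sampling_fraction = sample_size / population_size
print(round(100 * sampling_fraction, 2))  # 3.94 (percent)

# The 5% rule of thumb used in the text.
below_five_percent = 0.05 > sampling_fraction
print(below_five_percent)  # True
```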
I would not say that the sample is random, since it sounds as though every single person
in the second semester has been asked to take the survey.
Assuming that people have completed the survey by themselves without the influence of
others, the data should be reliable.
I do, however, see a problem in the fact that it is stated that three people will win,
which sounds as though it will be based on the correctness of their answers. If that is
true, it is a problem, because people may feel that they need to answer in a certain way
that does not correspond to what they actually think.
Question 3.3
Model:
Alcohol consumption = 𝜇 + 𝑔𝑒𝑛𝑑𝑒𝑟 + 𝑐𝑖𝑣𝑖𝑙 + 𝐺𝑒𝑛𝑑𝑒𝑟 ∗ 𝑐𝑖𝑣𝑖𝑙 + 𝜀
Assumptions:
SRS, trustworthiness and independence have been discussed.
When performing the two-factor ANOVA, I will start by checking for normality.
Here we see the two distributions that look furthest from a normal distribution.
They are, respectively, alcohol consumption for single women and alcohol consumption
for married women.
We do, however, see rather large sample sizes of 125 and 74 respectively, so we can
apply the UCLT to fulfil the normality assumption.
The other categories do not exactly follow a normal distribution either, but they all
have rather large sample sizes; therefore, we can use the UCLT and conclude that their
group means will be approximately normally distributed.
Now I want to assess the significance level and critical value in order to evaluate our
hypothesis.
First, I will check for equality of variances by dividing the largest variance of the
6 groups by the smallest variance of the 6 groups.
From the variance data, we see that the value of 10.822 in row 3 is the smallest, while
the variance of 170.767 in row 5 is the largest.
Equality of variance $= \frac{170.767}{10.822} = 15.78$
Now I want to calculate our critical value. To do this we need the specific test statistic for
our two-factor ANOVA.
We can see that our critical value is 1.9503, which 15.78 is very far from.
Our p-value is also extremely low.
This means that, at the 5% significance level, we reject H_0, thereby saying that the
effects of the different factors are not all equal.
This can be further examined using the Student's t table, which shows that none of the
groups is significantly different from the others, since no group has a letter to
itself.
While the diagram is a bit hard to read, it does show to a large degree that the
variables follow each other.
I will therefore remove the interaction between the two variables to see what results
that might bring.
Here we can see that both gender and civil status are significant to the test, with
gender being the most significant.
Question 3.4
The variables we need for this analysis are Gender (Male), Civil (Status) and the
question “Helping poor people”, where the latter should be changed to a continuous
variable.
To answer this, I will do a two-factor ANOVA, because I want to look at whether
gender has an effect on sociability, while also seeing if civil status plays a role.
These two distributions are the two that look the least like normal distributions. They
are the distributions of women in a relationship but living alone, and men who are single.
Both distributions are left skewed, with quite similar means of 3.54 and 3.523
respectively, showing that the mean for both distributions is slightly closer to a four
than a three in the questionnaire.
Even though neither of these distributions fits a normal distribution, we see that their
respective sample sizes of 63 and 195 are large enough to apply the UCLT, which means
that, due to the large sample sizes, they will approach normal distributions. This is
also true for the rest of the categories, where we apply the UCLT to every one of them.
Now I want to assess the significance level and critical value in order to evaluate our
hypothesis.
First, I will check for equality of variances by dividing the largest variance of the
6 groups by the smallest variance of the 6 groups.
We see that all of the variances are quite low and rather close to each other. The
smallest can be found in row 3 and has a value of 0.8286. The largest is found in the
fourth row and has a value of 1.127.
Equality of variances $= \frac{1.127}{0.8286} = 1.36$
Now I want to calculate our critical value. To do this we need the specific test statistic for
our two-factor ANOVA.
We can see that our critical value is 1.743, with a p-value of 0.0641, which is
relatively close to, but still above, the 5% threshold.
This means that, at the 5% significance level, we fail to reject H_0: there is not
enough evidence to conclude that the group means differ significantly.
We can see from this that the interaction between gender and civil status does not seem
to have any significant influence on the test.
From this least squares means Student's t table, we can further see that civil status
has no significant effect on the sociability of the people in the questionnaire.
I will therefore also remove the Civil variable, so the test ends up being a one-way
ANOVA with a single factor (Gender).
We now see that the F-ratio is 3.178 and the probability has dropped, so the gender of a
person is now of no significance with respect to the level of sociability.
This is further illustrated by this Student's t table, telling us that none of the
groups in the gender category has a significant effect on the sociability of a person.
We can thereby see that in the two-way ANOVA, where civil status still played a role,
the gender of a person had a significant effect on the level of sociability. Civil
status itself, however, showed no significant influence on a person’s sociability.
Now the answer is quite different: once civil status had been removed as a variable,
the gender of a person was no longer significant to that person's degree of
sociability, with regard to helping poor people.
Question 3.5
Evaluation
When doing an ANOVA and removing a term from that ANOVA, a new set of hypotheses,
assumptions and model has to be set up.
Alpha is divided by 2 and k(k-1) is divided by 2, so the twos cancel out. Therefore, the
Bonferroni adjustment for F-tests ends up looking like this:
$\frac{\alpha/2}{K(K-1)/2} = \frac{\alpha}{K(K-1)}$
Ex. 4
4.1
Step 1: Hypothesis
𝛼 = 0.05
$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \approx \chi^2_{(r-1)(c-1)}$
• SRS
• Independence
• Trustworthiness
$\chi^2_{obs}$, term for one cell:
$\frac{(156 - 177.8)^2}{177.8} \approx 2.672891$
2 2
𝜒(𝑟−1)∗(𝑐−1)∗𝛼 = 𝜒(3−1)∗(
Step 6: P-value
Step 7: Conclusion
        A^C                               A                    Total
B^C     P(A^C ∩ B^C) = 0.7 − 0.4 = 0.3    P(A ∩ B^C) = 0.4     P(B^C) = 1 − 0.3 = 0.7
B       P(A^C ∩ B) = 0.3 − 0.15 = 0.15    P(A ∩ B) = 0.15      P(B) = 0.3
Total   0.45                              0.55                 1
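The table can be filled in from the three given probabilities. This is a small sketch with variable names of my own choosing.

```python
import math

# Given values from the table.
p_b = 0.3
p_a_and_b = 0.15
p_a_and_not_b = 0.4

# Derived values: complements and row/column differences.
p_not_b = 1 - p_b                               # 0.7
p_not_a_and_not_b = p_not_b - p_a_and_not_b     # 0.3
p_not_a_and_b = p_b - p_a_and_b                 # 0.15
p_a = p_a_and_b + p_a_and_not_b                 # 0.55
p_not_a = p_not_a_and_b + p_not_a_and_not_b     # 0.45

# The four joint probabilities must sum to 1.
total = p_a_and_b + p_a_and_not_b + p_not_a_and_b + p_not_a_and_not_b
print(math.isclose(total, 1.0))  # True
```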
Step 1: Hypothesis
Step 3:
$e_{ij} = \frac{\text{Row total} \cdot \text{Column total}}{\text{Sample size}} = \frac{129 \cdot 262}{591} = 57.1878$
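The expected count and the chi-square terms can be sketched as below. The function names are my own, and the 2x2 table at the end is hypothetical; only the numbers 129, 262, 591, 156 and 177.8 come from the exercise.

```python
def expected_count(row_total, column_total, n):
    """Expected frequency of a cell under independence."""
    return row_total * column_total / n

def chi_square_term(observed, expected):
    """Contribution of a single cell to the chi-square statistic."""
    return (observed - expected) ** 2 / expected

def chi_square_statistic(table):
    """Sum of the cell contributions over a full table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            e_ij = expected_count(row_totals[i], column_totals[j], n)
            total += chi_square_term(f_ij, e_ij)
    return total

print(round(expected_count(129, 262, 591), 4))   # 57.1878
print(round(chi_square_term(156, 177.8), 6))     # 2.672891

# Hypothetical 2x2 table, only to show the full sum.
print(round(chi_square_statistic([[10, 20], [20, 10]]), 4))  # 6.6667
```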
• SRS
• Independence
• Trustworthiness
• H_0 is true
Test value:
Assignment for week 12
7.1
1. Discuss linearity.
The line does not look particularly linear. For one, we have
2. Model
Level-level model
Assumptions:
SRS: 29 successive days does not seem random. We have no way of knowing about or
accounting for any particular seasonality, sales, etc.
Trustworthiness: What was the data collected for? Maybe a sales manager wants to make
the numbers look better. If the data has been extracted from a database, it is hard to
manipulate the numbers for price and quantity. Furthermore, both price and quantity are
easy to measure.
Variation of X: The variation is not equal at many points. In the low quantity ranges,
the observations in the high price ranges are skewed above our best-fit line. In the
middle quantities, between 10 and 40, the observations are skewed below our best-fit line.
Level-level:
• Zero conditional mean ($E(\epsilon \mid x) = 0$): Given any x, we do not seem to have
a zero conditional mean of the errors. Our errors lie above 0 at the start, below 0
from 3 to 8, and above 0 from 8 onwards. At no point do we really have zero conditional mean.
• Homoscedasticity:
Estimation of Level-level model
1. Hypothesis
𝐻0 : 𝛽1 = 0: 𝑇ℎ𝑒𝑟𝑒 𝑖𝑠 𝑛𝑜 𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛𝑠ℎ𝑖𝑝 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑞𝑢𝑎𝑛𝑡𝑖𝑡𝑦 𝑎𝑛𝑑 𝑝𝑟𝑖𝑐𝑒
𝐻1 : 𝛽1 ≠ 0: 𝑇ℎ𝑒𝑟𝑒 𝑖𝑠 𝑎 𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛𝑠ℎ𝑖𝑝 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑡ℎ𝑒 𝑡𝑤𝑜.
Significance level:
𝛼 = 0.05
$t_{obs} = \frac{-0.178687 - 0}{0.022307} = -8.01$
Critical value:
$\pm t_{n-k-1,\,\alpha/2}$
$n - k - 1 = 29 - 1 - 1 = 27$
$\frac{\alpha}{2} = \frac{0.05}{2} = 0.025$
$t_{27,\,0.025} = 2.052$
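The test can be sketched in a few lines. The critical value 2.052 is taken from the t table as above rather than computed, and the helper name is my own.

```python
def t_statistic(estimate, hypothesized_value, standard_error):
    """Observed t value for a single regression coefficient."""
    return (estimate - hypothesized_value) / standard_error

t_obs = t_statistic(-0.178687, 0.0, 0.022307)
degrees_of_freedom = 29 - 1 - 1   # n - k - 1
t_critical = 2.052                # table value for 27 df and alpha/2 = 0.025

# Two-sided test: reject H_0 when |t_obs| exceeds the critical value.
reject_h0 = abs(t_obs) > t_critical
print(round(t_obs, 2), degrees_of_freedom, reject_h0)  # -8.01 27 True
```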
Conclusion:
There is a relationship between price and quantity: our p-value is below 0.0001.
$R^2 = 0.7038$, which means that 70.38% of the variation in price can be explained by
the variation in quantity.
$R^2 = \frac{191.19759}{271.65252} = 0.7038$
Formula
$Y \mid X = (b_0 + b_1 \cdot x_g) \pm t_{n-2,\,\alpha/2} \cdot s_\epsilon \cdot \sqrt{1 + \frac{1}{n} + \frac{(x_g - \bar{x})^2}{SS_x}}$
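A minimal sketch of the prediction interval formula as a function. The function name and argument names are my own, and the numbers in the example call are hypothetical, not the JMP output from this exercise.

```python
import math

def prediction_interval(b0, b1, x_g, t_crit, s_e, n, x_bar, ss_x):
    """Prediction interval for a single new observation at x = x_g."""
    centre = b0 + b1 * x_g
    half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x_g - x_bar) ** 2 / ss_x)
    return centre - half_width, centre + half_width

# Hypothetical inputs, chosen only to show the call.
lower, upper = prediction_interval(b0=10.0, b1=2.0, x_g=5.0, t_crit=2.0,
                                   s_e=1.0, n=4, x_bar=5.0, ss_x=10.0)
print(round(lower, 3), round(upper, 3))  # 17.764 22.236
```

The interval is narrowest at the mean of x and widens as x_g moves away from it.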
4.
Using JMP.
5.
Level-level
Log-level
Level-log
Log-log
Model:
Assumptions:
Nothing new regarding SRS, trustworthiness and variation in x.
Zero conditional mean:
Homoscedasticity:
Test stat: $T_{obs} = \frac{-0.5598 - 0}{0.04198} = -13.33$
Conclusion: We firmly reject H_0.
R^2 shows that 86.8% of the variation in log(price) can be explained by the variation in
log(quantity).
Intercept
B1
The b1 of -0.56 means that a 1% increase in quantity leads to a 0.56% decrease in price,
which means there is a negative relationship between the two.
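The log-log interpretation can be checked numerically: in a log-log model, a 1% increase in x multiplies y by 1.01 raised to the power b1. The sketch below uses the unrounded estimate -0.5598 from the test statistic; the function name is my own.

```python
def percent_change_in_y(b1, percent_change_in_x=1.0):
    """Exact percentage change in y for a given percentage change in x (log-log model)."""
    factor = (1 + percent_change_in_x / 100) ** b1
    return (factor - 1) * 100

change = percent_change_in_y(-0.5598)
print(round(change, 2))  # -0.56
```

For small changes the exact value is very close to b1 itself, which is why b1 is read directly as an elasticity.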
8.1
Template:
True model:
𝐵 = 𝛽0 + 𝛽1 ∗ 𝐶 + 𝛽3 ∗ 𝐸 + 𝜖
1) Model formulation:
• Trustworthiness
Since the data has been collected randomly, we assume that it comes from a database
and is therefore trustworthy.
However, there can be biases from differences in how different nationalities might count
(Years of education) and
• Normal distribution of (𝜖)
We have no issues here with multicollinearity; our largest correlation is between age and
female, at -0.0756, which is far from a high correlation.
3: Evaluation of the model.
Hypothesis:
We assume 𝐻0 to be true.
Test statistic: $\frac{MSR}{MSE} \sim F_{k;\,n-k-1}$
What is k? The number of X's, which in this instance is 3.
$\frac{MSR}{MSE} = \frac{2839756}{28643} \approx 99.14$
Conclusion:
We are very sure of our conclusion: the risk of rejecting H_0 when it is in fact true is
negligible, since our p-value is basically 0%. We firmly reject H_0, since our F_obs is
way above our critical value. At least one of B_1, B_2 and B_3 differs from 0.
5) Test of parameters:
Hypothesis:
Test statistic: $t_{obs} = \frac{b_j - \beta_{j0}}{s_{b_j}} = \frac{1.21 - 0}{0.278} \approx 4.352517986$, compared against $t_{n-k-1;\,\alpha/2}$
Assumptions: We had some heteroscedasticity and some issues with normality, which could
affect the validity of the test somewhat.
Our data on age, years of education and gender explain 14.8% of the variation in
income.
B)
Intercept:
B1: Holding age and gender constant, increasing years of education by 1 would
increase your annual income by 18.310 DKK.
B3: The interpretation of the female variable is that being female, relative to being
male, decreases your income by 75.380 DKK, all else equal.
Answer the following questions, regardless of problem with assumptions or not.
= 𝛽0 + 𝛽1 ∗ 𝐶 + 𝛽3 ∗ 𝐸 + 𝛽4 ∗ 𝐶 2 + 𝛽5 ∗ 𝐷2 + 𝜖
(see note 2. P 6)
Test:
𝐻0 : β4 = 𝛽5 = 0
𝐻1 : 𝐴𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 𝑜𝑓 𝑡ℎ𝑜𝑠𝑒 𝑎𝑟𝑒 𝑛𝑜𝑡 𝑒𝑞𝑢𝑎𝑙 𝑡𝑜 0
Test stat:
$\frac{(R^2_{UR} - R^2_R)/q}{(1 - R^2_{UR})/(n-k-1)} \sim F_{q;\,n-k-1;\,\alpha}$
K = 5 (Unrestricted)
N = 1711
$\frac{(R^2_{UR} - R^2_R)/q}{(1 - R^2_{UR})/(n-k-1)} = \frac{(0.2658 - 0.1484)/2}{(1 - 0.2658)/(1711 - 5 - 1)} \approx 136.3163988$
Restricted Unrestricted
𝐹𝑞:𝑛−𝑘−1;𝛼 = 𝑓2;1711−5−1;0.05 = 3.00
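The joint test can be sketched as a small function. The function name is my own; the R^2 values, q, n and k are the ones quoted in the text, and the critical value 3.00 is the table value rather than a computed one.

```python
def partial_f(r2_unrestricted, r2_restricted, q, n, k):
    """F statistic for testing q restrictions via the change in R^2.

    k is the number of explanatory variables in the unrestricted model.
    """
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k - 1)
    return numerator / denominator

f_obs = partial_f(0.2658, 0.1484, q=2, n=1711, k=5)
f_critical = 3.00  # table value for F(2, 1705) at alpha = 0.05

print(round(f_obs, 4), f_obs > f_critical)  # 136.3164 True
```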
Conclusion:
Our observed value (136.31) is way above our critical value of 3.00, so we reject the null
– meaning
The only change from the uncentred model is that the variables that are not squared, but
have squared “brothers”, give the slope of X. The squared variables tell us whether the
relationship is convex or concave.
When we log our variables, we can in most instances smooth out the variance in the data,
in order to better fit a normal distribution and homoscedasticity.
Week 14
Assignment 9
a)
Estimated model:
$\widehat{\log(FAMINC)} = b_0 + b_1 \cdot WHRS + b_2 \cdot HHRS + b_3 \cdot WEDU + b_4 \cdot HUSEDUC + b_5 \cdot WAGE + b_6 \cdot HA + b_7 \cdot CIT + \epsilon$
Assumptions:
Random sample and reliable data: We are given no information on whether the sample was
randomly collected. We assume the data was collected randomly from some database.
Trustworthiness should be good.
We see that there is a large correlation between the wife's and the husband's age, at
0.8881. The correlation makes sense but might make it hard for our model to separate
each variable's effect on log(FAMINC).
Education (wife and husband): all else equal, each additional year of the husband's
education increases family income by 5.1%. For wives, this effect is 3.34% for every
additional year of educational attainment.
We can see that the confidence intervals are: $WEDU \in [0.016 ; 0.051]$
$HusEDU \in [0.037 ; 0.065]$
Here we can see that the confidence intervals for the parameter estimates of the wife's
and the husband's educational attainment clearly overlap, meaning that we have no
statistical evidence to infer that their effects on log(FAMINC) are different for wives
and husbands.
CIT: By living in a large city, we can see that your family income will increase by just
short of 21%.
Uncentered polynomials:
We won't reduce this model either, to allow for testing of joint significance.
Test of model:
Unrestricted Restricted
What does it mean when we have a negative b_2? The curve is concave: it goes up until
the turning point, then down.
If b_2 is positive, the curve is convex: it goes down until the turning point, then up.
Wife’s age $= -\frac{b_5}{2 \cdot b_8} = \frac{-0.0128094}{2 \cdot (-0.000341)} = 56.68$
Husband’s age $= -\frac{b_6}{2 \cdot b_9} = \frac{-0.08847}{2 \cdot (-0.000881)} = 48.07$
Week 17
11.1
General model:
𝑦 = 𝛽0 + 𝛽1 ∗ 𝐶 + 𝛽3 ∗ 𝐸 + 𝜖
Estimated model:
𝐿𝑜𝑦𝑎𝑙𝑡𝑦 =
11.2
3
Week 18 logistic regression and time series
1 Explanatory variables
Variables could be: Quality, Brand, Price and Male.
1: Model: Formulation:
$Logit(MAC) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot Male + \beta_2 \cdot Brand + \beta_3 \cdot Studytime + \beta_4 \cdot Career + \beta_5 \cdot Tatoo + \beta_6 \cdot Quality + \epsilon$
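To make the link between the logit and the probability p concrete, here is a small sketch of the transformation and its inverse. The function names are my own, and the probabilities used are arbitrary examples, not estimates from the Mac model.

```python
import math

def logit(p):
    """Log odds of a probability p (0 and 1 excluded)."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Probability that corresponds to a log-odds value z."""
    return 1 / (1 + math.exp(-z))

print(logit(0.5))                          # 0.0, i.e. even odds
print(round(inverse_logit(logit(0.25)), 2))  # 0.25
```

The right-hand side of the model gives a log-odds value, and the inverse transformation turns that value back into a predicted probability of owning a Mac.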
2: Checking assumptions:
Trustworthiness: The responses could be dependent on each other, since respondents could
take the survey together. Do they take the survey seriously? The more engaged students
are more likely to respond to the survey than the less engaged. The perception of the
scales for the scaled variables may also differ between respondents.
SRS: No, the data is based on a survey that people can choose not to participate in. The
sample is not that random, and it is hard to generalize from only 2nd semester students.
Binary Y variable:
This is within the 80/20 range that is required.
The x-variables don’t seem particularly evenly distributed. However, our larger sample
size of 591 gives us a somewhat reasonable number of observations across the outcomes.
Multicollinearity:
To assess the multicollinearity assumption, we use the multivariate correlations of our
model. We are looking for any correlations at or above 0.8. We see that no variables
correlate with other variables at this high a level. There are, however, two pairs of
variables with somewhat larger correlations: male and alcohol have a correlation of
0.32, while quality and brand have a correlation of 0.138. In terms of shared variance
(the squared correlation), alcohol explains about 10% of the variation in the male
variable (0.32^2 ≈ 0.10), and brand explains about 2% of the variation in the quality
variable (0.138^2 ≈ 0.019).
3: Estimation of model
Further reducing
As the assignment says that our model should only include 3 significant variables, I will
reduce the model once more, removing tattoo.
Hit ratio total $= \frac{381 + 39}{381 + 20 + 39 + 151} = 0.71 = 71\%$
Our empty model allows us to predict correctly 67.85% of the time, while our model based
on studytime, male and brand increases the hit rate to 71.06%, which is an increase of
only 71.06 - 67.85 = 3.21 percentage points. In relative terms:
$\frac{71.06}{67.85} \approx 1.047$, i.e. an increase of 4.7%
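The hit ratio comparison can be sketched as follows. I assume here that 381 and 39 are the correctly classified counts, as the numerator above suggests; the variable names are my own.

```python
# Counts from the classification table; 381 and 39 are assumed to be the
# correctly classified observations, 20 and 151 the misclassified ones.
correct = 381 + 39
total = 381 + 20 + 39 + 151

hit_ratio = correct / total
baseline = 0.6785  # hit ratio of the empty model, as stated in the text

relative_increase = hit_ratio / baseline - 1
print(round(hit_ratio, 2))                # 0.71
print(round(100 * relative_increase, 1))  # 4.7 (percent)
```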
5 Interpretation:
The negative parameter estimate for male signifies a negative effect on the probability
of having a Mac.
Odds ratio:
We focus on the male variable, since this is of the highest significance.
If we use the odds ratio for interpretation, we can calculate the effect on the odds
of a marginal change in the variables.
The effect of a “1 unit” increase in Male, i.e. being male as opposed to being a woman:
being male changes the odds of owning a Mac by 0.418 - 1 = -0.582 = -58.2%. That is, it
multiplies the odds of owning a Mac by a factor of 0.418.
Using the reciprocal for the male variable:
The value of the reciprocal is 2.3883. Going from being male to being female would
increase the odds of owning a Mac by a factor of 2.388, or a percentage change of
(2.388 - 1) * 100 ≈ 139%.
Decreasing male by one unit (being female as opposed to being male) increases the odds
of owning a Mac by about 139%.
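The conversion from an odds ratio to a percentage change in the odds can be sketched like this. The function name is my own; 0.418 and 2.3883 are the odds ratio and its reciprocal as reported, presumably with the reciprocal taken from the unrounded estimate.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1) * 100

# Being male relative to being female.
print(round(percent_change_in_odds(0.418), 1))   # -58.2

# Being female relative to being male (reciprocal as reported).
print(round(percent_change_in_odds(2.3883), 1))  # 138.8
```

Note that these are changes in the odds, not in the probability of owning a Mac.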
Time series:
Very similar to the cross-sectional regressions we have already worked with, where we
sample across individuals or units at a given point in time.
However, time series data is NOT randomly sampled in the same way as cross-sectional data.
1: Model formulation
Estimated model:
$\widehat{Enrollment} = b_0 + b_1 \cdot \text{Time variable}$
Note: we can use Year, obtain same coefficient, however the intercept alters quite a bit and
it may be more intuitive to use “time variable” rather than “Year”
2: Assumptions
Trustworthiness:
Student enrolment is most likely taken from some university database; in that regard the
data should be trustworthy. The cost of fraud might be higher than the gain from
manipulating the numbers.
Variation in X:
17 years of observations, which should be alright for predicting the trend of enrolment.
Now we move on to examine the assumptions regarding the error terms in a time series
setting.
The zero conditional mean assumption on time series data requires the model to fit the
data perfectly; in other words, no variable other than the time variable should affect
enrolment for the zero conditional mean to be fulfilled.
In this case, we simply have both positive and negative errors; therefore zero
conditional mean is not fulfilled.
No serial (auto) correlation:
This is new in time series. Serial correlation arises when errors from one time period
are carried over into future time periods.
To test for independence between the error terms, we test for autocorrelation
with the Durbin-Watson test:
1. Test statistic:
$d = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$
Here $k = 1$ and $n = 17$
Critical limits:
We reject the null and see positive autocorrelation, as our Durbin-Watson test statistic
is 0.66 while our critical limits are 1.13 and 1.38, thus falling into the
rejection region.
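The Durbin-Watson statistic and the decision rule for positive autocorrelation can be sketched as below. The function names are my own, and the residuals in the first call are made up; only the statistic 0.66 and the limits 1.13 and 1.38 come from this exercise.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def positive_autocorrelation_decision(d, d_lower, d_upper):
    """Decision regions of the test for positive autocorrelation."""
    if d > d_upper:
        return "do not reject H0"
    if d >= d_lower:
        return "inconclusive"
    return "reject H0: positive autocorrelation"

# Made-up residuals that alternate in sign, giving a high d.
print(durbin_watson([1, -1, 1, -1]))  # 3.0

# The case from the text: d = 0.66 with limits 1.13 and 1.38.
print(positive_autocorrelation_decision(0.66, 1.13, 1.38))
```

Values of d near 2 indicate no autocorrelation, while values well below the lower limit indicate positive autocorrelation.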
[ Add new variable -> formula -> insert variable -> double click -> Square variable (^2) ]
Here k = 2 and n = 17
Which gives us $d_L = 1.02$ and $d_U = 1.54$
Our test statistic falls into the inconclusive region, meaning that at least now we don’t
have clear positive autocorrelation and the quadratic term helped.
Enrollment = y
Exercise 12.3
Level-level, log-level, level-log, log-log