Econometrics notes Final
R-squared and adjusted R-squared are both used to determine how well a regression model fits a set of data. R-squared measures the proportion of variation in the dependent variable explained by the model, while adjusted R-squared modifies that value to account for the number of predictors in the model.
While R-squared measures the proportion of variance in the dependent variable explained by the independent variables, it never decreases when more predictors are added. Adjusted R-squared corrects for the number of predictors and decreases if the additional variables do not improve the model's explanatory power.
Because adjusted R-squared penalizes the number of predictors, its value can be negative, indicating that the fitted model explains less variation than would be expected from random predictors.
As a rough rule of thumb, an R-squared above 0.7 is generally seen as showing a high level of correlation, whereas a measure below 0.4 shows a low correlation.
R-squared
Measures the proportion of variance in the dependent variable explained by the independent
variables
Increases or remains the same when new predictors are added to the model
Values range from 0 to 1
Adjusted R-squared
Adjusts the R-squared value to account for the number of predictors and the sample size
Penalizes the inclusion of irrelevant predictors
Can decrease if a new predictor does not improve the model
Helps determine the goodness of fit
When to use
Investors use R-squared and adjusted R-squared to measure the correlation between a portfolio
or mutual fund and a stock index
Pizza owners can use adjusted R-squared to see if additional input variables contribute to their
model.
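To see the difference numerically, here is a minimal sketch in Python, assuming only numpy; the simulated data, the seed, and the helper name r2_and_adj_r2 are illustrative, not from the notes. It uses the standard definitions R² = 1 − SS_res/SS_tot and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of predictors.

```python
import numpy as np

# Fit OLS by hand and compare R-squared with adjusted R-squared.
# The data below are made up for illustration.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)            # a relevant predictor
x2 = rng.normal(size=n)            # an irrelevant predictor
y = 2.0 * x1 + rng.normal(size=n)  # y depends on x1 only

def r2_and_adj_r2(y, X):
    """Return (R-squared, adjusted R-squared) for an OLS fit of y on X."""
    X = np.column_stack([np.ones(len(y)), X])     # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    r2 = 1 - ss_res / ss_tot
    k = X.shape[1] - 1                            # number of predictors
    adj_r2 = 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)
    return r2, adj_r2

print(r2_and_adj_r2(y, x1))                          # one relevant predictor
print(r2_and_adj_r2(y, np.column_stack([x1, x2])))   # add an irrelevant one
# R-squared never falls when x2 is added; adjusted R-squared can fall.
```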
When the null and alternative hypotheses are stated, it is observed that the null hypothesis is a neutral statement against which the alternative hypothesis is tested. The alternative hypothesis is a claim that instead has a certain direction. If the null hypothesis claims that p = 0.5, the alternative hypothesis is an opposing statement and can be stated as either p > 0.5, p < 0.5, or p ≠ 0.5. In each of these alternative hypothesis statements, the inequality symbol indicates the direction of the hypothesis. Based on the direction stated in the hypothesis, the type of hypothesis test can be decided for the given population parameter.
When the alternative hypothesis claims p > 0.5 (notice the 'greater than' symbol), the critical region would fall at the right side of the probability distribution curve. In this case, the right-tailed hypothesis test is used.
When the alternative hypothesis claims p < 0.5 (notice the 'less than' symbol), the critical region
would fall at the left side of the probability distribution curve. In this case, the left-tailed
hypothesis test is used.
In the case of the alternative hypothesis p ≠ 0.5, a definite direction cannot be decided, and
therefore the critical region falls at both the tails of the probability distribution curve. In this
case, the two-tailed test should be used.
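The three tails can be illustrated with a one-proportion z-test. Below is a minimal sketch assuming scipy is available; the sample counts are made up for illustration.

```python
from scipy import stats

# One-proportion z-test of H0: p = 0.5, shown for all three tails.
successes, n, p0 = 58, 100, 0.5
p_hat = successes / n
se = (p0 * (1 - p0) / n) ** 0.5   # standard error under H0
z = (p_hat - p0) / se             # test statistic

print("right-tailed p-value:", 1 - stats.norm.cdf(z))             # Ha: p > 0.5
print("left-tailed  p-value:", stats.norm.cdf(z))                 # Ha: p < 0.5
print("two-tailed   p-value:", 2 * (1 - stats.norm.cdf(abs(z))))  # Ha: p != 0.5
```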
Distributions
In econometrics, distributions describe how data points are spread across a range of values.
They are used to identify patterns, trends, and anomalies. This information is important for
making predictions and inferences, and for econometric analyses like hypothesis testing,
policy evaluation, and predictive modeling.
Types of distributions
Normal distribution (Z-distribution)
Also known as the Gaussian distribution, this distribution is symmetric around the
mean, and appears as a bell curve.
Poisson distribution
This discrete distribution models the count of events occurring in a fixed interval of time or space.
Binomial distribution
This distribution is represented by 𝐵(𝑛,𝑝), where 𝑛 is the number of trials and 𝑝 is the
probability of success in a single trial.
Chi-squared distribution
This distribution describes the sum of squared standard normal variables and is used in goodness-of-fit and independence tests.
Exponential distribution
This continuous distribution is used to measure the expected time for an event to occur.
Student t-distribution (T-distribution)
This distribution is used when the sample size is small or when not much is known
about the population.
The t-distribution has fatter tails than a normal distribution, which accounts for the
greater uncertainty in smaller samples.
The t-distribution's shape depends on the degrees of freedom (df), which is related to
the sample size.
As the df increases, the t-distribution curve becomes taller and thinner, and more similar to the standard normal distribution (Z-distribution). When the sample size is around 30 or more, the t-test and Z-test results are very similar.
F-Distribution
The F-distribution is a statistical distribution used to test hypotheses in econometrics. It
can be used to compare variances, evaluate portfolio risks, and compare stock returns.
There are two sets of degrees of freedom: one for the numerator and one for the denominator. For example, if F follows an F-distribution and the number of degrees of freedom for the numerator is four, and the number of degrees of freedom for the denominator is ten, then F ~ F(4, 10).
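The distributions above are all available in scipy.stats. The following sketch, with illustrative parameter values, shows how each can be queried; note how the t critical value approaches the Z value 1.96 as df grows.

```python
from scipy import stats

# Illustrative queries against the distributions listed above.
print(stats.norm.cdf(1.96))               # Z-distribution: P(Z <= 1.96)
print(stats.binom.pmf(k=3, n=10, p=0.4))  # B(10, 0.4): P(exactly 3 successes)
print(stats.poisson.pmf(k=2, mu=3.0))     # Poisson: P(2 events), mean rate 3
print(stats.expon.cdf(1.0, scale=2.0))    # exponential, mean waiting time 2
print(stats.chi2.ppf(0.95, df=5))         # chi-squared 95% critical value

# t-distribution: fatter tails at low df, close to Z by df ~ 30
for df in (2, 10, 30):
    print(df, stats.t.ppf(0.975, df))     # compare with norm.ppf(0.975) = 1.96

# F-distribution with numerator df = 4 and denominator df = 10,
# i.e. F ~ F(4, 10) as in the example above
print(stats.f.ppf(0.95, dfn=4, dfd=10))   # 95% critical value
```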
Example: OLS assumptions
Homoscedasticity: The variance of the error terms should be constant across all
levels of the independent variables.
Exogeneity: The regressor variables should not be correlated with the error term.
Explanation
Non-normality of errors
If the error terms are not normally distributed, the standard errors of the OLS estimates
will not be reliable.
Heteroscedasticity
If the variance of the error terms is not constant, this is called heteroscedasticity.
Endogenous regressors
If the regressor variables are correlated with the error term, they are called endogenous.
This can cause the OLS estimator to be biased.
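As an illustration, heteroscedasticity can be detected with the Breusch-Pagan test (one common diagnostic, not necessarily the one these notes have in mind). A minimal sketch with simulated data, assuming statsmodels is installed:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulate data whose error variance grows with x, then test for it.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=x, size=n)  # variance depends on x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)  # small p-value => heteroscedasticity
```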
Correlation
Correlation is a statistical measure that expresses the extent to which two variables are linearly related.
Types of correlation
Positive correlation: When both variables increase or decrease in the same
direction
Negative correlation: When one variable increases as the other decreases, or vice
versa
Correlation coefficient
The correlation coefficient is a statistical measure of how much one variable changes in relation to another.
Correlation refers to the statistical relationship between two entities. It measures the extent to which two variables are linearly related. For example, the height and weight of a person are related: taller people tend to be heavier than shorter people.
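A minimal sketch of the height/weight example, with made-up numbers, using numpy's corrcoef:

```python
import numpy as np

# Illustrative height/weight data; not from the notes.
height_cm = np.array([155, 160, 165, 170, 175, 180, 185])
weight_kg = np.array([52, 58, 61, 66, 70, 77, 82])

r = np.corrcoef(height_cm, weight_kg)[0, 1]
print("correlation coefficient:", r)  # close to +1: strong positive correlation
```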
Autocorrelation
Autocorrelation, also known as serial correlation, is a statistical method that measures
how similar a variable is to itself over time. It's a key tool in econometrics for
analyzing time series data.
Autocorrelation refers to the degree of correlation of the same variable between two successive time intervals.
For example, the temperatures on different days in a month are autocorrelated. Similar
to correlation, autocorrelation can be either positive or negative.
How it works
Autocorrelation measures the relationship between a variable's current value and its
past values.
It's a mathematical representation of the similarity between a time series and a delayed
version of itself.
It can help identify when data is not random, which may indicate a need for time series
analysis or regression analysis.
Example
Stock prices
Stock prices tend to move up and down together over time, which is an example of serial
correlation. This means that if stock prices are high today, they are likely to be high
tomorrow.
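A minimal sketch of measuring serial correlation in a simulated price-like series, assuming pandas is available; the series and the lags are illustrative:

```python
import numpy as np
import pandas as pd

# First-order autocorrelation of a simulated "price" series. Positive
# serial correlation means a high value today tends to be followed by
# a high value tomorrow.
rng = np.random.default_rng(0)
shocks = rng.normal(size=250)
prices = pd.Series(100 + np.cumsum(shocks))  # random-walk-like prices

print("lag-1 autocorrelation:", prices.autocorr(lag=1))
print("lag-5 autocorrelation:", prices.autocorr(lag=5))
```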
General-to-Specific Model
How it works
Start with a general model that includes all the variables that are thought to be important, then test and drop statistically insignificant variables one at a time until only significant ones remain (a sketch of this search follows below)
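Here is a minimal sketch of such a search, assuming statsmodels and a 5% significance threshold; both the simulated data and the threshold are illustrative:

```python
import numpy as np
import statsmodels.api as sm

# General-to-specific search: start with all candidate regressors,
# then repeatedly drop the least significant one until every remaining
# p-value is below the chosen threshold.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 4))             # 4 candidate regressors
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

cols = list(range(X.shape[1]))
while cols:
    res = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
    pvals = res.pvalues[1:]             # skip the intercept
    worst = pvals.argmax()
    if pvals[worst] < 0.05:             # all regressors significant: stop
        break
    cols.pop(worst)                     # drop the least significant regressor

print("retained regressors:", cols)     # expect [0, 1]
```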
Stationarity
A common assumption in many time series techniques is that the data are stationary. A
stationary process has the property that the mean, variance and autocorrelation structure do not
change over time.
Explanation
Statistical properties: These include the mean, variance, and covariance of the
data.
Stationary time series: A time series where these statistical properties remain
constant over time.
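Stationarity is commonly checked with the augmented Dickey-Fuller (ADF) test. A minimal sketch with two simulated series, assuming statsmodels is installed:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Compare a stationary series with a nonstationary one.
rng = np.random.default_rng(0)
stationary = rng.normal(size=300)               # white noise: stationary
random_walk = np.cumsum(rng.normal(size=300))   # random walk: nonstationary

for name, series in [("white noise", stationary), ("random walk", random_walk)]:
    stat, pvalue, *_ = adfuller(series)
    print(name, "ADF p-value:", pvalue)  # small p-value => reject unit root
```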
ARDL (Autoregressive Distributed Lag) Model
How it works
The model's current value of the dependent variable is dependent on its own past
values and the current and past values of other explanatory variables
Benefits
The ARDL method can produce consistent estimates of long-term coefficients
The ARDL cointegration technique can be used to obtain realistic estimates of a model's long-run relationships
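A minimal sketch of an ARDL(1, 1) regression built by hand with lagged columns and OLS; statsmodels also ships a dedicated ARDL class, but the hand-rolled version below keeps the structure visible. The simulated data and coefficients are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# ARDL(1, 1): the current value of y is regressed on its own first lag
# and on the current and first-lagged values of an explanatory variable x.
rng = np.random.default_rng(0)
n = 200
x = pd.Series(rng.normal(size=n))
y = pd.Series(np.zeros(n))
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t] + 0.3 * x[t - 1] + rng.normal()

df = pd.DataFrame({"y": y, "y_lag1": y.shift(1),
                   "x": x, "x_lag1": x.shift(1)}).dropna()
res = sm.OLS(df["y"], sm.add_constant(df[["y_lag1", "x", "x_lag1"]])).fit()
print(res.params)  # coefficients near 0.5, 0.8, 0.3
```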
Dummy Variables
In econometrics, a dummy variable is a numeric variable that takes on a value of either
0 or 1 to represent a qualitative variable. Dummy variables are used in regression
analysis to include categorical variables in models.
A dummy variable (binary variable) D is a variable that takes on the value 0 or 1. Note that the labelling is not unique; a dummy variable could be labelled in two ways, e.g. for the variable gender: D = 1 if male, D = 0 if female; or D = 1 if female, D = 0 if male.
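A minimal sketch of both labellings, assuming pandas; the tiny wage dataset is made up for illustration:

```python
import pandas as pd

# Turn the qualitative variable "gender" into a 0/1 dummy.
df = pd.DataFrame({"gender": ["male", "female", "female", "male"],
                   "wage": [21.0, 24.5, 19.0, 23.0]})

# One labelling: D = 1 if female, D = 0 if male
df["D_female"] = (df["gender"] == "female").astype(int)

# The opposite labelling is equally valid: D = 1 if male
df["D_male"] = (df["gender"] == "male").astype(int)

print(df)
```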