Econometrics notes Final

The document discusses statistical measures like R-squared and adjusted R-squared, which assess the fit of regression models, with adjusted R-squared accounting for the number of predictors. It also covers hypothesis testing in econometrics, detailing its steps, types of tests, and the significance of distributions in analyzing data. Additionally, it explains concepts such as correlation, autocorrelation, stationarity, and the use of dummy variables in regression analysis.


R-squared and adjusted R-squared are statistical measures that help determine how well a regression model fits a set of data. R-squared measures the proportion of variation explained by a model, while adjusted R-squared adjusts that value for the number of predictors in the model.

While R-squared measures the proportion of variance in the dependent variable explained by the independent variables, it never decreases when more predictors are added. Adjusted R-squared adjusts for the number of predictors and decreases if the additional variables do not contribute to the model's significance.

Because adjusted values penalize the number of predictors, they can be negative, indicating that the fitted variables explain less variation than would be expected from random predictors. An R-squared above 0.7 would generally be seen as showing a high level of correlation, whereas a measure below 0.4 would show a low correlation.

R-squared
 Measures the proportion of variance in the dependent variable explained by the independent
variables
 Increases or remains the same when new predictors are added to the model
 Values range from 0 to 1
Adjusted R-squared
 Adjusts the R-squared value to account for the number of predictors and the sample size
 Penalizes the inclusion of irrelevant predictors
 Can decrease if a new predictor does not improve the model
 Helps determine the goodness of fit
When to use
 Investors use R-squared and adjusted R-squared to measure the correlation between a portfolio
or mutual fund and a stock index
 Pizza owners can use adjusted R-squared to see if additional input variables contribute to their
model.
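Both measures can be computed directly from a model's fitted values. A minimal sketch in Python (the observations and fitted values here are made up for illustration):

```python
def r_squared(y, y_hat):
    """Proportion of variance in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalize R-squared for the number of predictors k, given sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical data: 4 observations, 1 predictor
y = [1, 2, 3, 4]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)                 # 0.98
adj = adjusted_r_squared(r2, n=4, k=1)   # 0.97, slightly below R-squared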

Unadjusted R-squared
The unadjusted R-squared is a measure of model fit between 0 and 1 which does not account for the number of variables included in a model. It is available only for linear models.
Hypothesis testing in econometrics is a statistical method used to analyze economic
data and make decisions about population parameters. It involves comparing a null hypothesis to
an alternative hypothesis to determine which is more likely to be true.
Steps in hypothesis testing:
 State the hypothesis: Formulate a hypothesis based on research and evidence
 Specify the significance level: Choose a critical threshold for rejecting the null
hypothesis
 Collect data: Gather data to calculate the test statistic
 Calculate the test statistic: Use the data to calculate the test statistic
 Determine the p-value: Compare the p-value to the significance level
 Make a decision: Accept or reject the null hypothesis
 Interpret the results: Draw a conclusion based on the statistical evidence
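The steps above can be wired together as a sketch of a two-tailed z-test on a population mean (the sample values, hypothesized mean, and significance level are hypothetical):

```python
import math

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed z-test of H0: mean = mu0, with known population sigma."""
    n = len(sample)
    x_bar = sum(sample) / n                             # from the collected data
    z = (x_bar - mu0) / (sigma / math.sqrt(n))          # calculate the test statistic
    # standard normal CDF via the error function
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)                             # determine the p-value
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision
```

With a sample mean equal to the hypothesized mean, the p-value is 1 and the null hypothesis is not rejected.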

What is hypothesis testing used for?


 Analyzing economic theories and relationships
 Estimating the relationship between two statistical variables
 Testing that individual coefficients take a specific value

TYPES OF HYPOTHESIS TESTS


There are three types of hypothesis tests: right-tailed, left-tailed, and two-tailed:

 When the null and alternative hypotheses are stated, the null hypothesis is a neutral statement against which the alternative hypothesis is tested. The alternative hypothesis is a claim with a certain direction. If the null hypothesis claims that p = 0.5, the alternative hypothesis is an opposing statement and can be put as p > 0.5, p < 0.5, or p ≠ 0.5. In each of these alternative hypotheses, the inequality symbol indicates the direction of the hypothesis, which determines the type of hypothesis test for the given population parameter.

 When the alternative hypothesis claims p > 0.5 (notice the 'greater than' symbol), the critical region falls on the right side of the probability distribution curve. In this case, the right-tailed hypothesis test is used.

 When the alternative hypothesis claims p < 0.5 (notice the 'less than' symbol), the critical region
would fall at the left side of the probability distribution curve. In this case, the left-tailed
hypothesis test is used.
In the case of the alternative hypothesis p ≠ 0.5, a definite direction cannot be decided, and therefore the critical region falls in both tails of the probability distribution curve. In this case, the two-tailed test should be used.
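The three tail choices map directly onto how the p-value is computed from the test statistic. A sketch assuming a standard-normal statistic z:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def p_value(z, alternative):
    if alternative == "greater":      # right-tailed: critical region on the right
        return 1 - phi(z)
    if alternative == "less":         # left-tailed: critical region on the left
        return phi(z)
    return 2 * (1 - phi(abs(z)))      # two-tailed: critical region in both tails
```

For example, the familiar two-tailed 5% threshold corresponds to z ≈ 1.96.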
Distributions
In econometrics, distributions describe how data points are spread across a range of values.
They are used to identify patterns, trends, and anomalies. This information is important for
making predictions and inferences, and for econometric analyses like hypothesis testing,
policy evaluation, and predictive modeling.

Types of distributions
 Normal distribution (Z-distribution)

Also known as the Gaussian distribution, this distribution is symmetric around the
mean, and appears as a bell curve.

 Poisson distribution

This discrete probability distribution describes the probability of an event happening a certain number of times within a given time or space.

 Binomial distribution

This distribution is represented by 𝐵(𝑛,𝑝), where 𝑛 is the number of trials and 𝑝 is the
probability of success in a single trial.

 Chi-squared distribution

This distribution is often used when testing hypotheses in regression models.

 Exponential distribution

This continuous distribution is used to measure the expected time for an event to occur.

 Student t-distribution (T-distribution)

This distribution is used when the sample size is small or when not much is known about the population.

The t-distribution is a probability distribution that is used in econometrics to calculate probabilities and model financial returns. It is used in t-tests to determine if there is a significant difference between sample and population means.
Explanation

 The t-distribution has fatter tails than a normal distribution, which accounts for the
greater uncertainty in smaller samples.
 The t-distribution's shape depends on the degrees of freedom (df), which is related to
the sample size.
 As the df increases, the t-distribution curve becomes taller and thinner, and more similar to the standard normal distribution (Z-distribution).
 When the sample size is around 30 or more, the t-test and Z-test results are very similar.
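The fatter tails and the convergence to the normal can be checked numerically from the closed-form density formulas. A sketch (using `lgamma` to avoid overflow at large df):

```python
import math

def t_pdf(x, df):
    """Student t density with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# At x = 3 the t density with small df sits well above the normal (fatter tail),
# while with df = 1000 it is nearly indistinguishable from the normal.
tail_small_df = t_pdf(3, 5)
tail_normal = normal_pdf(3)
```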

F-Distribution
The F-distribution is a statistical distribution used to test hypotheses in econometrics. It
can be used to compare variances, evaluate portfolio risks, and compare stock returns.

What is the F-distribution?

 The F-distribution is a ratio of two estimates of variance.


 It has two degrees of freedom, one for the numerator and one for the denominator.
 The F-distribution is asymmetric, with a minimum value of 0 and no maximum value.
 The F-distribution is less spread out as the degrees of freedom increase.

How is the F-distribution used in econometrics?

 The F-distribution is used to establish a framework for hypothesis testing in financial analysis.
 It can be used to compare stock returns and evaluate portfolio risks.

There are two sets of degrees of freedom: one for the numerator and one for the denominator. For example, if F follows an F-distribution with four degrees of freedom for the numerator and ten for the denominator, then F ~ F(4,10).
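The ratio-of-variances idea can be sketched directly; the two samples below are hypothetical (with 5 and 11 observations, the statistic would follow F(4,10) under the null of equal variances):

```python
import statistics

def f_statistic(sample1, sample2):
    """Ratio of two sample variance estimates.

    Degrees of freedom: numerator = len(sample1) - 1,
    denominator = len(sample2) - 1.
    """
    return statistics.variance(sample1) / statistics.variance(sample2)
```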

Example

 A researcher wants to determine if different amounts of exercise impact weight loss. The researcher sets up two groups, one that exercises for 30 minutes a day and the other that exercises for 60 minutes a day. The researcher can use the F-distribution to analyze the data for both groups.

 Ordinary Least Squares (OLS) regression is a common technique for estimating the coefficients of linear regression equations, which describe the relationship between one or more independent quantitative variables and a dependent variable (simple or multiple linear regression), often evaluated using R-squared.
 OLS minimizes the sum of squared errors, which is the difference between observed
and predicted values.
 It creates a single regression equation to represent the relationship between the
variables

 Simple linear regression: Y = a + bX + u
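Minimizing the sum of squared errors for the simple model gives closed-form estimates; a minimal sketch with made-up data:

```python
def ols_simple(x, y):
    """OLS for Y = a + bX + u: b = cov(x, y) / var(x), a = mean(y) - b * mean(x)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x)
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data lying exactly on the line Y = 1 + 2X
a, b = ols_simple([1, 2, 3, 4], [3, 5, 7, 9])
```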

The assumptions of ordinary least squares (OLS) regression are:
 Linearity: The relationship between the dependent and independent variables must
be linear.

 Independence of errors: The residuals, or error terms, should be uncorrelated with each other.

 Homoscedasticity: The variance of the error terms should be constant across all
levels of the independent variables.

 Normality of errors: The error terms should be normally distributed.

 No multicollinearity: The independent variables should not be highly correlated with each other.

 Exogeneity: The regressor variables should not be correlated with the error term.

Explanation
 Non-normality of errors
If the error terms are not normally distributed, the standard errors of the OLS estimates
will not be reliable.
 Heteroscedasticity
If the variance of the error terms is not constant, this is called heteroscedasticity.
 Endogenous regressors
If the regressor variables are correlated with the error term, they are called endogenous. This can cause the OLS estimator to be biased.
Correlation
 Correlation is a statistical measure that expresses the extent to which two variables are
linearly related

Types of correlation
 Positive correlation: When both variables increase or decrease in the same
direction

 Negative correlation: When one variable increases as the other decreases, or vice
versa

 No correlation: When there is no linear relationship between the variables

Correlation coefficient
 The correlation coefficient is a statistical measure of how much one variable changes in
relation to another

 The correlation coefficient is represented by the letter r

 the value of the correlation coefficient ranges from -1 to +1

 A value of +1 indicates a perfect positive correlation, while -1 indicates a perfect negative correlation

 A value of 0 indicates no linear correlation

 Correlation refers to the statistical relationship between the two entities. It measures the
extent to which two variables are linearly related. For example, the height and weight of
a person are related, and taller people tend to be heavier than shorter people.
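The coefficient r can be computed from sample covariances and standard deviations; a minimal sketch (the sample values are illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient; ranges from -1 to +1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
    return cov / (sx * sy)

r_pos = pearson_r([1, 2, 3], [2, 4, 6])   # perfectly positively related
r_neg = pearson_r([1, 2, 3], [6, 4, 2])   # perfectly negatively related
```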

Autocorrelation
 Autocorrelation, also known as serial correlation, is a statistical method that measures
how similar a variable is to itself over time. It's a key tool in econometrics for
analyzing time series data.
 Autocorrelation refers to the degree of correlation of the same variables between two
successive time intervals.
 For example, the temperatures on different days in a month are autocorrelated. Similar
to correlation, autocorrelation can be either positive or negative.

How it works
 Autocorrelation measures the relationship between a variable's current value and its
past values.
 It's a mathematical representation of the similarity between a time series and a delayed
version of itself.

 Autocorrelation can be positive or negative. A perfect positive correlation is represented by an autocorrelation of +1, and a perfect negative correlation is represented by an autocorrelation of -1.
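The comparison of a series with its own lagged values can be sketched as a sample autocorrelation at lag k (the example series are made up):

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation between the series and a lagged copy of itself."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den

trend = autocorrelation([1, 2, 3, 4, 5, 6, 7, 8])        # positive: rising series
alternating = autocorrelation([1, -1, 1, -1, 1, -1])     # negative: oscillation
```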

Why it's important


 Autocorrelation can help identify repeating patterns in data.

 It can help identify fundamental features of a time series, such as stationarity, seasonality, and trends.

 It can help identify when data is not random, which may indicate a need for time series
analysis or regression analysis.

Where it's used


 Autocorrelation is used in econometrics, signal processing, and demand prediction.
 It's also used in technical analysis in the capital markets.
 Serial correlation is a common feature of time-series data in econometrics. It occurs
when the errors in a regression model are correlated with each other, or when the
residuals are not independent.

Example
Stock prices
Stock prices tend to move up and down together over time, which is an example of serial
correlation. This means that if stock prices are high today, they are likely to be high
tomorrow.
General to specific Model

In econometrics, general-to-specific (Gets) modeling is a methodology that starts with a general model and then reduces it to a more specific model.

How it works
 Start with a general model that includes all the variables that are thought to be
important

 Reduce the model by successively removing variables until it's parsimonious, or simple
 Test the model's assumptions against the data

Why it's important


 Gets modeling can help simplify complex phenomena
 It can help ensure that the model is statistically adequate

Stationarity
A common assumption in many time series techniques is that the data are stationary. A
stationary process has the property that the mean, variance and autocorrelation structure do not
change over time.

What is a stationarity test in econometrics?


A stationarity test checks whether a time series has the stationarity property, i.e., whether its statistical properties, such as the error means, variances, and moments, remain constant over time.

Why is stationarity important?


 Stationarity makes time series data easier to analyze, model, and forecast.
 Stationary time series are predictable and suitable for certain econometric models.

How to check for stationarity?


 The Levin-Lin-Chu (LLC) test, a unit root test for panel data, can be used to determine if a series is stationary.
 The LLC test has a null hypothesis that the data has a unit root, which means it's not stationary.
 A low p-value from the LLC test indicates that the data is likely stationary.
Stationarity in data means that the statistical properties of a time series do not change over
time. This assumption is important because it allows for simpler analysis, modeling, and
forecasting.

Explanation
 Statistical properties: These include the mean, variance, and covariance of the
data.

 Stationary time series: A time series where these statistical properties remain
constant over time.

 Non-stationary time series: A time series where these statistical properties change over time.
 Stationarity assumption: The assumption that data is stationary is a common
assumption in many time series techniques.
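A quick illustration of the distinction (not a formal unit root test, which requires specialized routines): white noise is stationary, its cumulative sum (a random walk) is not, and first-differencing the walk recovers the stationary series.

```python
import random

random.seed(0)
# White noise: constant mean and variance over time (stationary)
noise = [random.gauss(0, 1) for _ in range(500)]

# Random walk: cumulative sum of the noise; its variance grows with t (non-stationary)
walk = []
total = 0.0
for e in noise:
    total += e
    walk.append(total)

# First differences of the random walk recover the stationary noise series
diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]
```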

Autoregressive Distributed Lag (ARDL)
ARDL is a model used in econometrics to analyze the relationship between time series data. It's a single-equation framework that can be used to estimate long-term coefficients.

How it works
 The model's current value of the dependent variable is dependent on its own past
values and the current and past values of other explanatory variables

 The variables can be stationary, nonstationary, or a combination of the two

 The ARDL cointegration technique can be used to determine the long-term relationship between variables with different orders of integration

Benefits
 The ARDL method can produce consistent estimates of long-term coefficients
 The ARDL cointegration technique can be used to obtain realistic estimates of a model
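A sketch of what an ARDL(1,1) equation looks like, with made-up coefficients chosen only for illustration; the long-run coefficient follows from setting the variables at their steady-state values:

```python
# ARDL(1,1): y_t = c + phi*y_{t-1} + b0*x_t + b1*x_{t-1} + u_t
C, PHI, B0, B1 = 0.5, 0.6, 1.2, -0.4   # illustrative coefficients

def ardl_11(y_prev, x_curr, x_prev, u=0.0):
    """One step of the ARDL(1,1) recursion above."""
    return C + PHI * y_prev + B0 * x_curr + B1 * x_prev + u

# Long-run effect of x on y, from the steady state y* = c + phi*y* + (b0 + b1)*x*
long_run = (B0 + B1) / (1 - PHI)
```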

Dummy Variables
 In econometrics, a dummy variable is a numeric variable that takes on a value of either
0 or 1 to represent a qualitative variable. Dummy variables are used in regression
analysis to include categorical variables in models.

Why are dummy variables used?


 They allow for more sophisticated modeling of data
 They can help control for confounding factors
 They can improve the validity of results

What are dummy variables used for?


 Representing qualitative variables like race, marital status, political party, age group,
and region of residence
 Representing seasonal effects
 Representing the occurrence of wars or major strikes in time series analysis

How are dummy variables used?


 Dummy variables can be used as explanatory variables or as the dependent variable

 Multiple dummy variables can be created to represent each level of a categorical variable

 Only one dummy variable takes on a value of 1 for each observation

A dummy variable (binary variable) D is a variable that takes on the value 0 or 1. Note that the labelling is not unique; a dummy variable could be labelled in two ways, e.g. for the variable gender: D = 1 if male, D = 0 if female; or D = 1 if female, D = 0 if male.
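The encoding rule, one dummy per category with exactly one dummy equal to 1 per observation, can be sketched directly (the category names are illustrative):

```python
def dummies(values, categories):
    """One-hot encoding: one dummy column per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

rows = dummies(["male", "female", "male"], ["male", "female"])
```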
