
Hypothesis testing

Hypothesis testing in regression analysis is basically a formal way of testing the quality of our results.
It can be used to perform standard tests, or to test your own theories in a formal way.

Before doing hypothesis testing, we should consider if our model can stand up to the classical linear
model assumptions. If not, the results from the hypothesis test will likely be invalid.

Procedure

Hypothesis testing is generally performed through the following steps:

1. Define the null hypothesis:

The null hypothesis (H0) is a statement that there is no relationship between two things. It’s
sometimes called the “no difference” hypothesis, because if true, it means that the claimed
relationship doesn’t exist.

It’s often written like this:

H0: β = 0

Which, in this example, would mean that our null hypothesis is that some coefficient (β) is equal to
zero.

An important part of hypothesis testing is designing a good null hypothesis, as it will form the basis
for the whole test.

2. Define the alternative hypothesis:

The alternative hypothesis (H1) covers the case when the null hypothesis is not true, so it's often
just defined as the opposite of the null hypothesis.

For example:

H0: β = 0

H1: β ≠ 0

As H1 is just the opposite of H0, we could even write: H1: H0 is not true.

This is called a two-tailed test because all values that deviate from the null hypothesis, in either
direction, are covered. A one-tailed test would cover only one direction, e.g.:

H1: β > 0 or H1: β < 0
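
In terms of rejection rules (a standard summary, with c the critical value at the chosen significance level): for the two-tailed test H1: β ≠ 0, reject H0 if |t| > c, where c puts half the significance level in each tail; for H1: β > 0, reject if t > c; for H1: β < 0, reject if t < −c.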

3. Decide which test is appropriate.

Concept

t-distribution: Similar to the normal distribution, but with fatter tails; it approaches the normal distribution as the degrees of freedom increase.


Null hypothesis (H0): The statement that there is no relationship between two variables. A test
either rejects this statement or fails to reject it.

Alternative hypothesis (H1): The opposite of the null hypothesis.

Significance level: The probability of rejecting H0 when it is in fact true (a Type I error). Most
commonly 5%. "With a 5% significance level, the definition of 'sufficiently large' is simply the 95th
percentile in a t distribution with n − k − 1 df." H0 is rejected if t > c. We get c by looking up a table
for the chosen significance level and degrees of freedom.
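
For example (hypothetical numbers): with n = 120 observations and k = 3 explanatory variables, df = 120 − 3 − 1 = 116, and the 95th percentile of the t distribution with 116 df is roughly 1.66, so H0 is rejected against H1: β > 0 whenever t > 1.66.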

Degrees of freedom (df): (n-k-1), i.e. number of observations minus number of explanatory variables
minus one.

t-value: The test statistic for a single coefficient; the lower its absolute value, the less significant the variable.
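
As a worked equation (the standard definition, not spelled out in the notes): t = β̂ / se(β̂), i.e. the estimated coefficient divided by its standard error. Under H0: β = 0, this statistic follows a t distribution with n − k − 1 df.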

p-value: The smallest significance level at which we would be able to reject the null hypothesis,
given the observed value of the t statistic. The p-value is evidence against the null hypothesis. If the
p-value is, say, 0.04, we can say there's significance at the 5% level (actually at the 4% level) but not
at the 1% level (or the 2% or 3% level).
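
A sketch of computing a two-sided p-value in Stata (the df and t-value here are hypothetical):

display 2*ttail(116, abs(-2.05))

ttail(df, t) returns the upper-tail probability of the t distribution, so doubling it (with the absolute value of t) gives the two-sided p-value.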

Confidence interval (CI): Provides a range of likely values for the unknown βj.
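
As a worked equation (standard form): the interval is β̂j ± c · se(β̂j), where c is the critical value for the chosen confidence level, e.g. the 97.5th percentile of the t distribution with n − k − 1 df for a 95% CI.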

F test: Testing multiple restrictions. Similar to the t-test, but with several hypotheses at the same
time. A common case is testing that all coefficients equal zero. Example:

H0: β1 = 0, β2 = 0, β3 = 0, …, βn = 0

H1: H0 is not true.

If all coefficients are equal to 0, none of the variables would explain the model, i.e. it is the same as
saying that none of the included variables explains the model; we would only have the constant. We
want to test whether our original model explains more than the model under the null hypothesis, i.e.
the model without influence from any of the variables (the restricted model). If we get a low p-value,
the coefficients are not likely to all be 0, i.e. the null hypothesis can be rejected at low significance levels.

Unrestricted model: Original model with all variables.

Restricted model: Some variables removed to impose restrictions on the model.
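
A worked equation tying the two models together (standard form; SSRr and SSRur are the sums of squared residuals from the restricted and unrestricted models, q the number of restrictions):

F = [(SSRr − SSRur) / q] / [SSRur / (n − k − 1)]

Under H0 this statistic follows an F(q, n − k − 1) distribution, so a large value means the restrictions hurt the fit and H0 should be rejected.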

R²: Higher when more variables are included, so lower for restricted models. R² = 1 − (SSR/SST). In
other words, SSR decreases with more variables, and increases with the restricted model.

If SSR increases a lot when you exclude variables, those variables have significant explanatory power,
and should not be omitted.

Chow test: An F test against a restricted model in which the coefficients are restricted to be equal across two groups (e.g. two subsamples or time periods).
F-test in Stata:

First do the regression, then directly after type:

“test excluded_var_1 excluded_var_2”

This automatically defines the null hypothesis that the coefficients on these variables are zero and
performs the F-test. It returns the F-value "F(x, y)" and the p-value "Prob > F". In the top-right of the
regression output, the F-value for the test that all coefficients equal zero is shown, as well as its p-value.
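
A minimal sketch using Stata's built-in auto dataset (the dataset and variable names are purely illustrative):

sysuse auto, clear
regress price mpg weight length
test mpg weight

Here test performs the F-test of H0: the coefficients on mpg and weight are both zero, and reports the F-value and Prob > F.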

Compare with the critical value c (found in a table); if the F-value > c, we can reject H0.

The critical value c depends on three things: c = F(q, df, p), where:

q = number of restrictions (excluded variables)

df = denominator degrees of freedom = n − k − 1

n = number of observations

k = number of explanatory variables in the regression

p = significance level, between 0 and 1

Stata commands to get the c-value:

invttail(df, p)

invFtail(q, df, p)
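
For example (hypothetical degrees of freedom and a 5% significance level):

display invttail(116, 0.05)
display invFtail(2, 116, 0.05)

The first returns the t critical value with 116 df; the second returns the F critical value for q = 2 restrictions and 116 denominator df.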

When reporting the regression, we should include at least standard errors and t-statistics.
