
ADDIS ABABA UNIVERSITY

OFFICE OF GRADUATE PROGRAM


FACULTY OF SCIENCE
DEPARTMENT OF STATISTICS

A COMPARATIVE SIMULATION STUDY OF THE


HETEROSCEDASTICITY CONSISTENT COVARIANCE MATRIX
ESTIMATORS IN THE LINEAR REGRESSION MODEL

BY
YEGNANEW ALEM

A Thesis Submitted to the Office of Graduate Program of Addis Ababa


University in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Statistics

July, 2008
ADDIS ABABA UNIVERSITY
OFFICE OF GRADUATE PROGRAM
FACULTY OF SCIENCE
DEPARTMENT OF STATISTICS

A COMPARATIVE SIMULATION STUDY OF THE


HETEROSCEDASTICITY CONSISTENT COVARIANCE MATRIX
ESTIMATORS IN THE LINEAR REGRESSION MODEL

BY
YEGNANEW ALEM

Approved by the Board of Examiners:

Department Head                    Signature

Internal Examiner                  Signature

External Examiner                  Signature

Addis Ababa, Ethiopia


ACKNOWLEDGEMENT

I would like to extend my deepest appreciation to my advisor Olusanya E. Olubusoye (Ph.D.), without whose contributions I would have had a lesser product.

My sincere gratitude goes to my family, especially to Abeba, for their persistent love and encouragement.

My special thanks go to Dr. Emmanuel G. Yohannes and Ato Milion Atsbeha, who helped me in writing the code for simulation using Matlab 7.0.

I would also like to thank all of my friends, particularly Ato Amare, Ato Eshetu, Ato Workalemaw and Ato Solomon, for their many excellent suggestions.

Most of all I want to express my gratitude to my Heavenly Father for granting me the opportunity to study at this university and for giving me the strength to push past my own limitations.
ACRONYMS

HCCM: Heteroscedasticity Consistent Covariance Matrix
CLRM: Classical Linear Regression Model
BLUE: Best Linear Unbiased Estimators
OLSCM: Ordinary Least Squares Covariance Matrix
HCSEs: Heteroscedasticity Consistent Standard Errors
GLS: Generalized Least Squares
WLS: Weighted Least Squares
LM: Lagrange Multiplier
ABSTRACT

In the context of econometric methods of estimation, the variances of OLS estimates derived under the assumption of homoscedasticity are not consistent when there is heteroscedasticity, and their use can lead to incorrect inferences. Thus, this paper sets out to examine the performance of several modified versions of the heteroscedasticity consistent covariance matrix (HCCM) estimator (namely HC0, HC1, HC2, and HC3) of White (1980) and MacKinnon and White (1985) over a range of sample sizes. Most applications that use the HCCM appear to rely on HC0, yet tests based on the other HCCM estimators are found to be consistent even in the presence of heteroscedasticity of an unknown form. Based on Monte Carlo experiments comparing the performance of the t statistic, it was found that the HC2 and HC3 estimators clearly outperform the others in small samples. In particular, the HC3 estimator was found to be better than the other HCCM estimators for samples of size less than 100; when samples are 250 or larger, other versions of the HCCM can be used. In addition, there was little cost to employing HC3 instead of the ordinary least squares covariance matrix (OLSCM) even when there is little evidence of heteroscedasticity.

Key words

White estimator, Monte Carlo Simulation, Linear Regression, Heteroscedasticity
TABLE OF CONTENTS

ACKNOWLEDGEMENT ------------------------------------------------------------ i
ACRONYMS ------------------------------------------------------------------- ii
ABSTRACT ------------------------------------------------------------------- iii
1. INTRODUCTION ------------------------------------------------------------ 1
   1.1 Background of the Study --------------------------------------------- 1
   1.2 Statement of the Problem -------------------------------------------- 5
   1.3 Objectives of the Study --------------------------------------------- 5
       1.3.1 General Objectives -------------------------------------------- 5
       1.3.2 Specific Objectives ------------------------------------------- 5
   1.4 Research Questions -------------------------------------------------- 5
   1.5 Limitation of the Study --------------------------------------------- 6
   1.6 Significance of the Study ------------------------------------------- 6
2. THEORETICAL FRAMEWORK AND LITERATURE REVIEW ------------------------------ 7
   2.1 The Nature of Heteroscedasticity ------------------------------------ 7
   2.2 The Consequences of Heteroscedasticity ------------------------------ 8
   2.3 Detecting Heteroscedasticity ---------------------------------------- 10
       2.3.1 Informal Methods ---------------------------------------------- 10
       2.3.2 Formal Methods ------------------------------------------------ 11
   2.4 HCCM for the Linear Regression Model -------------------------------- 16
   2.5 Controlling for Heteroscedasticity and Estimation of Cov(β̂) --------- 20
       2.5.1 The Method of Generalized (Weighted) Least Squares ------------ 21
       2.5.2 When σi² is not Known ----------------------------------------- 23
   2.6 Review of Simulation Studies Involving the HCCM Estimator ----------- 25
       2.6.1 White, 1980 --------------------------------------------------- 25
       2.6.2 MacKinnon and White, 1985 ------------------------------------- 27
       2.6.3 Davidson and MacKinnon, 1993 ---------------------------------- 28
       2.6.4 Long et al., 2000 --------------------------------------------- 29
3. METHODOLOGY AND DATA GENERATION ------------------------------------------ 31
   3.1 Monte Carlo Experiments --------------------------------------------- 31
   3.2 Data Structures ----------------------------------------------------- 34
   3.3 Data Generation ----------------------------------------------------- 34
       3.3.1 Simulation ---------------------------------------------------- 34
       3.3.2 Code for Simulation ------------------------------------------- 35
4. ANALYSIS AND INTERPRETATION OF RESULTS ----------------------------------- 36
   4.1 Results of Experiments ---------------------------------------------- 36
   4.2 Homoscedastic Errors ------------------------------------------------ 37
   4.3 Heteroscedastic Errors ---------------------------------------------- 42
5. SUMMARY AND RECOMMENDATION ----------------------------------------------- 53
   5.1 Conclusions and Recommendations ------------------------------------- 53
   5.2 Further Research ---------------------------------------------------- 54
REFERENCES ------------------------------------------------------------------ 55
APPENDIX A: MONTE CARLO SIMULATION CODE IN MATLAB --------------------------- 57
   A.1 OLSCM Estimator Code for Simulation --------------------------------- 57
   A.2 HC0 and HC1 Estimator Code for Simulation --------------------------- 59
   A.3 HC2 and HC3 Estimator Code for Simulation --------------------------- 63
APPENDIX B: Plot of Y ------------------------------------------------------- 68
CHAPTER ONE

1. INTRODUCTION

This chapter is divided into six sections. The first section gives the background of the study, the second section is about the statement of the problem, the third section gives the objectives of the study, the fourth section states the research questions, the fifth section explains the limitation of the study, and the final section is about the significance of the study.

1.1 Background of the Study

Applied econometricians extensively use the linear regression model. Together with its numerous generalizations, it constitutes the foundation of most empirical work in econometrics.

Despite this fact, little is known about the properties of inferences made from this model when standard assumptions are violated. In particular, classical techniques require one to assume that the error terms have a constant variance.

This assumption is often not very plausible. Nevertheless, a way of consistently estimating the variance-covariance matrix of ordinary least squares estimates in the face of heteroskedasticity of unknown form is available (e.g. see White (1980)).

This heteroskedasticity-consistent covariance matrix estimator allows one to make valid inferences provided the sample size is sufficiently large.

By the assumption of the classical normal linear regression model, we have

εi ~ N(0, σ²).

Since the mean of εi is assumed to be zero, we can write

Var(εi) = E(εi²) = σ²  for all i.

This feature of the regression disturbances is known as homoscedasticity. It implies that the variance of the disturbance is constant for all observations.

This assumption may not be troublesome for models involving observations on aggregates over time, since the values of the explanatory variable are typically of a similar order of magnitude at all points of observation, and the same is true of the values of the dependent variable. For example, in an aggregate consumption function, the level of consumption in recent years is of a similar order of magnitude as the level of consumption twenty years ago, and the same is true of income.

Unless there are some special circumstances, the assumption of homoscedasticity in aggregate models seems plausible. However, when we are dealing with microeconomic data, the observations may involve substantial differences in magnitude, as, for example, in the case of data on income and expenditure of individual families.

Here the assumption of homoscedasticity is not very plausible on a priori grounds, since we would expect less variation in consumption for low-income families than for high-income families. At low levels of income, the average level of consumption is low, and variation around this level is restricted: consumption cannot fall too far below the average level because this might mean starvation, and it cannot rise too far above the average because the asset and credit positions do not allow it.

These constraints are likely to be less binding at higher income levels. Empirical evidence suggests that these a priori considerations are in accord with actual behavior. The appropriate model in this and other similar cases is then one with heteroscedastic disturbances. Regression disturbances whose variances are not constant across observations are heteroscedastic. Heteroscedasticity arises in numerous applications, in both cross-section and time series data. However, it is most commonly expected in cross-sectional data. For example, even after accounting for firm sizes, we expect to observe greater variation in the profits of large firms than in those of small ones. The variance of profits might also depend on product diversification, research and development expenditure, and industry characteristics, and therefore might vary across firms of similar sizes.

When analyzing family spending patterns, we find that there is greater variation in expenditure on certain commodity groups among high-income families than among low-income ones, due to the greater discretion allowed by higher incomes.

In the heteroscedastic regression model,

Yi = Xi′β + εi,  with E(εi) = 0 and Var(εi) = E(εi²) = σi²,  i = 1, ..., n.

We continue to assume that the disturbances are pairwise uncorrelated.

That is, E(εi εj) = 0  (i ≠ j).

When heteroscedasticity alone occurs, there are n + k unknown parameters: n unknown variances and k elements in the β vector. Without some additional assumptions, estimation from n sample points is clearly impossible. Additional assumptions are usually made about the disturbance process.

The ordinary least squares (OLS) linear regression model is widely used throughout the physical, natural and social sciences. In OLS linear regression, a vector of regression coefficients, β, in a model of the form

Y = Xβ + ε

is estimated, where Y is a column vector of dependent variable values, X is a matrix of predictor variable values, and ε is a vector of errors. The elements in β provide information about a predictor variable's unique or partial relationship with the dependent variable, controlling for the other predictor variables.
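The estimation step just described can be sketched briefly; the following Python fragment (our illustration, not the thesis's Matlab code; the data-generating values are invented) fits OLS on simulated data:

```python
import numpy as np

# Simulate data from Y = X*beta + e and estimate beta by OLS.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 predictors
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# OLS: beta_hat = (X'X)^(-1) X'y, solved without forming the inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat
```

Each element of `beta_hat` estimates the partial relationship of one predictor with the dependent variable, holding the other predictors fixed.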

It is well known that the presence of heteroscedasticity in the disturbances of an otherwise properly specified linear model leads to consistent but inefficient parameter estimates and inconsistent covariance matrix estimates. As a result, faulty inferences will be drawn when testing statistical hypotheses in the presence of heteroscedasticity.

Researchers often are interested in testing the null hypothesis that a specific element in β is zero, or in forming a confidence interval (CI) around the estimate. It is well known that such inferential methods assume homoscedasticity in the errors. Violations of homoscedasticity can yield hypothesis tests that fail to keep false rejections at the nominal level, or confidence intervals that are either too narrow or too wide. Given that homoscedasticity is often an unrealistic assumption or clearly violated based on the data available, the researcher should be sensitive to if and how his or her results may be affected by heteroscedasticity.

Based on the work of Long and Ervin (2000), there are several HCCM methods of estimating the standard error of a regression coefficient that can be used if the researcher is concerned about the effects of heteroscedasticity on hypothesis tests and confidence intervals. All of the HCCM methods described are based on an approximation of the variance-covariance matrix of the estimated regression coefficients using the squared residuals ei², where ei = Yi − Xi β̂ from an OLS linear regression.

The four HCCM methods HC0, HC1, HC2 and HC3 differ in how those squared residuals are used in the estimation processes. Once the heteroscedasticity-consistent covariance matrix (HCCM) is estimated, the standard errors for the regression coefficients are simply the square roots of the diagonal elements of the HCCM. Since covariance matrix estimators are most frequently used to construct test statistics, we focus on the behavior of t statistics constructed using these different estimators.
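As a sketch of how the four estimators differ, the following is our own Python rendering of the standard formulas from White (1980) and MacKinnon and White (1985), not the thesis's Matlab code; hi denotes the ith diagonal element of the hat matrix X(X′X)⁻¹X′:

```python
import numpy as np

def hccm(X, e, kind="HC3"):
    """Sandwich estimator (X'X)^-1 X' diag(w) X (X'X)^-1 for Cov(beta_hat).

    HC0: w_i = e_i^2                   (White, 1980)
    HC1: w_i = e_i^2 * n/(n-k)         (degrees-of-freedom correction)
    HC2: w_i = e_i^2 / (1 - h_i)       (leverage adjustment)
    HC3: w_i = e_i^2 / (1 - h_i)^2     (stronger leverage adjustment)
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # hat-matrix diagonal h_i
    w = {"HC0": e**2,
         "HC1": e**2 * n / (n - k),
         "HC2": e**2 / (1 - h),
         "HC3": e**2 / (1 - h) ** 2}[kind]
    return XtX_inv @ (X * w[:, None]).T @ X @ XtX_inv

# Example: heteroscedastic data; standard errors come from the diagonal.
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
se_hc3 = np.sqrt(np.diag(hccm(X, e, "HC3")))
```

Since 1/(1 − hi)² ≥ 1/(1 − hi) ≥ 1, the HC3 variance estimates are never smaller than HC2's, which in turn are never smaller than HC0's; this is the sense in which HC2 and HC3 inflate the squared residuals of high-leverage observations.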

In this study, we tried to investigate the performance of the HCCM methods over a range of sample sizes. In other words, we wished to empirically assess their performance.
1.2. Statement of the Problem

Since covariance matrix estimators are used to compute test statistics, we wish to empirically assess the small and large sample performance of the four HCCM and OLSCM t statistics for testing hypotheses that particular elements of β assume specified values.

1.3. Objectives of the Study

1.3.1 General Objectives:

To compare the performance of various HCCM estimators over a range of sample sizes.

1.3.2 Specific Objectives:

To generate disturbances with four different structures of heteroscedasticity and at different sample sizes.

To apply the OLSCM and the four HCCM methods to the simulated data generated under the above conditions.

To evaluate the performance of the OLSCM and the HCCM methods over a range of sample sizes.

1.4. Research Questions

This paper will answer the following research questions:

1. Which HCCM performs better for sample sizes 10, 25, 50, 100, 250, 500, 600 and 1000?

2. What is the consequence of using HCCM tests when there is no heteroscedasticity?

3. What is the consequence of using the OLSCM test when there is heteroscedasticity?
1.5 Limitation of the Study

While no Monte Carlo simulation can cover all variations that might influence the properties of the statistics being studied, our simulations explore a range of settings that are common in cross-sectional data.

1.6 Significance of the Study

It is well known that violations of homoscedasticity can yield hypothesis tests that fail to keep false rejections at the nominal level, or confidence intervals that are either too narrow or too wide. Given that homoscedasticity is often an unrealistic assumption or clearly violated based on the data available, the researcher should be sensitive to if and how his or her results may be affected by heteroscedasticity.

The use of the HCCM allows a researcher to easily avoid the adverse effects of heteroscedasticity even when nothing is known about the form of the heteroscedasticity.

The HCCM provides a consistent estimator of the covariance matrix of the regression coefficients in the presence of heteroscedasticity of unknown form.

This is particularly useful when a suitable variance-stabilizing transformation or weights cannot be found, or weights cannot be estimated for use in GLS.

In this paper we have added our own contributions based on Monte Carlo simulation and recommended that HC2 or HC3 should be used in favor of HC0.
CHAPTER TWO

2. THEORETICAL FRAMEWORK AND LITERATURE REVIEW

In order to fully consider each of the estimators discussed in the previous chapter, the theoretical framework and literature review detailing these methods will be examined. This chapter is divided into five sections. The first section is about the nature of heteroscedasticity, the second section discusses the consequences of heteroscedasticity, the third section is about detecting heteroscedasticity, the fourth section is about controlling for heteroscedasticity and estimation of Cov(β̂), and the final section is a review of previous simulation studies involving the HCCM estimator.

2.1 The Nature of Heteroscedasticity

To illustrate the nature of heteroscedasticity, recall the two-variable regression model of consumption:

Yi = β0 + β1 Xi + εi    (2.1.1)

where Yi = monthly household expenditure and Xi = monthly household income. Now, previously we assumed, at least implicitly, that the population error term, εi, is homoscedastic, that is, the variance of each εi, conditional on Xi, is some constant equal to σ², or E(εi²) = σ².

Intuitively, what this says is that the spread, or dispersion, of monthly expenditures among low-income households is the same as among high-income households. But maybe this assumption is a little strong, for high-income households have more scope for choice about the disposition of their income and thus more choices about their consumption behavior. Thus, we might expect a greater dispersion of monthly expenditures for higher-income households than for lower-income households. In this case, the stochastic disturbance term, εi, is not homoscedastic, but rather heteroscedastic, with the variance of Yi increasing as Xi increases, that is, E(εi²) = σi² for all i.

Now in general, there are several reasons why stochastic disturbance terms may be heteroscedastic, five of which are:

1. Learning by doing: As people learn, their errors of behavior become smaller over time. In this case σi² would be expected to decrease. For example, consider the relationship between productivity (wages) and experience.

2. Scope for choice: As incomes grow, people have more discretionary income and thus more choices about consumption and saving behavior. Similarly, companies with larger profits are generally expected to show greater variability in their dividend policies than companies with lower profits.

3. Improvement in data collecting techniques: As data collecting techniques improve, σi² is likely to decrease. Thus, institutions and organizations that use sophisticated data processing equipment are likely to commit fewer errors.

4. Outliers: An outlying observation, or outlier, is an observation that is very different from the other observations in the sample. The inclusion or exclusion of such an observation, especially if the sample size is small, can substantially alter the results of regression analysis.

5. Misspecified regression models: Very often what looks like heteroscedasticity may be due to the fact that some important variables are omitted from the model.

2.2 The Consequences of Heteroscedasticity

Suppose our data on household consumption expenditure is plagued by heteroscedasticity. What happens to our OLS estimators and their variances? To answer this question, let us just focus on the slope coefficient β1 in our two-variable model of household consumption:

β̂1 = Σ xi yi / Σ xi²    (2.2.1)

where yi = Yi − Ȳ and xi = Xi − X̄. While the coefficient estimate is unaffected, its variance is:

Var(β̂1) = Σ xi² σi² / (Σ xi²)²    (2.2.2)

where, had the data not been plagued by heteroscedasticity, the variance would be:

Var(β̂1) = σ² / Σ xi²    (2.2.3)

Hence, the consequences of heteroscedasticity are as follows:

1. OLS estimators and forecasts based on them remain unbiased and consistent.

2. However, OLS estimators are no longer BLUE because they are no longer efficient. As a result, forecasts will be inefficient.

3. Because regression coefficient variances and covariances are biased and inconsistent, tests of hypotheses, that is t-tests and F-tests, are invalid.

In short, if we persist in using the usual testing procedures despite heteroscedasticity, whatever conclusions we draw or inferences we make may be very misleading.
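Consequence 3 can be illustrated numerically. The Monte Carlo fragment below is a sketch of our own, separate from the thesis's experiments; the design σi = Xi is an assumption chosen for illustration. It compares the conventional variance formula for the slope with the slope's actual sampling variance:

```python
import numpy as np

# With sigma_i = X_i (strong heteroscedasticity), the homoscedastic
# variance formula (2.2.3) for the slope is biased, here downward.
rng = np.random.default_rng(42)
n, reps = 100, 5000
X = np.linspace(1.0, 10.0, n)
x = X - X.mean()                                 # deviations from the mean
slopes, usual_var = [], []
for _ in range(reps):
    y = 1.0 + 2.0 * X + rng.normal(size=n) * X   # error sd equals X_i
    b1 = (x * (y - y.mean())).sum() / (x**2).sum()
    resid = (y - y.mean()) - b1 * x
    s2 = (resid**2).sum() / (n - 2)
    slopes.append(b1)
    usual_var.append(s2 / (x**2).sum())          # homoscedastic formula
true_var = np.var(slopes)       # actual sampling variance of b1
avg_usual = np.mean(usual_var)  # what the usual formula reports on average
```

In this design the slope estimates stay centered on 2 (consequence 1, unbiasedness), while `avg_usual` understates `true_var`, so t-tests built on the usual formula reject too often.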

2.3 Detecting Heteroscedasticity

2.3.1 Informal Methods

The problem of heteroscedasticity is likely to be more common in cross-sectional than in time series data. In cross-sectional data, one usually deals with members of a population at a given point in time, such as individual consumers and their families, firms, industries, or geographical subdivisions such as state, country, city, etc. Moreover, these members may be of different sizes, such as small, medium, or large firms, or low, medium, or high income. In time series data, on the other hand, the variables tend to be of similar orders of magnitude, because one generally collects the data for the same entity over a period of time.

1. Nature of the problem

Very often the nature of the problem under consideration suggests whether heteroscedasticity is likely to be encountered. In cross-sectional data involving heterogeneous units, heteroscedasticity may be the rule rather than the exception. As a rule of thumb, the greater the degree of heterogeneity in a sample, the more likely heteroscedasticity will be present.

2. Graphical Methods

If there is no a priori or empirical information about the nature of heteroscedasticity, in practice one can do the regression analysis on the assumption that there is no heteroscedasticity and then do a postmortem examination of the squared residuals ei² to see if they exhibit any systematic pattern. Although the ei² are not the same thing as the εi², they can be used as proxies, especially if the sample size is sufficiently large. To carry out this informal method, simply scatter plot ei² against ŷi, or one or more of the explanatory variables, or both.
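A minimal numeric version of this postmortem check can be sketched as follows (our Python, with an invented data-generating process): fit OLS while ignoring heteroscedasticity, then examine whether ei² trends with the regressor.

```python
import numpy as np

# Fit OLS ignoring heteroscedasticity, then inspect the squared residuals.
rng = np.random.default_rng(7)
n = 200
X = rng.uniform(1, 10, size=n)
y = 1.0 + 2.0 * X + rng.normal(size=n) * X   # error spread grows with X
A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, y, rcond=None)[0]
e2 = (y - A @ b) ** 2

# A scatter plot of e2 against X (or the fitted values) would fan out;
# as a crude numeric stand-in, e2 is positively correlated with X here.
corr = np.corrcoef(X, e2)[0, 1]
```

A positive `corr` is only a rough signal; the formal tests of the next subsection put the same idea on a proper footing.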

2.3.2 Formal Methods

1. White's Heteroskedasticity Test

This is a test for heteroskedasticity in the residuals from a least squares regression (White, 1980). Ordinary least squares estimates are consistent in the presence of heteroskedasticity, but the conventional computed standard errors are no longer valid. If you find evidence of heteroskedasticity, you should either choose the robust standard errors option to correct the standard errors or you should model the heteroskedasticity to obtain more efficient estimates using weighted least squares.

White's test is a test of the null hypothesis of no heteroskedasticity against heteroskedasticity of some unknown general form. The test statistic is computed from an auxiliary regression, where we regress the squared residuals on all possible (non-redundant) cross products of the regressors. For example, suppose we estimated the following regression:

Yi = β0 + β1 X1i + β2 X2i + εi    (2.3.1)

The test statistic is then based on the auxiliary regression:

ei² = α0 + α1 X1i + α2 X2i + α3 X1i² + α4 X2i² + α5 X1i X2i + vi    (2.3.2)

For example, the software Eviews reports two test statistics from the test regression. The F-statistic is an omitted variable test for the joint significance of all cross products, excluding the constant. It is presented for comparison purposes. The n·R² statistic is White's test statistic, computed as the number of observations times the centered R² from the test regression. The exact finite sample distribution of the F-statistic under H0 is not known, but White's test statistic is asymptotically distributed as a χ² with degrees of freedom equal to the number of slope coefficients (excluding the constant) in the test regression. White also describes this approach as a general test for model misspecification, since the null hypothesis underlying the test assumes that the errors are both homoskedastic and independent of the regressors, and that the linear specification of the model is correct. Failure of any one of these conditions could lead to a significant test statistic. Conversely, a non-significant test statistic implies that none of the three conditions is violated.

a. Features of White's test

• Does not require any prior knowledge about the source of heteroscedasticity.

• It is a large sample Lagrange Multiplier (LM) test.

• Does not depend on the normality of population errors.
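For the one-regressor case the test reduces to regressing ei² on Xi and Xi² and comparing n·R² with a χ² critical value on 2 degrees of freedom. A sketch (our Python, with an invented data-generating process; the 5% critical value 5.99 is hard-coded):

```python
import numpy as np

def white_test_stat(X, e):
    """White's n*R^2 statistic for one regressor: auxiliary regression of
    e^2 on a constant, X and X^2 (no cross products with a single regressor)."""
    n = len(e)
    Z = np.column_stack([np.ones(n), X, X**2])
    g = np.linalg.lstsq(Z, e**2, rcond=None)[0]
    u = e**2 - Z @ g
    r2 = 1 - (u**2).sum() / ((e**2 - (e**2).mean()) ** 2).sum()
    return n * r2  # ~ chi-square(2) under homoscedasticity

rng = np.random.default_rng(3)
n = 300
X = rng.uniform(1, 10, size=n)
A = np.column_stack([np.ones(n), X])
y = 1 + 2 * X + rng.normal(size=n) * X       # heteroscedastic errors
e = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
stat = white_test_stat(X, e)                 # compare with 5.99 (chi2, 2 df, 5%)
```

With this strongly heteroscedastic design the statistic comfortably exceeds the critical value, so the null of homoscedasticity is rejected.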

2. Park, Glejser, and Breusch-Pagan-Godfrey Tests

All three of these tests are similar, so we will address them as a group. Like White's test, each of these tests is an LM test and thus follows the same general procedure. Given the following regression model, carry out the following steps:

Steps

i. Given the data, estimate the regression model and obtain the residuals.

ii. Next, estimate the following auxiliary regression models and obtain their R²s.

Park Test

ln ei² = α0 + α1 ln Zi + vi    (2.3.3)

Glejser Test

|ei| = α0 + α1 Zi + vi    (2.3.4)

Breusch-Pagan-Godfrey Test

p̃i = ei² / (Σ ei² / n),   p̃i = α0 + α1 Zi + vi    (2.3.5)

where in each auxiliary regression, the Zs may be some or all of the Xs.

iii. Compute the LM test statistic: Under the null hypothesis that there is no heteroscedasticity, it can be shown that the sample size n times the R² obtained from the auxiliary regression asymptotically follows the chi-square distribution with degrees of freedom equal to the number of regressors (not including the constant term) in the auxiliary regression. That is,

n·R² ~ χ²(p)    (2.3.6)

It is important to note that the test statistics originally proposed by Park and Glejser are Wald test statistics, and the test statistic originally suggested in the Breusch-Pagan-Godfrey test is one-half of the auxiliary regression's explained sum of squares, distributed as chi-square with p degrees of freedom. However, as pointed out by Gujarati (1995), since all of these tests are simply large-sample tests, they are all operationally equivalent to the LM test.

iv. Perform the LM test by comparing n·R² to the chi-square critical value χ²(p). If n·R² > χ²(p), the conclusion is that there is heteroscedasticity. If n·R² < χ²(p), there is no heteroscedasticity, which is to say that the slope coefficients in the auxiliary regression are jointly zero.
b. Features of the Park, Glejser, and Breusch-Pagan-Godfrey Tests

• They all require knowledge about the source of heteroscedasticity, that is, the Z variables are known to be responsible for the heteroscedasticity.

• They are all, in essence, large sample Lagrange Multiplier (LM) tests.

• In the Park test, the error term in the auxiliary regression may not satisfy the CLRM assumptions and may be heteroscedastic itself. In the Glejser test, the error term is nonzero, is serially correlated, and is ironically heteroscedastic. In the Breusch-Pagan-Godfrey test, the error term is quite sensitive to the normality assumption in small samples.
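The common LM recipe behind these tests (steps i–iv above) can be sketched in a few lines. This is our Python, with an invented design; here Z is a single regressor, the auxiliary regression uses ei² directly (closest in spirit to the Breusch-Pagan-Godfrey variant), and the 5% χ²(1) critical value 3.84 is hard-coded:

```python
import numpy as np

def lm_het_stat(Z, e):
    """n * R^2 from regressing e^2 on a constant and Z (chi-square, p df)."""
    n = len(e)
    Zc = np.column_stack([np.ones(n), Z])
    g = np.linalg.lstsq(Zc, e**2, rcond=None)[0]
    u = e**2 - Zc @ g
    return n * (1 - (u**2).sum() / ((e**2 - (e**2).mean()) ** 2).sum())

rng = np.random.default_rng(11)
n = 300
Z = rng.uniform(1, 10, size=n)
y = 1 + 2 * Z + rng.normal(size=n) * Z       # error variance driven by Z
A = np.column_stack([np.ones(n), Z])
e = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
lm = lm_het_stat(Z, e)                       # one Z: compare with 3.84 (5%, 1 df)
```

With several candidate Z variables one would simply widen the `Zc` matrix and use the matching χ²(p) critical value.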

3. Goldfeld-Quandt Test

An alternative and popular test for heteroscedasticity works from the intuition of the problem: if population errors are homoscedastic, and thus share the same variance over all observations, then the variance of residuals from one part of the sample observations should be equal to the variance of residuals from another part of the sample observations. Thus, a "natural" approach to testing for heteroscedasticity would be to perform an F-test for the equality of residual variances, where the F-statistic is simply the ratio of two sample variances. Like the other tests we have discussed, this one too involves a number of steps. Consider the following regression model:

Yi = β0 + β1 X1i + ... + βk Xki + εi    (2.3.7)

Steps

i. Identify a variable to which the population error variance is related. For the sake of illustration, we will assume that X1 is related to var(εi) positively.

ii. Order or rank the observations according to the values of X1, beginning with the lowest X1 value.
iii. Omit c central observations, where c is specified a priori, and divide the remaining n − c observations into two groups, each of (n − c)/2 observations. The choice of c, for the most part, is arbitrary; however, as a rule of thumb it will usually lie between one-sixth and one-third of the total observations.

iv. Run separate regressions on the first (n − c)/2 observations and the last (n − c)/2 observations, and obtain the respective residual sums of squares: ESS1, representing the residual sum of squares from the regression corresponding to the smaller X1 values (the small variance group), and ESS2, that from the larger X1 values (the large variance group).

v. Compute the F-statistic

F = ESS2 / ESS1    (2.3.8)

where

d.f. = (n − c − 2(k + 1)) / 2

and k is the number of estimated slope coefficients.

vi. Perform the F-test. If the εi are normally distributed, and if the homoscedasticity assumption is valid, then it can be shown that F follows the F distribution with (n − c − 2(k + 1))/2 degrees of freedom in both the numerator and denominator. If F > Fα, then we can reject the hypothesis of homoscedasticity; otherwise we cannot reject the hypothesis of homoscedasticity.

Features of the Goldfeld-Quandt test

• The success of this test depends importantly on the value of c and on identifying the
correct X variable with which to order the observations.

• This test cannot accommodate situations where the combination of several variables is
the source of heteroscedasticity. In this case, because no single variable is the cause of
the problem, the Goldfeld-Quandt test will likely conclude that no heteroscedasticity
exists when in fact it does.
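The steps above can be sketched in a few lines of code. The following is a minimal illustration in Python with NumPy; the function name, the 20% central-omission default, and the example data are our own choices for illustration, not part of the original description of the test.

```python
import numpy as np

def goldfeld_quandt(y, X, order_by, drop_frac=0.2):
    """Goldfeld-Quandt F statistic for heteroscedasticity (sketch).

    y        : (n,) response vector
    X        : (n, k+1) design matrix including the constant column
    order_by : (n,) variable suspected to drive var(eps_i)
    drop_frac: fraction c/n of central observations to omit
    """
    n = len(y)
    idx = np.argsort(order_by)          # step II: sort by the suspect variable
    y, X = y[idx], X[idx]
    c = int(drop_frac * n)              # step III: omit c central observations
    m = (n - c) // 2

    def ess(ys, Xs):                    # residual sum of squares of a sub-regression
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        e = ys - Xs @ beta
        return e @ e

    ess1 = ess(y[:m], X[:m])            # small-variance group
    ess2 = ess(y[n - m:], X[n - m:])    # large-variance group
    df = m - X.shape[1]                 # = [n - c - 2(k + 1)]/2 with k slopes
    return (ess2 / df) / (ess1 / df), df

# Example: errors whose spread grows with x
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 1 + 2 * x + rng.normal(0, x)        # error sd proportional to x
X = np.column_stack([np.ones_like(x), x])
F, df = goldfeld_quandt(y, X, order_by=x)
```

With errors whose standard deviation grows with x, the statistic lands well above 1 and can be referred to the F distribution with (df, df) degrees of freedom.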

2.4. HCCM for the Linear Regression Model

This thesis deals exclusively with the linear regression model of the form

y = Xβ + ε    (2.4.1)

where y is an n × 1 vector of observations on a dependent variable, X is taken as a non-
stochastic n × k matrix of observations on independent variables (assumed to be of full
rank), and ε is an n × 1 vector of disturbances, with

E(ε) = 0 and E(εε′) = Ω, a positive definite matrix.

The ordinary least squares estimator for this model is

β̂ = (X′X)⁻¹X′y    (2.4.2)

which is best linear unbiased and has covariance matrix:

cov(β̂) = (X′X)⁻¹X′ΩX(X′X)⁻¹    (2.4.3)

Suppose we assume the errors are homoscedastic,

That is, E(εε′) = σ²I    (2.4.4)

16
Then equation (2.4.3) simplifies to:

cov(β̂) = σ²(X′X)⁻¹    (2.4.5)

This can be conventionally estimated as

ĉov(β̂) = s²(X′X)⁻¹, where s² = e′e/(n − k)    (2.4.6)

Since X is non-stochastic and ε is normally distributed, inferences in finite samples can
then be based on the t or F distribution. If eᵢ denotes the OLS residual yᵢ − xᵢβ̂,
where yᵢ denotes the ith observation on the dependent variable and xᵢ denotes the ith row of
the X matrix, then the OLS covariance matrix estimate is given by:

OLSCM = [Σᵢ eᵢ²/(n − k)](X′X)⁻¹    (2.4.7)

where n is the sample size and k is the size of the β vector.

This is a classical technique that requires one to assume that the error terms have a
constant variance (homoscedasticity). In this case, the OLSCM estimator is appropriate
for hypothesis testing and computing confidence intervals. This assumption is often not
very plausible. If the regression disturbance is heteroscedastic, we have E(εᵢ²) = σᵢ²,
which may differ across i.

This implies that the variance of the disturbance may vary from observation to
observation, and we want to know how this behavior of the variance affects the properties
of the least squares estimators of the regression coefficients. A heteroscedasticity-
consistent covariance matrix estimator (hereafter, HCCM) which allows one to estimate
(2.4.3) consistently under general conditions is

HC0 = (X′X)⁻¹X′Ω̂X(X′X)⁻¹    (2.4.8)

where Ω̂ = diag(e₁², e₂², …, eₙ²)

The square roots of the elements on the principal diagonal of HC0 are the estimated
standard errors of the OLS coefficients. They are often referred to as heteroscedasticity-
consistent standard errors (HCSEs). The usual t and F tests are now only valid
asymptotically. As shown by White (1980) and others, HC0 is a consistent estimator of

cov(β̂) in the presence of heteroscedasticity of an unknown form.

A number of studies have sought to improve on the White estimator for OLS. The
asymptotic properties of the estimator are unambiguous, but its usefulness in small
samples is open to question. In other words, the White procedure has large-sample
validity; it may not work very well in finite samples. The possible problems stem from
the general result that the squared OLS residuals tend to underestimate the squares of the
true disturbances (that is why we divide by n − k rather than by n in computing s²).
The end result is that in small samples, at least as suggested by some Monte Carlo studies
[e.g., MacKinnon and White (1985)], the White estimator is a bit too optimistic and its
asymptotic t ratios are a little too large. Davidson and MacKinnon (1993) suggest a
number of fixes, which include some evidence about corrections to eᵢ² that can improve
finite-sample performance. The simplest correction is to replace eᵢ² by [n/(n − k)]eᵢ². This
makes a degrees-of-freedom correction that inflates each squared residual by the factor n/(n − k); in
other words, it is a correction that scales up the end result by the factor n/(n − k).
This yields the modified estimator, which is known as HC1:

HC1 = [n/(n − k)](X′X)⁻¹X′diag(eᵢ²)X(X′X)⁻¹ = [n/(n − k)]HC0    (2.4.9)

This correction is similar to the one conventionally used to obtain an unbiased estimator of
σ². Recall that Ω̂ in equation (2.4.8) is based on the OLS residuals e, not the errors ε. Even
if the errors are homoscedastic, the residuals may not be. A better correction is to use the
squared residual scaled by 1 − hᵢᵢ, that is, eᵢ²/(1 − hᵢᵢ) instead of eᵢ², where hᵢᵢ = xᵢ(X′X)⁻¹xᵢ′.
The rationale of this correction may be shown as follows. It is known that

e = My

where M = I − X(X′X)⁻¹X′, which is a symmetric, idempotent matrix with the
properties

MX = 0 and Mε = e

Assuming homoscedasticity, the variance matrix of the OLS residual vector is

E(ee′) = E(Mεε′M) = σ²M    (2.4.10)

The ith element on the principal diagonal of the matrices in equation (2.4.10) gives

E(eᵢ²) = σ²(1 − hᵢᵢ)    (2.4.11)

The mean squared residual thus underestimates σ², which suggests the second correction
given in this paragraph. The term hᵢᵢ is the ith diagonal element of the matrix X(X′X)⁻¹X′.
Equation (2.4.11) suggests that while eᵢ² is a biased estimator of σᵢ², eᵢ²/(1 − hᵢᵢ) will
be less biased. Thus, Horn and Duncan (1975) suggest using

σ̂ᵢ² = eᵢ²/(1 − hᵢᵢ)    (2.4.12)

as an almost unbiased estimate for σᵢ². Following their approach, MacKinnon and White
(1985) propose the estimator referred to as HC2:

HC2 = (X′X)⁻¹X′Ω̄X(X′X)⁻¹    (2.4.13)

where Ω̄ = diag(σ̂₁², σ̂₂², …, σ̂ₙ²) = diag[eᵢ²/(1 − hᵢᵢ)]

It is immediate from (2.4.11) that HC2 will be unbiased when the εᵢ are in fact
homoscedastic.

MacKinnon and White (1985) also suggest a third correction, eᵢ²/(1 − hᵢᵢ)², as an
approximation to an estimator based on the "jackknife" technique, but their advocacy of
this estimator is much weaker than that of the other two. They refer to this covariance
matrix estimator as HC3. To get this estimator, replace eᵢ² by eᵢ²/(1 − hᵢᵢ)²:

HC3 = (X′X)⁻¹X′diag[eᵢ²/(1 − hᵢᵢ)²]X(X′X)⁻¹    (2.4.14)

It is evident that HC3 is asymptotically equivalent to HC0, HC1, and HC2, since
1/n times the middle factor clearly converges to 1/n times X′ΩX in each case.

Dividing eᵢ² by (1 − hᵢᵢ)² further inflates eᵢ², which is thought to adjust for the "over-
influence" of observations with large variances.
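All four estimators follow from a single OLS fit, differing only in the diagonal weights placed in the middle of the sandwich. Below is a compact Python/NumPy sketch of equations (2.4.8), (2.4.9), (2.4.13) and (2.4.14); the helper name `hccm` and the toy data are ours.

```python
import numpy as np

def hccm(y, X, kind="HC3"):
    """Sandwich covariance estimators HC0-HC3 for OLS (sketch of eqs. 2.4.8-2.4.14)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                               # OLS residuals
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # h_ii = x_i (X'X)^{-1} x_i'
    omega = {
        "HC0": e**2,                               # White (1980)
        "HC1": e**2 * n / (n - k),                 # degrees-of-freedom inflation
        "HC2": e**2 / (1 - h),                     # Horn and Duncan scaling
        "HC3": e**2 / (1 - h) ** 2,                # near-jackknife
    }[kind]
    # (X'X)^{-1} X' diag(omega) X (X'X)^{-1}
    return XtX_inv @ (X.T * omega) @ X @ XtX_inv

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])
y = X @ np.array([1.0, 1.0]) + rng.normal(size=50)
se_hc0 = np.sqrt(np.diag(hccm(y, X, "HC0")))
se_hc3 = np.sqrt(np.diag(hccm(y, X, "HC3")))
```

Since (1 − hᵢᵢ)² ≤ 1, the HC3 standard errors are never smaller than the HC0 ones, which is the inflation discussed above.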

2.5 Controlling for Heteroscedasticity and Estimation of cov(β̂): Remedial Measures

Heteroscedasticity does not remove the unbiasedness and consistency properties of the
OLS estimators, but they are no longer efficient, not even asymptotically (i.e., in large
samples). This lack of efficiency makes the usual hypothesis testing procedure of
dubious value. Therefore, remedial measures may be called for. Actually there are no
hard and fast rules for detecting heteroscedasticity, only a few rules of thumb. Now, given
that we have detected heteroscedasticity, what can we do about it? In other
words, is there a remedial measure which will permit our regression models to comply
with the CLRM assumptions and thus obtain BLUE estimates? Fortunately, there are two
approaches to remediation: when σᵢ² is known and when σᵢ² is not known.

2.5.1. The Method of Generalized (Weighted) Least Squares: When σᵢ² is Known

Ideally, we would like to devise the estimating scheme in such a manner that
observations coming from populations with greater variability are given less weight than
those coming from populations with smaller variability. Unfortunately, the usual OLS
method does not follow this strategy and therefore does not make use of the
"information" contained in the unequal variability of the dependent variable: it simply
assigns equal weight or importance to each observation.

But a method of estimation known as generalized least squares (GLS) takes such
information into account explicitly and is therefore capable of producing estimators that
are BLUE. To see how this is accomplished, let us recall the general three-variable
regression model:

Yᵢ = β₀X₀ᵢ + β₁X₁ᵢ + β₂X₂ᵢ + εᵢ, where X₀ᵢ = 1 for all i    (2.5.1)

Now let us assume that the heteroscedastic variance σᵢ² is known. Dividing our regression
model through by σᵢ, we obtain

Yᵢ/σᵢ = β₀(X₀ᵢ/σᵢ) + β₁(X₁ᵢ/σᵢ) + β₂(X₂ᵢ/σᵢ) + εᵢ/σᵢ    (2.5.2)

which for the sake of exposition we express as

Yᵢ* = β₀*X₀ᵢ* + β₁*X₁ᵢ* + β₂*X₂ᵢ* + εᵢ*    (2.5.3)

where the starred or transformed variables are the original variables divided by the
known σᵢ, and X₀ᵢ* = 1/σᵢ. We have starred the regression coefficients to distinguish them
from the usual OLS parameters. Now the reason why we transform the variables this way
is because the transformed error terms εᵢ* are homoscedastic! To see this, notice the
following:

var(εᵢ*) = E(εᵢ*²) = E(εᵢ/σᵢ)² = (1/σᵢ²)E(εᵢ²) = σᵢ²/σᵢ² = 1    (2.5.4)

which is constant! Since with the transformed variables εᵢ* satisfies the CLRM
assumptions, if we apply OLS to the transformed model, it will produce estimators that
are BLUE. In other words, the β̂ⱼ* for j = 0, 1, 2 are BLUE, but the β̂ⱼ are not. This procedure
of transforming the original variables in such a way that the transformed variables satisfy
the assumptions of the classical model and then applying OLS to them is known as the
method of generalized least squares, and the estimators produced by this method are
called GLS estimators. This case of GLS is also called weighted least squares (WLS),
where the variable transformations are simply a "weighting" of the original variables by
the factor wᵢ = 1/σᵢ. Notice that the transformed model no longer includes a constant term.

Hence, under GLS, the R² cannot be interpreted. While in practice σᵢ² is almost never
known, if it is known it will often take on a particular form; specifically, it will likely be
proportional to one or more of the explanatory variables. For example, let us suppose that
σᵢ is proportional to X₁ᵢ, that is,

var(εᵢ) = σᵢ² = σ²X₁ᵢ², or equivalently σᵢ = σX₁ᵢ    (2.5.5)

Then transforming the original model by multiplying each variable (including the
constant) by the weight wᵢ = 1/X₁ᵢ yields

Yᵢ* = β₀*X₀ᵢ* + β₁ + β₂*X₂ᵢ* + εᵢ* (which now includes a constant term),

where

var(εᵢ*) = E(εᵢ/X₁ᵢ)² = σᵢ²/X₁ᵢ² = σ²    (2.5.6)
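The effect of the weighting in (2.5.5) and (2.5.6) can be checked numerically. The sketch below assumes, as in the example above, that σᵢ = σX₁ᵢ; the variable names and the simulated data are our own.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x1 = rng.uniform(1, 5, n)
x2 = rng.normal(size=n)
eps = rng.normal(0, 1.0 * x1)              # var(eps_i) = sigma^2 * x1_i^2, sigma = 1
y = 1.0 + 2.0 * x1 + 0.5 * x2 + eps

# WLS: divide every variable (including the constant) by x1, per eq. (2.5.5)
w = 1.0 / x1
Xs = np.column_stack([w, np.ones(n), x2 * w])   # columns: 1/x1, 1 (from x1/x1), x2/x1
ys = y * w
beta_star, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

# the transformed residuals should now be homoscedastic with variance sigma^2 = 1
resid = ys - Xs @ beta_star
```

After the transformation, the residual variance is roughly the same in the low-X₁ and high-X₁ halves of the sample, which is exactly the homoscedasticity claimed in (2.5.6).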

2.5.2. When σᵢ² is not known

Given that σᵢ² is rarely ever known, is there a way of obtaining consistent estimates
of the variances and covariances of the OLS estimators if there is heteroscedasticity? The
answer is yes, and the remedy is White's HCSEs (also known as the HCCM). White
(1980) introduced a consistent estimator for the situation in which Ω is diagonal but
possibly heteroscedastic. This method uses the estimate

cov(β̂) = (X′X)⁻¹X′Ω̂X(X′X)⁻¹    (2.5.7)

where

Ω̂ = diag(e₁², e₂², …, eₙ²)    (2.5.8)

Notice that this estimate of Ω does not allow for correlation among observations. This
estimate has several names, including the sandwich estimator, the consistent variance
estimator, the robust estimator, and the White estimator. In this thesis, it will be referred
to as the White estimator.

The White estimator is biased, but it is consistent (Kauermann 2001). One of the benefits
of this estimator is that it does not rely on a specific formal model of the
heteroscedasticity for its consistency (White 1980). The White estimator is computed
using ordinary least squares residuals, which tend to be too small (MacKinnon and White 1985).
Kauermann (2001) provides an option to reduce bias when using the White estimator;
namely, substitution of

ε̃ᵢ² = eᵢ²/(1 − hᵢᵢ)    (2.5.9)

for

eᵢ²    (2.5.10)

where hᵢᵢ is the ith diagonal element of the hat matrix

H = X(X′X)⁻¹X′    (2.5.11)

A generalization of the White estimator is often used when Ω is block-diagonal, with
blocks (Ωᵢ) of equal size. One can use the least squares estimator (2.4.2) with

cov(β̂) = (X′X)⁻¹X′Ω̂ᵦX(X′X)⁻¹    (2.5.12)

where

Ω̂ᵦ = diag[(y₁ − X₁β̂)(y₁ − X₁β̂)′, (y₂ − X₂β̂)(y₂ − X₂β̂)′, …, (yₙ − Xₙβ̂)(yₙ − Xₙβ̂)′]    (2.5.13)
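When Ω is block-diagonal because observations fall into groups, the estimator (2.5.12)-(2.5.13) reduces to summing one outer product per block. A minimal sketch follows; the group structure, names, and simulated data are illustrative assumptions, not from the text.

```python
import numpy as np

def block_sandwich(y, X, groups):
    """Block-diagonal sandwich estimator in the style of eqs. (2.5.12)-(2.5.13)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        m = groups == g
        s = X[m].T @ e[m]          # k-vector: X_g'(y_g - X_g beta)
        meat += np.outer(s, s)     # contributes X_g' e_g e_g' X_g
    return XtX_inv @ meat @ XtX_inv

rng = np.random.default_rng(3)
n, G = 200, 20
groups = np.repeat(np.arange(G), n // G)
x = rng.normal(size=n)
u = rng.normal(size=G)[groups]     # error component shared within each block
y = 1 + x + u + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
V = block_sandwich(y, X, groups)
```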

2.6 Review of Simulation Studies of the HCCM Estimator

Few simulation studies have been done to investigate the properties of the HCCM
estimator (e.g., White 1980; MacKinnon and White 1985; Davidson and MacKinnon 1993;
Long et al. 2000).

2.6.1 White, 1980

White (1980) presented general conditions under which a consistent estimator of the
OLS parameter covariance matrix can be obtained, regardless of the presence of
heteroscedasticity in the disturbances of a properly specified linear model. Since this
estimator does not require a formal modeling of the structure of the heteroscedasticity,
and since it requires only the regressors and the estimated least squares residuals for its
computation, the estimator, which is given by

HC0 = (X′X)⁻¹X′Ω̂X(X′X)⁻¹ = (X′X)⁻¹X′diag(eᵢ²)X(X′X)⁻¹    (2.6.1)

should have wide applicability. Additional conditions are given which allow the
investigator to test directly for the presence of heteroscedasticity. If found, elimination of
the heteroscedasticity by a more careful modeling of the stochastic structure of the model
can yield improved estimator efficiency. According to White (1980), one had either to
model heteroscedasticity correctly or suffer the consequences.

The fact that the covariance matrix estimator and heteroscedasticity test given by White
(1980) do not require formal modeling of the heteroscedastic structure is a great
convenience, but it does not relieve the investigator of the burden of carefully specifying
his/her models. Instead, it was hoped that these statistics would enable
researchers to be even more careful in specifying and estimating economic models.

Thus, when a formal model for heteroscedasticity is available, application of the tools
presented by White (1980) will allow one to check the validity of this model, and
undertake further modeling if indicated. But even when heteroscedasticity cannot be
completely eliminated, the heteroscedasticity-consistent covariance matrix of equation (2.4.8)
allows correct inferences and confidence intervals to be obtained.

White (1980) introduced this idea to econometricians and derived the asymptotically
justified form of the HCCM known as HC0. HC0 is the most commonly used form of
the HCCM and is referred to as the White estimator. As shown by White (1980) and

others, HC0 is a consistent estimator of cov(β̂) in the presence of heteroscedasticity of

an unknown form.

26
White has shown that thi s estimate can be performed so that asymptotically valid (i.e.,
large-sample) statistical inferences can be made about the true parameter values.
Nowadays, several computer packages present White's heteroscedast icity corrected
variances and standard errors along with the usual OLS variances and standard errors.
Incidentally, Whi te ' s heteroscedasticity corrected standard errors are also known as
robust standard errors . White's estimator is consistent under both homoscedasticity and
heteroscedasticity of unknown form, but it can be quite biased when the sample size is
small.

2.6.2 MacKinnon and White, 1985

MacKinnon and White (1985) considered three alternative estimators designed to
improve the small-sample properties of HC0. They examined several modified versions
of the heteroscedasticity-consistent covariance matrix estimator of White (1980) on the
basis of sampling experiments which compare the performance of t statistics. They found
that one estimator, based on the jackknife, performs better in small samples than the rest.
In every single case, the standard deviation of the t-statistics based on HC1 exceeded that
for HC2, which in turn exceeded that for HC3. Since there was certainly no tendency for
HC3 to have too small variances, this implies that HC3 is the covariance matrix estimator
of choice. The difference between HC1 and HC3 is often striking, and the difference
between HC0 and HC3 would, of course, be even more striking. The usual OLS
covariance estimator can be very seriously misleading in the presence of
heteroscedasticity. When it is, HC3 is also likely to be misleading if the sample size is
small, but much less so than OLS. When there is no heteroscedasticity, all the HCCM
estimators are less reliable than OLS, but HC3 does not seem to be much less reliable.
They also studied the properties of several alternative tests for heteroscedasticity and found
that these tests often lack power to detect damaging levels of it. This fact, together with other
results, suggests that it may be wise to use HC3 even when there is little evidence of
heteroscedasticity.

2.6.3 Davidson and MacKinnon, 1993

Davidson and MacKinnon (1993) raised the following two crucial questions:

1. What is the difference between the conventional and the correct estimates of the standard
errors of the OLS coefficients?

2. What is the difference between the correct OLS standard errors and the GLS standard
errors?

They provide some Monte Carlo evidence on these questions. They considered the
following simple model,

yᵢ = β₁ + β₂Xᵢ + εᵢ    (2.6.2)

They assumed that β₁ = 1, β₂ = 1 and εᵢ ~ N(0, Xᵢᵅ), with n = 100, Xᵢ uniformly distributed
between 0 and 1, and α a parameter that takes on various values. As the last expression
shows, they assumed that the error variance is heteroscedastic and is related to the value
of the regressor X with power α.

If, for example, α = 1, the error variance is proportional to the value of X; if α = 2, the
error variance is proportional to the square of the value of X, and so on. The inefficiency
of OLS increases substantially with α. Based on 20,000 replications and allowing for various
values of α, they obtain the standard errors of the two regression coefficients using OLS,
OLS allowing for heteroscedasticity, and GLS.

The most striking feature of their results is that OLS, with or without corrections for
heteroscedasticity, consistently overestimates the true standard error obtained by the
(correct) GLS procedure, especially for large values of α, thus establishing the
superiority of GLS. These results also show that if we do not use GLS and rely on OLS
(allowing for or not allowing for heteroscedasticity), the picture is mixed. The usual OLS
standard errors are either too large (for the intercept) or generally too small (for the slope
coefficient) in relation to those obtained by OLS allowing for heteroscedasticity.

The message is clear: in the presence of heteroscedasticity, use GLS. However, in
practice it is not always easy to apply GLS. The White procedure has large-sample
validity; it may not work very well in finite samples.
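Their experiment is straightforward to reproduce in miniature. The sketch below, with far fewer replications than their 20,000, draws εᵢ ~ N(0, Xᵢᵅ) and compares the empirical spread of the OLS and GLS slope estimates; all names and the reduced scale are our own simplifications.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps, alpha = 100, 2000, 2.0
x = rng.uniform(0, 1, n)                      # fixed regressor across replications
X = np.column_stack([np.ones(n), x])
w = x ** (-alpha / 2)                         # GLS weight 1/sigma_i, with sigma_i^2 = x_i^alpha
Xw = X * w[:, None]

b_ols, b_gls = [], []
for _ in range(reps):
    eps = rng.normal(0, x ** (alpha / 2))     # var(eps_i) = x_i^alpha
    y = 1 + x + eps
    b_ols.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    b_gls.append(np.linalg.lstsq(Xw, y * w, rcond=None)[0][1])

sd_ols, sd_gls = np.std(b_ols), np.std(b_gls)  # GLS spread should be smaller
```

The empirical standard deviation of the GLS slope comes out below that of the OLS slope, which is the efficiency ranking the text describes.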

2.6.4 Long et al., 2000

MacKinnon and White (1985) proposed three alternatives to reduce the bias of the
standard HCCM estimator (2.5.7). They did this after noticing that the original HCCM
estimator introduced by White (1980) did not take into account that ordinary least squares
residuals tend to be too small. The standard HCCM estimator and the three alternatives
introduced to reduce its bias are HC0, HC1, HC2 and HC3. The properties of these three
alternatives are described by MacKinnon and White (1985) and applied to data from
various experiments. Long et al. (2000) ran a simulation study to compare the HCCM estimator HC0
to the three alternatives HC1, HC2 and HC3. They also tested the ordinary least squares
estimator with the usual estimate for the covariance matrix, OLSCM. They tested each
estimator under varying degrees of heteroscedasticity. They also varied the sample sizes
over 25, 50, 100, 250, 500 and 1000. They generated 100,000 observations for each
independent variable. They next generated random errors with the desired error structure.
Then, with these errors, they computed the dependent variables. Random samples of the
proper size were drawn without replacement. Each of these combinations was run using
the standard HCCM estimator HC0, the three alternatives to the standard HCCM
estimator HC1, HC2 and HC3, and the ordinary least squares estimator OLSCM. Since
the ordinary least squares estimator is designed for homoscedastic data, it had the best
properties for homoscedastic data. The properties of HC3 were nearly the same as
the properties of the ordinary least squares estimator, even at the smallest sample size (n
= 25). The tests for the other three estimators had varying degrees of distortion for sample
sizes n ≤ 100. All of the tests had nearly the same properties for n > 250. Under milder
forms of heteroscedasticity, the ordinary least squares estimator worked well for all
sample sizes. With more extreme forms of heteroscedasticity, the ordinary least squares
estimator performed increasingly worse as the sample size increased. When n ≥ 500, the
four different estimators performed similarly. Long et al. (2000) concluded that the
estimator HC3 worked better than all of the other estimators under both
heteroscedasticity and homoscedasticity. Long et al. (2000) proposed that the estimator
HC3 should always be used.

CHAPTER THREE

3. METHODOLOGY AND DATA GENERATION

This chapter consists of three sections. The first section is about Monte Carlo
experiments; the second section describes the data structures; the last section highlights the
procedure for data generation.

3.1 Monte Carlo Experiments

There are many things that faster computers have made possible in recent years. For
scientists, engineers, statisticians, managers, investors, and others, computers have made
it possible to create models that simulate reality and aid in making predictions. One of
the methods for simulating real systems takes randomness into account by
investigating hundreds of thousands of different scenarios. The results are then compiled
and used to make decisions. This is what Monte Carlo simulation is all about.

In this study, Monte Carlo simulation is used to examine the behavior of tests using the
ordinary least squares covariance matrix (OLSCM) and the four versions of the HCCM
presented above. Our experiments simulate a variety of data and error structures that are
likely to be encountered in cross-sectional research. To this end, we considered errors
that were normal. The independent variables were constructed with a variety of
distributions, including uniform, chi-square, normal, binomial, and binary. Finally, a
variety of different forms and degrees of heteroscedasticity were considered.

Each simulation involved the following steps:

1. Independent variables: 10,000 observations for two independent variables were
constructed and used for each experiment. The independent variables were constructed to
include a few distributions.

2. Errors: Four error structures were chosen to represent common types of homoscedasticity
and heteroscedasticity. 10,000 observations were generated for each error type.

3. Dependent variables: The dependent variable was constructed as a linear combination of
the two independent variables plus the error term. The combination of the independent
variables, the error, and the dependent variable made up the population for each structure.

4. Simulations: From each population, a random sample without replacement was drawn.
Since a different random sample is used for each replication, the design matrix will vary.
Regressions were estimated and tests of hypotheses were computed for each sample. This
was done 10,000 times each for sample sizes of 10, 25, 50, 100, 250, 500, 600 and 1,000.

5. Summary: The results were summarized across the 10,000 replications for each sample
size from each population.

Details of our simulations are now given as follows.

In all of our experiments, we utilized the following model:

Yᵢ = 1 + 1X₂ᵢ + 1X₃ᵢ + εᵢ,  i = 1, 2, …, n    (3.1.1)

where the characteristics of the x's and ε's varied to simulate data typically found in cross-
sectional research. The independent variables have a variety of distributions, including
uniform, chi-square, etc. Suppose we consider a sample of size N, given as (X₁, X₂,
…, X_N), which remains the same in all replications. We also fixed the values of the
parameters (β, σ²). That is, β₁ = β₂ = β₃ = 1.0 and σ² = 1.0. Therefore, εᵢ ~ N(0, σᵢ²). The null
hypothesis under test is H₀: β₁ = β₂ = β₃ = 1.0, and the study was conducted under the null
hypothesis using normal errors.

Heteroscedasticity is introduced by allowing the variance of the errors to depend on the
independent variables in three ways, corresponding to structures 1 to 3 as shown in Table
3.1. This resulted in 3 heteroscedastic error structures that represent a few degrees and
types of heteroscedasticity that might be found in practice. There were one hundred
sixty sets of experiments, in each of which the εᵢ were chosen differently.

Table 3.1: Error Structures Used in the Simulations.

Structure   Heteroscedasticity function
0           εᵢ = εᵢ*
1           εᵢ = X₂ᵢ εᵢ*
2           εᵢ = (X₃ᵢ + 1.5) εᵢ*
3           εᵢ = X₂ᵢ (X₃ᵢ + 2.5) εᵢ*

Note: εᵢ* has a standard normal (z) distribution.

It is more interesting to ask what proportion of the time each of the test statistics exceeds
certain critical values. In other words, we compare the nominal significance level to the
proportion of times that the correct H₀ was rejected over the 10,000 replications at a
given sample size. The critical values we chose were the 5% and 1% levels; absolute
critical values for the standard normal at these levels are 1.96 and 2.576, respectively.
Since the findings were similar, only results for the 5% level are presented.

The obvious way to estimate these rejection frequencies is to use the estimator

ĝ = R/N, where R is the observed number of rejections and N is the number
of replications (here 10,000). A consistent estimate of the variance of this estimator is

ĝ(1 − ĝ)/N    (3.1.2)

Davidson and MacKinnon (1981) proposed this technique; MacKinnon
and White (1985) used it based on the jackknife resampling technique, which we utilize here
too. For details, see Davidson and MacKinnon (1981).
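Equation (3.1.2) in code form; the helper name is ours:

```python
import math

def rejection_frequency(R, N):
    """Estimated rejection rate g = R/N and its standard error sqrt(g(1-g)/N), eq. (3.1.2)."""
    g = R / N
    se = math.sqrt(g * (1 - g) / N)
    return g, se

# e.g. 520 rejections in 10,000 replications at the nominal 5% level
g, se = rejection_frequency(520, 10_000)
```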

3.2 Data Structures

The first step was to generate observations for two independent random variables with
the following distributions. With a sample size N, Δ₂ is uniformly distributed between 0
and 1 and Δ₃ is from a χ² distribution with one degree of freedom or from a standard normal
distribution.

Long and Ervin (2000) used independent variables drawn directly from different distributions
(binomial, binary, uniform, etc.). But here we used some linear combinations of
variables from uniform distributions and standard normal distributions to confirm the
consistency of the HCCM estimators.

We have also included standardized variables as regressors to see the effect of using
such kinds of variables as independent variables.

That is,

Δ₂ ~ Uniform(0, 1) and Δ₃ ~ χ²(1) or N(0, 1)    (3.2.1)

The Δ's were combined to construct the two independent variables.

3.3 Data Generation

3.3.1 Simulation: Simulation can be used as a means of extending data collection from
empirical studies. A simulation model is developed, based on the data from experiments,
and new data are generated from the simulation model. It is argued that the data collection
process is the most crucial stage in the model building process. This is primarily due to
the influence that data have in providing accurate simulation results. Data collection is an
extremely time-consuming process, predominantly because the task is manually
oriented. Hence, automating this process of data collection would be extremely
advantageous. We, therefore, in this study have used simulation as a means of data
collection from empirical results. For each error structure in Table 3.1 and combination of
types of variables, we ran simulations as follows: 10,000 observations for the
independent variables (x's) were constructed. Random errors ε were generated according
to the error structure being evaluated. These were used to construct the dependent
variable y according to equation (3.1.1).

Each experiment involved 10,000 replications, and there were one hundred sixty
experiments in all (for each of n = 10, 25, 50, 100, 250, 500, 600 and 1,000). All results
are based on 10,000 replications. For each of the βᵢ, we calculated five test statistics of
the hypothesis that βᵢ equals its value. These statistics, denoted by OLSCM, HC0, HC1,
HC2, and HC3, utilize the covariance matrices after which they are named. For each
experiment, regressions were estimated and hypothesis tests were computed for each
sample at each sample size. The estimates of the β's, standard errors of coefficients, t-
statistics and probabilities using the OLSCM and the four HCCMs were saved. These
simulations were used to evaluate two situations in which the HCCM might be used.
First, we examine the consequences of using an HCCM-based test when errors are
homoscedastic; second, we compare results using OLSCM tests and the HCCM tests
when there is heteroscedasticity. This will be seen in the fourth chapter.

3.3.2 Code for Simulation: Besides this, we created a program by writing a
sequence of Matlab 7.0 commands in a text file.

Note: In Appendix A we provide Monte Carlo simulation code in Matlab 7.0 that
implements the OLSCM and the four HCCM methods in the linear regression model.
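The appendix code itself is in Matlab 7.0. As a hedged illustration only, a condensed Python analogue of one experiment (error structure 1, n = 50, and a greatly reduced replication count) might look like the following; none of the names below are transcribed from Appendix A.

```python
import numpy as np

rng = np.random.default_rng(5)
N_pop, n, reps, crit = 10_000, 50, 1_000, 1.96

# Population built as in section 3.2 and eq. (3.1.1), error structure 1: eps_i = X2i * eps*_i
x2 = rng.uniform(0, 1, N_pop)
x3 = rng.chisquare(1, N_pop)
y_pop = 1 + x2 + x3 + x2 * rng.normal(size=N_pop)
X_pop = np.column_stack([np.ones(N_pop), x2, x3])
b_pop = np.linalg.lstsq(X_pop, y_pop, rcond=None)[0]   # beta* from the entire population

def ols_hc3(y, X):
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    V = XtX_inv @ (X.T * (e**2 / (1 - h) ** 2)) @ X @ XtX_inv   # HC3
    return b, np.sqrt(np.diag(V))

rej = 0
for _ in range(reps):
    idx = rng.choice(N_pop, size=n, replace=False)     # random sample w/o replacement
    b, se = ols_hc3(y_pop[idx], X_pop[idx])
    rej += abs((b[1] - b_pop[1]) / se[1]) > crit       # t test of H0: beta2 = beta2*
size_hc3 = rej / reps                                  # empirical size at nominal 5%
```

Swapping the `ols_hc3` weighting for the HC0, HC1, HC2, or OLSCM formulas and looping over the sample sizes and error structures reproduces the full design in miniature.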

CHAPTER FOUR

4. ANALYSIS AND PRESENTATION OF RESULTS

This chapter is divided into three sections. The first section deals with the results of the
experiments, the second section covers homoscedastic errors, and the last section covers
heteroscedastic errors.

4.1 Results of Experiments

Each method of computing the covariance matrix is assessed by calculating the empirical
size of t-tests for the β parameters. For size, we compare the nominal significance level to
the proportion of times that the correct H₀ is rejected over the 10,000 replications at a
given sample size. The true hypothesis is H₀: βₖ = βₖ*, where βₖ* is the population value
determined from the regression based on the entire N observations. We test the null
hypothesis H₀ with the t statistic, which is given by

t = (β̂ − 1)/s(β̂)

and the squared t-statistic, which is asymptotically χ²(1):

t² = [(β̂ − 1)/s(β̂)]²

where β̂ is the ordinary least squares estimate and s(β̂)² is the corresponding
diagonal element of the covariance matrix estimator. The realizations of these statistics
are used to calculate a P-value at the nominal level α = 0.05. Many statistical procedures
work best when applied to variables that follow a normal distribution. The t tests assume
that variables follow a normal distribution.

This assumption may not be critical here in our case since we checked the normality
assumption before using the t test. For the sake of convenience we present one result of a
normality test in Appendix B, for N = 50. In this case the P-value at the 5% level
of significance is 0.932280. This tells us the given variable is from a normal distribution.
The safer alternative is to employ a nonparametric test that does not assume normality,
namely the χ² test. We have checked this also and our results are still consistent.

The number of replications is R = 10,000. For β₁, β₂, and β₃, we assess size by testing the
true hypothesis H₀: βₖ = βₖ*. For the tables and figures below, we consider H₀: βₖ = βₖ*.
Size curves for each coefficient were also computed and are summarized where
appropriate. While size was examined at the .05 and .10 nominal levels, the findings were
similar, so only results for the .05 level are presented here. The tables and figures speak
for themselves, but we will discuss a few points of interest.

4.2 Homoscedastic Errors

In deriving the ordinary least squares (OLS) estimates, we assumed that the errors εi were identically distributed with mean zero and equal variance σ², i.e. Var(εi) = E(εi²) = σ². This assumption of equal variance, as mentioned earlier, is known as homoscedasticity (which means equal scatter). The variance σ² is a measure of the dispersion of the errors εi around their mean of zero.

Equivalently, it is a measure of the dispersion of the observed values of the dependent variable (Y) around the regression line β1 + β2X2 + ... + βkXk. Homoscedasticity means that this dispersion is the same across all observations. In the first set of experiments the εi were NID(0, 1), so that the OLS t statistic is appropriate.

The object here is to see how costly it is to use the various heteroscedasticity-consistent estimators when there is in fact no heteroscedasticity. Table 4.2 presents the proportion of times that the null hypothesis is rejected using tests based on the standard OLSCM and each type of HCCM when there is no heteroscedasticity, over a range of sample sizes.

From Table 4.2, it is clear that using HC0 or HC1 when there is in fact no heteroscedasticity and the sample size is small could easily lead to serious errors of inference. Their worst performance was for β3 when n = 10. In this case OLSCM always performed well, since the sample size was small and there was no substantial heteroscedasticity.

The usual OLSCM t statistic would reject the null hypothesis 5% of the time at the nominal 5% level; HC0 would incorrectly reject the null hypothesis 26.4% of the time, HC1 would reject it 20.1% of the time, HC2 would reject it 9.5% of the time, and HC3 would reject it 4% of the time. Hence using HC3 is almost as reliable as using OLSCM.

Table 4.2: The proportion of times that the null hypothesis is rejected using tests based on the standard OLSCM and each type of HCCM when there is no heteroscedasticity.

Coeff.  Estimator    10      25      50      100     250     500     600     1000
β1      OLSCM        0.048   0.053   0.049   0.052   0.052   0.050   0.050   0.052
        HC0          0.119   0.096   0.069   0.060   0.052   0.056   0.055   0.053
        HC1          0.076   0.077   0.062   0.057   0.051   0.053   0.052   0.048
        HC2          0.072   0.063   0.057   0.053   0.051   0.051   0.050   0.053
        HC3          0.030   0.047   0.049   0.048   0.050   0.049   0.047   0.049
β2      OLSCM        0.047   0.051   0.047   0.051   0.052   0.051   0.054   0.051
        HC0          0.151   0.094   0.073   0.061   0.052   0.051   0.046   0.049
        HC1          0.101   0.078   0.064   0.057   0.051   0.049   0.051   0.051
        HC2          0.080   0.066   0.056   0.050   0.054   0.052   0.050   0.052
        HC3          0.034   0.047   0.047   0.047   0.053   0.051   0.052   0.050
β3      OLSCM        0.050   0.051   0.048   0.048   0.050   0.050   0.051   0.050
        HC0          0.264   0.084   0.074   0.062   0.053   0.052   0.048   0.052
        HC1          0.201   0.067   0.067   0.058   0.051   0.052   0.048   0.049
        HC2          0.095   0.066   0.055   0.049   0.054   0.051   0.052   0.052
        HC3          0.040   0.046   0.045   0.044   0.052   0.050   0.050   0.048

Figure 4.2 uses results from a population with homoscedastic normal errors to illustrate our findings. The horizontal axis indicates the size of the sample used in the simulation; the vertical axis indicates the proportion of times that H0 was rejected out of 10,000 replications.

Figure 4.2a: Plot of β1 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.2b: Plot of β2 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.2c: Plot of β3 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.2: Plots of the size distortion, that is, the estimated null rejection frequencies at the nominal 5% level, against different sample sizes.

1. Since the errors are homoscedastic, OLSCM tests have the best size properties.

2. For N < 50, the size properties of HC3 tests are nearly as good as those of OLSCM tests, while tests based on HC0 and HC1 have large size distortions.

3. When N ≥ 250, there is very little distortion introduced by using any of the HCCM-based tests, or the OLSCM-based test, when there is no heteroscedasticity.

4.3 Heteroscedastic Errors

Heteroscedasticity is a violation of our assumptions about the error term, and it has adverse implications for least squares estimation. We do not observe the errors, but proxy them with the residuals, and the residuals are a function of our model specification. Heteroscedasticity of the error term implies that we cannot use the estimates of the standard errors of the regression coefficients computed from the standard formulae derived from the classical regression model.

Furthermore, the least squares estimators are inefficient, though unbiased. The resulting prediction is therefore not the most reliable (since it is inefficient) among all linear predictions, and we cannot make any statement about the uncertainty (confidence intervals, hypothesis tests) of the predictions based on standard errors computed according to the formulae that assume homoscedasticity.

Why not modify the formulae to allow for heteroscedastic errors? In principle, this is possible. Nevertheless, to do so we need to be able to state the nature of the heteroscedasticity of the error term. This is exactly what we try to do. Let us now consider each error structure used in the simulations.
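Before turning to the individual error structures, it may help to see the estimators being compared side by side. The following is a minimal NumPy sketch (not the thesis's Matlab code, which appears in Appendix A) of the HC0–HC3 sandwich estimators; the function name `hccm_se` is an illustrative assumption:

```python
import numpy as np

def hccm_se(X, resid, kind="HC3"):
    """Heteroscedasticity-consistent standard errors (White-type sandwich)."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)   # leverage values h_ii
    e2 = resid ** 2
    if kind == "HC0":                  # White (1980)
        w = e2
    elif kind == "HC1":                # degrees-of-freedom correction n/(n-k)
        w = e2 * n / (n - k)
    elif kind == "HC2":                # discount each residual by (1 - h_ii)
        w = e2 / (1 - h)
    else:                              # HC3: discount by (1 - h_ii)^2
        w = e2 / (1 - h) ** 2
    cov = XtX_inv @ (X.T * w) @ X @ XtX_inv       # sandwich form
    return np.sqrt(np.diag(cov))
```

HC1 simply rescales HC0, while HC2 and HC3 inflate each squared residual by its leverage, which is why they behave better in small samples.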

4.3.1 Heteroscedasticity function: εi = X2i εi*

This expression shows that the error is heteroscedastic and related to the value of the regressor X2i. In this case, the error variance is proportional to the square of the value of X2i. This error structure represents a milder form of heteroscedasticity. Based on 10,000 replications, we obtain the rejection frequencies of the three regression coefficients using OLSCM and the four HCCM estimators. These results are reported in Table 4.3.1.
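As a quick numerical illustration (an assumption-laden sketch, not part of the thesis's simulations), fixing the regressor at a single value X2i = 2 shows that the variance of εi = X2i εi* is indeed X2i² when εi* is standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x2 = 2.0                              # a fixed, illustrative regressor value
eps_star = rng.standard_normal(n)     # homoscedastic baseline error
eps = x2 * eps_star                   # error structure 1: eps_i = X2i * eps_i*
# Var(eps_i) = X2i^2 * Var(eps_i*) = X2i^2
assert abs(eps.var() - x2**2) < 0.05
```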

Table 4.3.1: The rejection frequencies of the three regression coefficients using OLSCM and the four HCCM estimators.

Coeff.  Estimator    10      25      50      100     250     500     600     1000
β1      OLSCM        0.106   0.056   0.030   0.030   0.037   0.042   0.040   0.035
        HC0          0.119   0.087   0.070   0.056   0.057   0.058   0.054   0.053
        HC1          0.070   0.067   0.062   0.051   0.055   0.054   0.054   0.052
        HC2          0.057   0.056   0.052   0.052   0.053   0.054   0.052   0.051
        HC3          0.024   0.042   0.042   0.046   0.051   0.049   0.049   0.050
β2      OLSCM        0.117   0.055   0.029   0.030   0.051   0.053   0.049   0.042
        HC0          0.111   0.086   0.074   0.055   0.055   0.054   0.054   0.053
        HC1          0.067   0.069   0.067   0.051   0.053   0.055   0.053   0.052
        HC2          0.074   0.059   0.050   0.054   0.052   0.052   0.053   0.052
        HC3          0.032   0.042   0.041   0.048   0.052   0.050   0.051   0.049
β3      OLSCM        0.138   0.043   0.036   0.027   0.080   0.081   0.078   0.066
        HC0          0.097   0.077   0.087   0.058   0.055   0.056   0.056   0.059
        HC1          0.051   0.061   0.079   0.055   0.053   0.054   0.053   0.055
        HC2          0.120   0.072   0.053   0.056   0.053   0.053   0.051   0.052
        HC3          0.052   0.051   0.043   0.051   0.049   0.051   0.052   0.050

Figure 4.3.1 plots the size properties of each test when the errors have a normal distribution, for the heteroscedasticity function given above.

Figure 4.3.1a: Plot of β1 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.3.1b: Plot of β2 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.3.1c: Plot of β3 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.3.1: Plots of the size of the t-test of β1, β2 and β3 for normal errors with heteroscedasticity function εi = X2i εi*.

The key findings in these figures are:

1. For N ≤ 50, all tests seem to have large size distortions, but HC2 seems relatively good.

2. Tests based on OLSCM have large size distortions, and this distortion does not vanish even for large sample sizes.

3. When N ≥ 100, there is very little distortion introduced by using any of the HCCM-based tests when there is heteroscedasticity.

4.3.2 Heteroscedasticity function: εi = (X3i + 1.5)εi*

The assumption about the nature of the heteroscedasticity in this case is εi = (X3i + 1.5)εi*. The specification we make is that the error is related to an explanatory variable plus a constant. This error structure has a moderate amount of heteroscedasticity. Table 4.3.2 presents the empirical level of the t statistics based on the standard OLSCM and the four HCCM estimators.

Table 4.3.2: The empirical level of the t statistics based on the standard OLSCM and the four HCCM estimators.

Coeff.  Estimator    10      25      50      100     250     500     600     1000
β1      OLSCM        0.021   0.054   0.037   0.048   0.047   0.040   0.041   0.047
        HC0          0.132   0.097   0.064   0.058   0.054   0.054   0.053   0.049
        HC1          0.084   0.080   0.057   0.055   0.052   0.048   0.048   0.052
        HC2          0.079   0.059   0.057   0.051   0.051   0.049   0.051   0.053
        HC3          0.037   0.043   0.048   0.047   0.048   0.049   0.049   0.051
β2      OLSCM        0.018   0.054   0.039   0.052   0.050   0.043   0.043   0.050
        HC0          0.139   0.094   0.065   0.061   0.056   0.054   0.054   0.053
        HC1          0.087   0.076   0.057   0.058   0.054   0.050   0.048   0.052
        HC2          0.079   0.057   0.057   0.051   0.052   0.053   0.052   0.049
        HC3          0.038   0.041   0.048   0.046   0.049   0.052   0.050   0.051
β3      OLSCM        0.014   0.057   0.045   0.060   0.062   0.051   0.049   0.051
        HC0          0.143   0.088   0.068   0.060   0.058   0.056   0.052   0.053
        HC1          0.092   0.069   0.061   0.057   0.057   0.054   0.054   0.053
        HC2          0.089   0.063   0.056   0.055   0.049   0.052   0.052   0.050
        HC3          0.040   0.044   0.047   0.049   0.047   0.049   0.049   0.051

Figure 4.3.2 plots the size properties of each test when the errors have a normal distribution, with the heteroscedasticity function given above.

Figure 4.3.2a: Plot of β1 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.3.2b: Plot of β2 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.3.2c: Plot of β3 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.3.2: Plots of the size of the t-test of β1, β2 and β3 for normal errors with heteroscedasticity function εi = (X3i + 1.5)εi*.

The key findings in these figures are:

1. For N = 10, all tests have size distortions, but HC3 seems relatively better.

2. For N = 50, tests based on HC1, HC2 and HC3 perform well, while those based on OLSCM and HC0 show relatively large size distortions.

3. For N ≤ 600, tests based on OLSCM show no clear pattern in their size distortion.

4. For N = 1000, the rejection frequencies of all tests converge to the nominal 5% level of significance.

4.3.3 Heteroscedasticity function: εi = X2i(X3i + 2.5)εi*

This is a more extreme form of heteroscedasticity. Table 4.3.3 presents the proportion of times that the null hypothesis is rejected using tests based on the standard OLSCM and each type of HCCM.

Table 4.3.3: The proportion of times that the null hypothesis is rejected using tests based on the standard OLSCM and each type of HCCM.

Coeff.  Estimator    10      25      50      100     250     500     600     1000
β1      OLSCM        0.032   0.027   0.054   0.053   0.041   0.039   0.041   0.044
        HC0          0.149   0.077   0.068   0.061   0.052   0.047   0.048   0.046
        HC1          0.096   0.058   0.060   0.057   0.051   0.048   0.052   0.053
        HC2          0.075   0.050   0.056   0.047   0.050   0.053   0.052   0.051
        HC3          0.039   0.035   0.048   0.043   0.048   0.049   0.049   0.050
β2      OLSCM        0.029   0.035   0.059   0.060   0.048   0.047   0.048   0.050
        HC0          0.40    0.086   0.070   0.062   0.053   0.049   0.046   0.052
        HC1          0.091   0.068   0.063   0.059   0.052   0.053   0.052   0.053
        HC2          0.072   0.050   0.057   0.049   0.049   0.052   0.052   0.051
        HC3          0.035   0.035   0.048   0.043   0.047   0.049   0.048   0.049
β3      OLSCM        0.011   0.046   0.055   0.061   0.051   0.055   0.054   0.052
        HC0          0.118   0.100   0.067   0.071   0.055   0.054   0.055   0.052
        HC1          0.073   0.083   0.059   0.068   0.053   0.052   0.053   0.051
        HC2          0.048   0.060   0.052   0.049   0.049   0.051   0.051   0.049
        HC3          0.018   0.044   0.043   0.043   0.048   0.050   0.049   0.051

When there was an extreme form of heteroscedasticity and the sample size was small, even HC3 did not always perform well. As can be seen from Table 4.3.3 for β3 when n = 10, tests based on the OLSCM estimator would reject the null hypothesis 1.1% of the time at the nominal 5% level, HC0 would reject it 11.8% of the time, HC1 would reject it 7.3% of the time, HC2 would reject it 4.8% of the time, whereas HC3 would reject it 1.8% of the time.

Figure 4.3.3 plots the size properties of each test statistic when the errors have a normal distribution with the heteroscedasticity function εi = X2i(X3i + 2.5)εi*. This error structure (error structure 3 in Table 3.1) has an extreme form of heteroscedasticity.
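As with the milder structures, one can check numerically (a small illustrative sketch, with the regressor values fixed as assumptions) that the standard deviation of εi = X2i(X3i + 2.5)εi* scales with X2i(X3i + 2.5):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
x2, x3 = 1.5, 0.8                                # fixed, illustrative regressor values
eps = x2 * (x3 + 2.5) * rng.standard_normal(n)   # error structure 3
# sd(eps_i) = X2i * (X3i + 2.5) when eps_i* is standard normal
assert abs(eps.std() - x2 * (x3 + 2.5)) < 0.05
```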

Figure 4.3.3a: Plot of β1 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.3.3b: Plot of β2 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.3.3c: Plot of β3 [size curves for OLSCM, HC0, HC1, HC2 and HC3 against sample size]

Figure 4.3.3: Plots of the size of the t-test of β1, β2 and β3 for normal errors with heteroscedasticity function εi = X2i(X3i + 2.5)εi*.

The key findings in these figures, for the given error structure, are:

1. For N = 10, tests based on all estimators show a large size distortion.

2. For N = 25, the HC2 test outperforms all others.

3. For N ≥ 250, all HCCM tests perform well for all coefficients.

4. For N = 1000, all rejection frequencies are close to the nominal 5% level of significance. Hence the results from all types of tests are indistinguishable.

5. Heteroscedasticity does not affect tests of each coefficient to the same degree.

6. This experiment shows that, for N < 50, HC0 can be even more misleading than the conventional OLSCM, which ignores the possibility of heteroscedasticity.

CHAPTER FIVE

5. SUMMARY, CONCLUSION AND RECOMMENDATION

This chapter is divided into two sections. The first section presents conclusions and recommendations, and the second section mentions possible areas of further research.

5.1 Conclusions and Recommendations

Cross-sectional data often display some form of heteroscedasticity. It is common practice to still use the OLS estimator of the vector of regression parameters, since it remains unbiased and consistent.

Its covariance matrix, however, has to be consistently estimated in order for inference to be performed. From the preceding discussion it is clear that heteroscedasticity is potentially a serious problem, and the researcher needs to know whether it is present in a given situation.

If its presence is detected, then one can take corrective action, such as using HCCM tests. In this study, we have explored the asymptotically justified versions of the HCCM tests and the standard OLSCM tests in the linear regression model.

While no Monte Carlo study can represent all possible error structures that can be encountered in practice, the consistency of our results across the four error structures adds credence to our suggestions for using HCCM- and OLSCM-based tests in the linear regression model. Our results lead us to the following conclusions.

1. If there is a priori reason to suspect heteroscedasticity, whether from intuition, educated guesswork, prior empirical evidence or sheer speculation, HCCM-based tests should be used.

2. Since the cost of using HC3 instead of OLSCM when heteroscedasticity is absent is apparently not very great (see Table 4.2), it would seem wise to employ t statistics based on HC2 and HC3 even when there is little evidence of heteroscedasticity.

3. For homoscedastic data, all the HCCM tests are less reliable than the OLSCM test, but the HC3 test performs nearly as well as the OLSCM test.

4. We have examined the performance of three modified versions of White's (1980) HCCM estimator. In the presence of heteroscedasticity, HC2 and HC3 consistently outperform the other HCCM estimators.

5. In the presence of heteroscedasticity and for N < 250, HC2 and HC3 tests should be used; when N ≥ 250, other versions of the HCCM can also be used.

5.2 Further Research

During the course of this study, I as well as my advisor came up with many ideas. Only a few of these ideas were implemented, because the rest were beyond the scope of this study. They are introduced here as possible areas of further research into the topic of HCCM estimators.

As we have tried to show in the analysis of the experiments, HC3 did not always perform well when the sample size was small and there was substantial heteroscedasticity. Hence, it may be interesting to see how to improve the finite-sample properties of this estimator.

REFERENCES:

[1] Andrew F. Hayes (2003), "Heteroscedasticity Consistent Standard Error Estimates for the Linear Regression Model: SPSS and SAS Implementation," The Ohio State University, Columbus, Ohio.

[2] Chandan Mukherjee, Howard White and Marc Wuyts, "Econometrics and Data Analysis for Developing Countries," London and New York.

[3] C.R. Rao (1970), "Estimation of Heteroscedastic Variances in Linear Regression Models," Journal of the American Statistical Association, 46, 234-239.

[4] Davidson, R. and MacKinnon, J.G. (1993), "Estimation and Inference in Econometrics," Oxford University Press.

[5] Francisco Cribari-Neto (2000), "Improved Heteroscedasticity-Consistent Covariance Matrix Estimators," Biometrika, Printed in Great Britain.

[6] Eicker, F. (1963), "Asymptotic Normality and Consistency of the Least Squares Estimators for Families of Linear Regressions," The Annals of Mathematical Statistics, 34, 447-456.

[7] Greene, W.H. (1997), "Econometric Analysis (5th Ed.)," Upper Saddle River, NJ: Prentice Hall.

[8] Gujarati, D.N. (1995), "Basic Econometrics (4th Ed.)," New York: McGraw-Hill.

[9] Gujarati, D.N. (2004), "Basic Econometrics (4th Ed.)," New York: McGraw-Hill.

[10] H. Glejser (1969), "A New Test for Homoscedasticity," Journal of the American Statistical Association, 66, 416-423.

[11] Horn, S.D., R.A. Horn, and D.B. Duncan (1975), "Estimating Heteroscedastic Variances in Linear Models," Journal of the American Statistical Association, 70, 380-385.

[12] Jack Johnston (1983), "Econometric Methods," University of California, Irvine.

[13] Jan Kmenta (1971), "Elements of Econometrics," New York: McGraw-Hill.

[14] Kauermann, G. and Carroll, R. (2001), "A Note on the Efficiency of Sandwich Covariance Matrix Estimation," Journal of the American Statistical Association, 96, 1387-1396.

[15] Long, J.S. and Ervin, L. (2000), "Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model," The American Statistician, 54, 217-224.

[16] Michel Hurd (1979), "Estimation in Truncated Samples When There Is Heteroscedasticity," State University of New York.

[17] Natalie Johnson (2007), "A Comparative Simulation Study of Robust Estimators of Standard Errors," Brigham Young University.

[18] Peter Schmidt (1976), "Econometrics," New York: McGraw-Hill.

[19] Ramu Ramanathan (1986), "Introductory Econometrics with Applications," University of California, San Diego.

[20] White, H. (1980), "A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity," Econometrica, 48, 817-838.

[21] White, H. and MacKinnon, J.G. (1985), "Some Heteroskedasticity Consistent Covariance Matrix Estimators with Improved Finite Sample Properties," Journal of Econometrics, 29, 53-57.

APPENDIX A: Monte Carlo Simulation Code in Matlab

A.1 OLSCM Estimator Code for Simulation in Matlab

function [SS, rejfreq] = hetrobustestimatorOLSCM
% [SS, rejfreq] = hetrobustestimatorOLSCM;
% This program calculates the rejection frequencies (of incorrectly rejecting
% the null hypothesis) when OLSCM is used.
T = 1000;
gen1 = rand(T,1);
gen2 = randn(T,1);
x1 = 1 + gen1;
x2 = 2*gen1 + 0.3*gen2;
SS = [];
for i = 1:10000
    %err = randn(T,1);               % homoscedastic errors
    %errnor = randn(T,1);            % heteroscedastic errors (type 1)
    %err = diag(errnor)*x2;
    %errnor = randn(T,1);            % heteroscedastic errors (type 2)
    %for j = 1:T
    %    err(j) = (x2(j) + 1.5)*errnor(j);
    %end
    %err = err';
    errnor = randn(T,1);             % heteroscedastic errors (type 3)
    for j = 1:T
        err(j) = x1(j)*(x2(j) + 2.5)*errnor(j);
    end
    err = err';
    X = [ones(T,1) x1 x2];
    y = ones(T,1) + x1 + x2 + err;
    teta = inv(X'*X)*X'*y;           % OLS estimates
    fit = X*teta;
    res = y - fit;
    RSS = res'*res;
    df = length(y) - 3;
    s2 = RSS/df;                     % error-variance estimate
    cm = s2*inv(X'*X);               % OLSCM covariance matrix
    se = sqrt(diag(cm));
    for j = 1:length(X(1,:))
        sig(j) = (teta(j) - 1)/se(j);        % t statistic for the true H0
        pval(j) = 2*(1 - tcdf(abs(sig(j)),df));
        critval(j) = tinv(0.975,df);
        if abs(sig(j)) > critval(j)
            H(j) = 1;
        else
            H(j) = 0;
        end
    end
    SSp = [H(1) H(2) H(3)];
    SS = [SS; SSp];
    clear SSp err y teta fit res RSS s2 cm se sig pval critval H errnor
end
rejfreq1 = sum(SS(:,1))/length(SS(:,1));
rejfreq2 = sum(SS(:,2))/length(SS(:,1));
rejfreq3 = sum(SS(:,3))/length(SS(:,1));
rejfreq = [rejfreq1 rejfreq2 rejfreq3];
disp('*****************************************************************');
fprintf('rejection frequency for T = %10i\n',T);
disp(' ');
disp('          for OLSCM test ');
disp(' ');
disp('    beta 1     beta 2     beta 3 ');
disp('---------------------------------');
fprintf('%10.3f %10.3f %10.3f\n',rejfreq1,rejfreq2,rejfreq3);

A.2 HC0 and HC1 Estimator Code for Simulation in Matlab

function [SS, SS1, rejfreqHC0, rejfreqHC1] = hetrobustestimatorHC01
% [SS, SS1, rejfreqHC0, rejfreqHC1] = hetrobustestimatorHC01;
% This program calculates the rejection frequencies (of incorrectly rejecting
% the null hypothesis) when HC0 and HC1 are used.
T = 10;
gen1 = rand(T,1);
gen2 = randn(T,1);
x1 = 1 + gen1;
x2 = 2*gen1 + 0.3*gen2;
SS = [];
SS1 = [];
for i = 1:10000
    %err = randn(T,1);               % homoscedastic errors
    %errnor = randn(T,1);            % heteroscedastic errors (type 1)
    %err = diag(errnor)*x2;
    errnor = randn(T,1);             % heteroscedastic errors (type 2)
    for j = 1:T
        err(j) = (x2(j) + 1.5)*errnor(j);
    end
    err = err';
    %errnor = randn(T,1);            % heteroscedastic errors (type 3)
    %for j = 1:T
    %    err(j) = x1(j)*(x2(j) + 2.5)*errnor(j);
    %end
    %err = err';
    X = [ones(T,1) x1 x2];
    y = ones(T,1) + x1 + x2 + err;
    teta = inv(X'*X)*X'*y;
    fit = X*teta;
    res = y - fit;
    df = length(y) - 3;
    erertran = res*res';
    omega = diag(diag(erertran));            % Omega-hat with squared residuals
    cm = inv(X'*X)*X'*omega*X*inv(X'*X);     % HC0 (White, 1980)
    cm1 = (T*cm)/(T-3);                      % HC1: degrees-of-freedom correction
    se = sqrt(diag(cm));
    se1 = sqrt(diag(cm1));
    for j = 1:length(X(1,:))
        sig(j) = (teta(j) - 1)/se(j);
        pval(j) = 2*(1 - tcdf(abs(sig(j)),df));
        critval(j) = tinv(0.975,df);
        if abs(sig(j)) > critval(j)
            H(j) = 1;
        else
            H(j) = 0;
        end
    end
    SSp = [H(1) H(2) H(3)];
    SS = [SS; SSp];
    for j = 1:length(X(1,:))
        sig1(j) = (teta(j) - 1)/se1(j);
        pval1(j) = 2*(1 - tcdf(abs(sig1(j)),df));
        critval1(j) = tinv(0.975,df);
        if abs(sig1(j)) > critval1(j)
            G(j) = 1;
        else
            G(j) = 0;
        end
    end
    SSp1 = [G(1) G(2) G(3)];
    SS1 = [SS1; SSp1];
    clear SSp SSp1 err errnor y teta fit res erertran omega cm cm1 se se1 sig sig1
    clear pval pval1 critval critval1 H G
end
rejfreqHC01 = sum(SS(:,1))/length(SS(:,1));
rejfreqHC02 = sum(SS(:,2))/length(SS(:,1));
rejfreqHC03 = sum(SS(:,3))/length(SS(:,1));
rejfreqHC0 = [rejfreqHC01 rejfreqHC02 rejfreqHC03];
rejfreqHC11 = sum(SS1(:,1))/length(SS1(:,1));
rejfreqHC12 = sum(SS1(:,2))/length(SS1(:,1));
rejfreqHC13 = sum(SS1(:,3))/length(SS1(:,1));
rejfreqHC1 = [rejfreqHC11 rejfreqHC12 rejfreqHC13];
disp('*****************************************************************');
fprintf('rejection frequency for T = %10i\n',T);
disp(' ');
disp('          for HC0 test ');
disp(' ');
disp('    beta 1     beta 2     beta 3 ');
disp('---------------------------------');
fprintf('%10.3f %10.3f %10.3f\n',rejfreqHC01,rejfreqHC02,rejfreqHC03);
disp('---------------------------------');
disp('*****************************************************************');
disp('          for HC1 test ');
disp(' ');
disp('    beta 1     beta 2     beta 3 ');
disp('---------------------------------');
fprintf('%10.3f %10.3f %10.3f\n',rejfreqHC11,rejfreqHC12,rejfreqHC13);

A.3 HC2 and HC3 Estimator Code for Simulation in Matlab

function [SS, SS1, rejfreqHC2, rejfreqHC3] = hetrobustestimatorHC23
% [SS, SS1, rejfreqHC2, rejfreqHC3] = hetrobustestimatorHC23;
% This program calculates the rejection frequencies (of incorrectly rejecting
% the null hypothesis) when HC2 and HC3 are used.
T = 100;
gen1 = rand(T,1);
gen2 = randn(T,1);
x1 = 1 + gen1;
x2 = 2*gen1 + 0.3*gen2;
X = [ones(T,1) x1 x2];
M = eye(T) - X*inv(X'*X)*X';     % residual-maker matrix; diag(M) = 1 - h_ii
MM = diag(diag(M));
MM1 = MM*MM;
KK = inv(MM);                    % 1/(1 - h_ii) weights for HC2
KK1 = inv(MM1);                  % 1/(1 - h_ii)^2 weights for HC3
SS = [];
SS1 = [];
for i = 1:10000
    %err = randn(T,1);           % homoscedastic errors
    errnor = randn(T,1);         % heteroscedastic errors (type 1)
    err = diag(errnor)*x2;
    %errnor = randn(T,1);        % heteroscedastic errors (type 2)
    %for j = 1:T
    %    err(j) = (x2(j) + 1.5)*errnor(j);
    %end
    %err = err';
    %errnor = randn(T,1);        % heteroscedastic errors (type 3)
    %for j = 1:T
    %    err(j) = x1(j)*(x2(j) + 2.5)*errnor(j);
    %end
    %err = err';
    y = ones(T,1) + x1 + x2 + err;
    teta = inv(X'*X)*X'*y;
    fit = X*teta;
    res = y - fit;
    df = length(y) - 3;
    erertran = res*res';
    omega = diag(diag(erertran))*KK;         % HC2 weighting
    omega1 = diag(diag(erertran))*KK1;       % HC3 weighting
    cm = inv(X'*X)*X'*omega*X*inv(X'*X);     % HC2
    cm1 = inv(X'*X)*X'*omega1*X*inv(X'*X);   % HC3
    se = sqrt(diag(cm));
    se1 = sqrt(diag(cm1));
    for j = 1:length(X(1,:))
        sig(j) = (teta(j) - 1)/se(j);
        pval(j) = 2*(1 - tcdf(abs(sig(j)),df));
        critval(j) = tinv(0.975,df);
        if abs(sig(j)) > critval(j)
            H(j) = 1;
        else
            H(j) = 0;
        end
    end
    SSp = [H(1) H(2) H(3)];
    SS = [SS; SSp];
    for j = 1:length(X(1,:))
        sig1(j) = (teta(j) - 1)/se1(j);
        pval1(j) = 2*(1 - tcdf(abs(sig1(j)),df));
        critval1(j) = tinv(0.975,df);
        if abs(sig1(j)) > critval1(j)
            G(j) = 1;
        else
            G(j) = 0;
        end
    end
    SSp1 = [G(1) G(2) G(3)];
    SS1 = [SS1; SSp1];
    clear SSp SSp1 err errnor y teta fit res erertran omega omega1
    clear cm cm1 se se1 sig sig1 pval pval1 critval critval1 H G
end
rejfreqHC21 = sum(SS(:,1))/length(SS(:,1));
rejfreqHC22 = sum(SS(:,2))/length(SS(:,1));
rejfreqHC23 = sum(SS(:,3))/length(SS(:,1));
rejfreqHC2 = [rejfreqHC21 rejfreqHC22 rejfreqHC23];
rejfreqHC31 = sum(SS1(:,1))/length(SS1(:,1));
rejfreqHC32 = sum(SS1(:,2))/length(SS1(:,1));
rejfreqHC33 = sum(SS1(:,3))/length(SS1(:,1));
rejfreqHC3 = [rejfreqHC31 rejfreqHC32 rejfreqHC33];
disp('*****************************************************************');
fprintf('rejection frequency for T = %10i\n',T);
disp(' ');
disp('          for HC2 test ');
disp(' ');
disp('    beta 1     beta 2     beta 3 ');
disp('---------------------------------');
fprintf('%10.3f %10.3f %10.3f\n',rejfreqHC21,rejfreqHC22,rejfreqHC23);
disp('---------------------------------');
disp('*****************************************************************');
disp('          for HC3 test ');
disp(' ');
disp('    beta 1     beta 2     beta 3 ');
disp('---------------------------------');
fprintf('%10.3f %10.3f %10.3f\n',rejfreqHC31,rejfreqHC32,rejfreqHC33);

APPENDIX B: Plot of Y

[Histogram of Y with descriptive statistics: Series Y, Sample 1 50, Observations 50; Jarque-Bera = 0.1402241, Probability = 0.932280.]
Declaration

I, the undersigned, declare that this thesis is my original work, has not been presented for a degree in any other university, and that all sources of material used for the thesis have been duly acknowledged.

Declared By:

Name: Yegnanew Alem

Signature: ______________

Place: Faculty of Science, Addis Ababa University

Date: June, 2008

This thesis has been submitted for examination with my approval as a university advisor.

Name: Olusanya E. Olubusoye (Ph.D)

Signature: ______________

Date: June, 2008
