Regression Explained SPSS

SPSS

Uploaded by

Shoaib Zaheer
Copyright
© Attribution Non-Commercial (BY-NC)

Regression explained in simple terms

A Vijay Gupta Publication


SPSS for Beginners Vijay Gupta 2000. All rights reside with author.

vjbooks.net

Regression explained
Copyright 2000 Vijay Gupta. Published by VJBooks Inc.

All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system, without prior written permission of the publisher except in the case of brief quotations embodied in reviews, articles, and research papers. Making copies of any part of this book for any purpose other than personal use is a violation of United States and international copyright laws.

For information contact Vijay Gupta at vgupta1000@aol.com. You can reach the author at vgupta1000@aol.com.

Library of Congress Catalog No.: Pending
ISBN: Pending
First year of printing: 2000
Date of this copy: April 22, 2000

This book is sold as is, without warranty of any kind, either express or implied, respecting the contents of this book, including but not limited to implied warranties for the book's quality, performance, merchantability, or fitness for any particular purpose. Neither the author, the publisher and its dealers, nor distributors shall be liable to the purchaser or any other person or entity with respect to any liability, loss, or damage caused or alleged to be caused directly or indirectly by the book.

Publisher: VJBooks Inc.
Editor: Vijay Gupta
Author: Vijay Gupta

About the Author


Vijay Gupta has taught statistics and econometrics to graduate students at Georgetown University. A Georgetown University graduate with a Masters degree in economics, he has a vision of making the tools of econometrics and statistics easily accessible to professionals and graduate students. In addition, he has assisted the World Bank and other organizations with statistical analysis, design of international investments, cost-benefit and sensitivity analysis, and training and troubleshooting in several areas.

He is currently working on: a package of SPSS Scripts "Making the Formatting of Output Easy"; a manual on Word; a manual for Excel; a tutorial for E-Views; and an Excel add-in "Tools for Enriching Excel's Data Analysis Capacity". Expect them to be available during fall 2000. Early versions can be downloaded from www.vgupta.com.

LINEAR REGRESSION

Interpretation of regression output is discussed in section 1 [1]. Our approach might conflict with practices you have employed in the past, such as always looking at the R-square first. As a result of our vast experience in using and teaching econometrics, we are firm believers in our approach. You will find the presentation to be quite simple - everything is in one place and displayed in an orderly manner.

The acceptance (as being reliable/true) of regression results hinges on diagnostic checking for the breakdown of classical assumptions [2]. If there is a breakdown, then the estimation is unreliable, and thus the interpretation from section 1 is unreliable. The table in section 2 succinctly lists the various possible breakdowns and their implications for the reliability of the regression results [3].

Why is the result not acceptable unless the assumptions are met? The reason is that the strong statements inferred from a regression (i.e. - "an increase in one unit of the value of variable X causes an increase in the value of variable Y by 0.21 units") depend on the presumption that the variables used in a regression, and the residuals from the regression, satisfy certain statistical properties. These are expressed in the properties of the distribution of the residuals (that explains why so many of the diagnostic tests shown in sections 3-4 and the corrective methods are based on the use of the residuals). If these properties are satisfied, then we can be confident in our interpretation of the results. The above statements are based on complex formal mathematical proofs. Please check your textbook if you are curious about the formal foundations of the statements.

Section 3 provides a brief schema for checking for the breakdown of classical assumptions. The testing usually involves informal (graphical) and formal (distribution-based hypothesis tests like the F and T) testing, with the latter involving the running of other regressions and computing of variables.

1. Interpretation of regression results

[1] Even though interpretation precedes checking for the breakdown of classical assumptions, it is good practice to first check for the breakdown of classical assumptions (section 4), then to correct for the breakdowns, and then, finally, to interpret the results of a regression analysis.

[2] We will use the phrase "Classical Assumptions" often. Check your textbook for details about these assumptions. In simple terms, regression is a statistical method. The fact that this generic method can be used for so many different types of models and in so many different fields of study hinges on one area of commonality - the model rests on the bedrock of the solid foundations of well-established and proven statistical properties/theorems. If the specific regression model is in concordance with the certain assumptions required for the use of these properties/theorems, then the generic regression results can be inferred. The classical assumptions constitute these requirements.

[3] If you find any breakdown(s) of the classical assumptions, then you must correct for it by taking appropriate measures. Chapter 8 looks into these measures. After running the "corrected" model, you again must perform the full range of diagnostic checks for the breakdown of classical assumptions. This process will continue until you no longer have a serious breakdown problem, or the limitations of data compel you to stop.

Assume you want to run a regression of wage on age, work experience, education, gender, and a dummy for sector of employment (whether employed in the public sector).

wage = function(age, work experience, education, gender, sector)

or, as your textbook will have it,

$\text{wage} = \beta_1 + \beta_2\,\text{age} + \beta_3\,\text{work experience} + \beta_4\,\text{education} + \beta_5\,\text{gender} + \beta_6\,\text{sector}$

Always look at the model fit ("ANOVA") first. Do not make the mistake of looking at the R-square before checking the goodness of fit.

Significance of the model ("Did the model explain the deviations in the dependent variable?")

The last column shows the goodness of fit of the model. The lower this number, the better the fit. Typically, if "Sig" is greater than 0.05, we conclude that our model could not fit the data [4].

ANOVA (a)

Model 1       Sum of Squares   df     Mean Square   F         Sig.
Regression    54514.39         5      10902.88      414.262   .000 (b)
Residual      52295.48         1987   26.319
Total         106809.9         1992

a. Dependent Variable: WAGE
b. Independent Variables: (Constant), WORK_EX, EDUCATION, GENDER, PUB_SEC, AGE

[4] If Sig < .01, then the model is significant at 99%; if Sig < .05, then the model is significant at 95%; and if Sig < .1, the model is significant at 90%. Significance implies that we can accept the model. If Sig > .1, then the model was not significant (a relationship could not be found) or "R-square is not significantly different from zero."

The F is comparing the two models below:

1. $\text{wage} = \beta_1 + \beta_2\,\text{age} + \beta_3\,\text{work experience} + \beta_4\,\text{education} + \beta_5\,\text{gender} + \beta_6\,\text{sector}$
2. $\text{wage} = \beta_1$

(In formal terms, the F is testing the hypothesis: $\beta_2 = \beta_3 = \beta_4 = \beta_5 = \beta_6 = 0$.)

If the F is not significant, then we cannot say that model 1 is any better than model 2. The implication is obvious - the use of the independent variables has not assisted in predicting the dependent variable.
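The comparison of the two models can be sketched numerically. The snippet below is a minimal illustration, not SPSS's own routine: it assumes the usual textbook form of the F statistic for a restricted versus an unrestricted model, and the function name `f_statistic` is made up for this example. The inputs are the sums of squares and degrees of freedom printed in this example's ANOVA output (Total 106809.9, Residual 52295.48, df 5 and 1987).

```python
def f_statistic(rss_restricted, rss_unrestricted, num_restrictions, df_residual):
    """F statistic for a restricted model against an unrestricted one."""
    gain_per_restriction = (rss_restricted - rss_unrestricted) / num_restrictions
    unexplained_per_df = rss_unrestricted / df_residual
    return gain_per_restriction / unexplained_per_df


# Model 2 (constant only) leaves every deviation unexplained, so its
# residual sum of squares equals the "Total" row of the ANOVA table.
rss_model_2 = 106809.9
# Model 1 (all five regressors) leaves the "Residual" row unexplained.
rss_model_1 = 52295.48

f_value = f_statistic(rss_model_2, rss_model_1, num_restrictions=5, df_residual=1987)
print(round(f_value, 2))  # about 414.26, in line with the F column of the table
```

A large F means that dropping the five regressors would leave much more of the dependent variable unexplained, which is what a small "Sig" value reports.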

Sum of squares

The TSS (Total Sum of Squares) is the total deviations in the dependent variable. The aim of the regression is to explain these deviations (by finding the best betas that can minimize the sum of the squares of these deviations). The ESS (Explained Sum of Squares) is the amount of the TSS that could be explained by the model. The R-square, shown in the next table, is the ratio ESS/TSS. It captures the percent of deviation from the mean in the dependent variable that could be explained by the model. The RSS is the amount that could not be explained (TSS minus ESS).

In the previous table, the column "Sum of Squares" holds the values for TSS, ESS, and RSS. The row "Total" is TSS (106809.9 in the example), the row "Regression" is ESS (54514.39 in the example), and the row "Residual" contains the RSS (52295.48 in the example).
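As a quick arithmetic check of the relations above, the following sketch recomputes the R-square from the three sums of squares quoted for this example. It only restates ESS/TSS and TSS = ESS + RSS; the variable names are chosen for illustration.

```python
# Values from the "Sum of Squares" column of this example.
ess = 54514.39   # row "Regression"
rss = 52295.48   # row "Residual"
tss = 106809.9   # row "Total"

# TSS = ESS + RSS should hold up to the rounding of the printed table.
decomposition_gap = abs(tss - (ess + rss))

r_square = ess / tss            # share of the deviations that was explained
unexplained_share = rss / tss   # share that was left in the residuals

print(f"R-square = {r_square:.3f}, unexplained share = {unexplained_share:.3f}")
```

The two shares add up to one (apart from rounding), which is simply another way of reading "RSS = TSS minus ESS".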


Adjusted R-square

Measures the proportion of the variance in the dependent variable (wage) that was explained by variations in the independent variables. In this example, the "Adjusted R-Square" shows that 50.9% of the variance was explained.

R-square

Measures the proportion of the variation in the dependent variable (wage) that was explained by variations in the independent variables. In this example, the "R-Square" tells us that 51% of the variation (and not the variance) was explained.

Model Summary (a, b)

Model: 1
Variables Entered: WORK_EX, EDUCATION, GENDER, PUB_SEC, AGE (c, d)
Variables Removed: .
R Square: .510
Adjusted R Square: .509
Std. Error of the Estimate: 5.1302

a. Dependent Variable: WAGE
b. Method: Enter
c. Independent Variables: (Constant), WORK_EX, EDUCATION, GENDER, PUB_SEC, AGE
d. All requested variables entered.

Std Error of Estimate

Std error of the estimate measures the dispersion of the dependent variable's estimate around its mean (in this example, the "Std. Error of the Estimate" is 5.13). Compare this to the mean of the "Predicted" values of the dependent variable. If the Std. Error is more than 10% of the mean, it is high.
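The adjusted R-square and the standard error of the estimate can both be recovered from the ANOVA figures of this example. The sketch below assumes the standard textbook formulas (degrees-of-freedom correction for the adjusted R-square, square root of the residual mean square for the standard error); the helper `std_error_is_high` is a hypothetical name that merely encodes the 10% rule of thumb stated above, and the mean of the predicted values has to be supplied by the reader because it is not printed in this example.

```python
import math

n = 1993   # observations: the "Total" df of 1992 plus one
k = 5      # independent variables, not counting the constant
ess, rss, tss = 54514.39, 52295.48, 106809.9

r_square = ess / tss
# Penalise the R-square for the degrees of freedom used up by the regressors.
adjusted_r_square = 1 - (1 - r_square) * (n - 1) / (n - k - 1)
# Square root of the residual mean square.
std_error_of_estimate = math.sqrt(rss / (n - k - 1))


def std_error_is_high(std_error, mean_predicted, threshold=0.10):
    """Rule of thumb from the text: high if above 10% of the mean prediction."""
    return std_error > threshold * mean_predicted


print(f"adjusted R-square = {adjusted_r_square:.3f}")
print(f"std. error of the estimate = {std_error_of_estimate:.4f}")
```

Both results agree with the "Model Summary" values of .509 and 5.1302 up to rounding.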

The reliability of individual coefficients

The table "Coefficients" provides information on the confidence with which we can support each estimate (see the columns "T" and "Sig."). If the value in "Sig." is less than 0.05, then we can assume that the estimate in column "B" can be asserted as true with a 95% level of confidence [5]. Always interpret the "Sig" value first. If this value is more than 0.1 then the coefficient estimate is not reliable because it has "too" much dispersion/variance.

The individual coefficients

The table "Coefficients" provides information on the effect of individual variables (the "Estimated Coefficients" or "beta" - see column "B") on the dependent variable.

Confidence Interval

The confidence intervals provide a range of values within which we can assert with a 95% level of confidence that the estimated coefficient in "B" lies. For example, "The coefficient for age lies in the range .091 and .145 with a 95% level of confidence, while the coefficient for gender lies in the range -2.597 and -1.463 at a 95% level of confidence."

[5] If the value is greater than 0.05 but less than 0.1, we can only assert the veracity of the value in "B" with a 90% level of confidence. If "Sig" is above 0.1, then the estimate in "B" is unreliable and is said to not be statistically significant.

Coefficients (a)

                Unstandardized Coefficients                       95% Confidence Interval for B
Model 1         B         Std. Error    t         Sig.    Lower Bound    Upper Bound
(Constant)      -1.820    .420          -4.339    .000    -2.643         -.997
AGE             .118      .014          8.635     .000    .091           .145
EDUCATION       .777      .025          31.622    .000    .729           .825
GENDER          -2.030    .289          -7.023    .000    -2.597         -1.463
PUB_SEC         1.741     .292          5.957     .000    1.168          2.314
WORK_EX         .100      .017          5.854     .000    .067           .134

a. Dependent Variable: WAGE
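The link between the columns "B", "Std. Error", "t" and the confidence bounds can be illustrated with a few lines of arithmetic. This is a sketch under one stated assumption: with 1987 residual degrees of freedom the t distribution is treated as approximately standard normal, so the critical value is taken from the normal distribution (about 1.96). The coefficient values are the AGE and GENDER rows of this example; the function name `summarize` is illustrative only.

```python
from statistics import NormalDist

# (B, Std. Error) pairs from the "Coefficients" table of this example.
coefficients = {
    "AGE": (0.118, 0.014),
    "GENDER": (-2.030, 0.289),
}

# Assumption: normal approximation to the t distribution (large sample).
critical_value = NormalDist().inv_cdf(0.975)


def summarize(b, std_error):
    """Return the t-ratio and the approximate 95% confidence bounds."""
    t_ratio = b / std_error
    lower = b - critical_value * std_error
    upper = b + critical_value * std_error
    return t_ratio, lower, upper


results = {name: summarize(b, se) for name, (b, se) in coefficients.items()}
for name, (t_ratio, lower, upper) in results.items():
    print(f"{name}: t = {t_ratio:.2f}, 95% interval = ({lower:.3f}, {upper:.3f})")
```

The bounds reproduce the table to about three decimals. The t-ratios differ slightly from the printed "t" column because the inputs here are the rounded table values, not the unrounded estimates.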

Plot of residual versus predicted dependent variable

This is the plot for the standardized predicted variable and the standardized residuals. The pattern in this plot indicates the presence of mis-specification [6] and/or heteroskedasticity [7].

[Figure: Scatterplot. Dependent Variable: WAGE. Axes: Regression Standardized Predicted Value and Regression Standardized Residual.]

[6] This includes the problems of incorrect functional form, omitted variable, or a mis-measured independent variable. A formal test such as the RESET Test is required to conclusively prove the existence of mis-specification. Review your textbook for the step-by-step description of the RESET test.

[7] A formal test like the White's Test is necessary to conclusively prove the existence of heteroskedasticity. Review your textbook for the step-by-step description of the test.

Plot of residuals versus independent variables

The definite positive pattern indicates [8] the presence of heteroskedasticity caused, at least in part, by the variable education.

[Figure: Partial Residual Plot. Dependent Variable: WAGE. Axes: WAGE against EDUCATION.]

The plot of age and the residual has no pattern [9], which implies that no heteroskedasticity is caused by this variable.

[Figure: Partial Residual Plot. Dependent Variable: WAGE. Axes: WAGE against AGE.]

[8] A formal test like the White's Test is required to conclusively prove the existence and structure of heteroskedasticity.

[9] Sometimes these plots may not show a pattern. The reason may be the presence of extreme values that widen the scale of one or both of the axes, thereby "smoothing out" any patterns. If you suspect this has happened, as would be the case if most of the graph area were empty save for a few dots at the extreme ends of the graph, then rescale the axes. This is true for all scatter graphs.

Plots of the residuals

The histogram and the P-P plot of the residual suggest that the residual is probably normally distributed [10]. You can also use other tests to check for normality.

[Figure: Histogram. Dependent Variable: WAGE. Axes: Regression Standardized Residual and Frequency. Std. Dev = 1.00, Mean = 0.00, N = 1993.00. Annotation: Idealized Normal Curve. In order to meet the classical assumptions, the residuals should, roughly, follow this curve's shape.]

[Figure: Normal P-P Plot of Regression Standardized Residual. Dependent Variable: WAGE. Axes: Observed Cum Prob and Expected Cum Prob. Annotation: The thick curve should lie close to the diagonal.]

[10] The residuals should be distributed normally. If not, then some classical assumption has been violated.

Regression output interpretation guidelines


Name of Statistic/Chart: Sig.-F
What Does It Measure Or Indicate? Whether the model as a whole is significant. It tests whether R-square is significantly different from zero.
Critical Values:
- below .01 for 99% confidence in the ability of the model to explain the dependent variable
- below .05 for 95% confidence in the ability of the model to explain the dependent variable
- below 0.1 for 90% confidence in the ability of the model to explain the dependent variable
Comment: The first statistic to look for in SPSS output. If Sig.-F is insignificant, then the regression as a whole has failed. No more interpretation is necessary (although some statisticians disagree on this point). You must conclude that the "Dependent variable cannot be explained by the independent/explanatory variables." The next steps could be rebuilding the model, using more data points, etc.

Name of Statistic/Chart: RSS, ESS & TSS
What Does It Measure Or Indicate? The main function of these values lies in calculating test statistics like the F-test, etc.
Critical Values: The ESS should be high compared to the TSS (the ratio equals the R-square). Note for interpreting the SPSS table, column "Sum of Squares": "Total" = TSS, "Regression" = ESS, and "Residual" = RSS.
Comment: If the R-squares of two models are very similar or rounded off to zero or one, then you might prefer to use the F-test formula that uses RSS and ESS.

Name of Statistic/Chart: SE of Regression
What Does It Measure Or Indicate? The standard error of the estimate of the predicted dependent variable.
Critical Values: There is no critical value. Just compare the std. error to the mean of the predicted dependent variable. The former should be small (<10%) compared to the latter.
Comment: You may wish to comment on the SE, especially if it is too large or small relative to the mean of the predicted/estimated values of the dependent variable.

Name of Statistic/Chart: R-Square
What Does It Measure Or Indicate? Proportion of variation in the dependent variable that can be explained by the independent variables.
Critical Values: Between 0 and 1. A higher value is better.
Comment: This often mis-used value should serve only as a summary measure of Goodness of Fit. Do not use it blindly as a criterion for model selection.

Name of Statistic/Chart: Adjusted R-square
What Does It Measure Or Indicate? Proportion of variance in the dependent variable that can be explained by the independent variables, or R-square adjusted for # of independent variables.
Critical Values: Below 1. A higher value is better.
Comment: Another summary measure of Goodness of Fit. Superior to R-square because it is sensitive to the addition of irrelevant variables.

Name of Statistic/Chart: T-Ratios
What Does It Measure Or Indicate? The reliability of our estimate of the individual beta.
Critical Values: Look at the p-value (in the column "Sig."); it must be low:
- below .01 for 99% confidence in the value of the estimated coefficient
- below .05 for 95% confidence in the value of the estimated coefficient
- below .1 for 90% confidence in the value of the estimated coefficient
Comment: For a one-tailed test (at 95% confidence level), the critical value is (approximately) 1.65 for testing if the coefficient is greater than zero and (approximately) -1.65 for testing if it is below zero.

Name of Statistic/Chart: Confidence Interval for beta
What Does It Measure Or Indicate? The 95% confidence band for each beta estimate.
Critical Values: The upper and lower values give the 95% confidence limits for the coefficient.
Comment: Any value within the confidence interval cannot be rejected (as the true value) at 95% degree of confidence.

Name of Statistic/Chart: Charts: Scatter of predicted dependent variable and residual
What Does It Measure Or Indicate? Mis-specification and/or heteroskedasticity.
Critical Values: There should be no discernible pattern. If there is a discernible pattern, then do the RESET and/or DW test for mis-specification or the White's test for heteroskedasticity.
Comment: Extremely useful for checking for breakdowns of the classical assumptions, i.e. - for problems like mis-specification and/or heteroskedasticity. At the top of this table, we mentioned that the F-statistic is the first output to interpret. Some may argue that the ZPRED-ZRESID plot is more important (their rationale will become apparent as you read through the rest of this chapter and chapter 8).

Name of Statistic/Chart: Charts: plots of residuals against independent variables
What Does It Measure Or Indicate? Heteroskedasticity.
Critical Values: There should be no discernible pattern. If there is a discernible pattern, then perform White's test to formally check.
Comment: Common in cross-sectional data. If a partial plot has a pattern, then that variable is a likely candidate for the cause of heteroskedasticity.

Name of Statistic/Chart: Charts: Histograms of residuals
What Does It Measure Or Indicate? Provides an idea about the distribution of the residuals.
Critical Values: The distribution should look like a normal distribution.
Comment: A good way to observe the actual behavior of our residuals and to observe any severe problem in the residuals (which would indicate a breakdown of the classical assumptions).

Problems caused by breakdown of classical assumptions


The fact that we can make bold statements on causality from a regression hinges on the classical linear model. If its assumptions are violated, then we must re-specify our analysis and begin the regression anew. It is very unsettling to realize that a large number of institutions, journals, and faculties allow this fact to be overlooked.

When using the table below, remember the ordering of the severity of an impact.
- The worst impact is a bias in the F (then the model can't be trusted).
- A second disastrous impact is a bias in the betas (the coefficient estimates are unreliable).
- Compared to the above, biases in the standard errors and T are not so harmful (these biases only affect the reliability of our confidence about the variability of an estimate, not the reliability about the value of the estimate itself).

Summary of impact of a breakdown of a classical assumption on the reliability with which regression output can be interpreted

Violation (rows): Measurement error in dependent variable; Measurement error in independent variable; Irrelevant variable; Omitted variable; Incorrect functional form; Heteroskedasticity; Collinearity; Simultaneity Bias

Impact (columns): R2; Std error (of estimate); Std error (of β); Count of violations

[Table: the symbols marking the impact of each violation on each statistic are not legible in this copy.]

Legend for understanding the table

- The statistic is still reliable and unbiased.
- The statistic is biased, and thus cannot be relied upon.
- Upward bias in estimation.
- Downward bias in estimation.

Diagnostics

This section lists some methods of detecting for breakdowns of the classical assumptions.

Why is the result not acceptable unless the assumptions are met? The reason is simple - the strong statements inferred from a regression (e.g. - "an increase in one unit of the value of variable X causes an increase of the value of variable Y by 0.21 units") depend on the presumption that the variables used in a regression, and the residuals from that regression, satisfy certain statistical properties. These are expressed in the properties of the distribution of the residuals. That explains why so many of the diagnostic tests shown in sections 7.4-7.5 and their relevant corrective methods, shown in this chapter, are based on the use of the residuals. If these properties are satisfied, then we can be confident in our interpretation of the results. The above statements are based on complex, formal mathematical proofs. Please refer to your textbook if you are curious about the formal foundations of the statements.

With experience, you should develop the habit of doing the diagnostics before interpreting the model's significance, explanatory power, and the significance and estimates of the regression coefficients. If the diagnostics show the presence of a problem, you must first correct the problem and then interpret the model. Remember that the power of a regression analysis (after all, it is extremely powerful to be able to say that "data shows that X causes Y by this slope factor") is based upon the fulfillment of certain conditions that are specified in what have been dubbed the "classical" assumptions. Refer to your textbook for a comprehensive listing of methods and their detailed descriptions.

If a formal [11] diagnostic test confirms the breakdown of an assumption, then you must attempt to correct for it. This correction usually involves running another regression on a transformed version of the original model, with the exact nature of the transformation being a function of the classical regression assumption that has been violated [12].

Collinearity [13]

Collinearity between variables is always present. A problem occurs if the degree of collinearity is high enough to bias the estimates.

Note: Collinearity means that two or more of the independent/explanatory variables in a regression have a linear relationship. This causes a problem in the interpretation of the regression results. If the variables have a close linear relationship, then the estimated regression coefficients and T-statistics may not be able to properly isolate the unique effect/role of each variable and the confidence with which we can presume these effects to be true. The close relationship of the variables makes this isolation difficult. Our explanation may not satisfy a statistician, but we hope it conveys the fundamental principle of collinearity.

Summary measures for testing and detecting collinearity include:
- Running bivariate and partial correlations (see section 5.2). A bivariate or partial correlation coefficient greater than 0.8 (in absolute terms) between two variables indicates the presence of significant collinearity between them.
- Collinearity is indicated if the R-square is high (greater than 0.75 [14]) and only a few T-values are significant.
- Check your textbook for more on collinearity diagnostics.

[11] Usually, a "formal" test uses a hypothesis testing approach. This involves the use of testing against distributions like the T, F, or Chi-Square. An "informal" test typically refers to a graphical test.

[12] Don't worry if this line confuses you at present - its meaning and relevance will become apparent as you read through this chapter.

[13] Also called Multicollinearity.
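The first screening rule above (a bivariate correlation above 0.8 in absolute terms) can be sketched as follows. The data values are hypothetical and made up purely for illustration, the function names are not from the book, and only bivariate correlations are covered here, not partial correlations.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def collinear_pairs(variables, threshold=0.8):
    """Return the pairs of variables whose |correlation| exceeds the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        if abs(pearson(a, b)) > threshold:
            flagged.append((name_a, name_b))
    return flagged


# Hypothetical observations, invented for this illustration only.
data = {
    "age": [25, 30, 35, 40, 45, 50],
    "work_ex": [2, 7, 11, 18, 22, 27],
    "educ": [12, 16, 10, 14, 12, 16],
}
flagged = collinear_pairs(data)
print(flagged)  # [('age', 'work_ex')]
```

In this made-up sample, age and work experience move almost in lockstep, so they are flagged, while education is only weakly related to either.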

Mis-specification

Mis-specification of the regression model is the most severe problem that can befall an econometric analysis. Unfortunately, it is also the most difficult to detect and correct.

Note: Mis-specification covers a list of problems. These problems can cause moderate or severe damage to the regression analysis. Of graver importance is the fact that most of these problems are caused not by the nature of the data/issue, but by the modeling work done by the researcher. It is of the utmost importance that every researcher realise that the responsibility of correctly specifying an econometric model lies solely on them. A proper specification includes determining curvature (linear or not), functional form (whether to use logs, exponentials, or squared variables), and the accuracy of measurement of each variable, etc.

Mis-specification can be of several types: incorrect functional form, omission of a relevant independent variable, and/or measurement error in the variables. Sections 7.4.c to 7.4.f list a few summary methods for detecting mis-specification. Refer to your textbook for a comprehensive listing of methods and their detailed descriptions.

Simultaneity bias

Simultaneity bias may be seen as a type of mis-specification. This bias occurs if one or more of the independent variables is actually dependent on other variables in the equation. For example, we are using a model that claims that income can be explained by investment and education. However, we might believe that investment, in turn, is explained by income. If we were to use a simple model in which income (the dependent variable) is regressed on investment and education (the independent variables), then the specification would be incorrect because investment would not really be "independent" to the model - it is affected by income. Intuitively, this is a problem because the simultaneity implies that the residual will have some relation with the variable that has been incorrectly specified as "independent" - the residual is capturing (more in a metaphysical than formal mathematical sense) some of the unmodeled reverse relation between the "dependent" and "independent" variables.
[14] Some books advise using 0.8.

Incorrect functional form


If the correct relation between the variables is non-linear but you use a linear model and do not transform the variables, then the results will be biased.

Why should an incorrect functional form lead to severe problems? Regression is based on finding coefficients that minimize the "sum of squared residuals." Each residual is the difference between the predicted value (the regression line) of the dependent variable and the realized value in the data. If the functional form is incorrect, then each point on the regression "line" is incorrect because the line is based on an incorrect functional form.

A simple example: assume Y has a log relation with X (a log curve represents their scatter plot) but a linear relation with "Log X." If we regress Y on X (and not on "Log X"), then the estimated regression line will have a systematic tendency for a bias because we are fitting a straight line on what should be a curve. The residuals will be calculated from the incorrect "straight" line and will be wrong. If they are wrong, then the entire analysis will be biased because everything hinges on the use of the residuals.

Listed below are methods of detecting incorrect functional forms:
- Perform a preliminary visual test. Any pattern in a plot of the predicted variable and the residuals implies mis-specification (and/or heteroskedasticity) due to the use of an incorrect functional form or due to omission of a relevant variable.
- If the visual test indicates a problem, perform a formal diagnostic test like the RESET test or the DW test.
- Check the mathematical derivation (if any) of the model.
- Determine whether any of the scatter plots have a non-linear pattern. If so, is the pattern log, square, etc?
- The nature of the distribution of a variable may provide some indication of the transformation that should be applied to it. For example, section 2.2 showed that wage is non-normal but that its log is normal. This suggests re-specifying the model by using the log of wage instead of wage.
- Check your textbook for more methods.
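The "Y is linear in Log X" example can be made concrete with a small sketch. The data below are made up so that Y is exactly linear in log(X); the helper `simple_ols` is an illustrative one-regressor least-squares fit, not the SPSS procedure. Fitting a straight line in X leaves residuals with a systematic pattern, while fitting on log(X) does not.

```python
import math


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_square, residuals)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    tss = sum((yi - mean_y) ** 2 for yi in y)
    rss = sum(e ** 2 for e in residuals)
    return a, b, 1 - rss / tss, residuals


# Made-up data: Y is exactly linear in log(X), hence curved in X.
x = list(range(1, 11))
y = [2 + 3 * math.log(xi) for xi in x]

# Wrong functional form: straight line in X.
_, _, r2_linear, resid_linear = simple_ols(x, y)
# Correct functional form: straight line in log(X).
_, _, r2_log, resid_log = simple_ols([math.log(xi) for xi in x], y)

print(f"R-square, Y on X:      {r2_linear:.3f}")
print(f"R-square, Y on log(X): {r2_log:.3f}")
print("signs of residuals, Y on X:", ["+" if e > 0 else "-" for e in resid_linear])
```

With the wrong form the residuals are negative at both ends of the X range and positive in the middle, which is the kind of pattern the visual test is meant to reveal.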

Omitted variable

Not including a variable that actually plays a role in explaining the dependent variable can bias the regression results. Methods of detection [15] include:
- Perform a preliminary visual test. Any pattern in the plot of the predicted variable and the residuals implies mis-specification (and/or heteroskedasticity) due to the use of an incorrect functional form or due to the omission of a relevant variable.
- If the visual test indicates a problem, perform a formal diagnostic test such as the RESET test.
- Apply your intuition, previous research, hints from preliminary bivariate analysis, etc. For example, in the model we ran, we believe that there may be an omitted variable bias because of the absence of two crucial variables for wage determination - whether the labor is unionized and the professional sector of work (medicine, finance, retail, etc.).
- Check your textbook for more methods.

[15] The first three tests are similar to those for Incorrect Functional form.

Inclusion of an irrelevant variable


This mis-specification occurs when a variable that is not actually relevant to the model is included [16]. To detect the presence of irrelevant variables:
- Examine the significance of the T-statistics. If the T-statistic is not significant at the 10% level (usually if T < 1.64 in absolute terms), then the variable may be irrelevant to the model.

Measurement error

This is not a very severe problem if it only afflicts the dependent variable, but it may bias the T-statistics. Methods of detecting this problem include:
- Knowledge about problems/mistakes in data collection.
- There may be a measurement error if the variable you are using is a proxy for the actual variable you intended to use. In our example, the wage variable includes the monetized values of the benefits received by the respondent. But this is a subjective monetization of respondents and is probably undervalued. As such, we can guess that there is probably some measurement error.
- Check your textbook for more methods.

Measurement errors causing problems can be easily understood. Omitted variable bias is a bit more complex. Think of it this way - the deviations in the dependent variable are in reality explained by the variable that has been omitted. Because the variable has been omitted, the algorithm will, mistakenly, apportion what should have been explained by that variable to the other variables, thus creating the error(s).

Remember: our explanations are too informal and probably incorrect by strict mathematical proof for use in an exam. We include them here to help you understand the problems a bit better.

Heteroskedasticity
Heteroskedasticity implies that the variances (i.e. the dispersion around the expected mean of zero) of the residuals are not constant, but that they are different for different observations. This causes a problem: if the variances are unequal, then the relative reliability of each observation (used in the regression analysis) is unequal. The larger the variance, the lower should be the importance (or weight) attached to that observation. As you will see in section 8.2, the correction for this problem involves the downgrading in relative importance of those observations with higher variance. The problem is more apparent when the value of the variance has some relation to one or more of the independent variables. Intuitively, this is a problem because the distribution of the residuals should have no relation with any of the variables (a basic assumption of the classical model).

Detection involves two steps. First, look for patterns in the plot of the predicted dependent variable and the residual.

[16] By dropping it, we improve the reliability of the T-statistics of the other variables (which are relevant to the model). But, we may be causing a far more serious problem: an omitted variable! An insignificant T is not necessarily a bad thing; it is the result of a "true" model. Trying to remove variables to obtain only significant T-statistics is bad practice.


If the graphical inspection hints at heteroskedasticity, you must conduct a formal test like the White's test. Section 7.5 teaches you how to conduct a White's test [17].
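
A rough numeric companion to the visual check is sketched below. It is an informal illustration, not one of the formal tests: the observations are sorted by predicted value and the residual variance in the upper half is compared with that in the lower half. The function name and the data are hypothetical.

```python
from statistics import pvariance

def variance_ratio(predicted, residuals):
    """Residual variance in the upper half of predicted values divided by the lower half."""
    pairs = sorted(zip(predicted, residuals))   # order observations by predicted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return pvariance(high) / pvariance(low)

# Made-up residuals whose spread grows with the predicted value
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]

ratio = variance_ratio(predicted, residuals)
print(ratio)
```

A ratio far from 1 is the numeric counterpart of a fan-shaped pattern in the plot and is a hint to run a formal test.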

Checking formally for heteroskedasticity: White's test

The White's test is usually used as a test for heteroskedasticity. In this test, a regression of the squares of the residuals is run on the variables suspected of causing the heteroskedasticity, their squares, and cross products.

\[ (\text{residuals})^2 = b_0 + b_1\,\text{educ} + b_2\,\text{work\_ex} + b_3\,(\text{educ})^2 + b_4\,(\text{work\_ex})^2 + b_5\,(\text{educ} \times \text{work\_ex}) \]

Model Summary (a)

Variables Entered: SQ_WORK, SQ_EDUC, EDU_WORK, Work Experience, EDUCATION
R Square: .037
Adjusted R Square: .'3$
Std. Error of the Estimate: .(%'(

a. Dependent Variable: SQ_RES

Calculate $n \cdot R^2$.

White's test: $R^2 = 0.037$, $n = 2016$

Thus, $n \cdot R^2 = 0.037 \times 2016 = 74.6$

Compare this value with $\chi^2(k)$, where $k$ is the number of regressors in the auxiliary regression (5 here) and $\chi^2$ is the symbol for the Chi-Square distribution. From the $\chi^2$ table, $\chi^2(5) = 11.07$ (for 95% confidence). As $n \cdot R^2 > \chi^2(5)$, the hypothesis of constant variance is rejected: the test indicates heteroskedasticity.

Note: Please refer to your textbook for further information regarding the interpretation of the White's test. If you have not encountered the Chi-Square distribution/test before, there is no need to panic! The same rules apply for testing using any distribution: the T, F, Z, or Chi-Square. First, calculate the required value from your results. Here the required value is the sample size ("n") multiplied by the R-square. You must determine whether this value is higher than that in the standard table for the relevant distribution (here the Chi-Square) at the recommended level of confidence (usually 95%) for the appropriate degrees of freedom (for the White's test, this equals the number of regressors in the auxiliary regression). You will find the table in the back of most econometrics/statistics textbooks. If the former is higher, then the hypothesis is rejected. For the White's test, the hypothesis being tested is that the variance is constant, so a rejection implies that the test found a problem [18].

[17] Other tests: Park, Glejser, Goldfeld-Quandt. Refer to your textbook for a comprehensive listing of methods and their detailed descriptions.

[18] We use the phraseology "Confidence Level of 95%." Many professors may frown upon this, instead preferring to use "Significance Level of 5%." Also, our explanation is simplistic. Do not use it in an exam! Instead, refer to the chapter on "Hypothesis Testing" or "Confidence Intervals" in your textbook. A clear understanding of these concepts is essential.


ANSWERS TO CONCEPTUAL QUESTIONS ON REGRESSION ANALYSIS

1. Why is the regression method you use called 'Least Squares'? Can you justify the use of such a method?
Ans: The method minimises the sum of the squares of the residuals. The formulas for obtaining the estimates of the beta coefficients, std errors, etc. are all based on this principle. Yes, we can justify the use of such a method: the aim is to minimise the error in our prediction of the dependent variable, and by minimising the residuals we are doing just that. By using the "squares" we are precluding the problem of signs, thereby giving positive and negative prediction errors the same importance.

2. The classical assumptions mostly hinge on the properties of the residuals. Why should it be so?
Ans: This is linked to question 1. The estimation method is based on minimising the sum of the squared residuals. As such, all the powerful inferences we draw from the results [like $R^2$, betas, T, F, etc.] are based on assumed properties of the residuals. Any deviations from these assumptions can cause major problems.

3. Prior to running a regression, you have to create a model. What are the important steps and considerations in creating such a model?
Ans: The most important consideration is the theory you want to test/support/refute using the estimation. The theory may be based on theoretical/analytical research and derivations, previous work by others, intuition, etc. The limitations of data may constrain the model. Scatter plots may provide an indication of the transformations needed. Correlations may tell you about the possibility of collinearity and the modeling ramifications thereof.

4. What role may correlations, scatter plots and other bivariate and multivariate analysis play in the specification of a regression model?
Ans: Prior to any regression analysis, it is essential to run some descriptives and basic bivariate and multivariate analysis. Based on the inferences from these, you may want to construct a model which can answer the questions raised by the initial analysis and/or can incorporate the insights from the initial analysis.

5. After running a regression, in what order should you interpret the results and why?
Ans: First, check for the breakdown of classical assumptions [collinearity, heteroskedasticity, etc.]. Then, once you are sure that no major problem is present, interpret the results in roughly the following order: Sig-F, Adj $R^2$, Std error of estimate, Sig-T, beta, Confidence interval of beta.

6. In the regression results, are we confident about the coefficient estimates? If not, what additional information [statistic] are we using to capture our degree of un-confidence about the estimate we obtain?
Ans: No, we are not confident about the estimated coefficient. The std error is being used to
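
Returning to question 1, the "least squares" idea can be illustrated with a small sketch. The data are made up, and the helper names `fit_line` and `sum_sq_resid` are hypothetical. The fitted line has a smaller sum of squared residuals than a nearby alternative line.

```python
def fit_line(x, y):
    """Intercept and slope that minimise the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_sq_resid(x, y, intercept, slope):
    """Sum of squared prediction errors for a given line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
best = sum_sq_resid(x, y, a, b)
other = sum_sq_resid(x, y, a, b + 0.1)   # a slightly different slope

print(a, b, best, other)
```

Squaring the residuals means that a prediction error of +1 and one of -1 count equally, which is the point made in the answer to question 1.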

FLOW DIAGRAM FOR REGRESSION ANALYSIS

1. Problem/issue: Clearly define the issue and the results being sought.
2. Method: OLS, MLE, Logit, Time Series, etc.
3. Model: [Y = a + b X] or other.
4. Final dataset: Prepare data for analysis.
5. Read data into software program.
6. Descriptives, correlations, scatter charts: Histogram to see distribution, normality tests, correlations.
7. Interpretation: Obtain some intuitive understanding of the data series, identify outliers, etc.
8. Run regression.
9. Test for breakdown of classical assumptions (for linear regression). Outcome: No breakdown, or Breakdown.
   Possible breakdowns:
   - Measurement Error
   - Heteroskedasticity, Autocorrelation, Misspecification, Collinearity, Simultaneity Bias
   - Omitted Variable, Irrelevant Variable, Incorrect Functional Form
10. Create a new model (see list below). May need to create new variables.
   - Heteroskedasticity: WLS
   - Autocorrelation: GLS, ARIMA
   - Simultaneity Bias: IV, 2SLS, 3SLS
   - Collinearity: IV, dropping a variable
11. Running correct model and method.
12. Diagnostics.
13. Hypothesis tests.

Regression schematic flow chart, 1999 Vijay Gupta
