Deville 1992
American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal
of the American Statistical Association.
Survey statisticians use auxiliary information in many ways to improve survey estimates. For example, using the general regression estimator for a finite population total or mean requires a vector of auxiliary variables for which the population total is known. The calibration estimators derived in this article are a family of estimators appealing to a common base of auxiliary information. A calibration estimator uses calibrated weights, which are as close as possible, according to a given distance measure, to the original sampling design weights $\pi_k^{-1}$ while also respecting a set of constraints, the calibration equations. For every distance measure there is a corresponding set of calibrated weights and a calibration estimator. In Section 2 we define a family of distance measures and derive the corresponding family of calibration estimators, then establish their properties in a series of results. Variance estimators for calibration estimators are given in Section 3. An important application of these ideas, mentioned in Section 4, is the calibration on known marginal counts in two-way or multiway tables, which leads to generalized raking.

* Jean-Claude Deville is Head of the Statistical Methodology and Sampling Division, Institut National de la Statistique et des Etudes Economiques, 18 Boulevard Adolphe Pinard, 75675 Paris Cedex 14, France. Carl-Erik Särndal is Professor, Département de Mathématiques et de Statistique, Université de Montréal, C. P. 6128, Succursale A, Montréal, Québec H3C 3J7, Canada. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada.

© 1992 American Statistical Association. Journal of the American Statistical Association, June 1992, Vol. 87, No. 418, Theory and Methods.

1. DERIVING THE GENERAL REGRESSION ESTIMATOR BY CALIBRATION

Consider a finite population $U = \{1, \ldots, k, \ldots, N\}$, from which a probability sample $s$ ($s \subseteq U$) is drawn with a given sampling design, $p(\cdot)$. That is, $p(s)$ is the probability that $s$ is selected. The inclusion probabilities $\pi_k = \Pr(k \in s)$ and $\pi_{kl} = \Pr(k \,\&\, l \in s)$ are assumed to be strictly positive. Let $y_k$ be the value of the variable of interest, $y$, for the $k$th population element, with which also is associated an auxiliary vector value, $x_k = (x_{k1}, \ldots, x_{kj}, \ldots, x_{kJ})'$. For the elements $k \in s$, we observe $(y_k, x_k)$. The population total of $x$, $t_x = \sum_U x_k$, is assumed to be accurately known. This knowledge may come from one or more sources, such as census data, administrative data files, and others. If $A$ ($A \subseteq U$) is any set of population elements, $\sum_A$ is our shorthand for $\sum_{k \in A}$ (e.g., $\sum_s y_k$ means $\sum_{k \in s} y_k$).

The objective is to estimate the population total $t_y = \sum_U y_k$. Extending an idea of Lemel (1976), Deville (1988) used calibration on known population x-totals to modify the basic sampling design weights, $d_k = 1/\pi_k$, that appear in the Horvitz-Thompson estimator, $\hat t_{y\pi} = \sum_s y_k/\pi_k = \sum_s d_k y_k$. A new estimator, $\hat t_{yw} = \sum_s w_k y_k$, is sought, with weights $w_k$ as close as possible, in an average sense for a given metric, to the $d_k$ while respecting the calibration equation

$$\sum_s w_k x_k = t_x. \tag{1.1}$$

Here, $w_{ks}$ would be a more appropriate notation for the sample dependent weights, but for brevity we write just $w_k$. The idea of adjusting the sample weights $d_k$ is discussed in the context of the U.S. Consumer Expenditure Survey by Zieschang (1986, 1990), who considered "weighting control procedures" through a generalized least squares weighting algorithm. The work of Bankier (1990) is also related. If $E_p(\cdot)$ denotes expectation with respect to the sampling design
where $\hat t_{x\pi} = \sum_s d_k x_k$ denotes the Horvitz-Thompson estimator for the x-vector and

$$\hat B_s = T_s^{-1} \sum_s d_k q_k x_k y_k \tag{1.7}$$

is a weighted estimator of the multiple regression coefficient. Thus, Deville's (1988) calibration technique achieves two things: (1) it provides an alternative derivation of the generalized regression estimator (Cassel, Särndal, and Wretman 1976; Gourieroux 1981; Särndal 1980; Isaki and Fuller 1982; Wright 1983) and (2) it emphasizes that it is fruitful to view (1.6) as a linear weighting of the observations $y_k$ with weights

In most of our applications, $g_k(w, d) = g(w/d)/q_k$, where $g(\cdot)$ is a function of the single argument $w/d$, independent of $k$, continuous, strictly increasing, and such that $g(1) = 0$ and $g'(1) = 1$. Examples are found in Table 1. Then $g_k(w, d)$ depends on $k$ only through the multiplicative factor $1/q_k$. If $F(u) = g^{-1}(u)$ denotes the inverse function of $g(\cdot)$, (2.2) becomes

$$w_k = d_k F(q_k x_k'\lambda). \tag{2.2a}$$

From (1.1), the calibration equations necessary to determine $\lambda = (\lambda_1, \ldots, \lambda_j, \ldots, \lambda_J)'$ are

$$t_x = \sum_s w_k x_k = \sum_s d_k F_k(x_k'\lambda)\, x_k. \tag{2.3}$$
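As an illustration of how (2.3) might be solved in practice, the following is a minimal sketch, assuming Newton iterations started at $\lambda = 0$ and $F_k(u) = F(q_k u)$ as in (2.2a). The function names `solve_linear` and `calibrate`, the tolerance, and the toy data are hypothetical choices made for this example only; they are not taken from the article.

```python
import math


def solve_linear(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def calibrate(d, x, t_x, F, dF, q=None, tol=1e-9, max_iter=50):
    """Newton iterations for sum_s d_k F(q_k x_k' lam) x_k = t_x.

    Returns (w, lam) with w_k = d_k F(q_k x_k' lam), as in (2.2a).
    """
    n, J = len(d), len(t_x)
    q = [1.0] * n if q is None else q
    lam = [0.0] * J

    def arg(k):
        # Argument q_k x_k' lam of F for element k, at the current lam.
        return q[k] * sum(x[k][j] * lam[j] for j in range(J))

    for _ in range(max_iter):
        u = [arg(k) for k in range(n)]
        # Residual of the calibration equations (2.3).
        resid = [t_x[j] - sum(d[k] * F(u[k]) * x[k][j] for k in range(n))
                 for j in range(J)]
        if max(abs(r) for r in resid) < tol:
            break
        # Jacobian: sum_s d_k q_k F'(u_k) x_k x_k'.
        jac = [[sum(d[k] * q[k] * dF(u[k]) * x[k][i] * x[k][j]
                    for k in range(n)) for j in range(J)] for i in range(J)]
        step = solve_linear(jac, resid)
        lam = [lam[j] + step[j] for j in range(J)]
    w = [d[k] * F(arg(k)) for k in range(n)]
    return w, lam


# Toy sample (made up): x_k = (1, x_k2)', known totals N = 55 and t_x2 = 230.
d = [10.0] * 5
x = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 4.0], [1.0, 6.0]]
t_x = [55.0, 230.0]

w_lin, lam_lin = calibrate(d, x, t_x, lambda u: 1.0 + u, lambda u: 1.0)
w_exp, lam_exp = calibrate(d, x, t_x, math.exp, math.exp)
```

With the linear function $F(u) = 1 + u$ the first Newton step already satisfies (2.3), whereas the exponential function needs a few iterations and always returns positive weights.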
are the relative merits of these cases? The existence of a solution of (2.5) is one aspect that needs to be considered. Cases 1 and 2 always lead to a solution; in Cases 3, 4, and 5, a solution is not guaranteed, but Result 1 on page 379 shows that the probability of a solution tends to 1. More important perhaps is the range of values that the weights $w_k = d_k F(q_k x_k'\lambda)$ can take. In Case 1, which yields the regression estimator (1.6), the weights can be positive or negative; Cases 2, 3, 4, and 5 guarantee positive weights. In each case, some unrealistic or extreme weights $w_k$ may occur for rare or "unlucky" samples. That Case 1 can yield negative weights $w_k$ may be unacceptable to some users. Case 2 may yield some weights $w_k$ that are extremely large compared to the basic sampling weights $d_k = \pi_k^{-1}$; again, the user may find this unacceptable. One may want to avoid a function $F_k(u)$ that can give overly extreme weights, because applying these weights to make estimates for various subpopulations (domains) may produce unrealistic estimates for some domains. We therefore consider a few additional functions that have the attractive property of yielding weights restricted to an interval that the statistician can specify in advance. Extreme

$< (L - 1)/q_k$; and $F_k(u) = U$ if $u > (U - 1)/q_k$. The weights $w_k$ will then be restricted according to $L d_k \le w_k \le U d_k$. The corresponding distance function is as in Case 1, if $L d_k \le w_k \le U d_k$, and is defined as infinity otherwise. If we choose that $L$ will be positive, negative weights can never occur.

Example 2. Returning to Example 1, we note that the ratio estimator is obtained for any $g_k(w, d)$ of the form $g(w/d)/q_k$, if $q_k = 1/x_k$. Then $F_k(x_k\lambda) = F(q_k x_k \lambda) = F(\lambda)$, a constant. From (2.3), $F(\lambda) = t_x/\hat t_{x\pi}$, so (2.6) gives the ratio estimator $\hat t_{yw} = t_x \hat t_{y\pi}/\hat t_{x\pi}$.

Example 2 is exceptional in that the choice of function has no bearing on the estimator. In general, different $F_k(u)$ yield different estimators. However, one can expect these estimators to produce only slightly different estimates in medium to large samples. The theoretical backing for this claim is the important Result 5 (p. 379) stating that all estimators (2.6), under mild conditions on the underlying $F_k(u)$, are asymptotically equivalent to the regression estimator (1.6) generated by the linear function $F_k(u) = 1 + q_k u$. Thus, for medium to large samples, the choice of $F_k(u)$ has only a
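The observation in Example 2, that with $q_k = 1/x_k$ every admissible $F$ leads to the same ratio estimator, can be checked numerically. The sketch below is only an illustration under stated assumptions: a single positive auxiliary variable, a hypothetical helper name `ratio_case_estimate`, and made-up sample values.

```python
import math


def ratio_case_estimate(d, x, y, t_x, F, F_inv):
    """Calibration estimate of t_y when q_k = 1/x_k (scalar x_k > 0).

    Returns the calibration estimate and the closed-form ratio estimate.
    """
    t_x_ht = sum(dk * xk for dk, xk in zip(d, x))  # Horvitz-Thompson x-total
    t_y_ht = sum(dk * yk for dk, yk in zip(d, y))  # Horvitz-Thompson y-total
    lam = F_inv(t_x / t_x_ht)        # solves F(lam) = t_x / t_x_ht, from (2.3)
    w = [dk * F(lam) for dk in d]    # F(q_k x_k lam) = F(lam) for every k
    t_yw = sum(wk * yk for wk, yk in zip(w, y))
    return t_yw, t_x * t_y_ht / t_x_ht


# Made-up sample: design weights, auxiliary values, study values, known t_x.
d = [2.0, 4.0, 5.0, 3.0]
x = [1.0, 2.0, 4.0, 3.0]
y = [3.0, 5.0, 9.0, 8.0]
t_x = 40.0

lin, ratio = ratio_case_estimate(d, x, y, t_x,
                                 lambda u: 1.0 + u, lambda v: v - 1.0)
expo, _ = ratio_case_estimate(d, x, y, t_x, math.exp, math.log)
```

Both the linear and the exponential choice reproduce $t_x \hat t_{y\pi}/\hat t_{x\pi}$ up to rounding, because only the constant $F(\lambda)$ enters the weights.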
setup has the following important features: We consider a sequence of finite populations and sampling designs indexed by $n$, where $n$ is the sample size (for a fixed-sized sampling design) or the expected sample size (for a random-sized sampling design). The finite population size, $N$, tends to infinity with $n$, and we assume that for any vector valued variable $x$ of interest to this article

1. $\lim N^{-1} t_x$ exists.
2. $N^{-1}(\hat t_{x\pi} - t_x) \to 0$ in design probability.
3. $n^{1/2} N^{-1}(\hat t_{x\pi} - t_x)$ converges in distribution to the multinormal $N(0, A)$.

Here (3) is to justify the use of the normal approximation in confidence intervals based on $\hat t_{y\pi}$. From a practical standpoint, the assumptions mean that: (a) the components of $\hat t_{x\pi} - t_x$ are considered small and quantities on the order of $\|\hat t_{x\pi} - t_x\|^2$ are considered negligible and (b) $\hat t_{x\pi} - t_x$ follows an approximately normal distribution with covariance matrix $n^{-1}N^2 A$. Let $\sum\sum_U$ be shorthand for $\sum_{k=1}^{N}\sum_{l=1}^{N}$; set $\Delta_{kl} = \pi_{kl} - \pi_k\pi_l$. Now (1), (2), and (3) imply that

$$nN^{-2} \sum\sum_U \Delta_{kl}\,(x_k/\pi_k)(x_l/\pi_l)' = nN^{-2}\, V(\hat t_{x\pi})$$

converges to the fixed matrix $A$ and that $N^{-1}(\hat t_{x\pi} - t_x) = O_p(n^{-1/2})$. We can view $A$ as a matrix that describes an asymptotic effect of the sampling design used for the survey.

Remark. Because $\hat t_{yw}$ is the nearest estimator to $\hat t_{y\pi}$ in a given sense, it can be expected to inherit some of the properties of $\hat t_{y\pi}$. Design unbiasedness is a property of $\hat t_{y\pi}$, so we may expect to find that $\hat t_{yw}$ is at least asymptotically design-unbiased (ADU). This property can in fact be obtained, if attention is paid to one detail: It is not certain that (2.5) has a solution. With a small probability, there is none, and $\hat t_{yw}$ is undefined. We therefore modify the estimator as follows: Use $\hat t_{yw}$ if (2.5) has a solution; if it does not, use $\hat t_{y\pi}$ (that is, set $\lambda_s = 0$). This gives an ADU estimator. Note, for example, that the regression estimator (1.6) is undefined if $T_s$ is singular. The usual poststratification estimator, a special case of (1.6), is undefined if there is at least one zero poststratum count.

Result 5. For any $F_k(\cdot)$ obeying our conditions, $\hat t_{yw}$ given by (2.6) is asymptotically equivalent to the regression estimator $\hat t_{yreg}$ given by (1.6), in the sense that $N^{-1}(\hat t_{yw} - \hat t_{yreg}) = O_p(n^{-1})$. As a consequence, the two estimators share the same asymptotic variance.

Proof. From (2.6) and (2.7),

$$N^{-1}\hat t_{yw} = N^{-1}\hat t_{y\pi} + N^{-1}(t_x - \hat t_{x\pi})'\, T_s^{-1} \sum_s d_k q_k x_k y_k + O_p(n^{-1}) + N^{-1}\sum_s d_k y_k\, \theta_k(x_k'\lambda_s).$$
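The fallback rule stated in the Remark (use the calibrated weights when a solution exists, otherwise keep the design weights, that is, set $\lambda_s = 0$) can be sketched for the poststratification special case. The sketch assumes that calibrating on poststratum counts gives weights $d_k N_h/\hat N_h$, where $\hat N_h$ is the sum of the design weights in poststratum $h$; the function name `poststratified_weights` and the data are hypothetical.

```python
def poststratified_weights(d, stratum, n_post):
    """Return (weights, calibrated) for calibration on poststratum counts.

    d       : design weights d_k = 1 / pi_k for the sampled elements
    stratum : poststratum label of each sampled element
    n_post  : dict mapping each poststratum label to its known count N_h
    """
    # Estimated poststratum counts: sum of design weights within each label.
    n_hat = {h: 0.0 for h in n_post}
    for dk, h in zip(d, stratum):
        n_hat[h] += dk
    # A zero poststratum count leaves the calibrated estimator undefined,
    # so fall back to the design weights (lambda_s = 0).
    if any(v == 0.0 for v in n_hat.values()):
        return list(d), False
    w = [dk * n_post[h] / n_hat[h] for dk, h in zip(d, stratum)]
    return w, True


known = {"a": 30.0, "b": 15.0}
w_ok, ok = poststratified_weights([10.0] * 4, ["a", "a", "b", "b"], known)
w_fb, fb = poststratified_weights([10.0] * 4, ["a", "a", "a", "a"], known)
```

In the second call no sampled element falls in poststratum "b", so the function returns the unmodified design weights together with a flag saying that no calibration took place.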
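$= T_s^{-1}(t_x - \hat t_{x\pi})$; subsequent iterations, $\nu = 2, 3, \ldots$, obey (3.5) until convergence. Now $\lambda_1$ is the vector (1.4) that yields the regression estimator (1.6). Thus, (1.6) is a first approximation to (2.6); Result 5 showed them to be asymptotically equivalent. Put somewhat differently, the first adjustment of $\lambda$ is essential, but the remaining adjustments are relatively unimportant. If $F(u) = 1 + u$ is the chosen function, the iteration stops after the first step. (Note: For cases where $F^{-1}$ maps $C$ onto an interval $I$ of $\mathbb{R}$, one must

element $k$ is in row $i$ and 0 otherwise, and $\delta_{\cdot jk} = 1$ if $k$ is in column $j$ and 0 otherwise. Then, $\sum_U x_k = (N_{1+}, \ldots, N_{r+}, N_{+1}, \ldots, N_{+c})'$, where $N_{i+} = \sum_{j=1}^{c} N_{ij}$, $N_{+j} = \sum_{i=1}^{r} N_{ij}$. Letting $\lambda = (u_1, \ldots, u_r, v_1, \ldots, v_c)'$, we have $x_k'\lambda = u_i + v_j$ whenever $k$ belongs to cell $ij$. That is, $F(x_k'\lambda) = F(u_i + v_j)$ depends on the cell but not on the label within the cell. With $\hat N_{ij} = \sum_{s_{ij}} 1/\pi_k$, where $s_{ij}$ denotes the part of the sample falling in cell $ij$, the calibration equations (2.3) take the form

$$\sum_{j=1}^{c} \hat N_{ij}\, F(u_i + v_j) = N_{i+}, \qquad i = 1, \ldots, r, \tag{4.1}$$

$$\sum_{i=1}^{r} \hat N_{ij}\, F(u_i + v_j) = N_{+j}, \qquad j = 1, \ldots, c. \tag{4.2}$$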
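For the exponential choice $F(u) = \exp(u)$, marginal calibration equations of this kind can be solved by alternately rescaling rows and columns (the classical raking ratio procedure, also called iterative proportional fitting). The following is a minimal sketch under the assumption that all estimated cell counts are strictly positive; the function name `rake`, the stopping rule, and the 2 x 2 table are hypothetical illustrations.

```python
def rake(n_hat, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Raking ratio (iterative proportional fitting) on a two-way table.

    n_hat      : r x c table of estimated cell counts (all > 0 assumed)
    row_totals : known row margins N_{i+}
    col_totals : known column margins N_{+j}
    Returns (calibrated cell counts, cell factors F(u_i + v_j)).
    """
    r, c = len(row_totals), len(col_totals)
    cells = [row[:] for row in n_hat]
    for _ in range(max_iter):
        # Scale each row to its known margin.
        for i in range(r):
            s = sum(cells[i])
            cells[i] = [v * row_totals[i] / s for v in cells[i]]
        # Scale each column to its known margin.
        for j in range(c):
            s = sum(cells[i][j] for i in range(r))
            for i in range(r):
                cells[i][j] *= col_totals[j] / s
        # Columns are exact after the column pass; check the rows.
        err = max(abs(sum(cells[i]) - row_totals[i]) for i in range(r))
        if err < tol:
            break
    factors = [[cells[i][j] / n_hat[i][j] for j in range(c)]
               for i in range(r)]
    return cells, factors


# Made-up estimated cell counts and known margins (both margins sum to 100).
n_hat = [[20.0, 30.0], [25.0, 25.0]]
rows, cols = [55.0, 45.0], [50.0, 50.0]
cells, factors = rake(n_hat, rows, cols)
```

Each calibrated weight is then the design weight multiplied by the factor of the cell that the element belongs to. Because the factors have the multiplicative form $\exp(u_i)\exp(v_j)$, the cross-product ratio of the factor table equals 1.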
This system is to be solved for $u_1, \ldots, u_r, v_1, \ldots, v_c$, using the function $F(\cdot)$ chosen by the statistician. Iterative solution is often required. One of the $r + c$ equations is redundant, so it is possible to fix one component (say, $v_c = 0$) and solve the system for $i = 1, \ldots, r$; $j = 1, \ldots, c - 1$. Note that $u_i + v_j$ remains invariant to the elimination of one equation. Having obtained the $u_i$ and $v_j$, we calculate the cell factors $F(u_i + v_j)$, the calibrated cell count estimates $\hat N^{w}_{ij} = \hat N_{ij} F(u_i + v_j)$, and the calibrated weights $w_k = d_k \hat N^{w}_{ij}/\hat N_{ij}$. Finally, the calibration estimator, obtained from (2.6), is

$$\hat t_{yw} = \sum_s w_k y_k = \sum\sum_{ij} \hat N^{w}_{ij}\, \bar y_{s_{ij}}, \tag{4.3}$$

where $\bar y_{s_{ij}} = (\sum_{s_{ij}} d_k y_k)/\hat N_{ij}$. The cell count estimates $\hat N^{w}_{ij}$ are often substantial improvements on the naive estimates $\hat N_{ij}$. In fact, the estimator (4.3) can be nearly as efficient as $\sum\sum_{ij} N_{ij}\, \bar y_{s_{ij}}$, the poststratified estimator formed when the $N_{ij}$ are known. If the effects on $y$ of the rows and the columns are additive (i.e., the interaction effects are negligible), then (4.3) and $\sum\sum_{ij} N_{ij}\, \bar y_{s_{ij}}$ have virtually identical variances. Efficiency, variance estimation, computational aspects, the occurrence of extreme weights, and the use of special functions $F(\cdot)$ to restrict the range of the weights (as in Cases 6 and 7) are aspects of generalized raking that we discuss in a forthcoming paper.

The theory discussed in Section 2 permits a wide choice of functions $F(\cdot)$. Some simple specifications of $F(\cdot)$ correspond to well-known procedures: First, the linear function $F(u) = 1 + u$ yields additive cell factors, $F(u_i + v_j) = 1 + u_i + v_j$, and the weights are $w_k = d_k(1 + u_i + v_j)$ for the elements $k$ in cell $ij$. These weights are not necessarily positive. The calibration equations (4.1) and (4.2) that result from this case were presented in Deming and Stephan (1940). Second, the exponential case $F(u) = \exp(u)$ gives multiplicative cell factors $F(u_i + v_j) = \exp(u_i)\exp(v_j)$, and the always-positive weights are $w_k = d_k \exp(u_i + v_j)$. The solution to (4.1) and (4.2) in this case can be obtained by carrying out (until convergence) the classical raking ratio algorithm of Deming and Stephan (1940), sometimes called iterative proportional fitting. In practice, the procedure is sometimes stopped after two iterations. As pointed out in Huang (1976), Deming and Stephan suggested the algorithm apparently thinking that it converges to the solution for the linear case, for which they had presented the equations. This was later noted by Deming (1943).

APPENDIX: PROOFS OF RESULTS 1, 2, AND 3

1. Mathematical Preliminaries

1.1 The Function $\phi$ and Its Properties

Let $C_n = \bigcap \{\lambda : x_k'\lambda \in \mathrm{Im}_k(d_k)\}$, where $\bigcap$ is over $k \in U$, the finite population associated with the (expected) sample size $n$. The interior $C_n^{\circ}$ of $C_n$ is an open convex set containing 0 for every $n$. Moreover, $C^* = \bigcap_{n=1}^{\infty} C_n^{\circ}$ is convex; we assume it is also open. Let $E_n$ and $P_n$ denote expectation and probability, with respect to the sampling design indexed by $n$. For $\lambda \in C^*$, $N^{-1}E_n\{\phi_s(\lambda)\}$ is a well defined continuously differentiable function. By our assumption 3 applied to the variable $F_k(x_k'\lambda)$, it converges to a fixed function denoted $\phi$. Convergence is uniform on every compact set in $C^*$.

Note the properties $N^{-1}\phi_s(0) = 0$; $\phi(0) = 0$, and $N^{-1}\phi_s'(0) = N^{-1}T_s$; $\phi'(0) = T = \lim N^{-1}\sum_U x_k x_k'$. Now for every $\lambda$, $\phi'$ is a positive definite matrix, because all $F_k$ are increasing functions. Consequently, $\phi$ is injective and maps $C^*$ onto an open neighborhood of 0 in $\mathbb{R}^J$.

Let $B$ be a closed sphere with radius $r$ contained in that neighborhood, and let $A$ be the compact set $\phi^{-1}(B)$. The inverse function $\phi^{-1}$ is defined on $B$, continuous, and continuously differentiable. Then $\|(\phi^{-1})'(x)\|$ is continuously differentiable and bounded on $B$. Let $K = \max_{x \in B} \|(\phi^{-1})'(x)\|$.

1.2 Properties of $N^{-1}\phi_s(\lambda)$

We need a result that justifies the use of an inverse mapping of $\phi_s$. Such a result is obtained in this section. All functions $N^{-1}\phi_s(\lambda)$ are defined on $C^*$ and therefore on $A$. For a continuous $\psi$ defined on $C^*$, let $\|\psi\|_M = \sup_{\lambda \in M} \|\psi(\lambda)\|$ for $M$ compact in $C^*$. By our general properties of convergence, we have for every $\varepsilon > 0$ that $P_n(\|N^{-1}\phi_s - \phi\|_A < \varepsilon) \to 1$ when $n$ increases. Now let $\phi_1 = N^{-1}\phi_s$ for some function verifying $\|\phi_1 - \phi\|_A < \beta r$; $\|\phi_1' - \phi'\|_A < \beta K$, with $0 < \beta < 1$. The probability of this event tends to 1 as $n$ increases. Let $r_1 = (1 - \beta)r$, and let $B_1$ be the sphere $\|x\| \le r_1$ in $\mathbb{R}^J$. Now $\phi_1$ maps the frontier of $A$ onto the crown $r_1 \le \|x\| \le r(1 + \beta)$, and $\phi_1(A)$ is a bordered manifold homotopic to $B$. These notions are discussed in Trenoguine (1987). Consequently, $\phi_1(A)$ covers the sphere $B_1$; in other words, for every $x \in B_1$, the equation $\phi_1(\lambda) = x$ has a (unique) solution. Moreover, $\phi_1^{-1}$, defined on $B_1$, is a continuously differentiable function. Because $\|\phi_1' - \phi'\| < \beta K$ for every $\lambda$ in $C$, $(\phi_1^{-1})'(x)$ exists for every $x \in B_1$, and

$$\|\phi_1^{-1}(x)\| \le \|x\|\, K (1 - \beta)^{-1}.$$

2. Proofs of the Three Results

Result 1. First, $N^{-1}(t_x - \hat t_{x\pi}) = z$ belongs to $B_1$ with a probability tending to 1. Second, $N^{-1}\phi_s$ has an inverse function on $B_1$ with a probability tending to 1. As (2.5) can be written $N^{-1}(t_x - \hat t_{x\pi}) = N^{-1}\phi_s(\lambda)$, the equation has a unique solution with probability tending to 1.

Result 2. Let $\lambda_s = (N^{-1}\phi_s)^{-1}(z)$ if $z$ belongs to $B_1$; otherwise, $\lambda_s$ is arbitrarily defined. Since $\phi_s(0) = 0$, we have $\lambda_s - 0 = (N^{-1}\phi_s)^{-1}(z) - (N^{-1}\phi_s)^{-1}(0)$ and $\|\lambda_s\| \le \|z\|\, K (1 - \beta)^{-1}$. This inequality holds with probability tending to 1 when $n$ increases. But $z = O_p(n^{-1/2})$, so there exists a constant $K'$ such that $P_n(\|z\| \le K' n^{-1/2}) \to 1$. Combining the two inequalities, $P_n(\|\lambda_s\| \le K K' (1 - \beta)^{-1} n^{-1/2}) \to 1$, which implies by definition that $\lambda_s = O_p(n^{-1/2})$.

Result 3. Let $\theta_k(u) = F_k(u) - 1 - q_k u$. We assume that $\theta_k(u) = O(u^2)$ holds uniformly, which is equivalent to our assumption that $F_k''(0)$ is uniformly bounded. Thus $\theta(u) = \max_k \theta_k(u) = O(u^2)$. In other words, for any $\varepsilon > 0$, there exists $K''$ such that, for all $k$, $|u| < \varepsilon$ will imply that $|\theta_k(u)| < K'' u^2$. We can write (2.5) as $t_x - \hat t_{x\pi} = \sum_s d_k x_k \{q_k x_k'\lambda_s + \theta_k(x_k'\lambda_s)\}$, and therefore $\lambda_s - T_s^{-1}(t_x - \hat t_{x\pi}) = -T_s^{-1} \sum_s d_k x_k \theta_k(x_k'\lambda_s)$. For $\lambda_s$ sufficiently small,

$$\|\lambda_s - T_s^{-1}(t_x - \hat t_{x\pi})\| \le \|(N^{-1}T_s)^{-1}\|\, K'' \Big\{N^{-1} \sum_s d_k \|x_k\|^3\Big\} \|\lambda_s\|^2.$$

Here, $\|(N^{-1}T_s)^{-1}\| = O_p(1)$, and, using assumption (2) in Section 2, $N^{-1}\sum_s d_k \|x_k\|^3 = O_p(1)$. Moreover, by Result 2, $\|\lambda_s\|^2 = O_p(n^{-1})$. Result 3 follows.

[Received November 1989. Revised July 1991.]

REFERENCES

Bankier, M. D. (1990), "Generalized Least Squares Estimation Under Poststratification," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 730-755.
Bethlehem, J. G., and Keller, W. J. (1987), "Linear Weighting of Sample Survey Data," Journal of Official Statistics, 3, 141-153.
Cassel, C. M., Särndal, C. E., and Wretman, J. H. (1976), "Some Results on Generalized Difference Estimation and Generalized Regression Estimation for Finite Populations," Biometrika, 63, 615-620.
Deming, W. E. (1943), Statistical Adjustment of Data, New York: John Wiley.
Deming, W. E., and Stephan, F. F. (1940), "On a Least Squares Adjustment of a Sampled Frequency Table when the Expected Marginal Totals are Known," The Annals of Mathematical Statistics, 11, 427-444.
Deville, J. C. (1988), "Estimation Linéaire et Redressement sur Informations Auxiliaires d'Enquêtes par Sondage," in Essais en l'Honneur d'Edmond Malinvaud, eds. A. Monfort and J. J. Laffond, Paris: Economica, pp. 915-927.
Fuller, W. A., and Isaki, C. T. (1981), "Survey Design Under Superpopulation Models," in Current Topics in Survey Sampling, eds. D. Krewski, J. N. K. Rao, and R. Platek, New York: Academic Press, pp. 199-226.
Gourieroux, C. (1981), Théorie des Sondages, Paris: Economica.
Huang, E. (1976), "Regression Estimation of Means and Totals for Sample Survey Data," unpublished Ph.D. thesis, Iowa State University.
Isaki, C. T., and Fuller, W. A. (1982), "Survey Design Under the Regression Superpopulation Model," Journal of the American Statistical Association, 77, 89-96.
Lemaitre, G., and Dufour, J. (1987), "An Integrated Method for Weighting Persons and Families," Survey Methodology, 13, 199-207.
Lemel, Y. (1976), "Une Généralisation de la Méthode du Quotient pour le Redressement des Enquêtes par Sondage," Annales de l'INSEE, 22-23, 272-282.
Särndal, C. E. (1980), "On π-Inverse Weighting Versus Best Linear Weighting in Probability Sampling," Biometrika, 67, 639-650.
Särndal, C. E. (1982), "Implications of Survey Design for Generalized Regression Estimation of Linear Functions," Journal of Statistical Planning and Inference, 7, 155-170.
Särndal, C. E., Swensson, B., and Wretman, J. H. (1989), "The Weighted Residual Technique for Estimating the Variance of the General Regression Estimator of the Finite Population Total," Biometrika, 76, 527-537.
Sautory, O. (1991), "Redressements d'Echantillons d'Enquêtes auprès des Ménages par Calage sur les Marges," Internal report, Division Méthodes Statistiques et Sondages, INSEE, Paris.
Trenoguine, V. (1987), Analyse Fonctionnelle, Moscow: Editions M.I.R.
Wright, R. L. (1983), "Finite Population Sampling With Multivariate Auxiliary Information," Journal of the American Statistical Association, 78, 879-884.
Zieschang, K. D. (1986), "A Generalized Least Squares Weighting System for the Consumer Expenditure Survey," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 64-71.
Zieschang, K. D. (1990), "Sample Weighting Methods and Estimation of Totals in the Consumer Expenditure Survey," Journal of the American Statistical Association, 85, 986-1001.