Calibration Estimators in Survey Sampling

Author(s): Jean-Claude Deville and Carl-Erik Sarndal


Source: Journal of the American Statistical Association, Vol. 87, No. 418 (Jun., 1992), pp. 376-382
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2290268
Accessed: 16/02/2013 15:00


This content downloaded on Sat, 16 Feb 2013 15:00:47 PM


All use subject to JSTOR Terms and Conditions
Calibration Estimators in Survey Sampling
JEAN-CLAUDE DEVILLE and CARL-ERIK SÄRNDAL*

This article investigates estimation of finite population totals in the presence of univariate or multivariate auxiliary information. Estimation is equivalent to attaching weights to the survey data. We focus attention on the several weighting systems that can be associated with a given amount of auxiliary information and derive a weighting system with the aid of a distance measure and a set of calibration equations. We briefly mention an application to the case in which the information consists of known marginal counts in a two- or multi-way table, known as generalized raking. The general regression estimator (GREG) was conceived with multivariate auxiliary information in mind. Ordinarily, this estimator is justified by a regression relationship between the study variable $y$ and the auxiliary vector $x$. But we note that the GREG can be derived by a different route by focusing instead on the weights. The ordinary sampling weight of the $k$th observation is $1/\pi_k$, where $\pi_k$ is the inclusion probability of $k$. We show that the weights implied by the GREG are as close as possible, according to a given distance measure, to the $1/\pi_k$ while respecting side conditions called calibration equations. These state that the sample sum of the weighted auxiliary variable values must equal the known population total for that auxiliary variable. That is, the calibrated weights must give perfect estimates when applied to each auxiliary variable. That is a consistency check that appeals to many practitioners, because a strong correlation between the auxiliary variables and the study variable means that the weights that perform well for the auxiliary variable also should perform well for the study variable. The GREG uses the auxiliary information efficiently, so the estimates are precise; however, the individual weights are not always without reproach. For example, negative weights can occur, and in some applications this does not make sense. It is natural to seek the root of the dissatisfaction in the underlying distance measure. Consequently, we allow alternative distance measures that satisfy only a set of minimal requirements. Each distance measure leads, via the calibration equations, to a specific weighting system and thereby to a new estimator. These estimators form a family of calibration estimators. We show that the GREG is a first approximation to all other members of the family; all are asymptotically equivalent to the GREG, and the variance estimator already known for the GREG is recommended for use in any other member of the family. Numerical features of the weights and ease of computation become more than anything else the bases for choosing between the estimators. The reasoning is applied to calibration on known marginals of a two-way frequency table. Our family of distance measures leads in this case to a family of generalized raking procedures, of which classical raking ratio is one.

KEY WORDS: Calibration; Multivariate auxiliary information; Raking; Regression estimators.

* Jean-Claude Deville is Head of the Statistical Methodology and Sampling Division, Institut National de la Statistique et des Etudes Economiques, 18 Boulevard Adolphe Pinard, 75675 Paris Cedex 14, France. Carl-Erik Särndal is Professor, Département de Mathématiques et de Statistique, Université de Montréal, C. P. 6128, Succursale A, Montréal, Québec H3C 3J7, Canada. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada.

© 1992 American Statistical Association. Journal of the American Statistical Association, June 1992, Vol. 87, No. 418, Theory and Methods.

Survey statisticians use auxiliary information in many ways to improve survey estimates. For example, using the general regression estimator for a finite population total or mean requires a vector of auxiliary variables for which the population total is known. The calibration estimators derived in this article are a family of estimators appealing to a common base of auxiliary information. A calibration estimator uses calibrated weights, which are as close as possible, according to a given distance measure, to the original sampling design weights $\pi_k^{-1}$ while also respecting a set of constraints, the calibration equations. For every distance measure there is a corresponding set of calibrated weights and a calibration estimator. In Section 2 we define a family of distance measures and derive the corresponding family of calibration estimators, then establish their properties in a series of results. Variance estimators for calibration estimators are given in Section 3. An important application of these ideas, mentioned in Section 4, is the calibration on known marginal counts in two-way or multiway tables, which leads to generalized raking.

1. DERIVING THE GENERAL REGRESSION ESTIMATOR BY CALIBRATION

Consider a finite population $U = \{1, \ldots, k, \ldots, N\}$, from which a probability sample $s$ ($s \subseteq U$) is drawn with a given sampling design, $p(\cdot)$. That is, $p(s)$ is the probability that $s$ is selected. The inclusion probabilities $\pi_k = \Pr(k \in s)$ and $\pi_{kl} = \Pr(k \ \&\ l \in s)$ are assumed to be strictly positive. Let $y_k$ be the value of the variable of interest, $y$, for the $k$th population element, with which also is associated an auxiliary vector value, $x_k = (x_{k1}, \ldots, x_{kj}, \ldots, x_{kJ})'$. For the elements $k \in s$, we observe $(y_k, x_k)$. The population total of $x$, $t_x = \sum_U x_k$, is assumed to be accurately known. This knowledge may come from one or more sources, such as census data, administrative data files, and others. If $A$ ($A \subseteq U$) is any set of population elements, $\sum_A$ is our shorthand for $\sum_{k \in A}$ (e.g., $\sum_s y_k$ means $\sum_{k \in s} y_k$).

The objective is to estimate the population total $t_y = \sum_U y_k$. Extending an idea of Lemel (1976), Deville (1988) used calibration on known population $x$-totals to modify the basic sampling design weights, $d_k = 1/\pi_k$, that appear in the Horvitz-Thompson estimator, $\hat{t}_{y\pi} = \sum_s y_k/\pi_k = \sum_s d_k y_k$. A new estimator, $\hat{t}_{yw} = \sum_s w_k y_k$, is sought, with weights $w_k$ as close as possible, in an average sense for a given metric, to the $d_k$ while respecting the calibration equation

$$\sum_s w_k x_k = t_x. \tag{1.1}$$

Here, $w_{ks}$ would be a more appropriate notation for the sample-dependent weights, but for brevity we write just $w_k$. The idea of adjusting the sample weights $d_k$ is discussed in the context of the U.S. Consumer Expenditure Survey by Zieschang (1986, 1990), who considered "weighting control procedures" through a generalized least squares weighting algorithm. The work of Bankier (1990) is also related. If $E_p(\cdot)$ denotes expectation with respect to the sampling design


$p(s)$, a measure of average distance reminiscent of the chi-square statistic is $E_p\{\sum_s (w_k - d_k)^2/d_k\}$. For more generality in this expression, we can let the $k$th term have an individual, known positive weight $1/q_k$, unrelated to $d_k$, which gives the average distance

$$E_p\Big\{\sum_s (w_k - d_k)^2/(d_k q_k)\Big\}. \tag{1.2}$$

The uniform weighting $1/q_k = 1$ is likely to dominate in applications, but unequal weights $1/q_k$ are sometimes motivated; see Example 1, which follows. Our objective is to derive new weights that modify as little as possible the original sampling weights $d_k = \pi_k^{-1}$, which have the desirable property of yielding unbiased estimates; the survey statistician wants to stay close to these weights. We thus seek the minimum of (1.2) subject to (1.1) holding for every sample $s$. This is equivalent to minimizing, for any particular $s$, the quantity $\sum_s (w_k - d_k)^2/(d_k q_k) = \sum_s d_k (w_k/d_k - 1)^2/q_k$, subject to the single constraint (1.1). In other words, we should minimize the conditional value of the distance, given the realized sample $s$. Nothing says that the new weights will continue to give unbiased estimates, but a realistic expectation is to remain near unbiasedness. Minimization leads to the calibrated weight

$$w_k = d_k(1 + q_k x_k'\lambda), \tag{1.3}$$

where the vector of Lagrange multipliers $\lambda$ is determined from (1.1); that is,

$$\lambda = T_s^{-1}(t_x - \hat{t}_{x\pi}), \tag{1.4}$$

assuming that the inverse of

$$T_s = \sum_s d_k q_k x_k x_k' \tag{1.5}$$

exists. The resulting estimator of $t_y$ is

$$\hat{t}_{y\mathrm{reg}} = \sum_s w_k y_k = \hat{t}_{y\pi} + (t_x - \hat{t}_{x\pi})'\hat{B}_s, \tag{1.6}$$

where $\hat{t}_{x\pi} = \sum_s d_k x_k$ denotes the Horvitz-Thompson estimator for the $x$-vector and

$$\hat{B}_s = T_s^{-1}\sum_s d_k q_k x_k y_k \tag{1.7}$$

is a weighted estimator of the multiple regression coefficient. Thus, Deville's (1988) calibration technique achieves two things: (1) it provides an alternative derivation of the generalized regression estimator (Cassel, Särndal, and Wretman 1976; Gourieroux 1981; Särndal 1980; Isaki and Fuller 1982; Wright 1983) and (2) it emphasizes that it is fruitful to view (1.6) as a linear weighting of the observations $y_k$ with weights $w_k$ that are sample-dependent and given by (1.3). This view is also held by Särndal (1982), who used the $w_k$ to create a suitable variance estimator for $\hat{t}_{y\mathrm{reg}}$ (see Section 4), and also by Bethlehem and Keller (1987) and Lemaitre and Dufour (1987). The research question addressed in this article is whether useful alternative estimators will result by allowing other distance measures.

Example 1. Derivation of the ratio estimator. Take the auxiliary vector to be a positive scalar, $x_k$. Then $x_k'\lambda = x_k\lambda$. Let us take $q_k = 1/x_k$. We obtain $\lambda = (\sum_U x_k)/(\sum_s d_k x_k) - 1 = t_x/\hat{t}_{x\pi} - 1$, whereby $w_k = d_k(1 + q_k x_k \lambda) = d_k(1 + \lambda) = d_k t_x/\hat{t}_{x\pi}$, and from (1.6) $\hat{t}_{y\mathrm{reg}} = t_x \hat{t}_{y\pi}/\hat{t}_{x\pi}$, the well-known ratio estimator. Note that the unequal weighting $q_k = 1/x_k$ is essential for obtaining this result.

2. A CLASS OF DISTANCE MEASURES

In (1.2), the distance between the original weight $d_k$ and the new weight $w_k$ was rather arbitrarily taken as $(w_k - d_k)^2/(d_k q_k)$. It is natural to consider alternative distance measures. These measures should share a few basic features that are easy to accept. For element $k$, we consider a distance $G_k(w, d)$ such that (1) for every fixed $d > 0$, $G_k(w, d)$ is nonnegative, differentiable with respect to $w$, strictly convex, defined on an interval $D_k(d)$ containing $d$, and such that $G_k(d, d) = 0$; and (2) $g_k(w, d) = \partial G_k(w, d)/\partial w$ is continuous and maps $D_k(d)$ onto an interval $\mathrm{Im}_k(d)$ in a one-to-one fashion. It follows that $g_k(w, d)$ is a strictly increasing function of $w$ and $g_k(d, d) = 0$. Average distance is then measured by $E_p\{\sum_s G_k(w_k, d_k)\}$. To minimize this quantity subject to (1.1) holding for all $s$ is equivalent to seeking the $w_k$ that minimize, for any particular $s$, the sum $\sum_s G_k(w_k, d_k)$ under the single constraint (1.1). If $\lambda$ denotes a vector of Lagrange multipliers, derivation gives

$$g_k(w_k, d_k) - x_k'\lambda = 0. \tag{2.1}$$

If a solution exists, our assumptions guarantee that it is unique. It can always be written as

$$w_k = d_k F_k(x_k'\lambda), \tag{2.2}$$

where $d_k F_k(\cdot)$ is the reciprocal mapping of $g_k(\cdot, d_k)$ that maps $\mathrm{Im}_k(d_k)$ onto $D_k(d_k)$ in an increasing fashion. From our assumptions, $F_k(0) = 1$ and $F_k'(0) > 0$. The important quantity $F_k'(0)$ plays the same role as $q_k$ in (1.2), so we use the notation $F_k'(0) = q_k$.

In most of our applications, $g_k(w, d) = g(w/d)/q_k$, where $g(\cdot)$ is a function of the single argument $w/d$, independent of $k$, continuous, strictly increasing, and such that $g(1) = 0$ and $g'(1) = 1$. Examples are found in Table 1. Then $g_k(w, d)$ depends on $k$ only through the multiplicative factor $1/q_k$. If $F(u) = g^{-1}(u)$ denotes the inverse function of $g(\cdot)$, (2.2) becomes

$$w_k = d_k F(q_k x_k'\lambda). \tag{2.2a}$$

From (1.1), the calibration equations necessary to determine $\lambda = (\lambda_1, \ldots, \lambda_j, \ldots, \lambda_J)'$ are

$$t_x = \sum_s w_k x_k = \sum_s d_k F_k(x_k'\lambda)\, x_k. \tag{2.3}$$

It is convenient to define

$$\phi_s(\lambda) = \sum_s d_k \{F_k(x_k'\lambda) - 1\}\, x_k, \tag{2.4}$$

whereby (2.3) can be written as

$$\phi_s(\lambda) = t_x - \hat{t}_{x\pi}. \tag{2.5}$$
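As a hedged illustration of the linear case in Section 1, the sketch below computes the calibrated weights (1.3) from (1.4) and (1.5) with NumPy and checks them against the ratio estimator of Example 1. The function name `linear_calibration_weights` and the toy data are assumptions made for this example; they do not come from the article.

```python
import numpy as np

def linear_calibration_weights(d, X, t_x, q=None):
    """Weights w_k = d_k (1 + q_k x_k' lambda), with lambda from (1.4)-(1.5).

    d   : design weights d_k = 1/pi_k, shape (n,)
    X   : auxiliary values x_k, shape (n, J)
    t_x : known population totals of x, shape (J,)
    q   : optional positive constants q_k (defaults to 1)
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    q = np.ones_like(d) if q is None else np.asarray(q, dtype=float)
    t_x_pi = X.T @ d                              # Horvitz-Thompson estimate of t_x
    T_s = (X * (d * q)[:, None]).T @ X            # T_s = sum_s d_k q_k x_k x_k'
    lam = np.linalg.solve(T_s, t_x - t_x_pi)      # lambda = T_s^{-1}(t_x - t_x_pi)
    return d * (1.0 + q * (X @ lam))

# Toy data (hypothetical): scalar x_k > 0 and q_k = 1/x_k, as in Example 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 3.0, 5.0, 9.0])
d = np.full(4, 5.0)
t_x = np.array([60.0])

w = linear_calibration_weights(d, x[:, None], t_x, q=1.0 / x)
ratio_estimate = t_x[0] * (d @ y) / (d @ x)       # t_x * t_y_pi / t_x_pi
```

With these inputs the weights are a constant multiple of the $d_k$, the calibration equation (1.1) holds exactly, and `w @ y` coincides with `ratio_estimate`, which is the behavior Example 1 describes.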


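Table 1. Examples of Distance Functions $G_k(w_k, d_k)$ With the Associated $g_k(w_k, d_k)$ and $F_k(u)$

| Case | $q_k G_k(w_k, d_k)$ | $g(w_k/d_k) = q_k g_k(w_k, d_k)$ | $F_k(u) = F(q_k u)$ |
|---|---|---|---|
| 1 | $(w_k - d_k)^2/(2 d_k)$ | $w_k/d_k - 1$ | $1 + q_k u$ |
| 2 | $w_k \log(w_k/d_k) - w_k + d_k$ | $\log(w_k/d_k)$ | $\exp(q_k u)$ |
| 3 | $2(\sqrt{w_k} - \sqrt{d_k})^2$ | $2\{1 - (w_k/d_k)^{-1/2}\}$ | $(1 - q_k u/2)^{-2}$ |
| 4 | $-d_k \log(w_k/d_k) + w_k - d_k$ | $1 - (w_k/d_k)^{-1}$ | $(1 - q_k u)^{-1}$ |
| 5 | $(w_k - d_k)^2/(2 w_k)$ | $\{1 - (w_k/d_k)^{-2}\}/2$ | $(1 - 2 q_k u)^{-1/2}$ |

NOTE: The functions are normalized so that $F_k(0) = 1$ and $F_k'(0) = q_k$ in all cases.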
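The entries of Table 1 can be checked numerically. The following sketch, with hypothetical helper names `F` and `g`, codes the five pairs from the table and verifies the normalization $F_k(0) = 1$, $F_k'(0) = q_k$ and the inverse relation $g(F(u)) = u$. It is an illustration under these naming assumptions, not code from the article.

```python
import numpy as np

def F(case, u, q=1.0):
    """F_k(u) = F(q_k u) for Cases 1-5 of Table 1."""
    z = q * np.asarray(u, dtype=float)
    if case == 1:
        return 1.0 + z
    if case == 2:
        return np.exp(z)
    if case == 3:
        return (1.0 - z / 2.0) ** -2
    if case == 4:
        return (1.0 - z) ** -1
    if case == 5:
        return (1.0 - 2.0 * z) ** -0.5
    raise ValueError("case must be 1, ..., 5")

def g(case, x):
    """g(w/d) for Cases 1-5 of Table 1, evaluated at x = w/d."""
    x = np.asarray(x, dtype=float)
    if case == 1:
        return x - 1.0
    if case == 2:
        return np.log(x)
    if case == 3:
        return 2.0 * (1.0 - x ** -0.5)
    if case == 4:
        return 1.0 - 1.0 / x
    if case == 5:
        return (1.0 - x ** -2) / 2.0
    raise ValueError("case must be 1, ..., 5")

# Central-difference slope of F_k at 0 for an arbitrary q_k (here 0.7).
h = 1e-6
slopes = {c: (F(c, h, q=0.7) - F(c, -h, q=0.7)) / (2 * h) for c in range(1, 6)}
```

Each slope in `slopes` should be close to the chosen $q_k$, and composing `g` with `F` should return the argument, which is what the note under Table 1 states.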

The right side is a known quantity for every sample $s$. To summarize, the steps of the procedure are:

1. Given the data for the realized sample $s$ and for the chosen $F_k(\cdot)$, solve (2.5) for $\lambda$. Iteration may be required.
2. Once $\lambda$ is determined, obtain the resulting calibration estimator of $t_y$ as

$$\hat{t}_{yw} = \sum_s w_k y_k = \sum_s d_k F_k(x_k'\lambda)\, y_k. \tag{2.6}$$

This estimator will give close estimates of $t_y = \sum_U y_k$ if there is a strong relationship between $y$ and $x$. To see this, suppose that $y_k = x_k'a$ for all $k$ and some constant vector $a$; that is, $y$ is perfectly explained by $x$. Then, from (2.3), $\hat{t}_{yw} = t_y$ for every sample, so the variance is nil.

The statistician chooses the distance function $G_k(w_k, d_k)$ or, equivalently, the uniquely corresponding function $F_k(u) = F_k(x_k'\lambda)$. Examples of the form $g_k(w, d) = g(w/d)/q_k$ are shown in Table 1, where the functions are normalized to obtain $F_k(0) = 1$ and $F_k'(0) = q_k$. Because $1/q_k$ is a recurring multiplicative factor, the table shows $q_k G_k(w_k, d_k)$ and $q_k g_k(w_k, d_k) = g(w_k/d_k)$.

The cases in Table 1 correspond to well-known distance measures; for example, Hellinger distance in Case 3, and minimum entropy distance in Case 4. In Cases 1, 3, 4, and 5, $F_k(u)$ is of the form $(1 + a q_k u)^{1/a}$, with $a = 1, -1/2, -1, -2$, respectively; Case 2 is obtained when $a \to 0$. What are the relative merits of these cases? The existence of a solution of (2.5) is one aspect that needs to be considered. Cases 1 and 2 always lead to a solution; in Cases 3, 4, and 5, a solution is not guaranteed, but Result 1, stated later in this section, shows that the probability of a solution tends to 1. More important perhaps is the range of values that the weights $w_k = d_k F(q_k x_k'\lambda)$ can take. In Case 1, which yields the regression estimator (1.6), the weights can be positive or negative; Cases 2, 3, 4, and 5 guarantee positive weights. In each case, some unrealistic or extreme weights $w_k$ may occur for rare or "unlucky" samples. That Case 1 can yield negative weights $w_k$ may be unacceptable to some users. Case 2 may yield some weights $w_k$ that are extremely large compared to the basic sampling weights $d_k = \pi_k^{-1}$; again, the user may find this unacceptable. One may want to avoid a function $F_k(u)$ that can give overly extreme weights, because applying these weights to make estimates for various subpopulations (domains) may produce unrealistic estimates for some domains. We therefore consider a few additional functions that have the attractive property of yielding weights restricted to an interval that the statistician can specify in advance. Extreme weights can be eliminated, but the resulting estimators retain their favorable properties.

Case 6. In Case 2, the values of $F_k(u) = \exp(q_k u)$ range in $(0, \infty)$. To restrict the weights, and in particular to avoid extremely large weights, specify two constants $L$ and $U$ such that $L < 1 < U$, set $A = (U - L)/\{(1 - L)(U - 1)\}$, and define

$$F_k(u) = \frac{L(U - 1) + U(1 - L)\exp(A q_k u)}{(U - 1) + (1 - L)\exp(A q_k u)}.$$

We then have $F_k(-\infty) = L$; $F_k(\infty) = U$; $F_k(0) = 1$, $F_k'(0) = q_k$. It follows that the weights $w_k = d_k F(q_k x_k'\lambda)$ are restricted by $L d_k < w_k < U d_k$. It is worth noting that the distance function $G_k(w_k, d_k)$ for this case is, apart from a multiplicative constant,

$$(x - L)\log\frac{x - L}{1 - L} + (U - x)\log\frac{U - x}{U - 1}$$

with $x = w_k/d_k$. If $L$ is large negative and $U$ is large positive, we are close to Case 1. If $L = 0$ and $U$ is large, we are close to Case 2.

Case 7. With the following modification, we can avoid the negative and extreme weights that can arise in Case 1: Specify the constants $L$ and $U$ such that $L < 1 < U$ and define $F_k(u) = 1 + q_k u$ if $(L - 1)/q_k \le u \le (U - 1)/q_k$; $F_k(u) = L$ if $u < (L - 1)/q_k$; and $F_k(u) = U$ if $u > (U - 1)/q_k$. The weights $w_k$ will then be restricted according to $L d_k \le w_k \le U d_k$. The corresponding distance function is as in Case 1, if $L d_k \le w_k \le U d_k$, and is defined as infinity otherwise. If we choose $L$ to be positive, negative weights can never occur.

Example 2. Returning to Example 1, we note that the ratio estimator is obtained for any $g_k(w, d)$ of the form $g(w/d)/q_k$, if $q_k = 1/x_k$. Then $F_k(x_k'\lambda) = F(q_k x_k \lambda) = F(\lambda)$, a constant. From (2.3), $F(\lambda) = t_x/\hat{t}_{x\pi}$, so (2.6) gives the ratio estimator $\hat{t}_{yw} = t_x \hat{t}_{y\pi}/\hat{t}_{x\pi}$.

Example 2 is exceptional in that the choice of function has no bearing on the estimator. In general, different $F_k(u)$ yield different estimators. However, one can expect these estimators to produce only slightly different estimates in medium to large samples. The theoretical backing for this claim is the important Result 5, stated later in this section, according to which all estimators (2.6), under mild conditions on the underlying $F_k(u)$, are asymptotically equivalent to the regression estimator (1.6) generated by the linear function $F_k(u) = 1 + q_k u$. Thus, for medium to large samples, the choice of $F_k(u)$ has only a


modest impact on such essential properties as the variance of the estimator. In a small Monte Carlo study with simple random sampling without replacement of $n = 200$ from a population of $N = 2{,}000$, we found practically no difference in variance among the estimators generated by several of the functions that we have described. Computational convenience more than anything else may then dictate the choice of $F_k(u)$.

We now derive several asymptotic results that are needed later. Our setup for asymptotics is essentially that of Fuller and Isaki (1981) and Isaki and Fuller (1982). This setup has the following important features: We consider a sequence of finite populations and sampling designs indexed by $n$, where $n$ is the sample size (for a fixed-sized sampling design) or the expected sample size (for a random-sized sampling design). The finite population size, $N$, tends to infinity with $n$, and we assume that for any vector-valued variable $x$ of interest to this article

1. $\lim N^{-1} t_x$ exists.
2. $N^{-1}(\hat{t}_{x\pi} - t_x) \to 0$ in design probability.
3. $n^{1/2} N^{-1}(\hat{t}_{x\pi} - t_x)$ converges in distribution to the multinormal $N(0, A)$.

Here (3) is to justify the use of the normal approximation in confidence intervals based on $\hat{t}_{y\pi}$. From a practical standpoint, the assumptions mean that: (a) the components of $\hat{t}_{x\pi} - t_x$ are considered small and quantities on the order of $\|\hat{t}_{x\pi} - t_x\|^2$ are considered negligible and (b) $\hat{t}_{x\pi} - t_x$ follows an approximately normal distribution with covariance matrix $n^{-1}N^2 A$. Let $\sum\sum_U$ be shorthand for $\sum_{k=1}^{N}\sum_{l=1}^{N}$; set $\Delta_{kl} = \pi_{kl} - \pi_k \pi_l$. Now (1), (2), and (3) imply that

$$n N^{-2} \sum\sum_U \Delta_{kl}\,(x_k/\pi_k)(x_l/\pi_l)' = n N^{-2} V(\hat{t}_{x\pi})$$

converges to the fixed matrix $A$ and that $N^{-1}(\hat{t}_{x\pi} - t_x) = O_p(n^{-1/2})$. We can view $A$ as a matrix that describes an asymptotic effect of the sampling design used for the survey.

Before proving any asymptotic properties of $\hat{t}_{yw}$, we discuss the existence of a solution of (2.5). Now, (2.4) defines a function of $\lambda$ on $C = \bigcap_{k \in U} \{\lambda : x_k'\lambda \in \mathrm{Im}_k(d_k)\}$, a convex domain. We assume that $C$ is an open neighborhood of 0, for every $n$. Five results are now stated and proved; the proofs of Results 1, 2, and 3 are given in the Appendix.

Result 1. Equation (2.5) has a unique solution belonging to $C$, with probability tending to one as $n \to \infty$.

Result 2. Let $\lambda_s$ be the solution of (2.5), if one exists; otherwise, let $\lambda_s$ be an arbitrary fixed value. Then, $\lambda_s$ tends to 0 in design probability, and $\lambda_s = O_p(n^{-1/2})$.

To obtain Results 3, 4, and 5, we add the assumptions (a) $\max \|x_k\| = M < \infty$, where max is over $n$ as well as over $k$, and (b) $\max F_k''(0) = M' < \infty$. The assumption (b) is verified for Cases 1-7.

Result 3. We have $\lambda_s = T_s^{-1}(t_x - \hat{t}_{x\pi}) + O_p(n^{-1})$.

Result 4. The calibration estimator $\hat{t}_{yw}$ given by (2.6) is design-consistent, and $N^{-1}(\hat{t}_{yw} - t_y) = O_p(n^{-1/2})$.

Proof: Because $F_k'(0) = q_k$, we have, using the assumption (b),

$$F_k(u) = 1 + q_k u + \theta_k(u). \tag{2.7}$$

If (2.5) has a solution, $\lambda_s$, then $\hat{t}_{yw} - \hat{t}_{y\pi} = \sum_s d_k y_k \{q_k x_k'\lambda_s + \theta_k(x_k'\lambda_s)\}$. Now, using condition 3 for the variable $y_k \theta_k(x_k'\lambda_s)$, and given the fact that $\max_{k \in U} \theta_k(u) = O(u^2)$, we obtain $N^{-1}|\hat{t}_{yw} - \hat{t}_{y\pi}| \le N^{-1}\{(\sum_s d_k q_k |y_k|\,\|x_k\|)\,\|\lambda_s\|\} + O_p(n^{-1})$, where $N^{-1}\{\sum_s d_k q_k |y_k|\,\|x_k\|\} = O_p(1)$, and $\lambda_s = O_p(n^{-1/2})$ by Result 2. Result 4 follows, because $\hat{t}_{y\pi}$ is design-consistent and $N^{-1}(\hat{t}_{y\pi} - t_y) = O_p(n^{-1/2})$.

Remark. Because $\hat{t}_{yw}$ is the nearest estimator to $\hat{t}_{y\pi}$ in a given sense, it can be expected to inherit some of the properties of $\hat{t}_{y\pi}$. Design unbiasedness is a property of $\hat{t}_{y\pi}$, so we may expect to find that $\hat{t}_{yw}$ is at least asymptotically design-unbiased (ADU). This property can in fact be obtained, if attention is paid to one detail: It is not certain that (2.5) has a solution. With a small probability, there is none, and $\hat{t}_{yw}$ is undefined. We therefore modify the estimator as follows: Use $\hat{t}_{yw}$ if (2.5) has a solution; if it does not, use $\hat{t}_{y\pi}$ (that is, set $\lambda_s = 0$). This gives an ADU estimator. Note, for example, that the regression estimator (1.6) is undefined if $T_s$ is singular. The usual poststratification estimator, a special case of (1.6), is undefined if there is at least one zero poststratum count.

Result 5. For any $F_k(\cdot)$ obeying our conditions, $\hat{t}_{yw}$ given by (2.6) is asymptotically equivalent to the regression estimator $\hat{t}_{y\mathrm{reg}}$ given by (1.6), in the sense that $N^{-1}(\hat{t}_{yw} - \hat{t}_{y\mathrm{reg}}) = O_p(n^{-1})$. As a consequence, the two estimators share the same asymptotic variance.

Proof. From (2.6) and (2.7),

$$N^{-1}\hat{t}_{yw} = N^{-1}\hat{t}_{y\pi} + N^{-1}(t_x - \hat{t}_{x\pi})'\,T_s^{-1}\sum_s d_k q_k x_k y_k + O_p(n^{-1}) + N^{-1}\sum_s d_k y_k \theta_k(x_k'\lambda_s).$$

The first two terms of the right side equal $N^{-1}\hat{t}_{y\mathrm{reg}}$, where $\hat{t}_{y\mathrm{reg}}$ is given by (1.6). The last term was found in the proof of Result 4 to be $O_p(n^{-1})$. Therefore, $n^{1/2}N^{-1}(\hat{t}_{yw} - \hat{t}_{y\mathrm{reg}}) = O_p(n^{-1/2})$, with a zero asymptotic variance.

3. VARIANCE AND VARIANCE ESTIMATION

Result 5 states that $\hat{t}_{yw}$ is asymptotically equivalent to $\hat{t}_{y\mathrm{reg}}$, which is the special case of $\hat{t}_{yw}$ generated by $F_k(u) = 1 + q_k u$. For any $F_k(u)$ satisfying our conditions, the asymptotic variance (AV) of $\hat{t}_{yw}$ is thus the same as that of the regression estimator, namely

$$\mathrm{AV}(\hat{t}_{yw}) = \sum\sum_U \Delta_{kl}\,(d_k E_k)(d_l E_l), \tag{3.1}$$

where $\Delta_{kl} = \pi_{kl} - \pi_k\pi_l$ and $E_k = y_k - x_k'B$, with $B$ satisfying the normal equation

$$\Big(\sum_U q_k x_k x_k'\Big) B = \sum_U q_k x_k y_k. \tag{3.2}$$

Clearly, $B$ minimizes the weighted least squares expression

$$SS_U = \sum_U q_k (y_k - x_k'B)^2 = \sum_U q_k E_k^2. \tag{3.3}$$
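To make (3.1) to (3.3) concrete, the sketch below evaluates the double sum (3.1) for a small artificial population under simple random sampling without replacement, where $\pi_k = n/N$ and $\pi_{kl} = n(n-1)/\{N(N-1)\}$ for $k \ne l$. The function name `asymptotic_variance_srs` and the six-element population are assumptions for illustration only. The comparison value uses the standard closed form of the Horvitz-Thompson variance under this design, $N^2(1 - n/N)S_E^2/n$.

```python
import numpy as np

def asymptotic_variance_srs(y, X, n, q=None):
    """Evaluate (3.1) under SRS without replacement of size n.

    B solves the population normal equation (3.2); E_k = y_k - x_k' B.
    Returns the double sum (3.1) and the residual vector E.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    N = len(y)
    q = np.ones(N) if q is None else np.asarray(q, dtype=float)
    XtQ = (X * q[:, None]).T
    B = np.linalg.solve(XtQ @ X, XtQ @ y)         # normal equation (3.2)
    E = y - X @ B                                 # population residuals E_k
    pi = n / N                                    # first-order inclusion probability
    pi_kl = n * (n - 1) / (N * (N - 1))           # second-order, k != l
    Delta = np.full((N, N), pi_kl - pi * pi)      # Delta_kl = pi_kl - pi_k pi_l
    np.fill_diagonal(Delta, pi * (1.0 - pi))      # Delta_kk = pi_k (1 - pi_k)
    z = E / pi                                    # d_k E_k
    return z @ Delta @ z, E

# Hypothetical population of N = 6 elements with an intercept and one covariate.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
X = np.column_stack([np.ones_like(x), x])

av, E = asymptotic_variance_srs(y, X, n=3)
closed_form = len(y) ** 2 * (1 - 3 / len(y)) / 3 * E.var(ddof=1)
```

The two quantities `av` and `closed_form` agree up to rounding, which is a useful sanity check on any implementation of the double sum before moving to unequal-probability designs.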


To estimate (3.1), the residuals $E_k$ cannot be used, because $B$ is unknown. An estimator, $\hat{B}_{ws}$, is obtained by noting that $SS_U$ given by (3.3) is the unknown population total of the fixed quantities $q_k E_k^2$. The calibrated weights estimator of this total is $SS_{sw} = \sum_s w_k q_k E_k^2$, which is minimized by the vector $\hat{B}_{ws}$ satisfying the sample-based normal equation $(\sum_s w_k q_k x_k x_k')\hat{B}_{ws} = \sum_s w_k q_k x_k y_k$. Sample-based residuals now can be calculated as $e_k = y_k - x_k'\hat{B}_{ws}$. The variance estimator that we advocate is given by

$$\hat{V}(\hat{t}_{yw}) = \sum\sum_s (\Delta_{kl}/\pi_{kl})(w_k e_k)(w_l e_l). \tag{3.4}$$

The calibration weights $w_k$ are used in (3.4) to weight the residuals $e_k$. From a strictly design-based point of view, one can simply weight the $e_k$ by the standard design weights $d_k$ and obtain a design-consistent variance estimator. When model-based as well as design-based properties are considered, however, there is reason to prefer the $w_k$ over the $d_k$. The model, $\xi$, that underlies the regression estimator (1.6) states that $E_\xi(y_k) = \beta'x_k$, $V_\xi(y_k) = \sigma_k^2$. Now (3.4) not only is a design-consistent variance estimator but also is nearly model-unbiased for the model mean squared error, $E_\xi(\hat{t}_{yw} - t_y)^2$, as Särndal, Swensson, and Wretman (1989) show in the case where $\sigma_k^2$ is of the form $a'x_k$, for a constant vector $a$. Under simple random sampling without replacement, the model bias of (3.4) is negligible if the sampling fraction is small.

Example 3. Let us return to $\hat{t}_{yw} = t_x\hat{t}_{y\pi}/\hat{t}_{x\pi}$ in Example 2. Under SRS, (3.4) yields

$$\hat{V}(\hat{t}_{yw}) = \Big(\frac{\bar{x}_U}{\bar{x}_s}\Big)^2 N^2\,\frac{1 - f}{n}\,\frac{\sum_s e_k^2}{n - 1},$$

where $e_k = y_k - \hat{B}_s x_k$ with $\hat{B}_s = (\sum_s y_k)/(\sum_s x_k)$. This is an often-recommended variance estimator for the ratio estimator.

To calculate the calibration estimator (2.6), we must first solve (2.5) for $\lambda$. A note on the computational aspects is in order. We suggest an algorithm based on Newton's method; in the examples where we tried it, convergence was quick. Let $\phi_s'(\lambda) = \partial\phi_s(\lambda)/\partial\lambda$. Start with $\lambda_0 = 0$. Subsequent iterative values, $\lambda_\nu$, $\nu = 1, 2, \ldots$, are obtained by

$$\lambda_{\nu+1} = \lambda_\nu + \{\phi_s'(\lambda_\nu)\}^{-1}\{t_x - \hat{t}_{x\pi} - \phi_s(\lambda_\nu)\}. \tag{3.5}$$

From (2.4), $\phi_s(0) = 0$; $\phi_s'(0) = T_s$. The first iteration gives $\lambda_1 = T_s^{-1}(t_x - \hat{t}_{x\pi})$; subsequent iterations, $\nu = 2, 3, \ldots$, obey (3.5) until convergence. Now $\lambda_1$ is the vector (1.4) that yields the regression estimator (1.6). Thus, (1.6) is a first approximation to (2.6); Result 5 showed them to be asymptotically equivalent. Put somewhat differently, the first adjustment of $\lambda$ is essential, but the remaining adjustments are relatively unimportant. If $F(u) = 1 + u$ is the chosen function, the iteration stops after the first step. (Note: For cases where $F^{-1}$ maps $C$ onto an interval $I$ of $\mathbb{R}$, one must check that $x_k'\lambda_{\nu+1}$ really belongs to $I$. For instance, if $x_k'\lambda_{\nu+1} \ge \sup I$, it is a good idea to replace $\lambda_{\nu+1}$ by $\lambda^*_{\nu+1} = \lambda_\nu + \theta_\nu(\lambda_{\nu+1} - \lambda_\nu)$ for some $\theta_\nu < 1$ such that $\lambda^*_{\nu+1}$ is near the border of the set of permissible values.) A SAS computer program by Sautory (1991), based on (3.5), is now routinely used in certain large surveys at the French national statistics bureau (I.N.S.E.E.) to produce estimates calibrated on known marginals.

4. CALIBRATION ON KNOWN COUNTS IN FREQUENCY TABLES

An important application of the technique in this article occurs in connection with calibration on the known counts (cell counts or marginal counts) of a frequency table in any number of dimensions. Here we limit ourselves to a brief discussion of two-way tables. We assume distance measures of the form $g_k(w, d) = g(w/d)$ and $q_k = 1$ for all $k$. This implies that $F_k(u) = F(u) = g^{-1}(u)$. With $r$ rows and $c$ columns, the population elements are classified into $r \times c$ cells (for example, individuals classified into age group by socioprofessional category). Suppose the typical population cell, $U_{ij}$, contains $N_{ij}$ elements: $i = 1, \ldots, r$; $j = 1, \ldots, c$; so that $N = \sum\sum_{ij} N_{ij}$, where $\sum\sum_{ij}$ means $\sum_{i=1}^{r}\sum_{j=1}^{c}$. We can distinguish (a) calibration on known cell counts $N_{ij}$, which may be called "complete poststratification" and (b) calibration on known marginal counts, which may be called "incomplete poststratification." In case (a), the calibration estimator (2.6) equals the well-known poststratification estimator with the $r \times c$ cells as poststrata; this holds for any distance measure such that $g_k(w, d) = g(w/d)$. Case (b) is more interesting, because here the theory of calibration estimators leads to a new class of estimators corresponding to the class of distance measures discussed in Section 2. These estimators can be described as generalized raking procedures; the classical raking ratio is a simple special case. Case (b) has at least two important practical applications. First, the marginal counts are known, but the cell counts $N_{ij}$ are not. The marginal counts may stem from different sources; for example, age group counts from one data file and professional group counts from another, with cross-classification counts lacking. By necessity, calibration is then on the known marginals. Second, there are some zero or extremely small sample cell counts. Calibration on the cell counts, although perhaps feasible, is abandoned in favor of the more reliable calibration obtained from the known marginals. (The need to calibrate on marginal counts rather than on cell counts would be even more strongly felt for a table with three or more dimensions.)

To calibrate on the marginals, first identify the $x_k$-vector having the property that $\sum_U x_k$ summarizes (and does not go beyond) the population totals used in the calibration; in this case, the marginal counts. It is easy to see that $x_k = (\delta_{1\cdot k}, \ldots, \delta_{r\cdot k}, \delta_{\cdot 1k}, \ldots, \delta_{\cdot ck})'$, where $\delta_{i\cdot k} = 1$ if the element $k$ is in row $i$ and 0 otherwise, and $\delta_{\cdot jk} = 1$ if $k$ is in column $j$ and 0 otherwise. Then, $\sum_U x_k = (N_{1+}, \ldots, N_{r+}, N_{+1}, \ldots, N_{+c})'$, where $N_{i+} = \sum_{j=1}^{c} N_{ij}$, $N_{+j} = \sum_{i=1}^{r} N_{ij}$. Letting $\lambda = (u_1, \ldots, u_r, v_1, \ldots, v_c)'$, we have $x_k'\lambda = u_i + v_j$ whenever $k$ belongs to cell $ij$. That is, $F(x_k'\lambda) = F(u_i + v_j)$ depends on the cell but not on the label within the cell. With $\hat{N}_{ij} = \sum_{s_{ij}} 1/\pi_k$ (the sum being over the sample elements in cell $ij$), the calibration equations (2.3) take the form

$$\sum_{j=1}^{c} \hat{N}_{ij} F(u_i + v_j) = N_{i+} \quad (i = 1, \ldots, r) \tag{4.1}$$

and

$$\sum_{i=1}^{r} \hat{N}_{ij} F(u_i + v_j) = N_{+j} \quad (j = 1, \ldots, c). \tag{4.2}$$
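The Newton iteration (3.5) and the marginal calibration equations (4.1) and (4.2) can be sketched together as follows. This is a minimal illustration, not the Sautory (1991) SAS program: the function name `calibrate_newton`, the stopping rule, and the toy 2 x 3 table are assumptions. To handle the redundancy among the $r + c$ marginal equations, the last column indicator is dropped, which amounts to fixing $v_c = 0$.

```python
import numpy as np

def calibrate_newton(d, X, t_x, F, dF, q=None, tol=1e-10, max_iter=50):
    """Solve phi_s(lambda) = t_x - t_x_pi by Newton's method, eq. (3.5).

    F, dF : the function F(.) and its derivative, applied elementwise.
    Returns the calibrated weights w_k = d_k F(q_k x_k' lambda) and lambda.
    """
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    q = np.ones_like(d) if q is None else np.asarray(q, dtype=float)
    t_x_pi = X.T @ d                               # Horvitz-Thompson estimate of t_x
    lam = np.zeros(X.shape[1])                     # lambda_0 = 0
    for _ in range(max_iter):
        u = q * (X @ lam)
        # Right side of (3.5): t_x - t_x_pi - phi_s(lambda), with phi_s from (2.4)
        resid = t_x - t_x_pi - X.T @ (d * (F(u) - 1.0))
        if np.max(np.abs(resid)) < tol:
            break
        jac = (X * (d * q * dF(u))[:, None]).T @ X  # phi_s'(lambda)
        lam = lam + np.linalg.solve(jac, resid)
    return d * F(q * (X @ lam)), lam

# Hypothetical sample: 12 elements in a 2 x 3 table, all with d_k = 10.
rows = np.repeat([0, 1], 6)
cols = np.tile(np.repeat([0, 1, 2], 2), 2)
d = np.full(12, 10.0)
row_totals = np.array([70.0, 50.0])                # assumed known N_{i+}
col_totals = np.array([30.0, 40.0, 50.0])          # assumed known N_{+j}

# Row indicators plus the first c - 1 column indicators (v_c fixed at 0).
X = np.column_stack([rows == 0, rows == 1, cols == 0, cols == 1]).astype(float)
t_x = np.concatenate([row_totals, col_totals[:2]])

# Exponential case F(u) = exp(u): multiplicative cell factors (raking).
w, lam = calibrate_newton(d, X, t_x, F=np.exp, dF=np.exp)
cell_counts = np.array([[w[(rows == i) & (cols == j)].sum() for j in range(3)]
                        for i in range(2)])
```

In this toy table every sample cell has the same estimated count, so the raked cell counts should reproduce both sets of margins and take the product form $N_{i+}N_{+j}/N$. Passing `F=lambda u: 1 + u` and `dF=np.ones_like` instead gives the linear case, for which the loop makes a single Newton step before the residual vanishes.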


This system is to be solved for $u_1, \ldots, u_r, v_1, \ldots, v_c$, using the function $F(\cdot)$ chosen by the statistician. Iterative solution is often required. One of the $r + c$ equations is redundant, so it is possible to fix one component (say, $v_c = 0$) and solve the system for $i = 1, \ldots, r$; $j = 1, \ldots, c - 1$. Note that $u_i + v_j$ remains invariant to the elimination of one equation. Having obtained the $u_i$ and $v_j$, we calculate the cell factors $F(u_i + v_j)$, the calibrated cell count estimates $\hat{N}^{w}_{ij} = \hat{N}_{ij}F(u_i + v_j)$, and the calibrated weights $w_k = d_k\hat{N}^{w}_{ij}/\hat{N}_{ij}$. Finally, the calibration estimator, obtained from (2.6), is

$$\hat{t}_{yw} = \sum_s w_k y_k = \sum\sum_{ij} \hat{N}^{w}_{ij}\,\tilde{y}_{s_{ij}}, \tag{4.3}$$

where $\tilde{y}_{s_{ij}} = (\sum_{s_{ij}} d_k y_k)/\hat{N}_{ij}$. The cell count estimates $\hat{N}^{w}_{ij}$ are often substantial improvements on the naive estimates $\hat{N}_{ij}$. In fact, the estimator (4.3) can be nearly as efficient as $\sum\sum_{ij} N_{ij}\tilde{y}_{s_{ij}}$, the poststratified estimator formed when the $N_{ij}$ are known. If the effects on $y$ of the rows and the columns are additive (i.e., the interaction effects are negligible), then (4.3) and $\sum\sum_{ij} N_{ij}\tilde{y}_{s_{ij}}$ have virtually identical variances. Efficiency, variance estimation, computational aspects, the occurrence of extreme weights, and the use of special functions $F(\cdot)$ to restrict the range of the weights (as in Cases 6 and 7) are aspects of generalized raking that we discuss in a forthcoming paper.

The theory discussed in Section 2 permits a wide choice of functions $F(\cdot)$. Some simple specifications of $F(\cdot)$ correspond to well-known procedures: First, the linear function $F(u) = 1 + u$ yields additive cell factors, $F(u_i + v_j) = 1 + u_i + v_j$, and the weights are $w_k = d_k(1 + u_i + v_j)$ for the elements $k$ in cell $ij$. These weights are not necessarily positive. The calibration equations (4.1) and (4.2) that result from this case were presented in Deming and Stephan (1940). Second, the exponential case $F(u) = \exp(u)$ gives multiplicative cell factors $F(u_i + v_j) = \exp(u_i)\exp(v_j)$, and the always-positive weights are $w_k = d_k\exp(u_i + v_j)$. The solution to (4.1) and (4.2) in this case can be obtained by carrying out (until convergence) the classical raking ratio algorithm of Deming and Stephan (1940), sometimes called iterative proportional fitting. In practice, the procedure is sometimes stopped after two iterations. As pointed out in Huang (1976), Deming and Stephan suggested the algorithm apparently thinking that it converges to the solution for the linear case, for which they had presented the equations. This was later noted by Deming (1943).

APPENDIX: PROOFS OF RESULTS 1, 2, AND 3

1. Mathematical Preliminaries

(2) in Section

Note the properties $N^{-1}\phi_s(0) = 0$; $\phi(0) = 0$, and $N^{-1}\phi_s'(0) = N^{-1}T_s$; $\phi'(0) = T = \lim N^{-1}\sum_U x_k x_k'$. Now for every $\lambda$, $\phi'$ is a positive definite matrix, because all $F_k$ are increasing functions. Consequently, $\phi$ is injective and maps $C^*$ onto an open neighborhood of 0 in $\mathbb{R}^J$. Let $B$ be a closed sphere with radius $r$ contained in that neighborhood, and let $A$ be the compact set $\phi^{-1}(B)$. The inverse function $\phi^{-1}$ is defined on $B$, continuous, and continuously differentiable. Then $\|(\phi^{-1})'(x)\|$ is continuously differentiable and bounded on $B$. Let $K = \max_{x \in B}\|(\phi^{-1})'(x)\|$.

1.2 Properties of $N^{-1}\phi_s(\lambda)$

We need a result that justifies the use of an inverse mapping of $\phi_s$. Such a result is obtained in this section. All functions $N^{-1}\phi_s(\lambda)$ are defined on $C^*$ and therefore on $A$. For a continuous $\psi$ defined on $C^*$, let $\|\psi\|_M = \sup_{\lambda \in M}\|\psi(\lambda)\|$ for $M$ compact in $C^*$. By our general properties of convergence, we have for every $\varepsilon > 0$ that $P_n(\|N^{-1}\phi_s - \phi\|_A < \varepsilon) \to 1$ when $n$ increases. Now let $\phi_1 = N^{-1}\phi_s$ for some function verifying $\|\phi_1 - \phi\|_A < \beta r$; $\|\phi_1' - \phi'\|_A < \beta/K$, with $0 < \beta < 1$. The probability of this event tends to 1 as $n$ increases. Let $r_1 = (1 - \beta)r$, and let $B_1$ be the sphere $\|x\| < r_1$ in $\mathbb{R}^J$. Now $\phi_1$ maps the frontier of $A$ onto the crown $r_1 < \|x\| \le r(1 + \beta)$, and $\phi_1(A)$ is a bordered manifold homotopic to $B$. These notions are discussed in Trenoguine (1987). Consequently, $\phi_1(A)$ covers the sphere $B_1$; in other words, for every $x \in B_1$, the equation $\phi_1(\lambda) = x$ has a (unique) solution. Moreover, $\phi_1^{-1}$, defined on $B_1$, is a continuously differentiable function. Because $\|\phi_1' - \phi'\| < \beta/K$ for every $\lambda$ in $C$, $(\phi_1^{-1})'(x)$ exists for every $x \in B_1$, and

$$\|\phi_1^{-1}(x)\| \le \|x\|\,K(1 - \beta)^{-1}.$$

2. Proofs of the Three Results

Result 1. First, $N^{-1}(t_x - \hat{t}_{x\pi}) = z$ belongs to $B_1$ with a probability tending to 1. Second, $N^{-1}\phi_s$ has an inverse function on $B_1$ with a probability tending to 1. As (2.5) can be written $N^{-1}(t_x - \hat{t}_{x\pi}) = N^{-1}\phi_s(\lambda)$, the equation has a unique solution with probability tending to 1.

Result 2. Let $\lambda_s = (N^{-1}\phi_s)^{-1}(z)$ if $z$ belongs to $B_1$; otherwise, $\lambda_s$ is arbitrarily defined. Since $\phi_s(0) = 0$, we have $\lambda_s - 0 = (N^{-1}\phi_s)^{-1}(z) - (N^{-1}\phi_s)^{-1}(0)$ and $\|\lambda_s\| \le \|z\|\,K(1 - \beta)^{-1}$. This inequality holds with probability tending to 1 when $n$ increases. But $z = O_p(n^{-1/2})$, so there exists a constant $K'$ such that $P_n(\|z\| \le K'n^{-1/2}) \to 1$. Combining the two inequalities, $P_n(\|\lambda_s\| \le KK'(1 - \beta)^{-1}n^{-1/2}) \to 1$, which implies by definition that $\lambda_s = O_p(n^{-1/2})$.

Result 3. Let $\theta_k(u) = F_k(u) - 1 - q_k u$. We assume that $\theta_k(u) = O(u^2)$ holds uniformly, which is equivalent to our assumption that $F_k''(0)$ is uniformly bounded. Thus $\theta(u) = \max_k \theta_k(u) = O(u^2)$. In other words, for any $\varepsilon > 0$, there exists $K''$ such that, for all $k$, $|u| < \varepsilon$ will imply that $\theta_k(u) < K''u^2$. We can write (2.5) as $t_x - \hat{t}_{x\pi} = \sum_s d_k x_k\{q_k x_k'\lambda_s + \theta_k(x_k'\lambda_s)\}$, and therefore $\lambda_s - T_s^{-1}(t_x - \hat{t}_{x\pi}) = -T_s^{-1}\sum_s d_k x_k\theta_k(x_k'\lambda_s)$. For $\lambda_s$ sufficiently small,

$$\|\lambda_s - T_s^{-1}(t_x - \hat{t}_{x\pi})\| \le \|(N^{-1}T_s)^{-1}\|\,K''\Big\{N^{-1}\sum_s d_k\|x_k\|^3\Big\}\|\lambda_s\|^2.$$
Here,11(N-1Ts)-f1 = Op( 1), and,usingassumption
1.1 TheFunction
X and ItsProperties 2, N-1 Es dkilXkli3 = Op(l). Moreover,by Result2, II1s 2
= 0p(n1). Result3 follows.
{ :
LetC, = nfl xkX E Imk(dk) }, wheren isoverk E U, the
finite
populationassociatedwiththe(expected)samplesizen. The [ReceivedNovember1989. RevisedJuly1991.1
C? ofCnis an openconvexsetcontaining
interior 0 foreveryn.
Moreover, C* =nn-l C? iS convex;weassumeitis alsoopen.Let REFERENCES
EXnandP,,denoteexpectation andprobability,
withrespect to the
samplingdesignindexedbyn. ForXE C*',N' tE, { 4)(X) } isa well Bankier,M. D. (1990), "Generalized Least Squares EstimationUnder
Poststratification,"
in Proceedings oftheSectionon SurveyResearch
definedcontinuouslydifferentiable
function.
Byourassumption 3 Methods,American Statistical
Association, pp.730-755.
appliedto thevariableFk(xX), it convergesto a fixedfunction Bethlehem,J.G., and Keller,W. J. (1987), "Linear Weightingof Sample
denoted4). Convergence is uniformon everycompactsetin C*'. SurveyData," JournalofOfficialStatistics,3, 141-153.


Cassel, C. M., Särndal, C. E., and Wretman, J. H. (1976), "Some Results on Generalized Difference Estimation and Generalized Regression Estimation for Finite Populations," Biometrika, 63, 615-620.
Deming, W. E. (1943), Statistical Adjustment of Data, New York: John Wiley.
Deming, W. E., and Stephan, F. F. (1940), "On a Least Squares Adjustment of a Sampled Frequency Table when the Expected Marginal Totals are Known," The Annals of Mathematical Statistics, 11, 427-444.
Deville, J. C. (1988), "Estimation Linéaire et Redressement sur Informations Auxiliaires d'Enquêtes par Sondage," in Essais en l'Honneur d'Edmond Malinvaud, eds. A. Monfort and J. J. Laffond, Paris: Economica, pp. 915-927.
Fuller, W. A., and Isaki, C. T. (1981), "Survey Design Under Superpopulation Models," in Current Topics in Survey Sampling, eds. D. Krewski, J. N. K. Rao, and R. Platek, New York: Academic Press, pp. 199-226.
Gourieroux, C. (1981), Théorie des Sondages, Paris: Economica.
Huang, E. (1976), "Regression Estimation of Means and Totals for Sample Survey Data," unpublished Ph.D. thesis, Iowa State University.
Isaki, C. T., and Fuller, W. A. (1982), "Survey Design Under the Regression Superpopulation Model," Journal of the American Statistical Association, 77, 89-96.
Lemaitre, G., and Dufour, J. (1987), "An Integrated Method for Weighting Persons and Families," Survey Methodology, 13, 199-207.
Lemel, Y. (1976), "Une Généralisation de la Méthode du Quotient pour le Redressement des Enquêtes par Sondage," Annales de l'INSEE, 22-23, 272-282.
Särndal, C. E. (1980), "On π-Inverse Weighting Versus Best Linear Weighting in Probability Sampling," Biometrika, 67, 639-650.
Särndal, C. E. (1982), "Implications of Survey Design for Generalized Regression Estimation of Linear Functions," Journal of Statistical Planning and Inference, 7, 155-170.
Särndal, C. E., Swensson, B., and Wretman, J. H. (1989), "The Weighted Residual Technique for Estimating the Variance of the General Regression Estimator of the Finite Population Total," Biometrika, 76, 527-537.
Sautory, O. (1991), "Redressements d'Échantillons d'Enquêtes auprès des Ménages par Calage sur les Marges," Internal report, Division Méthodes Statistiques et Sondages, INSEE, Paris.
Trenoguine, V. (1987), Analyse Fonctionnelle, Moscow: Editions M.I.R.
Wright, R. L. (1983), "Finite Population Sampling With Multivariate Auxiliary Information," Journal of the American Statistical Association, 78, 879-884.
Zieschang, K. D. (1986), "A Generalized Least Squares Weighting System for the Consumer Expenditure Survey," in Proceedings of the Section on Survey Research Methods, American Statistical Association, pp. 64-71.
Zieschang, K. D. (1990), "Sample Weighting Methods and Estimation of Totals in the Consumer Expenditure Survey," Journal of the American Statistical Association, 85, 986-1001.
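
As a supplementary illustration of the exponential case $F(u) = \exp(u)$ discussed in Section 4, the following is a minimal sketch of the classical raking ratio algorithm (iterative proportional fitting) on a two-way table. It is not taken from the article: the function name `rake`, its parameters, and the small 2 x 2 table of cell count estimates and marginal totals are hypothetical and chosen only for illustration. The sketch assumes all cell counts are strictly positive and that the row and column totals sum to the same grand total.

```python
def rake(cell_counts, row_totals, col_totals, tol=1e-10, max_iter=1000):
    """Raking ratio (iterative proportional fitting) on a two-way table.

    cell_counts[i][j] stands in for the design-weighted cell count estimate;
    the result has (approximately) the requested row and column sums.
    Assumes strictly positive cells and consistent marginal totals.
    """
    fitted = [list(row) for row in cell_counts]
    n_rows, n_cols = len(fitted), len(fitted[0])
    for _ in range(max_iter):
        # Row step: scale each row so that it sums to its known total.
        for i in range(n_rows):
            factor = row_totals[i] / sum(fitted[i])
            for j in range(n_cols):
                fitted[i][j] *= factor
        # Column step: scale each column so that it sums to its known total.
        for j in range(n_cols):
            factor = col_totals[j] / sum(fitted[i][j] for i in range(n_rows))
            for i in range(n_rows):
                fitted[i][j] *= factor
        # Columns now match; stop once the rows match as well.
        row_err = max(abs(sum(fitted[i]) - row_totals[i]) for i in range(n_rows))
        if row_err < tol:
            break
    return fitted


# Hypothetical illustrative data (not from the article).
cell_counts = [[40.0, 30.0], [35.0, 45.0]]
row_totals = [80.0, 70.0]
col_totals = [70.0, 80.0]

fitted = rake(cell_counts, row_totals, col_totals)

# Cell factors: ratio of raked to initial cell counts. Each one is a product
# of a row factor and a column factor, matching exp(u_i) * exp(v_j).
factors = [[fitted[i][j] / cell_counts[i][j] for j in range(2)] for i in range(2)]
```

Because every update multiplies a whole row or a whole column by a positive constant, the resulting cell factors stay positive and keep the multiplicative row-by-column form, which is the property the exponential case is described as having. Stopping the loop after two passes, as the text notes is sometimes done in practice, would correspond to setting `max_iter=2`.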
