UNIT 16 POINT ESTIMATION
16.1 INTRODUCTION
In Unit 15, you were introduced to the problem of point estimation and to some basic concepts of the theory of point estimation. There we also discussed two desirable properties of an estimator, viz., unbiasedness and consistency. In this unit, the problem of point estimation is discussed in greater detail. To begin with, we introduce some more concepts. Next, some methods of point estimation are discussed. In particular, we concentrate on two methods of estimation that are widely used in practice, viz., the method of moments and the method of maximum likelihood. The first is easy to implement in practice, while the latter leads to estimators with "good" properties.
Objectives
After reading this unit, you should be able to:
where $E_\theta$ denotes the expectation taken when $\theta$ is the parameter and $\Omega$ is the parameter space, and
$$ \mathrm{Var}_\theta(T_1) \le \mathrm{Var}_\theta(T_2) \ \text{ for all } \theta \in \Omega, \ \text{with strict inequality for at least one } \theta \in \Omega. $$
Definition 3: For a fixed sample size $n$, $T = T(X_1, X_2, \ldots, X_n)$ is called a minimum variance unbiased estimator of $g(\theta)$ if (i) $E_\theta(T) = g(\theta)$ for all $\theta \in \Omega$, i.e., $T$ is unbiased for $g(\theta)$, and (ii) $\mathrm{Var}_\theta(T) \le \mathrm{Var}_\theta(T')$ for all $\theta \in \Omega$ with strict inequality for at least one $\theta \in \Omega$, where $T'$ is any other estimator based on $X_1, X_2, \ldots, X_n$ satisfying (i).
How do we locate a minimum variance unbiased estimator in a given problem? From Definition 3 alone, it may be a very difficult task, if not impossible, to find a minimum variance unbiased estimator. The following example illustrates this fact.
Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ on a random variable $X$. The joint probability density or mass function of $X_1, X_2, \ldots, X_n$, for given $\theta$, is
$$ L_n(\theta) = f(x_1;\theta)\, f(x_2;\theta) \cdots f(x_n;\theta), $$
where $x_1, x_2, \ldots, x_n$ are a realization of $X_1, X_2, \ldots, X_n$ for the given sample. If $\theta$ is unknown and varies over $\Omega$, $L_n(\theta)$ may be regarded as a function of the variable $\theta$, and is called the likelihood function of $\theta$.
We shall henceforth assume that $X$ is continuous and hence $f(x;\theta)$ is a probability density function. The likelihood function based on the sample $X_1, X_2, \ldots, X_n$ is
$$ L_n(\theta) = \prod_{i=1}^{n} f(X_i;\theta). $$
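Although the unit works with likelihoods analytically, it may help to see the definition evaluated numerically. The following Python sketch (an illustrative aside; the exponential density is simply an assumed choice of $f(x;\theta)$, and the sample is simulated) evaluates $L_n(\theta)$ on a grid of $\theta$ values.

```python
import numpy as np

# Illustrative sketch: L_n(theta) = prod_i f(x_i; theta), with f taken
# (purely as an assumed example) to be the exponential density with mean theta.

def likelihood(theta, x):
    """L_n(theta) for an exponential(mean = theta) sample x."""
    return np.prod((1.0 / theta) * np.exp(-x / theta))

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=20)      # simulated sample, true theta = 2

grid = np.linspace(0.5, 5.0, 200)
L = np.array([likelihood(t, x) for t in grid])
print("theta maximizing L_n on the grid:", grid[np.argmax(L)])
print("sample mean (the MLE for this model):", x.mean())
```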
Let $g(\underline{X})$, where $\underline{X} = (X_1, X_2, \ldots, X_n)$, be an estimator of $\theta$ with $E_\theta[g(\underline{X})] < \infty$. Let
$$ B(\theta) = E_\theta[g(\underline{X})] - \theta. $$
$B(\theta)$ is called the bias of the estimator $g(\underline{X})$ in estimating $\theta$. Clearly, if $g(\underline{X})$ is unbiased for $\theta$, then $B(\theta) = 0$.

The function $\dfrac{d}{d\theta} \ln L_n(\theta)$ is called the score function based on the observations $X_1, X_2, \ldots, X_n$. Now, since $f(x;\theta)$ is a density function, we have
$$ \int \cdots \int L_n(\theta)\, dx_1 \cdots dx_n = 1 \quad \text{for all } \theta. $$
For brevity, we write the above equation as
$$ \int L_n(\theta)\, d\underline{x} = 1, $$
where $\underline{x} = (x_1, \ldots, x_n)$ and the integral is over the whole $n$-dimensional space, and similarly
$$ E_\theta[g(\underline{X})] = \int g(\underline{x})\, L_n(\theta)\, d\underline{x} = \theta + B(\theta). $$
Differentiating both equations with respect to $\theta$, and assuming that the differentiation can be carried out under the integral sign, we get
$$ \int \frac{d}{d\theta} L_n(\theta)\, d\underline{x} = 0 \qquad (3) $$
and
$$ \int g(\underline{x})\, \frac{d}{d\theta} L_n(\theta)\, d\underline{x} = 1 + B'(\theta). \qquad (4) $$
Since $\dfrac{d}{d\theta} L_n(\theta) = \left( \dfrac{d}{d\theta} \ln L_n(\theta) \right) L_n(\theta)$, we can write (3) and (4) alternatively as
$$ \int \left( \frac{d}{d\theta} \ln L_n(\theta) \right) L_n(\theta)\, d\underline{x} = 0 \qquad (5) $$
and
$$ \int g(\underline{x}) \left( \frac{d}{d\theta} \ln L_n(\theta) \right) L_n(\theta)\, d\underline{x} = 1 + B'(\theta) \qquad (6) $$
respectively.

Since $L_n(\theta)$ is the joint density of $X_1, X_2, \ldots, X_n$ when $\theta$ is the parameter, the relations (5) and (6) may be written in terms of expectations as
$$ E_\theta\!\left[ \frac{d}{d\theta} \ln L_n(\theta) \right] = 0 \qquad (7) $$
and
$$ E_\theta\!\left[ g(\underline{X})\, \frac{d}{d\theta} \ln L_n(\theta) \right] = 1 + B'(\theta). \qquad (8) $$
Let $U = g(\underline{X}) - \theta$ and $V = \dfrac{d}{d\theta} \ln L_n(\theta)$. By (7), $E_\theta(V) = 0$, and by (7) and (8),
$$ \mathrm{Cov}_\theta(U, V) = E_\theta(UV) = 1 + B'(\theta). \qquad (9) $$
By the Cauchy-Schwarz inequality,
$$ [\mathrm{Cov}_\theta(U, V)]^2 \le \mathrm{Var}_\theta(U)\, \mathrm{Var}_\theta(V). \qquad (10) $$
Since $E_\theta(V) = 0$, $\mathrm{Var}_\theta(V) = E_\theta(V^2) = I_n(\theta)$, and $\mathrm{Var}_\theta(U) = \mathrm{Var}_\theta[g(\underline{X})]$. Hence (9) and (10) give
$$ \mathrm{Var}_\theta[g(\underline{X})] \ge \frac{[1 + B'(\theta)]^2}{I_n(\theta)}, \qquad (11) $$
where $I_n(\theta) = E_\theta\!\left[ \left( \dfrac{d}{d\theta} \ln L_n(\theta) \right)^2 \right]$. $I_n(\theta)$ is called the Fisher information in the sample $(X_1, X_2, \ldots, X_n)$. The inequality (11) is known as the Cramer-Rao inequality.
It can be shown that
$$ I_n(\theta) = n\, I(\theta), $$
where
$$ I(\theta) = E_\theta\!\left[ \left( \frac{d}{d\theta} \ln f(X;\theta) \right)^2 \right] = E_\theta\!\left[ -\frac{d^2}{d\theta^2} \ln f(X;\theta) \right]. $$
If $g(\underline{X})$ is unbiased for $\theta$, that is, if $E_\theta(g(\underline{X})) = \theta$, then $E_\theta[g(\underline{X}) - \theta]^2 = \mathrm{Var}_\theta[g(\underline{X})]$ and $B(\theta) = 0$, and hence $B'(\theta) = 0$. Thus, for an unbiased estimator $g(\underline{X})$ of $\theta$, we have
$$ \mathrm{Var}_\theta[g(\underline{X})] \ge \frac{1}{n\, I(\theta)}. $$
More generally, if $T = T(X_1, X_2, \ldots, X_n)$ is an unbiased estimator of a differentiable function $g(\theta)$, the same argument gives
$$ \mathrm{Var}_\theta(T) \ge \frac{[g'(\theta)]^2}{n\, I(\theta)}. \qquad (14) $$
Note that it is possible that there exists a uniformly minimum variance unbiased estimator for $g(\theta)$ but the variance of this estimator does not attain the Cramer-Rao lower bound.
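As a quick numerical illustration of these identities (not part of the unit itself), the sketch below simulates observations from a Poisson model, which is an assumed choice here, and checks that the score has mean approximately zero and that the two expressions for $I(\theta)$ agree.

```python
import numpy as np

# Illustrative check under an assumed Poisson(theta) model, for which
#   ln f(x; theta)        = -theta + x*ln(theta) - ln(x!)
#   d/dtheta  ln f        = -1 + x/theta
#   -d2/dtheta2 ln f      =  x/theta**2
theta = 3.0
rng = np.random.default_rng(1)
x = rng.poisson(lam=theta, size=200_000)      # many single observations

score = -1.0 + x / theta                      # d/dtheta ln f(X; theta)
info_sq = np.mean(score ** 2)                 # E[(d/dtheta ln f)^2]
info_2nd = np.mean(x / theta ** 2)            # E[-d2/dtheta2 ln f]

print("mean score (should be near 0):", score.mean())
print("E[score^2] (should be near 1/theta):", info_sq)
print("-E[d2 ln f] (should be near 1/theta):", info_2nd)
print("1/theta =", 1 / theta)
```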
Example 2: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with unknown mean $\mu$ and variance unity. The density function of a normal random variable with mean $\mu$ and variance unity is
$$ f(x;\mu) = \frac{1}{\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^2}{2} \right), \quad -\infty < x < \infty. $$
Then $\dfrac{d}{d\mu} \ln f(x;\mu) = x - \mu$ and $\dfrac{d^2}{d\mu^2} \ln f(x;\mu) = -1$, so that $I(\mu) = 1$ and the Cramer-Rao lower bound for an unbiased estimator of $\mu$ is $1/[n\, I(\mu)] = 1/n$. Now, $\bar{X} = n^{-1}\sum_{i=1}^{n} X_i$ is unbiased for $\mu$, and thus
$$ \mathrm{Var}_\mu(\bar{X}) = n^{-2} \sum_{i=1}^{n} \mathrm{Var}_\mu(X_i) = 1/n. $$
Therefore, $\mathrm{Var}_\mu(\bar{X})$ attains the Cramer-Rao lower bound and $\bar{X}$ is the UMVUE (uniformly minimum variance unbiased estimator) of $\mu$. It can be shown that there is only one such UMVUE, that is, $\bar{X}$ is the unique UMVUE of $\mu$.
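A small simulation (again an illustrative sketch rather than part of the unit; the NumPy setup is assumed) shows that the variance of $\bar{X}$ is indeed close to the bound $1/n$:

```python
import numpy as np

# Estimate Var(X-bar) by Monte Carlo for N(mu, 1) samples of size n,
# and compare it with the Cramer-Rao lower bound 1/n.
mu, n, reps = 5.0, 25, 100_000
rng = np.random.default_rng(2)
xbars = rng.normal(loc=mu, scale=1.0, size=(reps, n)).mean(axis=1)

print("simulated Var(X-bar):", xbars.var())
print("Cramer-Rao lower bound 1/n:", 1.0 / n)
```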
Example 3: Let, for $n \ge 3$, $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a Poisson distribution with parameter $\lambda$. Then $\ln f(x;\lambda) = -\lambda + x\ln\lambda - \ln(x!)$, so that
$$ \frac{d}{d\lambda} \ln f(x;\lambda) = -1 + x/\lambda \quad \text{and} \quad \frac{d^2}{d\lambda^2} \ln f(x;\lambda) = -x/\lambda^2. $$
Hence $I(\lambda) = E_\lambda(X)/\lambda^2 = 1/\lambda$, and the Cramer-Rao lower bound for an unbiased estimator of $\lambda$ is $\lambda/n$.
E1) Let $X_1, X_2, \ldots, X_n$ be independent Bernoulli random variables, that is, $X_1, X_2, \ldots, X_n$ are independent random variables with $P(X_i = 1) = p$, $P(X_i = 0) = 1 - p$ for $i = 1, 2, \ldots, n$. Show that if $S = X_1 + X_2 + \cdots + X_n$, then $S/n$ is the UMVUE of $p$.
Example 4: Let $X$ be a single observation from a Poisson distribution with parameter $\theta$, and suppose we wish to estimate $g(\theta) = e^{-\theta} = P_\theta(X = 0)$. Consider the estimator $T(X) = 1$ if $X = 0$, and $T(X) = 0$ otherwise. Then, $E_\theta[T(X)] = 1 \cdot P_\theta[X = 0] = e^{-\theta}$, so that $T(X)$ is unbiased for $e^{-\theta}$. Also,
$$ \mathrm{Var}_\theta(T(X)) = e^{-\theta} - e^{-2\theta} = e^{-\theta}(1 - e^{-\theta}). $$
Now, the probability mass function of $X$ is
$$ f(x;\theta) = e^{-\theta}\theta^{x}/x!, \quad x = 0, 1, 2, \ldots, $$
and thus $\ln f(x;\theta) = -\theta + x\ln\theta - \ln(x!)$, so that $I(\theta) = 1/\theta$. Also, $g(\theta) = e^{-\theta}$, so that $g'(\theta) = \dfrac{d}{d\theta} g(\theta) = -e^{-\theta}$. Hence, the Cramer-Rao lower bound to the variance of $T(X)$, using (14), is
$$ \frac{[g'(\theta)]^2}{I(\theta)} = \theta e^{-2\theta}. $$
But $\mathrm{Var}_\theta[T(X)] = e^{-\theta}(1 - e^{-\theta}) > \theta e^{-2\theta}$ for $\theta > 0$. Thus, $T(X)$, though unbiased for $g(\theta) = e^{-\theta}$, has a variance larger than the Cramer-Rao lower bound. However, it can be shown that $T(X)$ is the only unbiased estimator of $g(\theta) = e^{-\theta}$ and hence is the UMVUE of $e^{-\theta}$.
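To make the strict inequality concrete, here is a short numerical comparison (an illustrative sketch, not part of the unit) of the exact variance $e^{-\theta}(1 - e^{-\theta})$ with the bound $\theta e^{-2\theta}$ over a range of $\theta$:

```python
import numpy as np

theta = np.linspace(0.1, 5.0, 6)
var_T = np.exp(-theta) * (1 - np.exp(-theta))   # exact Var_theta(T(X))
cr_bound = theta * np.exp(-2 * theta)           # Cramer-Rao bound theta * exp(-2*theta)

for t, v, b in zip(theta, var_T, cr_bound):
    print(f"theta={t:4.2f}  Var(T)={v:.4f}  CR bound={b:.4f}  Var(T) > bound: {v > b}")
```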
We now bring in another important concept, namely, that of a sufficient statistic, and touch upon it briefly. Let $X$ be a random variable having probability density (or mass) function $f(x;\theta)$ and let $X_1, X_2, \ldots, X_n$ be independent observations on $X$, that is, let $X_1, X_2, \ldots, X_n$ be a random sample from a population with density (mass) function $f(x;\theta)$. The joint distribution of $(X_1, X_2, \ldots, X_n)$ clearly depends on $\theta$. Is it possible to find a statistic (a function of $(X_1, X_2, \ldots, X_n)$) which contains all the "information" about $\theta$? Such a question becomes relevant when we want to summarize the available data, because storing large bodies of data is expensive and might give rise to errors of recording, etc. Moreover, it is unnecessary if we are able to summarize the data without losing any "information". A statistic containing all the information about $\theta$ is called a sufficient statistic. We give below a precise definition.
A statistic $T = T(X_1, X_2, \ldots, X_n)$ is said to be a sufficient statistic for the parameter $\theta$ if the conditional distribution of $(X_1, X_2, \ldots, X_n)$ given $T$ does not depend on $\theta$.
From the above definition, it is clear that if there is a sufficient statistic for $\theta$, then, since the conditional distribution of $X_1, X_2, \ldots, X_n$ given the sufficient statistic is independent of $\theta$, no other function of the observations can have any additional information about $\theta$, given the sufficient statistic.
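As an illustration of the definition (a sketch under an assumed Bernoulli sampling model, not an example taken from the unit), one can check empirically that, given the value of $T = \sum_i X_i$, the conditional distribution of the Bernoulli sample is the same whatever the value of $p$:

```python
import numpy as np
from collections import Counter

def conditional_dist(p, n=3, t=1, reps=200_000, seed=3):
    """Empirical distribution of (X1,...,Xn) among Bernoulli(p) samples with sum == t."""
    rng = np.random.default_rng(seed)
    x = (rng.random((reps, n)) < p).astype(int)
    kept = x[x.sum(axis=1) == t]
    counts = Counter(map(tuple, kept))
    total = sum(counts.values())
    return {k: round(v / total, 3) for k, v in sorted(counts.items())}

# Roughly 1/3 for each arrangement (0,0,1), (0,1,0), (1,0,0), whatever p is,
# illustrating that the sum is sufficient for p.
print(conditional_dist(p=0.2))
print(conditional_dist(p=0.7))
```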
Let $m_r' = n^{-1}\sum_{i=1}^{n} X_i^r$ be the $r$-th sample moment. Suppose $\mu_r' = E(X^r)$ exists for $r = 1, 2, \ldots, k$, where $k$ is the number of unknown parameters $\theta_1, \theta_2, \ldots, \theta_k$. The method of moments consists of solving the equations
$$ m_r' = \mu_r'(\theta_1, \theta_2, \ldots, \theta_k), \quad 1 \le r \le k, $$
for $\theta_1, \theta_2, \ldots, \theta_k$; the solutions are called the method of moments estimators.
Example 5: Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal population with mean $\mu$ and variance $\sigma^2$. Here, the parameter $\theta = (\mu, \sigma^2)$ is 2-dimensional. In order to obtain the method of moments estimators of $\mu$ and $\sigma^2$, we equate the first two sample moments to the corresponding population moments, that is,
$$ m_1' = n^{-1}\sum_{i=1}^{n} X_i = \bar{X} \ \text{ is equated to } \ E(X) = \mu, $$
and
$$ m_2' = n^{-1}\sum_{i=1}^{n} X_i^2 \ \text{ is equated to } \ E(X^2) = \mu^2 + \sigma^2. $$
Solving these two equations gives the method of moments estimators
$$ \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = n^{-1}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = n^{-1}\sum_{i=1}^{n} (X_i - \bar{X})^2. $$
Note that $\hat{\mu} = \bar{X}$ is an unbiased estimator of $\mu$ but $\hat{\sigma}^2$ is not unbiased for $\sigma^2$. However, both $\hat{\mu}$ and $\hat{\sigma}^2$ are consistent estimators of $\mu$ and $\sigma^2$ respectively.
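The following Python sketch (an illustrative aside, using simulated data) carries out exactly this calculation, equating sample moments to population moments for the normal model:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(loc=10.0, scale=2.0, size=500)   # true mu = 10, sigma^2 = 4

m1 = x.mean()              # first sample moment
m2 = np.mean(x ** 2)       # second sample moment

mu_hat = m1                # from m1' = mu
sigma2_hat = m2 - m1 ** 2  # from m2' = mu^2 + sigma^2

print("method of moments estimates:", mu_hat, sigma2_hat)
```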
Example 6: Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution with density function
$$ f(x;\theta) = 1/\theta, \quad 0 < x < \theta, $$
and $f(x;\theta) = 0$, elsewhere. Here $\mu_1' = E(X) = \theta/2$ and $\mu_2' = E(X^2) = \theta^2/3$. Equating $m_1' = \bar{X}$ to $\mu_1'$ gives the method of moments estimator $\hat{\theta} = 2\bar{X}$. Instead of equating $m_1'$ to $\mu_1'$ and $m_2'$ to $\mu_2'$, we may as well equate $m_1'$ to $\mu_1'$ alone, since there is only one unknown parameter; equating $m_2'$ to $\mu_2'$ instead would give a different estimator, $(3m_2')^{1/2}$, which shows that method of moments estimators need not be unique.
E2) Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$. Obtain two estimators of $\lambda$ using the method of moments.
E3) Let $X_1, X_2, \ldots, X_N$ be a random sample of size $N$ from a binomial population with parameters $n$ and $p$, both unknown. Obtain the method of moments estimators of $n$ and $p$.
Therefore,
$$ \frac{d}{d\theta} \ln L_n(\theta) = 0 \quad \text{provided} \quad \theta = \hat{\theta}_0 = n^{-1}\sum_{i=1}^{n} X_i. $$
In order to verify whether $L_n(\theta)$ is indeed maximum at $\theta = \hat{\theta}_0$, we compute the second derivative of $\ln L_n(\theta)$ at $\theta = \hat{\theta}_0$ and check whether it is negative. Here, clearly,
$$ \left. \frac{d^2}{d\theta^2} \ln L_n(\theta) \right|_{\theta = \hat{\theta}_0} < 0. $$
This shows that $L_n(\theta)$ is maximized at $\theta = \hat{\theta}_0 = \sum_{i=1}^{n} X_i/n$. Since there is a unique maximum for $L_n(\theta)$ and the maximum is attained at $\hat{\theta}_0$, the maximum likelihood estimator of $\theta$ is $\hat{\theta} = \bar{X}$.
Equating these two partial derivatives to zero, we get the likelihood equations. These equations have the unique solutions
$$ \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = n^{-1}\sum_{i=1}^{n} (X_i - \bar{X})^2. $$
The verification of the fact that these solutions actually maximize the likelihood function is left to the reader. Hence, $\hat{\mu}$ and $\hat{\sigma}^2$ are the maximum likelihood estimators of $\mu$ and $\sigma^2$ respectively.
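As a cross-check on these closed-form solutions, one can maximize the normal log-likelihood numerically. The sketch below (an illustrative aside; it assumes Python with NumPy and SciPy available) minimizes the negative log-likelihood and compares the result with $\bar{X}$ and $n^{-1}\sum_{i}(X_i - \bar{X})^2$:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
x = rng.normal(loc=3.0, scale=1.5, size=200)

def neg_log_lik(params):
    """Negative normal log-likelihood: (n/2) ln(2*pi*sigma2) + sum((x-mu)^2)/(2*sigma2)."""
    mu, sigma2 = params
    if sigma2 <= 0:
        return np.inf
    return 0.5 * len(x) * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

res = minimize(neg_log_lik, x0=[0.0, 1.0], method="Nelder-Mead")
print("numerical MLE (mu, sigma^2):", res.x)
print("closed form:               ", x.mean(), np.mean((x - x.mean()) ** 2))
```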
E4) Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with density function
$$ f(x;\theta) = \frac{1}{\theta}\, e^{-x/\theta}, \quad x > 0, \ \theta > 0, $$
and $f(x;\theta) = 0$, elsewhere. Find the maximum likelihood estimator of $\theta$.
In the case of a scalar parameter, the likelihood function is a function of one variable (as in Example 7) and, if this function is twice differentiable in the domain of its definition, one can use the methods of calculus to find the maximum. However, if the parameter $\theta$ is a vector parameter, the likelihood function is a function of several variables, and finding the points of maxima of such functions might be difficult in general. In such cases, special methods, depending on the problem at hand, are needed. Of course, it is also possible that the likelihood function is not differentiable at all, and in that case too we might have to resort to special techniques. The following example is an illustration of such a situation.
Example 9: Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution with density function
$$ f(x;\theta) = 1/\theta, \quad 0 \le x \le \theta, $$
and $f(x;\theta) = 0$, otherwise. We can write the likelihood function alternatively as
$$ L_n(\theta) = \theta^{-n}, \quad \text{if } 0 \le x_{(n)} \le \theta, $$
and $L_n(\theta) = 0$ otherwise, where $x_{(n)}$ is the largest observation in the sample. The derivative of $L_n(\theta)$ does not vanish and hence we cannot use the methods of calculus to get a maximum likelihood estimator. However, $L_n(\theta)$ attains its maximum at $\theta = x_{(n)}$, and $x_{(n)}$ is the unique maximum likelihood estimator of $\theta$.
There is another way to look at the same problem. Since $L_n(\theta) = \theta^{-n}$, for $0 \le x_i \le \theta$, is an ever-decreasing function of $\theta$, the maximum can be found by selecting $\theta$ as small as possible. Now, $\theta \ge x_i$ for $i = 1, 2, \ldots, n$ and, in particular, $\theta \ge x_{(n)}$. Thus, $L_n(\theta)$ can be made no larger than $1/x_{(n)}^n$, and the unique maximum likelihood estimator of $\theta$ is $x_{(n)}$.
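A short check (illustrative only, with a simulated sample) makes the "pick $\theta$ as small as the data allow" argument concrete: the log-likelihood $-n\ln\theta$ evaluated at any feasible $\theta \ge \max_i x_i$ is largest at $\theta = \max_i x_i$.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 4.0, size=50)          # true theta = 4

thetas = np.linspace(x.max(), 6.0, 5)       # candidate thetas >= x_(n)
log_lik = -len(x) * np.log(thetas)          # log L_n(theta) = -n * log(theta)
print("x_(n) =", x.max())
print("log-likelihood is largest at the smallest feasible theta:",
      thetas[np.argmax(log_lik)])
```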
Are maximum likelihood estimators unbiased and unique in every situation? The answer to both the above questions is in the negative. That maximum likelihood estimators need not be unbiased is demonstrated by appealing to Examples 8 and 9. In Example 8, we saw that $\hat{\sigma}^2 = n^{-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ is the maximum likelihood estimator of $\sigma^2$, the variance of a normal population with unknown mean $\mu$. Clearly, this estimator is not unbiased for $\sigma^2$. Again, in Example 9, it was demonstrated that $x_{(n)}$, the largest observation in the sample, is the maximum likelihood estimator of $\theta$. But it can be shown that $E_\theta(X_{(n)}) = n\theta/(n+1)$, so that $X_{(n)}$ is not unbiased for $\theta$.
To see that maximum likelihood estimators need not be unique, consider the following example.
Example 10: Let $X_1, X_2, \ldots, X_n$ be a random sample from a uniform distribution over $\left( \theta - \tfrac{1}{2},\ \theta + \tfrac{1}{2} \right)$, where $\theta$ is unknown and $\theta \in \Omega = \{ x : -\infty < x < \infty \}$. The likelihood is
$$ L_n(\theta) = 1, \quad \text{if } \theta - \tfrac{1}{2} \le x_i \le \theta + \tfrac{1}{2} \ \text{ for } i = 1, 2, \ldots, n, $$
and $L_n(\theta) = 0$, otherwise. Thus, $L_n(\theta)$ attains its maximum provided
$$ \theta - \tfrac{1}{2} \le \min(X_1, \ldots, X_n) \quad \text{and} \quad \theta + \tfrac{1}{2} \ge \max(X_1, \ldots, X_n), $$
or, when
$$ \theta \le \min(X_1, \ldots, X_n) + \tfrac{1}{2} \quad \text{and} \quad \theta \ge \max(X_1, \ldots, X_n) - \tfrac{1}{2}. $$
This means that any statistic $T(X_1, \ldots, X_n)$ satisfying
$$ \max_i X_i - \tfrac{1}{2} \ \le\ T(X_1, \ldots, X_n) \ \le\ \min_i X_i + \tfrac{1}{2} $$
is a maximum likelihood estimator of $\theta$. Now, for any $\alpha$, $0 < \alpha < 1$, the statistic
$$ T_\alpha = \alpha \left( \min_i X_i + \tfrac{1}{2} \right) + (1 - \alpha) \left( \max_i X_i - \tfrac{1}{2} \right) $$
lies in the interval $\max_i X_i - \tfrac{1}{2} \le T \le \min_i X_i + \tfrac{1}{2}$. Thus, for any $\alpha$, $0 < \alpha < 1$, the above estimator is a maximum likelihood estimator of $\theta$. In particular, for $\alpha = 1/2$, we get the estimator $T_1 = (\max_i X_i + \min_i X_i)/2$, and for $\alpha = 1/3$, we get the estimator $T_2 = (4\max_i X_i + 2\min_i X_i - 1)/6$.
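A quick simulation (illustrative only; the uniform$(\theta - 1/2, \theta + 1/2)$ sampling below is an assumed setup) confirms that both $T_1$ and $T_2$ fall in the interval of maximizers and therefore both maximize the likelihood:

```python
import numpy as np

theta, n = 4.0, 15
rng = np.random.default_rng(9)
x = rng.uniform(theta - 0.5, theta + 0.5, size=n)

lo, hi = x.max() - 0.5, x.min() + 0.5        # every value in [lo, hi] maximizes L_n
T1 = (x.max() + x.min()) / 2                  # alpha = 1/2
T2 = (4 * x.max() + 2 * x.min() - 1) / 6      # alpha = 1/3

print("interval of MLEs:", (lo, hi))
print("T1 in interval:", lo <= T1 <= hi, " T2 in interval:", lo <= T2 <= hi)
```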
The important properties of maximum likelihood estimators are that, under certain regularity conditions, these estimators are
(i) consistent,
(ii) asymptotically efficient,
(iii) asymptotically normal with mean $\theta$ and variance $1/[n\, I(\theta)]$.
The third property says that, for large samples, the distribution of the maximum likelihood estimator of $\theta$ is approximately normal with mean $\theta$ and variance $1/[n\, I(\theta)]$; a small simulation illustrating this is given below. The exact statements of the above results and their proofs are beyond the scope of this course and are therefore not given here.
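The sketch that follows is illustrative only. It assumes an exponential model with mean $\theta$, for which the maximum likelihood estimator of $\theta$ is $\bar{X}$ and $I(\theta) = 1/\theta^2$, and it simulates the sampling distribution of the MLE to compare its variance with $1/[n\, I(\theta)] = \theta^2/n$:

```python
import numpy as np

theta, n, reps = 2.0, 50, 20_000
rng = np.random.default_rng(7)

# MLE of theta (the mean) for each simulated exponential sample is the sample mean.
mle = rng.exponential(scale=theta, size=(reps, n)).mean(axis=1)

print("mean of MLE (should be near theta = 2):", mle.mean())
print("variance of MLE:", mle.var())
print("1/(n*I(theta)) = theta^2/n:", theta ** 2 / n)
# A histogram of `mle` would look approximately normal at this sample size.
```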
16.4 SUMMARY
E1) Here $S = X_1 + X_2 + \cdots + X_n$, so that $E_p(S/n) = p$, i.e., $S/n$ is unbiased for $p$, and
$$ \mathrm{Var}_p(S/n) = n^{-2}\sum_{i=1}^{n} \mathrm{Var}_p(X_i) = n^{-2}[np(1-p)] = p(1-p)/n. $$
For the Bernoulli distribution, $I(p) = 1/[p(1-p)]$, so the Cramer-Rao lower bound for an unbiased estimator of $p$ is $p(1-p)/n$. Hence $\mathrm{Var}_p(S/n)$ attains the lower bound and $S/n$ is the UMVUE of $p$.
E2) Here $X_1, X_2, \ldots, X_n$ is a random sample from a Poisson distribution with parameter $\lambda$. Hence $E(X_i) = \lambda$ and $E(X_i^2) = \lambda + \lambda^2$ for $i = 1, 2, \ldots, n$. Equating the sample mean to the population mean leads to the equation $\bar{X} = \lambda$, so the first method of moments estimator is $\hat{\lambda}_1 = \bar{X}$. Equating the second sample moment $m_2'$ to $E(X^2)$ leads to the equation
$$ \lambda^2 + \lambda - m_2' = 0. $$
Since $\lambda > 0$, the unique positive solution of the above equation gives the second moments estimator of $\lambda$ as
$$ \hat{\lambda}_2 = \frac{-1 + \sqrt{1 + 4m_2'}}{2}. $$
E3) We are given that $X_1, X_2, \ldots, X_N$ is a random sample from a binomial population with parameters $n$ and $p$, both unknown. We know that if $X$ has a binomial distribution with parameters $n$ and $p$, then
$$ E_p(X) = np, \qquad \mathrm{Var}_p(X) = np(1-p), $$
so that $E_p(X^2) = np(1-p) + n^2p^2$. We equate the first two sample moments $m_1' = N^{-1}\sum_{i=1}^{N} X_i$ and $m_2' = N^{-1}\sum_{i=1}^{N} X_i^2$ to the first two population moments, giving the equations
$$ m_1' = np, \qquad m_2' = np(1-p) + n^2p^2. $$
Solving these equations, we get the method of moments estimators
$$ \hat{p} = \frac{m_1' - (m_2' - m_1'^2)}{m_1'}, \qquad \hat{n} = \frac{m_1'}{\hat{p}} = \frac{m_1'^2}{m_1' - (m_2' - m_1'^2)}. $$
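For a concrete feel for these expressions, here is an illustrative Python sketch (with simulated binomial data; not part of the solution itself) that plugs the sample moments into the formulas above:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.binomial(n=20, p=0.3, size=1000)   # true n = 20, p = 0.3

m1 = x.mean()
m2 = np.mean(x ** 2)

p_hat = (m1 - (m2 - m1 ** 2)) / m1         # p-hat from the moment equations
n_hat = m1 / p_hat                         # n-hat = m1' / p-hat

print("method of moments estimates: n =", n_hat, ", p =", p_hat)
```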
E4) Here $X_1, X_2, \ldots, X_n$ is a random sample from a population with density function
$$ f(x;\theta) = \frac{1}{\theta}\, e^{-x/\theta}, \quad x > 0, $$
and $f(x;\theta) = 0$, elsewhere. Therefore, the likelihood function is
$$ L_n(\theta) = \theta^{-n} \exp\!\left( -\sum_{i=1}^{n} X_i/\theta \right), $$
and
$$ \frac{d}{d\theta} \ln L_n(\theta) = -\frac{n}{\theta} + \sum_{i=1}^{n} X_i/\theta^2. $$
Equating $\dfrac{d}{d\theta} \ln L_n(\theta)$ to zero gives, on solving for $\theta$, $\hat{\theta} = n^{-1}\sum_{i=1}^{n} X_i = \bar{X}$. Further,
$$ \frac{d^2}{d\theta^2} \ln L_n(\theta) = \frac{n}{\theta^2} - \frac{2}{\theta^3}\sum_{i=1}^{n} X_i, $$
which is negative at $\theta = \hat{\theta} = \bar{X}$. Hence $\bar{X}$ is the maximum likelihood estimator of $\theta$.
$= 0$, otherwise. You are given that if $X$ is normal with mean zero and variance $\sigma^2$, ...
3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having finite mean ...
4. ... $= 0$, elsewhere. Obtain a maximum likelihood estimator of $\theta$.