
Advanced Mathematical Statistics II

(Hypothesis testing)

Dr. Bing-Yi JING

Department of Mathematics
Hong Kong University of Science and Technology

August 4, 2008
Contents

1 Introduction and Basic Concepts 4


1.1 Hypotheses and Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Type I and II Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Non-randomized tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1 Normal example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.2 Binomial example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 The uniformly most powerful (UMP) tests 11


2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2 UMP tests for simple hypothesis: Neyman-Pearson Lemma . . . . . . . . . 11
2.3 The relationship between the power and the size . . . . . . . . . . . . . . . 15
2.4 Some examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.1 Binomial example. . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.2 Normal example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4.3 Testing Normal Against Double Exponential. . . . . . . . . . . . . . 18
2.5 The Neyman-Pearson Lemma in terms of sufficient statistics . . . . . . . . 19

3 UMP tests For One-Sided Hypotheses 20


3.1 One-sided hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 Monotone Likelihood Ratio (MLR) . . . . . . . . . . . . . . . . . . . . . . 20
3.3 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3.1 Examples when MLR exists . . . . . . . . . . . . . . . . . . . . . . 26
3.3.2 A UMP test may exist even if an MLR does not . . . . . . . . . . . 28
3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4 UMP tests for two-sided hypotheses 31


4.1 Two-sided hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 An example where a UMP test does not exist . . . . . . . . . . . . . . . . 31
4.3 Generalized N-P Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3.1 Revision on N-P Lemma . . . . . . . . . . . . . . . . . . . . . . . . 32
4.3.2 Generalized N-P Lemma (GNP Lemma) . . . . . . . . . . . . . . . 33
4.4 UMP tests for one-parameter exponential families. . . . . . . . . . . . . . . 35
4.4.1 Uniqueness of UMP tests . . . . . . . . . . . . . . . . . . . . . . . . 39
4.5 UMP tests for totally positive families. . . . . . . . . . . . . . . . . . . . . 43
4.6 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.6.1 Normal example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.6.2 A non-regular example∗ . (Optional) . . . . . . . . . . . . . . . . . 46
4.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.8 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

5 Unbiased Tests. 48
5.1 Definitions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
5.2 UMPU for One-parameter exponential family . . . . . . . . . . . . . . . . 50
5.2.1 Case I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2.2 Case II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.3 A lemma used in the proof of the last theorem. . . . . . . . . . . . 54
5.2.4 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.3 UMPU tests for multiparameter exponential families . . . . . . . . . . . . 58
5.3.1 Complete and Boundedly Complete Statistics . . . . . . . . . . . . 58
5.3.2 Similarity and Neyman Structure . . . . . . . . . . . . . . . . . . . 61
5.3.3 UMPU tests for multiparameter exponential families . . . . . . . . 62
5.3.4 UMPU tests for linear combinations of parameters in multiparam-
eter exponential families . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3.5 Power calculation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.3.6 Some examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

6 Unbiased tests for special families (e.g., Normal and Gamma families) 72
6.1 Ancillary Statistics and Basu’s Theorem . . . . . . . . . . . . . . . . . . . 72
6.2 UMPU tests for multi-parameter exponential families . . . . . . . . . . . . 76
6.2.1 Review. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.2.2 UMPU tests for special families . . . . . . . . . . . . . . . . . . . . 77
6.2.3 Some basic facts about bivariate normal distribution . . . . . . . . 80
6.2.4 Application 1: one-sample problem. . . . . . . . . . . . . . . . . . . 82
6.2.5 Application 2: two-sample problem. . . . . . . . . . . . . . . . . . . 85
6.2.6 Application 3: Testing for independence in the bivariate normal
family. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
6.2.7 Application 4: Regression. . . . . . . . . . . . . . . . . . . . . . . . 93
6.2.8 Application 5: Non-normal example. . . . . . . . . . . . . . . . . . 93
6.3 The LSE in linear models . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.4.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95

7 Hypothesis Testing by Likelihood Methods 96


7.1 Likelihood Ratio Tests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.1.1 Definition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
7.2 LRT with no nuisance parameters. . . . . . . . . . . . . . . . . . . . . . . 98
7.2.1 Some examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.2.2 LRT in one-parameter exponential family. . . . . . . . . . . . . . . 100
7.2.3 LRT with non-exponential family (with no nuisance parameters) . . 102
7.3 Equivalence of LRT and Neyman-Pearson test when both exist . . . . . . . 103
7.4 LRT with nuisance parameters . . . . . . . . . . . . . . . . . . . . . . . . . 104
7.4.1 Examples with multiparameter exponential families . . . . . . . . . 104
7.5 Bad performance of LRT . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.6 Asymptotic χ2 approximation of the LRT. . . . . . . . . . . . . . . . . . . 113
7.6.1 Why asymptotics? . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

7.6.2 Review: Asymptotic properties of MLE . . . . . . . . . . . . . . . . 116
7.6.3 Formulation of the problem. . . . . . . . . . . . . . . . . . . . . . . 117
7.6.4 Asymptotic χ2 approximation to LRT . . . . . . . . . . . . . . . . . 118
7.7 Wald’s and Rao’s tests and their relation with LRT . . . . . . . . . . . . . 123
7.8 LRT, Wald’s and Rao’s tests in independent but non-identically distributed
r.v. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
7.9 χ2 -tests for multinomial distribution . . . . . . . . . . . . . . . . . . . . . 125
7.9.1 Preliminaries on the multinomial distribution . . . . . . . . . . . . 125
7.9.2 Tests for multinomial distribution . . . . . . . . . . . . . . . . . . . 126
7.9.3 Application: Goodness of fit tests . . . . . . . . . . . . . . . . . . . 131
7.10 Test of independence in contingency tables . . . . . . . . . . . . . . . . . . 134
7.11 Some general comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
7.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

Chapter 1

Introduction and Basic Concepts

1.1 Hypotheses and Tests


Let F = {Fθ : θ ∈ Θ} be a family of distributions, where θ is a population parameter.

Definition:
(1). A hypothesis is a statement about a population parameter.
(2). The two complementary hypotheses in a hypothesis testing problem are
called the null hypothesis H0 and the alternative hypothesis H1 , respectively.
(3). The general format of a hypothesis testing problem is
H0 : θ ∈ Θ0
H1 : θ ∈ Θ1 ≡ Θ0^c ,
where Θ0 and Θ1 ≡ Θ0^c are called the null and alternative parameter spaces,
respectively.

Definition:
(i). Let φ(X) be a statistic taking values in [0, 1]. When X = x is observed,
we reject H0 with probability φ(x) and accept H0 with probability 1 − φ(x).
φ(x) is called the critical function, or test function or simply a test.
(ii). If φ(X) = 1 or 0 a.s., then φ(X) is a non-randomized test. Otherwise,
φ(X) is a randomized test.
(iii). The power function of the critical function φ(X) is

βφ (θ) = Eθ φ(X) = ∫ φ(x) dFθ (x).

(The subscript φ is often dropped when no confusion occurs.)


(iv). We refer to βφ (θ) when θ ∈ Θ1 as the power of the test.

Example 1.1.1 Given a coin, we wish to test whether it is loaded or not. For instance,
we wish to see whether p = P (head) < 1/2 or not. That is, suppose we wish to test H0 : p < 1/2
against H1 : p ≥ 1/2. To test this, we throw the coin n = 2 times and denote the total
number of heads by Sn . Note that the possible outcomes for Sn are 0, 1, 2. We might
make the following decisions:

Decision A: Reject H0 if Sn = 1, 2; do not reject otherwise. Then,

φA (x) = 1 when x = 1, 2
= 0 when x = 0.

So φA (X) is a non-randomized test.

Decision B: Reject H0 if Sn = 2; do not reject if Sn = 0; if Sn = 1, we
cannot make a firm decision either way, since both hypotheses seem OK to us,
so we give H0 and H1 an equal probability of being correct. Then,

φB (x) = 1 when x = 2
= 1/2 when x = 1
= 0 when x = 0 .

So φB (X) is a randomized test.

1.2 Type I and II Errors

Definition: There are two types of errors that could occur:

Type I error: reject H0 when H0 is true.


Type II error: accept H0 when H1 is true.

Clearly,

Prob(Type I error) = Pθ (Reject H0 ) = βφ (θ) for θ ∈ Θ0 , and

Prob(Type II error) = Pθ (Do not reject H0 ) = 1 − βφ (θ) for θ ∈ Θ1 .

That is,

βφ (θ) = Eθ φ (X) = Pθ (Type I error) if θ ∈ Θ0
= 1 − Pθ (Type II error) = Power at θ if θ ∈ Θ1 .

Definition. Suppose that a hypothesis test has power function β(θ). Let 0 ≤ α ≤ 1.

(1). supθ∈Θ0 β(θ) is called the true level of the test, or the size of the test.
(2). A test is said to be of size α if supθ∈Θ0 β(θ) = α.
(3). A test is said to be of level α if supθ∈Θ0 β(θ) ≤ α.
(We often refer to α as the level of the test, specified by the experimenters.
Typically, α = 0.01, 0.05, 0.10.)

Remark 1.2.1 For a fixed sample size, it is usually impossible to make both types of
error probabilities arbitrarily small. For instance, if we minimize the probability of Type I
error by requiring α = 0, then we never reject H0 a.s. (i.e., we always accept H0
a.s.). This implies that prob(Type II error) = PH1 (accept H0 ) = 1. Similarly, if we let
prob(Type II error) = 0, we get prob(Type I error) = 1.

Remark 1.2.2 In searching for a good test, it is common to control the Type I error
probability at a specified level. Within this class of tests, we then search for tests that have
minimum Type II error probability (or maximum power function).

1.3 Non-randomized tests


For non-randomized tests, there are two different ways of doing the testing.

Method 1: the critical region method


In a non-randomized test, φ(X) = 1 or 0, a.s., corresponding to rejecting H0 with prob-
ability 1 or 0, respectively. Therefore, based on the sample X, we either reject H0 or do
not reject H0 .

Definition: A rejection region or critical region (C.R.) is the subset of the sample space
on which we reject the null hypothesis. Its complement C.R.^c is called the acceptance region.

Remark 1.3.1 Clearly, C.R. = {x : φ(x) = 1}. Also we can write

φ(x) = 1 when x ∈ C.R.


= 0 when x ∈ C.R.c ,

and the power function is

β(θ) = Eθ φ(X) = Pθ (X ∈ C.R.) .

Method 2: the p-value method


Another way of reporting the results of a hypothesis test is to report the p-value.

Definition. The p-value for the sample point x is the smallest size for which this sample
point x will lead to rejection of H0 .

Theorem 1.3.1 If the critical region of a test has the form

T (X) ≥ C,

where T (X) is a test statistic, then

p-value = supθ∈Θ0 Pθ (T (X) ≥ T (x)) .
Proof. For the sample point x, we reject H0 : θ ∈ Θ0 if and only if

T (x) ≥ C. (3.1)

Note that the size of the test is

supθ∈Θ0 β(θ) = supθ∈Θ0 Pθ (T (X) ≥ C) =: γ(C).

Note that γ(C) is non-increasing in C. From (3.1), the largest critical value C for which
we would reject is C = T (x). So, the smallest level α for which we would reject H0
corresponds to the largest C for which we would reject H0 , i.e.,

p-value = γ (T (x)) = supθ∈Θ0 Pθ (T (X) ≥ T (x)) .

The theorem is thus proved.

Remark 1.3.2

(1). Since rejection of H0 using a test with small size is more convincing
evidence that H1 is true than that with a large size, the interpretation of p-
value goes in the same way. The smaller the p-value, the stronger the sample
evidence that H1 is true. If the level of the test is set at α, then we would
reject H0 if the p-value ≤ α.
(Alternatively, if p-value ≤ α, the p-value is already a size at which x leads
to rejection of H0 , so we reject H0 . If p-value > α, then α is smaller than the
smallest size at which x would lead to rejection of H0 , so we cannot reject H0 .)

(2). From the theorem, if the critical region has the form T (X) ≤ C which is
equivalent to −T (X) ≥ −C, then the p-value for the sample point x is

sup Pθ (−T (X) ≥ −T (x)) = sup Pθ (T (X) ≤ T (x)) .


θ∈Θ0 θ∈Θ0

(3). The p-value is often called the observed size. Note that the p-value is a
function of the sample point x, hence a random variable itself.

(4). Notice that the definition of the p-value is only related to the null hy-
pothesis H0 , regardless of the alternative hypothesis H1 . Therefore, the actual
p-value indicates the strength of the evidence against the null hypothesis.

(5). When H0 is simple (i.e. Θ0 = {θ0 }), and the critical region is of the
form T (X) ≤ C or T (X) ≥ C, then the two methods shown in this section are
equivalent; see Homework 1.
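To make the formula in Theorem 1.3.1 concrete, here is a minimal sketch (assuming numpy and scipy are available; the data and parameters are illustrative) for a one-sided normal test of H0 : µ ≤ µ0 based on T (X) = √n(X̄ − µ0 )/σ0 , where the supremum over Θ0 is attained at the boundary µ0 :

```python
# p-value for a one-sided normal test with critical region T(X) >= C (sketch).
import numpy as np
from scipy.stats import norm

def p_value(x, mu0, sigma0):
    # T(x) = sqrt(n) * (xbar - mu0) / sigma0; under mu = mu0, T ~ N(0, 1).
    t_obs = np.sqrt(len(x)) * (np.mean(x) - mu0) / sigma0
    return 1 - norm.cdf(t_obs)      # sup over mu <= mu0 of P_mu(T >= t_obs)

x = np.array([1.2, 0.4, 0.9, 1.5])      # illustrative data
print(p_value(x, mu0=0.5, sigma0=1.0))  # reject at level alpha if this <= alpha
```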

1.4 Some examples
1.4.1 Normal example

Example: Let X1 , . . . , Xn ∼ N (µ, σ02 ), where σ02 is known. Consider testing

H0 : µ ≤ µ0
H1 : µ > µ0 .

1. Choose a reasonable test statistic and find the critical region of the test.

2. Find the power function β(µ) of the test.

3. Show that β(µ) is an increasing function of µ with

limµ→−∞ β(µ) = 0, limµ→∞ β(µ) = 1, limµ→µ0 β(µ) = β(µ0 ) = 1 − Φ(C).

4. If we want a test of level α, show that the critical value is C = Φ−1 (1 − α), and
hence show that

limµ→µ0 β(µ) = β(µ0 ) = α.

5. Show that both type I error and type II error probabilities decrease as the sample
size n increases.

6. If we wish to have maximum Type I error probability of 0.1, and a maximum Type
II error probability of 0.2 if µ > µ0 + σ0 , find C and n to achieve these goals.

Solution:

1. A reasonable test statistic is X̄, or equivalently its standardized version

T = √n(X̄ − µ0 )/σ0 .

We reject H0 if T > C for some constant C. The critical region (C.R.) is

R = {x : T (x) > C}, or simply R = {T > C}.

2. The power function is

β(µ) = Pµ (√n(X̄ − µ0 )/σ0 > C) = Pµ (√n(X̄ − µ)/σ0 > C + √n(µ0 − µ)/σ0 )
= 1 − Φ (C + √n(µ0 − µ)/σ0 ) ,

where Φ(·) is the d.f. of a standard normal r.v. N (0, 1).

3. Clearly, β(µ) is an increasing function of µ with

limµ→−∞ β(µ) = 0, limµ→∞ β(µ) = 1, limµ→µ0 β(µ) = β(µ0 ) = 1 − Φ(C).
4. We require that supµ≤µ0 β(µ) = β(µ0 ) = 1 − Φ(C) = α. So C = Φ−1 (1 − α). Hence
we have

limµ→µ0 β(µ) = β(µ0 ) = 1 − Φ(C) = α.

The plot of β(µ) against µ is shown below.

5. Recall part 2. As n increases, β(µ) for µ ∈ Θ0 = (−∞, µ0 ] (i.e., the probability of
Type I error) decreases, and β(µ) for µ ∈ Θ1 = (µ0 , ∞) (power = 1 − P(Type II error))
increases.

6. Note that the power function β(µ) is an increasing function of µ. Also note that
Θ0 = {µ : µ ≤ µ0 } and Θ1 = {µ : µ > µ0 }. Therefore, the requirements in this
problem become

supµ∈Θ0 β(µ) = 1 − Φ (C) = 0.1,

and

supµ≥µ0 +σ0 [1 − β(µ)] = 1 − β(µ0 + σ0 ) = Φ (C − √n) = 0.2.

Solving these two equations, we get C = 1.28 and n = 4.50, so we take n = 5.
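As a quick numerical check of part 6 (a sketch assuming scipy is available; not part of the original notes):

```python
# Solve 1 - Phi(C) = 0.1 and Phi(C - sqrt(n)) = 0.2 for C and n (sketch).
from scipy.stats import norm

C = norm.ppf(1 - 0.1)        # 1 - Phi(C) = 0.1  =>  C = 1.2816
sqrt_n = C - norm.ppf(0.2)   # Phi(C - sqrt(n)) = 0.2
print(C, sqrt_n ** 2)        # n = 4.50, so take n = 5
```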

Remark 1.4.1

(1). From (2), no matter how large n is, we can always find µ sufficiently close
to µ0 (but µ > µ0 ) so that the power β(µ) is arbitrarily close to β(µ0 ) = α.
Equivalently, the prob of type II error is close to 1 − α. That is, the prob
of accepting H0 when H1 is true is close to 1 − α, which is very high. This
means that not much significance can be attached to the acceptance
of H0 .

(2). However, the low power usually happens near µ0 , where we are willing
to tolerate low power. Such a region is called an indifference region. We are
not interested in values of µ in (µ0 , µ0 + δ) for some small δ > 0, since such
improvements are negligible. So (µ0 , µ0 + δ) would be our indifference region.
Outside the indifference region, we want guaranteed power. For instance, we'd
like to have

infµ≥µ0 +δ β(µ) ≥ β0 ,

for some β0 close to 1.
1.4.2 Binomial example

Example: Let X1 , ..., X5 ∼ Bin(1, p). Consider testing

H0 : p ≤ 1/2
H1 : p > 1/2.
We can choose the test statistic T = X1 + · · · + X5 , and reject H0 if T > C.

(1). If the critical region is T = 5, find its power function.


(2). If the critical region is T ≥ 3, find its power function.
(3). Compare the type I and type II error probabilities for the two different
critical regions.

Solution. Note that T ∼ Bin(5, p).

(1). If the C.R. is T = 5, the power function is

β1 (p) = Pp (T = 5) = p^5 .

(2). If the C.R. is T ≥ 3, the power function is

β2 (p) = Pp (T ≥ 3) = Σi=3..5 C(5, i) p^i (1 − p)^{5−i} .

(3). We can plot the two power functions β1 (p) and β2 (p) against p, as shown below.
(Clearly, β2 (p) > β1 (p) for p ∈ (0, 1) and βi (0) = 0, βi (1) = 1 for i = 1, 2.)

From the plot, we see that in decision 1 the probability of Type I error is very small:
for all p ≤ 1/2, we have

P (Type I error) = p^5 ≤ (1/2)^5 = 0.0312.

However, P (Type II error) = 1 − β1 (p) is too high for most values of p > 1/2.
On the other hand, in decision 2, P (Type II error) is smaller, but P (Type I error) is
larger: for all p ≤ 1/2,

P (Type I error) = Pp (T ≥ 3) ≤ (10 + 5 + 1)(1/2)^5 = 16 × 0.0312 = 0.4992.
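The two power functions are easy to tabulate; a sketch (scipy assumed):

```python
# Power functions of the two decisions in the binomial example (sketch).
from scipy.stats import binom

beta1 = lambda p: p ** 5               # C.R. {T = 5}
beta2 = lambda p: binom.sf(2, 5, p)    # C.R. {T >= 3}, i.e. P(T > 2)

print(beta1(0.5), beta2(0.5))          # max Type I errors: 0.03125 vs 0.5
print(1 - beta1(0.7), 1 - beta2(0.7))  # Type II errors at p = 0.7
```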

Chapter 2

The uniformly most powerful (UMP) tests

2.1 Definitions
Definition: Let C be a class of all level α tests for testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1 .
A test φ in C is uniformly most powerful (UMP) if and only if, for any other test φ0 in C

βφ (θ) ≡ Eθ φ(X) ≥ Eθ φ0 (X) ≡ βφ0 (θ), for all θ ∈ Θ1 .

Since it is not possible to minimize both Type I and Type II error probabilities for a
fixed sample size, our objective is to select a test φ that maximizes the power
βφ (θ) for all θ ∈ Θ1 (i.e., minimizes the Type II error probability), subject to the condition
Eθ φ(X) ≤ α for all θ ∈ Θ0 .

Definition: The null (or alternative) hypothesis H0 : θ ∈ Θ0 (Θ1 ) is called simple if
Θ0 = {θ0 } (Θ1 = {θ1 }). A hypothesis is called composite if it has more than one element.

2.2 UMP tests for simple hypotheses: the Neyman-Pearson Lemma

The requirements for a UMP test are so strong that UMP tests do not exist in many realistic
problems. However, things become simple if both the null and alternative hypotheses are
simple.

Theorem 2.2.1 (Neyman-Pearson Lemma) Consider testing H0 : θ = θ0 versus


H1 : θ = θ1 , where the probability density function (pdf ) or probability mass function
(pmf ) corresponding to θi is fi (x), i = 0, 1.

(i) (Existence of a UMP test.) For every α, there exists a UMP test of size
α, which is equal to

φ(x) = 1 when f1 (x) > Cf0 (x)


= γ when f1 (x) = Cf0 (x).
= 0 when f1 (x) < Cf0 (x).

where γ ∈ [0, 1] and C ≥ 0 (C = +∞ is allowed) are some constants chosen
so that

Eθ0 φ (X) = α.

(ii) (Uniqueness.) If φ∗ is a UMP test of size α, i.e. Eθ0 φ∗ (X) = α, then a.s.

φ∗ (x) = 1 when f1 (x) > Cf0 (x)


= 0 when f1 (x) < Cf0 (x). (2.1)

Proof. (i). Note that Eθ0 φ (X) = Eθ0 [φ (X) I{f0 (X) > 0}], so we only need to
consider the set where f0 (x) > 0.

First we show that there exists C and γ such that the test φ is of size α, i.e., Eθ0 φ (X) = α.

(1). If α = 0, take φ(X) = 0 a.s., corresponding to the choice C = +∞. (This


means that we always accept H0 a.s.)

(2). If α = 1, take φ(X) = 1 a.s., corresponding to the choice C = 0. (This


means that we always reject H0 a.s.)

(3). Now consider 0 < α < 1. Let α(t) = Pθ0 (f1 (X) ≤ tf0 (X)) = Pθ0 (f1 (X)/f0 (X) ≤ t).
So α(t) is a cdf: it is non-decreasing and right-continuous, with α(−∞) = 0,
α(∞) = 1 and α(t) − α(t−) = Pθ0 (f1 (X)/f0 (X) = t). Let C be such that α(C−) ≤
1 − α ≤ α(C). Set

γ = (α − [1 − α(C)]) / (α(C) − α(C−)) if α(C−) ≠ α(C)
= 0 if α(C−) = α(C).

With such choices of C and γ, we now show that Eθ0 φ (X) = α.

(a). If α(C−) ≠ α(C), we have

Eθ0 φ (X) = Pθ0 (f1 (X) > Cf0 (X)) + γPθ0 (f1 (X) = Cf0 (X))
= [1 − α(C)] + γ (α(C) − α(C−))
= [1 − α(C)] + {α − [1 − α(C)]} = α.

(b). If α(C−) = α(C), then α(C) = 1 − α, hence

Eθ0 φ (X) = Pθ0 (f1 (X) > Cf0 (X)) = 1 − α(C) = α.

Next we'll show that φ is a UMP test. Suppose that φ0 is any other test with Eθ0 φ0 (X) ≤ α.
We need to show that βφ (θ1 ) = Eθ1 φ (X) ≥ Eθ1 φ0 (X) = βφ0 (θ1 ).
We claim that the following inequality always holds:

[φ (x) − φ0 (x)] [f1 (x) − Cf0 (x)] ≥ 0, for all x (2.2)

Proof of (2.2). Assume that 0 < γ < 1 for simplicity. (The case for γ = 1
or 0 can be proved similarly.) Then,

(a). If φ0 (x) − φ (x) > 0, then φ (x) < φ0 (x) ≤ 1, and f1 (x) ≤
Cf0 (x). So (2.2) holds.
(b). If φ0 (x) − φ (x) < 0, then φ (x) > φ0 (x) ≥ 0, and f1 (x) ≥
Cf0 (x). Again (2.2) holds.
(c). If φ0 (x) − φ (x) = 0, clearly (2.2) holds.

This completes the proof of (2.2).

Finally, we have

βφ (θ1 ) − βφ0 (θ1 ) = Eθ1 φ(X) − Eθ1 φ0 (X)
= ∫ [φ (x) − φ0 (x)] f1 (x)dx
≥ C ∫ [φ (x) − φ0 (x)] f0 (x)dx (from (2.2))
= C [Eθ0 φ (X) − Eθ0 φ0 (X)]
= C [α − Eθ0 φ0 (X)]
≥ 0,

as was to be proved.

(ii) (Uniqueness.) If α = 0 and φ∗ is UMP of size α = 0, i.e., Eθ0 φ∗ (X) = 0, then


we have φ∗ (X) = 0, a.s. Then we can choose C = ∞. The proof for the case α = 1 is
similar. Assume now that 0 < α < 1 only.
Let φ∗ be a UMP test of size α (i.e., Eθ0 φ∗ (X) = α). Define

A = {x : φ(x) ≠ φ∗ (x), f1 (x) ≠ Cf0 (x)},

that is, A is the set on which φ and φ∗ differ, intersected with the set {x : f1 (x) ≠ Cf0 (x)}
(i.e., the union of the sets {x : f1 (x) > Cf0 (x)} and {x : f1 (x) < Cf0 (x)}). We shall now show
that the set A has measure 0.
Note from the proof of part (i), we have

[φ(x) − φ∗ (x)] [f1 (x) − Cf0 (x)] > 0 when x ∈ A


= 0 when x ∈ Ac . (2.3)

From this, we get

∫A [φ(x) − φ∗ (x)] [f1 (x) − Cf0 (x)] dx
= ∫ [φ(x) − φ∗ (x)] [f1 (x) − Cf0 (x)] dx (from (2.3))
= ∫ [φ(x) − φ∗ (x)] f1 (x)dx − C ∫ [φ(x) − φ∗ (x)] f0 (x)dx
= [Eθ1 φ(X) − Eθ1 φ∗ (X)] − C [Eθ0 φ(X) − Eθ0 φ∗ (X)]
= [βφ (θ1 ) − βφ∗ (θ1 )] − C [α − α]
= 0,

where the last line follows since both φ(x) and φ∗ (x) are UMP of size α. Since the
integrand is strictly positive on A by (2.3), the set A must have measure 0. The theorem
is proved.

Remark 2.2.1 The theorem shows that, when both H0 and H1 are simple, there exists a
UMP test that can be determined by (2.1) uniquely (up to sets of measure 0) except on
the set B = {x : f1 (x) = Cf0 (x)}.

Remark 2.2.2 There are two different situations:

(1). If P (X ∈ B) = 0, then from the N-P lemma, we have a unique (a.s.)


nonrandomized UMP test of size α, given by

φ(x) = 1 when f1 (x) > Cf0 (x)


= 0 when f1 (x) < Cf0 (x),

where C is determined from

Eθ0 φ (X) = Pθ0 [f1 (X) > Cf0 (X)] = α.

Also in this case, the critical region is

R = {x : f1 (x) > Cf0 (x)} = {x : φ(x) = 1}.

(2). If P (X ∈ B) > 0, then the UMP tests are randomized on the set B and
the randomization is necessary for the UMP tests to have the given size α.

Remark 2.2.3 In the N-P lemma, we don’t have to consider the case where both f1 (x)
and f0 (x) are equal to 0 since in that case, we can always shrink the sample space. There-
fore, we can rewrite the UMP test as

φ(x) = 1 when f1 (x)/f0 (x) > C


= γ when f1 (x)/f0 (x) = C,
= 0 when f1 (x)/f0 (x) < C.

Here, we regard 1/0 = ∞. (This happens when f0 (x) = 0 but f1 (x) > 0.)
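In this likelihood-ratio form the test is a single comparison; a minimal sketch (the densities f0, f1 are hypothetical inputs, and C and γ are assumed already calibrated so that Eθ0 φ(X) = α):

```python
# The N-P test in likelihood-ratio form (sketch; f0, f1, C, gamma assumed given).
import math

def np_test(x, f0, f1, C, gamma):
    """Rejection probability phi(x)."""
    d0, d1 = f0(x), f1(x)
    ratio = d1 / d0 if d0 > 0 else math.inf   # regard 1/0 as infinity
    if ratio > C:
        return 1.0       # reject H0
    if ratio == C:
        return gamma     # randomize on the boundary set B
    return 0.0           # accept H0
```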

2.3 The relationship between the power and the size
Theorem 2.3.1 Let β be the power of the UMP test of size α, where 0 ≤ α ≤ 1, for
testing H0 : θ = θ0 versus H1 : θ = θ1 , where θ0 ≠ θ1 .

(i). α ≤ β for all α.


(ii). If 0 < α < 1, then α < β.

Proof. (i). Choose φ0 (x) ≡ α. Clearly, Eθ0 φ0 (X) = Eθ1 φ0 (X) = α. That is, this test φ0
is of size α with power α. Since β is the power of a UMP test, we have α ≤ β.
(ii). Suppose 0 < α < 1. From (i), α ≤ β. Now let us show by contradiction that we
must have α < β. If not, then α = β(< 1). Therefore, φ0 (x) ≡ α(= β) is the UMP test
and must satisfy (2.1). Then we must have f1 (x) = Cf0 (x). (Otherwise, φ0 (x) would be
either 1 or 0, contradicting the assumption 0 < φ0 (x) ≡ α < 1.) Integrating
both sides, we get C = 1, hence f1 (x) = f0 (x), that is, θ0 = θ1 , a contradiction.

Remark. We have seen that for a UMP test, β ≥ α. This is in fact a very basic
requirement for any test. Otherwise, if α > β, we would have

PH0 true (Reject H0 ) > PH0 not true (Reject H0 ),

which would sound very strange indeed.

2.4 Some examples.
2.4.1 Binomial example.

Example. Let X1 , . . . , Xn ∼ Bin(1, p). Find a UMP test of size α for testing H0 : p = p0
versus H1 : p = p1 , where 0 < p0 < p1 < 1.
Solution. For x = (x1 , ..., xn ), fp (x) = p^{Σ xi} (1 − p)^{n − Σ xi} . Thus,

λ(x) ≡ f1 (x)/f0 (x) = (p1 /p0 )^{Σ xi} [(1 − p1 )/(1 − p0 )]^{n − Σ xi}

is a strictly increasing function of Σ xi (since p1 > p0 ). Note that T = Σ Xi ∼ Bin(n, p). Therefore,
from the N-P lemma, the UMP test of size α is

φ(x) = 1 when λ(x) > C


= γ when λ(x) = C
= 0 when λ(x) < C,

or equivalently

φ(x) = 1 when T > m


= γ when T = m
= 0 when T < m,

where m and γ satisfy

α = Ep0 [φ(X)] = Pp0 (T > m) + γPp0 (T = m)
= Σj=m+1..n C(n, j) p0^j (1 − p0 )^{n−j} + γ C(n, m) p0^m (1 − p0 )^{n−m} .

If α = Σj=m+1..n C(n, j) p0^j (1 − p0 )^{n−j} for some integer m, then we can choose γ = 0, in which
case the UMP test φ is a nonrandomized test. Otherwise, it is a randomized test.
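A numerical sketch (scipy assumed; the values of n, p0 and α are illustrative) of solving for m and γ:

```python
# Find m and gamma for the randomized UMP binomial test (sketch).
from scipy.stats import binom

n, p0, alpha = 10, 0.5, 0.05
# smallest m with P(T > m) <= alpha under p0
m = min(k for k in range(n + 1) if binom.sf(k, n, p0) <= alpha)
gamma = (alpha - binom.sf(m, n, p0)) / binom.pmf(m, n, p0)
print(m, gamma)  # m = 8, gamma ~ 0.89: reject if T > 8; if T = 8, reject w.p. gamma
```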

2.4.2 Normal example.

Example. Let X1 , . . . , Xn ∼ N (µ, σ02 ). Find a UMP test of size α for testing H0 : µ = µ0
versus H1 : µ = µ1 , where µ1 < µ0 .

Solution. For x = (x1 , ..., xn ),

fµ (x) = Π f (xi ) = C0 exp{−Σ (xi − µ)^2 /(2σ0^2 )}
= C0 exp{−(Σ xi^2 − 2µ Σ xi + nµ^2 )/(2σ0^2 )}.

So

λ(x) ≡ f1 (x)/f0 (x) = exp{[2(µ1 − µ0 ) Σ xi − n(µ1^2 − µ0^2 )]/(2σ0^2 )},

which, since µ1 < µ0 , is a strictly decreasing function of x̄. Therefore, from the N-P lemma, a UMP test of size
α is

φ(x) = 1 when λ(x) > C


= γ when λ(x) = C,
= 0 when λ(x) < C,

or equivalently

φ(x) = 1 when x̄ < m


= γ when x̄ = m,
= 0 when x̄ > m,

where m satisfies α = Eµ0 [φ(X)] = Pµ0 (X̄ < m) = Φ (√n(m − µ0 )/σ0 ). That is, the
critical value is

m = µ0 + Φ−1 (α) σ0 /√n. (4.4)

Since Pµ0 (X̄ = m) = 0, the UMP test is, a.s.,

φ(x) = 1 when x̄ < m


= 0 when x̄ > m,

where m is given in (4.4). This is a nonrandomized test. In terms of the critical region,
we reject H0 if

X̄ < µ0 + Φ−1 (α) σ0 /√n.

Remark 2.4.1 Note in the last example that the UMP test does not depend on the alternative
µ1 . Therefore, the test is a UMP test for testing H0 : µ = µ0 versus H1 : µ < µ0 .
See the section on monotone likelihood ratios later.

2.4.3 Testing Normal Against Double Exponential.

Example. Suppose X has density function


f (x) = (1/√(2π)) exp{−x^2 /2}

under H0 , and

g(x) = (1/2) exp{−|x|}
under H1 . How do we choose between f and g on the basis of a single observation?
Solution. Define fθ (x) = (1 − θ)f (x) + θg(x). Then we wish to test
H0 : θ = 0
H1 : θ = 1.
Note that P (f1 (X) = Cf0 (X)) = 0. Then there exists a unique nonrandomized UMP
test. By the N-P Lemma, the UMP test has a critical region of the form
λ(x) = f1 (x)/f0 (x) ≡ (√(2π)/2) exp{−|x| + x^2 /2} > C.
But

x^2 /2 − |x| = (|x|^2 − 2|x| + 1)/2 − 1/2 = (|x| − 1)^2 /2 − 1/2.

Hence, λ(x) > C if and only if ||x| − 1| > k for some k > 0, if and only if |x| > k + 1
or |x| < 1 − k. (Note that the second inequality is only meaningful for the case k ≤ 1; otherwise the set
{x : |x| < 1 − k} is empty.) This means that if either a very large or a very small value
of X is observed, we suspect that H1 is true rather than H0 . This is consistent with the
fact that g has more probability in its tails and near zero than f has. See the plot.

(PLOT HERE.)

The size of the test is given by

Pf (|X| > k + 1) + Pf (|X| < 1 − k)
= 2 [1 − Φ(k + 1)] + [2Φ(1 − k) − 1] I{k ≤ 1}.

Note that the second probability Pf (|X| < 1 − k) is 0 if k > 1.
The power of the test is given by

Pg (|X| > k + 1) + Pg (|X| < 1 − k)
= e^{−(k+1)} + [1 − e^{−(1−k)} ] I{k ≤ 1}.
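A sketch (scipy assumed) that calibrates k to a given size and evaluates the corresponding power:

```python
# Calibrate k for the normal-vs-double-exponential test, then compute power (sketch).
import math
from scipy.stats import norm
from scipy.optimize import brentq

def size(k):
    s = 2 * (1 - norm.cdf(k + 1))
    return s + (2 * norm.cdf(1 - k) - 1 if k <= 1 else 0.0)

def power(k):
    p = math.exp(-(k + 1))
    return p + (1 - math.exp(-(1 - k)) if k <= 1 else 0.0)

k = brentq(lambda t: size(t) - 0.05, 0.0, 10.0)  # size is decreasing in k
print(k, power(k))                               # k ~ 0.995, power ~ 0.14
```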

2.5 The Neyman-Pearson Lemma in terms of sufficient statistics
Remark 2.5.1 In the two examples in the last section, we have seen that the UMP tests
are functions of the sufficient statistics. This is in fact true in general. To see why,
suppose that T = T (X) is a sufficient statistic, then from the Neyman Factorization
Theorem, we have
fθ (x) = g (T (x), θ) h (x) .
Assume h (x) ≠ 0 (otherwise, we can shrink our sample space).
Therefore, from the N-P lemma, the UMP test is

φ(x) = 1 when f1 (x) > Cf0 (x)


= γ when f1 (x) = Cf0 (x)
= 0 when f1 (x) < Cf0 (x).

This is clearly equivalent to

φ(x) = 1 when g (T (x), θ1 ) > Cg (T (x), θ0 )


= γ when g (T (x), θ1 ) = Cg (T (x), θ0 )
= 0 when g (T (x), θ1 ) < Cg (T (x), θ0 ),

which is a function of T (x). In other words, in our search for a UMP test for simple
hypothesis, we only need to concentrate on those tests which are functions of sufficient
statistics.

Remark 2.5.2 Let φ(x) be a test based on the observations x. Suppose that T is a
sufficient statistic; then we can define a new test based on T by

η(T ) = E (φ(X)|T ) ,

which does not depend on θ because T is sufficient.

(It is a proper test since 0 ≤ φ(X) ≤ 1 implies 0 ≤ η(T ) ≤ 1.) Note that

Eθ η(T ) = Eθ [E (φ(X)|T )] = Eθ φ(X).

In other words, the power function for the original test φ(X) is the same as that based on
sufficient statistics.

Chapter 3

UMP Tests for One-Sided Hypotheses

3.1 One-sided hypotheses


We are interested in finding UMP tests for one-sided hypotheses of the following type:
(a) H0 : θ ≤ θ 0
H1 : θ > θ 0 ,

(b) H0 : θ ≥ θ 0
H1 : θ < θ 0 .

3.2 Monotone Likelihood Ratio (MLR)


The N-P lemma shows that there exists a unique UMP test for testing a simple null
against a simple alternative hypothesis. For composite hypotheses (null or alternative),
however, a UMP test may not exist. But, under further conditions, a UMP test does
exist.

Definition. A family of densities fθ (x) (pdf’s or pmf’s), where θ ∈ R, is said to have


Monotone Likelihood Ratio (MLR) if there exists a real-valued function T (x) such
that for any θ1 < θ2 , fθ2 (x)/fθ1 (x) = g[T (x)] is a nondecreasing function of T (x) for
values of x at which at least one of fθ1 (x) and fθ2 (x) is positive.
Remark 3.2.1 From the Neyman Factorization Theorem, fθ2 (x)/fθ1 (x) is a function of a
sufficient statistic T (X). Therefore, in order to check for MLR, it is enough to check that the
ratio fθ2 (x)/fθ1 (x) is a monotone function of the sufficient statistic T = T (X).

Remark 3.2.2 Also, the above definition implies that T = T (X) is a 1-dim statistic. If
it is more than 1-dim, then it is difficult to define a monotone function of it.

The following lemma states a useful result for a family with MLR.
Lemma 3.2.1 (MLR Lemma) Suppose that the family of distributions fθ (x) of X has
MLR in T (x).

(i). If ψ(t) is a nondecreasing function of t, then h(θ) = Eθ [ψ(T )] = Eθ [ψ[T (X)]]
is a nondecreasing function of θ.
(ii). For any θ1 < θ2 , the cumulative distribution function of T (X) under θ1
and θ2 satisfy
Pθ1 [T (X) > t] ≤ Pθ2 [T (X) > t] , for all t,
or equivalently F(T (X),θ1 ) (t) ≥ F(T (X),θ2 ) (t), for all t. (That is, if Ti ∼ FT (X),θi (t),
i = 1, 2, then T1 is stochastically smaller than T2 .)
Proof. (i). For θ1 < θ2 , define

A = {x : fθ1 (x) > fθ2 (x)}, a = supx∈A ψ (T (x)) ,
B = {y : fθ1 (y) < fθ2 (y)}, b = infy∈B ψ (T (y)) .

Clearly, for any x ∈ A and y ∈ B, we have

g[T (x)] = fθ2 (x)/fθ1 (x) < 1 < fθ2 (y)/fθ1 (y) = g[T (y)],

which implies that T (x) ≤ T (y) by the definition of MLR. It follows that ψ (T (x)) ≤
ψ (T (y)) since ψ(t) is nondecreasing in t. Taking supx∈A and infy∈B in turn, we get
a ≤ b. Also note that

0 = ∫ [fθ2 (x) − fθ1 (x)] dx
= ∫x∈A [fθ2 (x) − fθ1 (x)] dx + ∫y∈B [fθ2 (y) − fθ1 (y)] dy. (2.1)

Thus

h(θ2 ) − h(θ1 ) = Eθ2 [ψ(T (X))] − Eθ1 [ψ(T (X))]
= ∫ ψ(T (x)) [fθ2 (x) − fθ1 (x)] dx
= ∫x∈A ψ(T (x)) [fθ2 (x) − fθ1 (x)] dx + ∫y∈B ψ(T (y)) [fθ2 (y) − fθ1 (y)] dy
≥ a ∫x∈A [fθ2 (x) − fθ1 (x)] dx + b ∫y∈B [fθ2 (y) − fθ1 (y)] dy
= (b − a) ∫y∈B [fθ2 (y) − fθ1 (y)] dy (from (2.1))
≥ 0.

(ii). In (i), simply take ψ(T ) = I{T > t0 }, which is an increasing function of T .
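Part (ii) is easy to check numerically for a concrete MLR family; a sketch (scipy assumed) for the Binomial family:

```python
# Stochastic ordering of Bin(n, p) in p, as asserted by the MLR Lemma (sketch).
from scipy.stats import binom

n, p1, p2 = 10, 0.3, 0.6   # illustrative values with p1 < p2
# P_{p1}(T > t) <= P_{p2}(T > t) for every t
assert all(binom.sf(t, n, p1) <= binom.sf(t, n, p2) for t in range(n + 1))
```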

Example. Let θ be real-valued and η(θ) be a nondecreasing function of θ. Then the


one-parameter exponential family with
fθ (x) = exp{η(θ)T (x) − ξ(θ)}h(x)
has MLR in T (X).
Proof. For any θ1 < θ2 , we have η(θ1 ) ≤ η(θ2 ). So

fθ2 (x)/fθ1 (x) = exp{[η(θ2 ) − η(θ1 )] T (x) − [ξ(θ2 ) − ξ(θ1 )]}

is a nondecreasing function of T (x). By definition, the exponential family has MLR in
T (x).

Remark 3.2.3 Some examples of exponential families include

(1). Binomial family {Bin(n, θ)}.


(2). Poisson family {P oisson(θ)}.
(3). Negative binomial family {N eg.Bin(n, r, θ)}.
(4). Normal family {N (θ, σ02 ) or {N (µ0 , θ)}.
(5). Exponential family {Exp(θ)}.
(6). Gamma family {Gamma(θ, c)} or {Gamma(c, θ)}.
(7). Beta family {Beta(θ, c)} or {Beta(c, θ)}.

Examples.

• Let X1 , . . . , Xn ∼ U (0, θ), where θ > 0. Show that the family has MLR in X(n) .
Proof. First, the pdf of X = {X1 , . . . , Xn } is fθ (x) = θ^{−n} I{0 < x(n) < θ}. For any
θ1 < θ2 ,

fθ2 (x)/fθ1 (x) = (θ1^n /θ2^n ) I{0 < x(n) < θ2 }/I{0 < x(n) < θ1 }

is a nondecreasing function of x(n) for values of x at which at least one of fθ1 (x) and
fθ2 (x) is positive. By definition, the family has MLR in X(n) .

• Show that the following families (which do not belong to the exponential family)
have MLR:

(1). Logistic distribution family {Logistic(θ)}.


(2). Hypergeometric distribution family {HyperGeo(r, θ, N − θ)}.

Proof. Left as an exercise.

• Show that the Cauchy distribution family {Cauchy(θ)} does not have any MLR in
x, but it has stochastic ordering.
Proof. Left as an exercise.

• Show that the Uniform distribution family U (θ, θ + 1) does not have any MLR in x.
Proof. Left as an exercise.

Theorem 3.2.1 Let θ be a real parameter, and let the random variable X have probability
density fθ (x) with MLR in T (x). Consider the problem of testing H0 : θ ≤ θ0 against
H1 : θ > θ0 , where θ0 is a given constant.

(i). There exists a (not necessarily unique) UMP test of size α given by

φ(x) = 1 when T (x) > C


= γ when T (x) = C
= 0 when T (x) < C,

where C and γ are determined by
β(θ0 ) ≡ Eθ0 φ(X) = α.

(ii). The power function βφ (θ) ≡ Eθ φ(X) of the test is always nondecreasing.
(iii). For all θ0 , the test φ is UMP for testing H00 : θ ≤ θ0 against H10 : θ > θ0
with size α0 = β(θ0 ).
(iv). The test φ minimizes βφ0 (θ) for any θ < θ0 amongst all tests φ0 satisfying
Eθ0 φ0 (X) = α.

Proof.
(i). First consider testing H0 : θ = θ0 against H1 : θ = θ1 with any θ1 > θ0 .
Firstly we’ll construct a test of size α. Similar to the proof of N-P lemma, we
can show
Lemma: There exists a test of size α given by
φ(x) = 1 if T (x) > C
= γ if T (x) = C
= 0 if T (x) < C, (2.2)
where C and γ are determined by
β(θ0 ) ≡ Eθ0 φ(X) = α.
Proof. Define α(t) = Pθ0 (T (X) ≤ t). Clearly, α(t) is a cdf, and it
is non-decreasing and right-continuous, α(−∞) = 0, α(∞) = 1 and
α(t) − α(t−) = Pθ0 (T (X) = t).
For any 0 < α < 1, (the cases for α = 0, 1 are easy to prove) we can
choose C such that α(C−) ≤ 1 − α ≤ α(C). Set
γ = (α − [1 − α(C)]) / (α(C) − α(C−)) if α(C−) ≠ α(C)
= 0 if α(C−) = α(C).
(Note that γ takes value 1 if α(C) = 1 − α.)
Then if α(C−) ≠ α(C), we have
Eθ0 φ (X) = Pθ0 (T (X) > C) + γPθ0 (T (X) = C)
= [1 − α(C)] + {α − [1 − α(C)]} = α.
On the other hand, if α(C−) = α(C), then α(C) = 1 − α, hence
Eθ0 φ (X) = Pθ0 (T (X) > C) = 1 − α(C) = α.

Secondly, we'll show that the test φ is a UMP test. Denote λ(x) ≡ f1 (x)/f0 (x) =
g[T (x)]. For T (x) = C, we denote

g(C) ≡ f1 (x)/f0 (x) |T (x)=C = k.

Since g[T (x)] is monotone in T (x), we have

(a) if f1 (x)/f0 (x) = g[T (x)] > k, then we must have T (x) > C (otherwise
g[T (x)] ≤ k, in contradiction with the assumption); thus from (2.2),
we get φ(x) = 1.
(b) if f1 (x)/f0 (x) = g[T (x)] < k, then T (x) < C, thus φ(x) = 0.

Summarizing (a) and (b), we have

φ(x) = 1 if f1 (x) > kf0 (x)


= 0 if f1 (x) < kf0 (x). (2.3)

By the N-P lemma, φ is a UMP test (of size α).

Since φ does not depend on θ1 , it follows that φ is a UMP of size α for testing
H0 : θ = θ0 versus H1 : θ > θ0 .

Note that if φ is a UMP of size α for testing H0 : θ = θ0 versus H1 : θ > θ0 ,


then it is UMP of size α for testing H0 : θ ≤ θ0 versus H1 : θ > θ0 provided
that
supθ≤θ0 Eθ φ(X) = α.

To show this, we note that we can write φ(X) = ψ(T (X)) = ψ(T ), where
ψ(t) is nondecreasing in t. Then applying the MLR Lemma (or part (ii) of
the present theorem), we get

supθ≤θ0 Eθ φ(X) = supθ≤θ0 Eθ ψ(T ) = Eθ0 ψ(T ) = Eθ0 φ(X) = α.

(ii). Recall βφ (θ) ≡ Eθ φ(X) = Eθ ψ(T ), where φ(X) = ψ(T ) is given in part
(i) or (2.3). For any θ0 < θ1 , the MLR Lemma gives
βφ (θ0 ) = Eθ0 ψ(T ) ≤ Eθ1 ψ(T ) = βφ (θ1 ).

(iii). The proof is very similar to part (i), and hence omitted.

(iv). Consider testing

H0 : θ ≥ θ 0 ,
H1 : θ = θ1 , θ 1 < θ0 .

Similarly to parts (i) and (ii), we can show that

1 − φ(x) = 1 when T (x) < C


= γ0 when T (x) = C
= 0 when T (x) > C,

is a UMP test of size Eθ0 [1 − φ(x)] = 1 − α


Let φ0 be any test of size α, i.e., Eθ0 φ0 (X) = α. Then, η 0 (x) = 1 − φ0 (x) is a
test of size 1 − α.

Clearly, for any θ < θ0 ,

Eθ [1 − φ(X)] ≥ Eθ [1 − φ0 (X)].

Namely, for any θ < θ0 ,

Eθ φ(X) ≤ Eθ φ0 (X).

Remark 3.2.4 The test φ(x) in the theorem is a UMP test, but it may not be unique.
We shall give some examples a bit later in the chapter. But basically, if

T (x) > C ⟺ f1 (x)/f0 (x) > C1 ,

then φ(x) will be a unique UMP test. On the other hand, the unique test given by

f1 (x)/f0 (x) > C1

may yield many different UMP tests in terms of T if the correspondence is NOT one-to-one.

The next theorem tests H0 : θ ≥ θ0 against H1 : θ < θ0 .

Theorem 3.2.2 Let θ be a real parameter, and let the random variable X have probability
density fθ (x) with MLR in T (x). Consider the problem of testing H0 : θ ≥ θ0 against
H1 : θ < θ0 , where θ0 is a given constant.

(i). There exists a UMP test of size α, which is given by

φ(x) = 1 when T (x) < C


= γ when T (x) = C
= 0 when T (x) > C,

where C and γ are determined by

β(θ0 ) ≡ Eθ0 φ(X) = α.

(ii). The power function


βφ (θ) ≡ Eθ φ(X)
of the test is strictly decreasing for all points θ for which 0 < β(θ) < 1.
(iii). For all θ0 , the test φ is UMP for testing H00 : θ ≥ θ0 against H10 : θ < θ0
with size α0 = β(θ0 ).
(iv). For any θ > θ0 , the test φ minimizes βφ0 (θ) amongst all tests φ0 satisfying
Eθ0 φ0 (X) = α.

For exponential family, the test is really simple as shown below.

Corollary. Suppose that X has the one-parameter exponential family with

fθ (x) = exp{η(θ)T (x) − ξ(θ)}h(x), (2.4)

where η(θ) is a strictly monotone function of θ.

(i). If η is increasing, then the UMP test for testing H0 : θ ≤ θ0 versus


H1 : θ > θ0 is

φ(x) = 1 when T (x) > C


= γ when T (x) = C
= 0 when T (x) < C,

where C and γ satisfy Eθ0 φ(X) = α.

(ii). If η is decreasing, then the UMP test for testing H0 : θ ≥ θ0 versus


H1 : θ < θ0 is the same as φ.

(iii). If η is decreasing, then the UMP test for testing H0 : θ ≤ θ0 versus


H1 : θ > θ0 is

φ0 (x) = 1 when T (x) < C


= γ0 when T (x) = C
= 0 when T (x) > C,

where C and γ0 satisfy Eθ0 φ0 (X) = α.

(iv). If η is increasing, then the UMP test for testing H0 : θ ≥ θ0 versus

H1 : θ < θ0 is the same as φ0 .

3.3 Some examples


3.3.1 Examples when MLR exists

Normal example. Let X1 , . . . , Xn ∼ N (µ, σ02 ) with known σ02 . Consider testing H0 : µ ≤
µ0 versus H1 : µ > µ0 . Find a UMP test of size α.

Solution. The joint d.f. of X = {X1 , . . . , Xn } belongs to the one-parameter family (2.4)
with T (X) = X̄ and η(µ) = nµ/σ02 . From the corollary, the UMP test is

φ(x) = I{T (x) > C} = I{x̄ > C},



where Eµ0 φ(X) = Pµ0 {X̄ > C} = α, i.e., C = µ0 + Φ−1 (1 − α)σ0 / n.

Binomial example. Let X1 , . . . , Xn ∼ Bin(1, p). Consider testing H0 : p ≤ p0 versus


H1 : p > p0 . Find a UMP test of size α.

Solution. The joint d.f. of X = {X1 , . . . , Xn } belongs to the one-parameter family (2.4)
with T (X) = Σ Xi ∼ Bin(n, p) and η(p) = log[p/(1 − p)], which is an increasing function of p.
From the corollary, the UMP test is

φ(x) = 1 when T (x) = Σ xi > C
= γ when T (x) = Σ xi = C
= 0 when T (x) = Σ xi < C,

where C and γ are solved from


Ep0 φ(X) = Pp0 (Σ Xi > C) + γPp0 (Σ Xi = C) = · · · = α.

Poisson example. Let X1 , . . . , Xn ∼ P oisson(λ). Consider testing H0 : λ ≤ λ0 versus


H1 : λ > λ0 . Find a UMP test of size α.
Solution. Here, T (X) = Σ Xi ∼ Poisson(nλ) and η(λ) = log λ. Thus the UMP test is

φ(x) = 1 when T (x) > C


= γ when T (x) = C
= 0 when T (x) < C,

where C and γ are solved from


Eλ0 φ(X) = Pλ0 (Σ Xi > C) + γPλ0 (Σ Xi = C) = · · · = α.
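A numerical sketch for the Poisson case (scipy assumed; n, λ0 and α are illustrative values):

```python
# Solve for C and gamma in the randomized UMP Poisson test (sketch).
from itertools import count
from scipy.stats import poisson

n, lam0, alpha = 5, 1.0, 0.05
mu0 = n * lam0                       # T = sum(X_i) ~ Poisson(n*lambda0) under H0
C = next(k for k in count() if poisson.sf(k, mu0) <= alpha)
gamma = (alpha - poisson.sf(C, mu0)) / poisson.pmf(C, mu0)
print(C, gamma)  # reject if T > C; if T = C, reject with probability gamma
```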

Uniform example. (UMP tests may not be unique.) Let X1 , . . . , Xn ∼ Unif(0, θ)
with θ > 0. Consider testing H0 : θ ≤ θ0 versus H1 : θ > θ0 .
(i) Find a UMP test of size α.
(ii) Show that the following test φ0 is also UMP of size α,

φ0 (x) = 1 when x(n) > θ0


= α when x(n) ≤ θ0 .

Therefore, in this case, UMP tests are not unique. (But note that we are
dealing with composite hypotheses not simple ones.)

Solution. (i). The joint d.f. of X = {X1 , . . . , Xn } is in a family with MLR in X(n) ,
which has density fX(n) (x) = nθ^{−n} x^{n−1} I{0 < x < θ} and

Pθ {X(n) > C} = (n/θ^n ) ∫C..θ x^{n−1} dx = 1 − C^n /θ^n .
Clearly, the UMP test is nonrandomized. From the corollary, the UMP test is

φ(x) = I{T (x) > C} = I{x(n) > C},

where βφ (θ0 ) = Pθ0 {X(n) > C} = 1 − C^n /θ0^n = α, i.e.,

C = θ0 (1 − α)^{1/n} ,

which results in the power of φ at any θ > θ0 :

βφ (θ) = 1 − (θ0^n /θ^n )(1 − α). (3.5)

(ii). The power function of φ0 is


βφ0 (θ) = Eθ φ0 (X)
= Pθ (X(n) > θ0 ) + αPθ (X(n) ≤ θ0 )
= Pθ (X(n) > θ0 ) + α [1 − Pθ (X(n) > θ0 )]
= α + (1 − α)Pθ (X(n) > θ0 )
= α + (1 − α)(1 − θ0^n /θ^n ) I{θ0 ≤ θ}
= α if θ < θ0
= 1 − (θ0^n /θ^n )(1 − α) if θ ≥ θ0 ,
which is continuous and nondecreasing in θ. Therefore, the test φ0 has size

supθ≤θ0 βφ0 (θ) = βφ0 (θ0 ) = α.

Furthermore, the power of φ0 at any θ > θ0 is

βφ0 (θ) = 1 − (θ0^n /θ^n )(1 − α) = βφ (θ).
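A quick Monte Carlo sketch (numpy assumed; the parameter values are illustrative) confirming that φ and φ0 have the same power:

```python
# Monte Carlo check that phi and phi0 agree in power for Unif(0, theta) (sketch).
import numpy as np

rng = np.random.default_rng(0)
theta0, theta, n, alpha, reps = 1.0, 1.3, 5, 0.05, 200_000
C = theta0 * (1 - alpha) ** (1 / n)

xn = rng.uniform(0, theta, size=(reps, n)).max(axis=1)   # X_(n) under theta
print(np.mean(xn > C))                                   # power of phi
print(np.mean(np.where(xn > theta0, 1.0, alpha)))        # power of phi0
print(1 - (theta0 / theta) ** n * (1 - alpha))           # formula (3.5)
```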

Remark 3.3.1 UMP tests may or may not be unique for composite hypotheses. Suppose that
the family has MLR in T (x). Then the UMP test is unique if

fθ2 (x)/fθ1 (x) = g(T (x))

is strictly monotone in T (x). This can be seen from the proof of the main theorem. Note
that this is the case for the Normal, Binomial and Poisson examples above, but not for
the Uniform example.
When an MLR does not exist, a similar principle applies. See the example in the next
subsection.

3.3.2 A UMP test may exist even if an MLR does not


So far, in deriving UMP tests for one-sided hypotheses, we have relied heavily on the assumption
that the family has MLR. When this is not true, a UMP test can still exist, as illustrated
by the next example.
Example. Let X1 , . . . , Xn ∼ U (θ, θ + 1), where θ ∈ R. Suppose that n ≥ 2.
(i). Show that a UMP test of size α for testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0 is
φ(x) = 1 when x(1) > θ0 + C(α) or x(n) > θ0 + 1
= 0 otherwise,
where C(α) = 1 − α^{1/n} .
(ii). The family U (θ, θ + 1) does not have monotone likelihood ratio.

Solution. (i). Consider testing H0 : θ = θ0 v.s. H1 : θ = θ1 , where θ0 < θ1 at size α. By
N-P lemma, the unique UMP test will reject H0 if
λ(x) ≡ fθ1 (x1 , ..., xn )/fθ0 (x1 , ..., xn ) = I{θ1 < x(1) < x(n) < θ1 + 1}/I{θ0 < x(1) < x(n) < θ0 + 1} ≡ A/B > C, where C > 0.
We shall now try to find a UMP test in terms of T = (X(1) , X(n) ), which will be easier
to use in practice. (The UMP test given later is by no means unique. Can you
find some other ones?)
Here we reject H0 for large values of λ(x), which happens when
x(n) > θ0 + 1, or x(1) > θ0 + C, where C ≥ 0.
We now explain why this is the case.
(a). If x(n) > θ0 + 1, then H0 can not be true and must be rejected.
(b). Otherwise, if x(n) < θ0 + 1(< θ1 + 1), then test depends on the value of
x(1) . As x(1) increases, the value of λ(x) also changes from 0 to 1 (under the
assumption x(n) < θ0 + 1). So x(1) can be anywhere when we decide to reject
H0 , but we can choose x(1) > θ0 + C for some C ≥ 0. Here C is chosen to
make the test to be of size α.
We make one remark before we move on. There is no need to consider the case where
both A and B are equal to zero. Therefore, the possible values of λ(x) are 0, 1, or ∞.
Coming back to the question, let us now decide C. Note that

α = Pθ0 (X(1) > θ0 + C or X(n) > θ0 + 1)
= Pθ0 ({X(1) > θ0 + C} ∪ ∅)
= Pθ0 (X(1) > θ0 + C)
= [Pθ0 (X1 > θ0 + C)]^n
= (1 − C)^n .
Therefore, C = 1 − α^{1/n} . [The event x(n) > θ0 + 1 has no effect on the size of the
test, but it does affect the power.]
Since the UMP test for testing H0 : θ = θ0 v.s. H1 : θ = θ1 does not depend on θ1 (> θ0 ),
it is also UMP for testing H0 : θ = θ0 v.s. H1 : θ > θ0 . It remains to show that it is also
UMP of size α for testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0 . That is, we need to check that
supθ≤θ0 Eθ φ(X) = α, as shown below.
supθ≤θ0 Eθ φ(X) = supθ≤θ0 Pθ (X(1) > θ0 + C or X(n) > θ0 + 1)
= supθ≤θ0 Pθ (X(1) > θ0 + C)
= supθ≤θ0 [Pθ (X1 > θ0 + C)]^n
= supθ≤θ0 ((θ + 1) − (θ0 + C))^n
= ((θ0 + 1) − (θ0 + C))^n
= (1 − C)^n
= α.
(ii). From Remark 3.2.1, it suffices to consider the minimal sufficient statistic
(X(1) , X(n) ). But this is two-dimensional, and the definition of MLR does not cover this case.

3.4 Exercises
1. Show that the following families have an MLR.

(i). Logistic family fθ (x) = e^{−(x−θ)} /[1 + e^{−(x−θ)} ]^2 .
(ii). Double exponential family fθ (x) = Ce^{−|x−θ|} .

2. Let X be one observation from the two problems in Question 1. Find a UMP test
of size α for testing H0 : θ ≤ 0 v.s. H1 : θ > 0.

3. Let X be one observation from a Cauchy distribution


fθ (x) = C/[1 + (x − θ)^2 ].

(i). Show that this family does not have an MLR in x.


(ii). Show that the test
φ(x) = I{1 < x < 3}
is UMP of its size for testing H0 : θ = 0 v.s. H1 : θ = 1. Calculate the Type I and
type II error probabilities.

4. Let X be one observation from a Cauchy scale distribution


fθ (x) = θ/[π(θ^2 + x^2 )], θ > 0.

(i). Show that this family does not have an MLR in x.


(ii). Show that the distribution of |X| does have an MLR.

Chapter 4

UMP tests for two-sided hypotheses

4.1 Two-sided hypotheses


We are interested in finding UMP tests for two-sided hypotheses of the following type:

(a) H0 : θ = θ0
H1 : θ 6= θ0 ,

(b) H0 : θ1 ≤ θ ≤ θ2
H1 : θ < θ1 or θ > θ2

(c) H0 : θ ≤ θ1 or θ ≥ θ2
H1 : θ1 < θ < θ2 ,

where θ1 < θ2 and θ0 are three given constants.

We will see in the next section that UMP tests do not exist for the first two cases
(a) and (b), but they do exist in case (c) under some special circumstances, such as
the one-parameter exponential family and totally positive families.

4.2 An example where a UMP test does not exist


Here we shall give an example to illustrate that UMP tests do not exist for the two-sided
hypothesis of type (a) and hence for Type (b) [since this can be regarded as a more
general version of type (a)]. Later on, we’ll show that UMP tests do exist for type (c) if
the underlying model is a one-parameter exponential family model.

Example. Let X1 , . . . , Xn ∼ N (µ, σ02 ), σ02 known. Consider testing H0 : µ = µ0 versus
H1 : µ ≠ µ0 . Show that a UMP test does not exist for any size α.

Proof. Pick any µ1 < µ0 ; then the test φ1 (x) = I{x̄ < µ0 − σ0 zα /√n} has the highest
power at µ1 . So if a UMP test exists for all µ ≠ µ0 (in particular, it must be UMP at µ1 ), it must
equal φ1 (x) a.s. (since any other test would have strictly less power at µ1 , by the necessity part of
the N-P Lemma).
Construct another test φ2 (x) = I{x̄ > µ0 + σ0 zα /√n}, which is also of size α.

Plot the power functions βφi (µ) = Eµ φi (X), i = 1, 2. Note that
βφ1 (µ) and βφ2 (µ) are decreasing and increasing in µ, respectively.

(Place the plots here.)

From the plots of βφ1 (µ) and βφ2 (µ), we can see that, for any µ2 > µ0 , we have βφ2 (µ2 ) >
βφ1 (µ2 ). Thus, φ1 cannot be a UMP test of size α, since φ2 has higher power than φ1 at µ2 ;
this contradicts our earlier conclusion. Thus, no UMP test of size α exists for this problem.

Remark 4.2.1 It might appear that the test

φ3 (x) = I{x̄ ≤ C1 or x̄ ≥ C2 }

might be a UMP test here. But in order to have size α, we need to choose

C1 = µ0 − σ0 zα/2 /√n, C2 = µ0 + σ0 zα/2 /√n.

By plotting its power function against those of φ1 and φ2 , it is easy to see that φ3 is not a
UMP test.
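The three power curves can be drawn directly; a plotting sketch (numpy, scipy and matplotlib assumed; the values µ0 = 0, σ0 = 1, n = 10, α = 0.05 are illustrative):

```python
# Power functions of phi1, phi2, phi3 in the two-sided normal example (sketch).
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

mu0, sigma0, n, alpha = 0.0, 1.0, 10, 0.05
mu = np.linspace(-1.5, 1.5, 400)
shift = np.sqrt(n) * (mu0 - mu) / sigma0        # sqrt(n)(mu0 - mu)/sigma0
za, za2 = norm.ppf(1 - alpha), norm.ppf(1 - alpha / 2)

beta1 = norm.cdf(-za + shift)                               # reject if xbar small
beta2 = 1 - norm.cdf(za + shift)                            # reject if xbar large
beta3 = norm.cdf(-za2 + shift) + 1 - norm.cdf(za2 + shift)  # two-sided phi3

for b, lab in [(beta1, "phi1"), (beta2, "phi2"), (beta3, "phi3")]:
    plt.plot(mu, b, label=lab)
plt.axhline(alpha, linestyle="--"); plt.legend(); plt.show()
```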

4.3 Generalized N-P Lemma


4.3.1 Revision on N-P Lemma

N-P Lemma. Consider the problem of testing H0 : θ = θ0 against H1 : θ = θ1 .

(i). There exists a constant C ≥ 0 such that

φ(x) = 1 when f1 (x) > Cf0 (x)


= 0 when f1 (x) < Cf0 (x), (3.1)

satisfies

β(θ0 ) ≡ Eθ0 φ(X) = α. (3.2)

(ii). (Sufficiency.) If a test φ satisfies (3.1) and (3.2), then it is a UMP of size
α.
(iii). (Necessity.) If φ is a UMP of size α, then for some C it satisfies (3.1).

4.3.2 Generalized N-P Lemma (GNP Lemma)
Lemma 4.3.1 (GNP Lemma) Let f1 (x), ..., fm+1 (x) be real-valued functions. Define

C = {φ(x) : 0 ≤ φ(x) ≤ 1, ∫ φ(x)fi (x)dx = ci , i = 1, ..., m},
C′ = {φ(x) : 0 ≤ φ(x) ≤ 1, ∫ φ(x)fi (x)dx ≤ ci , i = 1, ..., m}.

(i). If there exist constants k1 , ..., km such that

φ∗ (x) = 1 when fm+1 (x) > Σi ki fi (x)
= 0 when fm+1 (x) < Σi ki fi (x) (3.3)

is a member of C, then φ∗ maximizes ∫ φ(x)fm+1 (x)dx over C, that is,

∫ φ∗ (x)fm+1 (x)dx ≥ ∫ φ(x)fm+1 (x)dx, for all φ ∈ C.

(ii). If there exist constants k1 ≥ 0, ..., km ≥ 0 such that φ∗ (x) in (i) is a
member of C, then φ∗ (x) maximizes ∫ φ(x)fm+1 (x)dx over C′, that is,

∫ φ∗ (x)fm+1 (x)dx ≥ ∫ φ(x)fm+1 (x)dx, for all φ ∈ C′.

(iii).

(a). The set

M = {(∫ φ(x)f1 (x)dx, ..., ∫ φ(x)fm (x)dx) : all φ with 0 ≤ φ(x) ≤ 1}

is convex and closed.

(b). If (c1 , ..., cm ) is an inner point of M , then there exist constants
k1 , ..., km such that φ∗ (x) defined in (i) is in C.

(c). If φ∗ ∈ C and it maximizes ∫ φ(x)fm+1 (x)dx over all φ ∈ C, then
(3.3) holds a.s.

Proof. (i). Since 0 ≤ φ(x), φ∗ (x) ≤ 1, then from (3.3), we have

[φ∗ (x) − φ(x)] [fm+1 (x) − Σi ki fi (x)] ≥ 0.

Thus from this and also φ ∈ C, φ∗ ∈ C, we have

∫ [φ∗ (x) − φ(x)] fm+1 (x)dx ≥ ∫ [φ∗ (x) − φ(x)] Σi ki fi (x)dx
= Σi ki (∫ fi (x)φ∗ (x)dx − ∫ fi (x)φ(x)dx)
= Σi ki (ci − ci )
= 0,

which leads to ∫ fm+1 (x)φ∗ (x)dx ≥ ∫ fm+1 (x)φ(x)dx.

(ii). If k1 ≥ 0, ..., km ≥ 0, φ ∈ C′ and φ∗ ∈ C, then similarly to (i), we have

∫ [φ∗ (x) − φ(x)] fm+1 (x)dx ≥ Σi ki (ci − ∫ fi (x)φ(x)dx) ≥ 0.

(iii). The proof is omitted. See Lehmann (1986).

Corollary 4.3.1 Let f1 (x), ..., fm+1 (x) be pdf's (with respect to a probability measure
ν), and let 0 < α < 1. Then there exists a test φ of the form (3.3) such that

E1 φ(X) = ... = Em φ(X) = α, and Em+1 φ(X) > α,

unless fm+1 (x) = Σi ki fi (x) with probability 1.
Proof. The proof will be done by induction over m.
(i). If m = 1, consider testing H0 : X ∼ f1 v.s. H1 : X ∼ f2 . This reduces to
N-P Lemma.
(ii). Assume now that the theorem holds for any set of m ≥ 2 pdf’s, and now
consider the case m + 1.
If f1 (x), ..., fm (x) are linearly dependent, then the number of fi (x)’s can be
reduced and the result follows from the induction hypothesis.
Assume now that f1 (x), ..., fm (x) are linearly independent. Then for each
1 ≤ j ≤ m, by the induction hypothesis, there exist tests φj such that
E1 φj (X) = ... = Ej−1 φj (X) = Ej+1 φj (X) = ... = Em φj (X) = α,
Ej φj (X) > α.
Let M and C be given as in the GNP lemma. Then from the above, we get
(E1 φ1 (X), α, α, ..., α)m ∈ M, where E1 φ1 (X) > α.
(α, E2 φ2 (X), α, ..., α)m ∈ M, where E2 φ2 (X) > α.
......
(α, ..., α, Em φm (X))m ∈ M, where Em φm (X) > α.
Also, by taking φ(X) ≡ 0 and 1, respectively, we see that (0, ..., 0) ∈ M and
(1, ..., 1) ∈ M .

Since M is convex and closed, we obtain that (α, ..., α)m is an inner point of
M . From the GNP Lemma, there exist k1 , ..., km such that φ∗ ∈ C and φ∗
maximizes ∫ φ(x)fm+1 (x)dx over all φ ∈ C. In particular, we can take φ0 (x) ≡ α.
Clearly, φ0 ∈ C. Then, we have

Em+1 φ∗ (X) = ∫ φ∗ (x)fm+1 (x)dx ≥ ∫ φ0 (x)fm+1 (x)dx = α.

We shall now show that equality cannot hold unless fm+1 (x) = Σi ki fi (x)
with probability 1. Otherwise, if Em+1 φ∗ (X) = α = Em+1 φ0 (X), then φ0 (x) ≡
α is also an optimal test. But from the a.s. form of the optimal test (i.e.,
GNP Lemma (iii)(c)) and the assumption that 0 < α < 1, we get

fm+1 (x) = Σi ki fi (x), with probability 1.

4.4 UMP tests for one-parameter exponential families

We have seen earlier that UMP tests don't usually exist for two-sided hypotheses of
types (a) and (b). For type (c), we shall show that UMP tests do exist if the underlying
distribution is from the one-parameter exponential family.

Theorem 4.4.1 Suppose that X follows a one-parameter exponential family with

fθ (x) = exp{η(θ)T (x) − ξ(θ)}h(x),

where θ is real-valued and η(θ) is a strictly increasing function of θ. Suppose we wish to


test the following two-sided hypothesis

H0 : θ ≤ θ1 or θ ≥ θ2 , where θ1 < θ2
H1 : θ1 < θ < θ2 .

(i). For any 0 < α < 1, a UMP test of size α is

φ(x) = 1 when C1 < T (x) < C2 ,


= γi when T (x) = Ci , i = 1, 2
= 0 when T (x) < C1 or T (x) > C2 , (4.4)

where C1 < C2 and γi are determined by

β(θ1 ) ≡ Eθ1 φ(X) = α, β(θ2 ) ≡ Eθ2 φ(X) = α. (4.5)

(ii). The test φ in (i) minimizes βφ′ (θ) = Eθ φ′ (X) subject to Eθ1 φ′ (X) =
Eθ2 φ′ (X) = α, for every fixed θ < θ1 or θ > θ2 .
(iii). Assume that η(θ) is also a continuous function of θ. For 0 < α < 1,
the power function of this test has a maximum at point θ0 between θ1 and θ2
and decreases strictly as θ tends away from θ0 in either direction, unless there
exist two values t1 , t2 such that Pθ (T (X) = t1 ) + Pθ (T (X) = t2 ) = 1 for all θ.
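Before the proof, a numerical illustration may help: for X1 , . . . , Xn ∼ N (θ, 1), T (x) = Σ xi has a continuous distribution, so the γi are irrelevant and (4.5) becomes two equations in (C1 , C2 ). A sketch (scipy assumed; the parameter values are illustrative):

```python
# Solve (4.5) for C1 < C2 in a N(theta, 1) example with T = sum(X_i) (sketch).
import numpy as np
from scipy.stats import norm
from scipy.optimize import fsolve

n, theta1, theta2, alpha = 10, -0.5, 0.5, 0.05
sd = np.sqrt(n)                                  # T ~ N(n*theta, n)

def beta(theta, C1, C2):                         # power = P_theta(C1 < T < C2)
    return norm.cdf(C2, n * theta, sd) - norm.cdf(C1, n * theta, sd)

C1, C2 = fsolve(lambda c: [beta(theta1, *c) - alpha,
                           beta(theta2, *c) - alpha], x0=[-1.0, 1.0])
print(C1, C2, beta(0.0, C1, C2))  # symmetric interval; power peaks in between
```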

Proof. (i). One can restrict attention to the sufficient statistic T = T (X), which has
pdf of the form

gθ (t) = exp{η(θ)t − ξ(θ)}h1 (t),

as illustrated below for the discrete case:

gθ (t) = Pθ (T (X) = t) = ΣT (x)=t fθ (x)
= ΣT (x)=t exp{η(θ)T (x) − ξ(θ)}h(x)
= ΣT (x)=t exp{η(θ)t − ξ(θ)}h(x)
= [ΣT (x)=t h(x)] exp{η(θ)t − ξ(θ)}
= exp{η(θ)t − ξ(θ)}h1 (t).

First let us find a UMP test of size α for testing

H0 : θ = θ1 or θ2
H1 : θ = θ3 where θ1 < θ3 < θ2

subject to Eθ1 ψ(T ) = Eθ2 ψ(T ) = α. This is equivalent to the problem of maximizing
Eθ3 φ(X) = Eθ3 ψ(T ) subject to Eθ1 ψ(T ) = Eθ2 ψ(T ) = α.

Lemma: Let M be the set of all points (Eθ1 ψ(T ), Eθ2 ψ(T )) as ψ ranges over
the totality of critical functions. Then the point (α, α) is an inner point of M .
Proof. Apply Corollary 4.3.1 by checking the linear independence condition.

From the GNP lemma, there exist constants k1 , k2 such that

ψ(t) = φ(x) = 1 if gθ3(t) > k1 gθ1(t) + k2 gθ2(t),
             = 0 if gθ3(t) < k1 gθ1(t) + k2 gθ2(t),

and
Eθ1 φ(X) = Eθ2 φ(X) = α.
Substituting gθi(t) = exp{η(θi)t − ξ(θi)}h1(t) for i = 1, 2, 3 into φ(x), and noting that
η(θ) is strictly increasing, we can write

    ψ(t) = 1 if exp{η(θ3)t − ξ(θ3)}h1(t) > Σ_{i=1}^{2} ki exp{η(θi)t − ξ(θi)}h1(t),
         = 0 if exp{η(θ3)t − ξ(θ3)}h1(t) < Σ_{i=1}^{2} ki exp{η(θi)t − ξ(θi)}h1(t),

which, after dividing both sides by exp{η(θ3)t − ξ(θ3)}h1(t), becomes

    ψ(t) = 1 if a1 e^{−b1 t} + a2 e^{b2 t} < 1,
         = 0 if a1 e^{−b1 t} + a2 e^{b2 t} > 1, where b1, b2 > 0.

We have the following situations:

(a). Here a1 and a2 cannot both be ≤ 0, since then φ(x) ≡ 1 (i.e., the test
always rejects H0), so that α = 1, which contradicts the assumption
0 < α < 1.
(b). If a1 ≤ 0 and a2 > 0, then a1 e^{−b1 t} + a2 e^{b2 t} is a strictly increasing
function of t. So

    φ(x) = 1, a1 e^{−b1 t} + a2 e^{b2 t} < 1,
         = 0, a1 e^{−b1 t} + a2 e^{b2 t} > 1,

that is,

    φ(x) = 1, t < t0,
         = 0, t > t0.

This becomes a one-sided test and from the one-sided UMP theorem, we know
that the power function is strictly increasing. But this cannot be true in light
of the condition (4.5).
(c). If a2 ≤ 0 and a1 > 0, we can similarly show that this case cannot occur.
(d). From the above discussion, it follows that a1 > 0 and a2 > 0. In such a
case, a1 e−b1 t + a2 eb2 t is a convex function, approaching ∞ as t → ±∞. Thus
we have

φ(x) = 1 when C1 < T(x) < C2,
     = γi when T(x) = Ci, i = 1, 2,
     = 0 when T(x) < C1 or T(x) > C2,

where C1 < C2 and γi are determined by

Eθ1 φ(X) = Eθ2 φ(X) = α.

Since φ(x) does not depend on θ3, it is a UMP test for testing

H0 : θ = θ1 or θ2
H1 : θ1 < θ < θ2

subject to Eθ1 ψ(T ) = Eθ2 ψ(T ) = α.

To complete the proof, we need to show that the test φ is of size α for the original
problem. Take φ0 ≡ α, so Eθ φ0 (X) ≡ α. From part (ii) of the theorem (whose proof will
be given after this), we get

Eθ ψ(T ) ≤ Eθ φ0 (X) = α, for every fixed θ < θ1 or θ > θ2 .

Also noting that Eθ1 ψ(T ) = Eθ2 ψ(T ) = α, we get supθ∈Θ0 Eθ ψ(T ) = α. This proves the
theorem.

(ii). Let θ4 < θ1. We wish to find ψ′ = 1 − π′ which minimizes Eθ4 ψ′(T) subject
to Eθ1 ψ′(T) = Eθ2 ψ′(T) = α, which is equivalent to maximizing Eθ4 π′(T) subject to
Eθ1 π′(T) = Eθ2 π′(T) = 1 − α.
We will apply the GNP Lemma. Since (α, α) is an inner point of M defined earlier,
then from the GNP lemma, there exist constants k1 , k2 such that

π′(t) = 1 if gθ4(t) > k1 gθ1(t) + k2 gθ2(t),
      = 0 if gθ4(t) < k1 gθ1(t) + k2 gθ2(t),

such that Eθ1 π′(T) = Eθ2 π′(T) = 1 − α. Substituting gθi(t) = exp{η(θi)t − ξ(θi)}h1(t)
for i = 1, 2, 4 into π′(t), and noting that η(θ) is strictly increasing, we can write

    π′(t) = 1 if exp{η(θ4)t − ξ(θ4)} > Σ_{i=1}^{2} ki exp{η(θi)t − ξ(θi)},
          = 0 if exp{η(θ4)t − ξ(θ4)} < Σ_{i=1}^{2} ki exp{η(θi)t − ξ(θi)},
          = 1 if a1 e^{b1 t} + a2 e^{b2 t} > k1,
          = 0 if a1 e^{b1 t} + a2 e^{b2 t} < k1,

(by dividing both sides by exp{η(θ1)t − ξ(θ1)}), where a1 > 0 and b1 < 0 < b2.

Similarly to (i), we can show that k1 > 0 and a2 > 0, in which case a1 e^{b1 t} + a2 e^{b2 t} is a
convex function approaching ∞ as t → ±∞. Thus we have

π′(t) = 0 when C1 < T(x) < C2,
      = γi when T(x) = Ci, i = 1, 2,
      = 1 when T(x) < C1 or T(x) > C2,

where C1 < C2 and γi are determined by Eθ1 π′(T) = Eθ2 π′(T) = 1 − α. Thus we have

ψ′(t) = 1 when C1 < T(x) < C2,
      = γi when T(x) = Ci, i = 1, 2,
      = 0 when T(x) < C1 or T(x) > C2,

where C1 < C2 and γi are determined by Eθ1 ψ′(T) = Eθ2 ψ′(T) = α. This takes the same
form as the UMP test obtained in (i). Since tests of this form are unique by Theorem 4.4.2
below, this proves the theorem for the case θ4 < θ1.
The case for θ4 > θ2 can be shown similarly. The proof is complete.

(iii). From the assumption, the density gθ(t) is a continuous function of θ since η(θ)
is continuous. Therefore, β(θ) ≡ Eθ ψ(T) = ∫ ψ(t)gθ(t)dt is continuous in θ. Without loss
of generality, let η(θ) = θ.
If the claim in (iii) is not true, then there exist three points θ′ < θ′′ < θ′′′ such that
β(θ′′) ≤ β(θ′) = β(θ′′′) = c, say. Then 0 < c < 1, since β(θ′) = Eθ′ ψ(T) = 0 (or 1)
would imply ψ(t) = 0 (or 1) a.s. w.r.t. θ′ (and thus w.r.t. any θ, because the support
of the exponential family is the same for all θ), which is excluded by the
assumption Eθ1 ψ(T) = Eθ2 ψ(T) = α with 0 < α < 1. As is seen from the proof of (i),
the test ψ maximizes Eθ′′ ψ(T) subject to Eθ′ ψ(T) = Eθ′′′ ψ(T) = c for all θ′′ with
θ′ < θ′′ < θ′′′, and is also unique a.s. by Theorem 4.4.2. However, unless T takes on
at most two values with probability 1 for all θ, the densities gθ′, gθ′′ and gθ′′′ are linearly independent,
which implies β(θ′′) > c by Corollary 4.3.1, contradicting the earlier inequality. The proof
is thus complete.
4.4.1 Uniqueness of UMP tests
The constants Ci's and γi's given in the test of the last theorem are in fact unique. To
prove this, we need to introduce a lemma.

Lemma 4.4.1 Suppose that X has a p.d.f. in {fθ(x) : θ ∈ Θ ⊂ R}. Let h be a function
with a single change of sign, i.e., there exists a value x0 such that

h(x) ≤ 0 for x < x0,
     ≥ 0 for x ≥ x0. (4.6)

(i). If the family has MLR in x, there exists θ0 such that

Eθ[h(X)] ≤ 0 for θ < θ0,
         ≥ 0 for θ > θ0,

unless Eθ h(X) is either positive for all θ or negative for all θ.

(ii). If the family has strict MLR in x, has common support Cs = {x :
Pθ(x − ε ≤ X ≤ x + ε) > 0, ∀ε > 0}, fθ(x) > 0 for all θ and all x ∈ Cs,
Pθ(h(X) ≠ 0) > 0, and Eθ0[h(X)] = 0, then

Eθ[h(X)] < 0 for θ < θ0,
         > 0 for θ > θ0.

Proof. (i). Let θ1 < θ2 . Assume that Eθ1 [h(X)] > 0. We shall show that
Eθ2 [h(X)] ≥ 0. That is, once the function h(θ) = Eθ [h(X)] becomes positive at some
point θ1 , say, then after that point, it can never become negative (but can be equal to
zero).

To prove this, let us first show that c := fθ2(x0)/fθ1(x0) ∈ [0, ∞).

Proof. It is clear that c ≥ 0, and it suffices to show that c ≠ ∞. Otherwise,
if c = ∞, then fθ2(x)/fθ1(x) = ∞ for all x ≥ x0 since fθ2(x)/fθ1(x)
is a nondecreasing function of x. Thus, fθ1(x) = 0 for x ≥ x0 and hence
Eθ1[h(X)] = Eθ1[h(X)I{X < x0}] ≤ 0 (as h(x) ≤ 0 for x < x0), which
contradicts the assumption.

Next, we show that Eθ2 [h(X)] ≥ 0.

Proof. Now that 0 ≤ fθ2 (x0 )/fθ1 (x0 ) = c < ∞, it follows that

fθ2(x)/fθ1(x) ≤ c, for x < x0,
             ≥ c, for x ≥ x0. (4.7)

Define S = {x : fθ1 (x) = 0, and fθ2 (x) > 0}. Then S c = {x : fθ1 (x) >
0, or fθ2 (x) = 0}. Clearly, if x ∈ S, then fθ2 (x)/fθ1 (x) = ∞ > c, which
implies that x ≥ x0 , that is, we have S ⊂ [x0 , ∞). Hence

h(x) ≥ 0, for x ∈ S. (4.8)

Thus,

    Eθ2[h(X)] = (∫_S + ∫_{S^c}) h(x)fθ2(x)dx ≥ ∫_{S^c} h(x)fθ2(x)dx   (from (4.8))
             = ∫_{S^c} h(x) [fθ2(x)/fθ1(x)] fθ1(x)dx
             = (∫_{{x<x0}∩S^c} + ∫_{{x≥x0}∩S^c}) h(x) [fθ2(x)/fθ1(x)] fθ1(x)dx
             ≥ ∫_{{x<x0}∩S^c} c h(x)fθ1(x)dx + ∫_{{x≥x0}∩S^c} c h(x)fθ1(x)dx   (4.9)
               (from (4.7) and (4.6))
             = ∫_{S^c} c h(x)fθ1(x)dx + ∫_S c h(x)fθ1(x)dx   (as the second term is 0)
             = ∫_R c h(x)fθ1(x)dx
             = c Eθ1[h(X)]
             ≥ 0   (as 0 ≤ c < ∞ and Eθ1[h(X)] > 0 by assumption).

Finally, the result follows by letting θ0 = inf{θ : Eθ h(X) > 0}. Note that θ0 may be
infinite, if Eθ h(X) is either positive for all θ or negative for all θ; otherwise it is finite.

(ii). Let θ1 < θ2 . Assume that Eθ1 [h(X)] ≥ 0. We shall show that Eθ2 [h(X)] ≥ 0.
That is, once the function h(θ) = Eθ [h(X)] becomes zero or positive at some point θ1 ,
say, then after that point, it will remain positive.
First, under the assumed conditions, fθ2(x)/fθ1(x) ∈ (0, ∞) for all x ∈ Cs (at the
boundary of Cs we could have 0/0, which is defined to be 1); set c := fθ2(x0)/fθ1(x0).

Next, we show that Eθ2[h(X)] > 0.

Proof. Now that 0 < fθ2(x0)/fθ1(x0) = c < ∞, it follows from strict MLR that

    fθ2(x)/fθ1(x) < c, for x < x0,
                 > c, for x > x0. (4.10)

Thus,

    Eθ2[h(X)] = ∫_{Cs} h(x)fθ2(x)dx = ∫_{Cs} h(x) [fθ2(x)/fθ1(x)] fθ1(x)dx
             = (∫_{{x<x0}∩Cs} + ∫_{{x>x0}∩Cs}) h(x) [fθ2(x)/fθ1(x)] fθ1(x)dx
             > ∫_{{x<x0}∩Cs} c h(x)fθ1(x)dx + ∫_{{x≥x0}∩Cs} c h(x)fθ1(x)dx   (4.11)
               [from (4.10), (4.6) and Pθ(h(X) ≠ 0) > 0]
             = ∫_{Cs} c h(x)fθ1(x)dx
             = c Eθ1[h(X)].

So we have shown that Eθ2[h(X)] > c Eθ1[h(X)]. In particular, by choosing
θ ≡ θ2 > θ1 ≡ θ0, we get Eθ[h(X)] > c Eθ0[h(X)] = 0 for θ > θ0. Similarly, we
can show that Eθ[h(X)] < 0 for θ < θ0.
Remark 4.4.1 The lemma means that if h(x) has a single change of sign in x, then
Eθ[h(X)] also has a (strict) single change of sign in θ, provided the family has (strict) MLR.

The next theorem shows that Ci ’s and γi ’s in the UMP tests for two-sided hypothesis
for the exponential family (in the previous theorem) are uniquely determined.

Theorem 4.4.2 Assume that the conditions in Theorem 4.4.1 hold.

(i). Let φ1 and φ2 be two tests of the form (4.4) satisfying Eθ1 φ1(X) =
Eθ1 φ2(X) = α. If the region {φ2(x) ≠ 0} is to the right of {φ1(x) ≠ 0}
a.s. (see the following remark for more explanation), then

    Eθ φ1(X) < Eθ φ2(X) for θ > θ1,
            > Eθ φ2(X) for θ < θ1.

(ii). If φ1 and φ2 satisfy not only Eθ1 φ1(X) = Eθ1 φ2(X) = α, but also
Eθ2 φ1(X) = Eθ2 φ2(X) = α, then φ1(x) = φ2(x) a.s.

Proof. (i). Define h = φ2 − φ1; then Eθ1 h(X) = 0 by the assumption. Since h
has a single change of sign, the result follows from part (ii) of the last lemma.

(ii). From (i), it follows that {φ1 ≠ 0} can be neither to the left nor to the right of
{φ2 ≠ 0}. So one of the two regions must lie inside the other with probability 1. Suppose
{φ1 ≠ 0} lies inside {φ2 ≠ 0}. Then φ1 ≤ φ2 a.s. From the assumption, we have

    Eθi |φ2(X) − φ1(X)| = Eθi [φ2(X) − φ1(X)] = 0, i = 1, 2.

Then |φ2(X) − φ1(X)| = 0 a.s. w.r.t. fθi (i = 1, 2). So φ1(X) = φ2(X) a.s. w.r.t. fθ.

Remark 4.4.2 Suppose that φ1 and φ2 are two tests of the form (4.4).

(a) We say the region {φ2(x) ≠ 0} is to the right of {φ1(x) ≠ 0} if either

(i) C1^{φ2} ≥ C1^{φ1} and C2^{φ2} ≥ C2^{φ1}, with at least one of the inequalities
strict, or
(ii) C1^{φ2} = C1^{φ1} and C2^{φ2} = C2^{φ1}, but γ1^{φ2} ≥ γ1^{φ1} and γ2^{φ2} ≥ γ2^{φ1},
with at least one of the inequalities strict.

In (i) and (ii), if both inequalities hold strictly, we say that {φ2(x) ≠ 0} is
strictly to the right of {φ1(x) ≠ 0}.
(b) It can be shown easily that if {φ2(x) ≠ 0} is strictly to the right of
{φ1(x) ≠ 0}, then φ2(x) − φ1(x) has a strict change of sign (from negative to
positive).
However, if {φ2(x) ≠ 0} is to the right (but not strictly) of {φ1(x) ≠ 0},
then φ2(x) − φ1(x) is always non-negative or non-positive. In such cases, it
can be shown that Eθ[h(X)] = 0 has no solution.
(c) We can define the region {φ2(x) ≠ 0} to be (strictly) inside {φ1(x) ≠ 0}
and derive results similar to those in (a) and (b).
Remark 4.4.3 This theorem shows that the Ci's and γi's are uniquely determined. It also
indicates how to determine them in practice. One can start with some initial trial values
C1*, γ1*, find C2*, γ2* such that β*(θ1) = α, and compute β*(θ2), which will usually be either
too large or too small. If β*(θ2) < α, the correct rejection region {φ ≠ 0} is to the right of the
chosen one, that is, it satisfies either C1 > C1* or C1 = C1* and γ1 < γ1*. The converse is
true if β*(θ2) > α.
4.5 UMP tests for totally positive families.
The theorems given in the last section apply to the one-parameter families. However, they
hold for some other cases too. One such example is the case for totally positive families.
(See Lehmann, 1986, p119.)
Definition. A family of distributions with p.d.f.'s fθ(x), where θ and x are real-valued,
is said to be totally positive of order r (TPr) if for all x1 < ... < xn and θ1 < ... < θn,

    ∆n = det[ fθi(xj) ]_{i,j=1,...,n}
       = | fθ1(x1)  ...  fθ1(xn) |
         | fθ2(x1)  ...  fθ2(xn) |
         |   ...    ...    ...   |
         | fθn(x1)  ...  fθn(xn) |  ≥ 0,  for all n = 1, 2, ..., r.

If ∆n > 0, then the family is said to be strictly totally positive of order r (STPr).

Remark 4.5.1 We can prove the following facts:

(i). For r = 1, ∆n ≥ 0 (or > 0) means fθ(x) ≥ 0 (or > 0).

(ii). For r = 2, ∆n ≥ 0 (or > 0) means fθ(x) has (strict) MLR in x.
Proof. ∆2 = fθ1(x1)fθ2(x2) − fθ2(x1)fθ1(x2) ≥ 0 implies

    fθ2(x2)/fθ1(x2) ≥ fθ2(x1)/fθ1(x1).

(iii). Suppose a(θ) > 0, b(x) > 0. Then if fθ(x) is STPr, so is a(θ)b(x)fθ(x).

(iv). The one-parameter exponential family with T(x) = x and η(θ) = θ is STP∞.
Proof. Left as an exercise.

Lemma 4.5.1 Suppose that fθ(x) is STP3. Define

    g(x) = k1 fθ1(x) + k2 fθ2(x) + k3 fθ3(x),  k1 > 0.

If g(x1) = g(x2) = 0 for some x1 < x2, then g is positive outside the interval (x1, x2) and
negative inside.
Proof. For any x1 < x3 < x2 and any θ1 < θ2 < θ3, we have

    ∆′3 = det[ g(x1), g(x3), g(x2); fθ2(x1), fθ2(x3), fθ2(x2); fθ3(x1), fθ3(x3), fθ3(x2) ]
        = k1 det[ fθ1(x1), fθ1(x3), fθ1(x2); fθ2(x1), fθ2(x3), fθ2(x2); fθ3(x1), fθ3(x3), fθ3(x2) ] > 0.

If g(x1) = g(x2) = 0, then expanding ∆′3 along its first row gives

    0 < ∆′3 = −g(x3)[fθ2(x1)fθ3(x2) − fθ2(x2)fθ3(x1)]
            = −g(x3)fθ3(x1)fθ3(x2)[fθ2(x1)/fθ3(x1) − fθ2(x2)/fθ3(x2)],

which implies that g(x3) < 0 for all x1 < x3 < x2 (note that fθ2(x)/fθ3(x) is strictly decreasing
and fθi(x) > 0, by Remark 4.5.1 (i) and (ii)). Similarly, we can show that g(x3) > 0 for
all x3 < x1 or x3 > x2.
Remark 4.5.2 It follows from the lemma that the equation g(x) = 0 has at most two
solutions.

Theorem 4.5.1 Suppose that the distribution of X is in a family of p.d.f.'s indexed by
a real-valued parameter θ, and T(X) is a real-valued sufficient statistic with p.d.f. fθ(u).
Assume that fθ(u) is STP3. We wish to test

H0 : θ ≤ θ1 or θ ≥ θ2 , where θ1 < θ2
H1 : θ 1 < θ < θ2 .

(i). For any 0 < α < 1, a UMP test of size α is

φ(x) = 1 when C1 < T(x) < C2,
     = γi when T(x) = Ci, i = 1, 2,
     = 0 when T(x) < C1 or T(x) > C2,

where C1 < C2 and γi are determined by

β(θ1 ) ≡ Eθ1 φ(X) = α, β(θ2 ) ≡ Eθ2 φ(X) = α.

(ii). The test φ minimizes βφ′(θ) = Eθ φ′(X) subject to Eθ1 φ′(X) = Eθ2 φ′(X) =
α, for every fixed θ < θ1 or θ > θ2.
(iii). For 0 < α < 1, the power function of this test has a maximum at
point θ0 between θ1 and θ2 and decreases strictly as θ tends away from θ0
in either direction, unless there exist two values t1 , t2 such that Pθ (T (X) =
t1 ) + Pθ (T (X) = t2 ) = 1 for all θ.

Proof. The proof of the theorem is very similar to that of Theorem 4.4.1. For illustration,
here we only give a proof of part (i) below.
One can restrict attention to the sufficient statistic T = T (X). First let us find a
UMP test of size α for testing

H0 : θ = θ1 or θ2
H1 : θ = θ3 where θ1 < θ3 < θ2

subject to Eθ1 ψ(T ) = Eθ2 ψ(T ) = α. This is equivalent to the problem of maximizing
Eθ3 φ(X) = Eθ3 ψ(T ) subject to Eθ1 ψ(T ) = Eθ2 ψ(T ) = α. Let M be the set of all points
(Eθ1 ψ(T ), Eθ2 ψ(T )) as ψ ranges over the totality of critical functions. Then the point
(α, α) is an inner point of M . From the GNP lemma, there exist constants a1 , a2 such
that

ψ(t) = φ(x) = 1 if fθ3(t) > a1 fθ1(t) + a2 fθ2(t),
             = 0 if fθ3(t) < a1 fθ1(t) + a2 fθ2(t),

and
Eθ1 φ(X) = Eθ2 φ(X) = α.
Then we have

    ψ(t) = 1 if fθ3(t) > a1 fθ1(t) + a2 fθ2(t),
         = 0 if fθ3(t) < a1 fθ1(t) + a2 fθ2(t),

which can be rewritten as

    ψ(t) = 1 if g(t) < 0,  = 0 if g(t) > 0,

or equivalently

    ψ(t) = 1 if g1(t) < 0,  = 0 if g1(t) > 0,

where

    g(t) = a1 fθ1(t) + a2 fθ2(t) − fθ3(t) = fθ3(t)[a1 fθ1(t)/fθ3(t) + a2 fθ2(t)/fθ3(t) − 1],
    g1(t) = g(t)/fθ3(t) = a1 fθ1(t)/fθ3(t) + a2 fθ2(t)/fθ3(t) − 1   (note that fθ3(t) > 0).
We have the following situations:
(a). Assume a1 ≤ 0.

(a1). If a2 ≤ 0, then g(t) < 0, i.e., g1(t) < 0, hence φ(x) ≡ 1 (the
test always rejects H0), so that α = 1, which contradicts
the assumption 0 < α < 1.
(a2). If a2 > 0, then g1(t) is a strictly increasing function of t. So φ(x)
becomes a one-sided test, whose power function can be shown to be
strictly increasing. But this is incompatible with the two size
constraints β(θ1) = β(θ2) = α.

(b). From (a), we know that a1 > 0. Then applying Lemma 4.5.1 to g(t) (with
k1 = a1 > 0), we have

    φ(x) = 1 when C1 < T(x) < C2,
         = γi when T(x) = Ci, i = 1, 2,
         = 0 when T(x) < C1 or T(x) > C2,

where C1 < C2 and γi are determined by

Eθ1 φ(X) = Eθ2 φ(X) = α.

Since φ(x) does not depend on θ3, it is a UMP test for testing

H0 : θ = θ1 or θ2
H1 : θ1 < θ < θ2

subject to Eθ1 ψ(T ) = Eθ2 ψ(T ) = α.
To complete the proof, we need to show that the test φ is of size α for the original problem.
Take φ0 ≡ α, so Eθ φ0 (X) ≡ α. From part (ii) of the theorem, we get

Eθ ψ(T ) ≤ Eθ φ0 (X) = α, for all θ < θ1 or θ > θ2 .

Also noting that Eθ1 ψ(T ) = Eθ2 ψ(T ) = α, we get supθ∈Θ0 Eθ ψ(T ) = α. This proves the
theorem.
4.6 Some examples
4.6.1 Normal example.
Let X1 , . . . , Xn ∼ N (µ, 1). Then a UMP test of size α for testing H0 : µ ≤ µ1 or µ ≥ µ2
v.s. H1 : µ1 < µ < µ2 is
φ(x) = I{C1 < x̄ < C2},

where C1 and C2 are determined from

    Eµi φ(X) = Pµi(C1 < X̄ < C2) = Φ(√n(C2 − µi)) − Φ(√n(C1 − µi)) = α,  i = 1, 2.
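The two size conditions above generally have to be solved numerically. The following is a small numerical sketch (not part of the original notes; the values n = 25, µ1 = 0, µ2 = 1 and α = 0.05 are illustrative assumptions):

    # Numerical sketch: solving for C1, C2 in the N(mu,1) example.
    # Illustrative values (assumptions, not from the notes): n, mu1, mu2, alpha.
    import numpy as np
    from scipy.optimize import fsolve
    from scipy.stats import norm

    n, alpha = 25, 0.05
    mu1, mu2 = 0.0, 1.0

    def beta(mu, C1, C2):
        # power beta(mu) = P_mu(C1 < Xbar < C2), with Xbar ~ N(mu, 1/n)
        return norm.cdf(np.sqrt(n) * (C2 - mu)) - norm.cdf(np.sqrt(n) * (C1 - mu))

    def equations(C):
        C1, C2 = C
        return [beta(mu1, C1, C2) - alpha, beta(mu2, C1, C2) - alpha]

    C1, C2 = fsolve(equations, x0=[0.4, 0.6])
    print(C1, C2)   # reject H0 when C1 < Xbar < C2

By symmetry of this particular example, the solution satisfies C1 + C2 = µ1 + µ2; the starting values were chosen accordingly.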

4.6.2 A non-regular example* (Optional)

We now give an example of a UMP test where the family is neither an exponential family
nor an STPr family, so we can only resort to the GNP Lemma.

Example. Let f and g be two known p.d.f.’s. Suppose that X has the p.d.f. θf (x) +
(1 − θ)g(x), 0 ≤ θ ≤ 1. Show that the test φ(x) ≡ α is a UMP test of size α for testing
H0 : θ ≤ θ1 or θ ≥ θ2 v.s. H1 : θ1 < θ < θ2 .

Solution. First let us consider finding a UMP test φ for testing

    H0 : θ = θ1 or θ2
    H1 : θ = θ3, where θ1 < θ3 < θ2

subject to
    Eθ1 φ(X) = Eθ2 φ(X) = α. (6.12)

That is, we need to maximize Eθ3 φ(X) subject to (6.12), where

    Eθ1 φ(X) = θ1 ∫ φ(x)f(x)dx + (1 − θ1) ∫ φ(x)g(x)dx = α,
    Eθ2 φ(X) = θ2 ∫ φ(x)f(x)dx + (1 − θ2) ∫ φ(x)g(x)dx = α.

Here we are looking for a UMP test of size α, that is,

    sup_{θ∈Θ0} Eθ φ(X) = max{Eθ1 φ(X), Eθ2 φ(X)} = α.

So either Eθ1 φ(X) = α or Eθ2 φ(X) = α. For simplicity, assume that Eθ1 φ(X) = α;
then we must also have Eθ2 φ(X) = α. Otherwise, we would have

    Eθ1 φ(X) = α,  Eθ2 φ(X) < α.

Since the power function is a continuous function of θ, there would then exist a
θ ∈ (θ1, θ2) with Eθ φ(X) < α, which implies that φ cannot be UMP (its power falls
below that of the trivial test φ0 ≡ α). Thus we have (6.12).
Treating ∫ φ(x)f(x)dx and ∫ φ(x)g(x)dx as two unknowns, and noting that
the determinant of the system is

    | θ1  1 − θ1 |   | θ1  1 |
    | θ2  1 − θ2 | = | θ2  1 | = θ1 − θ2 ≠ 0,

we obtain

    ∫ φ(x)f(x)dx = ∫ φ(x)g(x)dx = α.

From this, it follows that

    Eθ3 φ(X) = θ3 ∫ φ(x)f(x)dx + (1 − θ3) ∫ φ(x)g(x)dx = α.

Note also that Eθ3 α = α. By the uniqueness of the UMP test from the GNP lemma, we get

    φ(X) = α, a.s.

Since θ3 is arbitrary, we see that φ(X) ≡ α is a UMP test of size α for testing

H0 : θ ≤ θ1 or θ ≥ θ2
H1 : θ1 < θ < θ2 .

4.7 Summary
UMP tests do not usually exist for two-sided hypotheses. But for one-parameter exponential
families and strictly totally positive families, UMP tests do exist for one type of two-sided
hypothesis. The Generalized Neyman-Pearson Lemma plays a critical role in deriving the UMP
tests in these cases.

4.8 Exercise
1. Let X1, . . . , Xn be a sample from the exponential family

fθ (x) = exp{η(θ)T (x) − ξ(θ)}h(x),

where η(θ) is strictly monotone in θ. Show that UMP tests do not exist for testing
H0 : θ = θ0 v.s. H1 : θ ≠ θ0.

2. Show that the following one-parameter exponential family

gθ (t) = a(θ)b(t)eη(θ)t

is STP∞ , where η(θ) is strictly increasing.
Chapter 5

Unbiased Tests.

5.1 Definitions.
In this chapter, we are interested in finding “optimal” tests for two-sided hypotheses of
the following types:

(a) H0 : θ = θ0
H1 : θ ≠ θ0,

(b) H0 : θ1 ≤ θ ≤ θ2
H1 : θ < θ1 or θ > θ2

where θ1 < θ2 and θ0 are three given constants.

We’ve seen in the last chapter that UMP tests may NOT exist in these two cases. It is
therefore necessary to narrow down our search in a smaller class of tests. One such class
is that of unbiased tests.

Definition. A test φ for testing H0 : θ ∈ Θ0 v.s. H1 : θ ∈ Θ1 is said to be unbiased if
its power function βφ(θ) = Eθ φ(X) satisfies

    (i). βφ(θ) ≤ α, if θ ∈ Θ0,
    (ii). βφ(θ) ≥ α, if θ ∈ Θ1. (1.1)

Remark 5.1.1 .

(i). Note that first condition is simply supθ∈Θ0 βφ (θ) ≤ α, i.e., φ is of level α.
(ii). If UMP tests exist, they must be unbiased.
(iii). The second requirement simply means that the test φ is no worse than
the silly test φ0 (x) ≡ α.
(iv). Another interpretation of (ii) is that the probability of rejecting H0 when
H1 is true should be at least as large as that of rejecting H0 when H0 is true,
i.e.,
P (Reject H0 |H1 is true) ≥ P (Reject H0 |H0 is true).
Definition. An unbiased test φ of level α for testing H0 : θ ∈ Θ0 v.s. H1 : θ ∈ Θ1 is said
to be uniformly most powerful unbiased (UMPU) if for any other unbiased test φ0 of level
α, we have

βφ (θ) ≥ βφ0 (θ), for θ ∈ Θ1 .

In other words, φ is UMP among all unbiased level α tests.

Lemma 5.1.1 If βφ (θ) is a continuous function of θ, unbiasedness implies that

βφ (θ) = α for all θ ∈ ω, (1.2)

where ω =: ∂Θ0 ∩ ∂Θ1 is the common boundary of Θ0 and Θ1 (i.e., the set of points θ
that are points or limit points of both Θ0 and Θ1 ).

Definition. Tests satisfying (1.2) are said to be similar on the boundary (SOB).

Proof of Lemma 5.1.1. For any point θ ∈ ω, there exist sequences θ01, θ02, θ03, ... ∈ Θ0
and θ11, θ12, θ13, ... ∈ Θ1 such that θ0n → θ and θ1n → θ as n → ∞. Since βφ(θ) is
continuous and the test φ is unbiased, we have

    α ≥ lim_{n→∞} βφ(θ0n) = βφ(θ) = lim_{n→∞} βφ(θ1n) ≥ α.

Therefore, βφ(θ) = α for any θ ∈ ω.

It is easier to work with (1.2) than (1.1), and the next lemma is useful in determining
a UMPU test.

Lemma 5.1.2 (Similarity Lemma) If X has p.d.f. {fθ (x) : θ ∈ Θ} such that the power
function of every test is continuous, and if φ0 is UMP SOB (i.e., among all tests satisfying
(1.2)) and is a level α test, then φ0 is UMPU.

Proof. All power functions are continuous by assumption. From Lemma 5.1.1, any
unbiased test must be similar on the boundary. Since φ0 is UMP of level α among
all tests satisfying (1.2) by assumption, it is also uniformly as least as powerful as any
unbiased test. Finally, we note that φ0 is also unbiased since it is at least as powerful as
the trivial test φ(x) ≡ α (since the latter is unbiased and also similar on the boundary).
So φ0 is UMPU.

5.2 UMPU for One-parameter exponential family
5.2.1 Case I
Theorem 5.2.1 Let θ be a real-valued parameter, and X = {X1 , . . . , Xn } is a random
sample from a one-parameter exponential family

fθ (x) = exp{θT (x) − ξ(θ)}h(x)

Suppose we wish to test the following two-sided hypothesis

H0 : θ1 ≤ θ ≤ θ2 ,
H1 : θ < θ1 or > θ2 , (where θ1 < θ2 ).

(i). For any 0 < α < 1, a UMPU test of size α is

φ(x) ≡ ψ(T(x)) = 1 when T(x) < C1 or T(x) > C2,
                = γi when T(x) = Ci, i = 1, 2,
                = 0 when C1 < T(x) < C2,

where C1 < C2 and γi are determined by

β(θ1 ) ≡ Eθ1 φ(X) = α, β(θ2 ) ≡ Eθ2 φ(X) = α. (2.3)

(ii). The power function of this test, Eθ φ(X), has a minimum at point θ0
between θ1 and θ2 and increases strictly as θ tends away from θ0 in either
direction.

Proof. (i). It is easy to show that the power function Eθ φ(X) is continuous in θ, so one
can apply Lemma 5.1.2 (i.e., the Similarity Lemma). Clearly the boundary set is ω = {θ1, θ2},
and by the lemma, we consider first the problem of maximizing Eθ′ φ(X) for θ′ outside
the interval [θ1, θ2], subject to (2.3). This is equivalent to minimizing Eθ′ [1 − φ(X)] for θ′
outside the interval [θ1, θ2], subject to

Eθ1 [1 − φ(X)] = Eθ2 [1 − φ(X)] = 1 − α. (2.4)

However, note that we can write

1 − φ(x) = 1 when C1 < T (x) < C2 ,


= 1 − γi when T (x) = Ci , i = 1, 2
= 0 when T (x) < C1 or T (x) > C2 . (2.5)

From Theorem 4.4.1 (ii), we see that 1 − φ(x) does minimize Eθ′ [1 − φ(X)] for θ′ ∉ [θ1, θ2],
subject to (2.4). Therefore, the test φ(x) is UMP among those satisfying (2.3), and
hence is UMPU by the above lemma.
(ii). Continued from part (i), also from Theorem 4.4.1, Eθ [1 − φ(X)] has a maximum
at point θ0 between θ1 and θ2 and decreases strictly as θ tends away from θ0 in either
direction. That is, Eθ φ(X) has a minimum at point θ0 between θ1 and θ2 and increases
strictly as θ tends away from θ0 in either direction.
5.2.2 Case II
Theorem 5.2.2 Let θ be a real-valued parameter, and X1 , . . . , Xn is a random sample
from a one-parameter exponential family

fθ (x) = exp{θT (x) − ξ(θ)}h(x) = D(θ)eθT (x) h(x).

Suppose we wish to test the following two-sided hypothesis

H0 : θ = θ0 ,
H1 : θ ≠ θ0.

(i). For any 0 < α < 1, a UMPU test of size α is

φ(x) ≡ ψ(T(x)) = 1 when T(x) < C1 or T(x) > C2,
                = γi when T(x) = Ci, i = 1, 2,
                = 0 when C1 < T(x) < C2,

where C1 < C2 and γi are determined by

Eθ0 ψ(T) = α, (2.6)
Eθ0 [T ψ(T)] = α Eθ0 [T]. (2.7)

(Note that (2.7) is equivalent to Covθ0 (T, ψ(T )) = 0.)

(ii). The power function of this test, Eθ φ(X), has a minimum at θ = θ0 and
increases strictly as θ tends away from θ0 in either direction.

Proof. (i). One can restrict attention to the sufficient statistic T = T (X), which has pdf
of the form
gθ (t) = exp{θt − ξ(θ)}h1 (t) = D(θ)eθt h1 (t).

We need to introduce a couple of lemmas.


Lemma 5.2.1 If ψ(T ) is unbiased, then (2.6) and (2.7) hold.

Proof. Here, Θ0 = {θ0}, Θ1 = {θ : θ ≠ θ0}, and the common boundary is ω = {θ0}.
First, (2.6) follows from Lemma 5.1.1.
Next, let us show (2.7). Denote the power function

    βψ(θ) = Eθ[ψ(T)] = ∫ ψ(t)D(θ)e^{θt}h1(t) dt.

If θ ≠ θ0, then θ ∈ Θ1 and θ0 ∈ Θ0. By the definition of an unbiased test,

    βψ(θ) ≥ α ≥ βψ(θ0). (2.8)

That is, βψ(θ) must have a minimum at θ = θ0. For the exponential
family, it can be shown that βψ(θ) is differentiable, thus

    βψ′(θ0) = 0.
However,

    βψ′(θ) = ∫ ψ(t)D(θ)t e^{θt}h1(t) dt + ∫ ψ(t)D′(θ)e^{θt}h1(t) dt (2.9)
           = Eθ[T ψ(T)] + [D′(θ)/D(θ)] Eθ[ψ(T)]. (2.10)

Note that the above identity is valid for any test ψ. In particular, taking
ψ(T) ≡ ψ0(T) = α gives βψ0(θ) ≡ α, so the identity reduces to

    0 = α Eθ[T] + [D′(θ)/D(θ)] α,  i.e.,  0 = Eθ[T] + D′(θ)/D(θ).

Substituting this into (2.10) gives

    βψ′(θ) = Eθ[T ψ(T)] − Eθ[T] Eθ[ψ(T)] = covθ(T, ψ(T)).

From this, (2.6) and (2.8), we get 0 = βψ′(θ0) = Eθ0[T ψ(T)] − α Eθ0[T].

Lemma 5.2.2 Let M be the set of points (Eθ0[ψ(T)], Eθ0[T ψ(T)]) as ψ ranges
over the totality of critical functions. Then the point (α, α Eθ0[T]) is an inner
point of M.

Proof. Note that M is convex (since ψ1, ψ2 ∈ M implies aψ1 + (1 − a)ψ2 ∈ M
for a ∈ [0, 1]) and contains all points (u, u Eθ0[T]) with u ∈ [0, 1] (take the
test ψ(x) ≡ u). It also contains points (α, u2) with u2 > α Eθ0[T]: this follows from
Lemma 5.2.3 (in the next subsection), by which there exist tests with Eθ0[ψ(T)] = α
and β′(θ0) = Eθ0[T ψ(T)] − α Eθ0[T] > 0. Similarly, M contains points (α, u1) with
u1 < α Eθ0[T]. Therefore, the point (α, α Eθ0[T]) is an inner point of the region
spanned by the four points (0, 0), (α, u1), (1, Eθ0[T]) and (α, u2). This proves
our statement.

We now return to the main proof of the theorem. Consider finding a UMP test for the
simple hypotheses

    H0 : θ = θ0,
    H1 : θ = θ′, where θ′ ≠ θ0,

subject to (2.6) and (2.7). Since the point (α, α Eθ0[T]) is an inner point of M, the
GNP lemma gives constants k1, k2 and a test ψ(t) such that

    ψ(t) = 1 if gθ′(t) > k1 gθ0(t) + k2 t gθ0(t),
         = 0 if gθ′(t) < k1 gθ0(t) + k2 t gθ0(t),

that is,

    ψ(t) = 1 if D(θ′)e^{θ′t} > k1 D(θ0)e^{θ0 t} + k2 t D(θ0)e^{θ0 t},
         = 0 if D(θ′)e^{θ′t} < k1 D(θ0)e^{θ0 t} + k2 t D(θ0)e^{θ0 t},

or, dividing both sides by D(θ′)e^{θ0 t},

    ψ(t) = 1 if e^{bt} > a1 + a2 t,
         = 0 if e^{bt} < a1 + a2 t,  where b = θ′ − θ0 ≠ 0.
Note that the left-hand side of the inequality e^{bt} > a1 + a2 t is exponential while the right-hand
side is linear, so the rejection region is either one-sided or the outside of an interval. The
former is impossible, since a one-sided test has a strictly monotone power function and
therefore cannot possibly be unbiased. Therefore, we get

φ(x) = 1 when T(x) < C1 or T(x) > C2,
     = γi when T(x) = Ci, i = 1, 2,
     = 0 when C1 < T(x) < C2,

where C1 < C2 and γi are determined by (2.6) and (2.7). The proof of part (i) follows
then from the lemma in the last section.

(ii). The power function β(θ) = Eθ [ψ(T )] is a continuous function and has a minimum
at θ = θ0 (see the proof in Lemma 5.2.1). Thus there exists θ1 < θ2 such that θ0 ∈ [θ1 , θ2 ]
and β(θ1 ) = β(θ2 ) = c where α ≤ c < 1. Therefore, the test ψ is also UMPU test of size c
for testing H0 : θ1 ≤ θ ≤ θ2 , and it follows from the last theorem that the power increases
strictly as θ moves away from θ0 in either direction.

Remark 5.2.1 .

(1). The two conditions in testing H0 : θ = θ0 versus H1 : θ ≠ θ0 are

    Eθ0[ψ(T)] = α,
    Eθ0[T ψ(T)] = Eθ0[T] Eθ0[ψ(T)] = α Eθ0[T]   (i.e., Covθ0(T, ψ(T)) = 0),

which is equivalent to

    Eθ0[1 − ψ(T)] = 1 − α,
    Eθ0[T(1 − ψ(T))] = Eθ0[T] Eθ0[1 − ψ(T)] = (1 − α)Eθ0[T]   (i.e., Covθ0(T, 1 − ψ(T)) = 0),

which is often simpler to use.

(2). The above two conditions can sometimes be simplified. Suppose that for θ = θ0
the distribution of T is symmetric about some point a, that is,

    Pθ0(T < a − u) = Pθ0(T > a + u) for all real u.

Then any test which is symmetric about a and satisfies (2.6) must also satisfy
(2.7). To see this, first note that we can write ψ(t) = ψ0(t − a) and F(x) =
F0(x − a), where ψ0 and F0 are both symmetric around 0, and hence

    Eθ0[(T − a)ψ(T)] = ∫ (t − a)ψ0(t − a)dF0(t − a) = ∫ t ψ0(t)dF0(t) = 0,
    Eθ0[T ψ(T)] = Eθ0[(T − a)ψ(T)] + a Eθ0[ψ(T)] = aα = α Eθ0[T].

The C's and γ's are thus determined by

    Pθ0(T < C1) + γ1 Pθ0(T = C1) = α/2,
    C2 = 2a − C1,  γ2 = γ1.

Although it appears that there is only one condition, the second condition is
hidden in the requirement that the test be symmetric.

(3). The results presented here can be generalized to the STPr family.

(4). If X has p.d.f. of the form

    fθ(x) = exp{Q(θ)T(x) − ξ(θ)}h(x) = D(θ)e^{ηT(x)}h(x), where η = Q(θ),

and Q(θ) is strictly monotone, then similar results hold, since we can test η
first and transform back to θ. One such example is the Poisson distribution.
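To illustrate the symmetric shortcut in (2), consider the normal mean problem X1, . . . , Xn iid N(µ, 1) with T = Σ Xi, whose null distribution N(nµ0, n) is symmetric about a = nµ0. The sketch below is illustrative and not part of the original notes; the values n = 25, α = 0.05, µ0 = 0 are assumptions:

    # Sketch of the symmetric shortcut: X1,...,Xn iid N(mu,1), T = sum(X),
    # testing mu = mu0. Under H0, T ~ N(n*mu0, n) is symmetric about a = n*mu0.
    # Illustrative values (assumptions): n, alpha, mu0.
    import numpy as np
    from scipy.stats import norm

    n, alpha, mu0 = 25, 0.05, 0.0
    a = n * mu0                                        # center of symmetry of T under H0
    C1 = norm.ppf(alpha / 2, loc=a, scale=np.sqrt(n))  # P_{mu0}(T < C1) = alpha/2
    C2 = 2 * a - C1                                    # mirror image; gamma2 = gamma1 (= 0 here)
    print(C1, C2)   # reject H0 when T < C1 or T > C2

Since T is continuous here, the randomization constants vanish and the test reduces to the familiar equal-tailed two-sided z-test.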

5.2.3 A lemma used in the proof of the last theorem

Let θ be a real-valued parameter, and X = {X1, . . . , Xn} a random sample from a
one-parameter exponential family

    fθ(x) = exp{Q(θ)T(x) − ξ(θ)}h(x) = D(θ)e^{Q(θ)T(x)}h1(x),

where Q(θ) is strictly increasing. Consider testing the one-sided hypothesis

H0 : θ ≤ θ0 , versus H1 : θ > θ0 .

It is known from the earlier chapter that for any 0 < α < 1, there exists a UMP test φ of
size α of the form

φ(x) = 1 when T(x) > C,
     = γ when T(x) = C, (2.11)
     = 0 when T(x) < C,

where C and γ are determined by Eθ0 φ(X) = α and the power function βφ (θ) = Eθ φ(X)
is strictly increasing.

Lemma 5.2.3 (Lehmann, 1986, p. 117, Problem 22.) If Q(θ) is differentiable, then βφ′(θ) > 0
for all θ for which Q′(θ) > 0.

Proof. First,

    βψ(θ) = Eθ[ψ(T)] = ∫ ψ(t)D(θ)e^{Q(θ)t}h1(t) dt.

For the exponential family, it can be shown that βψ(θ) is differentiable and the order of
integration and differentiation can be interchanged, so that for all tests ψ(t),

    βψ′(θ) = ∫ ψ(t)D(θ)Q′(θ)t e^{Q(θ)t}h1(t) dt + ∫ ψ(t)D′(θ)e^{Q(θ)t}h1(t) dt
           = Q′(θ)Eθ[T ψ(T)] + [D′(θ)/D(θ)] Eθ[ψ(T)].

For ψ(T) ≡ α, we have βα(θ) ≡ α, and the above becomes

    0 = Q′(θ)Eθ[T] + D′(θ)/D(θ).
Substituting this into the expression for βψ′(θ) gives

    βψ′(θ) = Q′(θ)(Eθ[T ψ(T)] − Eθ[T] Eθ[ψ(T)]). (2.12)

For any point θ0, consider the problem of maximizing the derivative βψ′(θ0) subject
to βψ(θ0) = Eθ0 ψ(T) = α, where 0 < α < 1. In light of (2.12), this is equivalent
to maximizing Eθ0[T ψ(T)] subject to Eθ0 ψ(T) = α, when Q′(θ0) > 0. Since
0 < α < 1, α is certainly an inner point of M = {∫ φ(x)fθ0(x)dx : 0 ≤ φ ≤ 1} = [0, 1].
Then by the GNP lemma, there exist a constant C and an optimal test ψ such that

    ψ(t) = 1 when t gθ0(t) > C gθ0(t),
         = 0 when t gθ0(t) < C gθ0(t),

that is,

    ψ(t) = 1 when t > C,
         = γ when t = C,
         = 0 when t < C,

where C and γ are determined from Eθ0 ψ(T ) = α. However, this is exactly the same test
as the one in (2.11). We already knew that its power function is strictly increasing, which
implies that β 0 (θ0 ) ≥ 0. Again from (2.12), this means

Eθ0 [T ψ(T )] ≥ Eθ0 [T ]Eθ0 [ψ(T )] = αEθ0 [T ].

We now show that equality cannot hold here. If Eθ0[T ψ(T)] = α Eθ0[T], then
ψ0(t) ≡ α would also be an optimal test. From the GNP lemma, the optimal test is
unique almost surely, which would force T(X) = C a.s.; however, this is impossible. So we get

Eθ0 [T ψ(T )] > αEθ0 [T ],

and thus βψ0 (θ0 ) > 0. This completes the proof.
5.2.4 Some examples

Example 1. (Binomial.) Let X ∼ Bin(n, p). Find a UMPU test of size α for testing

H0 : p = p0, versus H1 : p ≠ p0.

Solution. Write

    f(x) = C(n,x) p^x (1 − p)^{n−x} = C(n,x) exp{x ln[p/(1 − p)] + n ln(1 − p)}.

So it is in an exponential family with T(X) = X and θ = Q(p) = ln[p/(1 − p)], which is
strictly increasing since Q′(p) = 1/p + 1/(1 − p) > 0. So testing

    H0 : p = p0, versus H1 : p ≠ p0

is equivalent to testing

    H0 : θ = θ0, versus H1 : θ ≠ θ0.

And a UMPU test of size α is

ψ(t) = 1 when t < C1 or t > C2,
     = γi when t = Ci, i = 1, 2,
     = 0 when C1 < t < C2,

where the C's and γ's are determined from

    Ep0[1 − ψ(T)] = 1 − α,  Ep0[T(1 − ψ(T))] = np0(1 − α),

or equivalently

    Σ_{x=C1+1}^{C2−1} C(n,x) p0^x (1 − p0)^{n−x} + Σ_{i=1}^{2} (1 − γi) C(n,Ci) p0^{Ci} (1 − p0)^{n−Ci} = 1 − α,

    Σ_{x=C1+1}^{C2−1} x C(n,x) p0^x (1 − p0)^{n−x} + Σ_{i=1}^{2} (1 − γi) Ci C(n,Ci) p0^{Ci} (1 − p0)^{n−Ci} = np0(1 − α).
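Once C1 and C2 are fixed, the two displayed conditions are linear in (1 − γ1, 1 − γ2), which suggests a simple numerical search: scan the pairs (C1, C2), solve the 2 × 2 linear system, and keep the pair(s) whose solved γ's lie in [0, 1]. A hedged sketch follows (not part of the original notes; n = 10, p0 = 0.2, α = 0.1 are illustrative assumptions):

    # Hedged sketch: searching for the UMPU test of H0: p = p0 vs H1: p != p0.
    # Illustrative values (assumptions): n = 10, p0 = 0.2, alpha = 0.1.
    # Assumes the pmf is positive at C1 and C2, so the 2x2 system is nonsingular.
    import numpy as np
    from scipy.stats import binom

    n, p0, alpha = 10, 0.2, 0.1
    x = np.arange(n + 1)
    b = binom.pmf(x, n, p0)

    for C1 in range(n):
        for C2 in range(C1 + 1, n + 1):
            inner = (x > C1) & (x < C2)                 # interior of the acceptance region
            M = np.array([[b[C1], b[C2]],
                          [C1 * b[C1], C2 * b[C2]]])
            rhs = np.array([1 - alpha - b[inner].sum(),
                            n * p0 * (1 - alpha) - (x[inner] * b[inner]).sum()])
            u = np.linalg.solve(M, rhs)                 # u = (1-gamma1, 1-gamma2)
            if np.all(u > -1e-12) and np.all(u < 1 + 1e-12):
                print(C1, C2, 1 - u)                    # admissible (gamma1, gamma2)

By the uniqueness results of the last chapter the admissible test is unique, although boundary cases with γi ∈ {0, 1} may print as equivalent representations.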

Example 2. (Normal variance.) Let X = {X1, . . . , Xn} ∼ N(0, σ²). Find a UMPU test
of size α for testing
(1). H0 : σ² = σ0², versus H1 : σ² ≠ σ0².
(2). H0 : σ² ≤ σ0², versus H1 : σ² > σ0².

Solution. The distribution belongs to the exponential family with sufficient statistic
T(X) = Σ Xi². Under H0, T/σ0² has p.d.f.

    χ²n(t) = [1/(2^{n/2}Γ(n/2))] t^{(n/2)−1} e^{−t/2},  t > 0.

Also note that

    t χ²n(t) = [1/(2^{n/2}Γ(n/2))] t^{((n+2)/2)−1} e^{−t/2}
             = [2^{(n+2)/2}Γ((n+2)/2) / (2^{n/2}Γ(n/2))] · [1/(2^{(n+2)/2}Γ((n+2)/2))] t^{((n+2)/2)−1} e^{−t/2}
             = n χ²_{n+2}(t),   (as Γ(a + 1) = aΓ(a)). (2.13)
(1). A UMPU test of size α for testing H0 : σ² = σ0² v.s. H1 : σ² ≠ σ0² is

    ψ(t) = I{T(x) < D1 or T(x) > D2} = 1 − I{D1 ≤ T(x) ≤ D2} = 1 − I{C1 ≤ T(x)/σ0² ≤ C2},

where the C's are determined from

    Eσ0[1 − ψ(T)] = 1 − α,
    Eσ0[(T/σ0²)(1 − ψ(T))] = Eσ0[T/σ0²] Eσ0[1 − ψ(T)] = n(1 − α),

or equivalently

    ∫_{C1}^{C2} χ²n(t)dt = 1 − α,
    ∫_{C1}^{C2} t χ²n(t)dt = ∫_{C1}^{C2} n χ²_{n+2}(t)dt = n(1 − α),

or equivalently

    ∫_{C1}^{C2} χ²n(t)dt = ∫_{C1}^{C2} χ²_{n+2}(t)dt = 1 − α.

If n is large, so that (n + 2)/n ≈ 1, then C1 ≈ χ²n(α/2) and C2 ≈ χ²n(1 − α/2), the
lower and upper α/2 quantiles. This is roughly the "equal-tailed" chi-square test of
elementary statistics. As a final remark, one could also integrate ∫_{C1}^{C2} t χ²n(t)dt
by parts to reduce the second condition to

    C1^{n/2} e^{−C1/2} = C2^{n/2} e^{−C2/2}.
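The two integral conditions can be solved numerically; below is a sketch (not part of the original notes; n = 20 and α = 0.05 are illustrative assumptions) using the equal-tailed quantiles as starting values, with a check of the identity (2.13):

    # Numerical sketch for the two-sided variance test.
    # Illustrative values (assumptions): n = 20, alpha = 0.05.
    import numpy as np
    from scipy.optimize import fsolve
    from scipy.stats import chi2

    n, alpha = 20, 0.05

    def equations(C):
        C1, C2 = C
        return [chi2.cdf(C2, n) - chi2.cdf(C1, n) - (1 - alpha),
                chi2.cdf(C2, n + 2) - chi2.cdf(C1, n + 2) - (1 - alpha)]

    # the equal-tailed quantiles are a natural starting point (exact as n -> infinity)
    C0 = [chi2.ppf(alpha / 2, n), chi2.ppf(1 - alpha / 2, n)]
    C1, C2 = fsolve(equations, x0=C0)
    print(C1, C2)   # accept H0 when C1 <= T/sigma0^2 <= C2

    # sanity check of the identity (2.13): t*chi2_n(t) = n*chi2_{n+2}(t)
    t = 7.3
    assert np.isclose(t * chi2.pdf(t, n), n * chi2.pdf(t, n + 2))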
(2). Similarly to (1), a UMP test of size α for testing H0 : σ² ≤ σ0² v.s. H1 : σ² > σ0² is

    ψ(t) = I{T(x)/σ0² > C0},

where C0 is determined from

    Eσ0[1 − ψ(T)] = ∫_0^{C0} χ²n(t)dt = 1 − α.

In other words, the critical value is C0 = χ²n(1 − α). This is exactly the chi-square test
used in elementary statistics.

Example 3. (Poisson.) Let X ∼ Poisson(λ). Find a UMPU test of size α for testing

    H0 : λ = λ0, versus H1 : λ ≠ λ0.

Solution. Similar to Example 1.
5.3 UMPU tests for multiparameter exponential families
In many important problems, the hypothesis concerns a single-valued parameter, but the
distribution depends on certain other nuisance parameters. The Similarity Lemma (S.O.B.)
is still the main tool for finding UMPU tests in multi-parameter exponential families;
however, it is not as easy to use here as in one-parameter exponential families. We shall
instead use another, much easier "Neyman structure" (N.S.) condition.
The relationship between N.S. and S.O.B. is as follows.

(1). If a test has N.S., it must be S.O.B.

(2). Under the bounded completeness condition,

    "N.S." ⇐⇒ "S.O.B." (The proof will be given later.)

Hence,

    "UMP N.S." ⇐⇒ "UMP S.O.B." =⇒ "UMPU".

5.3.1 Complete and Boundedly Complete Statistics

Definition: Let fθ(t) be a family of p.d.f.'s for a statistic T.

(a). The family is called complete if Eθ g(T) = 0 for all θ =⇒ Pθ(g(T) = 0) = 1 for all θ.

(b). The family is called boundedly complete if for all bounded functions g(t),
Eθ g(T) = 0 for all θ =⇒ Pθ(g(T) = 0) = 1 for all θ.

Clearly, completeness implies bounded completeness. That is, completeness is a stronger
requirement. However, we shall deal with test functions, which are always bounded, so
the weaker bounded completeness condition will suffice.

Some examples of complete families

Theorem 5.3.1 (A full-rank exponential family is complete.) If F is in an exponential
family of full rank with p.d.f.'s given by

    fη(x) = exp{Σ_{j=1}^{k} ηj Tj(x) − A(η)} h(x),

then T(X) = (T1(X), · · · , Tk(X)) is complete (and minimal sufficient).

Remark 5.3.1 If the range of (η1 (θ), · · · , ηk (θ)) contains an open k-dim rectangle, then
for any fixed θ0 , it can be shown that (η1 (θ) − η1 (θ0 ), · · · , ηk (θ) − ηk (θ0 )) is linearly inde-
pendent. Then the family is of full rank k.

Example. Assume X1, . . . , Xn ∼ Bin(1, p). Then T = Σ Xi ∼ Bin(n, p) for 0 < p < 1.
Show that T is complete.
Proof. The conclusion is obvious by applying the last theorem.
Alternatively, we can prove it directly. If Eθ g(T) = 0 for 0 < p < 1, then

    Σ_{t=0}^{n} C(n,t) p^t q^{n−t} g(t) = q^n Σ_{t=0}^{n} C(n,t) r^t g(t) = 0,

where r = p/q ranges over (0, ∞). A polynomial in r that vanishes for all r > 0 has all
coefficients zero, so g(t) = 0 for t = 0, 1, 2, · · · , n. That is, P(g(T) = 0) = 1.
Hence T is complete.

Example. X1, . . . , Xn is from U(0, θ). Show that T = X(n) is complete.

Proof. (Here the family does not belong to an exponential family.) F(x) = (x/θ)I(0 <
x < θ) + I(x ≥ θ), and P(T ≤ t) = F(t)^n. If Eθ g(T) = 0 for all θ > 0, then

    ∫_0^θ g(t) n t^{n−1} θ^{−n} dt = 0.

Differentiating w.r.t. θ, one gets g(θ)θ^{n−1} = 0, i.e., g(θ) = 0 for all θ > 0. Therefore, T is
complete.

Some Examples of Non-Complete Families

Example. Show that the minimal sufficient statistic for U(θ, θ + 1) is not complete.

Solution. It has been shown that the minimal sufficient statistic is T = (X(1), X(n)).
Since F(x) = (x − θ)I(θ < x < θ + 1) + I(x ≥ θ + 1), we get P(X(n) ≤ x) = F(x)^n.
Therefore,

    E(X(n)) = ∫ x dF^n(x) = ∫_θ^{θ+1} n x (x − θ)^{n−1} dx
            = n ∫_0^1 (y + θ) y^{n−1} dy = n/(n + 1) + θ
            = (θ + 1) − 1/(n + 1).

Similarly, we can get E(X(1)) = θ + 1/(n + 1). Therefore,

    E[X(n) − X(1) − (n − 1)/(n + 1)] = 0.

But X(n) − X(1) − (n − 1)/(n + 1) ≠ 0. So T is not complete.

Remark 5.3.2 This example shows that minimal sufficiency does not imply complete-
ness.
Example. Show that T = (Xn, Σ_{i=1}^{n−1} Xi) for Bin(1, p) is not complete.

Solution. E[Xn − Σ_{i=1}^{n−1} Xi/(n − 1)] = 0, but Xn − Σ_{i=1}^{n−1} Xi/(n − 1) ≠ 0. So T is not
complete.

Example. Let X1, . . . , Xn be iid N(θ, θ), θ > 0. Show that the statistic T = (X̄, S²)
is a sufficient statistic for θ, but the family of distributions of T is not complete. (This is an
example where the family of distributions is exponential, but not of full rank.)

Solution. E[X̄ − S²] = θ − θ = 0, but X̄ − S² ≠ 0. So T is not complete.

Bounded Completeness does not imply completeness.

We mentioned earlier that completeness implies bounded completeness. However, the
reverse is not true in general, as the next example illustrates.

Example. Let Pθ(X = 0) = θ, Pθ(X = n) = (1 − θ)²θ^{n−1}, n = 1, 2, ..., and 0 < θ < 1.
Show that the family is boundedly complete, but not complete.

Proof. If Eθ g(X) = 0 for all θ, 0 < θ < 1, then

    g(0)θ + (1 − θ)² Σ_{n=1}^{∞} g(n)θ^{n−1} = 0.

That is,

    Σ_{n=1}^{∞} g(n)θ^{n−1} = −g(0) θ/(1 − θ)² = −Σ_{n=1}^{∞} n g(0)θ^n,

using the expansion θ/(1 − θ)² = Σ_{n=1}^{∞} nθ^n. Comparing coefficients in

    g(1) + g(2)θ + g(3)θ² + ... = −g(0)θ − 2g(0)θ² − 3g(0)θ³ − ...

gives

    g(1) = 0,  g(n) = −(n − 1)g(0),  n = 2, 3, 4, ....

If g is required to be bounded, then g(0) = 0, and thus

    g(0) = g(1) = g(2) = ... = 0.

Therefore, the family is boundedly complete.


However, the family is not complete, since we can choose the unbounded function

    g(0) = −1,  g(1) = 0,  g(n) = n − 1,  n = 2, 3, 4, ....

Clearly, Eθ g(X) = 0 for all θ, 0 < θ < 1. But

    Pθ(g(X) = 0) = Pθ(X = 1) = (1 − θ)² < 1.
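A quick numerical check of this choice of g (not in the original notes; the truncation point N = 5000 is an arbitrary assumption) confirms that Eθ g(X) ≈ 0 for several values of θ:

    # Numerical check that g(0) = -1, g(1) = 0, g(n) = n - 1 (n >= 2)
    # satisfies E_theta g(X) = 0; the truncation N = 5000 is arbitrary.
    import numpy as np

    def Eg(theta, N=5000):
        n = np.arange(1, N + 1)
        pmf = (1 - theta) ** 2 * theta ** (n - 1)   # P(X = n), n >= 1
        g = np.where(n == 1, 0.0, n - 1.0)
        return -theta + np.sum(g * pmf)             # g(0) = -1 and P(X = 0) = theta

    for theta in [0.1, 0.5, 0.9]:
        print(theta, Eg(theta))                     # ~ 0 up to truncation error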
5.3.2 Similarity and Neyman Structure
We have seen earlier that similarity on the boundary is useful in deriving a UMPU test
(see the Similarity Lemma) when there is only one parameter of interest and no other
nuisance parameters. When there are nuisance parameters, there is a simpler and easier
condition to use (Neyman structure) when the family has a boundedly complete and
sufficient statistic.
Suppose that the observation X ∼ Fθ. We shall now concentrate on the family
F_X = {Fθ : θ ∈ ω}, where ω is a parameter set which, in particular, will be chosen
to be the common boundary of Θ0 and Θ1. Let T be a sufficient statistic for θ with
distribution function F_θ^T, and let F_T = {F_θ^T : θ ∈ ω}.

Definition. A test φ is said to have Neyman structure with respect to a sufficient statistic
T if
E [φ(X)|T ] = α, a.s. FT
(i.e. it holds except on a set N with P (N ) = 0 for all P ∈ FT )

Remark 5.3.3 .

(i). The conditional distribution of X given T is free of θ (sufficiency). Also,
E[φ(X)|T = t] is the conditional power function on the surface T = t. If the
test is nonrandomized, then it is simply the conditional probability of rejection
on the surface T = t.
(ii). If φ has Neyman structure, then it must be similar on the boundary since

Eθ φ(X) = Eθ E [φ(X)|T ] = α, for all θ ∈ ω

The reverse is also true provided that T is also boundedly complete. See the
lemma below.
(iii). It is often easier to obtain a UMP test having Neyman structure (as
it can be done on each surface). Then the resulting test is UMPU if every
similar test has Neyman structure.

Theorem 5.3.2 Let X ∼ Fθ, θ ∈ ω, and let T be a sufficient statistic for θ. Then a
necessary and sufficient condition for all similar tests to have Neyman structure w.r.t. T
is that the family F_T = {F_θ^T : θ ∈ ω} is boundedly complete.

Proof. Suppose that F_T = {F_θ^T : θ ∈ ω} is boundedly complete and let φ(X) be similar
w.r.t. ω, that is, Eθ φ(X) = α for all θ ∈ ω, or equivalently

    Eθ[E(φ(X)|T) − α] = 0, for all θ ∈ ω.

Note that E(φ(X)|T = t), as a function of t, can be taken to be bounded since φ(X) is
bounded. By the definition of bounded completeness, we get

    E(φ(X)|T) − α = 0, a.s. F_T.

So φ has Neyman structure.
Conversely, suppose that F_T = {F_θ^T : θ ∈ ω} is NOT boundedly complete. Then there
exists a bounded function g, say |g(t)| ≤ M for some M, such that Eθ g(T) = 0 for all θ ∈ ω,
but Pθ0(g(T) = 0) < 1 for some θ0 ∈ ω. Now define

    φ(t) = α + c g(t), where c = min(α, 1 − α)/M.

Then we have:

(a) φ is a test function, i.e., 0 ≤ φ(t) ≤ 1.

Proof. If 0 < α ≤ 1/2, then c = α/M and

    φ(t) ≥ α + (α/M)(−M) = 0,
    φ(t) ≤ α + (α/M)(M) = 2α ≤ 1.

Similarly, if 1/2 < α < 1, we can show 0 ≤ φ(t) ≤ 1.

(b) φ is S.O.B. since Eθ φ(T) = α + c Eθ g(T) = α for all θ ∈ ω.

But φ does NOT have Neyman structure, since

    Pθ0(E(φ(T)|T) − α = 0) = Pθ0(c g(T) = 0) = Pθ0(g(T) = 0) < 1.

5.3.3 UMPU tests for multiparameter exponential families

Let X be distributed according to the (k + 1)-dimensional exponential family of full rank

    fθ,ν(x) = C(θ, ν) exp{θU(x) + Σ_{i=1}^{k} νi Ti(x)} h(x),  (θ, ν) ∈ Ω. (3.14)

Write ν = (ν1, ..., νk) and T = (T1, ..., Tk).

Consider the following hypotheses:

(1). H0 : θ ≤ θ0 v.s. H1 : θ > θ0,
(2). H0 : θ ≤ θ1 or θ ≥ θ2 v.s. H1 : θ1 < θ < θ2,
(3). H0 : θ1 ≤ θ ≤ θ2 v.s. H1 : θ < θ1 or θ > θ2,
(4). H0 : θ = θ0 v.s. H1 : θ ≠ θ0.

In the search for UMPU tests for θ, attention can be restricted to the sufficient statistic
(U, T), which has joint p.d.f.

    gθ,ν(u, t) = C(θ, ν) exp{θu + Σ_{i=1}^{k} νi ti} h1(u, t),  (θ, ν) ∈ Ω.
Also, the conditional p.d.f. of U given T = t is

    qθ(u|t) = gθ,ν(u, t) / ∫ gθ,ν(u, t)du
            = C(θ, ν) e^{θu} exp{Σ_{i=1}^{k} νi ti} h1(u, t) / ∫ C(θ, ν) e^{θu} exp{Σ_{i=1}^{k} νi ti} h1(u, t)du
            = e^{θu} h1(u, t) / ∫ e^{θu} h1(u, t)du
            = Ct(θ) e^{θu} ht(u),  θ ∈ Θ. (3.15)

When T = t is given, U is the only remaining variable, and it again follows a one-parameter
exponential family, as seen from (3.15). From the last chapter, the UMPU tests conditional
on T = t for testing the above hypotheses (1)-(4) are given below.

(1). For testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0, a UMPU test of size α is

    ψ1(u, t) = 1 when u > C0(t),
             = γ0(t) when u = C0(t),
             = 0 when u < C0(t),

where the functions C0(t) and γ0(t) are determined from

    Eθ0{ψ1(U, T)|T = t} = α, for all t.

(2). For testing H0 : θ ≤ θ1 or θ ≥ θ2 v.s. H1 : θ1 < θ < θ2, a UMPU test of
size α is

    ψ2(u, t) = 1 when C1(t) < u < C2(t),
             = γi(t) when u = Ci(t), i = 1, 2,
             = 0 when u < C1(t) or u > C2(t),

where the functions Ci(t) and γi(t) are determined from

    Eθ1{ψ2(U, T)|T = t} = Eθ2{ψ2(U, T)|T = t} = α, for all t.

(3). For testing H0 : θ1 ≤ θ ≤ θ2 v.s. H1 : θ < θ1 or θ > θ2, a UMPU test of
size α is

    ψ3(u, t) = 1 when u < C1(t) or u > C2(t),
             = γi(t) when u = Ci(t), i = 1, 2,
             = 0 when C1(t) < u < C2(t),

where the functions Ci(t) and γi(t) are determined from

    Eθ1{ψ3(U, T)|T = t} = Eθ2{ψ3(U, T)|T = t} = α, for all t.
(4). For testing H0 : θ = θ0 v.s. H1 : θ ≠ θ0, a UMPU test of size α is

    ψ4(u, t) = 1 when u < C1(t) or u > C2(t),
             = γi(t) when u = Ci(t), i = 1, 2,
             = 0 when C1(t) < u < C2(t),

where the functions Ci(t) and γi(t) are determined from

    Eθ0{ψ4(U, T)|T = t} = α,
    Eθ0{U ψ4(U, T)|T = t} = α Eθ0[U|T = t], for all t.

The next theorem states that these tests are in fact UMPU tests unconditionally.

Theorem 5.3.3 Suppose that X is from the exponential family (3.14) of full rank. Then
the tests ψi (u, t), i = 1, 2, 3, 4 are UMPU tests of size α for the respective hypotheses.

Proof. We only need to concentrate on test functions based on the sufficient statistic
(U, T). It is easy to show that the power function of any test ψ, Eθ,ν ψ(U, T), is continuous
in (θ, ν), so one can apply Lemma 5.1.2 (i.e., the Similarity Lemma). Note that the power
function of a test ψ against an alternative (θ, ν) ∈ Ω1 is

    βψ(θ, ν) ≡ Eθ,ν[ψ(U, T)] = ∫∫ ψ(u, t) gθ,ν(u, t) dudt
             = ∫ [∫ ψ(u, t) qθ(u|t)du] rθ,ν(t)dt (3.16)
             =: ∫ βψ(θ|t) rθ,ν(t)dt,

where βψ(θ|t) is the power of ψ conditional on T = t, and rθ,ν(t) is the marginal p.d.f. of
T, given by

    rθ,ν(t) = ∫ gθ,ν(u, t)du = C(θ, ν) exp{Σ_{i=1}^{k} νi ti} ∫ e^{θu} h1(u, t)du
            = C(θ, ν) exp{Σ_{i=1}^{k} νi ti} h3(t).

Let Ω be the whole parameter space of (θ, ν), which is necessarily convex. Clearly the
boundary sets for i = 1, 2, 3, 4 are

ω1 = ω4 = {(θ, ν) ∈ Ω : θ = θ0 },
ω2 = ω3 = {(θ, ν) ∈ Ω : θ = θ1 , θ2 }.

In order to show that ψ is UMPU, we only need to show that it is UMP SOB.

(1). Let us consider the cases i = 1, 2, 3 first. For simplicity, we shall only
treat i = 1; the other cases follow similarly. By the Similarity Lemma, we
consider the problem of maximizing βψ(θ, ν) = Eθ,ν ψ(U, T) subject to

    βψ(θ0, ν) = Eθ0,ν ψ(U, T) = Eθ0,ν Eθ0[ψ(U, T)|T] = α,

or equivalently

    Eθ0,ν (Eθ0[ψ(U, T)|T] − α) = 0.

For fixed θ0, T is complete and sufficient for ν. In view of this and (3.16), the
problem boils down to maximizing the conditional power on T = t,

    βψ(θ|t) = ∫ ψ(u, t) qθ(u|t)du for every t,

subject to

    Eθ0[ψ(U, T)|T = t] − α = 0 for every t. (Neyman structure) (3.17)

(Note that the last equation does not depend on ν since T is sufficient for ν.)
Since, conditionally on T = t, U follows a one-parameter exponential family,
ψ1(u, t) is such a solution, and hence a UMPU test.

(2). Let us now consider the case i = 4. First, unbiasedness of a test ψ implies

    Eθ0,ν[ψ(U, T)] = α, (similar on ω4) (3.18)

    (∂/∂θ) Eθ,ν[ψ(U, T)] |_{θ=θ0} = 0. (3.19)

Similarly to the one-parameter exponential family, one can show that (3.19)
is equivalent to

    Eθ0,ν[U ψ(U, T) − αU] = 0.

Then we can rewrite (3.18) and (3.19) as

    Eθ0,ν{Eθ0[ψ(U, T)|T] − α} = 0,
    Eθ0,ν{Eθ0[U ψ(U, T)|T] − α Eθ0[U|T]} = 0.

For fixed θ0, the statistic T is complete and sufficient for ν. Thus, we have (a.s.)

    Eθ0[ψ(U, T)|T = t] − α = 0,
    Eθ0[U ψ(U, T)|T = t] − α Eθ0[U|T = t] = 0.

The rest of the argument is the same as that for ψi, i = 1, 2, 3.
5.3.4 UMPU tests for linear combinations of parameters in multiparameter exponential families

Through a transformation of parameters, the theorem in the last section can be used to
find UMPU tests for parameters of the form

    θ* = a0 θ + Σ_{i=1}^{k} ai νi,  a0 ≠ 0.

Lemma 5.3.1 The exponential family

    fθ,ν(x) = C(θ, ν) exp{θU(x) + Σ_{i=1}^{k} νi Ti(x)} h(x),  (θ, ν) ∈ Ω,

can also be written as

    fθ,ν(x) = C*(θ*, ν) exp{θ*U*(x) + Σ_{i=1}^{k} νi Ti*(x)} h(x),  (θ*, ν) ∈ Ω*,

where

    U* = U/a0,  Ti* = Ti − (ai/a0)U.
Proof. Since θ = (θ* − Σ_{i=1}^{k} ai νi)/a0, we have

    fθ,ν(x) = C(θ, ν) exp{(θ* − Σ_{i=1}^{k} ai νi) U(x)/a0 + Σ_{i=1}^{k} νi Ti(x)} h(x)
            = C(θ, ν) exp{θ*U*(x) + Σ_{i=1}^{k} νi [Ti(x) − (ai/a0)U(x)]} h(x)
            = C(θ, ν) exp{θ*U*(x) + Σ_{i=1}^{k} νi Ti*(x)} h(x).

5.3.5 Power calculation

Frequently, we are interested in calculating the power of the UMPU tests. Given an
alternative value θ′, the power of a UMPU test ψ is given by [see (3.16)]

    βψ(θ′, ν) ≡ Eθ′,ν[ψ(U, T)],

which depends on the unknown nuisance parameter ν. On the other hand, the conditional
power given T = t is

    βψ(θ′|t) ≡ Eθ′[ψ(U, t)|T = t],

which is independent of ν, and therefore has a known value.

Remark 5.3.4 .
(i). βψ(θ′|t) can be interpreted as the conditional probability of rejecting H0
once we observe T = t. It is an unbiased estimator of the true, unknown power
βψ(θ′, ν), since

    Eθ′,ν[βψ(θ′|T)] = βψ(θ′, ν).

(ii). One disadvantage of the conditional power is that it is available only
after the observation is taken. So it cannot be used to plan the experiment in
advance, particularly to determine the sample size.
5.3.6 Some examples

Example. Let X1 ∼ Poisson(λ1), X2 ∼ Poisson(λ2), with X1 and X2 independent.
We wish to test H0 : λ2 = a0 λ1 versus H1 : λ2 ≠ a0 λ1, where a0 is known. Find a UMPU
test of size α.

Solution. The joint p.d.f. of (X1, X2) is

    f(x1, x2) = [e^{−(λ1+λ2)}/(x1! x2!)] exp{x1 log(λ1) + x2 log(λ2)}
              = [e^{−(λ1+λ2)}/(x1! x2!)] exp{x2 log(λ2/λ1) + (x1 + x2) log(λ1)}.

Here U = X2 ∼ Poisson(λ2), T = X1 + X2 ∼ Poisson(λ1 + λ2), and θ = log(λ2/λ1). So we
can find UMPU tests for θ = log(λ2/λ1) or η = λ2/λ1. Therefore,

    H0 : λ2 = a0 λ1, versus H1 : λ2 ≠ a0 λ1

is equivalent to

    H0 : θ = ln a0 ≡ θ0, versus H1 : θ ≠ θ0.

Now we can apply the theorem in this section, together with the following result:

    {U|T = t} = {X2|(X1 + X2) = t} ∼ Bin(t, p),  p = λ2/(λ1 + λ2).

Under H0, we have p = a0/(1 + a0) ≡ p0.
A UMPU test for testing H0 : θ = θ0 v.s. H1 : θ ≠ θ0 is

    ψ4(u, t) = 1 when u < C1(t) or u > C2(t),
             = γi(t) when u = Ci(t), i = 1, 2,
             = 0 when C1(t) < u < C2(t),

where the functions Ci(t) and γi(t) are determined from

    Eθ0{ψ4(U, T)|T = t} = α,
    Eθ0{U ψ4(U, T)|T = t} = α Eθ0[U|T = t] = α t p0, for all t,

or, in terms of acceptance probabilities,

    Σ_{k=C1(t)+1}^{C2(t)−1} C(t,k) p0^k (1 − p0)^{t−k} + Σ_{i=1}^{2} [1 − γi(t)] C(t,Ci(t)) p0^{Ci(t)} (1 − p0)^{t−Ci(t)} = 1 − α,

    Σ_{k=C1(t)+1}^{C2(t)−1} k C(t,k) p0^k (1 − p0)^{t−k} + Σ_{i=1}^{2} [1 − γi(t)] Ci(t) C(t,Ci(t)) p0^{Ci(t)} (1 − p0)^{t−Ci(t)} = (1 − α) t p0.
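Conditionally on T = t, this is exactly the binomial two-sided UMPU problem under Bin(t, p0), so the two linear conditions can be solved numerically for each observed t, just as in the earlier binomial sketch. A brief sketch follows (the values t = 15, a0 = 1, α = 0.05 are illustrative assumptions, not from the notes):

    # Sketch of the conditional UMPU test given T = t: a Bin(t, p0) problem.
    # Illustrative values (assumptions): t = 15, a0 = 1, alpha = 0.05.
    import numpy as np
    from scipy.stats import binom

    t, a0, alpha = 15, 1.0, 0.05
    p0 = a0 / (1 + a0)
    u = np.arange(t + 1)
    b = binom.pmf(u, t, p0)

    for C1 in range(t):
        for C2 in range(C1 + 1, t + 1):
            inner = (u > C1) & (u < C2)
            M = np.array([[b[C1], b[C2]], [C1 * b[C1], C2 * b[C2]]])
            rhs = np.array([1 - alpha - b[inner].sum(),
                            t * p0 * (1 - alpha) - (u[inner] * b[inner]).sum()])
            g = 1 - np.linalg.solve(M, rhs)         # (gamma1(t), gamma2(t))
            if np.all(g > -1e-12) and np.all(g < 1 + 1e-12):
                print(C1, C2, g)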
Example. (Testing for an odds ratio.) Let X1 ∼ Bin(m, p1), X2 ∼ Bin(n, p2), with X1
and X2 independent. Let qi = 1 − pi, i = 1, 2. We wish to test H0 : p2/q2 = a0 p1/q1
versus H1 : p2/q2 ≠ a0 p1/q1 for some known a0. Find a UMPU test of size α.

Solution. The joint p.d.f. of (X1, X2) is

    f(x1, x2) = C(m,x1) p1^{x1} q1^{m−x1} C(n,x2) p2^{x2} q2^{n−x2}
              = C(m,x1) C(n,x2) q1^m q2^n (p1/q1)^{x1} (p2/q2)^{x2}
              = C(m,x1) C(n,x2) q1^m q2^n exp{x2 [log(p2/q2) − log(p1/q1)] + (x1 + x2) log(p1/q1)}.

Here U = X2 ∼ Bin(n, p2), T = X1 + X2, and θ = log[(p2 q1)/(p1 q2)]. So we can find UMPU tests
for θ. Therefore, the original testing problem becomes

    H0 : θ = θ0 = ln a0, versus H1 : θ ≠ θ0.
Now we can apply the theorem in this section, but we need to find the conditional
distribution

    Pθ(U = u|T = t) = Pθ(X2 = u|(X1 + X2) = t) = Pθ(X2 = u, X1 = t − u)/Pθ(X1 + X2 = t)
                    = Pθ(X2 = u)Pθ(X1 = t − u) / Σ_{k=0}^{t} Pθ(X2 = k)Pθ(X1 = t − k)
                    = C(m,t−u) p1^{t−u} q1^{m−t+u} C(n,u) p2^u q2^{n−u} / Σ_{k=0}^{t} C(m,t−k) p1^{t−k} q1^{m−t+k} C(n,k) p2^k q2^{n−k}
                    = C(m,t−u) C(n,u) [(p2 q1)/(p1 q2)]^u / Σ_{k=0}^{t} C(m,t−k) C(n,k) [(p2 q1)/(p1 q2)]^k
                    = C(m,t−u) C(n,u) e^{uθ} / Σ_{k=0}^{t} C(m,t−k) C(n,k) e^{kθ}.
So a UMPU test for testing H0 : θ = θ0 v.s. H1 : θ ≠ θ0 is

    ψ4(u, t) = 1 when u < C1(t) or u > C2(t),
             = γi(t) when u = Ci(t), i = 1, 2,
             = 0 when C1(t) < u < C2(t),

where the functions Ci(t) and γi(t) are determined from

    Eθ0{ψ4(U, t)|T = t} = α,
    Eθ0{U ψ4(U, t)|T = t} = α Eθ0[U|T = t].

Remark 5.3.5 If a0 = 1, then θ0 = 0, and

    Pθ0(U = u|T = t) = C(m,t−u) C(n,u) / Σ_{k=0}^{t} C(m,t−k) C(n,k) = C(m,t−u) C(n,u) / C(m+n,t)

is the hypergeometric distribution.
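The conditional law derived above is easy to compute numerically; the sketch below (not part of the original notes; m = 8, n = 6, t = 7, θ = 0 are illustrative assumptions) evaluates the noncentral hypergeometric weights and checks that θ = 0 recovers the ordinary hypergeometric distribution:

    # Sketch: conditional law of U = X2 given T = X1 + X2 = t.
    # Illustrative values (assumptions): m = 8, n = 6, t = 7, theta = 0.
    import numpy as np
    from scipy.special import comb
    from scipy.stats import hypergeom

    m, n, t = 8, 6, 7
    theta = 0.0                                   # theta = ln(a0); 0 <=> odds ratio a0 = 1

    u = np.arange(max(0, t - m), min(n, t) + 1)   # support of U given T = t
    w = comb(m, t - u) * comb(n, u) * np.exp(u * theta)
    pmf = w / w.sum()

    # at theta = 0 this is the ordinary hypergeometric distribution
    assert np.allclose(pmf, hypergeom.pmf(u, m + n, n, t))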
Example. (Testing for independence in a 2×2 contingency table.) Let A and B be
two events in a probability space related to a random experiment. Suppose that
n independent trials of the experiment are carried out and that we observe the frequencies
of occurrence of the events A ∩ B, A ∩ B^c, A^c ∩ B and A^c ∩ B^c. The results can be
summarized in the following 2 × 2 contingency table.

A Ac Total
B X11 X12 n1
Bc X21 X22 n2
Total m1 m2 n

We wish to test

    H0 : A and B are independent,
    H1 : A and B are not independent.

Solution. Note X11 + X12 + X21 + X22 = n and p11 + p12 + p21 + p22 = 1. The p.d.f. of
X = (X11, X12, X21, X22) is multinomial with probabilities p = (p11, p12, p21, p22), where
p = EX/n. That is,

    f(x) = [n!/(x11! x12! x21! x22!)] p11^{x11} p12^{x12} p21^{x21} p22^{x22}
         = [n!/(x11! x12! x21! x22!)] p22^n exp{x11 log(p11/p22) + x12 log(p12/p22) + x21 log(p21/p22)},

which is in an exponential family. So we can derive UMPU tests for any parameter of the
form

    θ = a0 log(p11/p22) + a1 log(p12/p22) + a2 log(p21/p22), (3.20)

where the ai's are known constants.

Now A and B are independent (which implies that A and B^c, A^c and B, and A^c and B^c
are independent) iff P(A ∩ B) = P(A)P(B) iff

    p11 = p1· p·1
        = (p11 + p12)(p11 + p21)
        = p11 p11 + p11 p21 + p12 p11 + p12 p21
        = p11 p11 + p11 p21 + p12 p11 + p11 p22 − p11 p22 + p12 p21
        = p11 (p11 + p21 + p12 + p22) − p11 p22 + p12 p21
        = p11 − p11 p22 + p12 p21,

iff p11 p22 = p12 p21.

Now take a0 = 1, a1 = a2 = −1 in (3.20); then we have

    θ = log(p11/p22) − log(p12/p22) − log(p21/p22) = log[(p11/p22)/((p12/p22)(p21/p22))] = log[(p11 p22)/(p12 p21)].
So A and B are independent iff θ = 0.

Now rewrite

    f(x) = [n!/(x11! x12! x21! x22!)] p22^n exp{x11 θ + (x11 + x12) log(p12/p22) + (x11 + x21) log(p21/p22)}.

So we have U = X11 ∼ Bin(n, p11) and T = (X11 + X12, X11 + X21) = (t1, t2), and

    P(U = u|T = t) = P(X11 = u|X11 + X12 = t1, X11 + X21 = t2)
                   = P(X11 = u, X12 = t1 − u, X21 = t2 − u) / Σ_{k=0}^{t1∧t2} P(X11 = k, X12 = t1 − k, X21 = t2 − k)
                   = [n!/(u!(t1 − u)!(t2 − u)!(n + u − t1 − t2)!)] p11^u p12^{t1−u} p21^{t2−u} p22^{n+u−t1−t2}
                     / Σ_{k=0}^{t1∧t2} [n!/(k!(t1 − k)!(t2 − k)!(n + k − t1 − t2)!)] p11^k p12^{t1−k} p21^{t2−k} p22^{n+k−t1−t2}
                   = C(t1,u) C(n−t1, t2−u) e^{uθ} / Σ_{k=0}^{t1∧t2} C(t1,k) C(n−t1, t2−k) e^{kθ},

i.e., the noncentral hypergeometric distribution.
Note that under H0, A and B are independent ⇐⇒ θ = 0 ≡ θ0, and therefore

    Pθ0(U = u|T = t) = C(t1,u) C(n−t1, t2−u) / Σ_{k=0}^{t1∧t2} C(t1,k) C(n−t1, t2−k) = C(t1,u) C(n−t1, t2−u) / C(n,t2). (3.21)

So a UMPU test for testing H0 : θ = 0 v.s. H1 : θ ≠ 0 is

    ψ(u, t) = 1 when u < C1(t) or u > C2(t),
            = γi(t) when u = Ci(t), i = 1, 2,
            = 0 when C1(t) < u < C2(t),

where the functions Ci(t) and γi(t) are determined from

    Eθ0{ψ(U, t)|T = t} = α,
    Eθ0{U ψ(U, t)|T = t} = α Eθ0[U|T = t].

Remark 5.3.6 Note that the above UMPU test is conditional on T = (X11 + X12, X11 + X21), i.e., the marginal sums are fixed; the only free r.v. is then U = X11. In particular, when the odds ratio η = e^θ equals 1 (i.e., θ = 0), (3.21) is the hypergeometric distribution, which is exactly the same as that in the last example for testing odds ratios in the binomial example. The UMPU test when η = 1 is also called Fisher's exact test. In practice, an equal-tailed two-sided Fisher test is often employed to simplify the calculations, so it is only an approximately optimal test.
Remark 5.3.7 Another approximate test is the χ² test. However, it is not as powerful.
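As a numerical aside (not part of the original derivation), the equal-tailed Fisher test and the conditional null distribution (3.21) are easy to compute. A minimal sketch, assuming NumPy and SciPy are available; the example table is made up:

```python
# A minimal sketch (assumes numpy/scipy): equal-tailed Fisher's exact test
# for a 2x2 table, plus the conditional null pmf of U = X11 given the
# margins, which is hypergeometric as in (3.21).
import numpy as np
from scipy.stats import fisher_exact, hypergeom

table = np.array([[12, 5],
                  [6, 14]])            # [[X11, X12], [X21, X22]]
oddsratio, pvalue = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {oddsratio:.3f}, two-sided p-value = {pvalue:.4f}")

n = table.sum()                         # total number of trials
t1 = table[0].sum()                     # X11 + X12
t2 = table[:, 0].sum()                  # X11 + X21
u = np.arange(max(0, t1 + t2 - n), min(t1, t2) + 1)
pmf = hypergeom.pmf(u, n, t1, t2)       # P(X11 = u | margins) when theta = 0
print(dict(zip(u.tolist(), np.round(pmf, 4))))
```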

5.4 Summary
We derived UMPU tests for one- and multi-parameter exponential families. The critical notions are similarity on the boundary (S.O.B.) and Neyman structure (N.S.), respectively.

5.5 Exercises
1. Let X1 , ..., X10 be iid Bin(1, p).

(i). Find a UMP test of size α = 0.1 for testing H0 : p ≤ 0.2 or p ≥ 0.7
v.s. H1 : 0.2 < p < 0.7.
(ii). Find the power of the UMP test in (i) when p = 0.4.
(iii). Find a UMPU test of size α = 0.1 for testing H0 : p = 0.2 v.s.
H1 : p 6= 0.2
(iv). Find the power of the UMPU test in (iii) when p = 0.4.

2. Let X1 , ..., Xn be iid from some distribution function Fθ (x). Find a UMPU test for testing H0 : θ = θ0 v.s. H1 : θ ≠ θ0 if

(i). Fθ (x) is Poisson(θ), that is,

Pθ (X = x) = θx e−θ /x!, x = 0, 1, 2, ... and θ > 0.

(ii). Fθ (x) is Geometric(θ), that is,

Pθ (X = x) = (1 − θ)x−1 θ, x = 1, 2, ... and 0 ≤ θ ≤ 1.

3. Let θ be a real-valued parameter, and X is a random sample from a one-parameter


exponential family

$$f_\theta(x) = \exp\{Q(\theta)T(x) - \xi(\theta)\}h(x) = D(\theta)\,e^{Q(\theta)T(x)}h_1(x),$$

where Q(θ) is strictly increasing. The power function of a test ψ(T) is

$$\beta_\psi(\theta) = E_\theta[\psi(T)] = \int \psi(t)\,D(\theta)\,e^{Q(\theta)t}h_1(t)\,dt,$$

which is differentiable. Show that

$$\beta_\psi'(\theta) = Q'(\theta)\left(E_\theta[T\psi(T)] - E_\theta[T]\,E_\theta[\psi(T)]\right).$$

Chapter 6

Unbiased tests for special families


(e.g., Normal and Gamma families)

UMPU tests for (multi-parameter) exponential families have been found in the last chap-
ter. For some special cases, such as normal families, the UMPU tests given in the last
chapter (usually in conditional format) may sometimes be simplified to the familiar tests
we studied in elementary statistics courses. So the results here justify some of the tests
used in elementary textbooks.

6.1 Ancillary Statistics and Basu’s Theorem


We shall use the concept of ancillary statistics and its associated properties.

Definition. Suppose X ∼ Fθ , θ ∈ Θ. A statistic V (X) is said to be ancillary if its


distribution does not depend on θ.

Remark 6.1.1 .

(i). Clearly, V (X) ≡ C is ancillary.


(ii). A sufficient statistic T (X) contains all the information about θ, while an
ancillary statistic V (X) contains no information about θ. Their relationship
is given below.

Theorem 6.1.1 (Basu’s Theorem) Suppose X ∼ Fθ , θ ∈ Θ. Let V (X) be ancillary,


and T (X) be boundedly complete and sufficient. Then V and T are independent.

Proof. Let B be any event on V . So P (V ∈ B) is a constant as V is ancillary. Since T is


sufficient, then g(T ) = P {(V ∈ B)|T } = E{I(V ∈ B)|T } is a function of T only (which
does not depend on θ). Now

E [g(T ) − P (V ∈ B)] = E [E{I(V ∈ B)|T } − P (V ∈ B)]


= E{I(V ∈ B)} − P (V ∈ B)
= P (V ∈ B) − P (V ∈ B)
= 0.

Since T is boundedly complete and sufficient, we get

P {(V ∈ B)|T } = P (V ∈ B), a.s. FT .

So let A be any event on the range of T such that P(T ∈ A) ≠ 0; we have

P {V ∈ B, T ∈ A} = P {V ∈ B|T ∈ A}P (T ∈ A) = P {V ∈ B}P (T ∈ A).

Therefore, V and T are independent. The proof is complete.

The next lemma is often useful in practice.


Lemma 6.1.1 Assume that T = g(Y1, ..., Yn), where g is a (Borel measurable) function not depending on any parameters.
(i). If the random vector (Y1, ..., Yn) is ancillary, so is T = g(Y1, ..., Yn).
(ii). If Y1, ..., Yn are independent with pdfs f1(y), ..., fn(y) respectively, and each is ancillary, then T = g(Y1, ..., Yn) is also ancillary.
Proof. For (i) and (ii), we have, respectively
Z
P (T ≤ t) = f (y1 , ..., yn )dy1 · · · dyn
g(y1 ,...,yn )≤t

and Z
P (T ≤ t) = f1 (y1 ) · · · fn (yn )dy1 · · · dyn .
g(y1 ,...,yn )≤t

In either case, they do not depend on any unknown parameters, and hence are ancillary.
The proof is complete.

Remark 6.1.2 If Y1, ..., Yn are dependent and each is ancillary individually, then T = g(Y1, ..., Yn) may not be ancillary. In other words, marginal ancillarity does not imply joint ancillarity. For instance, one such example is the bivariate normal (X, Y) with standard normal marginals and correlation coefficient ρ ≠ 0.

Example. Let X1, . . . , Xn ∼ N(µ, σ0²). Then

(1). X̄ and S² are independent.
(2). (n − 1)S²/σ0² ∼ χ²_{n−1}.
(3). √n(X̄ − µ)/S ∼ t_{n−1}.
Proof. (1). Rewrite

$$(n-1)S^2 = \sum_{i=1}^n (X_i-\bar X)^2 = \sum_{i=1}^n (Y_i-\bar Y)^2, \quad\text{where } Y_i = X_i - \mu \sim N(0, \sigma_0^2).$$

Clearly, the distribution of V = S² does not depend on µ (since that of the Yi does not), and hence V = S² is ancillary. Also, it is known that T = X̄ is complete and sufficient. Hence the result follows from Basu's Theorem.

(2). First we have

$$\frac{(n-1)S^2}{\sigma_0^2} = \sum_{i=1}^n\left(\frac{X_i-\bar X}{\sigma_0}\right)^2 = \sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma_0}\right)^2 - \left(\frac{\sqrt n(\bar X-\mu)}{\sigma_0}\right)^2,$$

or

$$W \equiv \sum_{i=1}^n\left(\frac{X_i-\mu}{\sigma_0}\right)^2 = \frac{(n-1)S^2}{\sigma_0^2} + \left(\frac{\sqrt n(\bar X-\mu)}{\sigma_0}\right)^2 \equiv W_1 + W_2.$$

It is known that
(a). W ∼ χ²_n with m.g.f. $Ee^{tW} = (1-2t)^{-n/2}$,
(b). W2 ∼ χ²_1 with m.g.f. $Ee^{tW_2} = (1-2t)^{-1/2}$.
Also, $Ee^{tW} = Ee^{tW_1+tW_2} = Ee^{tW_1}Ee^{tW_2}$, as W1 and W2 are independent by part (1). Therefore, the m.g.f. of W1 is

$$Ee^{tW_1} = Ee^{tW}/Ee^{tW_2} = (1-2t)^{-(n-1)/2}.$$

Thus, W1 ≡ (n − 1)S²/σ0² ∼ χ²_{n−1}.

(3). From √n(X̄ − µ)/σ0 ∼ N(0, 1) and (n − 1)S²/σ0² ∼ χ²_{n−1} and their independence, we have

$$\frac{\sqrt n(\bar X-\mu)}{S} = \frac{\sqrt n(\bar X-\mu)/\sigma_0}{S/\sigma_0} = \frac{\sqrt n(\bar X-\mu)/\sigma_0}{\sqrt{[(n-1)S^2/\sigma_0^2]/(n-1)}} = \frac{N(0,1)}{\sqrt{\chi^2_{n-1}/(n-1)}} \sim t_{n-1}.$$
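As a quick empirical check (a sketch assuming NumPy and SciPy, not part of the original notes), one can verify by simulation that X̄ and S² are uncorrelated and that √n(X̄ − µ)/S matches the t_{n−1} quantiles:

```python
# A minimal simulation sketch (assumes numpy/scipy): check Basu's theorem
# numerically -- Xbar and S^2 are independent under normality, and
# sqrt(n)*(Xbar - mu)/S follows a t distribution with n-1 df.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma0, n, reps = 2.0, 1.5, 10, 100_000
x = rng.normal(mu, sigma0, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)

print("corr(Xbar, S^2) ~", np.corrcoef(xbar, s2)[0, 1])   # close to 0

t_stat = np.sqrt(n) * (xbar - mu) / np.sqrt(s2)
for q in (0.05, 0.5, 0.95):                # empirical vs t_{n-1} quantiles
    print(q, np.quantile(t_stat, q), stats.t.ppf(q, df=n - 1))
```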

Example. Let X1, . . . , Xn ∼ N(µ, σ²); then X̄ and S² are independent.

Proof. We cannot use Basu's Theorem here, but we can use other, more direct approaches to prove independence, such as the m.g.f. method. It suffices to show that X̄ and T = (X1 − X̄, ..., Xn − X̄) are independent.
$$\begin{aligned}
M_{\bar X,T}(s,t_1,...,t_n) &= E\exp\left\{s\bar X + \sum_{i=1}^n t_i(X_i-\bar X)\right\} = E\exp\left\{\sum_{i=1}^n a_iX_i\right\}, \quad\text{where } a_i = \frac sn + (t_i-\bar t) \\
&= \prod_{i=1}^n Ee^{a_iX_i} = \prod_{i=1}^n \exp\left\{\mu a_i + \tfrac12\sigma^2a_i^2\right\} \quad (X_i\text{'s i.i.d. normal}) \\
&= \exp\left\{\mu\sum_{i=1}^n a_i + \tfrac12\sigma^2\sum_{i=1}^n a_i^2\right\} = \exp\left\{\mu s + \tfrac12\sigma^2\left[\frac{s^2}{n} + \sum_{i=1}^n (t_i-\bar t)^2\right]\right\} \\
&\qquad\left(\text{since } \sum a_i = s \text{ and } \sum a_i^2 = \frac{s^2}{n} + \sum (t_i-\bar t)^2\right) \\
&= \exp\left\{\mu s + \frac{\sigma^2}{2n}s^2\right\}\exp\left\{\tfrac12\sigma^2\sum_{i=1}^n(t_i-\bar t)^2\right\} = M_{\bar X}(s)\exp\left\{\tfrac12\sigma^2\sum_{i=1}^n(t_i-\bar t)^2\right\}.
\end{aligned}$$

Taking s = 0, we get $M_T(t_1,...,t_n) = \exp\{\tfrac12\sigma^2\sum_{i=1}^n(t_i-\bar t)^2\}$. Thus,

$$M_{\bar X,T}(s,t_1,...,t_n) = M_{\bar X}(s)\,M_T(t_1,...,t_n).$$

Example. Let X1, . . . , Xn ∼iid N(µx, σx²) and Y1, . . . , Yn ∼iid N(µy, σy²), where the X's and Y's are independent. Show that T = (X̄, Sx², Ȳ, Sy²) is independent of the sample correlation coefficient defined by

$$V = \frac{\sum(X_i-\bar X)(Y_i-\bar Y)}{\sqrt{\sum(X_i-\bar X)^2\,\sum(Y_i-\bar Y)^2}}.$$

Proof. First it can be shown that T = (X̄, Sx², Ȳ, Sy²) is sufficient and complete for θ = (µx, σx², µy, σy²). Next define Ui = (Xi − µx)/σx and Wi = (Yi − µy)/σy, so Ui ∼ N(0, 1) and Wi ∼ N(0, 1), and we can rewrite

$$V = \frac{\sum(U_i-\bar U)(W_i-\bar W)}{\sqrt{\sum(U_i-\bar U)^2\,\sum(W_i-\bar W)^2}},$$

which is clearly ancillary by the lemma following Basu's theorem. By Basu's Theorem, V and T are independent.

Example. Let U1 /σ12 ∼ χ2f1 , and U2 /σ22 ∼ χ2f2 and they are independent. Suppose that
σ22 /σ12 = a. Show that U2 /U1 and aU1 + U2 are independent. In particular, if σ1 = σ2 ,
U2 /U1 and U1 + U2 are independent.

Proof. The joint density of U = (U1, U2) is

$$f(u_1,u_2) = C\,u_1^{(f_1/2)-1}u_2^{(f_2/2)-1}\exp\left[-\frac{1}{2\sigma_2^2}(au_1+u_2)\right].$$

Clearly, T = aU1 + U2 is complete and sufficient. On the other hand,

$$V \equiv \frac{U_2}{U_1} = a\,\frac{U_2/\sigma_2^2}{U_1/\sigma_1^2},$$

a multiple of a ratio of independent χ² variables (i.e., a multiple of an F_{f2,f1} r.v.), whose distribution does not depend on σ1² or σ2²; hence V is ancillary. By Basu's Theorem, V and T are independent.

Remark 6.1.3 In the following, “A −→ B” means “A implies B”.

(1) Complete −→ Boundedly complete.


(2) T is complete and S = h(T ), −→ S is complete.
(3) S is sufficient and S = g(T ), −→ T is sufficient.
(4) T is complete and sufficient −→ T is minimal sufficient.

None of the reverse statements is necessarily true.

6.2 UMPU tests for multi-parameter exponential fam-
ilies
6.2.1 Review.
Let X be distributed according to the (k + 1)-parameter exponential family of full rank

$$f_{\theta,\nu}(x) = C(\theta,\nu)\exp\left\{\theta U(x) + \sum_{i=1}^k \nu_i T_i(x)\right\}h_0(x), \quad (\theta,\nu)\in\Omega. \qquad (2.1)$$

Write ν = (ν1 , ..., νk ) and T = (T1 , ..., Tk ).

Consider the following hypotheses:


(1). H0 : θ ≤ θ0 v.s. H1 : θ > θ0,
(2). H0 : θ ≤ θ1 or θ ≥ θ2 v.s. H1 : θ1 < θ < θ2,
(3). H0 : θ1 ≤ θ ≤ θ2 v.s. H1 : θ < θ1 or θ > θ2,
(4). H0 : θ = θ0 v.s. H1 : θ ≠ θ0.

Then, we have the following.

(1) For testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0 , a UMPU test of size α is

ψ1 (u, t) = 1 when u > C0 (t)


γ0 (t) when u = C0 (t)
0 when u < C0 (t)

where the functions C0 (t) and γ0 (t) are determined from

Eθ0 {ψ1 (U, T )|T = t} = α, for all t.

(2). For testing H0 : θ ≤ θ1 or θ ≥ θ2 v.s. H1 : θ1 < θ < θ2, a UMPU test of size α is

ψ2 (u, t) = 1 when C1 (t) < u < C2 (t)


γi (t) when u = Ci (t), i = 1, 2
0 when u < C1 (t) or > C2 (t)

where the functions Ci (t)’s and γi (t)’s are determined from

Eθ1 {ψ2 (U, T )|T = t} = Eθ2 {ψ2 (U, T )|T = t} = α, for all t.

(3). For testing H0 : θ1 ≤ θ ≤ θ2 v.s. H1 : θ < θ1 or θ > θ2, a UMPU test of size α is

ψ3 (u, t) = 1 when u < C1 (t) or > C2 (t)


γi (t) when u = Ci (t), i = 1, 2
0 when C1 (t) < u < C2 (t)

where the functions Ci (t)’s and γi (t)’s are determined from
Eθ1 {ψ3 (U, T )|T = t} = Eθ2 {ψ3 (U, T )|T = t} = α, for all t.

(4). For testing H0 : θ = θ0 v.s. H1 : θ ≠ θ0,


ψ4 (u, t) = 1 when u < C1 (t) or > C2 (t)
γi (t) when u = Ci (t), i = 1, 2
0 when C1 (t) < u < C2 (t)

where the functions Ci (t)’s and γi (t)’s are determined from


Eθ0 {ψ4 (U, T )|T = t} = α,
Eθ0 {U ψ4 (U, T )|T = t} = αEθ0 [U |T = t] , for all t.

6.2.2 UMPU tests for special families


For special families such as normal families, the UMPU tests in the last subsection can
be simplified.

Theorem 6.2.1 Suppose that X is from the exponential family (2.1) of full rank, and
that V = h(U, T ) is independent of T when θ = θi , i = 0, 1, 2.
(a) If h(u, t) is increasing in u for each t, then UMPU tests of size α for the
first three hypothesis testing problems given in the last subsection are equivalent
to those with (U, T ) replaced by V and with Ci (t) and γi (t) replaced by Ci and
γi , where i = 1, 2, 3.
(b) If there are functions a(t) > 0 and b(t) such that
V = h(U, T ) = a(T )U + b(T ), i.e., V is linear in U for fixed T ,
then the UMPU test of size α for the fourth hypothesis testing problems given
in the last subsection is equivalent to that with (U, T ) replaced by V and with
Ci (t) and γi (t) replaced by Ci and γi , where i = 4.
For simplicity, we list these UMPU tests in more details.
(1). A UMPU test for testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0 is
ψ1 (v) = 1 when v > C0
γ0 when v = C0
0 when v < C0

where C0 and γ0 are determined from


Eθ0 {ψ1 (V )} = α.

(2). A UMPU test for testing H0 : θ ≤ θ1 or ≥ θ2 v.s. H1 : θ1 < θ < θ2 is


ψ2 (v) = 1 when C1 < v < C2
γi when v = Ci , i = 1, 2
0 when v < C1 or > C2

where Ci ’s and γi ’s are determined from

Eθ1 {ψ2 (V )} = Eθ2 {ψ2 (V )} = α.

(3). A UMPU test of size α for testing H0 : θ1 ≤ θ ≤ θ2 v.s. H1 : θ < θ1 or


> θ2 , is

ψ3 (v) = 1 when v < C1 or > C2


γi when v = Ci , i = 1, 2
0 when C1 < v < C2

where Ci ’s and γi ’s are determined from

Eθ1 {ψ3 (V )} = Eθ2 {ψ3 (V )} = α.

(4). A UMPU test of size α for testing H0 : θ = θ0 v.s. H1 : θ ≠ θ0 is

ψ4 (v) = 1 when v < C1 or > C2


γi when v = Ci , i = 1, 2
0 when C1 < v < C2

where Ci ’s and γi ’s are determined from

Eθ0 {ψ4 (V )} = α,
Eθ0 {V ψ4 (V )} = αEθ0 [V ] .

Proof. (1). First consider testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0 . Since h(u, t) is increasing
in u for each t, then a UMPU test of size α

ψ1 (u, t) = 1 when u > C0 (t)


γ0 (t) when u = C0 (t)
0 when u < C0 (t)

where the functions C0 (t) and γ0 (t) are determined from

Eθ0 {ψ1 (U, T )|T = t} = α, for all t, (2.2)

is equivalent to

ψ1 (v, t) = 1 when v > D0 (t)


γ0 (t) when v = D0 (t) (2.3)
0 when v < D0 (t)

subject to the constraint (2.2), which, by the independence of V and T , reduces to

Eθ0 {ψ1 (U, T )|T = t} = Eθ0 {ψ1 (V, t)|T = t} = Eθ0 {ψ1 (V, t)} = α,

that is,
Pθ0 (V > D0 (t)) + γ0 (t)Pθ0 (V = D0 (t)) = α.

Clearly, D0 (t) and γ0 (t) do not depend on t, as can be seen from the proof of the Neyman-
Pearson Lemma. We denote D0 (t) = C0 and γ0 (t) = γ0 . Therefore, from (2.3), ψ1 (v, t)
does not depend on t either. That is, a UMPU test of size α is

ψ1 (v) = 1 when v > C0


γ0 when v = C0
0 when v < C0

where C0 and γ0 satisfy Eθ0 [ψ1 (V )] = α.


(2) & (3). The proof is similar to that in (1).
(4). The first part of the proof is similar to that in part (1). Namely, the UMPU test
is

ψ4 (v, t) = 1 when v < D1 (t) or v > D2 (t)


γi (t) when v = Di (t), i = 1, 2,
0 when D1 (t) < v < D2 (t),

where Di (t)’s and γi (t)’s are determined from

Eθ0 [ψ4 (V, t)|T = t] − α = 0,


Eθ0 (U ψ4 (V, t)|T = t) − αEθ0 (U |T = t) = 0,

i.e.,

$$E_{\theta_0}[\psi_4(V,t)] - \alpha = 0, \qquad E_{\theta_0}\!\left(\frac{V-b(t)}{a(t)}\,\psi_4(V,t)\,\Big|\,T=t\right) - \alpha\,E_{\theta_0}\!\left(\frac{V-b(t)}{a(t)}\,\Big|\,T=t\right) = 0,$$

i.e.,

$$E_{\theta_0}[\psi_4(V,t)] = \alpha, \qquad E_{\theta_0}(V\psi_4(V,t)) = \alpha\,E_{\theta_0}(V),$$

or

$$P_{\theta_0}(D_1(t) < V < D_2(t)) + \sum_{i=1}^2 [1-\gamma_i(t)]\,P_{\theta_0}(V = D_i(t)) = 1-\alpha,$$
$$\int_{D_1(t)}^{D_2(t)} v\,dF_V(v) = (1-\alpha)\,E_{\theta_0}(V),$$

from which we can see that the Di(t)'s and γi(t)'s do not depend on t.

6.2.3 Some basic facts about bivariate normal distribution
We shall introduce some basic facts concerning normal and bivariate normal distributions,
which will be useful in some examples to follow.

Theorem 6.2.2 Let X1, ..., Xn be independent with Xi ∼ N(µi, σ²) (different means but the same variance). Let $Z_i = \sum_{j=1}^n a_{ij}X_j$ be an orthogonal transformation (i.e., $\sum_{i=1}^n a_{ij}a_{ik} = 1$ or 0 according as j = k or j ≠ k). Then the Z's are normally distributed with means $\xi_i = \sum_{j=1}^n a_{ij}\mu_j$ and variance matrix σ²I.

Proof. The density of the Z's is obtained from that of the X's by substituting $x_i = \sum_{j=1}^n b_{ij}z_j$, where (b_{ij}) is the inverse of the matrix (a_{ij}), and multiplying by the Jacobian, which is 1.

Alternatively, we can use the Moment Generating Function method.

The following corollary is a direct consequence of the above theorem. Or it can be


derived by using Basu’s theorem.

Corollary. Let X1, ..., Xn be iid from N(0, σ²). Then

(1) $(n-1)S^2/\sigma^2 \equiv \sum_{i=1}^n (X_i-\bar X)^2/\sigma^2 \sim \chi^2_{n-1}$.

(2) $\sqrt n\,\bar X/S \sim t_{n-1}$.


Proof. If we make an orthogonal transformation such that $Z_1 = \sqrt n\,\bar X$, then, by $AA^\tau = I$,

$$\sum_{i=2}^n Z_i^2 = \sum_{i=1}^n Z_i^2 - Z_1^2 = (X_1,...,X_n)AA^\tau(X_1,...,X_n)^\tau - Z_1^2 = \sum_{i=1}^n X_i^2 - n\bar X^2 = \sum_{i=1}^n (X_i-\bar X)^2.$$

The rest is easy.

Theorem 6.2.3 Let (Xi, Yi) ∼ bivariate normal with parameters (µ1, µ2, σ1², σ2², ρ).

(1). $Y_i \mid X_i = x \sim N\!\left(\mu_2 + \frac{\rho\sigma_2}{\sigma_1}(x-\mu_1),\ \sigma_2^2(1-\rho^2)\right)$.

(2). If $\sum(x_i-\bar x)^2 > 0$ (i.e., non-degenerate) and ρ = 0, then conditional on X1 = x1, ..., Xn = xn, we have

$$\frac{\sqrt{n-2}\,\hat\rho}{\sqrt{1-\hat\rho^2}}\,\bigg|\,X = x \sim t_{n-2},$$

and because X and Y are independent when ρ = 0, we have unconditionally

$$\frac{\sqrt{n-2}\,\hat\rho}{\sqrt{1-\hat\rho^2}} \sim t_{n-2}.$$

Proof. The first part is in any elementary statistics book. We shall only prove the second part below. First,

$$\hat\rho = V = \frac{\sum(X_i-\bar X)(Y_i-\bar Y)}{\left(\sum(X_i-\bar X)^2\right)^{1/2}\left(\sum(Y_i-\bar Y)^2\right)^{1/2}} = \frac{\sum a_iY_i}{\sqrt{\sum Y_i^2 - n\bar Y^2}}, \quad\text{where } a_i = \frac{X_i-\bar X}{\sqrt{\sum(X_i-\bar X)^2}},$$

and

$$\sqrt{1-\hat\rho^2} = \frac{\sqrt{\sum Y_i^2 - n\bar Y^2 - \left(\sum a_iY_i\right)^2}}{\sqrt{\sum Y_i^2 - n\bar Y^2}}.$$

Thus,

$$W = \frac{\sqrt{n-2}\,V}{\sqrt{1-V^2}} = \frac{\sum a_iY_i}{\sqrt{\left[\sum Y_i^2 - n\bar Y^2 - \left(\sum a_iY_i\right)^2\right]/(n-2)}} \equiv \frac{\sum a_iY_i}{\sqrt{Q/(n-2)}}.$$

Note that $\sum a_i = 0$ and $\sum a_i^2 = 1$, from which we see that (a1, ..., an) and (n^{−1/2}, ..., n^{−1/2}) are orthonormal.

From the expression of W, we can certainly assume that Yi ∼ N(0, 1); otherwise we can always renormalize. If we make an orthogonal transformation from (Y1, . . . , Yn) to (Z1, ..., Zn) such that $Z_1 = \sqrt n\,\bar Y$ and $Z_2 = \sum_{i=1}^n a_iY_i$, then

$$\sum_{i=3}^n Z_i^2 = \sum_{i=1}^n Z_i^2 - Z_1^2 - Z_2^2 = \sum Y_i^2 - n\bar Y^2 - \left(\sum a_iY_i\right)^2 \equiv Q.$$

Therefore, we can rewrite W as

$$W = \frac{Z_2}{\sqrt{\sum_{i=3}^n Z_i^2/(n-2)}} = \frac{N(0,1)}{\sqrt{\chi^2_{n-2}/(n-2)}} \sim t_{n-2},$$

since the numerator and denominator are independent.
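As a numerical sanity check (a sketch assuming NumPy/SciPy, not part of the original notes), one can simulate independent normal pairs and compare the statistic W with the t_{n−2} law:

```python
# A minimal simulation sketch (assumes numpy/scipy): under rho = 0 the
# statistic W = sqrt(n-2)*rhohat/sqrt(1-rhohat^2) should follow t_{n-2}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 15, 50_000
x = rng.standard_normal((reps, n))
y = rng.standard_normal((reps, n))          # independent of x, so rho = 0

xc = x - x.mean(axis=1, keepdims=True)
yc = y - y.mean(axis=1, keepdims=True)
rhohat = (xc * yc).sum(axis=1) / np.sqrt((xc**2).sum(axis=1) * (yc**2).sum(axis=1))
w = np.sqrt(n - 2) * rhohat / np.sqrt(1 - rhohat**2)

for q in (0.05, 0.5, 0.95):                 # empirical vs t_{n-2} quantiles
    print(q, np.quantile(w, q), stats.t.ppf(q, df=n - 2))
```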

6.2.4 Application 1: one-sample problem.

Example. Let X1, . . . , Xn ∼ N(µ, σ²) with unknown µ and σ² > 0, where n ≥ 2. Find UMPU tests for testing
(1). H0 : σ² ≤ σ0² versus H1 : σ² > σ0².
(2). H0 : σ² = σ0² versus H1 : σ² ≠ σ0².
(3). H0 : µ ≤ µ0 versus H1 : µ > µ0.
(4). H0 : µ = µ0 versus H1 : µ ≠ µ0.

Solution. The joint p.d.f. of X1, . . . , Xn is

$$f(x) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left\{-\frac{1}{2\sigma^2}\sum x_i^2 + \frac{n\mu}{\sigma^2}\bar x - \frac{n\mu^2}{2\sigma^2}\right\}.$$

(1). Take

$$\theta = -\frac{1}{2\sigma^2}, \quad U = \sum X_i^2, \quad \nu = \frac{n\mu}{\sigma^2}, \quad T = \bar X.$$

So the hypotheses H0 : σ² ≤ σ0² versus H1 : σ² > σ0² are equivalent to H0 : θ ≤ θ0 versus H1 : θ > θ0 with $\theta_0 = -\frac{1}{2\sigma_0^2}$, since θ = −1/(2σ²) is a strictly increasing function of σ².

When θ = θ0 (i.e., σ² = σ0²), T = X̄ is complete and sufficient for µ. Take

$$V = \frac{(n-1)S^2}{\sigma_0^2} = \frac{\sum X_i^2 - n\bar X^2}{\sigma_0^2} = \frac{U - nT^2}{\sigma_0^2} = h(U,T) \sim \chi^2_{n-1}.$$

So V ∼ χ²_{n−1}, hence ancillary and thus independent of T. Also, h(U, T) is increasing in U for each T. Hence, a UMPU test of size α is

ψ(v) = I{v ≥ C0},

where C0 satisfies

$$E_{\theta_0}\psi(V) = P_{\theta_0}\{V \ge C_0\} = P\{\chi^2_{n-1} \ge C_0\} = \alpha, \quad\text{i.e., } C_0 = \chi^2_{n-1}(1-\alpha).$$

This is the test used in elementary statistics textbooks.

(2). Note that V = h(U, T) = [U − nT²]/σ0² is linear in U. So a UMPU test of size α is

ψ(v) = I{v < C1 or v > C2},

where C1, C2 satisfy

$$E_{\theta_0}[1-\psi(V)] = \int_{C_1}^{C_2}\chi^2_{n-1}(v)\,dv = P\{C_1 < \chi^2_{n-1} < C_2\} = 1-\alpha,$$
$$E_{\theta_0}\{V[1-\psi(V)]\} = \int_{C_1}^{C_2} v\,\chi^2_{n-1}(v)\,dv = (n-1)\int_{C_1}^{C_2}\chi^2_{n+1}(v)\,dv = (n-1)(1-\alpha),$$

where we have used the fact $v\,\chi^2_{n-1}(v) = (n-1)\chi^2_{n+1}(v)$, which follows from (2.13) in the last chapter. So C1, C2 satisfy

$$\int_{C_1}^{C_2}\chi^2_{n-1}(v)\,dv = \int_{C_1}^{C_2}\chi^2_{n+1}(v)\,dv = 1-\alpha.$$

If n − 1 ≈ n + 1, then C1, C2 are nearly the (α/2)th and (1 − α/2)th quantiles of χ²_{n−1}, respectively (since they are roughly the solutions, which are unique). That is, the UMPU test here is nearly the same as the "equal-tailed" chi-square test in elementary textbooks.
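The two side conditions can be solved numerically. A minimal sketch, assuming SciPy is available (the helper name is illustrative, not from the notes):

```python
# A minimal sketch (assumes numpy/scipy): solve for (C1, C2) such that
#   P(C1 < chi2_{n-1} < C2) = 1 - alpha  and
#   P(C1 < chi2_{n+1} < C2) = 1 - alpha,
# the side conditions of the two-sided UMPU variance test.
from scipy import stats, optimize

def umpu_chi2_cuts(n, alpha):
    f1, f2 = stats.chi2(n - 1), stats.chi2(n + 1)

    def equations(c):
        c1, c2 = c
        return [f1.cdf(c2) - f1.cdf(c1) - (1 - alpha),
                f2.cdf(c2) - f2.cdf(c1) - (1 - alpha)]

    # Equal-tailed chi2_{n-1} cut-offs make a good starting point.
    start = [f1.ppf(alpha / 2), f1.ppf(1 - alpha / 2)]
    return optimize.fsolve(equations, start)

print(umpu_chi2_cuts(n=20, alpha=0.05))   # compare with the equal-tailed cuts
```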

(3). We make the transformation Xi′ = Xi − µ0 and µ′ = EXi′ = µ − µ0; then the hypothesis H0 : µ ≤ µ0 versus H1 : µ > µ0 is equivalent to H0 : µ′ ≤ 0 versus H1 : µ′ > 0. So without loss of generality, we assume that µ0 = 0 (otherwise, we simply look at the transformed data). We take

$$\theta = \frac{n\mu}{\sigma^2}, \quad U = \bar X, \quad \nu = -\frac{1}{2\sigma^2}, \quad T = \sum X_i^2.$$

Then the hypothesis H0 : µ ≤ 0 versus H1 : µ > 0 is equivalent to H0 : θ ≤ 0 versus H1 : θ > 0.

When θ = θ0 (i.e., µ = 0), T = Σ Xi² is complete and sufficient for σ². Take

$$V = \frac{\sqrt n\,\bar X}{S} = \frac{\sqrt{n(n-1)}\,U}{\sqrt{T-nU^2}} = V(U,T) \sim t_{n-1},$$

so V is ancillary, and therefore independent of T, with V ∼ t_{n−1}. Also, V(U, T) is increasing in U for each T. Hence, a UMPU test of size α is

ψ(v) = I{v ≥ C0},

where C0 satisfies

$$E_{\theta_0}\psi(V) = P_{\theta_0}\{V\ge C_0\} = P\{t_{n-1}\ge C_0\} = \alpha, \quad\text{i.e., } C_0 = t_{n-1}(1-\alpha).$$

This is the test used in elementary statistics textbooks.

(4). As in (3), assume that µ0 = 0. The statistic V in (3) cannot be used for testing H0 : θ = 0 versus H1 : θ ≠ 0, since V is not linear in U. Let us try a new statistic

$$W = W(U,T) = \frac{\sqrt n\,\bar X}{\sqrt{\sum X_i^2}} = \frac{\sqrt n\,U}{\sqrt T} = \frac{\sqrt n\,\bar X/\sigma}{\sqrt{\sum X_i^2/\sigma^2}} = \frac{\sqrt n\,\bar Z}{\sqrt{\sum Z_i^2}}, \quad\text{where } Z_i = \frac{X_i - 0}{\sigma}.$$

Since (Z1, ..., Zn) ∼ i.i.d. N(0, 1), the distribution of W does not depend on σ²; hence W is ancillary and thus independent of T = Σ Xi², which is complete and sufficient for σ². Furthermore, W is linear in U for fixed T and has a symmetric distribution under H0. The symmetry of the distribution of W follows from

$$-W = \frac{\sqrt n\,\bar Y}{\sqrt{\sum Y_i^2}}, \quad\text{where } Y_i = -X_i \sim N(0, \sigma^2),$$

which clearly has the same distribution as W.

So a UMPU test of size α is

ψ(w) = I{|w| ≥ C0},   (2.4)

where C0 satisfies

$$E_{\theta_0}\psi(W) = P_{\theta_0}\{|W|\ge C_0\} = \alpha. \qquad (2.5)$$
(The second side condition follows from the first one because W has a symmetric distribution.) Now from the identity

$$V = \frac{\sqrt{n-1}\,W}{\sqrt{1-W^2}},$$

we see that V is strictly increasing in W, and V also has a symmetric distribution under H0 (since $-V = \sqrt{n-1}(-W)/\sqrt{1-(-W)^2}$ has the same distribution as V). Noting that V is an odd function of W, it follows from (2.4) and (2.5) that a UMPU test of size α is

ψ(v) = I{|v| ≥ D0},

where D0 satisfies

$$E_{\theta_0}\psi(V) = P_{\theta_0}\{|V|\ge D_0\} = P\{|t_{n-1}|\ge D_0\} = \alpha.$$

In other words, we reject H0 iff |V| > t_{n−1}(1 − α/2). This is the test used in elementary statistics textbooks.
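A compact numerical version of this two-sided test (a sketch assuming NumPy/SciPy; it reduces to the classical one-sample t-test):

```python
# A minimal sketch (assumes numpy/scipy): the two-sided UMPU test of
# H0: mu = mu0 reduces to the classical one-sample t-test.
import numpy as np
from scipy import stats

def umpu_t_test(x, mu0, alpha=0.05):
    n = len(x)
    v = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
    d0 = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t_{n-1}(1 - alpha/2)
    return v, abs(v) > d0                        # (statistic, reject H0?)

rng = np.random.default_rng(2)
x = rng.normal(0.5, 2.0, size=25)
print(umpu_t_test(x, mu0=0.0))
```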

6.2.5 Application 2: two-sample problem.
Example. Let X1 , ..., Xm ∼ N (µ1 , σ12 ) and Y1 , . . . , Yn ∼ N (µ2 , σ22 ) with unknown µi ’s
and σi2 ’s. Find UMPU tests for testing

(1). H0 : σ22 /σ12 ≤ ∆0 versus H1 : σ22 /σ12 > ∆0 .


(2). H0 : σ22 /σ12 = ∆0 versus H1 : σ22 /σ12 6= ∆0 .
(3). H0 : µ2 − µ1 ≤ 0 versus H1 : µ2 − µ1 > 0, assuming σ12 = σ22 .
(4). H0 : µ2 − µ1 = 0 versus H1 : µ2 − µ1 6= 0, assuming σ12 = σ22 .

Solution. The joint p.d.f. of X1, ..., Xm and Y1, ..., Yn is

$$f(x,y) = C(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2)\exp\left\{-\frac{1}{2\sigma_1^2}\sum_{i=1}^m x_i^2 - \frac{1}{2\sigma_2^2}\sum_{j=1}^n y_j^2 + \frac{m\mu_1}{\sigma_1^2}\bar x + \frac{n\mu_2}{\sigma_2^2}\bar y\right\}.$$

(1). Given σ2²/σ1² = ∆0, we can write

$$f(x,y) = C(\mu_1,\mu_2,\sigma_1^2,\sigma_2^2)\exp\left\{-\left(\frac{1}{2\sigma_2^2}-\frac{1}{2\sigma_1^2\Delta_0}\right)\sum_{j=1}^n y_j^2 - \frac{1}{2\sigma_1^2}\left(\sum_{i=1}^m x_i^2 + \frac{\sum_{j=1}^n y_j^2}{\Delta_0}\right) + \frac{m\mu_1}{\sigma_1^2}\bar x + \frac{n\mu_2}{\sigma_2^2}\bar y\right\}.$$

So we can take

$$\theta = -\left(\frac{1}{2\sigma_2^2}-\frac{1}{2\sigma_1^2\Delta_0}\right), \quad \nu = \left(-\frac{1}{2\sigma_1^2},\ \frac{m\mu_1}{\sigma_1^2},\ \frac{n\mu_2}{\sigma_2^2}\right),$$
$$U = \sum_{j=1}^n Y_j^2, \quad T = \left(\sum_{i=1}^m X_i^2 + \frac{\sum_{j=1}^n Y_j^2}{\Delta_0},\ \bar X,\ \bar Y\right).$$

So H0 : σ2²/σ1² ≤ ∆0 v.s. H1 : σ2²/σ1² > ∆0 is equivalent to H0 : θ ≤ 0 v.s. H1 : θ > 0.

Given θ = 0, T is complete and sufficient for ν. Take

$$V = \frac{\sum_{j=1}^n (Y_j-\bar Y)^2}{\Delta_0\sum_{i=1}^m(X_i-\bar X)^2 + \sum_{j=1}^n(Y_j-\bar Y)^2} = \frac{U-nT_3^2}{\Delta_0(T_1-mT_2^2)-nT_3^2} = h(U,T) \qquad (2.6)$$
$$= \frac{(n-1)S_2^2/\sigma_2^2}{(m-1)S_1^2/\sigma_1^2 + (n-1)S_2^2/\sigma_2^2} = \frac{(n-1)W}{(m-1)+(n-1)W},$$

where $W = [S_2^2/\sigma_2^2]/[S_1^2/\sigma_1^2] \sim F_{n-1,m-1}$ (the second line uses θ = 0, i.e., σ2² = ∆0σ1²). Clearly, V is ancillary and hence independent of T. Also, h(U, T) is increasing in U for each T and is linear in U, and V is a strictly increasing function of W. Hence, a UMPU test of size α is

ψ(v) = I{v ≥ C0},

where C0 satisfies $E_{\theta_0}\psi(V) = P_{\theta_0}\{V\ge C_0\} = \alpha$, or equivalently

ψ(w) = I{w ≥ D0},

where D0 satisfies $E_{\theta_0}\psi(W) = P\{F_{n-1,m-1}\ge D_0\} = \alpha$. This is the F-test in elementary statistics textbooks.
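Numerically (a sketch assuming NumPy/SciPy, with a made-up data example), D0 is simply an F quantile:

```python
# A minimal sketch (assumes numpy/scipy): one-sided UMPU test of
# H0: sigma2^2/sigma1^2 <= Delta0 via the F_{n-1, m-1} distribution.
import numpy as np
from scipy import stats

def f_test_one_sided(x, y, delta0=1.0, alpha=0.05):
    m, n = len(x), len(y)
    w = (y.var(ddof=1) / delta0) / x.var(ddof=1)   # S2^2/(Delta0 * S1^2)
    d0 = stats.f.ppf(1 - alpha, dfn=n - 1, dfd=m - 1)
    return w, w > d0                                # (statistic, reject H0?)

rng = np.random.default_rng(3)
x = rng.normal(0, 1.0, size=30)
y = rng.normal(0, 1.8, size=25)
print(f_test_one_sided(x, y))
```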

(2). We provide two methods here.

Method 1. From (2.6), we see that

$$V = \frac{W_2}{W_1+W_2}, \qquad (2.7)$$

where W1 = (m − 1)S1²/σ1² ∼ χ²_{m−1} and W2 = (n − 1)S2²/σ2² ∼ χ²_{n−1}, and they are independent. It can be shown (see the lemma after this example) that V ∼ Beta((n − 1)/2, (m − 1)/2), that is, V has density

$$B_{\frac{n-1}2,\frac{m-1}2}(v) = \frac{\Gamma[\frac12(m+n-2)]}{\Gamma[\frac12(n-1)]\,\Gamma[\frac12(m-1)]}\,v^{\frac{n-1}2-1}(1-v)^{\frac{m-1}2-1}, \quad 0\le v\le 1.$$

Also we can prove

$$EV = \frac{n-1}{m+n-2}$$

and

$$v\,B_{\frac{n-1}2,\frac{m-1}2}(v) = \frac{n-1}{m+n-2}\,B_{\frac{n+1}2,\frac{m-1}2}(v) = [E_{\theta_0}V]\,B_{\frac{n+1}2,\frac{m-1}2}(v). \qquad (2.8)$$

From (2.6), we see that V = V(U, T) is linear in U. So a UMPU test of size α is

ψ(v) = I{v ≤ C1 or v ≥ C2},

where C1, C2 satisfy

$$E_{\theta_0}[1-\psi(V)] = \int_{C_1}^{C_2} B_{\frac{n-1}2,\frac{m-1}2}(v)\,dv = 1-\alpha,$$
$$E_{\theta_0}\{V[1-\psi(V)]\} = \int_{C_1}^{C_2} v\,B_{\frac{n-1}2,\frac{m-1}2}(v)\,dv = E_{\theta_0}V\int_{C_1}^{C_2} B_{\frac{n+1}2,\frac{m-1}2}(v)\,dv = (1-\alpha)E_{\theta_0}V,$$

or equivalently (by noting (2.8))

$$\int_{C_1}^{C_2} B_{\frac{n-1}2,\frac{m-1}2}(v)\,dv = \int_{C_1}^{C_2} B_{\frac{n+1}2,\frac{m-1}2}(v)\,dv = 1-\alpha.$$

If n − 1 ≈ n + 1 (i.e., if n is large), then C1, C2 are nearly the (α/2)th and (1 − α/2)th quantiles of $B_{\frac{n-1}2,\frac{m-1}2}$, respectively (since they are roughly the solutions, which are unique).

It can then be shown that this UMPU test can be approximated by the F-test which rejects H0 : θ = 0 iff F < F_{(n−1),(m−1)}(α/2) or F > F_{(n−1),(m−1)}(1 − α/2). See Method 2 below.

Lemma 6.2.1 If X ∼ χ²_m and Y ∼ χ²_n and they are independent, then

$$U = \frac{X}{X+Y} \sim Beta(m/2, n/2), \qquad V = X+Y \sim \chi^2_{m+n},$$

and U and V are independent.

Proof. Define U = X/(X + Y) and V = X + Y. Then the inverse transformation is X = UV and Y = V(1 − U), with Jacobian ∂(x, y)/∂(u, v) = v. So

$$f(u,v) = |J|f(x,y) = v\,f_X(x)f_Y(y) = C\,v\,x^{m/2-1}e^{-x/2}y^{n/2-1}e^{-y/2} = C\,v e^{-v/2}(uv)^{m/2-1}[v(1-u)]^{n/2-1} = C\,v^{(m+n)/2-1}e^{-v/2}u^{m/2-1}(1-u)^{n/2-1}.$$

Thus,

$$f_U(u) = C\,u^{m/2-1}(1-u)^{n/2-1}, \qquad f_V(v) = C\,v^{(m+n)/2-1}e^{-v/2}.$$

Therefore, U and V are independent, U ∼ Beta(m/2, n/2) and V ∼ χ²_{m+n}.

Method 2. From (2.7), we see that

$$V = \frac{F^*}{1+F^*},$$

where $F^* = W_2/W_1 = \frac{n-1}{m-1}F$ with F ∼ F_{(n−1),(m−1)}. Since V is strictly increasing in F, the UMPU test rejecting for V outside (C1, C2) is equivalent to rejecting for F outside some interval (D1, D2); for large m and n this is approximately the equal-tailed F-test which rejects H0 iff F < F_{(n−1),(m−1)}(α/2) or F > F_{(n−1),(m−1)}(1 − α/2).
(3). Given σ1² = σ2² = σ², we can write the joint p.d.f.

$$f(x,y) = C(\mu_1,\mu_2,\sigma^2)\exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^m x_i^2 + \sum_{j=1}^n y_j^2\right] + \frac{1}{\sigma^2}(m\mu_1\bar x + n\mu_2\bar y)\right\}.$$

But

$$\begin{aligned}
\left(\frac1m+\frac1n\right)(m\mu_1\bar x + n\mu_2\bar y) &= \mu_1\bar x + \mu_2\bar y + \frac mn\mu_1\bar x + \frac nm\mu_2\bar y \\
&= (\mu_2-\mu_1)(\bar y-\bar x) + \mu_1\bar y + \mu_2\bar x + \frac mn\mu_1\bar x + \frac nm\mu_2\bar y \\
&= (\mu_2-\mu_1)(\bar y-\bar x) + \frac1n\mu_1(m\bar x+n\bar y) + \frac1m\mu_2(n\bar y+m\bar x) \\
&= (\mu_2-\mu_1)(\bar y-\bar x) + \frac{1}{mn}(m\mu_1+n\mu_2)(m\bar x+n\bar y).
\end{aligned}$$

So

$$f(x,y) = C\exp\left\{\frac{(\mu_2-\mu_1)(\bar y-\bar x)}{\sigma^2\left(\frac1m+\frac1n\right)} + \frac{(m\mu_1+n\mu_2)(m\bar x+n\bar y)}{(m+n)\sigma^2} - \frac{1}{2\sigma^2}\left[\sum_{i=1}^m x_i^2 + \sum_{j=1}^n y_j^2\right]\right\}.$$

We take

$$\theta = \frac{\mu_2-\mu_1}{\sigma^2\left(\frac1m+\frac1n\right)}, \quad U = \bar Y - \bar X, \quad \nu = \left(\frac{m\mu_1+n\mu_2}{(m+n)\sigma^2},\ \frac{-1}{2\sigma^2}\right), \quad T = \left(m\bar X + n\bar Y,\ \sum_{i=1}^m X_i^2 + \sum_{j=1}^n Y_j^2\right).$$

Then the hypothesis H0 : µ2 − µ1 ≤ 0 versus H1 : µ2 − µ1 > 0 is equivalent to H0 : θ ≤ 0 versus H1 : θ > 0.
Given µ1 = µ2, T is complete and sufficient. Take

$$V = \frac{(\bar Y-\bar X)\big/\left(\frac1m+\frac1n\right)^{1/2}}{\sqrt{\left[\sum_{i=1}^m(X_i-\bar X)^2 + \sum_{j=1}^n(Y_j-\bar Y)^2\right]\big/(m+n-2)}} = \frac{(\bar Y-\bar X)\big/\left(\frac1m+\frac1n\right)^{1/2}}{S_p} \quad (S_p^2\text{ is the pooled variance})$$
$$= \frac{(\bar Y-\bar X)\big/\left[\sigma\left(\frac1m+\frac1n\right)^{1/2}\right]}{\sqrt{\left[\sum_{i=1}^m(X_i-\bar X)^2/\sigma^2 + \sum_{j=1}^n(Y_j-\bar Y)^2/\sigma^2\right]\big/(m+n-2)}} \equiv \frac{V_1}{\sqrt{V_2/(m+n-2)}}.$$

Clearly, the numerator V1 ∼ N(0, 1) and $V_2 = \sum(X_i-\bar X)^2/\sigma^2 + \sum(Y_j-\bar Y)^2/\sigma^2 \sim \chi^2_{m+n-2}$; they are independent since X̄, Ȳ, Σ(Xi − X̄)² and Σ(Yj − Ȳ)² are independent. Therefore V ∼ t_{m+n−2}, which does not depend on any parameters; hence V is ancillary and independent of T. Now we express V in terms of U and T. From

U = Ȳ − X̄ and T1 = mX̄ + nȲ,

we have

$$\bar X = \frac{T_1 - nU}{m+n}, \qquad \bar Y = \frac{T_1 + mU}{m+n}.$$
So

$$m\bar X^2 + n\bar Y^2 = \frac{1}{(m+n)^2}\left[(mT_1^2 - 2mnT_1U + mn^2U^2) + (nT_1^2 + 2mnT_1U + m^2nU^2)\right] = \frac{T_1^2}{m+n} + \frac{mn}{m+n}U^2.$$

Therefore, we can rewrite

$$V = V(U,T) = \frac{U\big/\left(\frac1m+\frac1n\right)^{1/2}}{\sqrt{\left[T_2 - \frac{T_1^2}{m+n} - \frac{mn}{m+n}U^2\right]\big/(m+n-2)}}.$$

So V(U, T) is increasing in U for each T. Hence, a UMPU test of size α is

ψ(v) = I{v ≥ C0},

where C0 satisfies $E_{\theta_0}\psi(V) = P_{\theta_0}\{V\ge C_0\} = P\{t_{m+n-2}\ge C_0\} = \alpha$. That is, C0 = t_{m+n−2}(1 − α).
This is the test used in elementary statistics textbooks.
(4). In order to test H0 : µ2 − µ1 = 0 versus H1 : µ2 − µ1 ≠ 0 under the assumption that σ1² = σ2², we cannot use the statistic V in (3) directly, since it is not linear in U. However, we can define

$$W = \frac{\bar Y-\bar X}{\sqrt{\sum_{i=1}^m X_i^2 + \sum_{j=1}^n Y_j^2 - \frac{(m\bar X+n\bar Y)^2}{m+n}}} = \frac{U}{\sqrt{T_2 - \frac{T_1^2}{m+n}}}.$$

Then it can be shown that

$$V = \frac{(m+n-2)^{1/2}\,W}{\left(\frac1m+\frac1n\right)^{1/2}\sqrt{1-\frac{mn}{m+n}W^2}}.$$

Clearly, being a strictly increasing function of V, W is also ancillary and hence independent of T. In addition, W is a linear function of U for fixed T and has a symmetric distribution about 0 when µ1 = µ2 (V also has a symmetric distribution about 0). The proof is given in the lemma after the solution.

Therefore, a UMPU test is (noting that V is an odd function of W)

ψ(v) = I{|w| ≥ C} = I{|v| ≥ C0},

where C0 satisfies $E_{\theta_0}\psi(V) = P_{\theta_0}\{|V|\ge C_0\} = P\{|t_{m+n-2}|\ge C_0\} = \alpha$. That is, C0 = t_{m+n−2}(1 − α/2). This is the test used in elementary statistics textbooks.

Remark 6.2.1 This part is similar to part (4) as in the one-sample problem. See Lehmann
(1986, p203).

Proof of symmetric distributions for W and V when µ1 = µ2.

We only need to show that W =d −W and V =d −V. Note that if µ1 = µ2 = µ, then we can write

$$W = \frac{\bar Y-\bar X}{\sqrt{\sum X_i^2 + \sum Y_j^2 - \frac{(m\bar X+n\bar Y)^2}{m+n}}} = \frac{[(\bar Y-\mu)-(\bar X-\mu)]/\sigma}{\sqrt{\sum\left(\frac{X_i-\mu}{\sigma}\right)^2 + \sum\left(\frac{Y_j-\mu}{\sigma}\right)^2 - \frac{[m(\bar X-\mu)+n(\bar Y-\mu)]^2}{(m+n)\sigma^2}}} = \frac{\bar Y'-\bar X'}{\sqrt{\sum X_i'^2 + \sum Y_j'^2 - \frac{(m\bar X'+n\bar Y')^2}{m+n}}},$$

where Xi′ = (Xi − µ)/σ ∼ N(0, 1) and Yj′ = (Yj − µ)/σ ∼ N(0, 1), all independent of each other. Note that

$$-W = \frac{\bar Y''-\bar X''}{\sqrt{\sum X_i''^2 + \sum Y_j''^2 - \frac{(m\bar X''+n\bar Y'')^2}{m+n}}},$$

where Xi″ = −Xi′ ∼ N(0, 1) and Yj″ = −Yj′ ∼ N(0, 1), all independent of each other. Clearly, W has a symmetric distribution.

On the other hand, writing $V = C_{m,n}W/\sqrt{1-\frac{mn}{m+n}W^2}$, we have

$$-V = -\frac{C_{m,n}W}{\sqrt{1-\frac{mn}{m+n}W^2}} = \frac{C_{m,n}(-W)}{\sqrt{1-\frac{mn}{m+n}(-W)^2}},$$

which also has a symmetric distribution since W =d −W. This completes the proof of the lemma.

6.2.6 Application 3: Testing for independence in the bivariate
normal family.
Eg. Suppose that (X1, Y1), ..., (Xn, Yn) are iid with pdf

$$f(x,y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left\{-\frac{(x-\mu_1)^2}{2\sigma_1^2(1-\rho^2)} + \frac{\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2(1-\rho^2)} - \frac{(y-\mu_2)^2}{2\sigma_2^2(1-\rho^2)}\right\}.$$

Find a UMPU test for testing H0 : ρ = 0 versus H1 : ρ ≠ 0.

Solution. The joint p.d.f. is

$$f(\mathbf x,\mathbf y) = \frac{1}{[2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}]^n}\exp\left\{-\frac{\sum(x_i-\mu_1)^2}{2\sigma_1^2(1-\rho^2)} + \frac{\rho\sum(x_i-\mu_1)(y_i-\mu_2)}{\sigma_1\sigma_2(1-\rho^2)} - \frac{\sum(y_i-\mu_2)^2}{2\sigma_2^2(1-\rho^2)}\right\}$$
$$= C(\mu_1,\mu_2,\sigma_1,\sigma_2,\rho)\exp\left\{\frac{\rho\sum x_iy_i}{\sigma_1\sigma_2(1-\rho^2)} - \frac{\sum x_i^2}{2\sigma_1^2(1-\rho^2)} - \frac{\sum y_i^2}{2\sigma_2^2(1-\rho^2)} + \nu_3\sum x_i + \nu_4\sum y_i\right\}.$$

So we can take

$$\theta = \frac{\rho}{\sigma_1\sigma_2(1-\rho^2)}, \quad \nu = (\nu_1,...,\nu_4), \quad U = \sum X_iY_i, \quad T = \left(\sum X_i,\ \sum Y_i,\ \sum X_i^2,\ \sum Y_i^2\right).$$

Therefore, testing H0 : ρ = 0 versus H1 : ρ ≠ 0 is equivalent to testing H0 : θ = 0 versus H1 : θ ≠ 0.
Given θ = 0, T is complete and sufficient for ν. Define

$$V = \hat\rho = \frac{\sum(X_i-\bar X)(Y_i-\bar Y)}{\left(\sum(X_i-\bar X)^2\right)^{1/2}\left(\sum(Y_i-\bar Y)^2\right)^{1/2}} = \frac{\sum X_iY_i - n^{-1}(\sum X_i)(\sum Y_i)}{\left(\sum X_i^2 - n^{-1}(\sum X_i)^2\right)^{1/2}\left(\sum Y_i^2 - n^{-1}(\sum Y_i)^2\right)^{1/2}} = \frac{U - n^{-1}T_1T_2}{(T_3 - n^{-1}T_1^2)^{1/2}(T_4 - n^{-1}T_2^2)^{1/2}}.$$

Clearly, in the above expression, if we replace Xi and Yi by (Xi − µ1)/σ1 and (Yi − µ2)/σ2, which are distributed as N(0, 1), then the value of V remains the same; therefore V is ancillary. By Basu's theorem, V is independent of T. Furthermore, V is linear in U. So a UMPU test is

ψ(v) = 1 when v < C1 or v > C2
0 when C1 < v < C2,

where the Ci's are determined from

Eθ=0 {ψ(V)} = α,
Eθ=0 {V ψ(V)} = αEθ=0 [V].

Define

$$W = \frac{\sqrt{n-2}\,V}{\sqrt{1-V^2}},$$

which is strictly increasing in V. Under H0 : ρ = 0, it can be shown that W ∼ t_{n−2}, which is clearly symmetric about 0. So a UMPU test is

φ(w) = I{|w| > C0}, where C0 = t_{n−2}(1 − α/2).

Question: How does one show that W ∼ t_{n−2}? (Cf. Theorem 6.2.3.)

6.2.7 Application 4: Regression.
Theorem 6.2.4 Let (x1, Y1), ..., (xn, Yn) follow the linear regression

$$Y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, ..., n,$$

where the εi are iid N(0, σ²). Define ρ = cα + dβ for some known constants c and d. Find a UMPU test for testing

H0 : ρ = ρ0 versus H1 : ρ ≠ ρ0.

Solution. Note that we can write

$$\alpha + \beta x_i = (\alpha+\beta\bar x) + \beta(x_i-\bar x) = (\alpha+\beta\bar x) + \beta\sqrt{\sum(x_i-\bar x)^2}\,\frac{x_i-\bar x}{\sqrt{\sum(x_i-\bar x)^2}} = \gamma + \delta v_i,$$

where

$$\gamma = \alpha+\beta\bar x, \qquad \delta = \beta\sqrt{\sum(x_i-\bar x)^2}, \qquad v_i = \frac{x_i-\bar x}{\sqrt{\sum(x_i-\bar x)^2}}.$$

Then we have

$$\beta = \frac{\delta}{\sqrt{\sum(x_i-\bar x)^2}}, \qquad \alpha = \gamma - \delta\,\frac{\bar x}{\sqrt{\sum(x_i-\bar x)^2}},$$

and ρ = cα + dβ = aγ + bδ. Note also that $\sum v_i = 0$ and $\sum v_i^2 = 1$.

The joint p.d.f. of Y1, . . . , Yn is

$$f(y_1,...,y_n) = \frac{1}{[\sqrt{2\pi}\,\sigma]^n}\exp\left\{-\frac{1}{2\sigma^2}\sum(y_i-\gamma-\delta v_i)^2\right\} = C\exp\left\{\frac{\delta\sum v_iy_i}{\sigma^2} - \frac{\sum y_i^2}{2\sigma^2} + \frac{\gamma\sum y_i}{\sigma^2}\right\}.$$

So we can take

$$\theta = \frac{\delta}{\sigma^2}, \quad \nu_1 = -\frac{1}{2\sigma^2}, \quad \nu_2 = \frac{\gamma}{\sigma^2}, \quad U = \sum v_iY_i, \quad T_1 = \sum Y_i^2, \quad T_2 = \sum Y_i.$$

.....................................

6.2.8 Application 5: Non-normal example.


So far, we have only dealt with normal distributions, but the main theorem can also be applied to other multi-parameter exponential families.

Gamma example. See Question 1 in Exercise 4.

6.3 The LSE in linear models
One of the most useful statistical models for non-iid data in applications is the following
linear model

Xi = Zi β τ + εi = βZiτ + εi , i = 1, ..., n, (3.9)

where Xi is the ith response random variable, β is a p-vector of unknown parameters with
p ≤ n, Zi is the ith p-vector of (non-random) covariates, and εi ’s are random errors.
Let X = (X1, ..., Xn), ε = (ε1, ..., εn), and let Z be the n × p matrix whose ith row is Zi, i = 1, ..., n. Then we can write (3.9) as $(X_1, ..., X_n) = (\beta Z_1^\tau, ..., \beta Z_n^\tau) + (\varepsilon_1, ..., \varepsilon_n)$, or in the following matrix format:

$$X = \beta Z^\tau + \varepsilon.$$

One of the most commonly used estimators is the least squares estimator (LSE), defined to be β̂ such that

$$\left\|X - \hat\beta Z^\tau\right\|^2 = \min_{b\in\mathbb R^p}\left\|X - bZ^\tau\right\|^2.$$

For any l ∈ R^p, $\hat\beta l^\tau$ is called an LSE of $\beta l^\tau$. Differentiating $\|X - bZ^\tau\|^2$ w.r.t. b, we see that any LSE satisfies

$$bZ^\tau Z = XZ.$$

(i). If the rank of Z is p (i.e., full rank), then Z τ Z is of full rank p, then there
is a unique LSE given by
β̂ = XZ(Z τ Z)−1 .

(ii). If the rank of Z is < p (i.e., not of full rank), then Z τ Z is not of full rank,
then there are infinitely many LSE’s of β, all of which take the form

β̂ = XZ(Z τ Z)− ,

where (Z τ Z)− is called a generalized inverse of Z τ Z satisfying

(Z τ Z)(Z τ Z)− (Z τ Z) = Z τ Z

Proof of (ii). Left as an exercise.
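As a computational aside (a sketch assuming NumPy, not part of the original notes), the LSE in the notes' row-vector convention can be computed with the Moore-Penrose pseudo-inverse, which is one valid generalized inverse of ZᵗZ:

```python
# A minimal sketch (assumes numpy): LSE betahat = X Z (Z^T Z)^- in the
# notes' convention X = beta Z^T + eps, using the Moore-Penrose inverse
# so that it also works when Z is rank-deficient.
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 3
Z = rng.normal(size=(n, p))
Z[:, 2] = Z[:, 0] + Z[:, 1]          # make Z rank-deficient (rank 2 < p)
beta = np.array([1.0, -2.0, 0.5])
X = beta @ Z.T + 0.1 * rng.normal(size=n)

betahat = X @ Z @ np.linalg.pinv(Z.T @ Z)   # one LSE among infinitely many
# With rank deficiency, betahat need not equal beta, but the fitted values
# beta Z^T (which are estimable) are recovered up to the noise level:
print(np.abs(betahat @ Z.T - beta @ Z.T).max())
```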

6.4 Summary
This chapter is concerned with UMPU tests again for multi-parameter exponential fam-
ilies. With further conditions, we could reduce the UMPU tests in conditional form to
unconditional form. The results have been applied to some well known problems studied
in elementary statistics courses.

6.4.1 Exercises
1. Let X1, . . . , Xn be iid from the gamma distribution Γ(α, γ) with unknown α and γ, whose p.d.f. is

$$f(x) = \frac{1}{\Gamma(\alpha)\gamma^\alpha}\,x^{\alpha-1}e^{-x/\gamma}\,I\{x>0\}.$$
(i). For testing H0 : α ≤ α0 versus H1 : α > α0 and H0 : α = α0 versus H1 : α ≠ α0, show that there exist UMPU tests whose rejections are based on $W = \prod_{i=1}^n (X_i/\bar X)$.
(ii). For testing H0 : γ ≤ γ0 versus H1 : γ > γ0, show that a UMPU test rejects H0 when $\sum_{i=1}^n X_i > C\left(\prod_{i=1}^n X_i\right)$. (Here, C(t) is a function of t.)

2. Let X1 and X2 be independently distributed as the negative binomial distributions


N B(p1 , r1 ) and N B(p2 , r2 ), respectively, where ri ’s are known and pi ’s are unknown.
Note that an NB(p, r) r.v. has p.d.f.

$$f(x) = \binom{x-1}{r-1}p^r(1-p)^{x-r}, \quad x = r, r+1, r+2, \ldots$$

(i). Show that there exists a UMPU test for testing H0 : p1 ≤ p2 versus H1 : p1 > p2.
(ii). Determine the conditional distribution P_{U|T=t} when r1 = r2 = 1.

3. Let X1 and X2 be independently distributed according to the one-parameter expo-


nential family

fi (x) = exp{ηi (θi )Ti (x) − ξi (θi )}hi (x), i = 1, 2.

Show that there exists a UMPU test of size α for testing

(i). H0 : η2 (θ2 ) − η1 (θ1 ) ≤ η0 versus H1 : η2 (θ2 ) − η1 (θ1 ) > η0 .


(ii). H0 : η2 (θ2 ) + η1 (θ1 ) ≤ η0 versus H1 : η2 (θ2 ) + η1 (θ1 ) > η0 .

Chapter 7

Hypothesis Testing by Likelihood


Methods

UMP, UMPU (and UMPI) tests often do not exist in a particular problem. In this chapter, we shall introduce other tests. These tests may not be optimal, but they are very general, easy to use, and have intuitive appeal. They often coincide with optimal tests (UMP, UMPU tests), and they play a role similar to that of the MLE in estimation theory.
Throughout the chapter, we assume that a sample

X ∼ F ∈ F = {Fθ : θ ∈ Θ},

and Fθ has p.d.f fθ . We wish to test

H0 : θ ∈ Θ 0 , versus H1 : θ ∈ Θ 1 , (0.1)

where Θ0 ∪ Θ1 = Θ and Θ0 ∩ Θ1 = ∅.

7.1 Likelihood Ratio Tests.


7.1.1 Definition.
Definition. Let l(θ) = fθ(x) be the likelihood function. Define the likelihood ratio (LR) by

$$\lambda(X) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \frac{l(\hat\theta_0)}{l(\hat\theta)},$$

where θ̂ is the unrestricted MLE while θ̂0 is the restricted MLE over the parameter space Θ0 (if they exist). Then a likelihood ratio test (LRT) for testing (0.1) is any test that rejects H0 iff

λ(X) < C, where C ∈ [0, 1].

Remark 7.1.1 .

(i). If Θ0 = {θ0 } (simple null hypothesis), then λ(X) = l(θ0 )/l(θ̂).


(ii). Note that 0 ≤ λ(X) ≤ 1.

(iii). If the distribution of λ(X) is continuous, then we can find a test of size
α exactly. If its distribution is discrete, this may not be achieved. However,
we can use randomization to achieve the exact size.
(iv). LRT is a method to derive tests, however, no optimal properties are
known. Fortunately, in many situations, LRT’s are optimal (either UMP,
UMPU or UMPI etc.) as illustrated in the next section.
(v). LRT in hypothesis testing plays a similar role as MLE in parametric
estimation. Both are generally applicable, may not always have good finite-
sample properties but they do have good asymptotic properties.

7.2 LRT with no nuisance parameters.
We shall show that when there are no nuisance parameters, LRT's are often optimal (either UMP, UMPU or UMPI). But we first start with a couple of examples.

7.2.1 Some examples.

Example 1. (Normal example with no nuisance parameters.) Let X1, . . . , Xn ∼ N(θ, 1). Find the LRT of size α for testing H0 : θ = θ0 versus H1 : θ ≠ θ0.

Solution. The likelihood is

$$l(\theta) = f(x) = (2\pi)^{-n/2}\exp\left\{-\frac12\sum(x_i-\theta)^2\right\}.$$

Here Θ0 = {θ : θ = θ0} and Θ = {θ : θ ∈ R}.

The unrestricted MLE for θ is θ̂ = x̄. Thus,

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \frac{\exp\{-\frac12\sum(x_i-\theta_0)^2\}}{\exp\{-\frac12\sum(x_i-\bar x)^2\}} = \frac{\exp\{-\frac12\sum(x_i-\bar x)^2 - \frac n2(\bar x-\theta_0)^2\}}{\exp\{-\frac12\sum(x_i-\bar x)^2\}} = \exp\left\{-\frac n2(\bar x-\theta_0)^2\right\}.$$

Thus λ(x) < C iff (x̄ − θ0)² > C1 iff |x̄ − θ0| > C0, where C0 is determined from

$$\sup_{\theta\in\Theta_0}P_\theta\left(|\bar X-\theta_0| > C_0\right) = P_{\theta_0}\left(\sqrt n|\bar X-\theta_0| > C_0\sqrt n\right) = 2\left[1-\Phi(C_0\sqrt n)\right] = \alpha,$$

that is, $C_0 = \Phi^{-1}(1-\frac\alpha2)/\sqrt n$.

Remark 7.2.1 The LRT test is the same as the UMPU test.
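A tiny numerical illustration (a sketch assuming NumPy/SciPy, with made-up data):

```python
# A minimal sketch (assumes numpy/scipy): LRT for H0: theta = theta0 in the
# N(theta, 1) model; lambda < C is equivalent to |xbar - theta0| > C0.
import numpy as np
from scipy import stats

def normal_mean_lrt(x, theta0, alpha=0.05):
    n = len(x)
    lam = np.exp(-0.5 * n * (x.mean() - theta0) ** 2)   # LR statistic
    c0 = stats.norm.ppf(1 - alpha / 2) / np.sqrt(n)
    return lam, abs(x.mean() - theta0) > c0             # (lambda, reject?)

rng = np.random.default_rng(5)
x = rng.normal(0.3, 1.0, size=40)
print(normal_mean_lrt(x, theta0=0.0))
```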

Example 2. (Binomial distribution) Let X1 , ..., Xn ∼ Bin(1, θ) with 0 ≤ θ ≤ 1. Find
the LRT of H0 : θ ≤ θ0 versus H1 : θ > θ0 .

Solution. The likelihood is

$$l(\theta) = \theta^{\sum x_i}(1-\theta)^{n-\sum x_i} = \theta^{n\bar x}(1-\theta)^{n(1-\bar x)}.$$

Here Θ0 = {θ : θ ≤ θ0}.

The global MLE for θ is $\hat\theta = \sum x_i/n = \bar x$ and

$$l(\hat\theta) = \bar x^{\,n\bar x}(1-\bar x)^{n(1-\bar x)}.$$

It is easy to check that

$$l'(\theta) = \theta^{n\bar x-1}(1-\theta)^{n(1-\bar x)-1}\,n(\bar x-\theta).$$

That is, l′(θ) > 0, = 0, < 0 according as θ < x̄, θ = x̄, θ > x̄. So l(θ) first increases, achieves its maximum at x̄, and then decreases. It follows that

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \begin{cases} 1 & \text{if } \hat\theta\le\theta_0,\\[4pt] \dfrac{\theta_0^{n\bar x}(1-\theta_0)^{n(1-\bar x)}}{\bar x^{\,n\bar x}(1-\bar x)^{n(1-\bar x)}} & \text{if } \hat\theta>\theta_0. \end{cases}$$

Denote

$$g(x) = \frac1n\log\frac{\theta_0^{nx}(1-\theta_0)^{n(1-x)}}{x^{nx}(1-x)^{n(1-x)}} = x\log\theta_0 + (1-x)\log(1-\theta_0) - x\log x - (1-x)\log(1-x).$$

So for x > θ0, we have

$$g'(x) = \log\theta_0 - \log(1-\theta_0) - \log x + \log(1-x) = \log\frac{\theta_0}{1-\theta_0} - \log\frac{x}{1-x} < 0.$$

Thus, g(x) is decreasing in x for x > θ0. Hence λ(x) is nonincreasing in θ̂ (constant at 1 for θ̂ ≤ θ0 and strictly decreasing for θ̂ > θ0).
Thus λ(x) < C iff $\sum X_i > C_0$, where

$$\sup_{\theta\le\theta_0}P_\theta\left(\sum X_i > C_0\right) = P_{\theta_0}\left(\sum X_i > C_0\right) = \alpha, \qquad (2.2)$$

if such a C0 exists; otherwise we determine C0 and γ0 from

$$P_{\theta_0}\left(\sum X_i > C_0\right) + \gamma_0\,P_{\theta_0}\left(\sum X_i = C_0\right) = \alpha.$$

Remark 7.2.2 The LRT test is also the same as the UMPU test.
Remark 7.2.3 The first equality in (2.2) follows from the fact that

$$g(\theta) = P_\theta\left(\sum X_i > C_0\right) = \sum_{k=C_0+1}^n \binom nk \theta^k(1-\theta)^{n-k}$$

is an increasing function of θ, since g′(θ) > 0. (Please check.)
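A numerical sketch (assuming SciPy) that finds C0 and the randomization probability γ0 for given n, θ0 and α:

```python
# A minimal sketch (assumes scipy): find C0 and gamma0 so that
# P(S > C0) + gamma0 * P(S = C0) = alpha, where S = sum(X_i) ~ Bin(n, theta0).
from scipy import stats

def randomized_binomial_cut(n, theta0, alpha):
    S = stats.binom(n, theta0)
    for c0 in range(n + 1):
        tail = S.sf(c0)                 # P(S > c0)
        if tail <= alpha:
            gamma0 = (alpha - tail) / S.pmf(c0)
            return c0, gamma0
    return n, 0.0

print(randomized_binomial_cut(n=10, theta0=0.2, alpha=0.1))
```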

7.2.2 LRT in one-parameter exponential family.
We have seen from the last examples that LRT are equivalent to UMP or UMPU tests.
In fact, it is true in general for one-parameter exponential family.

Theorem 7.2.1 Suppose that X is from the one-parameter exponential family

fθ (x) = exp{η(θ)T (x) − κ(θ)}h(x),

where θ is real-valued and η(θ) is a strictly increasing function of θ.


(a). For testing H0 : θ ≤ θ0 versus H1 : θ > θ0, there exists an LRT whose rejection region is T(X) > C0, which is of the same form as that of the UMP test.
(b). For testing H0 : θ < θ1 or θ > θ2 versus H1 : θ1 ≤ θ ≤ θ2, there exists an LRT whose rejection region is C1 < T(X) < C2, which is of the same form as that of the UMP test.
(c). For testing H0 : θ1 ≤ θ ≤ θ2 versus H1 : θ < θ1 or θ > θ2, there exists an LRT whose rejection region is T(X) < C1 or T(X) > C2, which is of the same form as that of the UMPU test.
(d). For testing H0 : θ = θ0 versus H1 : θ ≠ θ0, there exists an LRT whose rejection region is T(X) < C1 or T(X) > C2, which is of the same form as that of the UMPU test.

Proof. Since η(θ) is a strictly increasing function of θ, so the hypotheses are equivalent
to testing η. For instance,

H0 : θ ≤ θ0 , versus H1 : θ > θ0

is the same as

H0 : η ≤ η0 , versus H1 : η > η0 , where η0 = η(θ0 ).

A reparametrization results in

fη (x) = exp{ηT (x) − ξ(η)}h(x),

(a). The log-likelihood and its derivatives are

$$\log l(\eta) = \log f_\eta(x) = \eta T(x) - \xi(\eta) + \log h(x),$$
$$\frac{d\log l(\eta)}{d\eta} = T(x) - \xi'(\eta), \qquad \frac{d^2\log l(\eta)}{d\eta^2} = -\xi''(\eta) = -\mathrm{var}_\eta(T(X)) < 0.$$

Setting the first derivative to zero, we get the MLE η̂ satisfying

$$\xi'(\hat\eta) = T(x) \equiv t, \quad\text{thus } \hat\eta = \hat\eta(T(x)) = \hat\eta(t), \quad\text{and}\quad \frac{dt}{d\hat\eta} = \xi''(\hat\eta) > 0.$$

So l(η) first increases strictly up to η̂ and then decreases strictly after that. Thus,

$$\lambda(x) \equiv \frac{\sup_{\eta\le\eta_0}l(\eta)}{\sup_\eta l(\eta)} = \begin{cases} 1 & \text{if } \hat\eta\le\eta_0,\\ l(\eta_0)/l(\hat\eta) = e^{g(t|\eta_0)} & \text{if } \hat\eta>\eta_0, \end{cases}$$

where

$$g(t|\eta_0) = g(T(x)|\eta_0) = \log\frac{l(\eta_0)}{l(\hat\eta)} = (\eta_0-\hat\eta)t - [\xi(\eta_0)-\xi(\hat\eta)].$$

So

$$g'(t|\eta_0) = \frac{d\hat\eta}{dt}\left[-t+\xi'(\hat\eta)\right] + (\eta_0-\hat\eta) = \eta_0-\hat\eta, \qquad g''(t|\eta_0) = -\frac{d\hat\eta}{dt} < 0.$$

In particular, for η0 < η̂, we have g′(t|η0) < 0, which means that g(t|η0) is strictly decreasing in t. Hence, for η̂ > η0, λ(x) is strictly decreasing in t. Thus we reject H0 iff λ(x) < C iff T(x) > C0, which is the rejection region of the UMP test.
(b). The proof is similar to that in (a). Note that

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\eta)}{\sup_{\theta\in\Theta}l(\eta)} = \begin{cases} 1 & \text{if } \hat\eta<\eta_1 \text{ or } \hat\eta>\eta_2,\\[4pt] \dfrac{\max\{l(\eta_1),\,l(\eta_2)\}}{l(\hat\eta)} = \exp\left\{\max\{g(t|\eta_1),\,g(t|\eta_2)\}\right\} & \text{if } \eta_1\le\hat\eta\le\eta_2. \end{cases}$$
Similarly to the first part of the proof, if η1 ≤ η̂ ≤ η2 , we see that g(t|η1 ) is strictly
decreasing in t and g(t|η2 ) is strictly increasing in t.
Hence we reject H0 iff λ(x) < C iff g(t|η1 ) < C10 and g(t|η2 ) < C20 iff C1 < T (x) < C2 .
The proof is complete.

(c) and (d). The proofs are similar.

Remark 7.2.4 .
(i). Although the rejection regions for LRT tests and UMP (or UMPU etc)
tests are of the same form, however, for tests of size α for both, then they
may not necessarily agree on their critical points (i.e., Ci ’s may be different
for LRT and UMP or UMPU tests.); see the next example.
(ii). In some situations, LRT and UMP (or UMPU) tests of size α are the
same. For instance, for the one-sided tests, they are always the same. For
two-sided tests, they are the same if the test statistic is symmetric since the
two side conditions reduce to one in this case.

7.2.3 LRT with non-exponential family (with no nuisance pa-


rameters)
So far, we have seen that LRT's are the same as, or of the same form as, UMP or UMPU tests in one-parameter exponential families. The same phenomenon can also occur for other, non-exponential families. We give a couple of examples.

Example. [Uniform distribution] Let X1, ..., Xn ∼ Unif(0, θ) with θ > 0. Recall the UMP test of size α for testing H0 : θ = θ0 versus H1 : θ ≠ θ0 is

$$\psi(x) = I\{x_{(n)} > \theta_0 \text{ or } x_{(n)} \le \theta_0\alpha^{1/n}\}.$$

Show that this is also the LRT of size α.

Solution. The likelihood is $l(\theta) = f(x) = \theta^{-n}I\{x_{(n)} < \theta\}$. The unrestricted MLE for θ is θ̂ = x(n), and

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \frac{\theta_0^{-n}I\{x_{(n)}<\theta_0\}}{x_{(n)}^{-n}} = \begin{cases} \left(\dfrac{x_{(n)}}{\theta_0}\right)^n & \text{if } x_{(n)}<\theta_0,\\[4pt] 0 & \text{if } x_{(n)}>\theta_0. \end{cases}$$

From the plot of λ(x) versus x(n), we see that the rejection region λ(x) < C corresponds to X(n) > θ0 or X(n)/θ0 < C0, where C0 = α^{1/n} in order to make the test of size α.

Example. (Exponential distribution with boundary depending on parameters)


Let X1 , ..., Xn be a random sample from an exponential distribution with pdf
fθ (x) = e−(x−θ) I{x ≥ θ}.

(a). Find an LRT of size α for testing H0 : θ ≤ θ0 versus H1 : θ > θ0 .
(b). Is it the same as the UMP test?

Solution. (a). The likelihood is

$$l(\theta) = f_\theta(x) = e^{-\sum x_i + n\theta}\,I\{\theta\le x_{(1)}\}.$$

Clearly, l(θ) is a strictly increasing function of θ for θ ≤ x(1) and drops to zero afterwards. Therefore, the unrestricted MLE for θ is θ̂ = x(1) and $\sup_{\theta\in\Theta}l(\theta) = e^{-\sum x_i + n\hat\theta}$. Therefore,

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \begin{cases} 1 & \text{if } x_{(1)}<\theta_0,\\[4pt] \dfrac{l(\theta_0)}{l(\hat\theta)} = e^{n(\theta_0-x_{(1)})} & \text{if } x_{(1)}\ge\theta_0. \end{cases}$$

Note that λ(x) is a continuous and nonincreasing function of x(1). (In fact, it stays flat at 1 for x(1) < θ0 and gradually decreases to zero afterwards.) Thus λ(x) < C iff x(1) > C0, where C0 is determined by the size of the test.

(b). As an exercise. (Check.)

7.3 Equivalence of LRT and Neyman-Pearson test


when both exist
When both null and alternative hypotheses are simple, then LRT and Neyman-Pearson
tests are the same.
Theorem 7.3.1 Suppose that both H0 and H1 are simple, i.e., Θ0 = {θ0} and Θ1 = {θ1}, and that both the LRT and the UMP test from the NP lemma exist for a given size α. Then they are equivalent.
Proof. From the NP lemma, one rejects H0 when fθ1(x)/fθ0(x) > C0, i.e.,

$$r(x) \equiv f_{\theta_0}(x)/f_{\theta_1}(x) < C_1, \qquad (3.3)$$

iff

$$\lambda(x) = \frac{f_{\theta_0}(x)}{\max\{f_{\theta_0}(x),\,f_{\theta_1}(x)\}} = \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} < C_2. \qquad (3.4)$$

To see this, note that λ(x) = min{1, r(x)}. If (3.3) holds, then λ(x) < max{1, C1}, so we can take C2 = max{1, C1}. On the other hand, note that 0 ≤ C2 ≤ 1 may be assumed. If (3.4) holds, then λ(x) < C2 ≤ 1, which forces λ(x) = r(x); hence r(x) < C2, and we can take C1 = C2.

7.4 LRT with nuisance parameters
LRT has been seen to coincide with optimal tests in many situations when there are no
nuisance parameters. However, LRT proves even more useful when there are nuisance
parameters. We also start with some examples.

7.4.1 Examples with multiparameter exponential families

Example. (One-sample problem). Let X1, . . . , Xn ∼ N(µ, σ²), with both µ and σ² unknown. Find the LRT of size α for testing H0 : µ = µ0 versus H1 : µ ≠ µ0.

Solution. The likelihood is

$$l(\theta) = f(x) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum(x_i-\mu)^2\right\}.$$

Here θ = (µ, σ²), Θ0 = {(µ, σ²) : µ = µ0, σ² > 0} and Θ = {(µ, σ²) : µ ∈ R, σ² > 0}.

The unrestricted MLEs for µ and σ² are θ̂ = (µ̂, σ̂²), where

$$\hat\mu = \bar x, \qquad \hat\sigma^2 = \frac1n\sum(x_i-\bar x)^2,$$

and

$$\sup_{\theta\in\Theta}l(\theta) = (2\pi\hat\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\hat\sigma^2}\sum(x_i-\bar x)^2\right\} = (2\pi\hat\sigma^2)^{-n/2}e^{-n/2}.$$

The restricted MLE for σ² when µ = µ0 is

$$\hat\sigma_0^2 = \frac1n\sum(x_i-\mu_0)^2 = \frac1n\left[\sum(x_i-\bar x)^2 + n(\bar x-\mu_0)^2\right],$$

and

$$\sup_{\theta\in\Theta_0}l(\theta) = (2\pi\hat\sigma_0^2)^{-n/2}e^{-n/2}.$$

Thus

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \left(\frac{\hat\sigma^2}{\hat\sigma_0^2}\right)^{n/2} = \left(\frac{\sum(x_i-\bar x)^2}{\sum(x_i-\bar x)^2 + n(\bar x-\mu_0)^2}\right)^{n/2} = \left(\frac{1}{1 + \left[\sqrt n(\bar x-\mu_0)/s\right]^2/(n-1)}\right)^{n/2},$$

where $s^2 = \sum(x_i-\bar x)^2/(n-1)$.

Thus λ(x) < C iff

$$\frac{\sqrt n\,|\bar x-\mu_0|}{s} > C', \qquad\text{where } C' = t_{n-1}(1-\tfrac\alpha2).$$
Remark 7.4.1 .

(a). The LRT is exactly the same as the UMPU test.
(b). In this case, the exact distribution of λ(X) is known. Its approximate distribution can be found as follows:

$$-2\log\lambda(X) = n\log\left(1 + \frac{\left[\sqrt n(\bar X-\mu_0)/S\right]^2}{n-1}\right) = \frac{n}{n-1}\left[\sqrt n(\bar X-\mu_0)/S\right]^2 + o_p(1) \sim_{approx} \chi^2_1,$$

by the CLT and Slutsky's theorem, since n/(n − 1) → 1 and S² →_p σ².

Example. (Two-sample problem). Let X1, ..., Xm ∼ N(µ1, σ1²) and Y1, . . . , Yn ∼ N(µ2, σ2²), where the µi's and σi²'s are all unknown. The two samples are independent.

(a). Find the LRT of size α for testing H0 : σ1² = σ2² versus H1 : σ1² ≠ σ2². Is it the same as the UMPU test?
(b). Find the LRT of size α for testing H0 : µ1 = µ2 versus H1 : µ1 ≠ µ2, assuming that σ1² = σ2² = σ². Is it the same as the UMPU test?

Solution.
(a). See homework 5 for solutions. It turns out that LRT and UMPU test are the
same.
(b). The likelihood is

$$l(\theta) = f(x)f(y) = \frac{1}{(2\pi\sigma^2)^{(m+n)/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^m(x_i-\mu_1)^2 + \sum_{j=1}^n(y_j-\mu_2)^2\right]\right\}.$$

Here θ = (µ1, µ2, σ²).

The unrestricted MLE is $\hat\theta = (\hat\mu_1, \hat\mu_2, \hat\sigma_p^2) = (\bar x, \bar y, \hat\sigma_p^2)$, where

$$\hat\sigma_p^2 = \frac{1}{m+n}\left[\sum_{i=1}^m(x_i-\bar x)^2 + \sum_{j=1}^n(y_j-\bar y)^2\right] = \frac{m\hat\sigma_1^2 + n\hat\sigma_2^2}{m+n} \quad\text{(pooled sample variance)},$$

and

$$\sup_{\theta\in\Theta}l(\theta) = l(\hat\theta) = (2\pi\hat\sigma_p^2)^{-(m+n)/2}\exp\left\{-\frac{m+n}{2}\right\}.$$

Under H0 : µ1 = µ2, the restricted MLE of θ0 = (µ, µ, σ²) is θ̂0 = (µ̂0, µ̂0, σ̂0²), where

$$\hat\mu_0 = \frac{m\bar x + n\bar y}{m+n}, \qquad \hat\sigma_0^2 = \frac{\sum(x_i-\hat\mu_0)^2 + \sum(y_j-\hat\mu_0)^2}{m+n}.$$

Thus,

$$\sup_{\theta\in\Theta_0}l(\theta) = l(\hat\theta_0) = (2\pi\hat\sigma_0^2)^{-(m+n)/2}\exp\left\{-\frac{m+n}{2}\right\}.$$

Note that

$$\begin{aligned}
(m+n)\hat\sigma_0^2 &= \sum(x_i-\hat\mu_0)^2 + \sum(y_j-\hat\mu_0)^2 \\
&= \sum(x_i-\bar x)^2 + \sum(y_j-\bar y)^2 + m(\bar x-\hat\mu_0)^2 + n(\bar y-\hat\mu_0)^2 \\
&= (m+n)\hat\sigma_p^2 + m\left(\frac{n(\bar y-\bar x)}{m+n}\right)^2 + n\left(\frac{m(\bar y-\bar x)}{m+n}\right)^2 \\
&= (m+n)\hat\sigma_p^2 + \frac{mn^2 + nm^2}{(m+n)^2}(\bar y-\bar x)^2 = (m+n)\hat\sigma_p^2 + \frac{mn}{m+n}(\bar y-\bar x)^2,
\end{aligned}$$

so

$$\hat\sigma_0^2 = \hat\sigma_p^2 + \frac{mn}{(m+n)^2}(\bar y-\bar x)^2.$$

Therefore,

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \left(\frac{\hat\sigma_p^2}{\hat\sigma_0^2}\right)^{(m+n)/2} = \left(\frac{1}{1 + \frac{mn}{(m+n)^2}(\bar y-\bar x)^2/\hat\sigma_p^2}\right)^{(m+n)/2} = \left(\frac{1}{1 + \dfrac{\left[(\bar y-\bar x)\big/\left(S_p\sqrt{1/m+1/n}\right)\right]^2}{m+n-2}}\right)^{(m+n)/2},$$

where $S_p^2 = \frac{m+n}{m+n-2}\hat\sigma_p^2$ is the (unbiased) pooled variance.

Thus λ(x) < C iff

$$\frac{|\bar Y-\bar X|}{S_p\sqrt{1/m+1/n}} > C', \qquad\text{where } C' = t_{m+n-2}(1-\tfrac\alpha2).$$

Remark 7.4.2 .

(a). The LRT test is exactly the same as the UMPU test.
(b). In this case, the exact distribution of λ(X) is known. As in the one-sample
case, we can also find its approximate distribution

−2 log λ(X) ∼approx χ21 .

Example. (Regression problem). One-sample and two-sample problems considered
above are special cases of the general regression problems. For a general solution, see
Example 6.21 in Shao (1999), p382.

Example. (Testing for independence in the bivariate normal distribution.) Let (X1, Y1), ..., (Xn, Yn) follow the bivariate normal distribution with EX1 = µ1, EY1 = µ2, var(X1) = σ1², var(Y1) = σ2² and correlation coefficient ρ. Find the LRT of size α for testing H0 : ρ = 0 (independence) versus H1 : ρ ≠ 0 (dependence).

Solution. The likelihood is

$$l(\theta) = \frac{1}{[2\pi\sigma_1\sigma_2(1-\rho^2)^{1/2}]^n}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\sum\left(\frac{x_i-\mu_1}{\sigma_1}\right)^2 + \sum\left(\frac{y_i-\mu_2}{\sigma_2}\right)^2 - 2\rho\sum\left(\frac{x_i-\mu_1}{\sigma_1}\right)\left(\frac{y_i-\mu_2}{\sigma_2}\right)\right]\right\}.$$

Here θ = (µ1, µ2, σ1², σ2², ρ) and Θ0 = {θ : ρ = 0}.

The unrestricted MLE for θ is θ̂ = (µ̂1, µ̂2, σ̂1², σ̂2², ρ̂), where

$$\hat\mu_1 = \bar x, \quad \hat\mu_2 = \bar y, \quad \hat\sigma_1^2 = \frac1n\sum(x_i-\bar x)^2, \quad \hat\sigma_2^2 = \frac1n\sum(y_i-\bar y)^2, \quad \hat\rho = \frac{n^{-1}\sum(x_i-\bar x)(y_i-\bar y)}{\hat\sigma_1\hat\sigma_2},$$

and, substituting the MLEs into the likelihood (the bracketed sum then equals 2n(1 − ρ̂²)),

$$l(\hat\theta) = \frac{1}{[2\pi\hat\sigma_1\hat\sigma_2(1-\hat\rho^2)^{1/2}]^n}\exp\left\{-\frac{2n(1-\hat\rho^2)}{2(1-\hat\rho^2)}\right\} = \frac{e^{-n}}{[2\pi\hat\sigma_1\hat\sigma_2(1-\hat\rho^2)^{1/2}]^n}.$$

On the other hand, the restricted MLE for θ when ρ = 0 is θ̂0 = (µ̂1, µ̂2, σ̂1², σ̂2², 0), with the components defined as before, and

$$l(\hat\theta_0) = \frac{1}{[2\pi\hat\sigma_1\hat\sigma_2]^n}\exp\left\{-\frac12\left[\sum\left(\frac{x_i-\bar x}{\hat\sigma_1}\right)^2 + \sum\left(\frac{y_i-\bar y}{\hat\sigma_2}\right)^2\right]\right\} = \frac{e^{-n}}{[2\pi\hat\sigma_1\hat\sigma_2]^n}.$$

Thus

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \left(1-\hat\rho^2\right)^{n/2}.$$

Thus λ(x) < C iff |ρ̂| > C1 iff |T| > C0, where

$$T \equiv \frac{\sqrt{n-2}\,\hat\rho}{\sqrt{1-\hat\rho^2}}.$$

Since T ∼ t_{n−2} under H0, we have C0 = t_{n−2}(1 − α/2).

Remark 7.4.3 .

(a). The LRT is exactly the same as the UMPU test.
(b). In this case, the exact distribution of λ(X) is known. As in the one- and two-sample cases, we can also find its approximate distribution:

$$-2\log\lambda(X) = -n\log(1-\hat\rho^2) = n\log\left(1+\frac{\hat\rho^2}{1-\hat\rho^2}\right) = \frac{n\hat\rho^2}{1-\hat\rho^2} + \cdots = \left(\frac{\sqrt{n-2}\,\hat\rho}{\sqrt{1-\hat\rho^2}}\right)^2\frac{n}{n-2} + \cdots \sim_{approx} \chi^2_1.$$

Example. (LRT and UMPU are of the same form but not identical.) Let
X1 , ..., Xn be a random sample from a normal distribution with unknown parameters µ
and σ 2 .

(i) Find the LRT and UMPU test of size α for testing H0 : σ² ≤ σ0² versus H1 : σ² > σ0². Are they the same?
(ii) Find the LRT and UMPU test of size α for testing H0 : σ² = σ0² versus H1 : σ² ≠ σ0². Are they the same?

Solution. (i). It can be shown that the LRT is exactly the same as the UMP test; that is, we reject H0 iff (n − 1)S²/σ0² > χ²_{n−1}(1 − α).

(ii). First we shall find the LRT. Let θ = (µ, σ²). The likelihood is

$$l(\theta) = f(x) = (2\pi\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2\sigma^2}\sum(x_i-\mu)^2\right\}.$$

The unrestricted MLE for θ is θ̂ = (x̄, σ̂²), where $\hat\sigma^2 = n^{-1}\sum_{i=1}^n(x_i-\bar x)^2$, and therefore

$$l(\hat\theta) = \frac{1}{(2\pi\hat\sigma^2)^{n/2}}\exp\left\{-\frac n2\right\}.$$

On the other hand, under H0 : σ² = σ0², the restricted MLE for µ is still X̄. Denote θ̂0 = (X̄, σ0²); then we have

$$l(\hat\theta_0) = \frac{1}{(2\pi\sigma_0^2)^{n/2}}\exp\left\{-\frac n2\cdot\frac{\hat\sigma^2}{\sigma_0^2}\right\}.$$

Therefore,

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \frac{l(\hat\theta_0)}{l(\hat\theta)} = \left(\frac{\hat\sigma^2}{\sigma_0^2}\right)^{n/2}\exp\left\{\frac n2\left[1-\frac{\hat\sigma^2}{\sigma_0^2}\right]\right\} = e^{n/2}\left(\frac Tn\,e^{-T/n}\right)^{n/2},$$

where T = nσ̂²/σ0² ∼ χ²_{n−1} under H0.

Note that, as a function of T, λ(x) first strictly increases, reaches its maximum (at T = n), and then strictly decreases. Thus λ(x) < C iff T < C1 or T > C2, where C1 and C2 satisfy

$$\frac{C_1}{n}e^{-C_1/n} = \frac{C_2}{n}e^{-C_2/n},$$

or equivalently

$$C_1^n e^{-C_1} = C_2^n e^{-C_2}.$$

If we require the test to be of size α, then Pθ0(C1 < T < C2) = 1 − α.
In summary, the LRT of size α rejects H0 iff T < C1 or T > C2, where C1 and C2 are determined from the following two equations:

$$\int_{C_1}^{C_2}\chi^2_{n-1}(v)\,dv = 1-\alpha, \qquad (4.5)$$
$$C_1^n e^{-C_1} = C_2^n e^{-C_2}. \qquad (4.6)$$

Next, we shall look at the UMPU test. Recall that the UMPU test of size α rejects H0 iff T < D1 or T > D2, where D1 and D2 are determined from $E_{\theta_0}[1-\psi(T)] = 1-\alpha$ and $E_{\theta_0}\{T[1-\psi(T)]\} = (1-\alpha)E_{\theta_0}T$ (equivalently, $E_{\theta_0}\{[T-(n-1)][1-\psi(T)]\} = 0$), that is,

$$\int_{D_1}^{D_2}\chi^2_{n-1}(v)\,dv = 1-\alpha,$$
$$\int_{D_1}^{D_2}[v-(n-1)]\chi^2_{n-1}(v)\,dv = 0.$$

Note that the second side condition can be written as

$$0 = \int_{D_1}^{D_2}[v-(n-1)]\chi^2_{n-1}(v)\,dv = \frac{1}{\Gamma\left(\frac{n-1}2\right)2^{\frac{n-1}2}}\int_{D_1}^{D_2}[v-(n-1)]v^{\frac{n-1}2-1}e^{-\frac v2}\,dv = \frac{1}{\Gamma\left(\frac{n-1}2\right)2^{\frac{n-1}2}}\left[-2v^{\frac{n-1}2}e^{-\frac v2}\right]_{D_1}^{D_2} = \frac{-2}{\Gamma\left(\frac{n-1}2\right)2^{\frac{n-1}2}}\left[D_2^{\frac{n-1}2}e^{-\frac{D_2}2} - D_1^{\frac{n-1}2}e^{-\frac{D_1}2}\right],$$

which is equivalent to $D_2^{\frac{n-1}2}e^{-D_2/2} = D_1^{\frac{n-1}2}e^{-D_1/2}$, or

$$D_1^{n-1}e^{-D_1} = D_2^{n-1}e^{-D_2}.$$

In summary, the UMPU test of size α rejects H0 iff T < D1 or T > D2, where D1 and D2 are determined from

$$\int_{D_1}^{D_2}\chi^2_{n-1}(v)\,dv = 1-\alpha, \qquad (4.7)$$
$$D_1^{n-1}e^{-D_1} = D_2^{n-1}e^{-D_2}. \qquad (4.8)$$

Comparing (4.5)-(4.6) with (4.7)-(4.8), we see that the LRT and the UMPU test cannot be the same unless the size α = 1. Indeed, if C1 = D1 and C2 = D2, then taking the ratio of (4.6) and (4.8) yields C1 = C2 = D1 = D2, which in turn implies from (4.5) or (4.7) that 1 − α = 0.
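Numerically (a sketch assuming SciPy; the helper is illustrative), the two pairs of cut-offs can be computed and compared directly, using the log forms $n\log C_1 - C_1 = n\log C_2 - C_2$ of (4.6) and (4.8):

```python
# A minimal sketch (assumes numpy/scipy): solve (4.5)-(4.6) for the LRT cuts
# and (4.7)-(4.8) for the UMPU cuts, and compare them.
import numpy as np
from scipy import stats, optimize

def two_sided_cuts(n, alpha, power):
    """power = n gives the LRT system; power = n - 1 gives the UMPU system."""
    f = stats.chi2(n - 1)

    def equations(c):
        c1, c2 = c
        return [f.cdf(c2) - f.cdf(c1) - (1 - alpha),
                (power * np.log(c1) - c1) - (power * np.log(c2) - c2)]

    start = [f.ppf(alpha / 2), f.ppf(1 - alpha / 2)]
    return optimize.fsolve(equations, start)

n, alpha = 10, 0.05
print("LRT  cuts:", two_sided_cuts(n, alpha, power=n))
print("UMPU cuts:", two_sided_cuts(n, alpha, power=n - 1))
```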

7.5 Bad performance of LRT
Like the MLE in estimation, the LRT has good large-sample asymptotic properties. However, for fixed sample sizes, it can perform badly.
Example. (X.R. Chen, p. 338.) Let X take the 5 values 0, ±1, ±2 with distribution

$$P_\theta(X=0) = \alpha\,\frac{1-\theta_1}{1-\alpha}, \qquad P_\theta(X=1) = P_\theta(X=-1) = \left(\frac12-\alpha\right)\frac{1-\theta_1}{1-\alpha},$$
$$P_\theta(X=2) = \theta_1\theta_2, \qquad P_\theta(X=-2) = \theta_1(1-\theta_2),$$

where Θ = {(θ1, θ2) : 0 ≤ θ1 ≤ α, 0 ≤ θ2 ≤ 1}, 0 < α < 1/2, and α is fixed. Consider testing

$$H_0: \theta_1 = \alpha, \quad \theta_2 = \frac12.$$

Find the LRT of size α and show that it is biased.
Solution. The likelihood is l(θ) = l(θ1, θ2) = Pθ(X = x) for x = 0, ±1, ±2. The unrestricted MLE for θ is θ̂ = (θ̂1, θ̂2), where

θ̂ = (θ̂1, θ̂2) = (0, θ̂2 free) if x = 0, ±1,
(α, 1) if x = 2,
(α, 0) if x = −2,

and hence

l(θ̂) = α/(1 − α) if x = 0,
(1/2 − α)/(1 − α) if x = ±1,
α if x = ±2.

On the other hand, for θ0 = (θ1 = α, θ2 = 1/2), we have

l(θ0) = Pθ0(X = x) = α if x = 0,
1/2 − α if x = ±1,
α/2 if x = ±2.

Thus

$$\lambda(x) \equiv \frac{\sup_{\theta\in\Theta_0}l(\theta)}{\sup_{\theta\in\Theta}l(\theta)} = \frac{l(\theta_0)}{l(\hat\theta)} = \begin{cases} 1-\alpha & \text{if } x = 0, \pm1,\\ 1/2 & \text{if } x = \pm2. \end{cases}$$

The LRT of size α has rejection region λ(x) < C. From the plot below, it is easy to see that we need 1/2 < C ≤ 1 − α.

(Plot of λ(x) versus x)

However, the power of the test at any θ ≠ θ0 becomes

$$\mathrm{Power}(\theta) = P_\theta(\lambda(X)<C) = P_\theta(\lambda(X)=1/2) = P_\theta(X = 2 \text{ or } -2) = \theta_1\theta_2 + \theta_1(1-\theta_2) = \theta_1 \le \alpha.$$

Therefore, the test is actually biased, since Power(θ) < α whenever θ1 < α. In other words, the LRT is not even as good as the trivial test φ(x) ≡ α.

7.6 Asymptotic χ2 approximation of the LRT.
7.6.1 Why asymptotics?
As seen earlier, sometimes it is very difficult or impossible to find the exact distribution
of λ(X). So approximations in these cases become necessary.
Example. Let θ = (p1, ..., p5), where $\sum_{i=1}^5 p_i = 1$ and pi ≥ 0, i = 1, ..., 5. Suppose X1, ..., Xn are iid discrete r.v.'s with Pθ(Xi = j) = pj, j = 1, ..., 5. Find an LRT of size α for testing

H0 : p1 = p2 = p3 and p4 = p5 versus H1 : H0 is not true.

Solution. Let N1, · · · , N5 be the numbers of observations in the five cells, so that $\sum_{i=1}^5 N_i = n$. Note that the joint distribution of (N1, · · · , N5) is multinomial(n, p1, · · · , p5) with $\sum p_i = 1$, but the joint distribution of (X1, · · · , Xn) is not (its form is the likelihood l(θ) given below).
Given the observation x = (x1 , ..., xn ), the likelihood is

l(θ) = Pθ (X = x) = pN N5
1 ...p5 .
1

where θ = (p1, ..., p5) and
$$\log l(\theta) = \sum_{i=1}^{5} N_i \log p_i.$$
Define the Lagrangian
$$L(\theta, \lambda) = \sum_{i=1}^{5} N_i \log p_i - \lambda\Big(\sum_{i=1}^{5} p_i - 1\Big).$$

Set
$$\frac{\partial L(\theta,\lambda)}{\partial p_i} = N_i/p_i - \lambda = 0, \quad i = 1, \ldots, 5, \qquad (6.9)$$
$$\frac{\partial L(\theta,\lambda)}{\partial \lambda} = 1 - \sum_{i=1}^{5} p_i = 0. \qquad (6.10)$$
From (6.9), p_i = N_i/λ, so that $\sum_{i=1}^{5} p_i = \sum_{i=1}^{5} N_i/\lambda = n/\lambda = 1$; that is, λ = n. Therefore the solutions are p̂_i = N_i/n. That is, the MLE of θ = (p1, ..., p5) is

θ̂ ≡ (p̂1, ..., p̂5) = (N1/n, ..., N5/n)

and
$$l(\hat\theta) = \hat p_1^{N_1} \cdots \hat p_5^{N_5} = (N_1/n)^{N_1} \cdots (N_5/n)^{N_5}.$$

On the other hand, under H0 : p1 = p2 = p3, p4 = p5, let θ0 = (p1, p1, p1, p4, p4); we have
$$l(\theta_0) = p_1^{N_1+N_2+N_3}\, p_4^{N_4+N_5}, \qquad \log l(\theta_0) = (N_1+N_2+N_3)\log p_1 + (N_4+N_5)\log p_4.$$

Define the Lagrangian

L(θ0, λ) = (N1 + N2 + N3) log p1 + (N4 + N5) log p4 − λ(3p1 + 2p4 − 1).

Set
$$\frac{\partial L(\theta,\lambda)}{\partial p_1} = (N_1+N_2+N_3)/p_1 - 3\lambda = 0, \qquad (6.11)$$
$$\frac{\partial L(\theta,\lambda)}{\partial p_4} = (N_4+N_5)/p_4 - 2\lambda = 0, \qquad (6.12)$$
$$\frac{\partial L(\theta,\lambda)}{\partial \lambda} = 3p_1 + 2p_4 - 1 = 0. \qquad (6.13)$$
From (6.11) and (6.12), we get 3p1 = (N1 + N2 + N3)/λ and 2p4 = (N4 + N5)/λ, which by (6.13) leads to λ = n. Therefore the solutions are p̂1 = (N1 + N2 + N3)/(3n) and p̂4 = (N4 + N5)/(2n). That is, the MLE of θ0 is

θ̂0 ≡ (p̂10, p̂10, p̂10, p̂40, p̂40)

and
$$l(\hat\theta_0) = \left(\frac{N_1+N_2+N_3}{3n}\right)^{N_1+N_2+N_3} \left(\frac{N_4+N_5}{2n}\right)^{N_4+N_5}.$$
Therefore,
$$\lambda(X) = \frac{l(\hat\theta_0)}{l(\hat\theta)} = \frac{\left(\frac{N_1+N_2+N_3}{3n}\right)^{N_1+N_2+N_3}\left(\frac{N_4+N_5}{2n}\right)^{N_4+N_5}}{(N_1/n)^{N_1}\cdots(N_5/n)^{N_5}} = \frac{\left(\frac{N_1+N_2+N_3}{3}\right)^{N_1+N_2+N_3}\left(\frac{N_4+N_5}{2}\right)^{N_4+N_5}}{N_1^{N_1}\cdots N_5^{N_5}}.$$

We reject H0 if λ(X) < C, where C is determined by Pθ0 (λ(X) < C) = α.

Remark 7.6.1 Here, the exact distribution of λ(X) (needed for determining the critical value C) is impossible to find. Even if we wished to use Monte Carlo simulation, we cannot generate (N1, ..., N5) under H0, since the p_i's are not fully specified. However, we shall see later that
$$-2\log\lambda(X) \sim_{approx} \chi^2_3,$$
since under H0 we have three constraints: p1 = p2, p2 = p3 and p4 = p5.
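Although the null distribution of λ(X) cannot be simulated (the common values of p1, p2, p3 and of p4, p5 are not specified by H0), the statistic −2 log λ itself is easy to compute from the cell counts. A minimal sketch; the counts below are made up, and the function name is hypothetical.

```python
import numpy as np
from scipy import stats

def neg2loglam(N):
    """-2 log lambda for H0: p1 = p2 = p3, p4 = p5 with 5 cells.
    N = (N1, ..., N5) are the observed cell counts."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    p_hat = N / n                                      # unrestricted MLE
    p_null = np.array([N[:3].sum() / (3 * n)] * 3
                      + [N[3:].sum() / (2 * n)] * 2)   # MLE under H0
    terms = np.where(N > 0, N * np.log(p_null / p_hat), 0.0)
    return -2 * terms.sum()

N = [18, 25, 17, 22, 18]                 # made-up counts, n = 100
T = neg2loglam(N)
print(T, 1 - stats.chi2.cdf(T, df=3))    # approximate p-value, 3 constraints
```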

A slightly different example is given next, where we shall try to find the asymptotic distribution of the LRT.

Example. Let θ = (p1, ..., pr), where $\sum_{i=1}^{r} p_i = 1$ and p_i ≥ 0, i = 1, ..., r. Suppose X1, ..., Xn are iid discrete r.v.'s with P_θ(X_i = j) = p_j, j = 1, ..., r. Find an LRT of size α for testing
$$H_0: p_i = p_{i0},\ i = 1, \ldots, r, \quad \text{versus} \quad H_1: H_0 \text{ is not true},$$

where θ0 = (p10, ..., pr0) is a known probability vector.

Solution. Similarly to the previous example, we can derive
$$\lambda(X) = \frac{l(\hat\theta_0)}{l(\hat\theta)} = \frac{p_{10}^{N_1}\cdots p_{r0}^{N_r}}{(N_1/n)^{N_1}\cdots(N_r/n)^{N_r}} = \left(\frac{np_{10}}{N_1}\right)^{N_1}\cdots\left(\frac{np_{r0}}{N_r}\right)^{N_r}.$$

We reject H0 if λ(X) < C, where C is determined by Pθ0 (λ(X) < C) = α.


Again, the exact distribution of λ(X) is very difficult (if not impossible) to find. However, we now show directly that
$$-2\log\lambda(X) \sim_{approx} \chi^2_{r-1}.$$

To prove this, denote $\Delta_i = \dfrac{N_i - np_{i0}}{np_{i0}}$. Since N_i ∼ Bin(n, p_{i0}), we have EΔ_i = 0 and Var(Δ_i) = c/n → 0 (with c = (1 − p_{i0})/p_{i0}), implying that Δ_i →_p 0. Thus,
$$
\begin{aligned}
-2\log\lambda(X) &= -2\sum_{i=1}^{r} N_i \log\left(\frac{np_{i0}}{N_i}\right) = 2\sum_{i=1}^{r} [np_{i0} + (N_i - np_{i0})]\log\left(1 + \frac{N_i - np_{i0}}{np_{i0}}\right)\\
&= 2\sum_{i=1}^{r} np_{i0}[1+\Delta_i]\log(1+\Delta_i)\\
&= 2\sum_{i=1}^{r} np_{i0}[1+\Delta_i]\left(\Delta_i - \frac{\Delta_i^2}{2} + \frac{\Delta_i^3}{3} - \cdots\right)\\
&= 2\sum_{i=1}^{r} np_{i0}\Delta_i\left[1 + \frac{1}{2}\Delta_i\right] + \text{higher order terms}\\
&= 2\sum_{i=1}^{r} np_{i0}\Delta_i + \sum_{i=1}^{r} np_{i0}\Delta_i^2 + \cdots\\
&= 2\sum_{i=1}^{r} (N_i - np_{i0}) + \sum_{i=1}^{r} np_{i0}\Delta_i^2 + \cdots\\
&= 2\Big(\sum_{i=1}^{r} N_i - n\sum_{i=1}^{r} p_{i0}\Big) + \sum_{i=1}^{r} np_{i0}\Delta_i^2 + \cdots = 2(n-n) + \sum_{i=1}^{r} np_{i0}\Delta_i^2 + \cdots\\
&= \sum_{i=1}^{r} \frac{(N_i - np_{i0})^2}{np_{i0}} + \text{higher order terms},
\end{aligned}
$$

where the dominant term will be shown to follow a χ²_{r−1} distribution.

Remark. Note that in this example, unlike the previous one, the exact distribution of the LRT can be found by Monte Carlo simulation.
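Since p0 is fully specified under this H0, the χ²_{r−1} approximation can also be checked by simulation. A minimal sketch; n, p0 and the number of replications are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p0 = np.array([0.1, 0.2, 0.3, 0.4])     # known null probabilities, r = 4
n, reps = 200, 20000

lrt = np.empty(reps)
for i in range(reps):
    N = rng.multinomial(n, p0)
    # -2 log lambda = 2 sum N_i log(N_i / (n p_{i0})), empty cells contribute 0
    with np.errstate(divide="ignore", invalid="ignore"):
        lrt[i] = 2 * np.where(N > 0, N * np.log(N / (n * p0)), 0.0).sum()

# the upper-tail frequency should be close to the nominal 0.05
c = stats.chi2.ppf(0.95, df=len(p0) - 1)
print(np.mean(lrt > c))
```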

7.6.2 Review: Asymptotic properties of MLE
The asymptotic distribution of the LRT depends critically on the properties of MLE's. So first we list some regularity conditions and results concerning the MLE.

Theorem 7.6.1 Let F = {F_θ : θ ∈ Θ}, where Θ is an open set in R^k. Assume that

(1). For each θ ∈ Θ, $\dfrac{\partial^3 \log f_\theta(x)}{\partial\theta^3}$ exists for all x, and there exists a function H(x) ≥ 0 (possibly depending on θ) such that for θ′ ∈ N(θ, ε) = {θ′ : ||θ′ − θ|| < ε},
$$\left\|\frac{\partial^3 \log f_{\theta'}(x)}{\partial\theta'^3}\right\| \le H(x), \qquad E_\theta H(X_1) < \infty.$$

(2). For g_θ(x) = f_θ(x) or g_θ(x) = ∂ log f_θ(x)/∂θ,
$$\frac{\partial}{\partial\theta}\int g_\theta(x)\,dx = \int \frac{\partial g_\theta(x)}{\partial\theta}\,dx.$$

(3). For each θ ∈ Θ,
$$0 < I_{X_1}(\theta) = E_\theta\left[\left(\frac{\partial\log f_\theta(X)}{\partial\theta}\right)^\tau\left(\frac{\partial\log f_\theta(X)}{\partial\theta}\right)\right] < \infty.$$

Let X1, ..., Xn ∼ F_θ. Then, with probability 1, the likelihood equations admit a sequence of solutions {θ̂_n} satisfying:
1). Strong consistency: θ̂_n → θ with probability 1.
2). Asymptotic efficiency: θ̂_n ∼_{approx} N(θ, [nI_{X_1}(θ)]^{−1}).

Remark 7.6.2 .

(1). Condition 1 ensures that ∂ log f_θ(x)/∂θ, for any x, has a Taylor expansion as a function of θ.

(2). Condition 2 means that
$$\int f_\theta(x)\,dx \qquad \text{and} \qquad \int \frac{\partial\log f_\theta(x)}{\partial\theta}\,dx$$
can be differentiated with respect to θ under the integral sign; that is, integration and differentiation can be interchanged.

(3). A sufficient condition for Condition 2 is the following: for each θ ∈ Θ, there exist functions g(x), h(x), and H(x) (possibly depending on θ) such that for θ′ ∈ N(θ, ε) = {θ′ : ||θ′ − θ|| < ε},
$$\left\|\frac{\partial f_{\theta'}(x)}{\partial\theta'}\right\| \le g(x), \qquad \left\|\frac{\partial^2 f_{\theta'}(x)}{\partial\theta'^2}\right\| \le h(x), \qquad \left\|\frac{\partial^3 \log f_{\theta'}(x)}{\partial\theta'^3}\right\| \le H(x)$$
hold for all x, and
$$\int g(x)\,dx < \infty, \qquad \int h(x)\,dx < \infty, \qquad E_\theta H(X_1) < \infty.$$

(4). Condition 3 ensures that the covariance matrix of ∂ log f_θ(x)/∂θ is finite and positive definite.

7.6.3 Formulation of the problem.
Without any constraints, all k components of the parameter θ = (θ1, ..., θk) ∈ Θ ⊂ R^k are free to vary, so θ has k degrees of freedom.
For the null hypothesis H0 : θ ∈ Θ0 ⊂ R^k, suppose that we have r constraints on the parameter θ; then only k − r components of θ = (θ1, ..., θk) are free to vary, so θ has k − r degrees of freedom under H0. Without loss of generality, we denote this (k − r)-dimensional free parameter by ν = (ν1, ..., ν_{k−r}).

Based on the above arguments, we assume that Θ0 ⊂ R^k is determined by

H0 : θ = g(ν),

where ν is a (k − r)-vector of unknown parameters and g : R^{k−r} → R^k is a continuously differentiable function.

We can also write H0 : θ = g(ν) componentwise as
$$\theta_1 = g_1(\nu_1, \ldots, \nu_{k-r}), \quad \theta_2 = g_2(\nu_1, \ldots, \nu_{k-r}), \quad \ldots, \quad \theta_k = g_k(\nu_1, \ldots, \nu_{k-r}).$$

Example. Find the transformation g for the following tests.

(a). θ = (θ1, θ2) and H0 : θ1 = a0, where a0 is known.
(b). θ = (θ1, θ2, θ3) and H0 : θ1 = b0, where b0 is known.
(c). θ = (θ1, θ2, θ3) and H0 : θ1 = θ2.
(d). θ = (θ1, θ2, θ3) and H0 : θ1 − θ2 = a0 and θ1 + θ2 = b0, where a0 and b0 are both known.

Solution.
(a). Here Θ = R², k = 2 and r = 1, and θ2 is the only free parameter. We can take ν = θ2 ∈ R^{k−r} = R¹, and
$$\theta_1 = a_0 \equiv g_1(\nu), \qquad \theta_2 = \theta_2 \equiv g_2(\nu).$$
(b). Here Θ = R³, k = 3 and r = 1, and θ2, θ3 are the two free parameters. We can take ν = (θ2, θ3) ∈ R^{k−r} = R², and
$$\theta_1 = b_0 \equiv g_1(\nu), \qquad \theta_2 = \theta_2 \equiv g_2(\nu), \qquad \theta_3 = \theta_3 \equiv g_3(\nu).$$

(c). Here Θ = R³, k = 3 and r = 1, and θ2, θ3 are the two free parameters. We can take ν = (θ2, θ3) ∈ R^{k−r} = R², and
$$\theta_1 = \theta_2 \equiv g_1(\nu), \qquad \theta_2 = \theta_2 \equiv g_2(\nu), \qquad \theta_3 = \theta_3 \equiv g_3(\nu).$$

(d). Here Θ = R³, k = 3 and r = 2, and θ3 is the only free parameter. We can take ν = θ3 ∈ R^{k−r} = R, and
$$\theta_1 = \frac{b_0 + a_0}{2} \equiv g_1(\nu), \qquad \theta_2 = \frac{b_0 - a_0}{2} \equiv g_2(\nu), \qquad \theta_3 = \theta_3 \equiv g_3(\nu).$$

Remark 7.6.3 .

(i). As can be seen from the examples, r = number of original (independent) constraints.

(ii). This formulation cannot deal with composite null hypotheses like H0 : θ1 ≤ 0.

7.6.4 Asymptotic χ2 approximation to LRT

Theorem 7.6.2 Assume that the regularity conditions in the MLE theorem hold. Assume further the hypothesis H0 : θ = g(ν) ∈ R^k, where ν is a (k − r)-vector of unknown parameters and g : R^{k−r} → R^k is a continuously differentiable function.

(i). Under H0, we have −2 log λ(X) ∼_d χ²_r.

(ii). An LRT of asymptotic size α is given by the rejection region
$$S = \{X : -2\log\lambda(X) > \chi^2_r(1-\alpha)\},$$
where χ²_r(1 − α) is the (1 − α)th quantile of χ²_r, that is, P(χ²_r ≤ χ²_r(1 − α)) = 1 − α.

Proof. First,
$$l(\theta) = f_\theta(X) = \prod_{i=1}^n f_\theta(X_i), \qquad \log l(\theta) = \sum_{i=1}^n \log f_\theta(X_i),$$
$$S_n(\theta) \equiv \frac{\partial\log l(\theta)}{\partial\theta} = \sum_{i=1}^n \frac{\partial\log f_\theta(X_i)}{\partial\theta} = \left(\sum_{i=1}^n \frac{\partial\log f_\theta(X_i)}{\partial\theta_1},\ \ldots,\ \sum_{i=1}^n \frac{\partial\log f_\theta(X_i)}{\partial\theta_k}\right) \quad \text{(score function)},$$
$$S_n'(\theta) = \frac{\partial^2\log l(\theta)}{\partial\theta^\tau\partial\theta} = \left(\sum_{i=1}^n \frac{\partial^2\log f_\theta(X_i)}{\partial\theta_a\,\partial\theta_b}\right)_{a,b = 1,\ldots,k},$$
$$I_n(\theta) = E\left[\left(\frac{\partial\log l(\theta)}{\partial\theta}\right)^\tau\left(\frac{\partial\log l(\theta)}{\partial\theta}\right)\right] = -E\left[\frac{\partial^2\log l(\theta)}{\partial\theta^\tau\partial\theta}\right] = -E[S_n'(\theta)] \quad \text{(Fisher information)}.$$

From ES_n(θ) = 0, Var[S_n(θ)] = nI(θ) and E[S_n′(θ)] = −nI(θ), together with the CLT for multivariate random vectors, we get
$$n^{-1/2}S_n(\theta) \sim_{approx} N(0, I(\theta)),$$
or equivalently,
$$n^{-1/2}S_n(\theta)I^{-1/2}(\theta) \sim_{approx} N(0, I_k),$$
which implies
$$n^{-1}S_n(\theta)I^{-1}(\theta)[S_n(\theta)]^\tau \sim_{approx} \chi^2_k.$$
(i). Denote the MLE of θ by θ̂, so that sup_{θ∈Θ} l(θ) = l(θ̂). Clearly it satisfies S_n(θ̂) = 0. By Taylor's expansion,
$$0 = S_n(\hat\theta) = S_n(\theta) + (\hat\theta-\theta)S_n'(\theta) + \cdots = S_n(\theta) - (\hat\theta-\theta)[nI(\theta)] + \cdots$$
So
$$n^{1/2}(\hat\theta-\theta)I(\theta) = n^{-1/2}S_n(\theta) + o_p(1) \sim_{approx} N(0, I(\theta)),$$
or
$$n^{1/2}(\hat\theta-\theta)I^{1/2}(\theta) = n^{-1/2}S_n(\theta)I^{-1/2}(\theta) + o_p(1) \sim_{approx} N(0, I_k),$$
which implies
$$n(\hat\theta-\theta)I(\theta)(\hat\theta-\theta)^\tau \sim_{approx} \chi^2_k.$$

Now
$$-2\log\left[\frac{l(\theta)}{l(\hat\theta)}\right] = -2\left[\log l(\theta) - \log l(\hat\theta)\right] = -2\left[S_n(\hat\theta)(\theta-\hat\theta)^\tau + \frac{1}{2}(\theta-\hat\theta)S_n'(\hat\theta)(\theta-\hat\theta)^\tau + \cdots\right] = n(\hat\theta-\theta)I(\theta)(\hat\theta-\theta)^\tau + o_p(1) \sim_{approx} \chi^2_k.$$

Next, under H0 : θ = g(ν), we define
$$\tilde S_n(\nu) \equiv \frac{\partial\log l(g(\nu))}{\partial\nu} = \frac{\partial\log l(g(\nu))}{\partial\theta}\,\frac{\partial g(\nu)}{\partial\nu^\tau} = S_n(\theta)D(\nu), \quad \text{where } D(\nu) = \frac{\partial g(\nu)}{\partial\nu^\tau} \quad \text{(score function for } \nu\text{)},$$
$$\tilde I(\nu) = E\left[\left(\frac{\partial\log l(g(\nu))}{\partial\nu}\right)^\tau\left(\frac{\partial\log l(g(\nu))}{\partial\nu}\right)\right] = [D(\nu)]^\tau I(\theta)\, D(\nu) \quad \text{(Fisher information for } \nu\text{)}.$$

We denote the MLE of ν under H0 by ν̂, so that sup_{θ∈Θ0} l(θ) = sup_ν l(g(ν)) = l(g(ν̂)). Clearly it satisfies S̃_n(ν̂) = 0. By Taylor's expansion,
$$0 = \tilde S_n(\hat\nu) = \tilde S_n(\nu) + (\hat\nu-\nu)\tilde S_n'(\nu) + \cdots = \tilde S_n(\nu) - (\hat\nu-\nu)[n\tilde I(\nu)] + \cdots$$
So
$$n^{1/2}(\hat\nu-\nu)\tilde I(\nu) = n^{-1/2}\tilde S_n(\nu) + o_p(1) \sim_{approx} N(0, \tilde I(\nu)),$$
or
$$n^{1/2}(\hat\nu-\nu)\tilde I^{1/2}(\nu) = n^{-1/2}\tilde S_n(\nu)\tilde I^{-1/2}(\nu) + o_p(1) \sim_{approx} N(0, I_{k-r}),$$
which implies
$$n(\hat\nu-\nu)\tilde I(\nu)(\hat\nu-\nu)^\tau \sim_{approx} \chi^2_{k-r}.$$

Therefore, under H0,
$$-2\log\left[\frac{l(\theta)}{l(g(\hat\nu))}\right] = -2\left[\log l(g(\nu)) - \log l(g(\hat\nu))\right] = -2\left[\tilde S_n(\hat\nu)(\nu-\hat\nu)^\tau + \frac{1}{2}(\nu-\hat\nu)\tilde S_n'(\hat\nu)(\nu-\hat\nu)^\tau + \cdots\right] = n(\hat\nu-\nu)\tilde I(\nu)(\hat\nu-\nu)^\tau + o_p(1) \sim_{approx} \chi^2_{k-r}.$$

From the above, we finally get
$$
\begin{aligned}
-2\log\lambda(X) &= -2\log\frac{\sup_{\theta\in\Theta_0} l(\theta)}{\sup_{\theta\in\Theta} l(\theta)} = -2\log\left[\frac{l(\theta)}{l(\hat\theta)}\right] + 2\log\left[\frac{l(\theta)}{l(g(\hat\nu))}\right]\\
&= n(\hat\theta-\theta)I(\theta)(\hat\theta-\theta)^\tau - n(\hat\nu-\nu)\tilde I(\nu)(\hat\nu-\nu)^\tau + o_p(1)\\
&= n^{-1}S_n(\theta)I^{-1}(\theta)[S_n(\theta)]^\tau - n^{-1}\tilde S_n(\nu)\tilde I^{-1}(\nu)[\tilde S_n(\nu)]^\tau + o_p(1)\\
&= n^{-1}S_n(\theta)I^{-1}(\theta)[S_n(\theta)]^\tau - n^{-1}S_n(\theta)D(\nu)\tilde I^{-1}(\nu)[S_n(\theta)D(\nu)]^\tau + o_p(1)\\
&= n^{-1}S_n(\theta)\left\{I^{-1}(\theta) - D(\nu)\tilde I^{-1}(\nu)[D(\nu)]^\tau\right\}[S_n(\theta)]^\tau + o_p(1)\\
&= n^{-1}S_n(\theta)I^{-1/2}(\theta)\left\{I_k - I^{1/2}(\theta)D(\nu)\tilde I^{-1}(\nu)[D(\nu)]^\tau I^{1/2}(\theta)\right\}[S_n(\theta)I^{-1/2}(\theta)]^\tau + o_p(1)\\
&= n^{-1}S_n(\theta)I^{-1/2}(\theta)\,A\,[S_n(\theta)I^{-1/2}(\theta)]^\tau + o_p(1),
\end{aligned}
$$
where A = I_k − H and H = I^{1/2}(θ)D(ν)Ĩ^{−1}(ν)[D(ν)]^τ I^{1/2}(θ); both A and H are symmetric. Now, using [D(ν)]^τ I(θ)D(ν) = Ĩ(ν),
$$H^2 = I^{1/2}(\theta)D(\nu)\tilde I^{-1}(\nu)\,[D(\nu)]^\tau I(\theta)D(\nu)\,\tilde I^{-1}(\nu)[D(\nu)]^\tau I^{1/2}(\theta) = I^{1/2}(\theta)D(\nu)\tilde I^{-1}(\nu)[D(\nu)]^\tau I^{1/2}(\theta) = H.$$
Hence,
$$A^2 = (I_k - H)^2 = I_k - 2H + H^2 = I_k - 2H + H = I_k - H = A.$$
Therefore A (and H) is idempotent (a projection matrix), so its eigenvalues are either 1 or 0, and its rank (= total number of unit eigenvalues) is
$$\mathrm{rank}(A) = \mathrm{tr}(A) = k - \mathrm{tr}\left[I^{1/2}(\theta)D(\nu)\tilde I^{-1}(\nu)[D(\nu)]^\tau I^{1/2}(\theta)\right] = k - \mathrm{tr}\left[\tilde I^{-1}(\nu)[D(\nu)]^\tau I(\theta)D(\nu)\right] = k - \mathrm{tr}\left[\tilde I^{-1}(\nu)\tilde I(\nu)\right] = k - \mathrm{tr}[I_{k-r}] = k - (k-r) = r.$$
Therefore, there exists an orthonormal matrix B such that
$$A = B\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}B^\tau.$$
Thus,
$$
\begin{aligned}
-2\log\lambda(X) &= n^{-1}S_n(\theta)I^{-1/2}(\theta)\,A\,[S_n(\theta)I^{-1/2}(\theta)]^\tau + o_p(1)\\
&\sim_{approx} N(0, I_k)\,B\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}B^\tau\,[N(0, I_k)]^\tau\\
&= [N(0, I_k)B]\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}[N(0, I_k)B]^\tau \qquad (\text{since } N(0, I_k)B \sim N(0, I_k))\\
&= (Z_1, \ldots, Z_k)\begin{pmatrix} I_r & 0\\ 0 & 0\end{pmatrix}(Z_1, \ldots, Z_k)^\tau, \qquad \text{where } (Z_1, \ldots, Z_k) \sim N(0, I_k)\\
&= Z_1^2 + \cdots + Z_r^2 \sim \chi^2_r.
\end{aligned}
$$

(ii). Following from part (i),
$$P_{\theta_0}\left(-2\log\lambda(X) > \chi^2_r(1-\alpha)\right) \approx P\left(\chi^2_r > \chi^2_r(1-\alpha)\right) = \alpha.$$

Remark 7.6.4 The power calculation by χ2 approximation can also be done by using
non-central χ2 approximation under the contiguous alternative. See Xiru Chen, p336.
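As an illustration of Theorem 7.6.2, take N(μ, σ²) with H0 : μ = 0 and σ² unknown, so k = 2, r = 1 and ν = σ². A standard computation (assumed here, not derived in these notes) gives −2 log λ = n log(1 + T²/(n−1)), with T the usual t-statistic; simulation confirms the χ²₁ approximation. A sketch with arbitrary n:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 30, 20000
lrt = np.empty(reps)
for i in range(reps):
    x = rng.normal(0.0, 2.0, size=n)        # H0 true: mu = 0, sigma unknown
    t2 = n * x.mean() ** 2 / x.var(ddof=1)  # squared t-statistic
    lrt[i] = n * np.log(1 + t2 / (n - 1))   # -2 log lambda

# rejection frequency at the chi^2_1 cutoff should be close to 0.05 (r = 1)
print(np.mean(lrt > stats.chi2.ppf(0.95, df=1)))
```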

7.7 Wald’s and Rao’s tests and their relation with
LRT
Two other tests closely related to the chi-squared tests are Wald’s test and Rao’s test.
First we note that the hypothesis

H0 : θ = g(ν),

where ν is a (k − r)-vector of unknown parameters and g : R^{k−r} → R^k is a continuously differentiable function, is equivalent to a set of r ≤ k equations:

H0 : R(θ) = 0,

where R : R^k → R^r is a continuously differentiable function.

Example. Find the transformation R for the following tests (considered before).
(a). θ = (θ1, θ2) and H0 : θ1 = a0, where a0 is known.
(b). θ = (θ1, θ2, θ3) and H0 : θ1 = b0, where b0 is known.
(c). θ = (θ1, θ2, θ3) and H0 : θ1 = θ2.
(d). θ = (θ1, θ2, θ3) and H0 : θ1 − θ2 = a0 and θ1 + θ2 = b0, where a0 and b0 are both known.

Solution.
(a). Here, Θ = R2 , k = 2 and r = 1. Then we can take R1 (θ) = θ1 − a0 .
(b). Here, Θ = R3 , k = 3 and r = 1. Then we can take R1 (θ) = θ1 − b0 .
(c). Here, Θ = R3 , k = 3 and r = 1. Then we can take R1 (θ) = θ1 − θ2 .
(d). Here, Θ = R3 , k = 3 and r = 2. Then we can take

R1 (θ) = θ1 − θ2 − a0 , R2 (θ) = θ1 + θ2 − b0 .

Under H0 : R(θ) = 0, R(θ̂) should also be close to 0. This leads to Wald's test.
Definition 7.7.1 (Wald's test, 1943) Let
$$W_n = R(\hat\theta)^\tau\left\{R'(\hat\theta)[I_n(\hat\theta)]^{-1}[R'(\hat\theta)]^\tau\right\}^{-1}R(\hat\theta),$$
where θ̂ is the MLE of θ, I_n(θ) = nI(θ), and R′(θ) = ∂R(θ)/∂θ. Wald's test rejects H0 iff W_n > C (since under H0 : R(θ) = 0, R(θ̂) tends to be small, and so is W_n).

Similarly, under H0 : θ = g(ν), S_n(g(ν̂)) should also be close to 0. This leads to Rao's test.

Definition 7.7.2 (Rao's score test, 1947) Let
$$R_n = S_n(g(\hat\nu))\left[I_n(g(\hat\nu))\right]^{-1}[S_n(g(\hat\nu))]^\tau,$$
where everything has been defined before. Rao's test rejects H0 iff R_n > C.

Theorem 7.7.1 Assume the regularity conditions of the last theorem hold.

(i). Under H0, we have
$$W_n \sim_d \chi^2_r \qquad \text{and} \qquad R_n \sim_d \chi^2_r.$$

(ii). The Wald and Rao tests with critical value C = χ²_r(1 − α) have asymptotic size α.

Proof. The proof is simple; see, e.g., Shao (1998, page 836). We only prove part (i) here. Note that R(θ) is an r × 1 vector. Under H0, we have
$$0 = R(\theta) = R(\hat\theta)_{r\times 1} + R'(\hat\theta)_{r\times k}\left[(\theta - \hat\theta)^\tau\right]_{k\times 1} + \cdots$$
Thus
$$R(\hat\theta) = R'(\hat\theta)(\hat\theta-\theta)^\tau + \cdots \approx R'(\theta)(\hat\theta-\theta)^\tau + \cdots \sim_{approx} N\left(0,\ R'(\theta)I_n^{-1}(\theta)[R'(\theta)]^\tau\right), \quad \text{where } I_n(\theta) = nI(\theta).$$
Thus,
$$R(\hat\theta)^\tau\left[R'(\theta)I_n^{-1}(\theta)[R'(\theta)]^\tau\right]^{-1}R(\hat\theta) \sim_{approx} \chi^2_r,$$
and hence
$$W_n = R(\hat\theta)^\tau\left[R'(\hat\theta)I_n^{-1}(\hat\theta)[R'(\hat\theta)]^\tau\right]^{-1}R(\hat\theta) \sim_{approx} \chi^2_r,$$
where we have used the fact that θ̂ →_prob θ, so that R′(θ̂)I_n^{−1}(θ̂)[R′(θ̂)]^τ can replace R′(θ)I_n^{−1}(θ)[R′(θ)]^τ.

Remark 7.7.1

1. The LRT, Wald's and Rao's score tests are all asymptotically χ²_r, and thus asymptotically equivalent.

2. Note that the LRT requires computing both θ̂ and ν̂, Wald's test requires only θ̂, and Rao's score test requires only ν̂. One can use whichever is easiest.

7.8 LRT, Wald’s and Rao’s tests in independent but


non-identically distributed r.v.
The major theorems about asymptotic distributions for LRT, Wald’s and Rao’s tests all
deal with iid r.v.’s. They are also true for independent but non-identically r.v.’s, e.g.
GLM. See Theorem 6.7 and Example 6.22 of Shao (1999), p387, for example.

7.9 χ2-tests for the multinomial distribution
Many tests originate from testing the probabilities in the multinomial distribution. Typically, their asymptotic distribution is χ².

7.9.1 Preliminaries on the multinomial distribution

Consider a sequence of n independent trials with k possible outcomes for each trial. Let p_j > 0 be the probability of the jth outcome on each trial, and let X_j be the number of occurrences of the jth outcome in the n trials. Then X = (X1, ..., Xk) has the multinomial distribution with parameter p = (p1, ..., pk). Let
$$\xi_i = (0, \ldots, 0, 1, 0, \ldots, 0)_{1\times k}, \qquad i = 1, \ldots, n,$$
where the single nonzero component 1 is located in the jth position if the ith trial yields the jth outcome. Then ξ1, ..., ξn are iid random vectors of dimension k and
$$X = (X_1, \ldots, X_k) = \sum_{i=1}^n \xi_i, \qquad X/n = \frac{1}{n}\sum_{i=1}^n \xi_i = \bar\xi.$$

The joint distribution of X is
$$P_p(X = x) = \frac{n!}{x_1!\cdots x_k!}\,p_1^{x_1}\cdots p_k^{x_k}.$$
(Remark: a more familiar notation is (N1, ..., Nk) = (n1, ..., nk), rather than (X1, ..., Xk) = (x1, ..., xk).)

Theorem 7.9.1 The m.g.f. of X = $\sum_{i=1}^n \xi_i$ is
$$M_X(t) = \prod_{i=1}^n M_{\xi_i}(t) = \left(p_1 e^{t_1} + \cdots + p_k e^{t_k}\right)^n,$$
with Var(X_i) = np_i(1 − p_i) and Cov(X_i, X_j) = −np_ip_j, i ≠ j.

Proof. The m.g.f. of ξ1 is
$$M_{\xi_1}(t) = \phi(t_1, \ldots, t_k) = E e^{t\cdot\xi_1} = p_1 e^{t_1} + \cdots + p_k e^{t_k}.$$
Thus the m.g.f. of X = $\sum_{i=1}^n \xi_i$ is
$$M_X(t) = \prod_{i=1}^n M_{\xi_i}(t) = \left(p_1 e^{t_1} + \cdots + p_k e^{t_k}\right)^n.$$

Therefore, writing S = p_1e^{t_1} + ⋯ + p_ke^{t_k},
$$\frac{\partial M_X(t)}{\partial t_i} = np_i e^{t_i} S^{n-1},$$
$$\frac{\partial^2 M_X(t)}{\partial t_i^2} = n(n-1)p_i^2 e^{2t_i} S^{n-2} + np_i e^{t_i} S^{n-1},$$
$$\frac{\partial^2 M_X(t)}{\partial t_i \partial t_j} = n(n-1)p_ip_j e^{t_i}e^{t_j} S^{n-2}, \qquad i \ne j,$$

from which we get
$$EX_i = \left.\frac{\partial M_X(t)}{\partial t_i}\right|_{t=0} = np_i,$$
$$E(X_i^2) = \left.\frac{\partial^2 M_X(t)}{\partial t_i^2}\right|_{t=0} = n(n-1)p_i^2 + np_i,$$
$$Var(X_i) = E(X_i^2) - [E(X_i)]^2 = n(n-1)p_i^2 + np_i - n^2p_i^2 = n[p_i - p_i^2] = np_i(1-p_i),$$
$$E(X_iX_j) = \left.\frac{\partial^2 M_X(t)}{\partial t_i\partial t_j}\right|_{t=0} = n(n-1)p_ip_j, \qquad i \ne j,$$
$$Cov(X_i, X_j) = E(X_iX_j) - E(X_i)E(X_j) = n(n-1)p_ip_j - n^2p_ip_j = -np_ip_j.$$
The proof is complete.

By the Central Limit Theorem, we have
$$Z_n(p) = \sqrt{n}\,(X/n - p) = \sqrt{n}\,(\bar\xi - p) \sim_{approx} N(0, \Sigma),$$
where Σ = Var(ξ1) = (σ_ij)_{k×k} is a symmetric k × k covariance matrix with
$$\sigma_{ii} = p_i(1-p_i), \qquad \sigma_{ij} = -p_ip_j, \quad i \ne j.$$
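These moment formulas are easy to sanity-check by simulation; in matrix form, Var(X) = n[diag(p) − p p^τ]. A sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, np.array([0.2, 0.3, 0.5])
X = rng.multinomial(n, p, size=200000)      # each row is one multinomial draw

emp = np.cov(X, rowvar=False)               # empirical covariance matrix
theory = n * (np.diag(p) - np.outer(p, p))  # Var = n p_i(1-p_i), Cov = -n p_i p_j
print(np.round(emp, 2))
print(np.round(theory, 2))
```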

7.9.2 Tests for the multinomial distribution

Consider the problem of testing
$$H_0: p = p_0 \quad \text{versus} \quad H_1: p \ne p_0,$$
where p0 = (p01, ..., p0k) is a known vector of cell probabilities. We shall introduce some popular tests concerning p.

Definition.
(i). The χ²-statistic (or Pearson's statistic) is defined by
$$\chi^2 = \sum_{i=1}^k \frac{(X_i - np_{0i})^2}{np_{0i}} = \|Z_n(p_0)_{1\times k}\, D(p_0)_{k\times k}\|^2,$$
where for p = (p1, ..., pk) we define
$$D(p) = \mathrm{diag}(p_1^{-1/2}, \ldots, p_k^{-1/2}), \qquad Z_n(p) = \sqrt{n}\,(X/n - p) = \sqrt{n}\,(X_1/n - p_1, \ldots, X_k/n - p_k).$$
(ii). The modified χ²-statistic is defined by
$$\tilde\chi^2 = \sum_{i=1}^k \frac{(X_i - np_{0i})^2}{X_i} = \|Z_n(p_0)_{1\times k}\, D(X/n)_{k\times k}\|^2.$$
In other words, we replace p0 by its unbiased estimator X/n (under H0).
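Given the counts, both statistics are one-liners. A minimal sketch with made-up counts:

```python
import numpy as np
from scipy import stats

N = np.array([22, 31, 25, 22])           # observed counts, n = 100
p0 = np.array([0.25, 0.25, 0.25, 0.25])  # hypothesized cell probabilities
n = N.sum()

pearson = ((N - n * p0) ** 2 / (n * p0)).sum()
modified = ((N - n * p0) ** 2 / N).sum()   # p0 replaced by X/n in the denominator

df = len(N) - 1                            # k - 1 degrees of freedom
print(pearson, 1 - stats.chi2.cdf(pearson, df))
print(modified, 1 - stats.chi2.cdf(modified, df))
```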

Remark 7.9.1
(1) We shall show below that Pearson's χ² is asymptotically χ² distributed with k − 1 degrees of freedom (so is the modified χ̃², by Slutsky's theorem). To illustrate the idea, consider the special case k = 2. Then
$$
\begin{aligned}
\chi^2 &= \sum_{i=1}^2 \frac{(X_i - np_{0i})^2}{np_{0i}} = \frac{(X_1 - np_{01})^2}{np_{01}} + \frac{(X_2 - np_{02})^2}{np_{02}}\\
&= \frac{(X_1 - np_{01})^2}{np_{01}} + \frac{([n - X_1] - n[1-p_{01}])^2}{n[1-p_{01}]} = (X_1 - np_{01})^2\left(\frac{1}{np_{01}} + \frac{1}{n[1-p_{01}]}\right)\\
&= \frac{(X_1 - np_{01})^2}{np_{01}[1-p_{01}]} = \left(\frac{X_1 - np_{01}}{\sqrt{np_{01}[1-p_{01}]}}\right)^2 \sim_{approx} \chi^2_1 = \chi^2_{2-1}.
\end{aligned}
$$
(2) The above example also shows why we use np_{0i} in the denominator of χ², rather than the variance np_{0i}(1 − p_{0i}).

Theorem 7.9.2 Both the χ²- and modified χ²-statistics are asymptotically χ²_{k−1}. That is,
$$\chi^2 \sim_{asymp} \chi^2_{k-1}, \qquad \tilde\chi^2 \sim_{asymp} \chi^2_{k-1}.$$

Proof. We shall prove the theorem using two very different methods.

Method 1. Note that we can write
$$\chi^2 = YY^\tau = Y_1^2 + \cdots + Y_k^2,$$
where
$$Y = \left(\frac{X_1 - np_1}{\sqrt{np_1}},\ \ldots,\ \frac{X_k - np_k}{\sqrt{np_k}}\right).$$
So it suffices to show that Y is asymptotically multivariate normal.
First of all, the characteristic function (c.f.) of X = $\sum_{i=1}^n \xi_i$ is
$$M_X(it) = E e^{it_1X_1 + \cdots + it_kX_k} = \left(p_1e^{it_1} + \cdots + p_ke^{it_k}\right)^n.$$

Thus the c.f. of Y is
$$
E e^{it\cdot Y} = \exp\left\{-i\left(t_1\sqrt{np_1} + \cdots + t_k\sqrt{np_k}\right)\right\} E\exp\left\{iX_1\frac{t_1}{\sqrt{np_1}} + \cdots + iX_k\frac{t_k}{\sqrt{np_k}}\right\}
= \exp\left\{-i\sqrt{n}\sum_{j=1}^k t_j\sqrt{p_j}\right\}\left[\sum_{j=1}^k p_j\exp\left\{\frac{it_j}{\sqrt{np_j}}\right\}\right]^n.
$$
So the cumulant generating function of Y is
$$
\begin{aligned}
\kappa_Y(t) &= \log E e^{it\cdot Y} = -i\sqrt{n}\sum_{j=1}^k t_j\sqrt{p_j} + n\log\left[\sum_{j=1}^k p_j\exp\left\{\frac{it_j}{\sqrt{np_j}}\right\}\right]\\
&= -i\sqrt{n}\sum_j t_j\sqrt{p_j} + n\log\left[1 + \sum_j p_j\left(\frac{it_j}{\sqrt{np_j}} + \frac{1}{2}\frac{(it_j)^2}{np_j} + \cdots\right)\right]\\
&= -i\sqrt{n}\sum_j t_j\sqrt{p_j} + n\log\left[1 + \frac{i}{\sqrt n}\sum_j t_j\sqrt{p_j} - \frac{1}{2n}\sum_j t_j^2 + \cdots\right]\\
&= -i\sqrt{n}\sum_j t_j\sqrt{p_j} + n\left[\frac{i}{\sqrt n}\sum_j t_j\sqrt{p_j} - \frac{1}{2n}\sum_j t_j^2 + \cdots - \frac{1}{2}\left(\frac{i}{\sqrt n}\sum_j t_j\sqrt{p_j}\right)^2 + \cdots\right]\\
&= -\frac{1}{2}\sum_j t_j^2 + \frac{1}{2}\left(\sum_j t_j\sqrt{p_j}\right)^2 + \cdots = -\frac{1}{2}\left[\sum_j t_j^2 - \left(\sum_j t_j\sqrt{p_j}\right)^2\right] + \cdots\\
&= -\frac{1}{2}\sum_{j=1}^{k-1} u_j^2 + \cdots,
\end{aligned}
$$
where u = (u1, ..., uk) = (t1, ..., tk)A_{k×k} = tA_{k×k} and A is an orthonormal matrix whose last column is (√p1, ..., √pk)^τ. (The last step uses that $\sum_j u_j^2 = \sum_j t_j^2$ since A is orthonormal, while $u_k = \sum_j t_j\sqrt{p_j}$.)
Note that Z = YA has cumulant generating function
$$\kappa_Z(u) = \log E e^{iuZ^\tau} = \log E e^{itAA^\tau Y^\tau} = \log E e^{itY^\tau} = -\frac{1}{2}\sum_{j=1}^{k-1} u_j^2 + \cdots$$

From this, we get
$$Z \sim_{approx} N_k(0, \Sigma_Z), \qquad \Sigma_Z = \begin{pmatrix} I_{k-1} & 0\\ 0 & 0\end{pmatrix}.$$
That is,
$$Z_1, \ldots, Z_{k-1} \sim_{approx} N(0, 1), \qquad Z_k \to_p 0.$$
Thus, by Slutsky's theorem, we have
$$Y_1^2 + \cdots + Y_k^2 = YY^\tau = YAA^\tau Y^\tau = (YA)(YA)^\tau = ZZ^\tau = Z_1^2 + \cdots + Z_k^2 \sim_{asy} \chi^2_{k-1}.$$
The second part follows from the fact that X/n → p with probability 1.

Method 2. Recall that
$$\chi^2 = \sum_{i=1}^k \frac{(X_i - np_{0i})^2}{np_{0i}} = \|Z_n(p_0)_{1\times k}\, D(p_0)_{k\times k}\|^2 = \|Y_n(p_0)\|^2,$$
where
$$D(p) = \mathrm{diag}(p_1^{-1/2}, \ldots, p_k^{-1/2}), \qquad Z_n(p) = \sqrt{n}\,(X/n - p), \qquad Y_n(p) = Z_n(p)_{1\times k}\,D(p)_{k\times k}.$$

It has been seen earlier that
$$Z_n(p) \sim_{approx} N(0, \Sigma),$$
or equivalently
$$W_n \equiv Z_n(p)\Sigma^{-1/2} \sim_{approx} N(0, I_k),$$
where
$$\Sigma = \begin{pmatrix} p_1(1-p_1) & -p_1p_2 & \cdots & -p_1p_k\\ -p_2p_1 & p_2(1-p_2) & \cdots & -p_2p_k\\ \vdots & \vdots & & \vdots\\ -p_kp_1 & -p_kp_2 & \cdots & p_k(1-p_k)\end{pmatrix}.$$
Note that Σ is not of full rank, since each row sums to zero. However, we can still find Σ^{1/2}, even though it may not be unique.
Next we have
$$Y_n(p_0)_{1\times k} \sim_{approx} N\left(0,\ D(p_0)\Sigma D(p_0)\right) = N(0, B),$$

where
$$
\begin{aligned}
B &= D(p)\Sigma D(p) = \mathrm{diag}(p_1^{-1/2}, \ldots, p_k^{-1/2})\ \Sigma\ \mathrm{diag}(p_1^{-1/2}, \ldots, p_k^{-1/2})\\
&= \begin{pmatrix} 1-p_1 & -\sqrt{p_1p_2} & \cdots & -\sqrt{p_1p_k}\\ -\sqrt{p_1p_2} & 1-p_2 & \cdots & -\sqrt{p_2p_k}\\ \vdots & \vdots & & \vdots\\ -\sqrt{p_1p_k} & -\sqrt{p_2p_k} & \cdots & 1-p_k\end{pmatrix}
= I_k - \begin{pmatrix} \sqrt{p_1}\\ \sqrt{p_2}\\ \vdots\\ \sqrt{p_k}\end{pmatrix}\left(\sqrt{p_1}, \sqrt{p_2}, \ldots, \sqrt{p_k}\right)
= I_k - \phi^\tau\phi,
\end{aligned}
$$
where φ = (√p1, √p2, ..., √pk).
Now
$$
\begin{aligned}
\|Y_n(p_0)\|^2 &= Y_n(p_0)Y_n(p_0)^\tau = Z_n(p_0)D(p_0)D(p_0)[Z_n(p_0)]^\tau\\
&= \left(Z_n(p_0)\Sigma^{-1/2}\right)\left(\Sigma^{1/2}D(p_0)D(p_0)\Sigma^{1/2}\right)\left(Z_n(p_0)\Sigma^{-1/2}\right)^\tau\\
&= W_nAW_n^\tau \sim_{approx} WAW^\tau,
\end{aligned}
$$
where W ∼ N(0, I_k) and
$$A = \Sigma^{1/2}D(p_0)D(p_0)\Sigma^{1/2}.$$
First we claim that B is idempotent, since (using φφ^τ = Σ_i p_i = 1)
$$B^2 = (I_k - \phi^\tau\phi)^2 = I_k - 2\phi^\tau\phi + \phi^\tau(\phi\phi^\tau)\phi = I_k - 2\phi^\tau\phi + \phi^\tau\phi = I_k - \phi^\tau\phi = B.$$
Consequently, A satisfies
$$A^2 = \Sigma^{1/2}D(p_0)[D(p_0)\Sigma D(p_0)]D(p_0)\Sigma^{1/2} = \Sigma^{1/2}D(p_0)BD(p_0)\Sigma^{1/2}$$
and
$$A^3 = \Sigma^{1/2}D(p_0)B[D(p_0)\Sigma D(p_0)]D(p_0)\Sigma^{1/2} = \Sigma^{1/2}D(p_0)B^2D(p_0)\Sigma^{1/2} = \Sigma^{1/2}D(p_0)BD(p_0)\Sigma^{1/2} = A^2.$$
From A³ = A², any eigenvalue λ of A satisfies λ³ = λ², so λ = 1 or 0; hence A is idempotent. And the rank of A is
$$\mathrm{rank}(A) = \mathrm{tr}\left[\Sigma^{1/2}D(p_0)D(p_0)\Sigma^{1/2}\right] = \mathrm{tr}\left[D(p_0)\Sigma D(p_0)\right] = \mathrm{tr}[B] = \mathrm{tr}[I_k] - \mathrm{tr}[\phi^\tau\phi] = k - \mathrm{tr}[\phi\phi^\tau] = k - 1.$$
Therefore, there exists an orthogonal matrix Γ such that
$$A = \Gamma\begin{pmatrix} I_{k-1} & 0\\ 0 & 0\end{pmatrix}\Gamma^\tau.$$
Thus,
$$WAW^\tau = W\Gamma\begin{pmatrix} I_{k-1} & 0\\ 0 & 0\end{pmatrix}\Gamma^\tau W^\tau = V\begin{pmatrix} I_{k-1} & 0\\ 0 & 0\end{pmatrix}V^\tau \quad (\text{where } V = W\Gamma \sim N(0, I_k)) = V_1^2 + \cdots + V_{k-1}^2 \sim \chi^2_{k-1}.$$
The proof is complete.

Remark 7.9.2 .

(a). A more general theorem is available; see Theorem 6.8 of Shao (1999, p388).
(b). For a more complete treatment of the χ² approximation, see Serfling.

7.9.3 Application: Goodness of fit tests


Goodness of fit tests with known distribution
Let Y1 , . . . , Yn ∼ F (not necessarily univariate). Consider the problem of testing

H0 : F = F0 , versus H1 : F 6= F0 , (9.14)

where F0 is a known d.f.; for instance, F0 = N(0, 1). In elementary statistics, we know several ways to deal with this problem.
(i). Kolmogorov-Smirnov test, which is based on
$$D_n = \sup_y |F_n(y) - F_0(y)|, \quad \text{where } F_n \text{ is the empirical d.f.}$$
(See the notes in Nonparametric Statistics.)

(ii). Cramer-von Mises test, which is based on
$$C_n(F_0) = \int |F_n(y) - F_0(y)|^2\,dF_0(y).$$

(iii). χ² test as given earlier, based on
$$\chi^2 = \sum_{i=1}^k \frac{(X_i - np_{0i})^2}{np_{0i}} \sim_{approx} \chi^2_{k-1}. \qquad (9.15)$$

For a comparison of these tests, see Shao (1999), pages 400-401.

We shall concentrate on the χ² test in this section. To obtain the χ² test, we partition the range of Y into k disjoint events A1, ..., Ak and denote p_j = P_F(A_j) and p_{0j} = P_{F0}(A_j), j = 1, ..., k. For instance, if the Y_j's are univariate, we can partition the whole line into k intervals (−∞, a1], (a1, a2], ..., (a_{k−1}, ∞), and denote
$$p_1 = F(a_1), \qquad p_j = F(a_j) - F(a_{j-1}), \quad j = 2, \ldots, k-1, \qquad p_k = 1 - F(a_{k-1}).$$
Now instead of testing (9.14), we can test the less stringent hypothesis
$$H_0: p_i = p_{0i} \quad \text{versus} \quad H_1: p_i \ne p_{0i}, \quad i = 1, \ldots, k,$$
or
$$H_0: p = p_0 \quad \text{versus} \quad H_1: p \ne p_0. \qquad (9.16)$$
Let X_j be the number of observations Y_i in A_j, j = 1, ..., k. Clearly,
$$(X_1, \ldots, X_k) \sim \text{Multinomial}(n, p_1, \ldots, p_k).$$
A χ² test for testing (9.16) is simply given by (9.15).
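In practice one bins the data and applies (9.15); SciPy's stats.chisquare does exactly this given observed and expected counts. A sketch for F0 = N(0, 1), where the partition into k = 5 intervals is an arbitrary illustrative choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = rng.normal(size=300)                     # data to be tested against F0

cuts = np.array([-1.0, -0.3, 0.3, 1.0])      # interior cut points a1, ..., a4
N = np.bincount(np.digitize(y, cuts), minlength=5)   # observed counts X_j
edges = np.concatenate(([-np.inf], cuts, [np.inf]))
p0 = np.diff(stats.norm.cdf(edges))          # p0j = F0(aj) - F0(a_{j-1})

chi2_stat, pval = stats.chisquare(N, f_exp=len(y) * p0)  # df = k - 1
print(chi2_stat, pval)
```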

Goodness of fit tests with unknown distribution

Let Y1, ..., Yn be a random sample. Consider the problem of testing
$$H_0: F = F_\theta \quad \text{versus} \quad H_1: F \ne F_\theta, \qquad (9.17)$$
where F_θ is a known d.f. with unknown parameter θ ∈ Θ ⊂ R^s. For instance, F_θ = N(μ, σ²). Clearly, the χ² test given earlier, based on
$$\chi^2 = \sum_{i=1}^k \frac{(X_i - np_{0i})^2}{np_{0i}} \sim_{approx} \chi^2_{k-1},$$
cannot be used here, since the p_{0j} depend on the unknown F_θ. To overcome this difficulty, we can use the modified χ² test given below. Using the same notation as before, first let
$$p_j(\theta) = P_{F_\theta}(A_j), \quad j = 1, \ldots, k, \qquad p(\theta) = (p_1(\theta), \ldots, p_k(\theta)).$$
Now instead of testing (9.17), we can test the less stringent hypothesis
$$H_0: p = p(\theta) \quad \text{versus} \quad H_1: p \ne p(\theta). \qquad (9.18)$$
(Compare this with (9.16), which is the special case with s = 0.)
Let θ̂ be the MLE of θ under H0, and denote p̂ = p(θ̂). Then by the earlier theorem, the LRT rejects H0 when
$$-2\log\lambda(X) > \chi^2_{k-s-1}(1-\alpha),$$
which has asymptotic size α, where
$$\lambda(X) = \prod_{j=1}^k \frac{[p_j(\hat\theta)]^{X_j}}{(X_j/n)^{X_j}} = \prod_{j=1}^k \left(\frac{np_j(\hat\theta)}{X_j}\right)^{X_j} = \prod_{j=1}^k \left(\frac{n\hat p_j}{X_j}\right)^{X_j}.$$

Remark 7.9.3 We used the fact that −2 log λ(X) ∼_{approx} χ²_{k−s−1}. Here, let us see how to figure out the degrees of freedom from the general LRT theorem.

(1). When there are no constraints, the number of free parameters is k − 1 (−1 since the total is the constant 1).

(2). Under H0 : p = p(θ), where θ is s-dimensional, the total number of free parameters under H0 is s.

(3). Therefore, according to the LRT theorem, the degrees of freedom for the asymptotic χ² distribution are
$$r = (\text{degrees of freedom without constraints}) - (\text{degrees of freedom under } H_0) = (k-1) - s.$$

Similarly to some earlier examples, it can be shown that
$$-2\log\lambda(X) = \sum_{i=1}^k \frac{(X_i - n\hat p_i)^2}{n\hat p_i} + \text{higher order terms}.$$
Therefore,
$$\tilde\chi^2 = \sum_{i=1}^k \frac{(X_i - n\hat p_i)^2}{n\hat p_i} = -2\log\lambda(X) + \text{higher order terms} \sim_{approx} \chi^2_{k-s-1}.$$

7.10 Test of independence in contingency tables

Example. (r × c contingency table.) We considered the 2 × 2 contingency table earlier, where we derived an exact UMPU test (the conditional hypergeometric test). An extension of this is the r × c contingency table:

          A1    A2   ...   Ac
    B1    X11   X12  ...   X1c  | X1.
    B2    X21   X22  ...   X2c  | X2.
    ...   ...   ...  ...   ...  | ...
    Br    Xr1   Xr2  ...   Xrc  | Xr.
          X.1   X.2  ...   X.c  | X..
where the A_i's are disjoint events with ∪_{i=1}^{c} A_i = Ω (the sample space), the B_i's are disjoint events with ∪_{i=1}^{r} B_i = Ω, and X_{ij} is the observed frequency of outcomes in B_i ∩ A_j.

We wish to test whether the two attributes A and B are independent, i.e.

H0 : p_{ij} = p_i q_j for all i, j versus H1 : H0 is not true,

where p_{ij} = P(B_i ∩ A_j) = EX_{ij}/n, p_i = P(B_i) = Σ_j p_{ij} = EX_{i·}/n and q_j = P(A_j) = Σ_i p_{ij} = EX_{·j}/n. Of course, Σ_{i,j} p_{ij} = 1 with p_{ij} ≥ 0, Σ_i p_i = 1 and Σ_j q_j = 1.

We are interested in finding its LRT, Wald's and Rao's tests, and Pearson's χ² test.

LRT
Recall that X = (X11, ..., Xrc) follows a multinomial distribution with θ = (p11, ..., prc). So
$$l(\theta) = P_\theta(X = x) = \frac{n!}{X_{11}!\cdots X_{rc}!}\,p_{11}^{X_{11}}\cdots p_{rc}^{X_{rc}}.$$
It is easy to find (e.g., by Lagrange multipliers) that the MLE of θ is
$$\hat\theta = (\hat p_{11}, \ldots, \hat p_{rc}) = (X_{11}/n, \ldots, X_{rc}/n).$$

To see why, define the Lagrangian
$$L(\theta, \lambda) = \sum_{i,j} X_{ij}\log p_{ij} - \lambda\Big(\sum_{i,j} p_{ij} - 1\Big) + \log\frac{n!}{X_{11}!\cdots X_{rc}!}.$$
Set
$$\frac{\partial L(\theta,\lambda)}{\partial p_{ij}} = X_{ij}/p_{ij} - \lambda = 0, \qquad (10.19)$$
$$\frac{\partial L(\theta,\lambda)}{\partial\lambda} = \sum_{i,j} p_{ij} - 1 = 0. \qquad (10.20)$$

From (10.19), p_{ij} = X_{ij}/λ, so that $\sum_{i,j} p_{ij} = \sum_{i,j} X_{ij}/\lambda = n/\lambda = 1$; that is, λ = n. Therefore the solutions are p̂_{ij} = X_{ij}/n, which completes the proof.

On the other hand, under H0 : p_{ij} = p_iq_j for all i and j, we have
$$l(\theta_0) = \frac{n!}{X_{11}!\cdots X_{rc}!}\,(p_1q_1)^{X_{11}}\cdots(p_rq_c)^{X_{rc}} = \frac{n!}{X_{11}!\cdots X_{rc}!}\left(p_1^{X_{1\cdot}}\cdots p_r^{X_{r\cdot}}\right)\left(q_1^{X_{\cdot 1}}\cdots q_c^{X_{\cdot c}}\right),$$
where θ0 = (p1, ..., pr, q1, ..., qc). It is easy to find (again by Lagrange multipliers) that the MLE of θ0 is
$$\hat\theta_0 = (\hat p_1, \ldots, \hat p_r, \hat q_1, \ldots, \hat q_c) = (X_{1\cdot}/n, \ldots, X_{r\cdot}/n,\ X_{\cdot 1}/n, \ldots, X_{\cdot c}/n).$$

To see why, define the Lagrangian
$$L(\theta_0, \lambda_1, \lambda_2) = \sum_i X_{i\cdot}\log p_i + \sum_j X_{\cdot j}\log q_j - \lambda_1\Big(\sum_i p_i - 1\Big) - \lambda_2\Big(\sum_j q_j - 1\Big) + \log\frac{n!}{X_{11}!\cdots X_{rc}!}.$$
The rest of the derivation is as before.

Therefore,
$$
\begin{aligned}
\lambda(X) &= \frac{l(\hat\theta_0)}{l(\hat\theta)} = \frac{\left(\hat p_1^{X_{1\cdot}}\cdots\hat p_r^{X_{r\cdot}}\right)\left(\hat q_1^{X_{\cdot 1}}\cdots\hat q_c^{X_{\cdot c}}\right)}{\hat p_{11}^{X_{11}}\cdots\hat p_{rc}^{X_{rc}}} = \frac{(\hat p_1\hat q_1)^{X_{11}}\cdots(\hat p_r\hat q_c)^{X_{rc}}}{\hat p_{11}^{X_{11}}\cdots\hat p_{rc}^{X_{rc}}}\\
&= \left(\frac{\hat p_1\hat q_1}{\hat p_{11}}\right)^{X_{11}}\cdots\left(\frac{\hat p_r\hat q_c}{\hat p_{rc}}\right)^{X_{rc}} = \left(\frac{X_{1\cdot}X_{\cdot 1}}{nX_{11}}\right)^{X_{11}}\cdots\left(\frac{X_{r\cdot}X_{\cdot c}}{nX_{rc}}\right)^{X_{rc}} = \prod_{i=1}^r\prod_{j=1}^c\left(\frac{X_{i\cdot}X_{\cdot j}}{nX_{ij}}\right)^{X_{ij}}.
\end{aligned}
$$
We reject H0 if λ(X) < C, where C is determined by P_{θ0}(λ(X) < C) = α.


Note that we can write
r X
c
à !
X Xi. X.j
−2 log λ(X) = −2 Xij log
i=1 j=1 nXij
r X
b
à !
X nXij
= 2 Xij log
i=1 j=1 Xi. X.j
r X
c
à !
X Xij − np̂i. q̂j
= 2 [np̂i. q̂j + (Xij − np̂i. q̂j )] log 1 +
i=1 j=1 np̂i. q̂j
r X
X c
= 2 np̂i. q̂j [1 + ∆ij ] log (1 + ∆ij )
i=1 j=1
à !
Xr X c
∆2
= 2 np̂i. q̂j [1 + ∆ij ] ∆ij − ij + higher order term
i=1 j=1 2

135
r X
X c µ ¶
∆ij
= 2 np̂i. q̂j [1 + ∆ij ] ∆ij 1− + higher order term
i=1 j=1 2
Xr X c · ¸
1
= 2 np̂i. q̂j ∆ij 1 + ∆ij + higher order term
i=1 j=1 2
Xr X c r X
X c
= 2 np̂i. q̂j ∆ij + np̂i. q̂j ∆2ij + higher order term
i=1 j=1 i=1 j=1
c
r X r X
c
X X (Xij − np̂i. q̂j )2
= 2 (Xij − np̂i. q̂j ) + + higher order term
i=1 j=1 i=1 j=1 np̂i. q̂j
r X
c r X
c r X
c
X X X (Xij − np̂i. q̂j )2
= 2 Xij − 2n−1 Xi. X.j + + higher order term
i=1 j=1 i=1 j=1 i=1 j=1 np̂i. q̂j
 Ã r ! c 
r X
c
−1
X X X (Xij − np̂i. q̂j )2
= 2 n − n Xi.  X.j   + + higher order term
i=1 j=1 i=1 j=1 np̂i. q̂j
r X c
X (Xij − np̂i. q̂j )2
= 2 [n − n] + + higher order term
i=1 j=1 np̂i. q̂j
r X
c
X (Xij − np̂i. q̂j )2
= + higher order term
i=1 j=1 np̂i. q̂j
2
∼ approx χ(r−1)(c−1) .

from the results on the multinomial distribution.


Here, let us see how we figure out the degree of freedom from the general theorem of
LRT.

(1). When there are no constraints, the number of free parameters is

k ≡ (rc − 1).

(−1 since the total is a constant 1.)

(2). Under H0 : pij = pi qj for all i and j, probability pij in each cell is a
function of the marginal probability. So the total number of free parameters
under H0 is

(k − r)by the notation in LTR theorem ≡ (r − 1) + (c − 1).

(−1 since the marginal probability total is a constant 1.)

(3). Therefore, according to the LRT theorem, the degree of freedom for the
asymptotic χ2 distribuiton is

r = k − (k − r) ≡ (rc − 1) − (r − 1) − (c − 1) = rc − r − c + 1 = (r − 1)(c − 1).

(Here, there are two different r’s. One is from LRT theorem, the other is from
the contigincy table.)

Wald tests, Rao’s tests, Person’s χ2 tests, etc


Left as exercises.

7.11 Some general comments
Remark 7.11.1 .

(1). All tests in this chapter have asymptotic χ² distributions.

(2). The tests here can only be used for simple null hypotheses (i.e., those defined by equalities, not inequalities).

(3). A UMPU test does not exist for the r × c contingency table when r > 2 and c > 2.

7.12 Exercises
1. Define
$$W = \frac{\sum X_i}{\sqrt{\sum X_i^2}} = \frac{\sqrt{n}\bar X}{\sqrt{\sum X_i^2/n}}, \qquad T = \frac{\sqrt{n}\bar X}{S},$$
where S² = Σ(X_i − X̄)²/(n − 1). Show that the following identity holds:
$$T = \left(\frac{n-1}{n}\right)^{1/2}\frac{W}{\sqrt{1 - W^2/n}}.$$
(Remark: W is the so-called "self-normalized mean" while T is the Student's t-statistic, and W and T are in 1-1 correspondence.)
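The identity in Exercise 1 can be sanity-checked numerically before proving it; a sketch with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=25)
n = len(x)

W = np.sum(x) / np.sqrt(np.sum(x ** 2))      # self-normalized mean
T = np.sqrt(n) * x.mean() / x.std(ddof=1)    # Student's t-statistic

rhs = np.sqrt((n - 1) / n) * W / np.sqrt(1 - W ** 2 / n)
print(T, rhs)   # the two values agree to floating-point accuracy
```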

2. Find the LRT of H0 : θ = θ0 versus H1 : θ ≠ θ0 based on a sample of size 1 from the density function
$$f(x) = \frac{2(\theta - x)}{\theta^2}\,I\{0 < x < \theta\}.$$

3. Let X1, ..., Xm and Y1, ..., Yn be two independent samples. Suppose that the X's and Y's have p.d.f.'s
$$f_1(x) = \frac{1}{\lambda_1}e^{-x/\lambda_1}I\{x > 0\} \qquad \text{and} \qquad f_2(y) = \frac{1}{\lambda_2}e^{-y/\lambda_2}I\{y > 0\},$$
respectively. We wish to test H0 : λ1 = λ2 versus H1 : λ1 ≠ λ2.

(i). Find a UMPU test of size α.
(ii). Find an LRT of size α.
(iii). Are the two tests in (i) and (ii) the same?

4. Let X1, ..., Xn be a random sample with p.d.f.
$$f(x) = \frac{1}{\theta}e^{-x/\theta}I\{x > 0\}.$$
Find the LRT of

(i) H0 : θ ≤ θ0 versus H1 : θ > θ0.
(ii) H0 : θ = θ0 versus H1 : θ ≠ θ0.

5. Let X1, ..., Xn be a random sample from a normal distribution with unknown parameters μ and σ². Find the LRT of

(i) H0 : σ² ≤ σ0² versus H1 : σ² > σ0².
(ii) H0 : σ² = σ0² versus H1 : σ² ≠ σ0².

6. Let U1/σ1² ∼ χ²_{d1} and U2/σ2² ∼ χ²_{d2} be independent. Suppose that σ2²/σ1² = a. Show that U2/U1 and aU1 + U2 are independent. In particular, if σ1 = σ2, then U2/U1 and U1 + U2 are independent.
(Hint: use either Basu's theorem or the transformation method.)

7. A random sample X1, ..., Xn is drawn from a Pareto population with pdf
$$f(x) = \frac{\theta\nu^\theta}{x^{\theta+1}}\,I\{x \ge \nu\},$$
where θ > 0 and ν > 0.

(a). Find the MLE's of θ and ν.
(b). Show that the LRT of
H0 : θ = 1 versus H1 : θ ≠ 1
has critical region of the form {x : T(x) < C1 or T(x) > C2}, where 0 < C1 < C2 and
$$T = \log\left[\frac{\prod_{i=1}^n X_i}{(X_{(1)})^n}\right].$$
(c). Show that, under H0, 2T has a chi-squared distribution (exact, not approximate), and find the number of degrees of freedom. (Hint: obtain the joint distribution of the n − 1 nontrivial terms X_i/X_{(1)} conditional on X_{(1)}. Put these n − 1 terms together, and notice that the distribution of T given X_{(1)} does not depend on X_{(1)}, so it is the unconditional distribution of T.)

8. We have already seen the usefulness of the LRT in dealing with problems with
nuisance parameters. We now look at some other nuisance parameter problems.

(a). Find the LRT’s of

H0 : θ ≤ 0 versus H1 : θ > 0

based on a sample X1 , . . . , Xn from the population with pdf


1 −(x−θ)/λ
f (x) = e I{x > θ}.
λ

(b). (a). Find the LRT’s of

H0 : γ = 1 versus H1 : γ 6= 1

based on a sample X1 , . . . , Xn from the Weibull(γ, β) with pdf


γ γ−1 −xγ /β
f (x) = x e , x > 0, β > 0.
β

9. A special case of a normal family is one in which the mean and the variance are related: the N(θ, aθ) family. If we are interested in testing this relationship, regardless of the value of θ, we are again faced with a nuisance parameter problem.

(i) Find the LRT of H0 : a = 1 versus H1 : a ≠ 1 based on a sample X1, ..., Xn from N(θ, aθ), where θ is unknown.
(ii) A similar question can be asked about a related family, the N(θ, aθ²) family. Find the LRT of H0 : a = 1 versus H1 : a ≠ 1 based on a sample X1, ..., Xn from N(θ, aθ²), where θ is unknown.

10. Suppose that X1, ..., Xn are iid with a beta(μ, 1) pdf and Y1, ..., Ym are iid with a beta(θ, 1) pdf. Assume that the X's and Y's are independent.

(i). Find the LRT of H0 : θ = μ versus H1 : θ ≠ μ.
(ii). Show that the test in part (i) can be based on the statistic
$$T = \frac{\sum\log X_i}{\sum\log X_i + \sum\log Y_i}.$$
(iii). Find the distribution of T when H0 is true, and show how to get a test of size α = 0.1.

