Virtual Laboratories > Hypothesis Testing
As usual, our starting point is a random experiment with a sample space, and a probability measure P. In the basic statistical model, we have an observable random variable X taking values in a set S. In general, X can have quite a complicated structure. For example, if the experiment is to sample n objects from a population and record various measurements of interest, then
X = (X1, X2, ..., Xn)
where Xi is the vector of measurements for the ith object. The most important special case occurs when X1, X2, ..., Xn are independent and identically distributed. In this case, we have a random sample of size n from the common distribution.
In the previous sections, we developed tests for parameters based on natural test statistics. However, in other cases, the tests may not be parametric, or there may not be an obvious statistic to start with. Thus, we need a more general method for constructing test statistics. Moreover, we do not yet know if the tests constructed so far are the best, in the sense of maximizing the power for the set of alternatives. In this and the next section, we investigate both of these ideas. Likelihood functions, similar to those used in maximum likelihood estimation, will play a key role.
Suppose that X has one of two possible distributions. Our simple hypotheses are
H0: X has density f0; H1: X has density f1.
The test that we will construct is based on the following simple idea: if we observe x, then the condition f1(x) > f0(x) is evidence in favor of the alternative; the opposite inequality is evidence against the alternative. Thus, let
L(x) = f0(x) / f1(x) for x in S.
The function L is the likelihood ratio function for the hypotheses and L(X) is the likelihood ratio statistic. Restating our earlier observation, note that small values of L are evidence in favor of H1. Thus it seems reasonable that the likelihood ratio statistic may be a good test statistic, and that we should consider tests of the following form, where k is a constant:
Reject H0 if and only if L(X) ≤ k.
1.
Show that the significance level of the test is
r = P[L(X) ≤ k | H0].
As usual, we can try to construct a test by choosing k so that r is a prescribed value. If X is discrete, this will only be possible when r is a value of the distribution function of L(X).
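When the distribution of L(X) is not available in closed form, r can be approximated by simulation. The following is a minimal sketch, assuming a hypothetical pair of simple hypotheses not taken from the text: f0 is the standard normal density, f1 is the normal density with mean 1 and variance 1, and n = 1; the cutoff k = 0.5 is arbitrary.

```python
import math
import random

random.seed(1)

# Hypothetical setup (illustration only): f0 standard normal, f1 normal
# with mean 1 and variance 1, a single observation.
def L(x):
    # likelihood ratio f0(x) / f1(x); the normalizing constants cancel
    return math.exp(-x ** 2 / 2) / math.exp(-(x - 1) ** 2 / 2)

k = 0.5            # arbitrary cutoff
m = 100_000        # number of simulated observations under H0
hits = sum(L(random.gauss(0, 1)) <= k for _ in range(m))
r_hat = hits / m   # Monte Carlo estimate of r = P[L(X) <= k | H0]

# Exact check: L(x) <= k  <=>  exp(1/2 - x) <= k  <=>  x >= 1/2 - ln(k),
# so r = P(Z >= 1/2 - ln k) for standard normal Z, about 0.116 here
print(r_hat)
```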
An important special case of this model occurs when the density function f(x | a) of X depends on a parameter a that has two possible values. Thus, the parameter space is A = {a0, a1}. In this case, the hypotheses are
H0: a = a0 versus H1: a = a1,
and the likelihood ratio function is L(x) = f(x | a0) / f(x | a1).
The following exercises establish the Neyman-Pearson Lemma, and show that the test given above is most powerful. Let
R = {x ∈ S: L(x) ≤ k}.
2.
Use the definition of L and the definition of R to show that
P(X ∈ A | H1) ≥ (1 / k) P(X ∈ A | H0) if A ⊆ R;
P(X ∈ A | H1) ≤ (1 / k) P(X ∈ A | H0) if A ⊆ Rᶜ.
3.
Show that if B ⊆ S, then
P(X ∈ R | H1) - P(X ∈ B | H1) ≥ (1 / k) [P(X ∈ R | H0) - P(X ∈ B | H0)].
Hint: Write R = (R ∩ B) ∪ (R ∩ Bᶜ) and B = (B ∩ R) ∪ (B ∩ Rᶜ). Use the additivity of probability and the results in Exercise 2.
4.
Consider the tests with rejection regions R and B. Use Exercise 3 to show that if the size of B is no greater than the size of R, then the test with rejection region R is at least as powerful:
P(X ∈ R | H1) ≥ P(X ∈ B | H1).
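The conclusion of Exercise 4 can be checked exactly in a concrete case. The sketch below assumes a single observation from an exponential distribution with scale parameter b and the illustrative values b0 = 1, b1 = 2 (not specified in the text); it compares the likelihood ratio test, which rejects for large X, with a competing test of the same size that rejects for small X.

```python
import math

r = 0.1            # significance level (arbitrary choice)
b0, b1 = 1.0, 2.0  # illustrative null and alternative scale parameters

# Most powerful (likelihood ratio) test for b1 > b0 with n = 1:
# reject when X > c, where P(X > c | b0) = r, i.e. c = -b0 * ln(r)
c = -b0 * math.log(r)
power_lrt = math.exp(-c / b1)          # P(X > c | b1)

# A competing size-r test: reject when X < c', with P(X < c' | b0) = r
c_alt = -b0 * math.log(1 - r)
power_alt = 1 - math.exp(-c_alt / b1)  # P(X < c' | b1)

# The likelihood ratio region has strictly greater power
print(power_lrt, power_alt)
```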
The Neyman-Pearson lemma is a beautiful result, and is more useful than might be first apparent. In many important cases, the same most powerful test works for a range of alternatives, and thus is a uniformly most powerful test for this range. In the following subsections, we will consider some of these special cases.
Suppose that X = (X1, X2, ..., Xn) is a random sample from the exponential distribution with scale parameter b. The sample variables might represent the lifetimes from a sample of devices. We wish to test the following simple hypotheses, where b0, b1 > 0 are distinct specified values.
H0: b = b0 versus H1: b = b1.
5.
Show that the likelihood ratio statistic is
L(X) = (b1 / b0)^n exp[(1 / b1 - 1 / b0) Y], where Y = X1 + X2 + ··· + Xn.
Recall that Y has the gamma distribution with shape parameter n and scale parameter b. We will denote the quantile of order r for this distribution by y_r(n, b).
6.
Suppose that b1 > b0. Show that the most
powerful test at the r level is
Reject H0 if and only if Y > y_{1-r}(n, b0).
7.
Suppose that b1 < b0. Show that the most
powerful test at the r level is
Reject H0 if and only if Y < y_r(n, b0).
Note that the tests in Exercises 6 and 7 do not depend on the value of b1. This fact, together with the monotonicity of the power function, can be used to show that the tests are uniformly most powerful for the usual one-sided tests.
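The critical value and power of the test in Exercise 6 are easy to compute numerically. Below is a minimal sketch with illustrative values n = 10, r = 0.05, b0 = 1, b1 = 1.5 (none specified in the text); it uses the fact that for integer shape n the gamma right-tail probability is a finite Poisson-type sum, and finds the quantile by bisection.

```python
import math

def gamma_sf(y, n, b):
    """P(Y > y) for Y with the gamma distribution, integer shape n, scale b
    (the distribution of a sum of n independent exponentials with scale b)."""
    t = y / b
    return math.exp(-t) * sum(t ** j / math.factorial(j) for j in range(n))

def gamma_quantile(q, n, b):
    """The quantile y_q(n, b), found by bisection on the distribution function."""
    lo, hi = 0.0, 50.0 * n * b
    for _ in range(200):
        mid = (lo + hi) / 2
        if 1 - gamma_sf(mid, n, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, r = 10, 0.05
b0, b1 = 1.0, 1.5  # illustrative values with b1 > b0

crit = gamma_quantile(1 - r, n, b0)  # critical value y_{1-r}(n, b0)
size = gamma_sf(crit, n, b0)         # equals r by construction
power = gamma_sf(crit, n, b1)        # P(Y > crit | b1); exceeds r

print(crit, size, power)
```

Note that b1 enters only in the power computation; the test itself is the same for every b1 > b0, which is what makes it uniformly most powerful in Exercise 8.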
8.
Show that the test in Exercise 6 is uniformly most powerful for the hypotheses
H0: b ≤ b0 versus H1: b > b0.
9.
Show that the test in Exercise 7 is uniformly most powerful for the hypotheses
H0: b ≥ b0 versus H1: b < b0.
Suppose that I = (I1, I2, ..., In) is a random sample from the Bernoulli distribution with parameter p. The sample could represent the results of tossing a coin n times, where p is the probability of heads. We wish to test the simple hypotheses
H0: p = p0 versus H1: p = p1,
where p0, p1 in (0, 1) are specified distinct values. In the coin tossing model, we know that the probability of heads is either p0 or p1, but we don't know which.
10.
Show that the likelihood ratio statistic is
L(I) = [(1 - p0) / (1 - p1)]^n {p0(1 - p1) / [p1(1 - p0)]}^X, where X = I1 + I2 + ··· + In.
Recall that X has the binomial distribution with parameters n and p. We will denote the quantile of order r for this distribution by x_r(n, p); since the distribution is discrete, only certain values of r are possible.
11.
Suppose that p1 > p0. Show that the most
powerful test at level r is
Reject H0 if and only if X ≥ x_{1-r}(n, p0).
12.
Suppose that p1 < p0. Show that the most
powerful test at level r is
Reject H0 if and only if X ≤ x_r(n, p0).
Again, note that the tests in Exercises 11 and 12 do not depend on the value of p1. This fact, together with the monotonicity of the power function, can be used to show that the tests are uniformly most powerful for the usual one-sided tests.
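A short computation illustrates the discreteness issue in Exercise 11: only certain sizes are attainable, so a natural choice is the smallest critical value whose size does not exceed the target level. The values n = 20, p0 = 0.5, p1 = 0.7, and target level 0.05 below are illustrative, not from the text.

```python
from math import comb

def binom_sf(x, n, p):
    """P(X >= x) for X with the binomial distribution, parameters n and p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(x, n + 1))

n, p0 = 20, 0.5   # illustrative values
r_target = 0.05

# Because X is discrete, only certain sizes are attainable: take the
# smallest critical value c with P(X >= c | p0) <= r_target
c = next(c for c in range(n + 2) if binom_sf(c, n, p0) <= r_target)
size = binom_sf(c, n, p0)    # actual size, at most r_target

# Power against an illustrative alternative p1 = 0.7 > p0; the test
# itself does not depend on p1, only its power does
power = binom_sf(c, n, 0.7)

print(c, size, power)
```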
13.
Show that the test in Exercise 11 is uniformly most powerful for the hypotheses
H0: p ≤ p0 versus H1: p > p0.
14.
Show that the test in Exercise 12 is uniformly most powerful for the hypotheses
H0: p ≥ p0 versus H1: p < p0.
The one-sided tests that we derived in the normal model, for µ with d known, for µ with d unknown, and for d with µ unknown, are all uniformly most powerful. On the other hand, none of the two-sided tests are uniformly most powerful.
Suppose that X = (X1, X2, ..., Xn) is a random sample, either from the Poisson distribution with parameter 1 or from the geometric distribution with parameter 1/2. Note that both of these are distributions on the nonnegative integers, and both have mean 1. Thus, we wish to test
H0: the sampling distribution is Poisson with parameter 1 versus H1: the sampling distribution is geometric with parameter 1/2.
15.
Show that the likelihood ratio statistic is
L(X) = 2^n e^{-n} 2^Y / U, where Y = X1 + X2 + ··· + Xn and U = X1! X2! ··· Xn!.
16.
Show that the most powerful tests have the following form, where d is a constant.
Reject H0 if and only if ln(2) Y - ln(U) ≤ d.
The likelihood ratio statistic can be generalized to composite hypotheses. Suppose again that the density f(x | a) of the data variable X depends on a parameter a, taking values in a parameter space A. Consider the following hypotheses, where A0 is a subset of A:
H0: a ∈ A0 versus H1: a ∈ A - A0.
We define
L(x) = max{f(x | a): a ∈ A0} / max{f(x | a): a ∈ A} for x in S.
The function L is the likelihood ratio function and L(X) is the likelihood ratio statistic. By the same reasoning as before, small values of L(x) are evidence in favor of the alternative hypothesis.
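As a concrete illustration (a hypothetical example, not from the text), consider the Bernoulli model with H0: p ≤ 1/2 versus H1: p > 1/2. The unrestricted maximum in the denominator is attained at the sample proportion p̂, and the restricted maximum in the numerator at min(p̂, 1/2), so the generalized likelihood ratio can be computed directly:

```python
import math

def bern_loglik(p, x, n):
    """Log-likelihood of x successes in n Bernoulli(p) trials."""
    if p in (0.0, 1.0):
        return 0.0 if x == n * p else -math.inf
    return x * math.log(p) + (n - x) * math.log(1 - p)

def glr(x, n, p_max=0.5):
    """Generalized likelihood ratio for H0: p <= p_max versus H1: p > p_max."""
    p_hat = x / n               # unrestricted maximum likelihood estimate
    p0_hat = min(p_hat, p_max)  # estimate restricted to A0 = (0, p_max]
    return math.exp(bern_loglik(p0_hat, x, n) - bern_loglik(p_hat, x, n))

# If p_hat <= 1/2 the two maxima coincide and L = 1; the further p_hat
# exceeds 1/2, the smaller L becomes, i.e. the stronger the evidence for H1
print(glr(8, 20), glr(12, 20), glr(16, 20))
```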