7. The Goodness of Fit Test


Preliminaries

Suppose that we have a random experiment with a random variable X of interest. Assume additionally that X is discrete with density function f on a finite set S. We repeat the experiment n times to generate a random sample of size n from the distribution of X:

X1, X2, ..., Xn.

Recall that these are independent variables, each with the distribution of X.

In this section, we assume that the distribution of X is unknown. For a given density function f0, we will test the hypotheses

H0: f = f0 versus H1: f ≠ f0.

The test that we will construct is known as the goodness of fit test for the conjectured density f0. As usual, our challenge in developing the test is to find a good test statistic, one that gives us information about the hypotheses and whose distribution, under the null hypothesis, is known, at least approximately.

Derivation of the Test

Suppose that S = {x1, x2, ..., xk}. To simplify the notation, let

pj = f0(xj) for j = 1, 2, ..., k.

Now let Nj = #{i in {1, 2, ..., n}: Xi = xj} for j = 1, 2, ..., k, so that Nj is the number of times that the value xj occurs in the sample.

Mathematical Exercise 1. Show that under the null hypothesis,

  1. N = (N1, N2, ..., Nk) has the multinomial distribution with parameters n and p1, p2, ..., pk.
  2. E(Nj) = npj.
  3. var(Nj) = npj(1 - pj).
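
The mean and variance formulas in Exercise 1 can be checked empirically. The following is a minimal sketch, not a proof: it assumes a hypothetical density f0 on a three-point set (the probabilities 0.5, 0.3, 0.2 are chosen only for illustration), simulates many samples of size n, and compares the sample mean and variance of each count Nj with npj and npj(1 - pj). The function name sample_counts and the seed are likewise illustrative choices.

```python
import random

def sample_counts(n, probs, rng):
    """Draw n observations from {0, ..., k-1} with probabilities probs
    and return the vector of counts (N1, ..., Nk)."""
    k = len(probs)
    draws = rng.choices(range(k), weights=probs, k=n)
    counts = [0] * k
    for j in draws:
        counts[j] += 1
    return counts

rng = random.Random(12345)        # fixed seed, chosen arbitrarily
n, probs = 50, [0.5, 0.3, 0.2]    # hypothetical f0 on a 3-point set
reps = 20000                      # number of simulated samples

samples = [sample_counts(n, probs, rng) for _ in range(reps)]
for j, p in enumerate(probs):
    values = [s[j] for s in samples]
    mean = sum(values) / reps
    var = sum((v - mean) ** 2 for v in values) / reps
    print(f"j={j + 1}: mean {mean:.2f} (n*pj = {n * p:.2f}), "
          f"variance {var:.2f} (n*pj*(1-pj) = {n * p * (1 - p):.2f})")
```

With this many repetitions, the simulated means and variances should agree with the theoretical values to within sampling error.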

Exercise 1 indicates how we might begin to construct our test: for each j we can compare the observed frequency of xj (namely Nj) with the expected frequency of value xj (namely npj), under the null hypothesis. Specifically, our test statistic will be

V = (N1 - np1)^2 / (np1) + (N2 - np2)^2 / (np2) + ··· + (Nk - npk)^2 / (npk).

Note that the test statistic is based on the squared errors, where each error is the difference between an observed frequency and the corresponding expected frequency. Each squared error is divided by the expected frequency, and the reason for this scaling is the following crucial fact, which we will accept without proof: Under the null hypothesis, as n increases to infinity, the distribution of V converges to the chi-square distribution with k - 1 degrees of freedom.
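
As an illustration of the test statistic, here is a short sketch that computes V from a vector of observed counts and the conjectured probabilities. The function name chi_square_statistic and the die-roll counts are hypothetical, made up for this example.

```python
def chi_square_statistic(counts, probs):
    """Sum over j of (Nj - n*pj)^2 / (n*pj), where n is the total count."""
    n = sum(counts)
    return sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs))

# Hypothetical data: 60 rolls of a die, tested against the fair distribution.
counts = [8, 12, 9, 11, 6, 14]
probs = [1 / 6] * 6

V = chi_square_statistic(counts, probs)
print(round(V, 4))
```

Here every expected frequency is 10, the errors are -2, 2, -1, 1, -4, 4, and so V = 42 / 10 = 4.2, up to floating-point rounding.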

As usual, for m > 0 and r in (0, 1), we will let v_{m, r} denote the quantile of order r for the chi-square distribution with m degrees of freedom. For selected values of m and r, v_{m, r} can be obtained from the table of the chi-square distribution.

Mathematical Exercise 2. Show that the following test has approximate significance level a:

Reject H0: f = f0 in favor of H1: f ≠ f0 if and only if V > v_{k-1, 1-a}.
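
The decision rule can be sketched in code. In place of a table lookup, this example computes the chi-square quantile numerically: the distribution function is evaluated from the standard series for the regularized lower incomplete gamma function, and the quantile is found by bisection. This is one possible approach under the assumption that the quantile order lies strictly between 0 and 1 and the degrees of freedom are moderate; the function names and the sample counts are illustrative only.

```python
import math

def chi_square_cdf(x, df):
    """Chi-square distribution function with df degrees of freedom, via the
    series for the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a              # n = 0 term of sum t^n / (a (a+1) ... (a+n))
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= t / (a + n)
        total += term
    return total * math.exp(-t + a * math.log(t) - math.lgamma(a))

def chi_square_quantile(r, df):
    """Quantile of order r (0 < r < 1), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi_square_cdf(hi, df) < r:   # expand until the quantile is bracketed
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi_square_cdf(mid, df) < r:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def goodness_of_fit_test(counts, probs, alpha):
    """Return (V, critical value, reject?) for the approximate level-alpha test."""
    n, k = sum(counts), len(probs)
    V = sum((N - n * p) ** 2 / (n * p) for N, p in zip(counts, probs))
    critical = chi_square_quantile(1 - alpha, k - 1)
    return V, critical, V > critical

# Hypothetical data: 60 die rolls tested against the fair distribution.
fair = [1 / 6] * 6
print(goodness_of_fit_test([8, 12, 9, 11, 6, 14], fair, 0.1))
print(goodness_of_fit_test([20, 5, 5, 5, 5, 20], fair, 0.1))
```

For k = 6 and a = 0.1 the critical value is the quantile of order 0.9 with 5 degrees of freedom, which tables give as about 9.236. The first data set has V close to 4.2 and is not rejected, while the second has V close to 30 and is rejected.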

Again, the test is an approximate one that works best when n is large. Just how large n needs to be depends on the pj; the rule of thumb is that the test will work well if all of the expected frequencies npj are at least 1 and at least 80% of them are at least 5.
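
The rule of thumb is easy to check before running the test. The sketch below is one direct reading of the rule as stated; the function name expected_counts_ok and the example distributions are hypothetical.

```python
def expected_counts_ok(n, probs):
    """True if every expected frequency n*pj is at least 1 and
    at least 80% of the expected frequencies are at least 5."""
    expected = [n * p for p in probs]
    all_at_least_1 = all(e >= 1 for e in expected)
    share_at_least_5 = sum(e >= 5 for e in expected) / len(expected)
    return all_at_least_1 and share_at_least_5 >= 0.8

fair = [1 / 6] * 6
print(expected_counts_ok(50, fair))   # expected frequencies are about 8.33
print(expected_counts_ok(20, fair))   # expected frequencies are about 3.33

uneven = [0.4, 0.3, 0.2, 0.05, 0.05]  # hypothetical f0 with two rare values
print(expected_counts_ok(40, uneven))   # expected: 16, 12, 8, 2, 2
print(expected_counts_ok(200, uneven))  # expected: 80, 60, 40, 10, 10
```

For the fair die, a sample of size 50 satisfies the rule and a sample of size 20 does not. For the uneven distribution, n = 40 fails because only 60% of the expected frequencies reach 5.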

Let the indicator variable I take the value 1 when the null hypothesis is rejected and the value 0 when it is not rejected.

Mathematical Exercise 3. Suppose that the sampling and test distributions are the same. Explain why

  1. The null hypothesis is true.
  2. I = 0 means a correct decision.
  3. I = 1 means a type 1 error.
  4. The relative frequency of the event I = 1, as we run the experiment repeatedly, converges to the true significance level of the test.
  5. If the sample size n is large, the limit in part 4 should be close to the chosen significance level.

Mathematical Exercise 4. Suppose that the sampling and test distributions are different. Explain why

  1. The null hypothesis is false.
  2. I = 0 means a type 2 error.
  3. I = 1 means a correct decision.
  4. The relative frequency of the event I = 1, as we run the experiment repeatedly, converges to the power of the test.

Simulation Exercises

In the simulation exercises below, you will be able to judge the quality of the test empirically.
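
For readers without access to the applet, the kind of experiment described in these exercises can be approximated with a short Monte Carlo sketch. It repeatedly draws a sample from a sampling distribution, computes V against a test distribution, and records the relative frequency of rejection (the event I = 1). Several things here are assumptions rather than part of the applet: the ace-six flat die is taken to have probability 1/4 on faces 1 and 6 and 1/8 on faces 2 through 5, the critical value 9.236 is the table quantile of order 0.9 for 5 degrees of freedom, and the function names and seed are illustrative.

```python
import random

CRITICAL_5DF_90 = 9.236   # table value: chi-square quantile of order 0.9, 5 df

FAIR = [1 / 6] * 6
ACE_SIX_FLAT = [1 / 4, 1 / 8, 1 / 8, 1 / 8, 1 / 8, 1 / 4]   # assumed definition

def rejects(sampling, test, n, critical, rng):
    """Draw one sample of size n from `sampling` and report whether the
    goodness of fit test for `test` rejects (that is, whether I = 1)."""
    draws = rng.choices(range(len(sampling)), weights=sampling, k=n)
    V = 0.0
    for j, p in enumerate(test):
        N = draws.count(j)                  # observed frequency of face j + 1
        V += (N - n * p) ** 2 / (n * p)
    return V > critical

def rejection_frequency(sampling, test, n, critical, runs, rng):
    """Relative frequency of rejection over `runs` independent samples."""
    return sum(rejects(sampling, test, n, critical, rng) for _ in range(runs)) / runs

rng = random.Random(2024)   # fixed seed, chosen arbitrarily

# Sampling and test distributions equal: estimates the true significance level.
level = rejection_frequency(FAIR, FAIR, 50, CRITICAL_5DF_90, 1000, rng)

# Sampling and test distributions different: estimates the power.
power = rejection_frequency(FAIR, ACE_SIX_FLAT, 50, CRITICAL_5DF_90, 1000, rng)

print("empirical significance level:", level)
print("empirical power:", power)
```

With 1000 runs the estimates carry a standard error of roughly 0.01 to 0.02, so small differences between runs or between distributions should not be over-interpreted.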

Simulation Exercise 5. In the chi-square dice experiment, set the sampling distribution to fair, the sample size to 50, and the significance level to 0.1. Set the test distribution as indicated below and in each case, run the simulation 1000 times. In case 1, give the empirical estimate of the significance level of the test and compare with 0.1. In the other cases, give the empirical estimate of the power of the test. Rank the distributions in cases 2-4 in increasing order of apparent power. Do your results seem reasonable?

  1. fair
  2. ace-six flats
  3. the symmetric, unimodal distribution
  4. the distribution skewed right

Simulation Exercise 6. In the chi-square dice experiment, set the sampling distribution to ace-six flats, the sample size to 50, and the significance level to 0.1. Set the test distribution as indicated below and in each case, run the simulation 1000 times. In case 2, where the test distribution matches the sampling distribution, give the empirical estimate of the significance level of the test and compare with 0.1. In the other cases, give the empirical estimate of the power of the test. Rank the distributions in cases 1, 3, and 4 in increasing order of apparent power. Do your results seem reasonable?

  1. fair
  2. ace-six flats
  3. the symmetric, unimodal distribution
  4. the distribution skewed right

Simulation Exercise 7. In the chi-square dice experiment, set the sampling distribution to the symmetric, unimodal distribution, the sample size to 50, and the significance level to 0.1. Set the test distribution as indicated below and in each case, run the simulation 1000 times. In case 1, give the empirical estimate of the significance level of the test and compare with 0.1. In the other cases, give the empirical estimate of the power of the test. Rank the distributions in cases 2-4 in increasing order of apparent power. Do your results seem reasonable?

  1. the symmetric, unimodal distribution
  2. fair
  3. ace-six flats
  4. the distribution skewed right

Simulation Exercise 8. In the chi-square dice experiment, set the sampling distribution to the distribution skewed right, the sample size to 50, and the significance level to 0.1. Set the test distribution as indicated below and in each case, run the simulation 1000 times. In case 1, give the empirical estimate of the significance level of the test and compare with 0.1. In the other cases, give the empirical estimate of the power of the test. Rank the distributions in cases 2-4 in increasing order of apparent power. Do your results seem reasonable?

  1. the distribution skewed right
  2. fair
  3. ace-six flats
  4. the symmetric, unimodal distribution

Mathematical Exercise 9. Suppose that D1 and D2 are different distributions. Is the power of the test with sampling distribution D1 and test distribution D2 the same as the power of the test with sampling distribution D2 and test distribution D1? Make a conjecture based on your results in Exercises 5-8.

Simulation Exercise 10. In the chi-square dice experiment, set the sampling and test distributions to fair and the significance level to 0.05. Run the experiment 1000 times for each of the following sample sizes. In each case, give the empirical estimate of the significance level and compare with 0.05.

  1. n = 10
  2. n = 20
  3. n = 50
  4. n = 100

Simulation Exercise 11. In the chi-square dice experiment, set the sampling distribution to fair, the test distribution to ace-six flats, and the significance level to 0.05. Run the experiment 1000 times for each of the following sample sizes. In each case, give the empirical estimate of the power of the test. Do the powers seem to be converging?

  1. n = 10
  2. n = 20
  3. n = 50
  4. n = 100

Related Topics

For a descriptive goodness of fit test, see the section on Probability Plots in the chapter on Random Samples.