The Goodness of Fit Test

7. The Goodness of Fit Test

Preliminaries

Suppose that we have a random experiment with a random variable X of interest. Assume additionally that X is discrete with density function f on a finite set S. We repeat the experiment n times to generate a random sample of size n from the distribution of X:

X₁, X₂, ..., X_n.

Recall that these are independent variables, each with the distribution of X.

In this section, we assume that the distribution of X is unknown. For a given density function f₀, we will test the hypotheses

H₀: f = f₀ versus H₁: f ≠ f₀.

The test that we will construct is known as the goodness of fit test for the conjectured density f₀. As usual, our challenge in developing the test is to find a good test statistic--one that gives us information about the hypotheses and whose distribution, under the null hypothesis, is known, at least approximately.

Derivation of the Test

Suppose that S = {x₁, x₂, ..., x_k}. To simplify the notation, let

p_j = f₀(x_j) for j = 1, 2, ..., k.

Now let N_j = #{i in {1, 2, ..., n}: X_i = x_j} for j = 1, 2, ..., k.

Mathematical Exercise 1. Show that under the null hypothesis,

(a) N = (N₁, N₂, ..., N_k) has the multinomial distribution with parameters n and p₁, p₂, ..., p_k.
(b) E(N_j) = np_j.
(c) var(N_j) = np_j(1 - p_j).
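
The moment formulas in Exercise 1 can also be checked empirically. The following is a minimal Python sketch, not part of the original text. It assumes a hypothetical conjectured density (a fair six-sided die), a sample size of 60, and 2000 repetitions; the function name sample_counts is our own. It uses only the standard library.

```python
import random
from collections import Counter

def sample_counts(p, n, rng):
    """Return [N_1, ..., N_k]: how often each of the k values occurs in n draws."""
    draws = rng.choices(range(len(p)), weights=p, k=n)
    tally = Counter(draws)
    return [tally[j] for j in range(len(p))]

rng = random.Random(0)      # fixed seed so that the run is repeatable
p = [1 / 6] * 6             # assumption: f0 is the density of a fair die
n, runs = 60, 2000          # assumption: illustrative sizes only

# Record N_1 in each of the repeated samples.
first = [sample_counts(p, n, rng)[0] for _ in range(runs)]
mean_n1 = sum(first) / runs
var_n1 = sum((x - mean_n1) ** 2 for x in first) / (runs - 1)

print(mean_n1, n * p[0])                # sample mean versus n p_1
print(var_n1, n * p[0] * (1 - p[0]))    # sample variance versus n p_1 (1 - p_1)
```

With these settings the sample mean and sample variance of N₁ should land near 10 and 8.33 respectively, up to simulation error.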

Exercise 1 indicates how we might begin to construct our test: for each j we can compare the observed frequency of x_j (namely N_j) with the expected frequency of value x_j (namely np_j), under the null hypothesis. Specifically, our test statistic will be

V = (N₁ - np₁)² / (np₁) + (N₂ - np₂)² / (np₂) + ··· + (N_k - np_k)² / (np_k).

Note that the test statistic is based on the squared errors (the squares of the differences between the observed frequencies and the expected frequencies). The reason that the squared errors are scaled as they are is the following crucial fact, which we will accept without proof: Under the null hypothesis, as n increases to infinity, the distribution of V converges to the chi-square distribution with k - 1 degrees of freedom.
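
To make the definition of V concrete, here is a small Python sketch that computes the statistic from a vector of observed frequencies. The function name chi_square_statistic and the sample counts are our own illustrative assumptions (50 hypothetical throws of a die, tested against the fair density); they do not come from the text.

```python
def chi_square_statistic(counts, p):
    """Sum over j of (N_j - n p_j)^2 / (n p_j), where n is the total count."""
    n = sum(counts)
    return sum((N - n * pj) ** 2 / (n * pj) for N, pj in zip(counts, p))

counts = [8, 9, 7, 10, 6, 10]   # hypothetical observed frequencies, n = 50
p = [1 / 6] * 6                 # assumption: f0 is the density of a fair die

v = chi_square_statistic(counts, p)
print(v)   # approximately 1.6 for these counts
```

Each expected frequency here is 50/6, so V reduces to (6/50)(N₁² + ··· + N₆²) - 50, which gives a quick check by hand.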

As usual, for m > 0 and r in (0, 1), we will let v_{m, r} denote the quantile of order r for the chi-square distribution with m degrees of freedom. For selected values of m and r, v_{m, r} can be obtained from the table of the chi-square distribution.

Mathematical Exercise 2. Show that the following test has approximate significance level α:

Reject H₀: f = f₀ in favor of H₁: f ≠ f₀ if and only if V > v_{k - 1, 1 - α}.

Again, the test is an approximate one that works best when n is large. Just how large n needs to be depends on the p_j; the rule of thumb is that the test will work well if all of the expected frequencies np_j are at least 1 and at least 80% of them are at least 5.
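
The decision rule and the rule of thumb can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: instead of looking up a quantile in a table, it computes the upper tail probability of V under the chi-square distribution and rejects when that probability is below the significance level, which is equivalent to comparing V with the quantile. The tail probability uses the standard closed forms for integer degrees of freedom. All function names and the sample counts are our own.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), for integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        # Even df = 2m: exp(-h) * (1 + h + h^2/2! + ... + h^(m-1)/(m-1)!)
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df = 2m + 1: erfc(sqrt(h)) plus
    # sqrt(2/pi) * exp(-h) * sum_{j=1..m} x^(j - 1/2) / (1 * 3 * ... * (2j - 1))
    term, total = math.sqrt(x), 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(h)) + math.sqrt(2 / math.pi) * math.exp(-h) * total

def goodness_of_fit_rejects(counts, p, alpha):
    """True when H0 is rejected at approximate significance level alpha."""
    n = sum(counts)
    v = sum((N - n * pj) ** 2 / (n * pj) for N, pj in zip(counts, p))
    return chi2_sf(v, len(p) - 1) < alpha

def rule_of_thumb_ok(n, p):
    """All expected frequencies at least 1, and at least 80% of them at least 5."""
    expected = [n * pj for pj in p]
    return min(expected) >= 1 and sum(e >= 5 for e in expected) >= 0.8 * len(expected)

fair = [1 / 6] * 6                                              # assumed f0
print(rule_of_thumb_ok(50, fair))                               # expected frequencies are 50/6
print(goodness_of_fit_rejects([8, 9, 7, 10, 6, 10], fair, 0.1)) # V is about 1.6
print(goodness_of_fit_rejects([16, 5, 6, 5, 4, 14], fair, 0.1)) # V is about 16.48
```

As a sanity check, chi2_sf(9.236, 5) is close to 0.1 and chi2_sf(11.070, 5) is close to 0.05, in agreement with the usual chi-square table for 5 degrees of freedom.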

Let the indicator variable I take the value 1 when the null hypothesis is rejected and the value 0 when it is not rejected.

Mathematical Exercise 3. Suppose that the sampling and test distributions are the same. Explain why

(a) The null hypothesis is true.
(b) I = 0 means a correct decision.
(c) I = 1 means a type 1 error.
(d) The relative frequency of the event I = 1, as we run the experiment repeatedly, converges to the true significance level of the test.
(e) If the sample size n is large, the number in (d) should be close to the chosen significance level.

Mathematical Exercise 4. Suppose that the sampling and test distributions are different. Explain why

(a) The null hypothesis is false.
(b) I = 0 means a type 2 error.
(c) I = 1 means a correct decision.
(d) The relative frequency of the event I = 1, as we run the experiment repeatedly, converges to the power of the test.
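
The long-run interpretation of the indicator variable I in Exercises 3 and 4 can be illustrated with a short simulation. The Python sketch below is only an illustration under several assumptions of ours: the sample values are die scores, the ace-six flat die gives faces 1 and 6 probability 1/4 each and the other faces 1/8 each, the sample size is 50, the significance level is 0.1, and the critical value 9.236 is the tabulated chi-square quantile of order 0.9 with 5 degrees of freedom. The function names are hypothetical.

```python
import random

FAIR = [1 / 6] * 6
ACE_SIX_FLAT = [1 / 4, 1 / 8, 1 / 8, 1 / 8, 1 / 8, 1 / 4]   # assumed definition
CRITICAL = 9.236   # table value: chi-square quantile of order 0.9, 5 df

def indicator(sampling, test, n, critical, rng):
    """Return I: 1 if H0 (density = test) is rejected for one sample, else 0."""
    draws = rng.choices(range(len(sampling)), weights=sampling, k=n)
    v = sum((draws.count(j) - n * pj) ** 2 / (n * pj) for j, pj in enumerate(test))
    return 1 if v > critical else 0

def relative_frequency(sampling, test, n, critical, runs, rng):
    """Relative frequency of the event I = 1 over repeated runs."""
    return sum(indicator(sampling, test, n, critical, rng) for _ in range(runs)) / runs

rng = random.Random(1)   # fixed seed so that the run is repeatable

# Same sampling and test distribution: estimates the true significance level.
level = relative_frequency(FAIR, FAIR, 50, CRITICAL, 1000, rng)

# Different sampling and test distributions: estimates the power.
power = relative_frequency(FAIR, ACE_SIX_FLAT, 50, CRITICAL, 1000, rng)

print(level, power)
```

The first estimate should be near 0.1, and the second should be noticeably larger, since rejecting a false null hypothesis is the correct decision.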

Simulation Exercises

In the simulation exercises below, you will be able to judge the quality of the test empirically.

5. In the chi-square dice experiment, set the sampling distribution to fair, the sample size to 50, and the significance level to 0.1. Set the test distribution as indicated below and in each case, run the simulation 1000 times. In case (a), give the empirical estimate of the significance level of the test and compare with 0.1. In the other cases, give the empirical estimate of the power of the test. Rank the distributions in (b)-(d) in increasing order of apparent power. Do your results seem reasonable?

(a) fair
(b) ace-six flats
(c) the symmetric, unimodal distribution
(d) the distribution skewed right

6. In the chi-square dice experiment, set the sampling distribution to ace-six flats, the sample size to 50, and the significance level to 0.1. Set the test distribution as indicated below and in each case, run the simulation 1000 times. In case (a), give the empirical estimate of the significance level of the test and compare with 0.1. In the other cases, give the empirical estimate of the power of the test. Rank the distributions in (b)-(d) in increasing order of apparent power. Do your results seem reasonable?

(a) ace-six flats
(b) fair
(c) the symmetric, unimodal distribution
(d) the distribution skewed right

7. In the chi-square dice experiment, set the sampling distribution to the symmetric, unimodal distribution, the sample size to 50, and the significance level to 0.1. Set the test distribution as indicated below and in each case, run the simulation 1000 times. In case (a), give the empirical estimate of the significance level of the test and compare with 0.1. In the other cases, give the empirical estimate of the power of the test. Rank the distributions in (b)-(d) in increasing order of apparent power. Do your results seem reasonable?

(a) the symmetric, unimodal distribution
(b) fair
(c) ace-six flats
(d) the distribution skewed right

8. In the chi-square dice experiment, set the sampling distribution to the distribution skewed right, the sample size to 50, and the significance level to 0.1. Set the test distribution as indicated below and in each case, run the simulation 1000 times. In case (a), give the empirical estimate of the significance level of the test and compare with 0.1. In the other cases, give the empirical estimate of the power of the test. Rank the distributions in (b)-(d) in increasing order of apparent power. Do your results seem reasonable?

(a) the distribution skewed right
(b) fair
(c) ace-six flats
(d) the symmetric, unimodal distribution

Mathematical Exercise 9. Suppose that D₁ and D₂ are different distributions. Is the power of the test with sampling distribution D₁ and test distribution D₂ the same as the power of the test with sampling distribution D₂ and test distribution D₁? Make a conjecture based on your results in Exercises 5-8.

10. In the chi-square dice experiment, set the sampling and test distributions to fair and the significance level to 0.05. Run the experiment 1000 times for each of the following sample sizes. In each case, give the empirical estimate of the significance level and compare with 0.05.

(a) n = 10
(b) n = 20
(c) n = 50
(d) n = 100

11. In the chi-square dice experiment, set the sampling distribution to fair, the test distribution to ace-six flats, and the significance level to 0.05. Run the experiment 1000 times for each of the following sample sizes. In each case, give the empirical estimate of the power of the test. Do the powers seem to be converging?

(a) n = 10
(b) n = 20
(c) n = 50
(d) n = 100
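
For readers without access to the chi-square dice experiment, the effect of the sample size on the power can be explored with a rough stand-in written in Python. This sketch is not the applet and makes several assumptions of ours: the ace-six flat die is taken to give faces 1 and 6 probability 1/4 each and the other faces 1/8 each, and the critical value 11.070 is the tabulated chi-square quantile of order 0.95 with 5 degrees of freedom. The function name estimated_power is hypothetical.

```python
import random

FAIR = [1 / 6] * 6
ACE_SIX_FLAT = [1 / 4, 1 / 8, 1 / 8, 1 / 8, 1 / 8, 1 / 4]   # assumed definition
CRITICAL = 11.070   # table value: chi-square quantile of order 0.95, 5 df

def estimated_power(sampling, test, n, critical, runs, rng):
    """Fraction of runs in which H0 (density = test) is rejected."""
    rejections = 0
    for _ in range(runs):
        draws = rng.choices(range(len(sampling)), weights=sampling, k=n)
        v = sum((draws.count(j) - n * pj) ** 2 / (n * pj)
                for j, pj in enumerate(test))
        if v > critical:
            rejections += 1
    return rejections / runs

rng = random.Random(2)   # fixed seed so that the run is repeatable
powers = {}
for n in (10, 20, 50, 100):
    powers[n] = estimated_power(FAIR, ACE_SIX_FLAT, n, CRITICAL, 1000, rng)
    print(n, powers[n])
```

For the small sample sizes the rule of thumb on expected frequencies is not met, so the chi-square approximation is itself doubtful there; the estimates are still valid estimates of the rejection probability of the test as stated.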
