Tests of the Mean in the Normal Model

Preliminaries

Suppose that X₁, X₂, ..., X_n is a random sample from the normal distribution with mean μ and variance σ². In this section we will construct hypothesis tests for μ, one of the most important special cases of hypothesis testing. This section parallels the section on Estimation of the Mean in the Normal Model in the chapter on Interval Estimation.

The test procedure is different, depending on whether σ is known or unknown; for this reason σ is a nuisance parameter for the problem of testing μ. The key elements in the construction of the tests are the sample mean and sample variance

M = (1 / n) Σ_{i = 1, ..., n} X_i
S² = [1 / (n - 1)] Σ_{i = 1, ..., n} (X_i - M)²

and the special properties of these statistics when the sampling distribution is normal.
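As a small illustration of these two statistics, the following sketch computes M and S² with Python's standard library. The function name and the data values are made up for the example and do not come from the text.

```python
from statistics import mean, variance

def sample_mean_and_variance(xs):
    """Return (M, S^2) for the sample xs, using the n - 1 divisor for S^2."""
    m = mean(xs)
    s2 = variance(xs, m)  # statistics.variance divides by n - 1
    return m, s2

# Hypothetical data, for illustration only.
m, s2 = sample_mean_and_variance([9.8, 10.1, 10.3, 9.9, 10.4])
print(m, s2)
```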

Tests of μ When σ is Known

Suppose first that the standard deviation σ is known; this assumption is usually artificial, but not always (see Exercise 23). Thus, the parameter space is {μ: μ ∈ R} and all hypotheses define subsets of this space. The basic test statistic that we will use is

Z₀ = (M - μ₀) / (σ / n^{1/2}).

Note that Z₀ gives the directed distance from μ₀ to the sample mean M, in units of the standard deviation σ / n^{1/2} of M. Thus, Z₀ should give good information about competing hypotheses with μ₀ on the boundary.

$Mathematical Exercise$ 1. Show that Z₀ has the normal distribution with

E(Z₀) = (μ - μ₀) / (σ / n^{1/2}).
var(Z₀) = 1.

In particular, if μ = μ₀, Z₀ is the ordinary standard score and has the standard normal distribution. As usual, for p in (0, 1), we will let z_p denote the quantile of order p for the standard normal distribution. For selected values of p, values z_p can be obtained from the quantile applet.

$Mathematical Exercise$ 2. Show that the following tests have significance level α:

Reject H₀: μ = μ₀ versus H₁: μ ≠ μ₀ if and only if Z₀ > z_{1 - α/2} or Z₀ < -z_{1 - α/2}.
Reject H₀: μ ≤ μ₀ versus H₁: μ > μ₀ if and only if Z₀ > z_{1 - α}.
Reject H₀: μ ≥ μ₀ versus H₁: μ < μ₀ if and only if Z₀ < -z_{1 - α}.
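The three rejection rules can be sketched in code as follows. This is only an illustration using `statistics.NormalDist` from the Python standard library; the function names, the `alternative` labels, and the numbers in the usage line are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)

def z_statistic(m, mu0, sigma, n):
    """Z0 = (M - mu0) / (sigma / sqrt(n))."""
    return (m - mu0) / (sigma / sqrt(n))

def z_test_rejects(m, mu0, sigma, n, alpha, alternative="two-sided"):
    """Return True if H0 is rejected at significance level alpha."""
    z0 = z_statistic(m, mu0, sigma, n)
    if alternative == "two-sided":      # H1: mu != mu0
        cutoff = STD_NORMAL.inv_cdf(1 - alpha / 2)
        return z0 > cutoff or z0 < -cutoff
    if alternative == "greater":        # H1: mu > mu0
        return z0 > STD_NORMAL.inv_cdf(1 - alpha)
    if alternative == "less":           # H1: mu < mu0
        return z0 < -STD_NORMAL.inv_cdf(1 - alpha)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# Hypothetical summary data: M = 0.9, mu0 = 0, sigma = 2, n = 20, alpha = 0.1.
print(z_test_rejects(0.9, 0.0, 2.0, 20, 0.1, "two-sided"))
```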

The following exercise is a special case of the general equivalence between hypothesis testing and interval estimation that was discussed in the introduction.

$Mathematical Exercise$ 3. For each of the tests in Exercise 2, show that we fail to reject H₀ at significance level α if and only if μ₀ is in the corresponding 1 - α confidence interval.
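For the two-sided case, the equivalence can be checked numerically. The sketch below is not a proof; it compares the two criteria on a handful of candidate values of the null mean, using hypothetical summary data and helper names chosen for this example.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_interval(m, sigma, n, alpha):
    """1 - alpha confidence interval for the mean when sigma is known."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / sqrt(n)
    return m - half_width, m + half_width

def fails_to_reject(m, mu0, sigma, n, alpha):
    """True when the two-sided z test does not reject H0: mu = mu0."""
    z0 = (m - mu0) / (sigma / sqrt(n))
    return abs(z0) <= NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical summary data, for illustration only.
m, sigma, n, alpha = 0.9, 2.0, 20, 0.1
lo, hi = two_sided_interval(m, sigma, n, alpha)
for mu0 in [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0]:
    # The test fails to reject exactly when mu0 lies in the interval.
    print(mu0, fails_to_reject(m, mu0, sigma, n, alpha), lo <= mu0 <= hi)
```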

The p-values of these tests can be computed in terms of the standard normal distribution function G.

$Mathematical Exercise$ 4. Show that the p-values of the tests in Exercise 2 are respectively

2[1 - G(|Z₀|)]
1 - G(Z₀)
G(Z₀)
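These three p-value formulas translate directly into code. The sketch below assumes the value of Z₀ has already been computed; the function name and the `alternative` labels are choices made for this example.

```python
from statistics import NormalDist

def z_test_p_value(z0, alternative="two-sided"):
    """p-value of the z test, given the observed value z0 of Z0."""
    G = NormalDist().cdf  # standard normal distribution function
    if alternative == "two-sided":   # H1: mu != mu0
        return 2 * (1 - G(abs(z0)))
    if alternative == "greater":     # H1: mu > mu0
        return 1 - G(z0)
    if alternative == "less":        # H1: mu < mu0
        return G(z0)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# An observed value near 1.96 gives a two-sided p-value near 0.05.
print(z_test_p_value(1.96))
```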

5. In the mean test experiment, make sure that sigma and z quantiles are selected. Select the normal distribution with standard deviation 2, significance level 0.1, sample size n = 20, and μ₀ = 0. For each of the three tests, do the following:

For μ = -1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1, run the experiment 1000 times each, updating every 10 runs, and then note the relative frequency of rejecting H₀ for each value.
When ľ = 0, compare the relative frequency with the significance level.
Based on these relative frequencies, sketch the empirical power curve.
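If the applet is not at hand, a similar experiment can be approximated with a short simulation. The sketch below does not reproduce the applet; it uses Python's `random` module, covers only the two-sided test, and the function name, default arguments, and seed are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def empirical_rejection_rate(mu, mu0=0.0, sigma=2.0, n=20, alpha=0.1,
                             runs=1000, seed=0):
    """Relative frequency of rejecting H0: mu = mu0 with the two-sided z test."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(runs):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z0 = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z0) > cutoff:
            rejections += 1
    return rejections / runs

# Points on the empirical power curve; the value at mu = 0 should be near alpha.
for mu in [-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1]:
    print(mu, empirical_rejection_rate(mu))
```

Plotting the printed pairs gives a rough empirical power curve that can be compared with the relative frequencies from the applet.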

6. In the mean estimate experiment, make sure that sigma and z quantiles are selected. Select the normal distribution with μ = 0 and standard deviation 2, confidence level 0.90, and sample size n = 10. For each of the three types of confidence intervals, run the experiment 20 times, updating after each run. State the corresponding hypotheses and significance level, and for each run, give the set of μ₀ for which the null hypothesis would be rejected.

Power Curves

Recall that the power function for a test of μ is Q(μ) = P(Reject H₀ | μ). For the tests in Exercise 2, we can compute the power functions explicitly in terms of the standard normal distribution function G.

$Mathematical Exercise$ 7. For the test H₀: μ = μ₀ versus H₁: μ ≠ μ₀ at significance level α, show the following results and sketch the graph of Q:

Q(μ) = G[-z_{1 - α/2} + (μ - μ₀) / (σ / n^{1/2})] + G[-z_{1 - α/2} - (μ - μ₀) / (σ / n^{1/2})]
Q(μ) is symmetric about μ₀.
Q(μ) decreases for μ < μ₀ and increases for μ > μ₀.
Q(μ₀) = α.
Q(μ) → 1 as μ → ∞ and Q(μ) → 1 as μ → -∞.

$Mathematical Exercise$ 8. For the test H₀: μ ≤ μ₀ versus H₁: μ > μ₀ at significance level α, show the following results and sketch the graph of Q:

Q(μ) = G[-z_{1 - α} + (μ - μ₀) / (σ / n^{1/2})]
Q is increasing.
Q(μ₀) = α.
Q(μ) → 0 as μ → -∞ and Q(μ) → 1 as μ → ∞.

$Mathematical Exercise$ 9. For the test of H₀: μ ≥ μ₀ versus H₁: μ < μ₀ at significance level α, show the following results and sketch the graph of Q:

Q(μ) = G[-z_{1 - α} - (μ - μ₀) / (σ / n^{1/2})]
Q is decreasing.
Q(μ₀) = α.
Q(μ) → 1 as μ → -∞ and Q(μ) → 0 as μ → ∞.

$Mathematical Exercise$ 10. Show that for any of the three tests, increasing the sample size n or decreasing the standard deviation σ results in a uniformly more powerful test.
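The power functions of the three tests can be evaluated numerically, which is a convenient way to check a hand-drawn sketch. The code below is a minimal illustration with the standard normal distribution function G taken from `statistics.NormalDist`; the function names and the parameter values in the usage loop are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

G = NormalDist().cdf       # standard normal distribution function
z = NormalDist().inv_cdf   # standard normal quantile function

def shift(mu, mu0, sigma, n):
    """(mu - mu0) / (sigma / sqrt(n)), the mean of Z0 when the true mean is mu."""
    return (mu - mu0) / (sigma / sqrt(n))

def power_two_sided(mu, mu0, sigma, n, alpha):
    d = shift(mu, mu0, sigma, n)
    return G(-z(1 - alpha / 2) + d) + G(-z(1 - alpha / 2) - d)

def power_greater(mu, mu0, sigma, n, alpha):   # H1: mu > mu0
    return G(-z(1 - alpha) + shift(mu, mu0, sigma, n))

def power_less(mu, mu0, sigma, n, alpha):      # H1: mu < mu0
    return G(-z(1 - alpha) - shift(mu, mu0, sigma, n))

# Hypothetical settings: mu0 = 0, sigma = 2, n = 20, alpha = 0.1.
for mu in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    print(mu, power_two_sided(mu, 0.0, 2.0, 20, 0.1),
          power_greater(mu, 0.0, 2.0, 20, 0.1),
          power_less(mu, 0.0, 2.0, 20, 0.1))
```

Evaluating the same functions with a larger n or a smaller standard deviation shows the power increasing at each alternative, in line with Exercise 10.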

Biased Tests

For the hypotheses H₀: μ = μ₀ versus H₁: μ ≠ μ₀, the symmetric two-sided test in Exercise 2 is the one most commonly used, but it is not the only one. In the following exercises, we will explore the power of non-symmetric tests. For p in (0, 1) consider the test

Reject H₀ if and only if Z₀ > z_{1 - pα} or Z₀ < z_{(1 - p)α}.

Note that when p = 1/2, the test agrees with the symmetric test in Exercise 2, since z_{α/2} = -z_{1 - α/2}.

$Mathematical Exercise$ 11. Show that the test has significance level α for any p in (0, 1).

$Mathematical Exercise$ 12. Show that the power function Q of the test satisfies the following properties, and then sketch the graph:

Q(μ) = G[-z_{1 - pα} + (μ - μ₀) / (σ / n^{1/2})] + G[z_{(1 - p)α} - (μ - μ₀) / (σ / n^{1/2})]
Q(μ) decreases for μ < m and increases for μ > m where m = μ₀ + (z_{1 - pα} + z_{(1 - p)α}) σ / (2 n^{1/2}).
Q(μ₀) = α.
Q(μ) → 1 as μ → ∞ and Q(μ) → 1 as μ → -∞.

$Mathematical Exercise$ 13. Show that as p increases, the test becomes more powerful for μ > μ₀ and less powerful for μ < μ₀.
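A numerical sketch can make the bias visible. The code below evaluates the power of the non-symmetric test and the point at which the power is smallest, obtained by setting the derivative of the power function to zero; the function names and parameter values are assumptions made for the example. When p differs from 1/2, the minimum power falls below the significance level, which is why such a test is called biased.

```python
from math import sqrt
from statistics import NormalDist

G = NormalDist().cdf       # standard normal distribution function
z = NormalDist().inv_cdf   # standard normal quantile function

def power_asymmetric(mu, mu0, sigma, n, alpha, p):
    """Power of the test that rejects when Z0 > upper or Z0 < lower."""
    d = (mu - mu0) / (sigma / sqrt(n))
    upper = z(1 - p * alpha)      # upper critical value
    lower = z((1 - p) * alpha)    # lower critical value (negative for small alpha)
    return G(-upper + d) + G(lower - d)

def minimizer(mu0, sigma, n, alpha, p):
    """Value of mu where the power is smallest (derivative of Q equals zero)."""
    return mu0 + (z(1 - p * alpha) + z((1 - p) * alpha)) * sigma / (2 * sqrt(n))

# Hypothetical settings: mu0 = 0, sigma = 2, n = 20, alpha = 0.1.
mu0, sigma, n, alpha = 0.0, 2.0, 20, 0.1
for p in [0.25, 0.5, 0.75]:
    m = minimizer(mu0, sigma, n, alpha, p)
    print(p, m, power_asymmetric(m, mu0, sigma, n, alpha, p))
```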

Design of Experiment

In many cases, the first step is to design the experiment so that the significance level is α and so that the test has a given power for a given alternative.

$Mathematical Exercise$ 14. For a one-sided test, show that the sample size n needed for a test with significance level α and power 1 - β for the alternative μ₁ is

n = (z_{1 - α} + z_{1 - β})² σ² / (μ₁ - μ₀)².

Hint: Set the power function equal to 1 - β and solve for n.

$Mathematical Exercise$ 15. For the two-sided test, show that the sample size n needed for a test with significance level α and power 1 - β for the alternative μ₁ is approximately

n = (z_{1 - α/2} + z_{1 - β})² σ² / (μ₁ - μ₀)².

Hint: In the power function for the two-sided test given in Exercise 7, neglect the first term if μ₁ < μ₀ and neglect the second term if μ₁ > μ₀.
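The sample size calculation is easy to carry out numerically. The sketch below uses the quantile z_{1 - α} for the one-sided test and z_{1 - α/2} for the two-sided test, which is what solving the respective power functions for n gives, and rounds up to the next integer. The function names and the numbers in the usage lines are assumptions made for the example.

```python
from math import ceil
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function

def sample_size_one_sided(mu0, mu1, sigma, alpha, beta):
    """Smallest integer n giving power at least 1 - beta at mu1 (one-sided test)."""
    n = (z(1 - alpha) + z(1 - beta)) ** 2 * sigma ** 2 / (mu1 - mu0) ** 2
    return ceil(n)

def sample_size_two_sided(mu0, mu1, sigma, alpha, beta):
    """Approximate n for power 1 - beta at mu1 (two-sided test)."""
    n = (z(1 - alpha / 2) + z(1 - beta)) ** 2 * sigma ** 2 / (mu1 - mu0) ** 2
    return ceil(n)

# Hypothetical design: mu0 = 0, mu1 = 1, sigma = 2, alpha = 0.05, power 0.8.
print(sample_size_one_sided(0.0, 1.0, 2.0, 0.05, 0.2))
print(sample_size_two_sided(0.0, 1.0, 2.0, 0.05, 0.2))
```

The two-sided design needs the larger sample because its critical value z_{1 - α/2} exceeds z_{1 - α}.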

Tests of μ When σ is Unknown

Consider now the more realistic assumption that σ, as well as μ, is unknown. In this case, the parameter space is {(μ, σ): μ ∈ R, σ > 0} and all hypotheses define subsets of this space. The basic test statistic that we will use for tests about μ is

T₀ = (M - μ₀) / (S / n^{1/2}).

Recall that when μ = μ₀, T₀ has the Student t distribution with n - 1 degrees of freedom; when μ ≠ μ₀, the distribution of T₀ is known as a non-central t distribution. As usual, t_{k, p} will denote the quantile of order p for the t distribution with k degrees of freedom.

$Mathematical Exercise$ 16. Show that the following tests have significance level α:

Reject H₀: μ = μ₀ versus H₁: μ ≠ μ₀ if and only if T₀ > t_{n - 1, 1 - α/2} or T₀ < -t_{n - 1, 1 - α/2}.
Reject H₀: μ ≤ μ₀ versus H₁: μ > μ₀ if and only if T₀ > t_{n - 1, 1 - α}.
Reject H₀: μ ≥ μ₀ versus H₁: μ < μ₀ if and only if T₀ < -t_{n - 1, 1 - α}.

Review again the section on Estimates of the Mean, in the Chapter on Interval Estimation. The following exercise is a special case of the general equivalence between hypothesis testing and interval estimation that was discussed in the introduction.

$Mathematical Exercise$ 17. For each of the tests in Exercise 16, show that we fail to reject H₀ at significance level α if and only if μ₀ is in the corresponding 1 - α confidence interval.

The p-values of these tests can be computed in terms of the distribution function G_{n - 1} of the t distribution with n - 1 degrees of freedom.

$Mathematical Exercise$ 18. Show that the p-values of the tests in Exercise 16 are respectively

2[1 - G_{n - 1}(|T₀|)]
1 - G_{n - 1}(T₀)
G_{n - 1}(T₀)
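The Python standard library has no t distribution, so the sketch below approximates the distribution function of the t distribution by integrating its density with Simpson's rule. This is an illustrative stand-in, not a recommended numerical method; a statistics package would normally supply the function. The names `t_pdf`, `t_cdf`, and `t_test_p_value`, and the number of integration steps, are assumptions made for the example.

```python
from math import exp, lgamma, log, pi

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    log_c = lgamma((k + 1) / 2) - lgamma(k / 2) - 0.5 * log(k * pi)
    return exp(log_c - (k + 1) / 2 * log(1 + x * x / k))

def t_cdf(t, k, steps=2000):
    """Distribution function via Simpson's rule on [0, |t|]; steps must be even."""
    if t == 0:
        return 0.5
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3          # integral of the density from 0 to |t|
    return 0.5 + area if t > 0 else 0.5 - area

def t_test_p_value(t0, n, alternative="two-sided"):
    """p-value of the t test, given the observed value t0 of T0 and sample size n."""
    k = n - 1
    if alternative == "two-sided":   # H1: mu != mu0
        return 2 * (1 - t_cdf(abs(t0), k))
    if alternative == "greater":     # H1: mu > mu0
        return 1 - t_cdf(t0, k)
    if alternative == "less":        # H1: mu < mu0
        return t_cdf(t0, k)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

# Hypothetical observed statistic with n = 20 observations.
print(t_test_p_value(2.0, 20))
```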

19. In the mean test experiment, make sure that S and t quantiles are selected. Select the normal distribution with standard deviation 2, significance level 0.1, sample size n = 20, and μ₀ = 0. For each of the three tests do the following:

For μ = -1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1, run the experiment 1000 times each, updating every 10 runs, and then note the relative frequency of rejecting H₀ for each value.
When ľ = 0, compare the relative frequency with the significance level.
Based on these relative frequencies, sketch the empirical power curve.

20. In the mean estimate experiment, make sure that S and t quantiles are selected. Select the normal distribution with μ = 0 and standard deviation 2, confidence level 0.90, and sample size n = 10. For each of the three types of intervals, run the experiment 20 times, updating after each run. State the corresponding hypotheses and significance level, and for each run, give the set of μ₀ for which the null hypothesis would be rejected.

The power function for the tests in Exercise 16 can be computed explicitly in terms of the non-central t distribution function. Qualitatively, the graphs of the power functions are similar to the case when σ is known, given in Exercises 7, 8, and 9.

If an upper bound σ₀ on the standard deviation σ is known, then conservative estimates of the sample size needed for a given significance level and a given power can be obtained using the methods of Exercises 14 and 15, with σ₀ in place of σ.

Non-Normal Distributions

One of the key assumptions that we made was that the underlying distribution is normal. Of course, in real statistical problems, we are unlikely to know much about the underlying distribution, let alone whether or not it is normal. Suppose in fact that the underlying distribution is not normal. When n is relatively large, the distribution of the sample mean will still be approximately normal by the central limit theorem, and thus our derivation should still be approximately valid. The following exercises allow you to explore the robustness of the procedure.

21. In the mean test experiment, select the gamma distribution with shape parameter 1 and scale parameter 1. For the three different tests and for various significance levels, sample sizes, and values of μ₀, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting H₀. When H₀ is true, compare the relative frequency with the significance level.

22. In the mean test experiment, select the uniform distribution on (0, 4). For the three different tests and for various significance levels, sample sizes, and values of μ₀, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting H₀. When H₀ is true, compare the relative frequency with the significance level.

How large n needs to be for the testing procedure to work well depends, of course, on the underlying distribution; the more this distribution deviates from normality, the larger n must be. Fortunately, convergence to normality in the central limit theorem is rapid and hence, as you observed in the exercises, we can get away with relatively small sample sizes (30 or more) in most cases.
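The effect of non-normality can also be explored with a short simulation outside the applet. The sketch below samples from the gamma distribution with shape parameter 1 and scale parameter 1 (the exponential distribution with mean 1 and standard deviation 1) and applies the two-sided z test of the true null hypothesis that the mean is 1, treating the standard deviation as known. The function name, sample sizes, number of runs, and seed are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def exponential_rejection_rate(n, alpha=0.1, runs=2000, seed=0):
    """Relative frequency of rejecting a true H0 when the data are exponential(1)."""
    rng = random.Random(seed)
    mu0 = sigma = 1.0  # mean and standard deviation of the exponential(1) distribution
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(runs):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        z0 = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z0) > cutoff:
            rejections += 1
    return rejections / runs

# Compare the relative frequency with alpha = 0.1 for several sample sizes.
for n in [5, 30, 100]:
    print(n, exponential_rejection_rate(n))
```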

Computational Exercises

$Mathematical Exercise$ 23. The length of a certain machined part is supposed to be 10 centimeters. In fact, due to imperfections in the manufacturing process, the actual length is a random variable. The standard deviation is due to inherent factors in the process, which remain fairly stable over time. From historical data, the standard deviation is known with a high degree of accuracy to be 0.3. The mean, on the other hand, may be set by adjusting various parameters in the process and hence may change to an unknown value fairly frequently. We are interested in testing H₀: μ = 10 versus H₁: μ ≠ 10.

(a) Suppose that a sample of 100 parts has mean 10.1. Perform the test at the 0.1 level of significance.
(b) Compute the p-value for the data in (a).
(c) Compute the power of the test in (a) at μ = 10.05.
(d) Compute the approximate sample size needed for significance level 0.1 and power 0.8 when μ = 10.05.

$Mathematical Exercise$ 24. A bag of potato chips of a certain brand has an advertised weight of 250 grams. Actually, the weight (in grams) is a random variable. Suppose that a sample of 75 bags has mean 248 and standard deviation 5. At the 0.05 significance level, test H₀: μ ≥ 250 versus H₁: μ < 250.

$Mathematical Exercise$ 25. At a tele-marketing firm, the length of a telephone solicitation (in seconds) is a random variable. A sample of 50 calls has mean 310 and standard deviation 25. At the 0.1 level of significance, can we conclude that μ > 300?

$Mathematical Exercise$ 26. At a certain farm the weight of a peach (in ounces) at harvest time is a random variable. A sample of 100 peaches has mean 8.2 and standard deviation 0.5. At the 0.05 level of significance, can we conclude that μ > 8?

$Mathematical Exercise$ 27. The hourly wage for a certain type of construction work is a random variable with standard deviation 1.25. For a sample of 25 workers, the mean wage was $6.75. At the 0.01 level of significance, can we conclude that μ < 7.00?

28. Using Michelson's data, test to see if the velocity of light is greater than 730 (+299000) km/sec, at the 0.1 significance level.

29. Using Cavendish's data, test to see if the density of the earth is less than 5.5 times the density of water, at the 0.05 significance level.

30. Using Short's data, test to see if the parallax of the sun differs from 9 seconds of a degree, at the 0.1 significance level.

31. Using Fisher's iris data, perform the following tests, at the 0.1 level:

The mean petal length of Setosa irises differs from 15 mm.
The mean petal length of Virginica irises is greater than 52 mm.
The mean petal length of Versicolor irises is less than 42 mm.