Tests in the Two-Sample Normal Model

5. Tests in the Two-Sample Normal Model

In this section, we will study hypothesis tests in the two-sample normal model and in the bivariate normal model. This section parallels the section on Estimation in the Two Sample Normal Model in the chapter on Interval Estimation.

The Two-Sample Normal Model

Suppose first that X = (X₁, X₂, ..., X_{n₁}) is a random sample of size n₁ from the normal distribution with mean μ₁ and variance σ₁², and that Y = (Y₁, Y₂, ..., Y_{n₂}) is a random sample of size n₂ from the normal distribution with mean μ₂ and variance σ₂². Moreover, suppose that the samples X and Y are independent.

This type of situation arises frequently when the random variables represent a measurement of interest for the objects of the population, and the samples correspond to two different treatments. For example, we might be interested in the blood pressure of a certain population of patients. The X vector records the blood pressures of a control sample, while the Y vector records the blood pressures of the sample receiving a new drug. Similarly, we might be interested in the yield of an acre of corn. The X vector records the yields of a sample receiving one type of fertilizer, while the Y vector records the yields of a sample receiving a different type of fertilizer.

Usually our interest is in a comparison of the parameters (either the mean or variance) for the two sampling distributions. In this section we will construct tests for the ratio of the variances and for the difference of the means. As with previous estimation problems we have studied, the procedures vary depending on what parameters are known or unknown. Also as before, key elements in the construction of the tests are the sample means and sample variances and the special properties of these statistics when the sampling distribution is normal. We will use the following notation:

M₁ = (1 / n₁) Σ_{i = 1, ..., n₁} X_i.
W₁² = (1 / n₁) Σ_{i = 1, ..., n₁} (X_i - μ₁)².
S₁² = [1 / (n₁ - 1)] Σ_{i = 1, ..., n₁} (X_i - M₁)².
M₂ = (1 / n₂) Σ_{i = 1, ..., n₂} Y_i.
W₂² = (1 / n₂) Σ_{i = 1, ..., n₂} (Y_i - μ₂)².
S₂² = [1 / (n₂ - 1)] Σ_{i = 1, ..., n₂} (Y_i - M₂)².
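
The statistics in this notation can be computed directly from data. The following is a minimal Python sketch using only the standard library; the function name `sample_stats` and the data values are hypothetical, chosen only for illustration.

```python
from statistics import fmean

def sample_stats(xs, mu=None):
    """Return (M, W2, S2) for a sample xs.

    M  : sample mean
    W2 : mean squared deviation from the known mean mu (None if mu is not given)
    S2 : sample variance with divisor n - 1
    """
    n = len(xs)
    m = fmean(xs)
    w2 = None if mu is None else sum((x - mu) ** 2 for x in xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, w2, s2

# Hypothetical data, for illustration only.
x = [4.0, 6.0, 5.0, 7.0, 3.0]
m1, w1_sq, s1_sq = sample_stats(x, mu=5.0)
print(m1, w1_sq, s1_sq)
```

Note that W² requires the distribution mean to be known, while S² uses the sample mean in its place and divides by n - 1.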

Tests for σ₂² / σ₁² when μ₁, μ₂ are Known

We will first consider tests for the ratio of the variances σ₂² / σ₁² when the means μ₁, μ₂ are known. Usually, of course, this is an unrealistic assumption. Our basic test statistic is

F₀ = (W₁² / W₂²)a₀ where a₀ > 0.

$Mathematical Exercise$ 1. Show that if σ₂² / σ₁² = a₀ then F₀ has the F distribution with n₁ degrees of freedom in the numerator and n₂ degrees of freedom in the denominator.
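
The distributional claim in Exercise 1 can be checked numerically, although a simulation is not a proof. The sketch below simulates F₀ when the null hypothesis is true and compares the simulated mean with n₂ / (n₂ - 2), the mean of the F distribution when n₂ > 2. The function name `simulate_f0`, the parameter values, and the seed are hypothetical choices made for illustration.

```python
import random

def simulate_f0(n1, n2, sigma1, sigma2, mu1=0.0, mu2=0.0, reps=20000, seed=1):
    """Simulate F0 = (W1^2 / W2^2) * a0 with a0 = sigma2^2 / sigma1^2 (H0 true)."""
    rng = random.Random(seed)
    a0 = sigma2 ** 2 / sigma1 ** 2
    values = []
    for _ in range(reps):
        # W^2 uses the known means mu1 and mu2, not the sample means.
        w1_sq = sum((rng.gauss(mu1, sigma1) - mu1) ** 2 for _ in range(n1)) / n1
        w2_sq = sum((rng.gauss(mu2, sigma2) - mu2) ** 2 for _ in range(n2)) / n2
        values.append(w1_sq / w2_sq * a0)
    return values

n1, n2 = 8, 12
f0_values = simulate_f0(n1, n2, sigma1=2.0, sigma2=3.0)
simulated_mean = sum(f0_values) / len(f0_values)
theoretical_mean = n2 / (n2 - 2)  # mean of the F distribution when n2 > 2
print(simulated_mean, theoretical_mean)
```

The two printed values should be close; the agreement improves as the number of replications grows.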

For p in (0, 1) and for m > 0 and n > 0, let f_{m, n, p} denote the quantile of order p for the F distribution with m degrees of freedom in the numerator and n degrees of freedom in the denominator.

$Mathematical Exercise$ 2. Show that the following tests have significance level α:

Reject H₀: σ₂² / σ₁² = a₀ versus H₁: σ₂² / σ₁² ≠ a₀ if and only if F₀ > f_{n₁, n₂, 1 - α/2} or F₀ < f_{n₁, n₂, α/2}.
Reject H₀: σ₂² / σ₁² ≤ a₀ versus H₁: σ₂² / σ₁² > a₀ if and only if F₀ < f_{n₁, n₂, α}.
Reject H₀: σ₂² / σ₁² ≥ a₀ versus H₁: σ₂² / σ₁² < a₀ if and only if F₀ > f_{n₁, n₂, 1 - α}.

$Mathematical Exercise$ 3. For each of the tests in Exercise 2, show that we fail to reject H₀ at significance level α if and only if a₀ is in the corresponding 1 - α level confidence interval.

Tests for σ₂² / σ₁² when μ₁, μ₂ are Unknown

Next we will consider tests for the ratio of the variances σ₂² / σ₁² under the more realistic assumption that the means μ₁, μ₂ are unknown. In this case, our basic test statistic is

F₀ = (S₁² / S₂²)a₀ where a₀ > 0.

$Mathematical Exercise$ 4. Show that if σ₂² / σ₁² = a₀ then F₀ has the F distribution with n₁ - 1 degrees of freedom in the numerator and n₂ - 1 degrees of freedom in the denominator.

$Mathematical Exercise$ 5. Show that the following tests have significance level α:

Reject H₀: σ₂² / σ₁² = a₀ versus H₁: σ₂² / σ₁² ≠ a₀ if and only if F₀ > f_{n₁ - 1, n₂ - 1, 1 - α/2} or F₀ < f_{n₁ - 1, n₂ - 1, α/2}.
Reject H₀: σ₂² / σ₁² ≤ a₀ versus H₁: σ₂² / σ₁² > a₀ if and only if F₀ < f_{n₁ - 1, n₂ - 1, α}.
Reject H₀: σ₂² / σ₁² ≥ a₀ versus H₁: σ₂² / σ₁² < a₀ if and only if F₀ > f_{n₁ - 1, n₂ - 1, 1 - α}.
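
The two-sided test with unknown means can be sketched in Python as follows. Because the F distribution function is continuous and strictly increasing, comparing F₀ with a quantile is equivalent to comparing the distribution function at F₀ with the order of that quantile, so no quantile table is needed. The helper `f_cdf` is not a library routine: it is a hypothetical helper that integrates the beta density numerically with Simpson's rule, and it assumes both degrees of freedom are at least 2. The summary statistics in the example are made up.

```python
import math

def f_cdf(x, d1, d2, steps=4000):
    """P(F <= x) for the F distribution with d1 and d2 degrees of freedom.

    Uses P(F <= x) = I_z(d1/2, d2/2) with z = d1*x / (d1*x + d2), where I_z is
    the regularized incomplete beta function, evaluated by Simpson's rule.
    Assumption: d1 >= 2 and d2 >= 2, so the integrand is bounded on [0, z].
    """
    if x <= 0:
        return 0.0
    a, b = d1 / 2, d2 / 2
    z = d1 * x / (d1 * x + d2)
    if z >= 1.0:
        return 1.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def density(t):
        if t <= 0.0:
            # Limit of t^(a-1) (1-t)^(b-1) / B(a, b) as t -> 0 when a >= 1.
            return math.exp(-log_beta) if a == 1 else 0.0
        return math.exp((a - 1) * math.log(t) + (b - 1) * math.log1p(-t) - log_beta)

    h = z / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3

def variance_ratio_test(s1_sq, s2_sq, n1, n2, a0=1.0, alpha=0.10):
    """Two-sided test of H0: sigma2^2 / sigma1^2 = a0 when the means are unknown."""
    f0 = (s1_sq / s2_sq) * a0
    cdf_value = f_cdf(f0, n1 - 1, n2 - 1)
    # F0 < f_{alpha/2} iff cdf(F0) < alpha/2; F0 > f_{1 - alpha/2} iff cdf(F0) > 1 - alpha/2.
    reject = cdf_value < alpha / 2 or cdf_value > 1 - alpha / 2
    return f0, cdf_value, reject

# Hypothetical summary statistics, for illustration only.
f0, cdf_value, reject = variance_ratio_test(s1_sq=4.0, s2_sq=16.0, n1=16, n2=21)
print(f0, cdf_value, reject)
```

The one-sided tests follow the same pattern, comparing the distribution function at F₀ with α or 1 - α. A statistical library with a tested F distribution would normally be preferable to the hand-rolled integration used here.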

$Mathematical Exercise$ 6. For each of the tests in Exercise 5, show that we fail to reject H₀ at significance level α if and only if a₀ is in the corresponding 1 - α level confidence interval.

Tests for μ₂ - μ₁ when σ₁, σ₂ are Known

Next we will consider tests for the difference of the means μ₂ - μ₁ under the assumption that the standard deviations σ₁ and σ₂ are known. Of course, this is usually an unrealistic assumption. Our test statistic is

Z₀ = [(M₂ - M₁) - a₀] / (σ₁² / n₁ + σ₂² / n₂)^{1/2}.

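$Mathematical Exercise$ 7. Show that Z₀ has the normal distribution with mean [(μ₂ - μ₁) - a₀] / (σ₁² / n₁ + σ₂² / n₂)^{1/2} and variance 1. In particular, Z₀ has the standard normal distribution when μ₂ - μ₁ = a₀.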

As usual, let z_p denote the quantile of order p for the standard normal distribution. For selected values of p, values of z_p can be obtained from the quantile applet.

$Mathematical Exercise$ 8. Show that the following tests have significance level α:

Reject H₀: μ₂ - μ₁ = a₀ versus H₁: μ₂ - μ₁ ≠ a₀ if and only if Z₀ > z_{1 - α/2} or Z₀ < -z_{1 - α/2}.
Reject H₀: μ₂ - μ₁ ≤ a₀ versus H₁: μ₂ - μ₁ > a₀ if and only if Z₀ > z_{1 - α}.
Reject H₀: μ₂ - μ₁ ≥ a₀ versus H₁: μ₂ - μ₁ < a₀ if and only if Z₀ < -z_{1 - α}.
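
These three tests can be sketched in Python with the standard library, where `statistics.NormalDist().inv_cdf` supplies the standard normal quantiles. The function name `z_test`, the labels used for the `alternative` argument, and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test(m1, m2, sigma1, sigma2, n1, n2, a0=0.0, alpha=0.10, alternative="two-sided"):
    """Test a hypothesis about mu2 - mu1 when sigma1 and sigma2 are known.

    alternative: "two-sided" (H1: mu2 - mu1 != a0),
                 "greater"   (H1: mu2 - mu1 >  a0),
                 "less"      (H1: mu2 - mu1 <  a0).
    Returns the statistic Z0 and the reject/fail-to-reject decision.
    """
    z0 = ((m2 - m1) - a0) / sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    std_normal = NormalDist()
    if alternative == "two-sided":
        reject = abs(z0) > std_normal.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":
        reject = z0 > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = z0 < -std_normal.inv_cdf(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return z0, reject

# Hypothetical summary statistics, for illustration only.
z0, reject_two_sided = z_test(m1=10.0, m2=11.0, sigma1=2.0, sigma2=2.0, n1=32, n2=32)
_, reject_less = z_test(m1=10.0, m2=11.0, sigma1=2.0, sigma2=2.0, n1=32, n2=32,
                        alternative="less")
print(z0, reject_two_sided, reject_less)
```

In this example the standard error is (4/32 + 4/32)^{1/2} = 0.5, so Z₀ = 2: the two-sided test rejects at the 10% level, while the test against the "less" alternative does not.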

$Mathematical Exercise$ 9. For each of the tests in Exercise 8, show that we fail to reject H₀ at significance level α if and only if a₀ is in the corresponding 1 - α level confidence interval.

Tests for μ₂ - μ₁ when σ₁, σ₂ are Unknown

Finally, we will consider tests for the difference of the means under the more realistic assumption that the standard deviations σ₁ and σ₂ are unknown, but equal:

σ₁ = σ₂ = σ.

This assumption is reasonable if there is an inherent variability in the measurement variables that does not change even when different treatments are applied to the objects in the population. Recall that the pooled estimate of the common variance σ² is

S² = [(n₁ - 1)S₁² + (n₂ - 1)S₂²] / (n₁ + n₂ - 2).

Our basic test statistic is

T₀ = [(M₂ - M₁) - a₀] / [S (1 / n₁ + 1 / n₂)^{1/2}].

$Mathematical Exercise$ 10. Show that if μ₂ - μ₁ = a₀, then T₀ has the t distribution with n = n₁ + n₂ - 2 degrees of freedom.

As usual, for k > 0 and p in (0, 1) let t_{k, p} denote the quantile of order p for the t distribution with k degrees of freedom. For selected values of k and p, values of t_{k, p} are given in the quantile applet.

$Mathematical Exercise$ 11. Show that the following tests have significance level α:

Reject H₀: μ₂ - μ₁ = a₀ versus H₁: μ₂ - μ₁ ≠ a₀ if and only if T₀ > t_{n, 1 - α/2} or T₀ < -t_{n, 1 - α/2}.
Reject H₀: μ₂ - μ₁ ≤ a₀ versus H₁: μ₂ - μ₁ > a₀ if and only if T₀ > t_{n, 1 - α}.
Reject H₀: μ₂ - μ₁ ≥ a₀ versus H₁: μ₂ - μ₁ < a₀ if and only if T₀ < -t_{n, 1 - α}.
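
The pooled two-sided test can be sketched in Python as follows. As a stand-in for a quantile table, the hypothetical helper `t_cdf` integrates the t density with Simpson's rule; rejecting when the distribution function at T₀ falls below α/2 or above 1 - α/2 is equivalent to the quantile comparison. The function names and the summary statistics are assumptions made for illustration.

```python
import math

def t_cdf(t, k, steps=2000):
    """P(T <= t) for the t distribution with k degrees of freedom (Simpson's rule)."""
    if t == 0:
        return 0.5
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)

    def density(u):
        return math.exp(log_c - (k + 1) / 2 * math.log1p(u * u / k))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    half = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 + half if t > 0 else 0.5 - half

def pooled_t_test(m1, s1, n1, m2, s2, n2, a0=0.0, alpha=0.10):
    """Two-sided test of H0: mu2 - mu1 = a0, assuming equal unknown variances."""
    k = n1 + n2 - 2
    s_sq = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / k  # pooled variance
    t0 = ((m2 - m1) - a0) / math.sqrt(s_sq * (1 / n1 + 1 / n2))
    cdf_value = t_cdf(t0, k)
    reject = cdf_value < alpha / 2 or cdf_value > 1 - alpha / 2
    return t0, s_sq, reject

# Hypothetical summary statistics, for illustration only.
t0, s_sq, reject = pooled_t_test(m1=20.0, s1=3.0, n1=10, m2=23.0, s2=3.0, n2=10)
print(t0, s_sq, reject)
```

With equal sample standard deviations, the pooled variance equals the common sample variance, which gives a quick check on the formula.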

$Mathematical Exercise$ 12. For each of the tests in Exercise 11, show that we fail to reject H₀ at significance level α if and only if a₀ is in the corresponding 1 - α level confidence interval.

Tests in the Bivariate Normal Model

In this subsection, we consider a model that is superficially similar to the two-sample normal model, but is actually much simpler. Suppose that

(X₁, Y₁), (X₂, Y₂), ..., (X_n, Y_n)

is a random sample of size n from the bivariate normal distribution with

E(X) = μ₁, E(Y) = μ₂, var(X) = σ₁², var(Y) = σ₂², cov(X, Y) = σ_{1,2}.

Thus, instead of a pair of samples, we have a sample of pairs. This type of model frequently arises in before and after experiments, in which a measurement of interest is recorded for a sample of n objects from the population, both before and after a treatment. For example, we could record the blood pressure of a sample of n patients, before and after the administration of a certain drug.

$Mathematical Exercise$ 13. Show that Y₁ - X₁, Y₂ - X₂, ..., Y_n - X_n is a random sample of size n from the normal distribution with mean μ₂ - μ₁ and variance σ² = σ₁² + σ₂² - 2σ_{1,2}.
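
The variance formula in Exercise 13 has an exact sample analogue: the sample variance of the differences equals the sum of the two sample variances minus twice the sample covariance. The following Python sketch reduces paired data to differences and checks this identity; the function name `paired_summary` and the before and after measurements are hypothetical.

```python
from statistics import fmean, variance

def paired_summary(xs, ys):
    """Reduce paired data to differences Y_i - X_i.

    Returns the mean of the differences, their sample variance, and the
    value s1^2 + s2^2 - 2 * s12 computed from the original two columns.
    """
    diffs = [y - x for x, y in zip(xs, ys)]
    n = len(diffs)
    mx, my = fmean(xs), fmean(ys)
    # Sample covariance with divisor n - 1, matching statistics.variance.
    s12 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return fmean(diffs), variance(diffs), variance(xs) + variance(ys) - 2 * s12

# Hypothetical before/after measurements, for illustration only.
before = [120.0, 132.0, 128.0, 140.0, 125.0]
after = [115.0, 130.0, 120.0, 138.0, 121.0]
mean_d, var_d, var_from_identity = paired_summary(before, after)
print(mean_d, var_d, var_from_identity)
```

Because of this identity, summary statistics of the form (s₁, s₂, s_{1,2}) are enough to compute the variance of the differences without the raw pairs.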

Thus, the differences fit the one-sample normal model that we have already studied. In particular, for tests of μ₂ - μ₁, see the section on Tests of the Mean in the Normal Model and for tests of σ², see the section on Tests of the Variance in the Normal Model.

Computational Exercises

$Mathematical Exercise$ 14. A new drug is being developed to reduce a certain blood chemical. A sample of 36 patients are given a placebo while a sample of 49 patients are given the drug. The statistics (in mg) are m₁ = 87, s₁ = 4, m₂ = 63, s₂ = 6. Test the following at the 10% significance level:

a. H₀: σ₁ = σ₂ versus H₁: σ₁ ≠ σ₂.
b. H₀: μ₁ ≤ μ₂ versus H₁: μ₁ > μ₂ (assuming that σ₁ = σ₂).
c. Based on (b), is the drug effective?

$Mathematical Exercise$ 15. A company claims that an herbal supplement improves intelligence. A sample of 25 persons are given a standard IQ test before and after taking the supplement. The before and after statistics are m₁ = 105, s₁ = 13, m₂ = 110, s₂ = 17, s₁₂ = 190. At the 10% significance level, do you believe the company's claim?

16. In Fisher's iris data, consider the petal length variable for the samples of Versicolor and Virginica irises. Test the following at the 10% significance level:

H₀: σ₁ = σ₂ versus H₁: σ₁ ≠ σ₂.
H₀: μ₁ ≤ μ₂ versus H₁: μ₁ > μ₂ (assuming that σ₁ = σ₂).

$Mathematical Exercise$ 17. A plant has two machines that produce a circular rod whose diameter (in cm) is critical. A sample of 100 rods from the first machine has mean 10.3 and standard deviation 1.2. A sample of 100 rods from the second machine has mean 9.8 and standard deviation 1.6.

H₀: σ₁ = σ₂ versus H₁: σ₁ ≠ σ₂.
H₀: μ₁ = μ₂ versus H₁: μ₁ ≠ μ₂ (assuming that σ₁ = σ₂).
