Estimation in the Two-Sample Normal Model

In this section, we will study estimation problems in the two-sample normal model and in the bivariate normal model. This section parallels the section on Tests in the Two-Sample Normal Model in the Chapter on Hypothesis Testing.

The Two-Sample Normal Model

Suppose that X = (X₁, X₂, ..., X_{n₁}) is a random sample of size n₁ from the normal distribution with mean μ₁ and variance d₁², and that Y = (Y₁, Y₂, ..., Y_{n₂}) is a random sample of size n₂ from the normal distribution with mean μ₂ and variance d₂². Moreover, suppose that the samples X and Y are independent.

This type of situation arises frequently when the random variables represent a measurement of interest for the objects of the population, and the samples correspond to two different treatments. For example, we might be interested in the blood pressure of a certain population of patients. The X vector records the blood pressures of a control sample, while the Y vector records the blood pressures of the sample receiving a new drug. Similarly, we might be interested in the yield of an acre of corn. The X vector records the yields of a sample receiving one type of fertilizer, while the Y vector records the yields of a sample receiving a different type of fertilizer.

Usually our interest is in a comparison of the parameters (either the mean or variance) for the two sampling distributions. In this section we will construct confidence intervals for the ratio of the variances and for the difference of the means. As with previous estimation problems we have studied, the procedures vary depending on what parameters are known or unknown. Also as before, key elements in the construction of the confidence intervals are the sample means and sample variances and the special properties of these statistics when the sampling distribution is normal. We will use the following notation:

M₁ = (1 / n₁) Σ_{i = 1, ..., n₁} X_i.
W₁² = (1 / n₁) Σ_{i = 1, ..., n₁} (X_i - μ₁)².
S₁² = [1 / (n₁ - 1)] Σ_{i = 1, ..., n₁} (X_i - M₁)².
M₂ = (1 / n₂) Σ_{i = 1, ..., n₂} Y_i.
W₂² = (1 / n₂) Σ_{i = 1, ..., n₂} (Y_i - μ₂)².
S₂² = [1 / (n₂ - 1)] Σ_{i = 1, ..., n₂} (Y_i - M₂)².
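
As a quick illustration of this notation, the following Python sketch computes the three statistics for a single sample using only the standard library. The data values and the "known" mean mu1 are made up for illustration and do not come from the text; the second sample would be handled in exactly the same way.

```python
import statistics

# Hypothetical sample X of size n1 = 5 and a hypothetical known mean mu1.
x = [4.1, 5.3, 4.8, 5.9, 5.0]
mu1 = 5.0
n1 = len(x)

# M1: the sample mean.
m1 = statistics.fmean(x)

# W1^2: average squared deviation from the known mean mu1 (divisor n1).
w1_sq = sum((xi - mu1) ** 2 for xi in x) / n1

# S1^2: the usual sample variance, centered at M1 (divisor n1 - 1).
s1_sq = statistics.variance(x)

print(m1, w1_sq, s1_sq)
```

Note that W₁² can only be computed when μ₁ is known, while S₁² is always available.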

Confidence Intervals for d₂² / d₁² when μ₁, μ₂ are Known

We will first consider the estimation problem for the ratio of the variances d₂² / d₁² under the assumption that the means μ₁ and μ₂ are known. Usually, of course, this is an unrealistic assumption.

Mathematical Exercise 1. Show that F = (W₁² / d₁²) / (W₂² / d₂²) has the F distribution with n₁ degrees of freedom in the numerator and n₂ degrees of freedom in the denominator.

It follows that F is a pivotal variable for d₂² / d₁². Now for p in (0, 1) and for m > 0 and n > 0, let f_{m, n, p} denote the quantile of order p for the F distribution with m degrees of freedom in the numerator and n degrees of freedom in the denominator. For selected values of m, n, and p, f_{m, n, p} can be computed using the quantile applet.

Mathematical Exercise 2. Use the pivotal variable F to show that a 1 - r confidence interval, confidence upper bound, and confidence lower bound for d₂² / d₁² are given as follows:

[f_{n₁, n₂, r/2} W₂² / W₁², f_{n₁, n₂, 1 - r/2} W₂² / W₁²].
f_{n₁, n₂, 1 - r} W₂² / W₁².
f_{n₁, n₂, r} W₂² / W₁².
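
The inversion step behind the two-sided interval can be sketched as follows, using only the definition of F in Exercise 1 and the quantile notation introduced above. The one-sided bounds follow in the same way from a single inequality.

```latex
\begin{align*}
1 - r &= P\left(f_{n_1, n_2, r/2} \le F \le f_{n_1, n_2, 1 - r/2}\right) \\
      &= P\left(f_{n_1, n_2, r/2} \le \frac{d_2^2}{d_1^2} \cdot \frac{W_1^2}{W_2^2} \le f_{n_1, n_2, 1 - r/2}\right) \\
      &= P\left(f_{n_1, n_2, r/2} \, \frac{W_2^2}{W_1^2} \le \frac{d_2^2}{d_1^2} \le f_{n_1, n_2, 1 - r/2} \, \frac{W_2^2}{W_1^2}\right)
\end{align*}
```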

Confidence Intervals for d₂² / d₁² when μ₁, μ₂ are Unknown

Next we will consider the estimation problem for the ratio of the variances d₂² / d₁² under the more realistic assumption that the means μ₁ and μ₂ are unknown.

Mathematical Exercise 3. Show that F = (S₁² / d₁²) / (S₂² / d₂²) has the F distribution with n₁ - 1 degrees of freedom in the numerator and n₂ - 1 degrees of freedom in the denominator.

It follows that F is a pivotal variable for d₂² / d₁².

Mathematical Exercise 4. Use the pivotal variable F to show that a 1 - r confidence interval, confidence upper bound, and confidence lower bound for d₂² / d₁² are given as follows:

[f_{n₁ - 1, n₂ - 1, r/2} S₂² / S₁², f_{n₁ - 1, n₂ - 1, 1 - r/2} S₂² / S₁²].
f_{n₁ - 1, n₂ - 1, 1 - r} S₂² / S₁².
f_{n₁ - 1, n₂ - 1, r} S₂² / S₁².
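
The two-sided interval for the variance ratio can be computed numerically along the following lines. This is only a sketch: the helper names (f_pdf, f_cdf, f_quantile, variance_ratio_interval) and the summary statistics are hypothetical, and the F quantile is approximated with Simpson's rule and bisection as a standard-library stand-in for the quantile applet. The density helper assumes more than 2 degrees of freedom in the numerator, that is, n₁ ≥ 4.

```python
import math

def f_pdf(x, m, n):
    """Density of the F distribution with m and n degrees of freedom (assumes m > 2)."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
             + (m / 2) * math.log(m / n))
    return math.exp(log_c + (m / 2 - 1) * math.log(x)
                    - ((m + n) / 2) * math.log1p(m * x / n))

def f_cdf(x, m, n, steps=2000):
    """Approximate P(F <= x) with Simpson's rule on [0, x] (steps must be even)."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0.0, m, n) + f_pdf(x, m, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, m, n)
    return total * h / 3

def f_quantile(p, m, n):
    """Approximate f_{m, n, p} by bisection on the approximate CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, m, n) < p:  # grow the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if f_cdf(mid, m, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def variance_ratio_interval(s1_sq, s2_sq, n1, n2, r):
    """Two-sided 1 - r confidence interval for d2^2 / d1^2 when the means are unknown."""
    ratio = s2_sq / s1_sq
    return (f_quantile(r / 2, n1 - 1, n2 - 1) * ratio,
            f_quantile(1 - r / 2, n1 - 1, n2 - 1) * ratio)

# Hypothetical summary statistics, for illustration only.
print(variance_ratio_interval(s1_sq=2.5, s2_sq=4.0, n1=11, n2=11, r=0.10))
```

The interval for the ratio of standard deviations d₂ / d₁ is obtained by taking square roots of both endpoints.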

Confidence Intervals for μ₂ - μ₁ when d₁, d₂ are Known

Next we will consider the estimation problem for the difference of the means μ₂ - μ₁ under the assumption that the standard deviations d₁ and d₂ are known. Of course, this is usually an unrealistic assumption.

Mathematical Exercise 5. Show that M₂ - M₁ has the normal distribution with mean μ₂ - μ₁ and variance d₁² / n₁ + d₂² / n₂.

Mathematical Exercise 6. Show that Z = [(M₂ - M₁) - (μ₂ - μ₁)] / (d₁² / n₁ + d₂² / n₂)^(1/2) has the standard normal distribution.

From Exercise 6, Z is a pivotal variable for μ₂ - μ₁. As usual, for p in (0, 1), we will let z_p denote the quantile of order p for the standard normal distribution.

Mathematical Exercise 7. Use the pivotal variable Z to show that a 1 - r confidence interval, confidence upper bound, and confidence lower bound for μ₂ - μ₁ are given as follows:

[(M₂ - M₁) - z_{1 - r/2} (d₁² / n₁ + d₂² / n₂)^(1/2), (M₂ - M₁) + z_{1 - r/2} (d₁² / n₁ + d₂² / n₂)^(1/2)].
(M₂ - M₁) + z_{1 - r} (d₁² / n₁ + d₂² / n₂)^(1/2).
(M₂ - M₁) - z_{1 - r} (d₁² / n₁ + d₂² / n₂)^(1/2).
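
A minimal sketch of the two-sided interval in Python, using the standard library's NormalDist for the normal quantile. The function name mean_diff_interval_known_sd and the numerical inputs are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_diff_interval_known_sd(m1, m2, d1, d2, n1, n2, r):
    """Two-sided 1 - r confidence interval for mu2 - mu1 when d1 and d2 are known."""
    z = NormalDist().inv_cdf(1 - r / 2)           # z_{1 - r/2}
    half_width = z * sqrt(d1 ** 2 / n1 + d2 ** 2 / n2)
    center = m2 - m1
    return center - half_width, center + half_width

# Hypothetical summary values, for illustration only.
lower, upper = mean_diff_interval_known_sd(
    m1=10.0, m2=12.0, d1=3.0, d2=4.0, n1=9, n2=16, r=0.05)
print(lower, upper)
```

The one-sided bounds use the same standard error with z_{1 - r} in place of z_{1 - r/2}.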

Confidence Intervals for μ₂ - μ₁ when d₁, d₂ are Unknown

Finally we will consider the estimation problem for the difference of the means μ₂ - μ₁ under the more realistic assumption that the standard deviations d₁ and d₂ are unknown. In this case, it is more difficult to find a suitable pivotal variable, but we can do the analysis in the special case that the standard deviations are the same. Thus, we will assume that

d₁ = d₂ = d and the common value d is unknown.

This assumption is reasonable if there is an inherent variability in the measurement variables that does not change even when different treatments are applied to the objects in the population.

Mathematical Exercise 8. Show that Z = [(M₂ - M₁) - (μ₂ - μ₁)] / [d(1 / n₁ + 1 / n₂)^(1/2)] has the standard normal distribution.

To construct the pivotal variable, we first need a point estimate of d². A natural approach is to consider a weighted average of the sample variances S₁² and S₂², with the degrees of freedom as the weight factors (this is called the pooled estimate of d²). Thus, let

S² = [(n₁ - 1)S₁² + (n₂ - 1)S₂²] / (n₁ + n₂ - 2).

Mathematical Exercise 9. Show that V = (n₁ + n₂ - 2)S² / d² has the chi-square distribution with n₁ + n₂ - 2 degrees of freedom. Hint: (n_i - 1)S_i² / d² has the chi-square distribution with n_i - 1 degrees of freedom for i = 1 and 2, and these variables are independent.

Mathematical Exercise 10. Show that M₂ - M₁ and S² are independent. Hint: (M₁, S₁) and (M₂, S₂) are independent, M₁ and S₁ are independent, and M₂ and S₂ are independent.

Mathematical Exercise 11. Show that T = [(M₂ - M₁) - (μ₂ - μ₁)] / [S(1 / n₁ + 1 / n₂)^(1/2)] has the t distribution with n₁ + n₂ - 2 degrees of freedom. Hint: Show that T = Z / [V / (n₁ + n₂ - 2)]^(1/2) where Z is the random variable in Exercise 8 and V is the random variable in Exercise 9. Moreover, Z and V are independent by Exercise 10.

From Exercise 11, T is a pivotal variable for μ₂ - μ₁. For k > 0 and p in (0, 1), let t_{k, p} denote the quantile of order p for the t distribution with k degrees of freedom. For selected values of k and p, values of t_{k, p} can be obtained from the table of the Student t distribution or from the quantile applet.

Mathematical Exercise 12. Use the pivotal variable T to show that a 1 - r confidence interval, confidence upper bound, and confidence lower bound for μ₂ - μ₁ are given as follows:

[(M₂ - M₁) - t_{n₁ + n₂ - 2, 1 - r/2} S(1 / n₁ + 1 / n₂)^(1/2), (M₂ - M₁) + t_{n₁ + n₂ - 2, 1 - r/2} S(1 / n₁ + 1 / n₂)^(1/2)].
(M₂ - M₁) + t_{n₁ + n₂ - 2, 1 - r} S(1 / n₁ + 1 / n₂)^(1/2).
(M₂ - M₁) - t_{n₁ + n₂ - 2, 1 - r} S(1 / n₁ + 1 / n₂)^(1/2).
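
The pooled procedure can be sketched in Python as follows. The helper names (t_pdf, t_cdf, t_quantile, pooled_mean_diff_interval) and the summary statistics are hypothetical. The t quantile is approximated with Simpson's rule and bisection so that the sketch needs only the standard library; in practice one would more likely read it from a table or use a library routine such as SciPy's scipy.stats.t.ppf.

```python
import math

def t_pdf(x, k):
    """Density of the t distribution with k degrees of freedom."""
    c = math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2)) / math.sqrt(k * math.pi)
    return c * (1 + x * x / k) ** (-(k + 1) / 2)

def t_cdf(x, k, steps=2000):
    """Approximate P(T <= x) using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_quantile(p, k):
    """Approximate t_{k, p} by bisection (assumes the quantile lies in [-50, 50])."""
    lo, hi = -50.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pooled_mean_diff_interval(m1, m2, s1, s2, n1, n2, r):
    """Two-sided 1 - r confidence interval for mu2 - mu1, assuming d1 = d2."""
    k = n1 + n2 - 2
    # Pooled estimate S^2 of the common variance d^2.
    s_sq = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / k
    half_width = t_quantile(1 - r / 2, k) * math.sqrt(s_sq * (1 / n1 + 1 / n2))
    center = m2 - m1
    return center - half_width, center + half_width

# Hypothetical summary statistics, for illustration only.
print(pooled_mean_diff_interval(m1=20.0, m2=23.0, s1=2.0, s2=2.0, n1=6, n2=6, r=0.05))
```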

Estimation in the Bivariate Normal Model

In this subsection, we consider a model that is superficially similar to the two-sample normal model, but is actually much simpler. Suppose that

(X₁, Y₁), (X₂, Y₂), ..., (X_n, Y_n)

is a random sample of size n from the bivariate normal distribution (X, Y) with

E(X) = μ₁, E(Y) = μ₂, var(X) = d₁², var(Y) = d₂², cov(X, Y) = d_{1,2}.

Thus, instead of a pair of samples, we have a sample of pairs. This type of model frequently arises in before and after experiments, in which a measurement of interest is recorded for a sample of n objects from the population, both before and after a treatment. For example, we could record the blood pressure of a sample of n patients, before and after the administration of a certain drug. As with the two-sample normal model, the interest is usually in estimating the difference of the means.

We will denote the sample means and variances of X and Y, and the sample covariance, respectively, by

M₁, M₂, S₁², S₂², S₁₂.

Mathematical Exercise 13. Show that Y₁ - X₁, Y₂ - X₂, ..., Y_n - X_n is a random sample of size n from the normal distribution with mean μ₂ - μ₁ and variance d² = d₁² + d₂² - 2d_{1,2}.

From Exercise 13, the differences fit the simple one-sample normal model.

Mathematical Exercise 14. Show that the sample mean and variance of the differences are

M = M₂ - M₁.
S² = S₁² + S₂² - 2S₁₂.
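
These two identities are easy to check numerically. The sketch below does so for a small set of made-up paired measurements (the data are hypothetical), computing the sample covariance S₁₂ directly with divisor n - 1.

```python
import math
import statistics

# Hypothetical paired measurements (before, after) for n = 5 objects.
x = [12.0, 15.0, 11.0, 14.0, 13.0]
y = [14.0, 15.5, 13.0, 17.0, 13.5]
n = len(x)

m1, m2 = statistics.fmean(x), statistics.fmean(y)
s1_sq, s2_sq = statistics.variance(x), statistics.variance(y)

# Sample covariance S12 with divisor n - 1.
s12 = sum((xi - m1) * (yi - m2) for xi, yi in zip(x, y)) / (n - 1)

# Sample mean and variance of the differences Y_i - X_i.
diffs = [yi - xi for xi, yi in zip(x, y)]
m = statistics.fmean(diffs)
s_sq = statistics.variance(diffs)

# Both comparisons should hold up to floating-point rounding.
print(math.isclose(m, m2 - m1), math.isclose(s_sq, s1_sq + s2_sq - 2 * s12))
```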

Mathematical Exercise 15. Show that if d is known, then a 1 - r confidence interval, confidence upper bound, and confidence lower bound for μ₂ - μ₁ are given as follows, where the quantiles are from the standard normal distribution:

[M - z_{1 - r/2} d / n^(1/2), M + z_{1 - r/2} d / n^(1/2)].
M + z_{1 - r} d / n^(1/2).
M - z_{1 - r} d / n^(1/2).

Mathematical Exercise 16. Show that if d is unknown, then a 1 - r confidence interval, confidence upper bound, and confidence lower bound for μ₂ - μ₁ are given as follows, where the quantiles are from the t distribution with n - 1 degrees of freedom:

[M - t_{n - 1, 1 - r/2} S / n^(1/2), M + t_{n - 1, 1 - r/2} S / n^(1/2)].
M + t_{n - 1, 1 - r} S / n^(1/2).
M - t_{n - 1, 1 - r} S / n^(1/2).

Mathematical Exercise 17. Suppose that X = (X₁, X₂, ..., X_n) and Y = (Y₁, Y₂, ..., Y_n) are independent samples from normal distributions. These data fit both models: the two-sample normal model and the bivariate normal model. Which procedure would work better for estimating the difference of means μ₂ - μ₁?

Computational Exercises

18. A new drug is being developed to reduce a certain blood chemical. A sample of 36 patients is given a placebo while a sample of 49 patients is given the drug. The statistics (in mg) are m₁ = 87, s₁ = 4, m₂ = 63, s₂ = 6.

a. Compute the 90% confidence interval for d₂ / d₁.
b. Assuming that d₁ = d₂, compute the 90% confidence interval for μ₂ - μ₁.
c. Based on (a), is the assumption that d₁ = d₂ reasonable?
d. Based on (b), is the drug effective?

19. A company claims that an herbal supplement improves intelligence. A sample of 25 persons is given a standard IQ test before and after taking the supplement. The before and after statistics are m₁ = 105, s₁ = 13, m₂ = 110, s₂ = 17, s₁₂ = 190. Find the 90% confidence interval for μ₂ - μ₁. Do you believe the company's claim?

20. In Fisher's iris data, consider the petal length variable for the samples of Versicolor and Virginica irises.

a. Compute the 90% confidence interval for d₂ / d₁.
b. Assuming that d₁ = d₂, compute the 90% confidence interval for μ₂ - μ₁.
c. Based on (a), is the assumption that d₁ = d₂ reasonable?

21. A plant has two machines that produce a circular rod whose diameter (in cm) is critical. A sample of 100 rods from the first machine has mean 10.3 and standard deviation 1.2. A sample of 100 rods from the second machine has mean 9.8 and standard deviation 1.6.

a. Compute the 90% confidence interval for d₂ / d₁.
b. Assuming that d₁ = d₂, compute the 90% confidence interval for μ₂ - μ₁.
c. Based on (a), is the assumption that d₁ = d₂ reasonable?
