5. Estimation in the Two-Sample Normal Model


In this section, we will study estimation problems in the two-sample normal model and in the bivariate normal model. This section parallels the section on Tests in the Two-Sample Normal Model in the Chapter on Hypothesis Testing.

The Two-Sample Normal Model

Suppose that X = (X_1, X_2, ..., X_{n_1}) is a random sample of size n_1 from the normal distribution with mean µ_1 and variance d_1^2 and that Y = (Y_1, Y_2, ..., Y_{n_2}) is a random sample of size n_2 from the normal distribution with mean µ_2 and variance d_2^2. Moreover, suppose that the samples X and Y are independent.

This type of situation arises frequently when the random variables represent a measurement of interest for the objects of the population, and the samples correspond to two different treatments. For example, we might be interested in the blood pressure of a certain population of patients. The X vector records the blood pressures of a control sample, while the Y vector records the blood pressures of the sample receiving a new drug. Similarly, we might be interested in the yield of an acre of corn. The X vector records the yields of a sample receiving one type of fertilizer, while the Y vector records the yields of a sample receiving a different type of fertilizer.

Usually our interest is in a comparison of the parameters (either the mean or variance) for the two sampling distributions. In this section we will construct confidence intervals for the ratio of the variances and for the difference of the means. As with previous estimation problems we have studied, the procedures vary depending on what parameters are known or unknown. Also as before, key elements in the construction of the confidence intervals are the sample means and sample variances and the special properties of these statistics when the sampling distribution is normal. We will use the following notation:

  1. M_1 = (1 / n_1) sum_{i = 1, ..., n_1} X_i.
  2. W_1^2 = (1 / n_1) sum_{i = 1, ..., n_1} (X_i - µ_1)^2.
  3. S_1^2 = [1 / (n_1 - 1)] sum_{i = 1, ..., n_1} (X_i - M_1)^2.
  4. M_2 = (1 / n_2) sum_{i = 1, ..., n_2} Y_i.
  5. W_2^2 = (1 / n_2) sum_{i = 1, ..., n_2} (Y_i - µ_2)^2.
  6. S_2^2 = [1 / (n_2 - 1)] sum_{i = 1, ..., n_2} (Y_i - M_2)^2.
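
As a minimal sketch of this notation, the following Python function computes the sample mean M, the known-mean estimate W^2, and the sample variance S^2 for a single sample. The function name sample_stats and the data values are hypothetical, chosen only for illustration; note that W^2 can be computed only when the distribution mean is known.

```python
def sample_stats(xs, mu):
    """Return (M, W2, S2) for the sample xs, where mu is the known distribution mean."""
    n = len(xs)
    m = sum(xs) / n                                # sample mean M
    w2 = sum((x - mu) ** 2 for x in xs) / n        # W^2: deviations from the known mean
    s2 = sum((x - m) ** 2 for x in xs) / (n - 1)   # S^2: deviations from the sample mean
    return m, w2, s2

# Hypothetical data, assuming the distribution mean is known to be 5
xs = [4.0, 6.0, 5.0, 7.0, 3.0]
m, w2, s2 = sample_stats(xs, mu=5.0)
```

The divisor is n for W^2 but n - 1 for S^2, which is what produces the different degrees of freedom in the known-mean and unknown-mean cases.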

Confidence Intervals for d_2^2 / d_1^2 when µ_1, µ_2 are Known

We will first consider the estimation problem for the ratio of the variances d_2^2 / d_1^2 under the assumption that the means µ_1 and µ_2 are known. Usually, of course, this is an unrealistic assumption.

Mathematical Exercise 1. Show that F = (W_1^2 / d_1^2) / (W_2^2 / d_2^2) has the F distribution with n_1 degrees of freedom in the numerator and n_2 degrees of freedom in the denominator.

It follows that F is a pivotal variable for d_2^2 / d_1^2. Now for p in (0, 1) and for m > 0 and n > 0, let f_{m, n, p} denote the quantile of order p for the F distribution with m degrees of freedom in the numerator and n degrees of freedom in the denominator. For selected values of m, n, and p, f_{m, n, p} can be computed using the quantile applet.

Mathematical Exercise 2. Use the pivotal variable F to show that a 1 - r confidence interval, confidence upper bound, and confidence lower bound for d_2^2 / d_1^2 are given as follows:

  1. [f_{n_1, n_2, r/2} W_2^2 / W_1^2, f_{n_1, n_2, 1 - r/2} W_2^2 / W_1^2].
  2. f_{n_1, n_2, 1 - r} W_2^2 / W_1^2.
  3. f_{n_1, n_2, r} W_2^2 / W_1^2.
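
The two-sided interval in part 1 can be obtained along the following lines. This is a sketch of one standard argument, using only the definition of F in Exercise 1 and the quantile notation above.

```latex
\begin{align*}
1 - r &= P\left(f_{n_1, n_2, r/2} \le F \le f_{n_1, n_2, 1 - r/2}\right) \\
      &= P\left(f_{n_1, n_2, r/2} \le \frac{W_1^2 / d_1^2}{W_2^2 / d_2^2} \le f_{n_1, n_2, 1 - r/2}\right) \\
      &= P\left(f_{n_1, n_2, r/2} \, \frac{W_2^2}{W_1^2} \le \frac{d_2^2}{d_1^2} \le f_{n_1, n_2, 1 - r/2} \, \frac{W_2^2}{W_1^2}\right)
\end{align*}
```

The last step multiplies through by the positive quantity W_2^2 / W_1^2. The one-sided bounds follow in the same way, starting from P(F <= f_{n_1, n_2, 1 - r}) = 1 - r for the upper bound and P(F >= f_{n_1, n_2, r}) = 1 - r for the lower bound.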

Confidence Intervals for d_2^2 / d_1^2 when µ_1, µ_2 are Unknown

Next we will consider the estimation problem for the ratio of the variances d_2^2 / d_1^2 under the more realistic assumption that the means µ_1 and µ_2 are unknown.

Mathematical Exercise 3. Show that F = (S_1^2 / d_1^2) / (S_2^2 / d_2^2) has the F distribution with n_1 - 1 degrees of freedom in the numerator and n_2 - 1 degrees of freedom in the denominator.

It follows that F is a pivotal variable for d_2^2 / d_1^2.

Mathematical Exercise 4. Use the pivotal variable F to show that a 1 - r confidence interval, confidence upper bound, and confidence lower bound for d_2^2 / d_1^2 are given as follows:

  1. [f_{n_1 - 1, n_2 - 1, r/2} S_2^2 / S_1^2, f_{n_1 - 1, n_2 - 1, 1 - r/2} S_2^2 / S_1^2].
  2. f_{n_1 - 1, n_2 - 1, 1 - r} S_2^2 / S_1^2.
  3. f_{n_1 - 1, n_2 - 1, r} S_2^2 / S_1^2.
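
A minimal sketch of the two-sided interval in part 1, assuming the two F quantiles have already been read from a table or from the quantile applet. The function name variance_ratio_interval and the sample values are hypothetical. When a table lists only upper quantiles, the standard identity f_{m, n, p} = 1 / f_{n, m, 1 - p} supplies the lower one.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_low, f_high):
    """Two-sided confidence interval for d_2^2 / d_1^2.

    f_low and f_high are the quantiles of order r/2 and 1 - r/2 of the
    F distribution with n_1 - 1 numerator and n_2 - 1 denominator degrees
    of freedom; the caller supplies them.
    """
    ratio = s2_sq / s1_sq
    return f_low * ratio, f_high * ratio

# Hypothetical example with n_1 = 6, n_2 = 11, and r = 0.10.
# Rounded table values: f_{5, 10, 0.95} = 3.33 and f_{10, 5, 0.95} = 4.74,
# so f_{5, 10, 0.05} = 1 / 4.74 by the reciprocal identity.
lower, upper = variance_ratio_interval(s1_sq=4.0, s2_sq=6.0,
                                       f_low=1 / 4.74, f_high=3.33)
```

Taking square roots of both endpoints gives the corresponding interval for the ratio of standard deviations d_2 / d_1.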

Confidence Intervals for µ_2 - µ_1 when d_1, d_2 are Known

Next we will consider the estimation problem for the difference of the means µ_2 - µ_1 under the assumption that the standard deviations d_1 and d_2 are known. Of course, this is usually an unrealistic assumption.

Mathematical Exercise 5. Show that M_2 - M_1 has the normal distribution with mean µ_2 - µ_1 and variance d_1^2 / n_1 + d_2^2 / n_2.

Mathematical Exercise 6. Show that Z = [(M_2 - M_1) - (µ_2 - µ_1)] / (d_1^2 / n_1 + d_2^2 / n_2)^{1/2} has the standard normal distribution.

From Exercise 6, Z is a pivotal variable for µ_2 - µ_1. As usual, for p in (0, 1), we will let z_p denote the quantile of order p for the standard normal distribution.

Mathematical Exercise 7. Use the pivotal variable Z to show that a 1 - r confidence interval, confidence upper bound, and confidence lower bound for µ_2 - µ_1 are given as follows:

  1. [(M_2 - M_1) - z_{1 - r/2} (d_1^2 / n_1 + d_2^2 / n_2)^{1/2},
    (M_2 - M_1) + z_{1 - r/2} (d_1^2 / n_1 + d_2^2 / n_2)^{1/2}].
  2. (M_2 - M_1) + z_{1 - r} (d_1^2 / n_1 + d_2^2 / n_2)^{1/2}.
  3. (M_2 - M_1) - z_{1 - r} (d_1^2 / n_1 + d_2^2 / n_2)^{1/2}.
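
The two-sided interval in part 1 can be sketched in Python using only the standard library, where statistics.NormalDist supplies the normal quantile. The function name mean_difference_z_interval and the numbers in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_difference_z_interval(m1, m2, d1, d2, n1, n2, r):
    """Two-sided 1 - r confidence interval for mu_2 - mu_1 with d_1, d_2 known."""
    z = NormalDist().inv_cdf(1 - r / 2)                  # quantile z_{1 - r/2}
    half_width = z * sqrt(d1 ** 2 / n1 + d2 ** 2 / n2)   # z times the standard deviation of M_2 - M_1
    diff = m2 - m1
    return diff - half_width, diff + half_width

# Hypothetical example: known standard deviations 3 and 4, sample sizes 9 and 16
lower, upper = mean_difference_z_interval(m1=10.0, m2=12.0, d1=3.0, d2=4.0,
                                          n1=9, n2=16, r=0.05)
```

For the one-sided bounds, replace z_{1 - r/2} with z_{1 - r} and keep only the upper or lower endpoint.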

Confidence Intervals for µ_2 - µ_1 when d_1, d_2 are Unknown

Finally we will consider the estimation problem for the difference of the means µ_2 - µ_1 under the more realistic assumption that the standard deviations d_1 and d_2 are unknown. In this case, it is more difficult to find a suitable pivotal variable, but we can do the analysis in the special case that the standard deviations are the same. Thus, we will assume that

d_1 = d_2 = d and the common value d is unknown.

This assumption is reasonable if there is an inherent variability in the measurement variables that does not change even when different treatments are applied to the objects in the population.

Mathematical Exercise 8. Show that Z = [(M_2 - M_1) - (µ_2 - µ_1)] / [d (1 / n_1 + 1 / n_2)^{1/2}] has the standard normal distribution.

To construct the pivotal variable, we first need a point estimate of d^2. A natural approach is to consider a weighted average of the sample variances S_1^2 and S_2^2, with the degrees of freedom as the weight factors (this is called the pooled estimate of d^2). Thus, let

S^2 = [(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2] / (n_1 + n_2 - 2).
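
As a small illustration of the pooled estimate, the sketch below computes it for hypothetical sample sizes and sample variances. The function name pooled_variance is an assumption made for this example.

```python
def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Pooled estimate of the common variance: the sample variances
    weighted by their degrees of freedom n1 - 1 and n2 - 1."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Hypothetical values: weights 5 and 10, so the result is pulled toward 7.0
s_sq = pooled_variance(n1=6, s1_sq=4.0, n2=11, s2_sq=7.0)
```

Because it is a weighted average, the pooled estimate always lies between the two sample variances, closer to the one from the larger sample.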

Mathematical Exercise 9. Show that V = (n_1 + n_2 - 2) S^2 / d^2 has the chi-square distribution with n_1 + n_2 - 2 degrees of freedom. Hint: (n_i - 1) S_i^2 / d^2 has the chi-square distribution with n_i - 1 degrees of freedom for i = 1 and 2, and these variables are independent.

Mathematical Exercise 10. Show that M_2 - M_1 and S^2 are independent. Hint: (M_1, S_1) and (M_2, S_2) are independent, M_1 and S_1 are independent, and M_2 and S_2 are independent.

Mathematical Exercise 11. Show that T = [(M_2 - M_1) - (µ_2 - µ_1)] / [S (1 / n_1 + 1 / n_2)^{1/2}] has the t-distribution with n_1 + n_2 - 2 degrees of freedom. Hint: Show that T = Z / [V / (n_1 + n_2 - 2)]^{1/2} where Z is the random variable in Exercise 8 and V is the random variable in Exercise 9. Moreover, Z and V are independent by Exercise 10.
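
The algebraic identity in the hint can be checked numerically before attempting the proof. The sketch below uses hypothetical samples and hypothetical true parameter values (which would be unknown in practice) and confirms that T computed directly agrees with Z divided by the square root of V over its degrees of freedom. It is a sanity check, not a substitute for the derivation.

```python
from math import sqrt, isclose
from statistics import mean, variance

xs = [4.0, 6.0, 5.0, 7.0, 3.0]   # hypothetical sample X, n_1 = 5
ys = [8.0, 6.0, 9.0, 7.0]        # hypothetical sample Y, n_2 = 4
mu1, mu2, d = 5.0, 7.0, 2.0      # hypothetical true means and common standard deviation

n1, n2 = len(xs), len(ys)
df = n1 + n2 - 2
# Pooled variance estimate S^2 (statistics.variance divides by n - 1)
s_sq = ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / df
scale = sqrt(1 / n1 + 1 / n2)
centered = (mean(ys) - mean(xs)) - (mu2 - mu1)

z = centered / (d * scale)            # Z from Exercise 8
v = df * s_sq / d ** 2                # V from Exercise 9
t = centered / (sqrt(s_sq) * scale)   # T from Exercise 11

identity_holds = isclose(t, z / sqrt(v / df))
```

The unknown d cancels in the ratio, which is why T depends on the parameters only through µ_2 - µ_1.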

From Exercise 11, T is a pivotal variable for µ_2 - µ_1. For k > 0 and p in (0, 1) let t_{k, p} denote the quantile of order p for the t distribution with k degrees of freedom. For selected values of k and p, values of t_{k, p} can be obtained from the table of the Student t distribution or from the quantile applet.

Mathematical Exercise 12. Use the pivotal variable T to show that a 1 - r confidence interval, confidence upper bound, and confidence lower bound for µ_2 - µ_1 are given as follows:

  1. [(M_2 - M_1) - t_{n_1 + n_2 - 2, 1 - r/2} S (1 / n_1 + 1 / n_2)^{1/2},
    (M_2 - M_1) + t_{n_1 + n_2 - 2, 1 - r/2} S (1 / n_1 + 1 / n_2)^{1/2}].
  2. (M_2 - M_1) + t_{n_1 + n_2 - 2, 1 - r} S (1 / n_1 + 1 / n_2)^{1/2}.
  3. (M_2 - M_1) - t_{n_1 + n_2 - 2, 1 - r} S (1 / n_1 + 1 / n_2)^{1/2}.
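
A minimal sketch of the two-sided interval in part 1, assuming the t quantile has been looked up in a table or obtained from the quantile applet and is passed in by the caller. The function name pooled_t_interval and the sample statistics are hypothetical.

```python
from math import sqrt

def pooled_t_interval(m1, s1, n1, m2, s2, n2, t_quantile):
    """Two-sided confidence interval for mu_2 - mu_1 assuming d_1 = d_2.

    t_quantile is the quantile of order 1 - r/2 of the t distribution
    with n1 + n2 - 2 degrees of freedom; the caller supplies it.
    """
    df = n1 + n2 - 2
    s = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)   # pooled standard deviation
    half_width = t_quantile * s * sqrt(1 / n1 + 1 / n2)
    diff = m2 - m1
    return diff - half_width, diff + half_width

# Hypothetical example: n_1 = n_2 = 6, so 10 degrees of freedom.
# For a 90% interval the table gives t_{10, 0.95} = 1.812 (rounded).
lower, upper = pooled_t_interval(m1=20.0, s1=2.0, n1=6,
                                 m2=23.0, s2=2.0, n2=6, t_quantile=1.812)
```

Passing the quantile as an argument keeps the sketch free of external libraries; a statistics package that provides t quantiles could be substituted for the table lookup.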

Estimation in the Bivariate Normal Model

In this subsection, we consider a model that is superficially similar to the two-sample normal model, but is actually much simpler. Suppose that

(X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n)

is a random sample of size n from the distribution of a bivariate normal random vector (X, Y) with

E(X) = µ_1, E(Y) = µ_2, var(X) = d_1^2, var(Y) = d_2^2, cov(X, Y) = d_{1,2}.

Thus, instead of a pair of samples, we have a sample of pairs. This type of model frequently arises in before-and-after experiments, in which a measurement of interest is recorded for a sample of n objects from the population, both before and after a treatment. For example, we could record the blood pressure of a sample of n patients, before and after the administration of a certain drug. As with the two-sample normal model, the interest is usually in estimating the difference of the means.

We will denote the sample means and variances of X and Y, and the sample covariance, respectively, by

M_1, M_2, S_1^2, S_2^2, S_{1,2}.

Mathematical Exercise 13. Show that Y_1 - X_1, Y_2 - X_2, ..., Y_n - X_n is a random sample of size n from the normal distribution with mean µ_2 - µ_1 and variance d^2 = d_1^2 + d_2^2 - 2 d_{1,2}.

From Exercise 13, the differences fit the simple one-sample normal model.

Mathematical Exercise 14. Show that the sample mean and variance of the differences are

  1. M = M_2 - M_1.
  2. S^2 = S_1^2 + S_2^2 - 2 S_{1,2}.
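
Both identities can be checked numerically on a small set of paired data before proving them. The data below are hypothetical, and the sample covariance is computed directly from its definition with divisor n - 1.

```python
from math import isclose
from statistics import mean, variance

xs = [120.0, 135.0, 128.0, 140.0]   # hypothetical "before" measurements
ys = [115.0, 130.0, 129.0, 131.0]   # hypothetical "after" measurements
n = len(xs)

m1, m2 = mean(xs), mean(ys)
# Sample covariance S_{1,2} with divisor n - 1
s12 = sum((x - m1) * (y - m2) for x, y in zip(xs, ys)) / (n - 1)

diffs = [y - x for x, y in zip(xs, ys)]
mean_ok = isclose(mean(diffs), m2 - m1)
var_ok = isclose(variance(diffs), variance(xs) + variance(ys) - 2 * s12)
```

Both mean_ok and var_ok evaluate to True here, so summary statistics (means, variances, and covariance) are enough to carry out the paired analysis without the raw pairs.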

Mathematical Exercise 15. Show that if d is known, then a 1 - r confidence interval, confidence upper bound, and confidence lower bound for µ_2 - µ_1 are given as follows, where the quantiles are from the standard normal distribution:

  1. [M - z_{1 - r/2} d / n^{1/2}, M + z_{1 - r/2} d / n^{1/2}].
  2. M + z_{1 - r} d / n^{1/2}.
  3. M - z_{1 - r} d / n^{1/2}.

Mathematical Exercise 16. Show that if d is unknown, then a 1 - r confidence interval, confidence upper bound, and confidence lower bound for µ_2 - µ_1 are given as follows, where the quantiles are from the t distribution with n - 1 degrees of freedom:

  1. [M - t_{n - 1, 1 - r/2} S / n^{1/2}, M + t_{n - 1, 1 - r/2} S / n^{1/2}].
  2. M + t_{n - 1, 1 - r} S / n^{1/2}.
  3. M - t_{n - 1, 1 - r} S / n^{1/2}.
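
A minimal sketch of the two-sided interval in part 1 for paired data, assuming the t quantile is taken from a table or from the quantile applet and passed in by the caller. The function name paired_t_interval and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_interval(xs, ys, t_quantile):
    """Two-sided confidence interval for mu_2 - mu_1 from paired data.

    t_quantile is the quantile of order 1 - r/2 of the t distribution
    with n - 1 degrees of freedom; the caller supplies it.
    """
    diffs = [y - x for x, y in zip(xs, ys)]   # reduce to the one-sample model
    n = len(diffs)
    m = mean(diffs)
    half_width = t_quantile * stdev(diffs) / sqrt(n)
    return m - half_width, m + half_width

# Hypothetical paired data with n = 4, so 3 degrees of freedom.
# For a 90% interval the table gives t_{3, 0.95} = 2.353 (rounded).
before = [120.0, 135.0, 128.0, 140.0]
after = [115.0, 130.0, 129.0, 131.0]
lower, upper = paired_t_interval(before, after, t_quantile=2.353)
```

Only the differences enter the computation, so the covariance between the two measurements is accounted for automatically.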

Mathematical Exercise 17. Suppose that X = (X_1, X_2, ..., X_n) and Y = (Y_1, Y_2, ..., Y_n) are independent samples from normal distributions. These data fit both models: the two-sample normal model and the bivariate normal model. Which procedure would work better for estimating the difference of means µ_2 - µ_1?

Computational Exercises

Mathematical Exercise 18. A new drug is being developed to reduce a certain blood chemical. A sample of 36 patients is given a placebo while a sample of 49 patients is given the drug. The statistics (in mg) are m_1 = 87, s_1 = 4, m_2 = 63, s_2 = 6.

  1. Compute the 90% confidence interval for d_2 / d_1.
  2. Assuming that d_1 = d_2, compute the 90% confidence interval for µ_2 - µ_1.
  3. Based on part 1, is the assumption that d_1 = d_2 reasonable?
  4. Based on part 2, is the drug effective?

Mathematical Exercise 19. A company claims that an herbal supplement improves intelligence. A sample of 25 persons is given a standard IQ test before and after taking the supplement. The before and after statistics are m_1 = 105, s_1 = 13, m_2 = 110, s_2 = 17, s_{1,2} = 190. Find the 90% confidence interval for µ_2 - µ_1. Do you believe the company's claim?

Data Analysis Exercise 20. In Fisher's iris data, consider the petal length variable for the samples of Versicolor and Virginica irises.

  1. Compute the 90% confidence interval for d_2 / d_1.
  2. Assuming that d_1 = d_2, compute the 90% confidence interval for µ_2 - µ_1.
  3. Based on part 1, is the assumption that d_1 = d_2 reasonable?

Mathematical Exercise 21. A plant has two machines that produce a circular rod whose diameter (in cm) is critical. A sample of 100 rods from the first machine has mean 10.3 and standard deviation 1.2. A sample of 100 rods from the second machine has mean 9.8 and standard deviation 1.6.

  1. Compute the 90% confidence interval for d_2 / d_1.
  2. Assuming that d_1 = d_2, compute the 90% confidence interval for µ_2 - µ_1.
  3. Based on part 1, is the assumption that d_1 = d_2 reasonable?