3. Estimation of the Variance in the Normal Model

Preliminaries

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from the normal distribution with mean $\mu$ and variance $\sigma^2$. In this section we will construct confidence intervals for $\sigma^2$, one of the most important special cases of interval estimation. A parallel section on Tests for the Variance in the Normal Model is in the chapter on Hypothesis Testing.

As usual, we will construct the confidence intervals by finding pivotal variables for $\sigma^2$. The construction depends on whether the mean $\mu$ is known or unknown; thus $\mu$ is a nuisance parameter for the problem of estimating $\sigma^2$. Recall also that the normal family is a location-scale family.

Confidence Intervals for σ² when μ is Known

Suppose first that $\mu$ is known, although this is usually an artificial assumption in applications. Recall that in this case, the natural estimator of $\sigma^2$ is

$$W^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2.$$

Recall also that $V = n W^2 / \sigma^2$ has the chi-square distribution with $n$ degrees of freedom, and hence is a pivotal variable for $\sigma^2$. Now for $k > 0$ and $p \in (0, 1)$, let $v_{k,p}$ denote the quantile of order $p$ for the chi-square distribution with $k$ degrees of freedom. For selected values of $k$ and $p$, $v_{k,p}$ can be obtained from the table of the chi-square distribution or from the quantile applet.
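The quantiles $v_{k,p}$ can also be computed directly. The following is a minimal, dependency-free sketch: it evaluates the chi-square CDF through the power series of the regularized incomplete gamma function $P(k/2, x/2)$ and inverts it by bisection. The names `chi2_cdf` and `chi2_quantile` are our own, not from any library; with SciPy installed, `scipy.stats.chi2.ppf(p, k)` computes the same quantity.

```python
import math


def chi2_cdf(x: float, k: int) -> float:
    """CDF of the chi-square distribution with k degrees of freedom.

    Sums the power series of the regularized lower incomplete gamma
    function P(k/2, x/2); adequate for moderate x and k.
    """
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = total = 1.0 / a            # n = 0 term of the series
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= y / (a + n)           # ratio of consecutive series terms
        total += term
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_quantile(k: int, p: float) -> float:
    """Quantile v_{k,p} of order p, found by bisection on the increasing CDF."""
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:        # widen the bracket until it contains v_{k,p}
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for k, p in [(1, 0.95), (10, 0.05), (10, 0.95)]:
        print(f"k={k}, p={p}: v = {chi2_quantile(k, p):.4f}")
```

For example, `chi2_quantile(1, 0.95)` returns approximately 3.841, the familiar tabled value.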

1. Use the pivotal variable $V$ to show that a $1 - \alpha$ confidence interval, confidence upper bound, and confidence lower bound for $\sigma^2$ are given as follows:

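Confidence interval: $\left[\dfrac{n W^2}{v_{n, 1 - \alpha/2}}, \dfrac{n W^2}{v_{n, \alpha/2}}\right]$
Confidence upper bound: $\dfrac{n W^2}{v_{n, \alpha}}$
Confidence lower bound: $\dfrac{n W^2}{v_{n, 1 - \alpha}}$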

Note that we have used the equal-tail choice in the construction of the two-sided interval, but since the chi-square distribution is skewed, the interval is not symmetric about the sample variance $W^2$ (unlike the confidence intervals for $\mu$, which are always symmetric about the sample mean $M$).
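The sketch below turns Exercise 1 into a function for the known-mean case. It carries its own copy of the bisection-based `chi2_quantile` helper (our own name, dependency-free). The sample is hypothetical and serves only as an illustration; the last line prints how far each endpoint lies from $W^2$, which makes the asymmetry just described visible.

```python
import math


def chi2_cdf(x, k):
    # Chi-square CDF via the series for the regularized incomplete gamma P(k/2, x/2).
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_quantile(k, p):
    # Quantile v_{k,p}: bracket it, then bisect the increasing CDF.
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_ci_known_mean(xs, mu, alpha):
    """W^2, the equal-tail 1 - alpha interval, and one-sided bounds for sigma^2."""
    n = len(xs)
    w2 = sum((x - mu) ** 2 for x in xs) / n           # W^2 uses the known mean
    interval = (n * w2 / chi2_quantile(n, 1 - alpha / 2),
                n * w2 / chi2_quantile(n, alpha / 2))
    upper = n * w2 / chi2_quantile(n, alpha)          # sigma^2 <= upper
    lower = n * w2 / chi2_quantile(n, 1 - alpha)      # sigma^2 >= lower
    return w2, interval, upper, lower


# Hypothetical data, for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.3, 9.9, 10.2, 10.0]
w2, (a, b), ub, lb = variance_ci_known_mean(sample, mu=10.0, alpha=0.10)
print(f"W^2 = {w2:.4f}, 90% interval = [{a:.4f}, {b:.4f}]")
print(f"upper bound = {ub:.4f}, lower bound = {lb:.4f}")
print(f"distance below W^2: {w2 - a:.4f}, above W^2: {b - w2:.4f}")
```

Because $v_{n, \alpha/2} < v_{n, \alpha} < v_{n, 1-\alpha} < v_{n, 1-\alpha/2}$ for $\alpha < 1/2$, each one-sided bound lies strictly inside the two-sided interval.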

Confidence Intervals for σ² when μ is Unknown

Consider now the more realistic case in which $\mu$, as well as $\sigma^2$, is unknown. In this case, the sample variance is

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (X_i - M)^2,$$

where $M = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the sample mean. Recall that

$$V = \frac{(n - 1) S^2}{\sigma^2}$$

has the chi-square distribution with $n - 1$ degrees of freedom, and hence is a pivotal variable for $\sigma^2$.
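A quick Monte Carlo check makes the pivotal property concrete. The sketch below simulates $V = (n-1)S^2/\sigma^2$ from normal samples for two different hypothetical choices of $(\mu, \sigma)$ and compares the empirical mean and variance with $n - 1$ and $2(n - 1)$, the mean and variance of the chi-square distribution with $n - 1$ degrees of freedom. It checks only the first two moments, not the full distribution; the seed and parameter values are arbitrary assumptions.

```python
import random
import statistics


def simulate_pivot(mu, sigma, n, reps, rng):
    """Return reps simulated values of V = (n - 1) S^2 / sigma^2."""
    values = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n                                  # sample mean M
        s2 = sum((x - m) ** 2 for x in xs) / (n - 1)     # sample variance S^2
        values.append((n - 1) * s2 / sigma ** 2)
    return values


rng = random.Random(2024)   # fixed seed for reproducibility
n, reps = 10, 20000
results = {}
for mu, sigma in [(0.0, 1.0), (50.0, 7.0)]:
    v = simulate_pivot(mu, sigma, n, reps, rng)
    results[(mu, sigma)] = (statistics.fmean(v), statistics.variance(v))
    mean_v, var_v = results[(mu, sigma)]
    # Chi-square with n - 1 df has mean n - 1 and variance 2(n - 1).
    print(f"mu={mu}, sigma={sigma}: mean(V)={mean_v:.3f} (theory {n - 1}), "
          f"var(V)={var_v:.3f} (theory {2 * (n - 1)})")
```

Both parameter settings should give nearly the same summary, reflecting that the distribution of $V$ depends on neither $\mu$ nor $\sigma$.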

2. Use the pivotal variable $V$ to show that a $1 - \alpha$ confidence interval, confidence upper bound, and confidence lower bound for $\sigma^2$ are given as follows:

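Confidence interval: $\left[\dfrac{(n - 1) S^2}{v_{n - 1, 1 - \alpha/2}}, \dfrac{(n - 1) S^2}{v_{n - 1, \alpha/2}}\right]$
Confidence upper bound: $\dfrac{(n - 1) S^2}{v_{n - 1, \alpha}}$
Confidence lower bound: $\dfrac{(n - 1) S^2}{v_{n - 1, 1 - \alpha}}$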
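The unknown-mean formulas of Exercise 2 differ from the known-mean ones only in using $S^2$ and $n - 1$ degrees of freedom. The following is a minimal sketch under the same assumptions as before: a self-contained bisection `chi2_quantile` helper of our own and a hypothetical illustrative sample.

```python
import math


def chi2_cdf(x, k):
    # Chi-square CDF via the series for the regularized incomplete gamma P(k/2, x/2).
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_quantile(k, p):
    # Quantile v_{k,p}: bracket it, then bisect the increasing CDF.
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def variance_ci_unknown_mean(xs, alpha):
    """S^2, the equal-tail 1 - alpha interval, and one-sided bounds for sigma^2."""
    n = len(xs)
    k = n - 1                                      # degrees of freedom
    m = sum(xs) / n                                # sample mean M
    s2 = sum((x - m) ** 2 for x in xs) / k         # sample variance S^2
    interval = (k * s2 / chi2_quantile(k, 1 - alpha / 2),
                k * s2 / chi2_quantile(k, alpha / 2))
    upper = k * s2 / chi2_quantile(k, alpha)
    lower = k * s2 / chi2_quantile(k, 1 - alpha)
    return s2, interval, upper, lower


# Hypothetical data, for illustration only.
sample = [9.8, 10.4, 10.1, 9.6, 10.3, 9.9, 10.2, 10.0]
s2, (a, b), ub, lb = variance_ci_unknown_mean(sample, alpha=0.10)
print(f"S^2 = {s2:.5f}, 90% interval = [{a:.4f}, {b:.4f}]")
print(f"upper bound = {ub:.4f}, lower bound = {lb:.4f}")
```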

3. Use the variance estimation experiment to explore the procedure. Select the normal distribution. Use various parameter values, confidence levels, sample sizes, and interval types. For each configuration, run the experiment 1000 times with an update frequency of 10. As the simulation runs, note that the confidence interval successfully captures the standard deviation if and only if the value of the pivotal variable is between the quantiles. Note the size and location of the confidence intervals and note how well the proportion of successful intervals approximates the theoretical confidence level.
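Outside the applet, the same experiment can be mimicked in a few lines. The following is a minimal sketch of the two-sided, unknown-mean case. It repeats the sampling 1000 times, records whether the interval for $\sigma^2$ (equivalently, its square root for $\sigma$) contains the true value, and separately records whether the pivot lies between the quantiles; the two counts agree, as the exercise asserts. The parameter values and seed are arbitrary assumptions, and `chi2_quantile` is our own helper.

```python
import math
import random


def chi2_cdf(x, k):
    # Chi-square CDF via the series for the regularized incomplete gamma P(k/2, x/2).
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_quantile(k, p):
    # Quantile v_{k,p}: bracket it, then bisect the increasing CDF.
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_coverage(mu, sigma, n, alpha, reps, rng):
    k = n - 1
    v_lo, v_hi = chi2_quantile(k, alpha / 2), chi2_quantile(k, 1 - alpha / 2)
    captured = pivot_inside = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / k
        lo, hi = k * s2 / v_hi, k * s2 / v_lo       # two-sided interval for sigma^2
        # sqrt is increasing, so this is also the event sqrt(lo) <= sigma <= sqrt(hi)
        captured += lo <= sigma ** 2 <= hi
        pivot_inside += v_lo <= k * s2 / sigma ** 2 <= v_hi
    return captured / reps, pivot_inside / reps


rng = random.Random(12345)
cov, piv = simulate_coverage(mu=5.0, sigma=2.0, n=20, alpha=0.10, reps=1000, rng=rng)
print(f"proportion of intervals capturing sigma: {cov:.3f} (nominal 0.90)")
print(f"proportion with pivot between quantiles: {piv:.3f}")
```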

Non-Normal Distributions

One of the key assumptions that we made was that the underlying distribution is normal. Of course, in real statistical problems, we are unlikely to know much about the underlying distribution, let alone whether or not it is normal. Even when the underlying distribution is not normal, the procedures of this section are still used to construct approximate confidence intervals for the variance. You will see in the simulation exercises below that this procedure is not nearly as robust as that of constructing interval estimates for the mean. Nonetheless, if the distribution is not too far from normal, the procedure usually works well.

4. In the variance estimation experiment, select the gamma distribution. Use various parameter values, confidence levels, sample sizes, and interval types. For each configuration, run the experiment 1000 times with an update frequency of 10. Note the size and location of the confidence intervals and note how well the proportion of successful intervals approximates the theoretical confidence level.

5. In the variance estimation experiment, select the uniform distribution. Use various parameter values, confidence levels, sample sizes, and interval types. For each configuration, run the experiment 1000 times with an update frequency of 10. Note the size and location of the confidence intervals and note how well the proportion of successful intervals approximates the theoretical confidence level.
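The lack of robustness can be seen in a small simulation that applies the normal-theory interval of Exercise 2 to non-normal data. The sketch below estimates the actual coverage of the nominal 90% two-sided interval for samples of size 30 from a normal, a gamma with shape 1 and scale 1 (the exponential distribution, variance 1), and a uniform distribution on $[0, 1]$ (variance $1/12$). These particular distributions, the sample size, and the seed are assumptions chosen for illustration. In runs of this kind the gamma case tends to fall well below the nominal level and the uniform case above it, which is consistent with the gamma's heavier tail and the uniform's lighter tails.

```python
import math
import random


def chi2_cdf(x, k):
    # Chi-square CDF via the series for the regularized incomplete gamma P(k/2, x/2).
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_quantile(k, p):
    # Quantile v_{k,p}: bracket it, then bisect the increasing CDF.
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def coverage(draw, sigma2, n, alpha, reps, rng):
    """Fraction of normal-theory 1 - alpha intervals for sigma^2 containing sigma2."""
    k = n - 1
    v_lo, v_hi = chi2_quantile(k, alpha / 2), chi2_quantile(k, 1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / k
        hits += k * s2 / v_hi <= sigma2 <= k * s2 / v_lo
    return hits / reps


rng = random.Random(7)
n, alpha, reps = 30, 0.10, 2000
results = {
    # name: coverage(sampler, true variance, ...)
    "normal(0, 1)": coverage(lambda r: r.gauss(0.0, 1.0), 1.0, n, alpha, reps, rng),
    "gamma(shape 1, scale 1)": coverage(lambda r: r.gammavariate(1.0, 1.0), 1.0,
                                        n, alpha, reps, rng),
    "uniform(0, 1)": coverage(lambda r: r.uniform(0.0, 1.0), 1.0 / 12.0,
                              n, alpha, reps, rng),
}
for name, cov in results.items():
    print(f"{name:>24}: coverage {cov:.3f} (nominal {1 - alpha:.2f})")
```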

Computational Exercises

6. For both procedures, show that a $1 - \alpha$ confidence interval, lower bound, and upper bound for $\sigma$ can be obtained by taking the square roots of the corresponding confidence bounds for $\sigma^2$.
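One possible outline of the key step: $\sigma \ge 0$, the bounds $L \le U$ for $\sigma^2$ from either procedure are nonnegative, and $x \mapsto \sqrt{x}$ is strictly increasing on $[0, \infty)$. Each event about $\sigma^2$ is therefore the same event about $\sigma$, so the confidence level carries over unchanged.

```latex
\begin{aligned}
\{L \le \sigma^2 \le U\} &= \{\sqrt{L} \le \sigma \le \sqrt{U}\}, \\
\{\sigma^2 \le U\} &= \{\sigma \le \sqrt{U}\}, \\
\{\sigma^2 \ge L\} &= \{\sigma \ge \sqrt{L}\},
\end{aligned}
\qquad\text{so}\qquad
\mathbb{P}\left(\sqrt{L} \le \sigma \le \sqrt{U}\right) = \mathbb{P}\left(L \le \sigma^2 \le U\right) = 1 - \alpha .
```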

7. Suppose that the weight of a bag of potato chips (in grams) is a random variable with unknown mean $\mu$ and variance $\sigma^2$. A sample of 75 bags has mean 250 and standard deviation 10. Construct the 90% confidence interval for $\sigma$.
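Exercise 7 combines Exercises 2 and 6: take the square roots of the equal-tail interval for $\sigma^2$ with $n - 1 = 74$ degrees of freedom. The following is a minimal sketch that treats the reported standard deviation 10 as the sample standard deviation $S$ (an assumption about the exercise's wording). It uses our own dependency-free `chi2_quantile` helper; `scipy.stats.chi2.ppf` would serve equally well.

```python
import math


def chi2_cdf(x, k):
    # Chi-square CDF via the series for the regularized incomplete gamma P(k/2, x/2).
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_quantile(k, p):
    # Quantile v_{k,p}: bracket it, then bisect the increasing CDF.
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sd_interval_unknown_mean(n, s, alpha):
    """Equal-tail 1 - alpha interval for sigma from summary statistics n and S."""
    k = n - 1
    ss = k * s ** 2                                   # (n - 1) S^2
    lo = math.sqrt(ss / chi2_quantile(k, 1 - alpha / 2))
    hi = math.sqrt(ss / chi2_quantile(k, alpha / 2))
    return lo, hi


lo, hi = sd_interval_unknown_mean(n=75, s=10.0, alpha=0.10)
print(f"90% confidence interval for sigma: [{lo:.2f}, {hi:.2f}] grams")
```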

8. At a telemarketing firm, the length of a telephone solicitation (in seconds) is a random variable with unknown mean $\mu$ and variance $\sigma^2$. A sample of 50 calls has mean length 300 and standard deviation 30. Construct the 95% confidence upper bound for $\sigma$.
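For Exercise 8, only the one-sided upper bound is needed: $\sqrt{(n - 1) S^2 / v_{n-1, \alpha}}$ with $n - 1 = 49$ and $\alpha = 0.05$. As before, this sketch assumes the reported standard deviation 30 is the sample standard deviation $S$, and it uses our own `chi2_quantile` helper.

```python
import math


def chi2_cdf(x, k):
    # Chi-square CDF via the series for the regularized incomplete gamma P(k/2, x/2).
    if x <= 0.0:
        return 0.0
    a, y = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= y / (a + n)
        total += term
    return math.exp(a * math.log(y) - y - math.lgamma(a)) * total


def chi2_quantile(k, p):
    # Quantile v_{k,p}: bracket it, then bisect the increasing CDF.
    lo, hi = 0.0, max(1.0, float(k))
    while chi2_cdf(hi, k) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sd_upper_bound_unknown_mean(n, s, alpha):
    """1 - alpha confidence upper bound for sigma: sqrt((n - 1) S^2 / v_{n-1, alpha})."""
    k = n - 1
    return math.sqrt(k * s ** 2 / chi2_quantile(k, alpha))


ub = sd_upper_bound_unknown_mean(n=50, s=30.0, alpha=0.05)
print(f"95% confidence upper bound for sigma: {ub:.2f} seconds")
```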

9. Using Michelson's data, construct the 95% two-sided confidence interval, the confidence upper bound, and the confidence lower bound for the standard deviation of the speed of light in air. Assume that the "true value" is the known mean.

10. Using Cavendish's data, construct the 95% confidence interval, confidence upper bound, and confidence lower bound for the standard deviation of the density of the earth. Assume that the "true value" is the known mean.

11. Using Short's data, construct the 95% two-sided confidence interval, the confidence upper bound, and the confidence lower bound for the standard deviation of the parallax of the sun. Assume that the "true value" is the known mean.

12. For the length of a Setosa iris petal in Fisher's iris data, construct the 90% confidence interval for $\sigma$.