Virtual Laboratories > Interval Estimation
Suppose that X1, X2, ..., Xn is a random sample from the normal distribution with mean µ and variance σ². In this section we will construct confidence intervals for µ, one of the most important special cases of interval estimation. A parallel section on Tests for the Mean in the Normal Model is in the chapter on Hypothesis Testing.
As usual, we will construct the confidence intervals by finding pivotal variables for µ. The construction depends on whether the standard deviation σ is known or unknown; thus σ is a nuisance parameter for the problem of estimating µ. The key elements in the construction of the intervals are the sample mean and sample variance

M = (1/n) Σ_{i=1}^n X_i,   S² = (1/(n-1)) Σ_{i=1}^n (X_i - M)²

and the special properties of these statistics when the sampling distribution is normal. Recall also that the normal family is a location-scale family.
Suppose first that σ is known; this is usually, but not always, an artificial assumption. Recall that the standard score
Z = (M - µ) / (σ / n^{1/2})
has the standard normal distribution, and hence is a pivotal variable for µ. Now for p in (0, 1), let z_p denote the quantile of order p for the standard normal distribution. For selected values of p, z_p can be obtained from the last row of the table of the t distribution, from the table of the standard normal distribution, or from the quantile applet.
1. Use the pivotal variable Z to show that a 1 - α confidence interval, confidence upper bound, and confidence lower bound for µ are given as follows:

a. [M - z_{1-α/2} σ / n^{1/2}, M + z_{1-α/2} σ / n^{1/2}]
b. M + z_{1-α} σ / n^{1/2}
c. M - z_{1-α} σ / n^{1/2}
Note that we have used the equal-tail choice in the construction of the two-sided interval, and hence this interval is symmetric with respect to the sample mean M.
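As a concrete sketch of the two-sided interval in Exercise 1 (Python, standard library only; the data values here are illustrative, not from the text):

```python
from statistics import NormalDist, mean

def z_interval(data, sigma, conf=0.95):
    """Two-sided 1 - alpha confidence interval for mu when sigma is known:
    M -/+ z_{1-alpha/2} * sigma / n^{1/2}."""
    n = len(data)
    m = mean(data)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{1 - alpha/2}
    half = z * sigma / n ** 0.5
    return m - half, m + half

# Illustrative sample with known sigma = 0.3:
lo, hi = z_interval([9.8, 10.1, 10.3, 9.9], 0.3)
```

By construction the interval is centered at the sample mean M, reflecting the equal-tail choice noted above.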
2. Use the mean estimation experiment to explore the procedure. Select the normal distribution and select the normal pivot. Use various parameter values, confidence levels, sample sizes, and interval types. For each configuration, run the experiment 1000 times with an update frequency of 10. As the simulation runs, note that the confidence interval successfully captures the mean if and only if the value of the pivot variable is between the quantiles. Note the size and location of the confidence intervals and note how well the proportion of successful intervals approximates the theoretical confidence level.
Let E denote the distance between the sample mean M and one of the confidence bounds:

E = z σ / n^{1/2},

where z = z_{1-α/2} for the two-sided interval and z = z_{1-α} for the confidence lower bound or upper bound. Note that E is deterministic, and the length of the two-sided interval is 2E. The number E is sometimes called the margin of error.
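The margin of error formula can be sketched directly (Python, standard library only; the numeric arguments in the usage line are illustrative):

```python
from statistics import NormalDist

def margin_of_error(sigma, n, conf=0.95, two_sided=True):
    """E = z * sigma / n^{1/2}, with z = z_{1-alpha/2} for a two-sided
    interval or z = z_{1-alpha} for a one-sided bound."""
    alpha = 1 - conf
    p = 1 - alpha / 2 if two_sided else 1 - alpha
    return NormalDist().inv_cdf(p) * sigma / n ** 0.5

E = margin_of_error(0.3, 100)  # two-sided 95%: about 1.96 * 0.3 / 10
```

Note that the one-sided margin of error is smaller than the two-sided one at the same confidence level, since z_{1-α} < z_{1-α/2}.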
3. Show that

a. E increases as σ increases.
b. E decreases as n increases.
c. E increases as the confidence level 1 - α increases.
Exercise 3(c) shows again that there is a tradeoff between the confidence level and the size of the confidence interval. If n and d are fixed, we can decrease E, and hence tighten our estimate, only at the expense of decreasing our confidence in the estimate. Conversely, we can increase our confidence in the estimate only at the expense of enlarging E. In many cases, the first step in the design of the experiment is to determine the sample size needed to estimate µ with a given margin of error and a given confidence level.
4. Show that the sample size needed to estimate µ with confidence 1 - α and margin of error E is

n = ceil[(z σ / E)²].
Note that n varies directly with z² and with σ², and inversely with E². This last fact implies a law of diminishing returns in reducing the margin of error. For example, if we want to reduce a given margin of error by a factor of 1/2, we must increase the sample size by a factor of 4.
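The sample size formula, and the factor-of-4 effect of halving the margin of error, can be checked with a short sketch (Python, standard library only; the σ and E values are illustrative):

```python
import math
from statistics import NormalDist

def sample_size(sigma, E, conf=0.95, two_sided=True):
    """Smallest n with margin of error at most E: n = ceil[(z * sigma / E)^2]."""
    alpha = 1 - conf
    p = 1 - alpha / 2 if two_sided else 1 - alpha
    z = NormalDist().inv_cdf(p)
    return math.ceil((z * sigma / E) ** 2)

n1 = sample_size(1.0, 0.5)    # sigma = 1, E = 0.5
n2 = sample_size(1.0, 0.25)   # halving E roughly quadruples n
```

Here n2 is close to 4 * n1 (not exactly, because of the ceiling), illustrating the law of diminishing returns.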
Consider now the more realistic case in which σ is also unknown. Recall that
T = (M - µ) / (S / n^{1/2})
has the Student t distribution with n - 1 degrees of freedom, and hence is a pivotal variable for µ. Now for k > 0 and p in (0, 1), let t_{k,p} denote the quantile of order p for the t distribution with k degrees of freedom. For selected values of k and p, t_{k,p} can be obtained from the table of the t distribution or from the quantile applet.
5. Use the pivotal variable T to show that a 1 - α confidence interval, confidence upper bound, and confidence lower bound for µ are given as follows:

a. [M - t_{n-1, 1-α/2} S / n^{1/2}, M + t_{n-1, 1-α/2} S / n^{1/2}]
b. M + t_{n-1, 1-α} S / n^{1/2}
c. M - t_{n-1, 1-α} S / n^{1/2}
Note that we have used the equal-tail choice in the construction of the two-sided interval, and hence this interval is symmetric with respect to the sample mean. Note also that both the center and the length of the confidence interval are now random.
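A sketch of the two-sided t interval (Python, standard library only; since the standard library has no t quantiles, the quantile t_{n-1, 1-α/2} is passed in as a table value, and the data below are illustrative):

```python
from statistics import mean, stdev

def t_interval(data, t_quantile):
    """Two-sided CI for mu with sigma unknown:
    M -/+ t_{n-1, 1-alpha/2} * S / n^{1/2}.
    t_quantile must be t_{n-1, 1-alpha/2}, e.g. read from a t table."""
    n = len(data)
    m, s = mean(data), stdev(data)   # stdev uses the n - 1 denominator
    half = t_quantile * s / n ** 0.5
    return m - half, m + half

# n = 4, so df = 3; t_{3, 0.975} = 3.182 from a t table (95% interval):
lo, hi = t_interval([10, 12, 11, 13], 3.182)
```

Unlike the known-σ case, rerunning this on a new sample changes both the center and the length of the interval, since S is random.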
6. Use the mean estimation experiment to explore the procedure. Select the normal distribution and select the student pivot. Use various parameter values, confidence levels, sample sizes, and interval types. For each configuration, run the experiment 1000 times with an update frequency of 10. As the simulation runs, note that the confidence interval successfully captures the mean if and only if the value of the pivot variable is between the quantiles. Note the size and location of the confidence intervals and note how well the proportion of successful intervals approximates the theoretical confidence level.
One of the key assumptions that we made was that the underlying distribution is normal. Of course, in real statistical problems, we are unlikely to know much about the underlying distribution, let alone whether or not it is normal. Suppose in fact that the underlying distribution is not normal. When n is relatively large, the distribution of the sample mean will still be approximately normal by the central limit theorem, and thus our derivation should still be approximately valid. The following exercises allow you to explore the robustness of the procedure.
7. Use the simulation of the mean estimation experiment to explore the procedure. Select the gamma distribution and select the student pivot. Use various parameter values, confidence levels, sample sizes, and interval types. For each configuration, run the experiment 1000 times with an update frequency of 10. Note the size and location of the confidence intervals and note how well the proportion of successful intervals approximates the theoretical confidence level.
8. In the mean estimation experiment, repeat the previous exercise with the uniform distribution.
How large n needs to be for the procedure to work well depends, of course, on the underlying distribution; the more this distribution deviates from normality, the larger n must be. Fortunately, convergence to normality in the central limit theorem is rapid, and hence, as you observed in the exercises, we can get away with relatively small sample sizes (30 or more) in most cases.
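The robustness claim can be checked directly with a small Monte Carlo sketch (Python, standard library only; the gamma shape and scale values are illustrative assumptions, and t_{29, 0.975} = 2.045 is an approximate table value):

```python
import random
from statistics import mean, stdev

# Estimate the actual coverage of the nominal 95% t interval when the
# data are gamma (skewed, non-normal) with n = 30.
random.seed(12345)
n, reps, t975 = 30, 2000, 2.045          # t_{29, 0.975} from a t table
true_mean = 2.0 * 1.0                     # gamma(shape, scale) has mean shape*scale

hits = 0
for _ in range(reps):
    x = [random.gammavariate(2.0, 1.0) for _ in range(n)]
    m, s = mean(x), stdev(x)
    half = t975 * s / n ** 0.5
    hits += (m - half <= true_mean <= m + half)

coverage = hits / reps                    # should be close to 0.95
```

With a seeded run like this, the empirical coverage typically lands a little below the nominal 95%, consistent with the mild skewness of the gamma distribution at n = 30.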
9. The length of a certain machined part is supposed to be 10 centimeters, but due to imperfections in the manufacturing process, the actual length is normally distributed with mean µ and variance σ². The variance is due to inherent factors in the process, which remain fairly stable over time. From historical data, it is known that σ = 0.3. On the other hand, µ may be set by adjusting various parameters in the process and hence may change to an unknown value fairly frequently. A sample of 100 parts has mean 10.2. Construct the 95% confidence interval for µ.
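A quick numerical check of this exercise (a Python sketch using only the standard library; since σ is known, the normal pivot applies):

```python
from statistics import NormalDist

# Exercise 9: sigma = 0.3 known, n = 100, sample mean 10.2, 95% confidence.
z = NormalDist().inv_cdf(0.975)        # z_{0.975}, about 1.96
half = z * 0.3 / 100 ** 0.5            # margin of error E
lo, hi = 10.2 - half, 10.2 + half      # approximately (10.141, 10.259)
```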
10. Suppose that the weight of a bag of potato chips (in grams) is a random variable with mean µ and variance σ², both unknown. A sample of 75 bags has mean 250 and standard deviation 10. Construct the 90% confidence interval for µ.
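A numerical check of this exercise (a Python sketch; since σ is unknown, the t pivot applies, and the quantile is an approximate table value):

```python
# Exercise 10: n = 75, sample mean 250, sample sd 10, 90% two-sided interval.
# t_{74, 0.95} is about 1.666 from a t table (the standard library has
# no t quantiles).
t = 1.666
half = t * 10 / 75 ** 0.5
lo, hi = 250 - half, 250 + half        # roughly (248.1, 251.9)
```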
11. At a telemarketing firm, the length of a telephone solicitation (in seconds) is a random variable with mean µ and variance σ², both unknown. A sample of 50 calls has mean length 300 and standard deviation 30. Construct the 95% confidence upper bound for µ.
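A numerical check of this exercise (a Python sketch; the one-sided bound uses t_{n-1, 1-α}, taken here as an approximate table value):

```python
# Exercise 11: n = 50, sample mean 300, sample sd 30, 95% upper bound.
# t_{49, 0.95} is about 1.677 from a t table.
t = 1.677
upper = 300 + t * 30 / 50 ** 0.5       # roughly 307.1
```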
12. At a certain farm, the weight of a peach (in ounces) at harvest time is a random variable with standard deviation 0.5. How many peaches must be sampled to estimate the mean weight with a margin of error of ±0.2 and with 95% confidence?
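A numerical check of this exercise, using the sample size formula from Exercise 4 (a Python sketch using only the standard library):

```python
import math
from statistics import NormalDist

# Exercise 12: sigma = 0.5, margin of error E = 0.2, 95% confidence (two-sided).
z = NormalDist().inv_cdf(0.975)        # z_{0.975}, about 1.96
n = math.ceil((z * 0.5 / 0.2) ** 2)    # ceil(24.01) = 25 peaches
```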
13. The hourly salary for a certain type of construction work is a random variable with standard deviation 1.25. How many workers must be sampled to construct a 95% confidence lower bound with margin of error 0.25?
14. Construct the 95% two-sided confidence interval, the confidence upper bound, and the confidence lower bound for the speed of light in air, using Michelson's data. In each case, note whether the "true" value is in the confidence interval.
15. Construct the 95% confidence interval, confidence upper bound, and confidence lower bound for the density of the earth, using Cavendish's data. In each case, note whether the "true" value is in the confidence interval.
16. Construct the 95% two-sided confidence interval, the confidence upper bound, and the confidence lower bound for the parallax of the sun, using Short's data. In each case, note whether the "true" value is in the confidence interval.
17. For the length of a Setosa iris petal in Fisher's iris data, construct the 90% confidence interval for µ.