4. Estimation in the Bernoulli Model

Preliminaries

Suppose that I₁, I₂, ..., I_n is a random sample from the Bernoulli distribution with unknown parameter p in (0, 1). Thus, these are independent random variables taking the values 1 and 0 with probabilities p and 1 - p respectively. Usually, this model arises in one of the following contexts:

There is an event of interest in a basic experiment, with unknown probability p. We replicate the experiment n times and define I_i = 1 if and only if the event occurred on the i'th run.
We have a population of objects of several different types; p is the unknown proportion of objects of a particular type of interest. We select n objects at random from the population and let I_i = 1 if and only if the i'th object is of the type of interest. When the sampling is with replacement, these variables really do form a random sample from the Bernoulli distribution. When the sampling is without replacement, the variables are dependent, but the Bernoulli model may still be approximately valid. For more on these points, see the Ball and Urn Experiment.

In this section, we will construct confidence intervals for p. A parallel section on Tests in the Bernoulli Model is in the chapter on Hypothesis Testing.

Confidence Intervals for p

Recall that the mean and variance of the Bernoulli distribution are

E(I) = p, var(I) = p(1 - p).

Note that the sample mean M = (I₁ + I₂ + ... + I_n) / n is the sample proportion of objects of the type of interest. By the central limit theorem,

Z = (M - p) / [M(1 - M) / n]^{1/2}

has approximately a standard normal distribution and hence is (approximately) a pivotal variable for p.

Mathematical Exercise 1. Use the pivotal variable Z to show that an approximate 1 - r level confidence interval, confidence upper bound, and confidence lower bound for p are given as follows:

[M - z_{1 - r/2} [M(1 - M) / n]^{1/2}, M + z_{1 - r/2} [M(1 - M) / n]^{1/2}].
M + z_{1 - r} [M(1 - M) / n]^{1/2}.
M - z_{1 - r} [M(1 - M) / n]^{1/2}.

The distribution of Z is closest to normal when p is near 1/2 and farthest from normal when p is near the extreme values 0 or 1.
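The interval and bounds in Exercise 1 can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function name wald_bounds and its kind argument are hypothetical, and the normal quantile z_q is taken from the standard library's NormalDist().inv_cdf.

```python
from math import sqrt
from statistics import NormalDist


def wald_bounds(m, n, r, kind="two-sided"):
    """Approximate 1 - r confidence set for p.

    m is the sample proportion M, n is the sample size, and kind is one of
    "two-sided", "upper", or "lower" (hypothetical labels for this sketch).
    """
    # Estimated standard error [M(1 - M) / n]^{1/2}
    se = sqrt(m * (1 - m) / n)
    if kind == "two-sided":
        z = NormalDist().inv_cdf(1 - r / 2)  # z_{1 - r/2}
        return (m - z * se, m + z * se)
    z = NormalDist().inv_cdf(1 - r)  # z_{1 - r}
    if kind == "upper":
        return m + z * se
    if kind == "lower":
        return m - z * se
    raise ValueError("kind must be 'two-sided', 'upper', or 'lower'")


# Illustrative values only (not taken from the exercises):
print(wald_bounds(0.5, 100, 0.05))
print(wald_bounds(0.5, 100, 0.05, kind="upper"))
```

The sketch does not clip the endpoints to [0, 1], so for M near 0 or 1 an endpoint can fall outside the parameter space.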

2. Use the simulation of the proportion estimation experiment to explore the procedure. Use various values of p and various confidence levels, sample sizes, and interval types. For each configuration, run the experiment 1000 times with an update frequency of 10 and note how well the proportion of successful intervals approximates the theoretical confidence level.
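For readers without access to the simulation, a rough stand-in can be written in Python. The sketch below is an assumption-laden substitute, not the applet itself: the function name coverage, the seed argument, and the restriction to two-sided intervals are choices made here for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(p, n, r, runs=1000, seed=0):
    """Fraction of simulated two-sided 1 - r intervals that contain p."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    z = NormalDist().inv_cdf(1 - r / 2)
    hits = 0
    for _ in range(runs):
        # Sample proportion M from n simulated Bernoulli(p) trials
        m = sum(rng.random() < p for _ in range(n)) / n
        half = z * sqrt(m * (1 - m) / n)
        if m - half <= p <= m + half:
            hits += 1
    return hits / runs


# Compare the empirical proportion with the nominal level 1 - r = 0.95
print(coverage(0.5, 100, 0.05))
print(coverage(0.05, 100, 0.05))
```

Running it for values of p near 1/2 and near 0 or 1 gives a sense of how the quality of the normal approximation affects the proportion of successful intervals.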

Mathematical Exercise 3. Show that the variance of the Bernoulli distribution is maximized when p = 1/2 and thus the maximum variance is 1/4.
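One possible route to this result, offered as a sketch rather than the intended solution, is to complete the square in the variance p(1 - p):

```latex
\[
p(1 - p) = \frac{1}{4} - \left(p - \frac{1}{2}\right)^2 \le \frac{1}{4},
\qquad \text{with equality if and only if } p = \frac{1}{2}.
\]
```

Setting the derivative of p(1 - p) to zero leads to the same conclusion.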

Mathematical Exercise 4. Use the result of the previous exercise to show that a conservative 1 - r level two-sided confidence interval, confidence upper bound, and confidence lower bound for p are given as follows:

[M - z_{1 - r/2} / (2n^{1/2}), M + z_{1 - r/2} / (2n^{1/2})].
M + z_{1 - r} / (2n^{1/2}).
M - z_{1 - r} / (2n^{1/2}).

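Thus, the conservative confidence intervals are at least as long as the confidence intervals from the first procedure. Because the length of a conservative interval does not depend on M, the conservative estimate can be used to design the experiment, that is, to choose the sample size before any data are collected.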
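The comparison between the two procedures can be checked numerically. The following Python sketch uses hypothetical helper names (half_widths, conservative_interval) and covers only the two-sided case.

```python
from math import sqrt
from statistics import NormalDist


def half_widths(m, n, r):
    """Return (estimated, conservative) half-widths of the two-sided 1 - r intervals."""
    z = NormalDist().inv_cdf(1 - r / 2)       # z_{1 - r/2}
    estimated = z * sqrt(m * (1 - m) / n)     # uses M(1 - M) in place of p(1 - p)
    conservative = z / (2 * sqrt(n))          # uses the bound p(1 - p) <= 1/4
    return estimated, conservative


def conservative_interval(m, n, r):
    """Conservative two-sided 1 - r interval centered at the sample proportion m."""
    _, half = half_widths(m, n, r)
    return (m - half, m + half)


# Illustrative values only: the conservative half-width is never the smaller one
est, cons = half_widths(0.2, 100, 0.05)
print(est, cons)
print(conservative_interval(0.2, 100, 0.05))
```

The two half-widths agree when M = 1/2, and the gap grows as M moves toward 0 or 1.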

Mathematical Exercise 5. Suppose that p is to be estimated with margin of error E and with 1 - r confidence. Show that a conservative estimate of the sample size is

n = ceil[(z / (2E))²]

where z = z_{1 - r/2} for a two-sided interval and z = z_{1 - r} for a one-sided confidence interval.
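The sample size formula translates directly into code. In this Python sketch, the function name conservative_sample_size and the two_sided flag are assumptions introduced for illustration.

```python
from math import ceil
from statistics import NormalDist


def conservative_sample_size(margin, r, two_sided=True):
    """Smallest n with z / (2 n^{1/2}) <= margin, where margin is E."""
    # z_{1 - r/2} for a two-sided interval, z_{1 - r} for a one-sided bound
    q = 1 - r / 2 if two_sided else 1 - r
    z = NormalDist().inv_cdf(q)
    return ceil((z / (2 * margin)) ** 2)


# Illustrative values only: margin of error 0.05 with 95% confidence
print(conservative_sample_size(0.05, 0.05))
print(conservative_sample_size(0.05, 0.05, two_sided=False))
```

Since the formula uses the worst case p = 1/2, the resulting n may be larger than necessary when p is known to be far from 1/2.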

Mathematical Exercise 6. In a poll of 1000 registered voters in a certain district, 427 prefer candidate X. Construct the 95% two-sided confidence interval for the proportion of all registered voters in the district that prefer X.

Mathematical Exercise 7. A coin is tossed 500 times and results in 302 heads. Construct the 95% confidence lower bound for the probability of heads. Do you believe that the coin is fair?

Mathematical Exercise 8. A sample of 400 memory chips from a production line is tested, and 30 are defective. Construct the conservative 90% two-sided confidence interval for the proportion of defective chips.

Mathematical Exercise 9. A drug company wants to estimate the proportion of persons who will experience an adverse reaction to a certain new drug. The company wants a two-sided interval with margin of error 0.03 with 95% confidence. How large should the sample be?

Mathematical Exercise 10. An advertising agency wants to construct a 99% confidence lower bound for the proportion of dentists who recommend a certain brand of toothpaste. The margin of error is to be 0.02. How large should the sample be?
