Virtual Laboratories > Finite Sampling Models
Suppose again that we have a dichotomous population D with R objects of type 1 and N - R of type 2. As in the introduction, we sample n objects at random from D:
X = (X1, X2, ..., Xn), where Xi in D is the i'th object chosen.
In many real problems, the parameters R or N (or both) may be unknown. In this case we are interested in drawing inferences about the unknown parameters based on our observation of Y, the number of type 1 objects in the sample. We will assume initially that the sampling is without replacement, the realistic setting in most applications. In this case, recall that Y has the hypergeometric distribution with parameters n, R, and N.
Suppose that the size of the population N is known but that the number of type 1 objects R is unknown. This type of problem could arise, for example, if we had a batch of N computer chips containing an unknown number R of defectives. It would be too costly and perhaps destructive to test all N chips, so we might instead select n chips at random and test those.
A simple estimator of R can be derived by hoping that the sample proportion of type 1 objects is close to the population proportion of type 1 objects. That is,
Y / n ~ R / N so R ~ N Y / n.
1. Show that E(N Y / n) = R.
The result in Exercise 1 means that N Y / n is an unbiased estimator of R. Since the estimator is unbiased, its variance is also its mean square error, and hence measures the quality of the estimator.
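A quick simulation sketch (not part of the text) illustrates the unbiasedness claim: draw many samples without replacement and check that the average of N Y / n is close to R. The parameters below match Exercise 4.

```python
import random

# Parameters as in Exercise 4: N = 50, R = 20, n = 10.
N, R, n = 50, 20, 10
population = [1] * R + [0] * (N - R)   # 1 marks a type 1 object

random.seed(1)
runs = 10_000
estimates = []
for _ in range(runs):
    sample = random.sample(population, n)  # sampling without replacement
    Y = sum(sample)                        # number of type 1 objects in the sample
    estimates.append(N * Y / n)

print(sum(estimates) / runs)  # should be close to R = 20
```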
2. Show that var(N Y / n) = R (N - R) (N - n) / [n (N - 1)].
3. Show that for fixed N and R, the mean square error decreases to 0 as n increases to N.
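The variance formula and its decrease to 0 can be checked numerically (a sketch, not from the text): compute var(N Y / n) directly from the hypergeometric probability function and compare with R (N - R) (N - n) / [n (N - 1)] for several sample sizes.

```python
from math import comb

def hypergeom_pmf(k, n, R, N):
    # P(Y = k) for Y hypergeometric with parameters n, R, N
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def estimator_variance(n, R, N):
    # var(N*Y/n) = (N/n)^2 * var(Y), computed from the pmf
    ks = range(min(n, R) + 1)  # comb returns 0 outside the support
    mean = sum(k * hypergeom_pmf(k, n, R, N) for k in ks)
    second = sum(k * k * hypergeom_pmf(k, n, R, N) for k in ks)
    return (N / n) ** 2 * (second - mean ** 2)

N, R = 50, 20
for n in (10, 25, 40, 50):
    formula = R * (N - R) * (N - n) / (n * (N - 1))
    print(n, round(estimator_variance(n, R, N), 4), round(formula, 4))
```

The two columns agree, and both fall to 0 as n reaches N, matching Exercise 3.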
Thus, the estimator improves as the sample size increases; this property is known as consistency.
4. In the ball and urn experiment, select sampling without replacement and set N = 50, R = 20, and n = 10. Run the experiment 100 times, updating after each run.
5. Suppose that 10 memory chips are sampled at random and without replacement from a batch of 100 chips. The chips are tested and 2 are defective. Estimate the number of defective chips in the entire batch.
6. A voting district has 5000 registered voters. Suppose that 100 voters are selected at random and polled, and that 40 prefer candidate A. Estimate the number of voters in the district who prefer candidate A.
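As a quick numerical check of Exercises 5 and 6 (a sketch, not part of the text), the estimator is N Y / n in both cases:

```python
# The estimator of R with N known: N * Y / n
def estimate_R(N, n, Y):
    return N * Y / n

print(estimate_R(100, 10, 2))    # Exercise 5: 100 * 2 / 10 = 20 defectives
print(estimate_R(5000, 100, 40)) # Exercise 6: 5000 * 40 / 100 = 2000 voters
```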
Sometimes we are not interested in estimating R, but just in determining whether R meets or exceeds a critical value C. In particular, this situation arises in acceptance sampling. Suppose that we have a population of items that are either good or defective. If the number of defective items R is at least C (the critical value), then we would like to reject the entire lot. However, testing the items is expensive and destructive, so we must test a random sample of n items (drawn without replacement, of course) and base our decision to accept or reject the lot on the number of defectives in the sample. Clearly, the only reasonable approach is to choose another critical value c and reject the lot if the number of defectives in the sample is at least c. In statistical terms, we have described a hypothesis test.
In the following problems, suppose that N = 100 and C = 10. Thus we would like to reject the lot of 100 items if the number of defectives R is 10 or more. Suppose that we can only afford to sample and test n = 10 items.
We will first study the following test: Reject the lot if the number of defectives in the sample is at least 1.
7. For each of the following values of R (the true number of defectives), find the probability that we make the correct decision and the probability that we make the wrong decision:
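The computation behind Exercise 7 can be sketched as follows. Note that the original list of R values did not survive here, so the values below are illustrative, not the ones the exercise specifies. The lot is rejected when the sample contains at least c defectives; for this first test, c = 1.

```python
from math import comb

def prob_reject(R, c=1, N=100, n=10):
    # P(Y >= c) for Y hypergeometric with parameters n, R, N
    return sum(comb(R, k) * comb(N - R, n - k)
               for k in range(c, min(n, R) + 1)) / comb(N, n)

# Illustrative values of R; rejecting is correct exactly when R >= C = 10.
for R in (5, 10, 20, 30):
    print(R, round(prob_reject(R), 4))
```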
8. In the ball and urn experiment, select sampling without replacement, and set N = 100, n = 10. For each of the values of R in Exercise 7, run the experiment 1000 times, updating every 100 runs. Compute the relative frequency of rejections and compare with the true probability in Exercise 7.
Now we will study the following test: Reject the lot if the number of defectives in the sample is at least 2.
9. For each of the following values of R (the true number of defectives), find the probability that we make the correct decision and the probability that we make the wrong decision:
10. In the ball and urn experiment, select sampling without replacement, and set N = 100, n = 10. For each of the values of R in Exercise 9, run the experiment 1000 times, updating every 100 runs. Compute the relative frequency of rejections and compare with the true probability in Exercise 9.
11. Of the two tests studied above, which do you think is better? Justify your answer.
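Exercise 11 invites a comparison of the two tests. One way to weigh them (a sketch, not from the text) is to compute P(reject) under each test across a range of R; a good test rejects with high probability when R >= 10 and with low probability when R < 10.

```python
from math import comb

N, n = 100, 10

def prob_reject(R, c):
    # P(Y >= c) for Y hypergeometric with parameters n, R, N
    return sum(comb(R, k) * comb(N - R, n - k)
               for k in range(c, min(n, R) + 1)) / comb(N, n)

# Illustrative R values: columns are P(reject) for the c = 1 and c = 2 tests.
for R in (5, 10, 20, 30):
    print(R, round(prob_reject(R, 1), 4), round(prob_reject(R, 2), 4))
```

The first test (c = 1) rejects more often for every R, so it catches bad lots more reliably but also rejects more good lots; which test is better depends on which error is costlier.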
Suppose now that the number of type 1 objects R is known, but the population size N is unknown. As an example of this type of problem, suppose that we have a lake containing N fish where N is unknown. We capture R of the fish, tag them, and return them to the lake. Next we capture n of the fish and observe Y, the number of tagged fish in the sample. We wish to estimate N from this data. In this context, the estimation problem is sometimes called the capture-recapture problem.
12. Do you think that the main assumption of the ball and urn experiment, namely equally likely samples, would be satisfied for a real capture-recapture problem? Explain.
Once again, we can derive a simple estimate of N by hoping that the sample proportion of type 1 objects is close to the population proportion of type 1 objects. That is,
Y / n ~ R / N so N ~ nR / Y (if Y > 0).
Thus, our estimator of N is nR / Y if Y > 0 and is undefined if Y = 0.
13. In the ball and urn experiment, select sampling without replacement and set N = 80, R = 30, and n = 20. Run the experiment 100 times, updating after each run.
14. From a certain lake, 200 fish are caught, tagged, and returned to the lake. Then 100 fish are caught and it turns out that 10 are tagged. Estimate the population of fish in the lake.
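A quick check of Exercise 14 (a sketch): the capture-recapture estimate of the population size is n R / Y.

```python
def estimate_N(R, n, Y):
    # Estimator undefined when no tagged fish are recaptured (Y = 0)
    return n * R / Y if Y > 0 else None

print(estimate_N(200, 100, 10))  # 100 * 200 / 10 = 2000.0 fish
```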
15. Show that if k > 0, then P(Y = k), as a function of N for fixed R and n, is maximized at N = nR / k (more precisely, since N is an integer, at the floor of nR / k). This means that nR / Y is the maximum likelihood estimator of N.
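The claim in Exercise 15 can be checked numerically (a sketch, not a proof): for fixed R, n, and k, scan P(Y = k) over N and confirm the maximum sits at the floor of n R / k. The parameter values below are illustrative.

```python
from math import comb

def pmf(N, R, n, k):
    # P(Y = k) for Y hypergeometric with parameters n, R, N
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

R, n, k = 30, 20, 7  # illustrative values; n*R/k = 600/7, floor = 85
best_N = max(range(max(R, n), 400), key=lambda N: pmf(N, R, n, k))
print(best_N, n * R // k)  # both 85
```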
16. Use Jensen's inequality to show that E(nR / Y) ≥ N.
Thus, the estimator is biased and tends to over-estimate N. Indeed, if n ≤ N - R, so that P(Y = 0) > 0, then E(nR / Y) is infinite.
17. In the ball and urn experiment, select sampling without replacement and set N = 100, R = 60, and n = 30. Run the experiment 100 times, updating after each run. On each run, compute nR / Y, the estimate of N. Average the estimates and compare with N.
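Exercise 17 can be sketched in code (an illustration, not part of the text): with N = 100, R = 60, n = 30, the average of nR / Y over many runs typically comes out somewhat above N, illustrating the upward bias from Exercise 16.

```python
import random

N, R, n = 100, 60, 30
population = [1] * R + [0] * (N - R)  # 1 marks a tagged fish

random.seed(2)
runs = 1000
estimates = []
for _ in range(runs):
    Y = sum(random.sample(population, n))  # tagged fish recaptured
    if Y > 0:  # estimator undefined when Y = 0 (vanishingly rare here)
        estimates.append(n * R / Y)

print(sum(estimates) / len(estimates))  # typically a bit above N = 100
```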
For another approach to estimating N, see the section on Order Statistics.
Suppose now that the sampling is with replacement, even though this is unrealistic in most applications. In this case, Y has the binomial distribution with parameters n and R / N.
18. Show that E(N Y / n) = R and var(N Y / n) = R (N - R) / n.
Thus, the estimator of R with N known is still unbiased, but has larger mean square error. Hence sampling without replacement works better, for any values of the parameters, than sampling with replacement.
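Since Y is binomial(n, R / N) with replacement, var(N Y / n) works out to R (N - R) / n, while without replacement the same quantity is multiplied by the finite population correction (N - n) / (N - 1) < 1. A short sketch (illustrative parameter values) makes the comparison concrete:

```python
# Variance of the estimator N*Y/n under the two sampling schemes
def var_with_replacement(N, R, n):
    return R * (N - R) / n

def var_without_replacement(N, R, n):
    return R * (N - R) * (N - n) / (n * (N - 1))

N, R, n = 100, 30, 10
print(var_with_replacement(N, R, n), var_without_replacement(N, R, n))
# with replacement: 210.0; without: 190.909..., smaller for any 1 < n <= N
```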