Virtual Laboratories > Finite Sampling Models > 1 2 3 4 5 6 7 8 [9] 10
Our random experiment is to sample
repeatedly, with replacement, from the population
X1, X2, X3, ...
We will often interpret the sampling in terms of a coupon collector: each time the collector buys a certain product (bubble gum or Cracker Jack, for example) she receives a coupon (a baseball card or a toy, for example) which is equally likely to be any one of N types. Thus, in this setting, Xi is the coupon type received on the i'th purchase.
Let VN, n denote the number of distinct values in the first n selections, the random variable studied in the last section. Our interest is in this section is the sample size needed to get k distinct sample values:
WN, k = min{n: VN, n = k}, k = 1, 2, ..., N.
In terms of the coupon collector, this random variable gives the number of products required to get k distinct coupon types. Note that the possible values of WN, k are k, k + 1, k + 2, .... We will be particularly interested in WN,N, the sample size needed to get the entire population. In terms of the coupon collector, this is the number of products required to get the entire set of coupons.
1. In the
coupon collector experiment, set N = 50 and vary k. Note the shape of
and position of the density graph. With k = 20, run the experiment in single-step
mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an
update frequency of 10 and note the apparent convergence of the relative frequency
distribution to the true distribution.
Now let's find the distribution of WN, k. The results of the previous section will be very helpful
2. Argue that
WN, k = n if and only if VN, n - 1 = k - 1 and VN, n = k.
3. Use Exercise 2
and a conditional probability argument to show that
P(WN, k = n) = P(VN, n - 1 = k - 1)(N - k + 1) / N.
4. Use the result
of the last exercise and the distribution of VN, n - 1 from
the last section to show that for n = k, k + 1, ...,
P(WN,k = n) = C(N
- 1, k - 1) j
= 0, ..., k - 1 (-1)j C(k - 1, j)[(k
- 1 - j) / N]n - 1.
5. In the
coupon collector experiment, set N = 100 and vary k. Note the shape of
and position of the density graph. With k = 50, run the experiment in single-step
mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an
update frequency of 10 and note the apparent convergence of the relative frequency
function to the density function.
6. Suppose that
people are sampled at random until 10 distinct birth weeks are obtained. Find the
probability that no more than 12 persons are sampled.
7. Suppose that a
fair die is rolled until all 6 scores have occurred. Find the probability that the
number of rolls is at least 10.
8. A box of a
certain brand of cereal comes with a special toy. There are 10 different toys in all. Find
the probability that all 10 toys can be obtained with no more than 15 boxes of cereal.
We will now show that WN, k can be decomposed as a sum of k independent, geometrically distributed variables. This will provide some additional insight into the nature of the distribution and will make the computation of the mean and variance easy.
For i = 1, 2, ... N, let Zi denote the number of additional samples needed to go from i - 1 distinct values to i distinct values.
9. Argue that
Exercise 9 shows clearly that each time a new coupon is obtained, it becomes harder to get the next new coupon.
10. In the
coupon collector experiment, set N = 30 and vary k. Note the shape and
position of the density graph. With k = 25, run the experiment in single-step
mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an
update frequency of 10 and note the apparent convergence of the sample statistics to the
distribution parameters.
11. Use the
results of Exercise 9 to show that
12. Compute the
mean and variance of the number of persons that must be sampled to obtain 10 distinct
birth weeks.
13. Compute the
mean and standard deviation of the number of times a die must be rolled to obtain all 6
scores.
14. A box of a
certain brand of cereal comes with a special toy. There are 10 different toys in all. Find
the mean and standard deviation of the number of boxes that must be purchased to obtain a
complete set of toys.
15. Compute the
mean and standard deviation of the number of persons that must be sampled to obtain all
365 birthdays.
16. In the
coupon collector experiment, set N = 10 and vary k. Note the shape of
and position of the density graph. With k = 10, run the experiment in single-step
mode a few times and watch the outcomes. Finally, run the experiment 1000 times with an
update frequency of 10 and note the apparent convergence of the sample statistics to the
distribution parameters.
17. Use the Result
of Exercise 9 to show that the probability generating function of WN,
k is
GN, k(t) = tki
= 1, ..., k [N - (i - 1)] / [N - (i -
1)t] for |t| < N / (k - 1).
An alternate approach to the distribution of the sample size needed to get k distinct values is via a recursion formula.
18. Let cN,
k(n) = P(WN, k = n) for
n = k, k + 1, .... Use a conditional probability argument to
show that
cN, k(n + 1) = [(k - 1) / N]cN, k(n) + [(N - k + 1) / N]cN, k - 1(n).