Virtual Laboratories > Finite Sampling Models > 1 2 3 4 5 6 7 [8] 9 10
Suppose that our random experiment is to select a random sample of size n, with replacement, from the
population
X = (X1, X2, ..., Xn).
Recall that our basic modeling assumption is that X is uniformly distributed on the sample space
S = {1, 2, ..., N}n.
In this section, we are interested in the number of population values missing from the sample, and the number of (distinct) population values in the sample. Often, we will interpret the sampling experiment as a distribution of n balls into N cells; Xi is the cell number of ball i. In this model, our interest is in the number of empty cells and the number of occupied cells.
For i in D, let Yi denote the number of times that i occurs in the sample:
Yi = #{j {1, 2, ..., n}: Xj = i}.
1. Show that Y
= (Y1, Y2, ..., YN) has the multinomial distribution:
for nonnegative integers k1, ..., kN with k1
+ k2 + ··· + kN = n,
P(Y1 = k1, Y2 = k2, ..., YN = kN) = C(n; k1, k2, ..., kN) / Nn
We will now define the main random variables of interest: the number of population values missing in the sample:
UN, n = #{j {1, 2, ..., N}: Yj = 0},
and the number of (distinct) population values that occur in the sample:
VN, n = #{j {1, 2, ..., N}: Yj > 0}.
Clearly we must have
UN, n + VN, n = N,
so once we have the probability distribution and moments of one variable, we can easily find them for the other variable. Note also that the birthday event, that there is at least one duplication in the sample, can be written as
{VN, n < n} = {UN, n > N - n}.
2. In the
general birthday experiment, set N = 100. Vary n and note the shape of the
density graph of V and its position within the range. With n = 30, run
the experiment in single-step mode a few times and observe the outcomes. Now run the
experiment 1000 times with an update frequency of 10 and note the apparent convergence of
the relative frequency distribution to the true distribution.
For j in D, consider the event that j does not occur in the sample:
Aj = {Yj = 0}.
Let K be a subset of D with #(K) = k. Using the multiplication rule of combinatorics, it is easy to count the number of samples that do not contain any elements of K:
3. Show that
#[j
in K Aj] = (N - k)n.
Now the inclusion-exclusion rule of combinatorics can be used to count the number samples that are missing at least one population value:
4. Show that
#[j = 1,
..., N Aj] =
k
= 1, ..., N (-1)k - 1 C(N, k)
(N - k)n.
Once we have this, it's simple to count the number samples that contain all population values:
5. Show that
#[j
= 1, ..., N Ajc] =
k
= 1, ..., N (-1)k C(N, k)
(N - k)n.
Now we can use a two-step procedure to generate all samples that exclude exactly j population values: first, choose the j values that are to be excluded. Then select a sample of size n from the remaining population values so that none are excluded. Thus, we can use the multiplication principle of combinatorics to count the number of samples that exclude j population values.
6. Show that
#{UN,n = j} = C(N, j)
k = 0, ..., N
- j (-1)k C(N - j, k) (N
- j - k)n.
Finally, since the probability distribution of X on the sample space S is uniform, we can find the density function of the number of excluded values:
7. Show that for j
= max{N - n, 0}, ..., N - 1,
P(UN,n = j) = C(N,
j) k =
0, ..., N - j (-1)k C(N - j, k)
[1 - (j + k) / N]n.
Also, we can easily find the density function of the number distinct values in the sample:
8. Show that for j
= 1, 2, ..., min{N, n},
P(VN,n = j) = C(N,
j) k =
0, ..., j (-1)k C(j, k)
[(j - k) / N]n.
9. Suppose that 20
persons are chosen at random. Find the probability that there are at least 18 different
birth weeks represented.
10. In the
general birthday experiment, set N = 52. Vary n and note the shape and
location of the density function. With n = 20 run the experiment 1000 times with
an update frequency of 10 and note the apparent convergence of the relative frequency
function to the density function.
11. Suppose that
10 fair dice are rolled. Find the probability that there will 4 or fewer distinct scores.
12. In the
general birthday experiment, set N = 6. Vary n and note the shape and
location of the density function. With n = 10 run the experiment 1000 times with
an update frequency of 10 and note the apparent convergence of the relative frequency
function to the density function.
The distribution of the number of excluded values can also be obtained by a recursion argument.
13. Let aN,
n(j) = P(UN, n = j) for
j = max{N - n, 0}, ..., N - 1. Use probability
arguments to show that
14. Suppose that
20 persons are chosen at random. Find the probability that at least 3 birth months are not
represented.
15. In the
general birthday experiment, set N = 12. Vary n and note the shape and
location of the density function. With n = 20 run the experiment 1000 times with
an update frequency of 10 and note the apparent convergence of the relative frequency
function to the density function.
16. A fast food
restaurant gives away one of 10 different types of toy with the purchase of each kid's
meal. A family buys 15 kid's meals. Find the probability that at least 3 different toy
types are missing.
17. In the
general birthday experiment, set N = 10. Vary n and note the shape and
location of the density function. With k = 15 run the experiment 1000 times with
an update frequency of 10 and note the apparent convergence of the relative frequency
function to the density function.
Now we will find the means and variances. The number of excluded values and the number of distinct values are counting variables and hence can be written as sums of indicator variables. As we have seen in many other models, this representation is frequently the best for computing moments.
Let Ij = 1 if Aj occurs (j is not in the sample) and Ij = 0 if Aj does not occur (j is in the sample).
Note that the number of population values missing in the sample can be written as
UN, n = I1 + I2 + ··· + IN.
18. Show that
19. Use the
results of the previous exercise to show that
20. Use the result
of Exercise 18 to show that
19. Use the
results of the previous exercise, and basic properties of variance to show that
var(UN, n) = var(VN, n) = N(N - 1)(1 - 2 / N)n + N(1 - 1 / N)n - N2(1 - 1 / N)2n.
20. Suppose that
100 persons are chosen at random. Find the mean and standard deviation of the number of
distinct birthdays.
21. Suppose that
30 persons are chosen at random. Find the mean and standard deviation of the number of
distinct birth weeks.
22. In the
general birthday experiment, set N = 52. Vary n and note the size and
shape of the mean/standard deviation bar. With n = 30 run the experiment 1000
times with an update frequency of 10 and note the apparent convergence of the empirical
mean and standard deviation to the distribution mean and standard deviation, respectively.
23. Suppose that
20 persons are chosen at random. Find the mean and standard deviation of the number of
distinct birth months.
24. In the
general birthday experiment, set N = 12. Vary n and note the size and
shape of the mean/standard deviation bar. With n = 20 run the experiment 1000
times with an update frequency of 10 and note the apparent convergence of the empirical
mean and standard deviation to the distribution mean and standard deviation, respectively.
25. The lying students problem. Suppose that 3 students,
who ride together, miss a mathematics exam. They decide to lie to the
instructor by saying that the car had a flat tire. The instructor
separates the students and asks each of them which tire was flat. The students,
who did not anticipate this, select their answers independently and at random.
26. The duck hunter problem. Suppose that there are 5 duck
hunters, each a perfect shot. A flock of 10 ducks fly over, and each hunter
selects one duck at random and shoots.