Suppose now that we have a multi-type population, in which each object is one of k types. For example, we could have an urn with balls of several different colors, or a population of voters who are either democrat, republican, or independent. Let Di denote the subset of all type i objects and let Ni denote the number of type i objects, for i = 1, 2, ..., k. Thus
$D = D_1 \cup D_2 \cup \cdots \cup D_k$ (a disjoint union) and $N = N_1 + N_2 + \cdots + N_k$.
The dichotomous model considered earlier is clearly a special case, with k = 2. As in the basic sampling model, we sample n objects at random from D:
$X = (X_1, X_2, \ldots, X_n)$, where $X_i \in D$ is the $i$th object chosen.
Now let Yi denote the number of type i objects in the sample, for i = 1, 2, ..., k. Note that
Y1 + Y2 + ··· + Yk = n,
so if we know the values of k - 1 of the counting variables, we can find the value of the remaining counting variable. As with any counting variable, we can express Yi as a sum of indicator variables:
1. Show that $Y_i = I_{i1} + I_{i2} + \cdots + I_{in}$, where $I_{ij} = 1$ if $X_j \in D_i$ and $I_{ij} = 0$ otherwise.
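As an informal illustration of Exercise 1, the Python sketch below draws a sample without replacement and builds each Yi literally as a sum of indicator variables. Representing each object only by its type label 1, ..., k, the urn sizes, and the function name `type_counts` are hypothetical choices made for this sketch.

```python
import random

def type_counts(population, n, k, rng=random):
    """Draw n objects without replacement and return [Y_1, ..., Y_k].

    population is a list of type labels in 1..k (a hypothetical encoding:
    each object is represented only by its type).
    """
    sample = rng.sample(population, n)          # X_1, ..., X_n
    counts = []
    for i in range(1, k + 1):
        # I_ij = 1 if the j-th draw X_j has type i, and 0 otherwise
        indicators = [1 if x == i else 0 for x in sample]
        counts.append(sum(indicators))          # Y_i = I_i1 + ... + I_in
    return counts

# Hypothetical urn: N1 = 5 red (type 1), N2 = 3 green (type 2), N3 = 2 blue (type 3)
urn = [1] * 5 + [2] * 3 + [3] * 2
y = type_counts(urn, 4, 3, random.Random(1))
print(y)
```

Whatever the sample, the counts add up to n, which is the identity Y1 + Y2 + ··· + Yk = n noted above.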
We assume initially that the sampling is without replacement, since this is the realistic case in most applications.
Basic combinatorial arguments can be used to derive the joint density of the counting variables. Recall that since the sampling is without replacement, the unordered sample is uniformly distributed over the combinations of size n chosen from D.
2. Show that for nonnegative integers $j_1, j_2, \ldots, j_k$ with $j_1 + j_2 + \cdots + j_k = n$,
$$P(Y_1 = j_1, Y_2 = j_2, \ldots, Y_k = j_k) = \frac{\binom{N_1}{j_1}\binom{N_2}{j_2}\cdots\binom{N_k}{j_k}}{\binom{N}{n}}.$$
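The density in Exercise 2 is easy to evaluate exactly. The following Python sketch (the names `mvhyper_pmf` and `outcomes` are illustrative, not from any library) uses exact rational arithmetic and checks, on a small hypothetical urn, that the probabilities sum to 1.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

def mvhyper_pmf(Ns, js):
    """P(Y_1 = j_1, ..., Y_k = j_k) for sampling without replacement.

    Ns = (N_1, ..., N_k) are the type sizes; the sample size n is sum(js).
    math.comb(N_i, j_i) is 0 when j_i > N_i, so impossible outcomes get 0.
    """
    N, n = sum(Ns), sum(js)
    numerator = prod(comb(Ni, ji) for Ni, ji in zip(Ns, js))
    return Fraction(numerator, comb(N, n))

def outcomes(k, n):
    """All (j_1, ..., j_k) of nonnegative integers summing to n."""
    return [js for js in product(range(n + 1), repeat=k) if sum(js) == n]

Ns = (5, 3, 2)   # hypothetical urn with N = 10
n = 4
total = sum(mvhyper_pmf(Ns, js) for js in outcomes(len(Ns), n))
print(total)                         # 1
print(mvhyper_pmf(Ns, (2, 1, 1)))    # 2/7
```

Using `Fraction` avoids rounding, so the total comes out exactly 1, as the (multivariate) Vandermonde identity guarantees.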
The distribution of (Y1, Y2, ..., Yk) is called the multivariate hypergeometric distribution with parameters N, N1, N2, ..., Nk, and n. We also say that $(Y_1, Y_2, \ldots, Y_{k-1})$ has this distribution, since the values of any k - 1 of the variables determine the value of the remaining one. Usually it is clear from context which meaning is intended. The ordinary hypergeometric distribution corresponds to k = 2.
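3. Show the following alternative form of the multivariate hypergeometric density in two ways: combinatorially, by considering the ordered sample uniformly distributed over the permutations of size n chosen from D, and algebraically, starting with the result in Exercise 2.
$$P(Y_1 = j_1, Y_2 = j_2, \ldots, Y_k = j_k) = \binom{n}{j_1, j_2, \ldots, j_k} \frac{(N_1)_{j_1} (N_2)_{j_2} \cdots (N_k)_{j_k}}{(N)_n}$$
Here $(m)_j = m(m - 1) \cdots (m - j + 1)$ denotes the falling power and $\binom{n}{j_1, j_2, \ldots, j_k} = n! / (j_1! \, j_2! \cdots j_k!)$ the multinomial coefficient.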
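A quick numerical check of Exercise 3, assuming Python 3.8 or later (for `math.perm` and `math.prod`): `math.perm(m, j)` computes exactly the falling power (m)_j, and returns 0 when j > m, so the two forms of the density can be compared on a small hypothetical urn. This checks the identity on examples; it is not a proof.

```python
from fractions import Fraction
from math import comb, factorial, perm, prod

def multinomial_coef(js):
    """C(n; j_1, ..., j_k) = n! / (j_1! ... j_k!), with n = sum(js)."""
    return factorial(sum(js)) // prod(factorial(j) for j in js)

def mvhyper_pmf_ordered(Ns, js):
    """Alternative form: C(n; j_1..j_k) (N_1)_{j_1} ... (N_k)_{j_k} / (N)_n."""
    N, n = sum(Ns), sum(js)
    numerator = multinomial_coef(js) * prod(perm(Ni, ji) for Ni, ji in zip(Ns, js))
    return Fraction(numerator, perm(N, n))

def mvhyper_pmf_unordered(Ns, js):
    """Form from Exercise 2, for comparison."""
    N, n = sum(Ns), sum(js)
    return Fraction(prod(comb(Ni, ji) for Ni, ji in zip(Ns, js)), comb(N, n))

Ns = (5, 3, 2)   # hypothetical urn
for js in [(2, 1, 1), (4, 0, 0), (0, 3, 1), (1, 1, 2), (0, 0, 4)]:
    assert mvhyper_pmf_ordered(Ns, js) == mvhyper_pmf_unordered(Ns, js)
```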
4. Show that $Y_i$ has the hypergeometric distribution with parameters $N$, $N_i$, and $n$:
$$P(Y_i = j) = \frac{\binom{N_i}{j}\binom{N - N_i}{n - j}}{\binom{N}{n}}, \quad j = 0, 1, \ldots, n.$$
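Exercise 4 can be checked numerically by summing the joint density over all values of the other counts. A Python sketch on a hypothetical urn; note that the code uses 0-based type indices, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

def mvhyper_pmf(Ns, js):
    """Joint density from Exercise 2 (sample size n = sum(js))."""
    N, n = sum(Ns), sum(js)
    return Fraction(prod(comb(a, b) for a, b in zip(Ns, js)), comb(N, n))

def hyper_pmf(N, Ni, n, j):
    """Ordinary hypergeometric density with parameters N, Ni, and n."""
    return Fraction(comb(Ni, j) * comb(N - Ni, n - j), comb(N, n))

def marginal(Ns, n, i, j):
    """P(Y_i = j) by summing the joint density over the other counts (i is 0-based)."""
    return sum(
        (mvhyper_pmf(Ns, js)
         for js in product(range(n + 1), repeat=len(Ns))
         if sum(js) == n and js[i] == j),
        Fraction(0),
    )

Ns, n = (5, 3, 2), 4   # hypothetical urn, N = 10
N = sum(Ns)
all_match = all(
    marginal(Ns, n, i, j) == hyper_pmf(N, Ns[i], n, j)
    for i in range(len(Ns)) for j in range(n + 1)
)
print(all_match)
```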
The multivariate hypergeometric distribution is preserved when the counting variables are combined. Specifically, suppose that A1, A2, ..., Al is a partition of the index set {1, 2, ..., k} into nonempty subsets. For each j, let Wj denote the sum of Yi over i in Aj, and let Mj denote the sum of Ni over i in Aj.
5. Show that $(W_1, W_2, \ldots, W_l)$ has the multivariate hypergeometric distribution with parameters $N$, $M_1, M_2, \ldots, M_l$, and $n$.
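Exercise 5 can also be checked by brute force: add up the joint density over all outcomes that give the same combined counts, and compare with the multivariate hypergeometric density with parameters N, M1, ..., Ml, and n. The urn and the partition below are hypothetical examples, with 0-based indices in the code.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

def mvhyper_pmf(Ns, js):
    """Joint density from Exercise 2 (sample size n = sum(js))."""
    N, n = sum(Ns), sum(js)
    return Fraction(prod(comb(a, b) for a, b in zip(Ns, js)), comb(N, n))

def combined_pmf(Ns, n, parts, ws):
    """P(W_1 = w_1, ..., W_l = w_l), where W_j = sum of Y_i over i in parts[j]."""
    total = Fraction(0)
    for js in product(range(n + 1), repeat=len(Ns)):
        if sum(js) == n and all(sum(js[i] for i in part) == w
                                for part, w in zip(parts, ws)):
            total += mvhyper_pmf(Ns, js)
    return total

Ns, n = (5, 3, 2, 4), 5                              # hypothetical urn, N = 14
parts = [[0, 2], [1], [3]]                           # hypothetical partition A_1, A_2, A_3
Ms = [sum(Ns[i] for i in part) for part in parts]    # M_j; here [7, 3, 4]
all_match = all(
    combined_pmf(Ns, n, parts, ws) == mvhyper_pmf(Ms, ws)
    for ws in product(range(n + 1), repeat=len(parts)) if sum(ws) == n
)
print(all_match)
```

Taking the partition {i} together with its complement recovers Exercise 4 as a special case.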
The multivariate hypergeometric distribution is also preserved when some of the counting variables are observed. Specifically, suppose that A, B is a partition of the index set {1, 2, ..., k} into nonempty subsets. Suppose that we observe Yj = yj for j in B. Let z denote the sum of yj over j in B, and let M denote the sum of Ni over i in A.
6. Show that the conditional distribution of $(Y_i : i \in A)$ given $Y_j = y_j$ for $j \in B$ is multivariate hypergeometric with parameters $M$, $(N_i : i \in A)$, and $n - z$.
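The conditional result in Exercise 6 can be checked the same way, computing the conditional density as the joint density divided by the probability of the observed counts. A Python sketch with a hypothetical urn, partition, and observation (0-based indices in the code):

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

def mvhyper_pmf(Ns, js):
    """Joint density from Exercise 2 (sample size n = sum(js))."""
    N, n = sum(Ns), sum(js)
    return Fraction(prod(comb(a, b) for a, b in zip(Ns, js)), comb(N, n))

def joint(Ns, A, B, yA, yB):
    """Joint density at the outcome with Y_i = yA (i in A) and Y_j = yB (j in B)."""
    js = [0] * len(Ns)
    for i, y in zip(A, yA):
        js[i] = y
    for j, y in zip(B, yB):
        js[j] = y
    return mvhyper_pmf(Ns, js)

Ns, n = (5, 3, 2, 4), 5      # hypothetical urn, N = 14
A, B = [0, 1], [2, 3]        # hypothetical partition of the type indices
yB = (1, 2)                  # observed values Y_j = y_j for j in B
z = sum(yB)

# all values of (Y_i, i in A) consistent with the observation
yAs = [yA for yA in product(range(n - z + 1), repeat=len(A)) if sum(yA) == n - z]
p_B = sum(joint(Ns, A, B, yA, yB) for yA in yAs)    # P(Y_j = y_j for j in B)

NA = [Ns[i] for i in A]      # parameters N_i for i in A; their sum is M
all_match = all(
    joint(Ns, A, B, yA, yB) / p_B == mvhyper_pmf(NA, yA)   # sample size n - z
    for yA in yAs
)
print(all_match)
```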
Combinations of the basic results in Exercises 5 and 6 can be used to compute any marginal or conditional distributions of the counting variables.
We will compute the mean, variance, covariance, and correlation of the counting variables. Results from the hypergeometric distribution and the representation in terms of indicator variables are the main tools.
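7. Show that
$$E(Y_i) = n \frac{N_i}{N}, \qquad \operatorname{var}(Y_i) = n \frac{N_i}{N}\left(1 - \frac{N_i}{N}\right)\frac{N - n}{N - 1}.$$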
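8. Suppose that i and j are distinct. Show that
$$\operatorname{cov}(I_{ir}, I_{jr}) = -\frac{N_i N_j}{N^2}, \qquad \operatorname{cov}(I_{ir}, I_{js}) = \frac{N_i N_j}{N^2 (N - 1)} \text{ for } r \ne s.$$
9. Suppose that i and j are distinct. Show that
$$\operatorname{cor}(I_{ir}, I_{jr}) = -\sqrt{\frac{N_i N_j}{(N - N_i)(N - N_j)}}, \qquad \operatorname{cor}(I_{ir}, I_{js}) = \frac{1}{N - 1}\sqrt{\frac{N_i N_j}{(N - N_i)(N - N_j)}} \text{ for } r \ne s.$$
In particular, for distinct i and j, $I_{ir}$ and $I_{jr}$ are negatively correlated, while $I_{ir}$ and $I_{js}$ are positively correlated for $r \ne s$. Does this result seem reasonable?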
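10. Use the results of Exercises 7 and 8 to show that for distinct i and j,
$$\operatorname{cov}(Y_i, Y_j) = -n \frac{N_i}{N}\frac{N_j}{N}\frac{N - n}{N - 1}, \qquad \operatorname{cor}(Y_i, Y_j) = -\sqrt{\frac{N_i N_j}{(N - N_i)(N - N_j)}}.$$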
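The moment formulas can be checked by exact enumeration on a small urn. The Python sketch below (the helper `check_moments` is a hypothetical name) takes as its assumptions the closed forms $E(Y_i) = n N_i/N$, $\operatorname{var}(Y_i) = n (N_i/N)(1 - N_i/N)(N - n)/(N - 1)$ and, for $i \ne j$, $\operatorname{cov}(Y_i, Y_j) = -n (N_i/N)(N_j/N)(N - n)/(N - 1)$ with $\operatorname{cor}(Y_i, Y_j)^2 = N_i N_j / [(N - N_i)(N - N_j)]$, and compares them with moments computed directly from the density in Exercise 2. Squared correlations are compared so that all arithmetic stays exact.

```python
from fractions import Fraction
from itertools import product
from math import comb, prod

def mvhyper_pmf(Ns, js):
    """Joint density from Exercise 2 (sample size n = sum(js))."""
    N, n = sum(Ns), sum(js)
    return Fraction(prod(comb(a, b) for a, b in zip(Ns, js)), comb(N, n))

def check_moments(Ns, n):
    """True if exact moments of (Y_1, ..., Y_k) match the closed forms."""
    N, k = sum(Ns), len(Ns)
    dist = {js: mvhyper_pmf(Ns, js)
            for js in product(range(n + 1), repeat=k) if sum(js) == n}

    def E(f):
        # expected value of f(Y_1, ..., Y_k) under the exact distribution
        return sum(p * f(js) for js, p in dist.items())

    fpc = Fraction(N - n, N - 1)     # the factor (N - n) / (N - 1)
    mean = [E(lambda js, i=i: js[i]) for i in range(k)]
    var = [E(lambda js, i=i: js[i] ** 2) - mean[i] ** 2 for i in range(k)]
    for i in range(k):
        p_i = Fraction(Ns[i], N)
        if mean[i] != n * p_i or var[i] != n * p_i * (1 - p_i) * fpc:
            return False
        for j in range(k):
            if j == i:
                continue
            p_j = Fraction(Ns[j], N)
            cov = E(lambda js: js[i] * js[j]) - mean[i] * mean[j]
            if cov != -n * p_i * p_j * fpc:
                return False
            # squared correlation, compared exactly (no square roots)
            cor_sq = cov ** 2 / (var[i] * var[j])
            if cor_sq != Fraction(Ns[i] * Ns[j], (N - Ns[i]) * (N - Ns[j])):
                return False
    return True

print(check_moments((5, 3, 2), 4))
```

Note that the correlation does not depend on the sample size n, while the covariance carries the same factor (N - n)/(N - 1) as the variance.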
Suppose now that the sampling is with replacement, even though this is usually not realistic in applications.
11. Show that the types of the objects in the sample form a sequence of n multinomial trials with parameters $N_1/N, N_2/N, \ldots, N_k/N$.
The following results now follow immediately from the general theory of multinomial trials, although modifications of the arguments above could also be used.
12. Show that $(Y_1, Y_2, \ldots, Y_k)$ has the multinomial distribution with parameters $n$ and $N_1/N, N_2/N, \ldots, N_k/N$: for nonnegative integers $j_1, j_2, \ldots, j_k$ with $j_1 + j_2 + \cdots + j_k = n$,
$$P(Y_1 = j_1, Y_2 = j_2, \ldots, Y_k = j_k) = \binom{n}{j_1, j_2, \ldots, j_k} \frac{N_1^{j_1} N_2^{j_2} \cdots N_k^{j_k}}{N^n}.$$
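For sampling with replacement, the density in Exercise 12 can be evaluated in the same exact way. A Python sketch on the same kind of hypothetical urn (the function names are illustrative):

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

def multinomial_pmf(Ns, js):
    """P(Y_1 = j_1, ..., Y_k = j_k) when sampling WITH replacement.

    Equals C(n; j_1, ..., j_k) N_1^{j_1} ... N_k^{j_k} / N^n, with n = sum(js).
    """
    N, n = sum(Ns), sum(js)
    coef = factorial(n) // prod(factorial(j) for j in js)
    return Fraction(coef * prod(Ni ** ji for Ni, ji in zip(Ns, js)), N ** n)

Ns, n = (5, 3, 2), 4   # hypothetical urn, N = 10
total = sum(multinomial_pmf(Ns, js)
            for js in product(range(n + 1), repeat=len(Ns)) if sum(js) == n)
print(total)                            # 1
print(multinomial_pmf(Ns, (2, 1, 1)))   # 9/50
print(multinomial_pmf(Ns, (0, 0, 4)))   # 1/625
```

Unlike the hypergeometric case, the outcome (0, 0, 4) has positive probability here even though N3 = 2, since an object can be drawn more than once.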
13. Show that
Suppose that the population size N is very large compared to the sample size n. In this case, it seems reasonable that sampling without replacement is not much different from sampling with replacement, and hence the multivariate hypergeometric distribution should be well approximated by the multinomial. The following exercise makes this observation precise. Practically, it is a valuable result, since in many cases we do not know the population size exactly.
14. Suppose that $N_i$ depends on $N$ and that $N_i / N \to p_i \in [0, 1]$ as $N \to \infty$, for $i = 1, 2, \ldots, k$. Show that for fixed n, the multivariate hypergeometric density function with parameters N, N1, N2, ..., Nk, and n converges to the multinomial density with parameters n and p1, p2, ..., pk. Hint: Use the representation in Exercise 3.
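Exercise 14 can be illustrated numerically: hold n and the proportions pi fixed, let N grow, and watch the multivariate hypergeometric probability of a fixed outcome approach the multinomial one. The proportions and the outcome below are hypothetical; floating-point arithmetic is adequate for this illustration.

```python
from math import comb, factorial, prod

def mvhyper_pmf(Ns, js):
    """Multivariate hypergeometric density (without replacement), as a float."""
    N, n = sum(Ns), sum(js)
    return prod(comb(a, b) for a, b in zip(Ns, js)) / comb(N, n)

def multinomial_pmf(ps, js):
    """Multinomial density with n = sum(js) and probabilities ps."""
    coef = factorial(sum(js)) // prod(factorial(j) for j in js)
    return coef * prod(p ** j for p, j in zip(ps, js))

ps = (0.5, 0.3, 0.2)    # hypothetical limiting proportions p_i
js = (2, 1, 1)          # a fixed outcome, so n = 4
target = multinomial_pmf(ps, js)

errors = []
for N in (10, 100, 1000, 10000):
    Ns = tuple(round(p * N) for p in ps)    # N_i / N = p_i for these N
    errors.append(abs(mvhyper_pmf(Ns, js) - target))
print(errors)   # shrinks roughly in proportion to 1 / N
```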
15. Suppose that a bridge hand (13 cards) is dealt at random from a standard deck of 52 cards. Find the probability that the hand has
16. Suppose that a bridge hand (13 cards) is dealt at random from a standard deck of 52 cards. Find the
17. A population of 100 voters consists of 40 republicans, 35 democrats and 25 independents. A random sample of 10 voters is chosen.
18. A bridge hand (13 cards) is dealt at random from a deck of 52 cards. Find the conditional probability that the hand has
In the card experiment, a hand that does not contain any cards of a particular suit is said to be void in that suit.
19. Use the inclusion-exclusion rule to show that the probability that a poker hand is void in at least one suit is 1,913,496 / 2,598,960 ≈ 0.736.
20. In the card experiment, set n = 5. Run the simulation 1000 times, updating after each run. Compute the relative frequency of the event that the hand is void in at least one suit. Compare the relative frequency with the true probability given in Exercise 19.
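Where the card applet is not at hand, a plain Monte Carlo simulation gives a rough stand-in for Exercise 20. The Python sketch below assumes cards are encoded as (rank, suit) pairs; that encoding and the function names are choices made for this sketch, not part of the card experiment itself.

```python
import random

def void_in_some_suit(hand):
    """True if the hand is missing at least one of the 4 suits."""
    return len({suit for _, suit in hand}) < 4

def simulate(runs, hand_size=5, seed=None):
    """Relative frequency of 'void in at least one suit' over random hands."""
    rng = random.Random(seed)
    deck = [(rank, suit) for suit in range(4) for rank in range(13)]
    hits = sum(void_in_some_suit(rng.sample(deck, hand_size)) for _ in range(runs))
    return hits / runs

freq = simulate(1000, seed=2024)
print(freq, 1913496 / 2598960)
```

With 1000 runs the relative frequency typically lands within a few hundredths of the true value; setting `hand_size=13` simulates bridge hands instead.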
21. Use the inclusion-exclusion rule to show that the probability that a bridge hand is void in at least one suit is 32,427,298,180 / 635,013,559,600 ≈ 0.051.
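The inclusion-exclusion counts in Exercises 19 and 21 follow one pattern: for a hand of h cards, the number of hands void in a particular set of m suits is C(52 - 13m, h), and there are C(4, m) such sets, so the number of hands void in at least one suit is the alternating sum over m = 1, ..., 4. A Python sketch (the function name `void_count` is illustrative) that reproduces both stated fractions:

```python
from math import comb

def void_count(hand_size, suits=4, per_suit=13):
    """(hands void in at least one suit, total hands), by inclusion-exclusion.

    Term m: C(suits, m) * C(deck - m * per_suit, hand_size), with sign (-1)^(m+1).
    """
    deck = suits * per_suit
    favorable = sum(
        (-1) ** (m + 1) * comb(suits, m) * comb(deck - m * per_suit, hand_size)
        for m in range(1, suits + 1)
    )
    return favorable, comb(deck, hand_size)

for h in (5, 13):   # poker hand, bridge hand
    fav, tot = void_count(h)
    print(h, fav, tot, round(fav / tot, 3))
```

For the poker hand, the terms are 4·C(39, 5) - 6·C(26, 5) + 4·C(13, 5) = 2,303,028 - 394,680 + 5,148; the term for all four suits vanishes because C(0, h) = 0.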