Virtual Laboratories > Finite Sampling Models
Suppose that we have a dichotomous population D of N objects, that is, a population that consists of two types of objects. For example, we could have balls in an urn that are either red or green, a batch of electronic components that are either good or defective, a population of people who are either male or female, or a population of animals that are either tagged or untagged. Let $D_1$ denote the subset of D consisting of the type 1 objects, and suppose that $D_1$ has cardinality R. As in the basic sampling model, we sample n objects at random from D:
$X = (X_1, X_2, \ldots, X_n)$, where $X_i \in D$ is the $i$th object chosen.
In this section, we are interested in the random variable Y that gives the number of type 1 objects in the sample. Note that Y is a counting variable and thus, like all counting variables, can be written as a sum of indicator variables.
1. Show that $Y = I_1 + I_2 + \cdots + I_n$, where $I_i = 1$ if $X_i \in D_1$ (the $i$th object is type 1) and $I_i = 0$ otherwise.
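A minimal Python sketch of the indicator representation. It assumes, purely for illustration, that the population is labeled 0, ..., N - 1 and that the labels below R are the type 1 objects; the helper `count_type1` and the seed are hypothetical choices, not part of the original text. One ordered sample is drawn without replacement with `random.sample`, and summing the indicators gives Y.

```python
import random

def count_type1(sample, R):
    """Return Y = I_1 + ... + I_n, where I_i = 1 if the i'th draw is type 1.

    Illustrative assumption: objects labeled 0..R-1 are type 1, the rest are not.
    """
    indicators = [1 if x < R else 0 for x in sample]
    return sum(indicators)

N, R, n = 50, 30, 10
rng = random.Random(2024)          # fixed seed so the run is reproducible
sample = rng.sample(range(N), n)   # ordered sample drawn without replacement
Y = count_type1(sample, R)
print(sample, Y)
```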
We will assume initially that the sampling is without replacement, which is usually the realistic setting with dichotomous populations.
Recall that since the sampling is without replacement, the unordered sample is uniformly distributed over the set of all combinations of size n chosen from D. This observation leads to a simple combinatorial derivation of the density of Y.
2. Show that for $k = \max\{0, n - (N - R)\}, \ldots, \min\{n, R\}$,
$$P(Y = k) = \frac{C(R, k)\, C(N - R, n - k)}{C(N, n)}.$$
This is known as the hypergeometric distribution with parameters N, R, and n. If we adopt the convention that C(j, i) = 0 for i > j, then the formula for the density function is correct for k = 0, 1, ..., n.
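The density formula can be evaluated exactly with Python's `math.comb` and `fractions.Fraction`. This is a sketch; the function name `hypergeom_pmf` is a hypothetical helper. Conveniently, `math.comb(j, i)` returns 0 when i > j, which is exactly the convention C(j, i) = 0 for i > j, so the same expression works for every k = 0, 1, ..., n.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, R, n):
    """P(Y = k) = C(R, k) C(N - R, n - k) / C(N, n) as an exact fraction.

    math.comb(j, i) is 0 when i > j, so values of k outside
    max{0, n - (N - R)}, ..., min{n, R} get probability 0 automatically.
    """
    return Fraction(comb(R, k) * comb(N - R, n - k), comb(N, n))

N, R, n = 50, 30, 10
density = [hypergeom_pmf(k, N, R, n) for k in range(n + 1)]
print(sum(density))  # a density sums to 1
```

For example, with N = 10, R = 8, n = 5 the support starts at max{0, 5 - 2} = 3, and `hypergeom_pmf(0, 10, 8, 5)` is 0.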
3. Show the following alternative form of the hypergeometric density in two ways: combinatorially, by treating the outcome as a permutation of size n chosen from the population of N balls, and algebraically, starting from the result in Exercise 2.
$$P(Y = k) = C(n, k) \frac{(R)_k\, (N - R)_{n - k}}{(N)_n}, \quad k = 0, 1, \ldots, n,$$
where $(m)_j = m (m - 1) \cdots (m - j + 1)$ denotes the falling power, with $(m)_0 = 1$.
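A quick numerical check, not a proof, that the permutation form of Exercise 3 agrees with the combination form of Exercise 2. The falling power (m)_j = m(m - 1)...(m - j + 1) is computed with `math.perm(m, j)`, which returns 0 when j > m, matching the fact that the product then contains a zero factor. The helper names are hypothetical and the parameter grid is an arbitrary small range.

```python
from fractions import Fraction
from math import comb, perm

def falling(m, j):
    """Falling power (m)_j = m (m - 1) ... (m - j + 1); 0 when j > m."""
    return perm(m, j)

def hypergeom_pmf(k, N, R, n):
    """Exercise 2 form: C(R, k) C(N - R, n - k) / C(N, n)."""
    return Fraction(comb(R, k) * comb(N - R, n - k), comb(N, n))

def hypergeom_pmf_ordered(k, N, R, n):
    """Exercise 3 form: C(n, k) (R)_k (N - R)_(n - k) / (N)_n."""
    return Fraction(comb(n, k) * falling(R, k) * falling(N - R, n - k), falling(N, n))

# The two forms agree exactly on every parameter combination in this grid
for N in range(1, 12):
    for R in range(N + 1):
        for n in range(1, N + 1):
            for k in range(n + 1):
                assert hypergeom_pmf_ordered(k, N, R, n) == hypergeom_pmf(k, N, R, n)
```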
4. In the ball and urn experiment, select sampling without replacement. Vary the parameters and note the shape of the graph of the density function. Now let N = 50, R = 30, and n = 10, and run the experiment with an update frequency of 100. Watch the apparent convergence of the relative frequency function to the density function.
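Outside the applet, the same experiment can be approximated with a short simulation. This Python sketch is a stand-in for the ball and urn applet, not its implementation; it assumes the parameters N = 50, R = 30, n = 10, and the helper `simulate`, the seed, and the run count are hypothetical choices. Red balls are encoded as 1, so the sum of a sample is Y.

```python
import random
from collections import Counter
from math import comb

def hypergeom_pmf(k, N, R, n):
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def simulate(N, R, n, runs, seed=0):
    """Relative frequency of each value of Y over `runs` samples without replacement."""
    rng = random.Random(seed)
    population = [1] * R + [0] * (N - R)  # 1 marks a type 1 (red) ball
    counts = Counter(sum(rng.sample(population, n)) for _ in range(runs))
    return {k: counts[k] / runs for k in range(n + 1)}

N, R, n = 50, 30, 10
freq = simulate(N, R, n, runs=20000)
for k in range(n + 1):
    # value, relative frequency, density
    print(k, round(freq[k], 4), round(hypergeom_pmf(k, N, R, n), 4))
```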
In the following exercises, we will derive the mean and variance of Y. The exchangeable property of the indicator variables, and properties of covariance and correlation, will play a key role.
5. Show that $E(I_i) = R / N$ for any i.
6. Show that $E(Y) = n (R / N)$.
8. Show that $\operatorname{var}(I_i) = (R / N)(1 - R / N)$ for any i.
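9. Show that for distinct i and j,
$$\operatorname{cov}(I_i, I_j) = -\frac{(R / N)(1 - R / N)}{N - 1}, \qquad \operatorname{cor}(I_i, I_j) = -\frac{1}{N - 1}.$$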
Note from Exercise 9 that the event of a type 1 object on draw i and the event of a type 1 object on draw j are negatively correlated, but the correlation depends only on the population size and not on the number of type 1 objects. Note also that the correlation is perfect (equal to -1) if N = 2. Think about these results intuitively.
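The claim that the correlation depends only on N can be checked exactly. This Python sketch (helper name hypothetical) uses P(I_i = 1) = R / N and, for i ≠ j, P(I_i = 1, I_j = 1) = R(R - 1) / (N(N - 1)), then confirms that the correlation equals -1/(N - 1) for every R with 0 < R < N in a small range of N.

```python
from fractions import Fraction

def indicator_cov_cor(N, R):
    """Exact cov and cor of I_i and I_j (i != j), sampling without replacement.

    Requires N >= 2 and 0 < R < N so that var(I_i) > 0.
    """
    p = Fraction(R, N)                          # P(I_i = 1)
    p_both = Fraction(R * (R - 1), N * (N - 1))  # P(I_i = 1, I_j = 1)
    cov = p_both - p * p
    cor = cov / (p * (1 - p))
    return cov, cor

for N in range(2, 30):
    for R in range(1, N):
        cov, cor = indicator_cov_cor(N, R)
        assert cor == Fraction(-1, N - 1)  # depends on N only, not on R
```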
10. In the ball and urn experiment, set N = 50, R = 20, and n = 10. Now run the experiment 500 times, updating after each run. Compute the empirical correlation of the events of a red ball on draw 3 and a red ball on draw 7. Compare with the theoretical result in Exercise 9.
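A simulation sketch of the same comparison in Python, assuming N = 50, R = 20, n = 10. Since 500 runs give a noisy estimate of a correlation as small as -1/49, this version uses 20000 runs (an arbitrary choice), and the Pearson correlation is computed by hand with a hypothetical helper `empirical_correlation`. The theoretical value is cor(I_i, I_j) = -1/(N - 1).

```python
import random

def empirical_correlation(xs, ys):
    """Sample (Pearson) correlation of two equal-length lists."""
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

N, R, n, runs = 50, 20, 10, 20000
rng = random.Random(1)
population = [1] * R + [0] * (N - R)  # 1 marks a red ball
draw3, draw7 = [], []
for _ in range(runs):
    sample = rng.sample(population, n)  # selection order is the draw order
    draw3.append(sample[2])  # indicator of red on draw 3 (index 2)
    draw7.append(sample[6])  # indicator of red on draw 7 (index 6)

r = empirical_correlation(draw3, draw7)
print(round(r, 4), round(-1 / (N - 1), 4))
```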
11. Use the results of Exercises 8 and 9 to show that
$$\operatorname{var}(Y) = n \frac{R}{N}\left(1 - \frac{R}{N}\right) \frac{N - n}{N - 1}.$$
Note that var(Y) = 0 if R = 0, R = N, or n = N. Think about these results.
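The formulas of Exercises 6 and 11 can be cross-checked against the density itself. This Python sketch (helper names hypothetical) computes E(Y) and var(Y) exactly from the hypergeometric density with `Fraction` arithmetic and compares them with n R/N and n (R/N)(1 - R/N)(N - n)/(N - 1) over a small grid, which includes the degenerate cases R = 0, R = N, and n = N.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, R, n):
    return Fraction(comb(R, k) * comb(N - R, n - k), comb(N, n))

def moments(N, R, n):
    """Exact mean and variance of Y computed directly from the density."""
    dens = [hypergeom_pmf(k, N, R, n) for k in range(n + 1)]
    mean = sum(k * d for k, d in enumerate(dens))
    var = sum(k * k * d for k, d in enumerate(dens)) - mean ** 2
    return mean, var

for N in range(2, 15):
    for R in range(N + 1):
        for n in range(1, N + 1):
            p = Fraction(R, N)
            mean, var = moments(N, R, n)
            assert mean == n * p                                   # Exercise 6
            assert var == n * p * (1 - p) * Fraction(N - n, N - 1)  # Exercise 11
```

For instance, with N = 50, R = 30, n = 10 this gives mean 6 and variance 96/49.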
14. In the ball and urn experiment, select sampling without replacement. Vary the parameters and note the size and location of the mean/standard deviation bar. Now let N = 50, R = 30, and n = 10, and run the experiment with an update frequency of 100. Watch the apparent convergence of the empirical moments to the true moments.
15. A batch of 100 computer chips contains 10 defective chips. Five chips are chosen at random, without replacement.
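As an illustration with these numbers, a short Python sketch can tabulate the density of the number Y of defective chips in the sample (N = 100, R = 10, n = 5) together with the mean and variance given by Exercises 6 and 11. The helper `hypergeom_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, N, R, n):
    return Fraction(comb(R, k) * comb(N - R, n - k), comb(N, n))

N, R, n = 100, 10, 5  # 100 chips, 10 defective, sample of 5
for k in range(n + 1):
    print(k, round(float(hypergeom_pmf(k, N, R, n)), 6))

mean = Fraction(n * R, N)                                     # n R / N
var = mean * (1 - Fraction(R, N)) * Fraction(N - n, N - 1)    # Exercise 11
print("mean", float(mean), "variance", float(var))
```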
16. A club contains 50 members; 20 are men and 30 are women. A committee of 10 members is chosen at random.
Suppose now that the sampling is with replacement, even though this is usually not realistic in applications.
17. Show that $I_1, I_2, \ldots, I_n$ form a sequence of n Bernoulli trials with success parameter R / N.
The following results now follow immediately from the general theory of Bernoulli trials, although modifications of the arguments above could also be used.
18. Show that Y has the binomial distribution with parameters n and R / N:
$$P(Y = k) = C(n, k) \left(\frac{R}{N}\right)^k \left(1 - \frac{R}{N}\right)^{n - k}, \quad k = 0, 1, \ldots, n.$$
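For sampling with replacement, the binomial density can be computed exactly in the same style. This Python sketch (helper name hypothetical) takes p = R / N as a `Fraction`, with N = 50, R = 30, n = 10 as illustrative values, and checks that the density sums to 1 and that its mean and variance are n R/N and n (R/N)(1 - R/N).

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(Y = k) = C(n, k) p^k (1 - p)^(n - k); p may be a Fraction."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

N, R, n = 50, 30, 10
p = Fraction(R, N)  # success parameter R / N
density = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * d for k, d in enumerate(density))
var = sum(k * k * d for k, d in enumerate(density)) - mean ** 2
print(mean, var)
```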
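19. Show that
$$E(Y) = n \frac{R}{N}, \qquad \operatorname{var}(Y) = n \frac{R}{N}\left(1 - \frac{R}{N}\right).$$
Note that for any values of the parameters, E(Y) is the same whether the sampling is with or without replacement. On the other hand, var(Y) is smaller by a factor of $(N - n) / (N - 1)$ when the sampling is without replacement than when it is with replacement.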
Suppose that the population size N is very large compared to the sample size n. In this case, it seems reasonable that sampling without replacement is not much different from sampling with replacement, and hence the hypergeometric distribution should be well approximated by the binomial. The following exercise makes this observation precise. Practically, it is a valuable result, since in many cases we do not know the population size exactly.
20. Suppose that R depends on N and that $R / N \to p \in [0, 1]$ as $N \to \infty$. Show that for fixed n, the hypergeometric density function with parameters N, R, and n converges to the binomial density with parameters n and p. Hint: Use the representation in Exercise 3.
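A numerical illustration of the limit (not a proof), written as a Python sketch. It assumes n = 10 and p = 0.3 and takes R = pN for values of N where that is an integer; the binomial is evaluated at R / N. The largest gap between the two densities over k shrinks as N grows. Python's exact integer `comb` followed by true division keeps the large binomial coefficients accurate. Helper names are hypothetical.

```python
from math import comb

def hypergeom_pmf(k, N, R, n):
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def max_gap(N, p, n):
    """Largest |hypergeometric - binomial| over k, with R = p N (assumed integer)."""
    R = round(p * N)
    return max(abs(hypergeom_pmf(k, N, R, n) - binom_pmf(k, n, R / N))
               for k in range(n + 1))

n, p = 10, 0.3
sizes = (20, 100, 1000, 10000, 100000)
gaps = [max_gap(N, p, n) for N in sizes]
for N, g in zip(sizes, gaps):
    print(N, g)
```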
21. In the ball and urn experiment, vary the parameters and switch between sampling without replacement and sampling with replacement. Note the difference between the graphs of the hypergeometric density and the binomial density. Now set N = 100, n = 10, and R = 30. Run the simulation 1000 times, updating every 100 runs. Compare the relative frequency function, the hypergeometric density function, and the approximating binomial density function.
22. A small pond contains 1000 fish; 100 are tagged. Suppose that 20 fish are caught.
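With these numbers, the exact hypergeometric density of the number of tagged fish caught can be set beside its binomial approximation with p = R / N = 0.1. This Python sketch (helper names hypothetical) prints the first few values of each and records the largest difference over all k.

```python
from math import comb

def hypergeom_pmf(k, N, R, n):
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

N, R, n = 1000, 100, 20  # fish in the pond, tagged fish, fish caught
exact = [hypergeom_pmf(k, N, R, n) for k in range(n + 1)]
approx = [binom_pmf(k, n, R / N) for k in range(n + 1)]
for k in range(6):
    print(k, round(exact[k], 5), round(approx[k], 5))
worst = max(abs(a - b) for a, b in zip(exact, approx))
print("largest difference", worst)
```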
23. Forty percent of the registered voters in a certain district prefer candidate A. Suppose that 10 voters are chosen at random. Find the probability that at least 5 prefer candidate A.
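Since the size of the district is not given, this is a natural place for the binomial approximation: assuming the district is large compared with the sample of 10, Y is approximately binomial with n = 10 and p = 0.4. A Python sketch of the arithmetic (helper name hypothetical):

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.4
prob_at_least_5 = sum(binom_pmf(k, n, p) for k in range(5, n + 1))
print(round(prob_at_least_5, 4))  # 0.3669
```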
24. In the setting of Exercise 20, show that the mean and variance of the hypergeometric distribution converge to the mean and variance of the binomial distribution as $N \to \infty$.
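A small Python sketch of the moment comparison, assuming n = 10 and p = 3/10 with R = 3N/10 so that R / N = p exactly. The means n R/N coincide for every N, and the variance ratio is (N - n)/(N - 1), which tends to 1. Helper names are hypothetical.

```python
from fractions import Fraction

def hypergeom_mean_var(N, R, n):
    p = Fraction(R, N)
    return n * p, n * p * (1 - p) * Fraction(N - n, N - 1)

def binom_mean_var(n, p):
    return n * p, n * p * (1 - p)

n, p = 10, Fraction(3, 10)
ratios = []
for N in (20, 100, 1000, 10000):
    R = 3 * N // 10  # R / N = p exactly for these N
    hm, hv = hypergeom_mean_var(N, R, n)
    bm, bv = binom_mean_var(n, p)
    ratios.append(hv / bv)  # equals (N - n) / (N - 1)
    print(N, hm == bm, float(hv / bv))
```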