Virtual Laboratories > Finite Sampling Models > 1 2 3 4 5 [6] 7 8 9 10

6. The Matching Problem


The matching experiment is a random experiment can the formulated in a number of colorful ways:

These experiments are clearly equivalent from a mathematical point of view, and correspond to selecting the entire population, without replacement from the population D = {1, 2, ..., n}. Thus, the outcome variable

X = (X1, X2, ..., Xn)

is uniformly distributed on the sample space of permutations of D. The number of objects n is the basic parameter of the experiment.

Matches

We will say that a match occurs at position i if Xi = i. Thus, number of matches is the random variable Nn defined mathematically by

Nn = I1 + I2 + ··· + In where Ij = 1 if Xj = j and Ij = 0 otherwise.

To find the discrete density function of the number of matches, we need to count the number of permutations with a specified number of matches. This will turn out to be easy once we have counted the number of permutations with no matches; these are called derangements of {1, 2, ..., n}. We will denote the number of permutations with exactly k matches by

bn(k) = #{Nn = k} for k in {0, 1, ..., n}.

Derangements

Mathematical Exercise 1. Use basic properties of counting measure to show that

bn(0) = n! - #{Xi = i for some i}.

Mathematical Exercise 2. Use the inclusion-exclusion formula of combinatorics to show that

bn(0) = n! - sumj = 1, ..., n (-1)j nj.

where nj = sum{J: #(J) = j} #{Xi = i for i in J}.

Mathematical Exercise 3. Show that if #(J) = j then

#{Xi = i for i in J} = (n - j)!.

Mathematical Exercise 4. Use the results of Exercises 2 and 3 to show that

bn(0) = n! sumj = 0, ..., n (-1)j / j!.

Mathematical Exercise 5. Compute the number of derangements of 10 objects.

Permutations with k Matches

Mathematical Exercise 6. Show that the following two-step procedure generates all permutations with exactly k matches.

  1. Select the k integers that will match.
  2. Select a permutation of the remaining n - k integers with no matches.

Mathematical Exercise 7. Show that the number of ways of performing the steps in Exercise 6 are, respectively,

  1. C(n, k)
  2. bn - k(0)

Mathematical Exercise 8. Use the multiplication principle of combinatorics to show that

bn(k) = (n! / k!) sumj = 0, ..., n - k (-1)j / j!.

Mathematical Exercise 9. With n = 5, compute the number of permutations with k matches, for k = 0, ..., 5.

The Density Function

Mathematical Exercise 10. Use the result of Exercise 8 to show that the density function of Nn is

P(Nn = k) = (1 / k!) sumj = 0, ..., n - k (-1)j / j! for k = 0, 1, ..., n.

Simulation Exercise 11. In the matching experiment, start with n = 2 and click repeatedly on the scrollbar to increase n. Note how the graph of the density function changes. Now with n = 10, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of empirical density to the true density.

Mathematical Exercise 12. Compute explicitly the density function of N5.

Mathematical Exercise 13. Show that P(Nn = n - 1) = 0. Give a probabilistic proof by arguing that the event is impossible and an algebraic proof using the density function in Exercise 8.

The Poisson Approximation

Mathematical Exercise 14. Show that

P(Nn = k) converges to e-1 / k! as n converges to infinity.

As a function of k, the right-hand side of the expression in Exercise 1 is the Poisson density function with parameter 1. Thus, the distribution of the number of matches converges to the Poisson distribution with parameter 1 as the n increases. The convergence is very rapid. Indeed, the distribution of the number of matches with n = 10 is essentially the same as the distribution of the number of matches with n = 1000000!

Simulation Exercise 15. In the matching experiment, set n = 10. Run the experiment 1000 times with an update frequency of 10. Compare the relative frequencies, the true probabilities, and the limiting Poisson probabilities for the number of matches.

Moments

The mean and variance of the number of matches could be computed directly from the distribution. However, it is much better to use the representation in terms of indicator variables, The exchangeable property is an important tool in this section.

Mathematical Exercise 16. Show that E(Ij) = 1 / n for any j.

Mathematical Exercise 17. Show that E(Nn) = 1. Hint: Use the result of Exercise 1 and basic properties of expected value.

Thus, the expected number of matches is 1, regardless of the size of the permutation n.

Simulation Exercise 18. In the matching experiment, start with n = 2 and click repeatedly on the scrollbar to increase n. Note that the mean does not change. Now with n = 10, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of sample mean to the distribution mean.

Mathematical Exercise 19. Show that var(Ij) = (n - 1) / n2. for any j.

A match in one position would seem to make it more likely that there would be a match in another position. Thus, we might guess that the indicator variables are positively correlated.

Mathematical Exercise 20. Show that if j and k are distinct then

  1. cov(Ij, Ik) = 1 / [n2(n - 1)].
  2. cor(Ij, Ik) = 1 / (n - 1)2.

From Exercise 20, when n = 2, the event that there is a match in position 1 is perfectly correlated with the event that there is a match in position 2. Does this result seem reasonable?

Mathematical Exercise 21. Show that var(Nn) = 1. Hint: Use Exercises 4 and 5 and basic properties of covariance.

Thus, the variance of the number of matches is 1, regardless of the size of the permutation n.

Mathematical Exercise 22. With n = 5, compute the covariance and correlation between a match in position j and a match in position k, where j and k are distinct.

Simulation Exercise 23. In the matching experiment, start with n = 2 and click repeatedly on the scrollbar to increase n. Note that the standard deviation does not change. Now with n = 10, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of sample standard deviation to the distribution standard deviation.

Mathematical Exercise 24. Show that for distinct j and k,

cov(Ij, Ik) 0 as n .

Thus, the event that a match occurs in position j is nearly independent of the event that a match occurs in position k if n is large. For large n, the indicator variables behave nearly like n Bernoulli trials with probability 1 / n of success. This gives additional insight into the convergence of the distribution of the number of matches to the Poisson distribution as n increases. Note also that the limiting Poisson distribution has mean 1 and variance 1.