The central limit theorem and the law of large numbers are the two fundamental theorems of probability. Roughly, the central limit theorem states that the distribution of the sum of a large number of independent, identically distributed variables will be approximately normal, regardless of the underlying distribution. The importance of the central limit theorem is hard to overstate; indeed it is the reason that many statistical procedures work.
As usual, we start with a basic random experiment that has a sample space and a probability measure P. Suppose that X is a real-valued random variable with mean µ and standard deviation d (both of which we assume are finite). Now suppose that we repeat the experiment over and over to form a sequence of independent random variables, each with the same distribution as X (that is, we sample from the distribution of X):
X1, X2, X3, ...
Let \(Y_n = \sum_{i=1}^{n} X_i\) denote the n'th partial sum. Note that \(M_n = Y_n / n\) is the sample mean of the first n sample variables.
1. Show that if X has density f, then the density of \(Y_n\) is \(f^{*n}\), the n-fold convolution of f.
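As a quick numerical illustration (not part of the original exercise; the helper name `convolve_power` is invented for this sketch), the following computes the density of the sum of n fair dice by repeated discrete convolution:

```python
import numpy as np

def convolve_power(pmf, n):
    """Return the n-fold convolution of a discrete pmf with itself."""
    result = np.array([1.0])  # point mass at 0 (empty sum)
    for _ in range(n):
        result = np.convolve(result, pmf)
    return result

die = np.full(6, 1 / 6)            # pmf of a fair die on faces 1..6
density = convolve_power(die, 20)  # index k corresponds to the sum 20 + k
print(density.argmax() + 20)       # mode of the sum, near the mean 70
```

Plotting `density` for increasing n shows the bell shape emerging, which is exactly the phenomenon explored in the next exercise.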
2. In the dice experiment, select the sum random variable. For each die distribution, start with n = 1 die and increase the number of dice by one until you get to n = 20 dice. Note the shape and location of the density function at each stage. With 20 dice, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the empirical density function to the true density function.
In the last exercise, you should have been struck by the fact that the density of the sum becomes increasingly bell-shaped as the sample size increases, regardless of the shape of the underlying density. Even more remarkably, this phenomenon is not just qualitative: one particular family of density functions (the normal family) describes the limiting distribution of the sum, regardless of the basic distribution we start with.
3. Show (again!) that \(E(Y_n) = n\mu\) and \(\operatorname{var}(Y_n) = n d^2\).
4. In the dice experiment, select the sum random variable. For each die distribution, start with n = 1 die and increase the number of dice by one until you get to n = 20 dice. Note the shape and location of the density function, and the scale on the horizontal and vertical axes, at each stage. With 20 dice, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the empirical density function to the true density function.
We will now make the central limit theorem precise. From Exercise 3, we cannot expect Yn itself to have a limiting distribution; the variance of Yn grows to infinity and, unless µ = 0, the mean drifts to either infinity (if µ > 0) or to negative infinity (if µ < 0). Thus, to obtain a limiting distribution that is not degenerate, we need to consider not Yn itself, but the standard score of Yn. Hence, let
\[ Z_n = \frac{Y_n - n\mu}{\sqrt{n}\, d}. \]
5. Show that \(E(Z_n) = 0\) and \(\operatorname{var}(Z_n) = 1\).
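A short verification, using Exercise 3 and the scaling rules for mean and variance:
\[ E(Z_n) = \frac{E(Y_n) - n\mu}{\sqrt{n}\,d} = \frac{n\mu - n\mu}{\sqrt{n}\,d} = 0, \qquad \operatorname{var}(Z_n) = \frac{\operatorname{var}(Y_n)}{n d^2} = \frac{n d^2}{n d^2} = 1. \]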
6. In the definition of Zn, divide the numerator and denominator by n to show that Zn is also the standard score of the sample mean Mn.
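Explicitly, dividing the numerator and denominator by n gives
\[ Z_n = \frac{Y_n - n\mu}{\sqrt{n}\,d} = \frac{Y_n / n - \mu}{d / \sqrt{n}} = \frac{M_n - \mu}{d / \sqrt{n}}, \]
which is the standard score of \(M_n\), since \(E(M_n) = \mu\) and \(\operatorname{sd}(M_n) = d / \sqrt{n}\).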
The central limit theorem states that the distribution of the standard score Zn converges to the standard normal distribution as n increases to infinity.
We need to show that
\[ F_n(z) \to F(z) \text{ as } n \to \infty \text{ for each } z \in \mathbb{R}, \]
where \(F_n\) is the distribution function of \(Z_n\) and \(F\) is the distribution function of the standard normal distribution. However, we will show instead that
\[ G_n(t) \to \exp(t^2 / 2) \text{ as } n \to \infty \text{ for each } t \in \mathbb{R}, \]
where \(G_n\) is the moment generating function of \(Z_n\) and the expression on the right is the moment generating function of the standard normal distribution. This is a slightly less general version of the central limit theorem, because it requires that the moment generating function of the underlying distribution be finite on an interval about 0. For a proof of the general version, see, for example, Probability and Measure by Patrick Billingsley.
The following exercises make up the proof of the central limit theorem. Ultimately, the proof hinges on a generalization of a famous limit from calculus.
7. Suppose that \(a_n \to a\) as \(n \to \infty\). Show that \((1 + a_n / n)^n \to e^a\) as \(n \to \infty\).
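One standard route (a sketch; note that \(a_n / n \to 0\), so the logarithm is eventually defined) is to take logarithms and use the expansion \(\ln(1 + x) = x + O(x^2)\) as \(x \to 0\):
\[ n \ln\!\left(1 + \frac{a_n}{n}\right) = n\left(\frac{a_n}{n} + O\!\left(\frac{a_n^2}{n^2}\right)\right) = a_n + O\!\left(\frac{a_n^2}{n}\right) \to a, \]
and exponentiating gives the result.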
Now let \(g(t) = E\!\left[\exp\!\left(t\,\frac{X - \mu}{d}\right)\right]\). Note that g is the moment generating function of the standard score of a sample variable Xi, and Gn is the moment generating function of the standard score Zn.
8. Show that \(g(0) = 1\), \(g'(0) = 0\), and \(g''(0) = 1\).
9. Show that \(Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{d}\).
10. Use properties of moment generating functions to show that \(G_n(t) = \left[g\!\left(t / \sqrt{n}\right)\right]^n\).
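In outline: by Exercise 9, \(Z_n\) is a normalized sum of independent, identically distributed terms, so the expectation of the exponential factors:
\[ G_n(t) = E\!\left[\exp\!\left(\frac{t}{\sqrt{n}} \sum_{i=1}^{n} \frac{X_i - \mu}{d}\right)\right] = \prod_{i=1}^{n} E\!\left[\exp\!\left(\frac{t}{\sqrt{n}} \cdot \frac{X_i - \mu}{d}\right)\right] = \left[g\!\left(\frac{t}{\sqrt{n}}\right)\right]^n. \]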
11. Use Taylor's theorem with remainder to show that \(g\!\left(t / \sqrt{n}\right) = 1 + g''(s_n)\, \frac{t^2}{2n}\) where \(|s_n| \le |t| / \sqrt{n}\).
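The zeroth- and first-order terms are pinned down by Exercise 8: expanding g about 0 with the Lagrange form of the remainder gives, for some \(s_n\) between 0 and \(t / \sqrt{n}\),
\[ g\!\left(\frac{t}{\sqrt{n}}\right) = g(0) + g'(0)\,\frac{t}{\sqrt{n}} + \frac{g''(s_n)}{2}\left(\frac{t}{\sqrt{n}}\right)^2 = 1 + g''(s_n)\,\frac{t^2}{2n}. \]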
12. In the context of the previous exercise, show that \(s_n \to 0\) and hence \(g''(s_n) \to 1\) as \(n \to \infty\).
13. Finally, show that \(G_n(t) = \left[1 + g''(s_n)\, \frac{t^2}{2n}\right]^n \to \exp(t^2 / 2)\) as \(n \to \infty\).
The central limit theorem implies that if the sample size n is "large," then the distribution of the partial sum Yn (or equivalently the sample mean Mn) is approximately normal. This fact is of fundamental importance, because it means that we can approximate the distribution of certain statistics, even if we know very little about the underlying sampling distribution.
Of course, the term "large" is relative. Roughly, the more "abnormal" the basic distribution, the larger n must be for normal approximations to work well. The rule of thumb is that a sample size n of at least 30 will suffice, although for many distributions a smaller n will do.
14. Suppose that X1, X2, ..., X30 is a random sample of size 30 from the uniform distribution on (0, 1). Let Y = X1 + X2 + ··· + X30. Find normal approximations to:
15. Let M denote the sample mean from a random sample of size 50 from the distribution with density function \(f(x) = 3 x^{-4}\), \(x > 1\). Find normal approximations to:
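Since the specific events for Exercises 14 and 15 are not reproduced above, the following sketch simply builds the two approximating normal distributions and evaluates one placeholder event for each. For Exercise 14, a uniform(0, 1) variable has mean 1/2 and variance 1/12, so Y ≈ N(15, 2.5); for Exercise 15, the density \(3 x^{-4}\) on \((1, \infty)\) has mean 3/2 and variance 3/4, so M ≈ N(1.5, 0.015).

```python
import math
from scipy.stats import norm

# Exercise 14: Y = sum of 30 uniform(0, 1) variables
Y = norm(loc=30 * 0.5, scale=math.sqrt(30 / 12))
print(Y.cdf(16) - Y.cdf(14))   # e.g. P(14 < Y < 16), a placeholder event

# Exercise 15: M = sample mean of 50 draws, mean 3/2, variance 3/4
M = norm(loc=1.5, scale=math.sqrt(0.75 / 50))
print(M.sf(1.6))               # e.g. P(M > 1.6), a placeholder event
```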
A slight technical problem arises when the sampling distribution is discrete. In this case, the partial sum also has a discrete distribution, and hence we are approximating a discrete distribution with a continuous one.
16. Suppose that X takes integer values, and hence so does the partial sum \(Y_n\). Show that for any \(h \in (0, 1]\), the event \(\{k - h < Y_n < k + h\}\) is equivalent to the event \(\{Y_n = k\}\).
In the context of the previous exercise, different values of h lead to different normal approximations, even though the events are equivalent. The smallest approximation would be 0 when h = 0, and the approximations increase as h increases. It is customary to split the difference by using h = 0.5 for the normal approximation. This is sometimes called the continuity correction. The continuity correction is extended to other events in the natural way, using the additivity of probability.
17. Let Y denote the sum of the scores of 20 fair dice. Compute the normal approximation to \(P(60 \le Y \le 75)\).
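A sketch of this computation: a single fair die has mean 7/2 and variance 35/12, so Y is approximately normal with mean 70 and variance 700/12, and the continuity correction replaces the event \(\{60 \le Y \le 75\}\) with \(\{59.5 < Y < 75.5\}\).

```python
import math
from scipy.stats import norm

N = norm(loc=20 * 3.5, scale=math.sqrt(20 * 35 / 12))

# continuity correction: P(60 <= Y <= 75) ~ P(59.5 < N < 75.5)
print(N.cdf(75.5) - N.cdf(59.5))   # approximately 0.68
```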
18. In the dice experiment, set the die distribution to fair, select the sum random variable Y, and set n = 20. Run the simulation 1000 times, updating every 10 runs. Compute the following and compare with the result in the previous exercise:
If Y has the gamma distribution with shape parameter k and scale parameter b, and if k is a positive integer, then \(Y = \sum_{i=1}^{k} X_i\) where X1, X2, ..., Xk are independent and each has the exponential distribution with scale parameter b. It follows that if k is large (and not necessarily an integer), the gamma distribution can be approximated by the normal distribution with mean \(k b\) and variance \(k b^2\).
19. In the gamma experiment, vary k and b and note the shape of the density function. With k = 10 and b = 2, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the empirical density function to the true density function.
20. Suppose that Y has the gamma distribution with shape parameter k = 10 and scale parameter b = 2. Find normal approximations to:
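As an illustration (the exercise's specific events are not shown above, so the event below is a placeholder), Y is approximately normal with mean kb = 20 and variance kb² = 40, and the approximation can be checked against the exact gamma probability:

```python
import math
from scipy.stats import norm, gamma

k, b = 10, 2
N = norm(loc=k * b, scale=math.sqrt(k * b**2))   # normal approximation
G = gamma(a=k, scale=b)                          # exact gamma distribution

print(N.cdf(25), G.cdf(25))   # e.g. P(Y <= 25), a placeholder event
```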
The chi-square distribution with n degrees of freedom is the gamma distribution with shape parameter k = n / 2 and scale parameter b = 2. From the central limit theorem, if n is large the chi-square distribution can be approximated by the normal distribution with mean n and variance 2n.
21. In the chi-square experiment, vary n and note the shape of the density function. With n = 20, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the empirical density function to the true density function.
22. Suppose that Y has the chi-square distribution with n = 20 degrees of freedom. Find normal approximations to:
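As in the gamma case (again with a placeholder event, since the exercise's events are not shown), the approximating normal has mean 20 and variance 40, and the exact chi-square probability provides a check:

```python
import math
from scipy.stats import norm, chi2

n = 20
N = norm(loc=n, scale=math.sqrt(2 * n))   # normal approximation
C = chi2(df=n)                            # exact chi-square distribution

print(N.sf(25), C.sf(25))   # e.g. P(Y > 25), a placeholder event
```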
If X has the binomial distribution with parameters n and p, then \(X = \sum_{i=1}^{n} I_i\) where I1, I2, ..., In are independent indicator variables with \(P(I_j = 1) = p\) for each j. It follows that if n is large, the binomial distribution with parameters n and p can be approximated by the normal distribution with mean np and variance np(1 − p). The rule of thumb is that n should be large enough for \(np \ge 5\) and \(n(1 - p) \ge 5\).
23. In the binomial timeline experiment, vary n and p and note the shape of the density function. With n = 50 and p = 0.3, run the simulation 1000 times, updating every 10 runs. Compute the following:
24. Suppose that X has the binomial distribution with parameters n = 50 and p = 0.3. Compute the normal approximation to \(P(12 \le X \le 16)\) and compare with the results of the previous exercise.
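A sketch of this computation: X is approximately normal with mean np = 15 and variance np(1 − p) = 10.5, and the continuity correction replaces \(\{12 \le X \le 16\}\) with \(\{11.5 < X < 16.5\}\); the exact binomial value is shown for comparison.

```python
import math
from scipy.stats import norm, binom

n, p = 50, 0.3
N = norm(loc=n * p, scale=math.sqrt(n * p * (1 - p)))

# continuity correction: P(12 <= X <= 16) ~ P(11.5 < N < 16.5)
approx = N.cdf(16.5) - N.cdf(11.5)
exact = binom.cdf(16, n, p) - binom.cdf(11, n, p)
print(approx, exact)   # both approximately 0.54
```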
If Y has the Poisson distribution with mean n, where n is a positive integer, then \(Y = \sum_{i=1}^{n} X_i\) where X1, X2, ..., Xn are independent and each has the Poisson distribution with mean 1. It follows from the central limit theorem that if µ is large (and not necessarily an integer), the Poisson distribution with mean µ can be approximated by the normal distribution with mean µ and variance µ.
25. Suppose that Y has the Poisson distribution with mean 20. Find the normal approximation to \(P(13 \le Y \le 16)\).
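A sketch under the reading above (the bounds in the source are garbled, so the event is an assumption): Y is approximately normal with mean 20 and variance 20, and the continuity correction gives \(P(13 \le Y \le 16) \approx P(12.5 < N < 16.5)\); the exact Poisson value is shown for comparison.

```python
import math
from scipy.stats import norm, poisson

mu = 20
N = norm(loc=mu, scale=math.sqrt(mu))   # normal approximation to Poisson(20)

# continuity correction: P(13 <= Y <= 16) ~ P(12.5 < N < 16.5)
approx = N.cdf(16.5) - N.cdf(12.5)
exact = poisson.cdf(16, mu) - poisson.cdf(12, mu)
print(approx, exact)
```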