As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X is a real-valued random variable. We will denote the mean and standard deviation of X by µ and σ, respectively.
Now suppose we perform independent replications of the basic experiment. This defines a new, compound experiment with a sequence of independent random variables, each with the same distribution as X:
X1, X2, ...
Recall that in statistical terms, (X1, X2, ..., Xn) is a random sample of size n from the distribution of X for each n. The sample mean is simply the average of the variables in the sample:
Mn = (X1 + X2 + ··· + Xn) / n.

The sample mean is a real-valued function of the random sample and thus is a statistic. Like any statistic, the sample mean is itself a random variable with a distribution, mean, and variance of its own. Often the distribution mean is unknown, and the sample mean is used as an estimator of it.
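As a concrete illustration (a minimal sketch, not part of the original applets; the fair die is an assumption chosen purely for concreteness), the sample mean is computed directly from the sample values:

```python
import random

def sample_mean(xs):
    """The average of the values in the sample."""
    return sum(xs) / len(xs)

# Hypothetical example: a random sample of size n = 10 from a fair die.
sample = [random.randint(1, 6) for _ in range(10)]
print(sample_mean(sample))  # itself a random variable: it varies from run to run
```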
1. In the dice experiment, select the average random variable. For each die distribution, start with n = 1 die and increase the number of dice by one until you get to n = 20 dice. Note the shape and location of the density function at each stage. With 20 dice, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the empirical density function to the true density function.
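The applet itself is not reproduced here, but a rough text-mode sketch of the same experiment, assuming fair dice and a crude binning of the results, might look like this:

```python
import random
from collections import Counter

def average_of_dice(n):
    """One run of the experiment: the sample mean of n fair dice."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# 1000 runs with n = 20 dice, binned to one decimal place.
runs = [round(average_of_dice(20), 1) for _ in range(1000)]
for value, count in sorted(Counter(runs).items()):
    print(f"{value:4.1f} {'*' * count}")  # crude empirical density, centered near 3.5
```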
2. Show that E(Mn) = µ.
Exercise 2 shows that Mn is an unbiased estimator of µ. Therefore, the variance of the sample mean is the mean square error when the sample mean is used as an estimator of the distribution mean.
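For reference, the computation behind Exercise 2 is simply linearity of expected value:

```latex
E(M_n) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\, n\mu = \mu .
```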
3. Show that var(Mn) = σ² / n.
From Exercise 3, the variance of the sample mean is an increasing function of the distribution variance and a decreasing function of the sample size. Both of these make intuitive sense if we think of the sample mean as an estimator of the distribution mean.
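Similarly, the computation behind Exercise 3 uses the independence of the sample variables:

```latex
\operatorname{var}(M_n)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}(X_i)
  = \frac{1}{n^2}\, n\sigma^2
  = \frac{\sigma^2}{n} .
```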
4. In the dice experiment, select the average random variable. For each die distribution, start with n = 1 die and increase the number of dice by one until you get to n = 20 dice. Note that the mean of the sample mean stays the same, but the standard deviation of the sample mean decreases (as we now know, in inverse proportion to the square root of the sample size). Run the simulation 1000 times, updating every 10 runs, and note the apparent convergence of the empirical moments of the sample mean to the true moments.
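A quick numerical check of this square-root scaling, again assuming fair dice, is sketched below:

```python
import random
from statistics import pstdev

def average_of_dice(n):
    """The sample mean of n fair dice."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# For one fair die, sd = sqrt(35/12) ≈ 1.708, so sd(Mn) should be near 1.708 / sqrt(n).
for n in (1, 4, 16):
    empirical = pstdev([average_of_dice(n) for _ in range(5000)])
    print(n, round(empirical, 3), round((35 / 12) ** 0.5 / n ** 0.5, 3))
```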
5. Compute the sample mean of the petal width variable for the following cases in Fisher's iris data. Compare the results.
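The specific cases are not listed here; assuming they refer to the three species, and assuming the copy of the data set bundled with scikit-learn, the computation might look like this:

```python
from sklearn.datasets import load_iris

iris = load_iris()
petal_width = iris.data[:, 3]  # the fourth measurement is petal width, in cm

# Sample mean of petal width within each species (the assumed "cases").
for code, name in enumerate(iris.target_names):
    values = petal_width[iris.target == code]
    print(name, round(values.mean(), 3))
```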
By Exercise 3, note that var(Mn) → 0 as n → ∞. This means that Mn → µ as n → ∞ in mean square.
6. Use Chebyshev's inequality to show that P[|Mn − µ| > r] → 0 as n → ∞ for any r > 0.
This result is known as the weak law of large numbers, and states that the sample mean converges to the mean of the distribution in probability. Recall that in general, convergence in mean square implies convergence in probability.
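Explicitly, Chebyshev's inequality applied to Mn, together with the variance found in Exercise 3, gives

```latex
P\bigl(|M_n - \mu| > r\bigr)
  \le \frac{\operatorname{var}(M_n)}{r^2}
  = \frac{\sigma^2}{n r^2}
  \to 0 \quad \text{as } n \to \infty .
```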
The strong law of large numbers states that the sample mean Mn converges to the distribution mean µ with probability 1:

P(Mn → µ as n → ∞) = 1.
As the name suggests, this is a much stronger result than the weak law. We will construct a fairly simple proof under the assumption that the fourth central moment is finite:

b4 = E[(X − µ)⁴] < ∞.
However, there are better proofs that do not need this assumption; see, for example, the book Probability and Measure by Patrick Billingsley.
7. Let Yi = Xi − µ and let Wn = Y1 + Y2 + ··· + Yn. Show that Mn − µ = Wn / n.
By Exercise 7, we want to show that with probability 1, Wn / n → 0 as n → ∞.
8. Show that Wn / n does not converge to 0 if and only if there exists a rational number r > 0 such that |Wn / n| > r for infinitely many n.
Thus, we need to show that the event described in Exercise 8 has probability 0.
9. Show that Wn⁴ is the sum of YiYjYkYl over all i, j, k, l in {1, 2, ..., n}.
10. Show that E(YiYjYkYl) = 0 whenever one of the indices occurs only once among i, j, k, l, while E(Yi⁴) = b4 and E(Yi²Yj²) = σ⁴ for i ≠ j.
11. Use the results in Exercise 10 to show that E(Wn⁴) ≤ Cn² for some constant C (independent of n).
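Putting Exercises 9 and 10 together: the only terms in the expansion of Wn⁴ with nonzero expected value are the n terms of the form Yi⁴ and the 3n(n − 1) terms of the form Yi²Yj² with i ≠ j, so

```latex
E(W_n^4) = n\,b_4 + 3n(n-1)\,\sigma^4 \le (b_4 + 3\sigma^4)\, n^2 .
```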
12. Use Markov's inequality and the result of Exercise 11 to show that for r > 0,

P(|Wn / n| > r) = P(Wn⁴ > r⁴n⁴) ≤ C / (r⁴n²).
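Written out, Markov's inequality applied to the nonnegative variable Wn⁴ gives

```latex
P\bigl(|W_n / n| > r\bigr)
  = P\bigl(W_n^4 > r^4 n^4\bigr)
  \le \frac{E(W_n^4)}{r^4 n^4}
  \le \frac{C n^2}{r^4 n^4}
  = \frac{C}{r^4 n^2} .
```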
13. Use the first Borel-Cantelli lemma to show that

P(|Wn / n| > r for infinitely many n) = 0.
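The key point is that the bounds from Exercise 12 are summable in n:

```latex
\sum_{n=1}^{\infty} P\bigl(|W_n / n| > r\bigr)
  \le \frac{C}{r^4} \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty ,
```

so the first Borel-Cantelli lemma applies.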
14. Finally, show that

P(there exists a rational r > 0 such that |Wn / n| > r for infinitely many n) = 0.
15. In the dice experiment, select the average random variable. For each die distribution, start with n = 1 die and increase the number of dice by one until you get to n = 20 dice. Note how the distribution of the sample mean begins to resemble a point mass distribution. Run the simulation 1000 times, updating every 10 runs, and note the apparent convergence of the empirical density of the sample mean to the true density.
Many of the applets in this project are simulations of experiments with a basic random variable of interest. When you run the simulation, you are performing independent replications of the experiment. In most cases, the applet displays the mean of the distribution numerically in a table and graphically as the center of the blue horizontal bar in the graph box. When you run the simulation, the sample mean is also displayed numerically in the table and graphically as the center of the red horizontal bar in the graph box.
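A rough sketch of this display in code, assuming a fair die as the basic variable and the same update frequency of 10:

```python
import random

DIST_MEAN = 3.5        # distribution mean of a fair die (the blue bar)
total = 0.0

for i in range(1, 1001):
    total += random.randint(1, 6)      # one replication of the basic experiment
    if i % 10 == 0:                    # update frequency of 10
        print(i, round(total / i, 3), DIST_MEAN)  # sample mean vs. distribution mean
```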
16. In the simulation of the binomial coin experiment, the random variable is the number of heads. Run the simulation 1000 times, updating every 10 runs, and note the apparent convergence of the sample mean to the distribution mean.
17. In the simulation of the matching experiment, the random variable is the number of matches. Run the simulation 1000 times, updating every 10 runs, and note the apparent convergence of the sample mean to the distribution mean.
18. Run the simulation of the exponential experiment 1000 times with an update frequency of 10. Note the apparent convergence of the sample mean to the distribution mean.
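A sketch of Exercise 18 in code, assuming rate parameter 1 so that the distribution mean is 1:

```python
import random
from statistics import mean

# 1000 replications of the exponential experiment with rate 1 (distribution mean 1).
sample = [random.expovariate(1.0) for _ in range(1000)]
print(round(mean(sample), 3))  # the sample mean should be close to 1
```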