
2. Variance and Higher Moments


Definition

As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X is a random variable for the experiment, taking values in a subset S of R. Recall that the expected value or mean of X gives the center of the distribution of X. The variance of X is a measure of the spread of the distribution about the mean and is defined by

var(X) = E{[X - E(X)]^2}.

Thus, the variance is the second central moment of X.

Mathematical Exercise 1. Suppose that X has a discrete distribution with density function f. Use the change of variables theorem to show that

var(X) = sum_{x in S} [x - E(X)]^2 f(x).

Mathematical Exercise 2. Suppose that X has a continuous distribution with density function f. Use the change of variables theorem to show that

var(X) = integral_S [x - E(X)]^2 f(x) dx.

The standard deviation of X is the square root of the variance:

sd(X) = [var(X)]^{1/2}.

It also measures dispersion about the mean but has the same physical units as the variable X.
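
As a concrete illustration of the definitions, the following sketch (in Python; the density f is a hypothetical example, not from the text) computes the mean, variance, and standard deviation of a discrete distribution directly from the formula in Exercise 1.

    # Mean, variance, and standard deviation of a discrete distribution,
    # computed from the definitions above. The density here is hypothetical.
    f = {1: 0.2, 2: 0.5, 3: 0.3}  # density: value -> probability

    mean = sum(x * p for x, p in f.items())               # E(X)
    var = sum((x - mean) ** 2 * p for x, p in f.items())  # second central moment
    sd = var ** 0.5                                       # sd(X)
    print(mean, var, sd)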

Properties

The following exercises give some basic properties of variance, which in turn rely on basic properties of expected value:

Mathematical Exercise 3. Show that var(X) = E(X^2) - [E(X)]^2.

Mathematical Exercise 4. Show that var(X) >= 0.

Mathematical Exercise 5. Show that var(X) = 0 if and only if P(X = c) = 1 for some constant c.

Mathematical Exercise 6. Show that if a and b are constants then var(aX + b) = a^2 var(X).

Mathematical Exercise 7. Let Z = [X - E(X)] / sd(X). Show that Z has mean 0 and variance 1.

The random variable Z in Exercise 7 is sometimes called the standard score associated with X. Since X and its mean and standard deviation all have the same physical units, the standard score Z is dimensionless. It measures the directed distance from E(X) to X in terms of standard deviations.
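
A simple numeric check of Exercise 7, as a Python sketch (assuming numpy; the exponential sample is an arbitrary illustrative choice): standardizing any sample should give an empirical mean near 0 and variance near 1.

    # Empirical standard scores: mean near 0, variance near 1.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.exponential(scale=2.0, size=100_000)  # any distribution works here
    z = (x - x.mean()) / x.std()                  # empirical standard score
    print(z.mean(), z.var())                      # approximately 0 and 1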

On the other hand, when E(X) is not zero, the ratio of standard deviation to mean is called the coefficient of variation:

sd(X) / E(X)

Note that this quantity also is dimensionless, and is sometimes used to compare variability for random variables with different means.

Examples and Special Cases

Mathematical Exercise 8. Suppose that I is an indicator variable with P(I = 1) = p.

  1. Show that var(I) = p(1 - p).
  2. Sketch the graph of var(I) as a function of p.
  3. Find the value of p that maximizes var(I).

Mathematical Exercise 9. The score on a fair die is uniformly distributed on {1, 2, 3, 4, 5, 6}. Find the mean, variance, and standard deviation.

Simulation Exercise 10. In the dice experiment, select one fair die. Run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.

Mathematical Exercise 11. For an ace-six flat die, faces 1 and 6 have probability 1/4 each, and faces 2, 3, 4, 5 have probability 1/8 each. Find the mean, variance and standard deviation.

Simulation Exercise 12. In the dice experiment, select one ace-six flat die. Run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.

Mathematical Exercise 13. Suppose that X is uniformly distributed on {1, 2, ..., n}. Show that

var(X) = (n^2 - 1) / 12.
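
A numeric cross-check of this formula against the direct computation, written as a Python sketch, with n = 6 matching the fair die of Exercise 9:

    # Direct computation of var(X) for the uniform distribution on
    # {1, ..., n}, compared with the closed form (n^2 - 1) / 12.
    n = 6  # fair die, as in Exercise 9
    mean = sum(range(1, n + 1)) / n
    var_direct = sum((x - mean) ** 2 for x in range(1, n + 1)) / n
    print(var_direct, (n ** 2 - 1) / 12)  # both 35/12 = 2.9166...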

Mathematical Exercise 14. Suppose that Y has density function f(n) = p(1 - p)^{n-1} for n = 1, 2, ..., where 0 < p < 1 is a parameter. This defines the geometric distribution with parameter p. Show that

var(Y) = (1 - p) / p^2.

Mathematical Exercise 15. Suppose that N has density function f(n) = exp(-t) t^n / n! for n = 0, 1, ..., where t > 0 is a parameter. This defines the Poisson distribution with parameter t. Show that

var(N) = t.

Mathematical Exercise 16. Suppose that X is uniformly distributed on the interval (a, b) where a < b. Show that

var(X) = (b - a)^2 / 12.

Note in particular that the variance depends only on the length of the interval, which is intuitively reasonable.

Mathematical Exercise 17. Suppose that X has density function f(x) = r exp(-rx) for x > 0. This defines the exponential distribution with rate parameter r > 0. Show that

sd(X) = 1 / r.

Simulation Exercise 18. In the gamma experiment, set k = 1 to get the exponential distribution. Vary r with the scroll bar and note the size and location of the mean-standard deviation bar. Now with r = 2, run the experiment 1000 times updating every 10 runs. Note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
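
For readers working outside the applet, the following Python sketch imitates Exercise 18 (assuming numpy, whose exponential generator is parameterized by the scale 1/r rather than the rate):

    # Simulate 1000 exponential variables with rate r = 2 and compare the
    # empirical mean and standard deviation with the true values, both 1/r.
    import numpy as np

    r = 2.0
    rng = np.random.default_rng(0)
    sample = rng.exponential(scale=1 / r, size=1000)
    print(sample.mean(), sample.std(), 1 / r)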

Mathematical Exercise 19. Suppose that X has density f(x) = a / x^{a+1} for x > 1, where a > 0 is a parameter. This defines the Pareto distribution with shape parameter a. Show that

  1. var(X) = infinity if 1 < a <= 2.
  2. var(X) = a / [(a - 1)^2 (a - 2)] if a > 2.

Mathematical Exercise 20. Suppose that Z has density f(z) = exp(-z^2 / 2) / (2 pi)^{1/2} for z in R. This defines the standard normal distribution. Show that

var(Z) = 1.

Hint: In the integral for E(Z^2), integrate by parts.

Simulation Exercise 21. In the random variable experiment, select the normal distribution (the default parameter values give the standard normal distribution). Run the experiment 1000 times updating every 10 runs and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.

Mathematical Exercise 22. Suppose that X is a random variable with E(X) = 5, var(X) = 4. Find

  1. var(3X - 2)
  2. E(X^2)

Mathematical Exercise 23. Suppose that X_1 and X_2 are independent random variables with E(X_i) = µ_i, var(X_i) = d_i^2 for i = 1, 2. Show that

var(X_1 X_2) = (d_1^2 + µ_1^2)(d_2^2 + µ_2^2) - µ_1^2 µ_2^2.
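
A Monte Carlo sanity check of this identity, as a Python sketch; the normal distributions and parameter values below are arbitrary illustrative choices, since the result depends only on the means and variances.

    # Compare the sample variance of X1 * X2 with the formula of Exercise 23.
    import numpy as np

    rng = np.random.default_rng(0)
    mu1, d1, mu2, d2 = 2.0, 0.5, -1.0, 1.5  # hypothetical means and sds
    x1 = rng.normal(mu1, d1, size=1_000_000)
    x2 = rng.normal(mu2, d2, size=1_000_000)
    formula = (d1**2 + mu1**2) * (d2**2 + mu2**2) - mu1**2 * mu2**2
    print(np.var(x1 * x2), formula)  # should agree closely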

Mathematical Exercise 24. Marilyn Vos Savant has an IQ of 228. Assuming that the distribution of IQ scores has mean 100 and standard deviation 15, find Marilyn's standard score.

Chebyshev's Inequality

Chebyshev's inequality (named after Pafnuty Chebyshev) gives an upper bound on the probability that a random variable will be more than a specified distance from its mean.

Mathematical Exercise 25. Use Markov's inequality to prove Chebyshev's inequality: for t > 0,

P[|X - E(X)| >= t] <= var(X) / t^2.

Mathematical Exercise 26. Establish the following equivalent version of Chebyshev's inequality: for k > 0,

P[|X - E(X)| >= k sd(X)] <= 1 / k^2.

Mathematical Exercise 27. Suppose that Y has the geometric distribution with parameter p = 3/4. Compute the true value and the Chebyshev bound for the probability that Y is at least 2 standard deviations away from the mean.

Mathematical Exercise 28. Suppose that X has the exponential distribution with rate parameter r > 0. Compute the true value and the Chebyshev bound for the probability that X is at least k standard deviations away from the mean.
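
As a companion to Exercises 27 and 28, the Python sketch below estimates the exponential tail probability by simulation and compares it with the Chebyshev bound (the rate r = 1 is arbitrary; the comparison does not depend on r).

    # Empirical P(|X - mean| >= k sd) for the exponential distribution,
    # versus the Chebyshev bound 1 / k^2. The bound is typically very crude.
    import numpy as np

    rng = np.random.default_rng(0)
    r = 1.0
    x = rng.exponential(scale=1 / r, size=1_000_000)
    mean = sd = 1 / r  # exponential mean and sd are both 1/r
    for k in (1.5, 2.0, 3.0):
        emp = np.mean(np.abs(x - mean) >= k * sd)
        print(k, emp, 1 / k ** 2)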

Skewness and Kurtosis

Recall again that the variance of X is the second moment of X about the mean, and measures the spread of the distribution of X about the mean. The third and fourth moments of X about the mean also measure interesting features of the distribution. The third moment measures skewness, the lack of symmetry, while the fourth moment measures kurtosis, the degree to which the distribution is peaked. The actual numerical measures of these characteristics are standardized to eliminate the physical units, by dividing by an appropriate power of the standard deviation.

Thus, let µ = E(X) and d = sd(X). The skewness of X is defined to be

skew(X) = E[(X - µ)^3] / d^3.

The kurtosis of X is defined to be

kurt(X) = E[(X - µ)^4] / d^4.

Mathematical Exercise 29. Suppose that X has density f, which is symmetric with respect to µ. Show that skew(X) = 0.

Mathematical Exercise 30. Show that

skew(X) = [E(X^3) - 3µ E(X^2) + 2µ^3] / d^3.

Mathematical Exercise 31. Show that

kurt(X) = [E(X^4) - 4µ E(X^3) + 6µ^2 E(X^2) - 3µ^4] / d^4.

Mathematical Exercise 32. Graph the following density functions and compute the skewness and kurtosis of each. (These distributions are all members of the beta family).

  1. f(x) = 6x(1 - x), 0 < x < 1.
  2. f(x) = 12x^2 (1 - x), 0 < x < 1.
  3. f(x) = 12x(1 - x)^2, 0 < x < 1.
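
A numeric cross-check for Exercise 32, as a Python sketch: the moments are approximated with a midpoint-rule quadrature on (0, 1), so the printed values are approximations of the exact skewness and kurtosis.

    # Approximate skew(X) and kurt(X) by numerical integration of each density.
    import numpy as np

    def skew_kurt(f, lo=0.0, hi=1.0, n=100_000):
        w = (hi - lo) / n
        x = lo + w * (np.arange(n) + 0.5)              # midpoint rule
        mu = np.sum(x * f(x)) * w                      # mean
        d = np.sqrt(np.sum((x - mu) ** 2 * f(x)) * w)  # standard deviation
        skew = np.sum((x - mu) ** 3 * f(x)) * w / d ** 3
        kurt = np.sum((x - mu) ** 4 * f(x)) * w / d ** 4
        return skew, kurt

    print(skew_kurt(lambda x: 6 * x * (1 - x)))        # symmetric: skewness 0
    print(skew_kurt(lambda x: 12 * x ** 2 * (1 - x)))  # negative skew
    print(skew_kurt(lambda x: 12 * x * (1 - x) ** 2))  # positive skew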

Norm

The variance and higher moments are related to the concept of norm and distance in the theory of vector spaces. This connection can help unify and illuminate some of the ideas. Thus, let X be a real-valued random variable. For k >= 1, we define the k-norm by

||X||_k = [E(|X|^k)]^{1/k}.

Thus, ||X||_k is a measure of the size of X in a certain sense. For a given probability space (that is, a given random experiment), the set of random variables with finite k'th moment forms a vector space (if we identify two random variables that agree with probability 1). The following exercises show that the k-norm really is a norm on this vector space.
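
A Monte Carlo sketch of the k-norm in Python (the uniform sample is a hypothetical example, matching Exercise 38 below); note that the printed values increase with k, anticipating Lyapunov's inequality in Exercise 37.

    # Estimate ||X||_k = [E(|X|^k)]^(1/k) from a sample.
    import numpy as np

    def k_norm(sample, k):
        return np.mean(np.abs(sample) ** k) ** (1 / k)

    rng = np.random.default_rng(0)
    x = rng.uniform(size=100_000)  # X uniform on (0, 1)
    print([round(k_norm(x, k), 4) for k in (1, 2, 4, 8)])  # increasing in k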

Mathematical Exercise 33. Show that ||X||_k >= 0 for any X.

Mathematical Exercise 34. Show that ||X||_k = 0 if and only if P(X = 0) = 1.

Mathematical Exercise 35. Show that ||cX||_k = |c| ||X||_k for any constant c.

The next exercise gives Minkowski's inequality, named for Hermann Minkowski. It is also known as the triangle inequality.

Mathematical Exercise 36. Show that ||X + Y||_k <= ||X||_k + ||Y||_k for any X and Y.

  1. Show that g(x, y) = (x^{1/k} + y^{1/k})^k is concave on {(x, y) in R^2: x >= 0, y >= 0}.
  2. Use (a) and Jensen's inequality to conclude that if U and V are nonnegative random variables then E[(U^{1/k} + V^{1/k})^k] <= {[E(U)]^{1/k} + [E(V)]^{1/k}}^k.
  3. In (b), let U = |X|^k and V = |Y|^k and then do some algebra.

Our next exercise gives Lyapunov's inequality, named for Aleksandr Lyapunov. This inequality shows that the k-norm of a random variable is increasing in k.

Mathematical Exercise 37. Show that if j <= k then ||X||_j <= ||X||_k.

  1. Show that g(x) = x^{k/j} is convex on {x: x >= 0}.
  2. Use part (a) and Jensen's inequality to conclude that if U is a nonnegative random variable then [E(U)]^{k/j} <= E(U^{k/j}).
  3. In (b), let U = |X|^j and do some algebra.

Lyapunov's inequality shows that if X has a finite k'th moment, and j < k, then X has a finite j'th moment as well.

Mathematical Exercise 38. Suppose that X is uniformly distributed on the interval (0, 1).

  1. Find ||X||_k.
  2. Graph ||X||_k as a function of k.
  3. Find the limit of ||X||_k as k goes to infinity.

Mathematical Exercise 39. Suppose that X has density f(x) = a / x^{a+1} for x > 1, where a > 0 is a parameter. This defines the Pareto distribution with shape parameter a.

  1. Find ||X||_k.
  2. Graph ||X||_k as a function of k < a.
  3. Find the limit of ||X||_k as k goes up to a.

Mathematical Exercise 40. Suppose that (X, Y) has density f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Verify Minkowski's inequality.

Distance

The k-norm, like any norm, can be used to measure distance; we simply compute the norm of the difference between the objects. Thus, we define the k-distance (or k-metric) between real-valued random variables X and Y to be

d_k(X, Y) = ||Y - X||_k = [E(|Y - X|^k)]^{1/k}.

The properties in the following exercises are analogues of the properties in Exercises 33-36 (and thus very little additional work should be required). These properties show that the k-distance really is a distance.

Mathematical Exercise 41. Show that d_k(X, Y) >= 0 for any X, Y.

Mathematical Exercise 42. Show that d_k(X, Y) = 0 if and only if P(Y = X) = 1.

Mathematical Exercise 43. Show that d_k(X, Y) <= d_k(X, Z) + d_k(Z, Y) for any X, Y, Z (this is known as the triangle inequality).

Thus, the standard deviation is simply the 2-distance from X to its mean:

sd(X) = d_2[X, E(X)] = {E[(X - E(X))^2]}^{1/2},

and the variance is the square of this. More generally, the k'th moment of X about a is simply the k'th power of the k-distance from X to a. The 2-distance is especially important for reasons that will become clear below and in the next section. This distance is also called the root mean square distance.

Center and Spread Revisited

Measures of center and measures of spread are best thought of together, in the context of a measure of distance. For a random variable X, we first try to find the constants t that are closest to X, as measured by the given distance; any such t is a measure of center relative to the distance. The minimum distance itself is the corresponding measure of spread.

Let us apply this procedure to the 2-distance. Thus, we define the root mean square error function by

d_2(X, t) = ||X - t||_2 = {E[(X - t)^2]}^{1/2}.

Mathematical Exercise 44. Show that d_2(X, t) is minimized when t = E(X) and that the minimum value is sd(X). Hint: The minimum value occurs at the same points as the minimum value of E[(X - t)^2]. Expand this and take expected values term by term. The resulting expression is a quadratic function of t.
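
The following Python sketch illustrates Exercise 44 empirically, using a hypothetical exponential sample: the root mean square error, scanned over a grid of values of t, is minimized near the sample mean, and the minimum value is near the sample standard deviation.

    # Scan d_2(X, t) over t and locate the minimum.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(size=10_000)  # sample mean and sd are both near 1
    ts = np.linspace(0.0, 3.0, 301)
    rmse = np.array([np.sqrt(np.mean((x - t) ** 2)) for t in ts])
    i = int(np.argmin(rmse))
    print(ts[i], x.mean())   # minimizer is near the sample mean
    print(rmse[i], x.std())  # minimum value is near the standard deviation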

Simulation Exercise 45. In the histogram applet, construct a discrete distribution of each of the types indicated below. Note the position and size of the mean ± standard deviation bar and the shape of the mean square error graph.

  1. A uniform distribution.
  2. A symmetric, unimodal distribution.
  3. A unimodal distribution that is skewed right.
  4. A unimodal distribution that is skewed left.
  5. A symmetric bimodal distribution.
  6. A u-shaped distribution.

Next, let us apply our procedure to the 1-distance. Thus, we define the mean absolute error function by

d_1(X, t) = ||X - t||_1 = E[|X - t|].

Mathematical Exercise 46. Show that d_1(X, t) is minimized when t is any median of X.

The last exercise shows that mean absolute error has a basic deficiency as a measure of error: in general, there does not exist a unique minimizing value of t. Indeed, for many discrete distributions, there is an entire median interval. Thus, in terms of mean absolute error, there is no compelling reason to choose one value in this interval, as the measure of center, over any other value in the interval.
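
A companion Python sketch for Exercise 46, using the same kind of hypothetical sample as above: the mean absolute error, scanned over a grid of values of t, is minimized near the sample median.

    # Scan d_1(X, t) over t and locate the minimum.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(size=10_000)
    ts = np.linspace(0.0, 3.0, 301)
    mae = np.array([np.mean(np.abs(x - t)) for t in ts])
    print(ts[int(np.argmin(mae))], np.median(x))  # minimizer near the median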

Simulation Exercise 47. Construct a distribution of each of the types indicated below. In each case, note the position and size of the boxplot and the shape of the mean absolute error graph.

  1. A uniform distribution.
  2. A symmetric, unimodal distribution.
  3. A unimodal distribution that is skewed right.
  4. A unimodal distribution that is skewed left.
  5. A symmetric bimodal distribution.
  6. A u-shaped distribution.

Mathematical Exercise 48. Let I be an indicator variable with P(I = 1) = p. Graph E[|I - t|] as a function of t in each of the cases below. In each case, find the minimum value of the mean absolute error function and the values of t where the minimum occurs.

  1. p < 1/2
  2. p = 1/2
  3. p > 1/2

Convergence

Whenever we have a measure of distance, we automatically have a criterion for convergence. Let X_n, n = 1, 2, ..., and X be real-valued random variables. We say that X_n converges to X as n converges to infinity in k'th mean if

d_k(X_n, X) converges to 0 as n converges to infinity; equivalently, E(|X_n - X|^k) converges to 0 as n converges to infinity.

When k = 1, we simply say that X_n converges to X as n converges to infinity in mean; when k = 2, we say that X_n converges to X as n converges to infinity in mean square. These are the most important special cases.

Mathematical Exercise 49. Use Lyapunov's inequality to show that if j < k then

X_n converges to X as n converges to infinity in k'th mean implies X_n converges to X as n converges to infinity in j'th mean.

Our next sequence of exercises shows that convergence in mean is stronger than convergence in probability.

Mathematical Exercise 50. Use Markov's inequality to show that

X_n converges to X as n converges to infinity in mean implies X_n converges to X as n converges to infinity in probability.

The converse is not true. Moreover, convergence with probability 1 does not imply convergence in k'th mean and convergence in k'th mean does not imply convergence with probability 1. The next two exercises give some counterexamples.

Mathematical Exercise 51. Suppose that X_1, X_2, X_3, ... is a sequence of independent random variables with

P(X_n = n^3) = 1 / n^2, P(X_n = 0) = 1 - 1 / n^2 for n = 1, 2, ...

  1. Use the first Borel-Cantelli lemma to show that X_n converges to 0 as n converges to infinity with probability 1.
  2. Show that X_n converges to 0 as n converges to infinity in probability.
  3. Show that E(X_n) converges to infinity as n converges to infinity.

Mathematical Exercise 52. Suppose that X_1, X_2, X_3, ... is a sequence of independent random variables with

P(X_n = 1) = 1 / n, P(X_n = 0) = 1 - 1 / n for n = 1, 2, ...

  1. Use the second Borel-Cantelli lemma to show that P(X_n = 0 for infinitely many n) = 1.
  2. Use the second Borel-Cantelli lemma to show that P(X_n = 1 for infinitely many n) = 1.
  3. Show that P(X_n does not converge as n converges to infinity) = 1.
  4. Show that X_n converges to 0 as n converges to infinity in k'th mean for any k >= 1.
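
A small simulation sketch (Python) of the sequence in Exercise 52; it is only suggestive, since no finite simulation can exhibit behavior at infinity, but a single sample path typically keeps returning to 1 even far out in the sequence.

    # One sample path of X_1, ..., X_10000 from Exercise 52.
    import numpy as np

    rng = np.random.default_rng(0)
    n_vals = np.arange(1, 10_001)
    path = (rng.uniform(size=n_vals.size) < 1 / n_vals).astype(int)
    print(np.flatnonzero(path) + 1)  # the indices n with X_n = 1
    # Since X_n takes only the values 0 and 1, E(|X_n|^k) = P(X_n = 1) = 1/n,
    # which goes to 0; this is the k'th mean convergence in part 4.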

To summarize, the implications go from left to right in the following table (where j < k); no other implications hold in general.

convergence with probability 1  =>  convergence in probability  =>  convergence in distribution
convergence in k'th mean  =>  convergence in j'th mean

Related Topics

For a related statistical topic, see the section on the Sample Variance in the chapter on Random Samples. The variance of a sum of random variables is best understood in terms of a related concept known as covariance, which will be studied in detail in the next section.