Virtual Laboratories > Expected Value
As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X is a random variable for the experiment, taking values in a subset S of R. Recall that the expected value or mean of X gives the center of the distribution of X. The variance of X is a measure of the spread of the distribution about the mean and is defined by
var(X) = E{[X - E(X)]^2}
Thus, the variance is the second central moment of X.
1. Suppose that X has a discrete distribution with density function f. Use the change of variables theorem to show that
var(X) = Σ_{x in S} [x - E(X)]^2 f(x).
2. Suppose that X has a continuous distribution with density function f. Use the change of variables theorem to show that
var(X) = ∫_S [x - E(X)]^2 f(x) dx.
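For a concrete check of the formula in Exercise 1, the following sketch computes the mean and variance of a small discrete distribution directly from its density function; the density values here are hypothetical, chosen only for illustration.

```python
# A hypothetical discrete density on S = {0, 1, 2, 3}, chosen only for illustration.
f = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

# Mean: E(X) = sum of x f(x) over x in S.
mean = sum(x * p for x, p in f.items())

# Variance via the change of variables theorem: sum of [x - E(X)]^2 f(x) over x in S.
var = sum((x - mean) ** 2 * p for x, p in f.items())

print(mean, var)  # 1.6 and 0.84 for this particular density
```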
The standard deviation of X is the square root of the variance:
sd(X) = [var(X)]^(1/2).
It also measures dispersion about the mean but has the same physical units as the variable X.
The following exercises give some basic properties of variance, which in turn rely on basic properties of expected value:
3. Show that var(X) = E(X^2) - [E(X)]^2.
4. Show that var(X) ≥ 0.
5. Show that var(X) = 0 if and only if P(X = c) = 1 for some constant c.
6. Show that if a and b are constants then var(aX + b) = a^2 var(X).
7. Let Z = [X - E(X)] / sd(X). Show that Z has mean 0 and variance 1.
The random variable Z in Exercise 7 is sometimes called the standard score associated with X. Since X and its mean and standard deviation all have the same physical units, the standard score Z is dimensionless. It measures the directed distance from E(X) to X in terms of standard deviations.
On the other hand, when E(X) is not zero, the ratio of standard deviation to mean is called the coefficient of variation:
sd(X) / E(X)
Note that this quantity also is dimensionless, and is sometimes used to compare variability for random variables with different means.
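As a small numerical illustration of the standard score and the coefficient of variation, the sketch below uses assumed values for the mean and standard deviation (10 and 2); nothing here comes from the exercises.

```python
# Assumed values, chosen only for illustration.
mean, sd = 10.0, 2.0
x = 13.0  # a particular value of X

# Standard score: directed distance from E(X) to x in units of standard deviations.
z = (x - mean) / sd          # 1.5

# Coefficient of variation: dimensionless ratio of spread to center.
cv = sd / mean               # 0.2

print(z, cv)
```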
8. Suppose that I is an indicator variable with P(I = 1) = p. Show that var(I) = p(1 - p).
9. The score on a fair die is uniformly distributed on {1, 2, 3, 4, 5, 6}. Find the mean, variance, and standard deviation.
10. In the dice experiment, select one fair die. Run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
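The dice experiment in Exercise 10 can be roughly imitated in code. The sketch below, which assumes NumPy is available, simulates 1000 rolls of a fair die and compares the empirical mean and standard deviation with the distribution values computed from the uniform density on {1, 2, 3, 4, 5, 6}.

```python
import numpy as np

rng = np.random.default_rng()

# Simulate 1000 rolls of a fair die (a rough stand-in for the dice experiment applet).
rolls = rng.integers(1, 7, size=1000)

# Empirical mean and standard deviation.
emp_mean = rolls.mean()
emp_sd = rolls.std()

# Distribution mean and standard deviation computed from the uniform density on {1, ..., 6}.
faces = np.arange(1, 7)
dist_mean = faces.mean()
dist_sd = np.sqrt(((faces - dist_mean) ** 2).mean())

print(emp_mean, dist_mean)
print(emp_sd, dist_sd)
```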
11. For an ace-six flat
die, faces 1 and 6 have probability 1/4 each, and faces 2, 3, 4, 5 have
probability 1/8 each. Find the mean, variance and standard deviation.
12. In the
dice experiment, select one ace-six flat die. Run the experiment 1000
times, updating every 10 runs, and note the apparent convergence of the empirical mean and
standard deviation to the distribution mean and standard deviation.
13. Suppose
that X is uniformly distributed on {1, 2, ..., n}. Show that
var(X) = (n^2 - 1) / 12.
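The closed form in Exercise 13 is easy to check numerically against the definition of variance; a minimal sketch, with n = 10 as an arbitrary choice:

```python
n = 10  # arbitrary choice for the check

values = range(1, n + 1)
mean = sum(values) / n
var_direct = sum((x - mean) ** 2 for x in values) / n   # definition of variance
var_formula = (n ** 2 - 1) / 12                          # closed form from Exercise 13

print(var_direct, var_formula)  # both 8.25 for n = 10
```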
14. Suppose that Y has density function f(n) = p(1 - p)^(n - 1) for n = 1, 2, ..., where 0 < p < 1 is a parameter. This defines the geometric distribution with parameter p. Show that
var(Y) = (1 - p) / p^2.
15. Suppose that N has density function f(n) = exp(-t) t^n / n! for n = 0, 1, ..., where t > 0 is a parameter. This defines the Poisson distribution with parameter t. Show that
var(N) = t.
16. Suppose that X
is uniformly distributed on the interval (a, b) where a < b. Show
that
var(X) = (b - a)^2 / 12.
Note in particular that the variance depends only on the length of the interval, which is intuitively reasonable.
17. Suppose
that X has density function f(x) = r exp(-rx)
for x > 0.
This defines the exponential
distribution with rate parameter r > 0. Show that
sd(X) = 1 / r.
18. In the
gamma experiment, set k = 1 to get the exponential
distribution. Vary r with the scroll bar and note the size and location of the
mean-standard deviation bar. Now with r = 2, run the experiment 1000 times
updating every 10 runs. Note the apparent convergence of the empirical mean and standard
deviation to the distribution mean and standard deviation.
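A rough stand-in for the gamma experiment of Exercise 18, assuming NumPy is available: simulate 1000 values from the exponential distribution with rate r = 2 and compare the empirical mean and standard deviation with 1/r.

```python
import numpy as np

rng = np.random.default_rng()
r = 2.0  # rate parameter, as in Exercise 18

# NumPy parameterizes the exponential distribution by its scale 1/r.
sample = rng.exponential(scale=1 / r, size=1000)

print(sample.mean(), sample.std())  # both should be close to 1 / r = 0.5
```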
19. Suppose that X has density f(x) = a / x^(a + 1) for x > 1, where a > 0 is a parameter. This defines the Pareto distribution with shape parameter a. Show that var(X) = a / [(a - 1)^2 (a - 2)] for a > 2 (for 1 < a ≤ 2 the variance is infinite).
20. Suppose that Z has density f(z) = exp(-z^2 / 2) / (2π)^(1/2) for z in R. This defines the standard normal distribution. Show that
var(Z) = 1.
Hint: In the integral for E(Z^2), integrate by parts.
21. In the
random variable experiment, select the normal distribution
(the default parameter values give the standard normal distribution). Run the experiment 1000 times
updating every 10 runs and note the apparent convergence of the empirical mean and standard
deviation to the distribution mean and standard deviation.
22. Suppose that
X
is a random variable with E(X) = 5, var(X) = 4. Find
23. Suppose that X1 and X2 are independent random variables with E(Xi) = µi, var(Xi) = di^2 for i = 1, 2. Show that
var(X1 X2) = (d1^2 + µ1^2)(d2^2 + µ2^2) - µ1^2 µ2^2.
24. Marilyn vos Savant has an IQ of 228. Assuming that the distribution of IQ scores has mean 100 and standard deviation 15, find Marilyn's standard score.
Chebyshev's inequality (named after Pafnuty Chebyshev) gives an upper bound on the probability that a random variable will be more than a specified distance from its mean.
25. Use Markov's inequality to prove Chebyshev's inequality: for t > 0,
P[|X - E(X)| ≥ t] ≤ var(X) / t^2.
26. Establish the following equivalent version of Chebyshev's inequality: for k > 0,
P[|X - E(X)| ≥ k sd(X)] ≤ 1 / k^2.
27. Suppose
that Y has the geometric distribution with parameter p = 3/4. Compute the true value and
the Chebyshev bound for the probability that Y is at least 2 standard deviations
away from the mean.
28. Suppose
that X has the exponential distribution with rate parameter r >
0. Compute the
true value and the Chebyshev bound for the probability that X is at least
k standard deviations away from the mean.
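As a numerical companion to Exercises 27 and 28, the sketch below estimates by simulation the probability that an exponential variable is at least k standard deviations from its mean and compares the estimate with the Chebyshev bound 1 / k^2; the values r = 1 and k = 2 are arbitrary choices, not part of the exercises.

```python
import numpy as np

rng = np.random.default_rng()
r, k = 1.0, 2.0          # arbitrary choices for the illustration
mean = sd = 1 / r        # exponential distribution: mean and sd are both 1 / r (Exercise 17)

sample = rng.exponential(scale=1 / r, size=100_000)

# Monte Carlo estimate of P(|X - E(X)| >= k sd(X)).
estimate = np.mean(np.abs(sample - mean) >= k * sd)

# Chebyshev bound from Exercise 26.
bound = 1 / k ** 2

print(estimate, bound)   # the estimate (about 0.05) is well under the bound 0.25
```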
Recall again that the variance of X is the second moment of X about the mean, and measures the spread of the distribution of X about the mean. The third and fourth moments of X about the mean also measure interesting features of the distribution. The third moment measures skewness, the lack of symmetry, while the fourth moment measures kurtosis, the degree to which the distribution is peaked. The actual numerical measures of these characteristics are standardized to eliminate the physical units, by dividing by an appropriate power of the standard deviation.
Thus, let µ = E(X) and d = sd(X). The skewness of X is defined to be
skew(X) = E[(X - µ)^3] / d^3.
The kurtosis of X is defined to be
kurt(X) = E[(X - µ)^4] / d^4.
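The sketch below computes the skewness and kurtosis of a small discrete distribution directly from these definitions; the density is hypothetical, chosen only for illustration.

```python
# A hypothetical discrete density, chosen only for illustration.
f = {0: 0.5, 1: 0.3, 2: 0.2}

mean = sum(x * p for x, p in f.items())
var = sum((x - mean) ** 2 * p for x, p in f.items())
sd = var ** 0.5

# Standardized third and fourth central moments.
skew = sum((x - mean) ** 3 * p for x, p in f.items()) / sd ** 3
kurt = sum((x - mean) ** 4 * p for x, p in f.items()) / sd ** 4

print(skew, kurt)
```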
29.
Suppose that X has density f, which is symmetric with respect to
µ. Show that skew(X) = 0.
30. Show that
skew(X) = [E(X^3) - 3µ E(X^2) + 2µ^3] / d^3.
31. Show that
kurt(X) = [E(X^4) - 4µ E(X^3) + 6µ^2 E(X^2) - 3µ^4] / d^4.
32.
Graph the following density functions and compute the skewness and kurtosis of
each.
(These distributions are all members of the beta
family).
The variance and higher moments are related to the concept of norm and
distance in the theory of vector spaces. This connection can help unify and
illuminate some of the ideas. Thus, let X be a real-valued random variable. For k ≥ 1, we define the k-norm by
||X||k = [E(|X|^k)]^(1/k).
Thus, ||X||k is a measure of the size of X in a certain sense. For a given probability space (that is, a given random experiment), the set of random variables with finite k'th moment forms a vector space (if we identify two random variables that agree with probability 1). The following exercises show that the k-norm really is a norm on this vector space.
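As an illustration of the k-norm (and a numerical preview of Lyapunov's inequality in Exercise 37 below), the following sketch computes ||X||k for several values of k, using a hypothetical discrete density; the values should be nondecreasing in k.

```python
# A hypothetical discrete density, chosen only for illustration.
f = {1: 0.2, 2: 0.5, 4: 0.3}

def k_norm(f, k):
    """k-norm ||X||k = [E(|X|^k)]^(1/k) for a discrete density f."""
    return sum(abs(x) ** k * p for x, p in f.items()) ** (1 / k)

# The values should be nondecreasing in k (Lyapunov's inequality).
print([round(k_norm(f, k), 4) for k in (1, 2, 3, 4)])
```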
33. Show that ||X||k ≥ 0 for any X.
34. Show that ||X||k = 0 if and only if P(X = 0) = 1.
35. Show that ||cX||k = |c| ||X||k for any constant c.
The next exercise gives Minkowski's inequality, named for Hermann Minkowski. It is also known as the triangle inequality.
36. Show that ||X + Y||k ≤ ||X||k + ||Y||k for any X and Y.
Our next exercise gives Lyapunov's inequality, named for Aleksandr Lyapunov. This inequality shows that the k-norm of a random variable is increasing in k.
37. Show that if j ≤ k then ||X||j ≤ ||X||k.
Lyapunov's inequality shows that if X has a finite k'th moment, and j < k, then X has a finite j'th moment as well.
38.
Suppose that X is uniformly distributed on the interval (0, 1).
39. Suppose that X has density f(x) = a / x^(a + 1) for x > 1, where a > 0 is a parameter. This defines the Pareto distribution with shape parameter a.
40.
Suppose that (X, Y) has density f(x, y) = x
+ y for 0 < x < 1, 0 < y < 1. Verify
Minkowski's inequality.
The k-norm, like any norm, can be used to measure distance; we simply compute the norm of the difference between the objects. Thus, we define the k-distance (or k-metric) between real-valued random variables X and Y to be
dk(X, Y) = ||Y - X||k = [E(|Y - X|^k)]^(1/k).
The properties in the following exercises are analogues of the properties in Exercises 33-36 (and thus very little additional work should be required). These properties show that the k-distance really is a distance.
41. Show that dk(X, Y) ≥ 0 for any X, Y.
42. Show that dk(X, Y) = 0 if and only if P(Y = X) = 1.
43. Show that dk(X, Y) ≤ dk(X, Z) + dk(Z, Y) for any X, Y, Z (this is known as the triangle inequality).
Thus, the standard deviation is simply the 2-distance from X to its mean:
sd(X) = d2[X, E(X)] = (E{[X - E(X)]^2})^(1/2),
and the variance is the square of this. More generally, the k'th moment of X about a is simply the k'th power of the k-distance from X to a. The 2-distance is especially important for reasons that will become clear below and in the next section. This distance is also called the root mean square distance.
Measures of center and measures of spread are best thought of together, in the context of a measure of distance. For a random variable X, we first try to find the constants t that are closest to X, as measured by the given distance; any such t is a measure of center relative to the distance. The minimum distance itself is the corresponding measure of spread.
Let us apply this procedure to the 2-distance. Thus, we define the root mean square error function by
d2(X, t) = ||X - t||2 = {E[(X - t)^2]}^(1/2).
44. Show that d2(X, t) is minimized when t = E(X) and that the minimum value is sd(X). Hint: The minimum value occurs at the same points as the minimum value of E[(X - t)^2]. Expand this and take expected values term by term. The resulting expression is a quadratic function of t.
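A numerical illustration of Exercise 44, using a hypothetical discrete density: evaluate the root mean square error function on a grid of values of t and note that the minimum occurs at E(X) and that the minimum value is sd(X).

```python
# A hypothetical discrete density, chosen only for illustration.
f = {0: 0.25, 1: 0.5, 3: 0.25}

mean = sum(x * p for x, p in f.items())
sd = sum((x - mean) ** 2 * p for x, p in f.items()) ** 0.5

def rmse(t):
    """Root mean square error d2(X, t) = {E[(X - t)^2]}^(1/2)."""
    return sum((x - t) ** 2 * p for x, p in f.items()) ** 0.5

# Evaluate on a grid; the smallest value occurs at t = E(X) and equals sd(X).
ts = [i / 100 for i in range(0, 301)]
best_t = min(ts, key=rmse)

print(best_t, mean)        # both 1.25 for this density
print(rmse(best_t), sd)    # both equal sd(X)
```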
45. In the histogram applet, construct a discrete distribution of each of the types indicated below. Note the position and size of the mean ± standard deviation bar and the shape of the mean square error graph.
Next, let us apply our procedure to the 1-distance. Thus, we define the mean absolute error function by
d1(X, t) = ||X - t||1 = E[|X - t|].
46. Show that
d1(X, t) is minimized when t is any median of X.
The last exercise shows that mean absolute error has a basic deficiency as a measure of error, because in general there does not exist a unique minimizing value of t. Indeed, for many discrete distributions, there is a median interval. Thus, in terms of mean absolute error, there is no compelling reason to choose one value in this interval, as the measure of center, over any other value in the interval.
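A companion sketch for Exercise 46 and the remark above, using a hypothetical density with a whole interval of medians: the mean absolute error function is constant across the median interval, so no single minimizing value of t is singled out.

```python
# A hypothetical discrete density with a whole interval of medians: any t in [0, 2] is a median.
f = {0: 0.5, 2: 0.5}

def mae(t):
    """Mean absolute error d1(X, t) = E[|X - t|]."""
    return sum(abs(x - t) * p for x, p in f.items())

# The function is constant (equal to 1) across the median interval [0, 2].
print([mae(t) for t in (0.0, 0.5, 1.0, 1.5, 2.0)])   # all 1.0
print(mae(-1.0), mae(3.0))                           # larger outside the interval
```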
47. Construct
a distribution of each of the types indicated below. In each case, note the position and
size of the boxplot and the shape of the mean absolute error graph.
48.
Let I be an indicator variable with P(I = 1) = p.
Graph E[|I - t|] as a function of t in each of the
cases below. In each case, find the minimum value of the mean absolute error
function and the values of t where the minimum occurs.
Whenever we have a measure of distance, we automatically have a criterion for convergence. Let Xn, n = 1, 2, ..., and X be real-valued random variables. We say that Xn → X as n → ∞ in k'th mean if
dk(Xn, X) → 0 as n → ∞,
equivalently E(|Xn - X|^k) → 0 as n → ∞.
When k = 1, we simply say that Xn → X as n → ∞ in mean; when k = 2, we say that Xn → X as n → ∞ in mean square. These are the most important special cases.
49. Use Lyapunov's inequality to show that if j < k then Xn → X as n → ∞ in k'th mean implies Xn → X as n → ∞ in j'th mean.
Our next sequence of exercises shows that convergence in mean is stronger than convergence in probability.
50. Use Markov's inequality to show that Xn → X as n → ∞ in mean implies Xn → X as n → ∞ in probability.
The converse is not true. Moreover, convergence with probability 1 does not imply convergence in k'th mean and convergence in k'th mean does not imply convergence with probability 1. The next two exercises give some counterexamples.
51. Suppose that X1, X2, X3, ... is a sequence of independent random variables with
P(Xn = n^3) = 1 / n^2, P(Xn = 0) = 1 - 1 / n^2 for n = 1, 2, ...
52.
Suppose that X1, X2,
X3, ... is a sequence of independent random variables
with
P(Xn = 1) = 1 / n, P(Xn = 0) = 1 - 1 / n for n = 1, 2, ...
To summarize, the implications go from left to right in the following table (where j < k); no other implications hold in general.
convergence with probability 1 → convergence in probability → convergence in distribution
convergence in k'th mean → convergence in j'th mean → convergence in probability
For a related statistical topic, see the section on the Sample Variance in the chapter on Random Samples. The variance of a sum of random variables is best understood in terms of a related concept known as covariance, that will be studied in detail in the next section.