Virtual Laboratories > Expected Value
Expected value is one of the most important concepts in probability. The expected value of a real-valued random variable gives the center of the distribution of the variable, in a special sense. Additionally, by computing expected values of various real transformations of a general random variable, we can extract a number of interesting characteristics of the distribution of the variable, including measures of spread, symmetry, and correlation.
As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X is a random variable for the experiment, taking values in a subset S of R.
If X has a discrete distribution with density function f then the expected value of X is defined by
E(X) = ∑x in S x f(x).
If X has a continuous distribution with density function f then the expected value of X is defined by
E(X) = ∫S x f(x) dx.
Finally, suppose that X has a mixed distribution, with partial discrete density g on D and partial continuous density h on C, where D and C are disjoint, D is countable, and S = D ∪ C. The expected value of X is defined by
E(X) = ∑x in D x g(x) + ∫C x h(x) dx.
In any case, the expected value of X may not exist because the sum or the integral may not converge. The expected value of X is also called the mean of the distribution of X and is frequently denoted µ.
The mean is the center of the probability distribution of X in a special way. Indeed, if we think of the distribution as a mass distribution, then the mean is the center of mass as defined in physics. Please recall the other measures of the center of a distribution that we have studied: a mode is any value of x that maximizes f(x). A median is any value of x that satisfies
P(X < x) ≤ 1/2, P(X ≤ x) ≥ 1/2.
To understand expected value in a probabilistic way, suppose that we create a new, compound experiment by repeating the basic experiment over and over again. This gives a sequence of independent random variables,
X1, X2, X3 ...
each with the same distribution as X. In statistical terms, we are sampling from the distribution of X. The average value, or sample mean, after n runs is
Mn = (X1 + X2 + ··· + Xn) / n.
The average value Mn converges to the expected value µ as n → ∞. The precise statement of this is the law of large numbers, one of the fundamental theorems of probability.
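The law of large numbers is easy to see empirically. The following sketch (with an arbitrary choice of die, seed, and run length) tracks the sample mean of repeated rolls of a fair die:

```python
import random

# Sketch: simulate n rolls of a fair die and compute the sample mean M_n.
# By the law of large numbers, M_n should be close to the distribution
# mean for large n. The seed and run length are arbitrary choices.

random.seed(42)

n = 100_000
total = 0
for _ in range(n):
    total += random.randint(1, 6)  # one roll of a fair die
sample_mean = total / n

print(sample_mean)  # close to the mean of a fair die, 3.5
```

Rerunning with a larger n should give values still closer to the distribution mean.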
1.
A constant c can be thought of as a random variable that takes only the
value c with probability 1. The corresponding distribution is sometimes
called point mass at c. Show that
E(c) = c.
2. Let I be
an indicator random variable (that is, a variable that takes only the values 0 and 1).
Show that
E(I) = P(I = 1).
In particular, if IA is the indicator of an event A, then E(IA) = P(A), so in a sense, expected value subsumes probability. For a book that takes expected value, rather than probability, as the fundamental starting concept, see Probability via Expectation, by Peter Whittle.
3. Suppose that X
is uniformly distributed on a finite subset S of
R. Show that E(X)
is the arithmetic average of the numbers in S.
4. The
score on a fair die is uniformly distributed on {1, 2, 3, 4, 5, 6}. Find the expected
score.
5. In the
dice experiment, select one fair die. Run the experiment 1000 times,
updating every 10 runs, and note the apparent convergence of the sample mean to the
distribution mean.
6. Find the
expected score for an ace-six flat die. The density function is
f(1) = 1/4, f(2) = f(3) = f(4) = f(5) = 1/8, f(6) = 1/4
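For a discrete distribution given explicitly like this, the defining sum for the expected value can be evaluated directly. A minimal sketch, with a hypothetical helper expected_value and the two dice densities from the exercises above:

```python
# Sketch: E(X) = sum over x of x * f(x) for a discrete density,
# represented here as a dict {value: probability}. The helper name
# expected_value is ours, for illustration only.

def expected_value(density):
    return sum(x * p for x, p in density.items())

fair = {x: 1 / 6 for x in range(1, 7)}                     # fair die
flat = {1: 1/4, 2: 1/8, 3: 1/8, 4: 1/8, 5: 1/8, 6: 1/4}   # ace-six flat die

print(expected_value(fair))
print(expected_value(flat))
```

Both densities are symmetric about the same point, so the two means coincide.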
7. In the
dice experiment, select one ace-six flat die. Run the experiment 1000
times, updating every 10 runs, and note the apparent convergence of the sample mean to the
distribution mean.
8.
Suppose that Y has density function f(n) = p(1 - p)^(n - 1) for n = 1, 2, ..., where 0 < p < 1 is a parameter.
This defines the geometric distribution
with parameter p. Show that
E(Y) = 1 / p.
9.
Suppose that N has density function f(n) = exp(-t) t^n / n! for n = 0, 1, ..., where t > 0 is a parameter. This
defines the Poisson distribution with parameter t. Show that
E(N) = t.
10. Suppose that X
is uniformly distributed on an interval (a, b) of
R. Show that the
mean is the midpoint of the interval:
E(X) = (a + b) / 2
11.
Suppose that X has density f(x) = 12x^2(1 - x) for 0 < x < 1.
12. Suppose that X has the density function f(x) = a / x^(a + 1) for x > 1, where a > 0 is a parameter. This defines the Pareto distribution with shape parameter a. Show that
E(X) = a / (a - 1) if a > 1, and E(X) = ∞ if 0 < a ≤ 1.
13. In the
random variable experiment, select the Pareto distribution. For the
following values of the shape parameter a, run the experiment 1000 times, updating every 10 runs. Note the behavior of the empirical mean.
14. Suppose
that T has density f(t) = r exp(-rt)
for t > 0
where r > 0 is a parameter. This defines the exponential distribution
with rate parameter r. Show that
E(T) = 1 / r.
15. In the
random variable experiment, select the gamma distribution. Set k
= 1 to get the exponential distribution. Vary r with the scroll bar and note the
position of the mean relative to the graph of the density function. Now with r =
2, run the experiment 1000 times updating every 10 runs. Note the apparent convergence of
the sample mean to the distribution mean.
16. Suppose that X has density f(x) = 1 / [π(1 + x^2)] for x in R. This defines the Cauchy distribution (named after Augustin Cauchy), a member of the family of t-distributions.
17. In the
random variable experiment, select the student t distribution.
Set n = 1 to get the Cauchy distribution. Run the simulation 1000 times, updating
every 10 runs. Note the behavior of the empirical mean.
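The erratic behavior of the empirical mean for the Cauchy distribution can also be seen outside the applet. This sketch uses the standard fact that tan(π(U - 1/2)) has the standard Cauchy distribution when U is uniform on (0, 1); the seed and sample sizes are arbitrary:

```python
import math
import random

# Sketch: running sample means of standard Cauchy variables. Because the
# Cauchy distribution has no mean, the sample means do not settle down
# as the sample size grows.

random.seed(1)

def cauchy():
    # Inverse-CDF method: tan(pi * (U - 1/2)) is standard Cauchy.
    return math.tan(math.pi * (random.random() - 0.5))

means = [sum(cauchy() for _ in range(n)) / n for n in (1000, 10_000, 100_000)]
print(means)  # typically erratic; contrast with the stable means above
```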
18.
Suppose that Z has density f(z) = exp(-z^2 / 2) / (2π)^(1/2) for z in R. This defines the standard normal distribution. Show that
E(Z) = 0.
19. In the
random variable experiment, select the normal distribution
(the default parameter values give the standard normal distribution). Run the simulation 1000 times, updating
every 10 runs, and note the apparent convergence of the empirical mean to the
true mean.
The expected value of a real-valued random variable gives the center of the distribution of the variable. This idea is much more powerful than might first appear. By finding expected values of various functions of a general random variable, we can measure many interesting features of its distribution.
Thus, suppose that X is a random variable taking values in a general set S, and suppose that r is a function from S into R. Then r(X) is a real-valued random variable and we would like to compute E[r(X)]. However, to compute this expected value from the definition would require that we know the density function of the transformed variable r(X) (a difficult problem, in general). Fortunately, there is a much better way, given by the change of variables theorem for expected value.
20. Show that if X has a discrete distribution with density function f then
E[r(X)] = ∑x in S r(x) f(x).
Similarly, if X has a continuous distribution with density function f then
E[r(X)] = ∫S r(x) f(x) dx.
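As a quick numerical illustration of the change of variables theorem, E[r(X)] can be computed directly from the density of X, without ever finding the density of r(X). A sketch with a hypothetical discrete density and r(x) = x^2:

```python
# Sketch: change of variables for expected value in the discrete case,
# E[r(X)] = sum over x of r(x) * f(x). The density below is hypothetical.

density = {0: 0.2, 1: 0.5, 2: 0.3}

def r(x):
    return x ** 2

e_r_x = sum(r(x) * p for x, p in density.items())
print(e_r_x)  # 0*0.2 + 1*0.5 + 4*0.3 = 1.7
```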
21. Prove
the version of the change of variables theorem when X is
continuous and r is discrete (i.e., r has countable range).
22. Suppose
that X is uniformly distributed on (-1, 3).
23. Suppose that X has density function f(x) = x^2 / 60 for x in {-2, -1, 1, 2, 3, 4, 5}.
24. Suppose that X
has density function f(x) = 12x^2(1 - x)
for 0 < x < 1. Find
25. Suppose
that (X, Y) has probability function f(x, y) =
2(x
+ y) for 0 < x < y < 1. Find
26.
Suppose that X is uniformly distributed on the interval [a, b],
and that g is a continuous function from [a, b]
into R. Show that E[g(X)]
is the average value of g on [a, b], as defined
in calculus.
The exercises below give basic properties of expected value. These properties are true in general, but restrict your proofs to the discrete and continuous cases separately; the change of variables theorem is the main tool you will need. In these exercises X and Y are real-valued random variables for an experiment, c is a constant, and we assume that the indicated expected values exist.
27. Show that E(X
+ Y) = E(X) + E(Y)
28. Show that
E(cX) = cE(X).
Thus, as a consequence of the last two exercises,
E(aX + bY) = aE(X) + bE(Y)
for constants a and b; in words, expected value is a linear operation.
29. Show that if X ≥ 0 (with probability 1) then E(X) ≥ 0.
30. Show that if X ≤ Y (with probability 1) then E(X) ≤ E(Y).
31. Show that |E(X)| ≤ E(|X|).
The results of these exercises are so basic that it is important to understand them on an intuitive level. Indeed, these properties are in some sense implied by the interpretation of expected value given in the law of large numbers.
32. Suppose
that X and Y are independent. Show that
E(XY) = E(X)E(Y)
The last exercise shows that independent random variables are uncorrelated.
33.
A pair of fair dice are thrown, and the scores (X1, X2)
recorded. Find
the expected value of
34. Suppose that E(X)
= 5 and E(Y) = -2. Find E(3X + 4Y - 7).
35.
Suppose that X and Y are independent, and that E(X)
= 5, E(Y) = -2. Find
E[(3X - 4)(2Y + 7)]
36. Suppose that there are 5 duck
hunters, each a perfect shot. A flock of 10 ducks fly over, and each hunter
selects one duck at random and shoots. Find the expected number of ducks killed.
Hint: Express the number of ducks killed as a sum of indicator random
variables.
For a more complete analysis of the duck hunter problem, see The Number of Distinct Sample Values in the chapter on Finite Sampling Models.
If X is a random variable, a is a real number, and n > 0, then the nth moment of X about a is defined to be
E[(X - a)n].
The moments about 0 are simply referred to as moments. The moments about µ = E(X) are the central moments. The second central moment is particularly important and is studied in detail in the section on variance. In some cases, if we know all of the moments of X, we can determine the entire distribution of X. This idea is explored in the section on generating functions.
37.
Suppose that X is uniformly distributed on an interval (a, b).
Find a general formula for the moments of X.
38.
Suppose that X has density f(x) = 12x^2(1 - x), 0 < x < 1. Find a general formula for the moments of X.
39.
Suppose that X has a continuous distribution with density f that
is symmetric about a:
f(a + t) = f(a - t) for any t
Show that if E(X) exists, then E(X) = a.
40. Let X
be a nonnegative random variable for an experiment, either discrete or
continuous. Show that
E(X) = ∫{x > 0} P(X > x) dx.
Hint: In the representation above, express P(X > t) in terms of the density of X, as a sum in the discrete case or an integral in the continuous case. Then interchange the integral and the sum (in the discrete case) or the two integrals (in the continuous case).
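The representation can also be checked numerically. A sketch for the exponential distribution with rate r, where P(X > x) = exp(-rx) and the mean is 1/r (the value r = 2, the step size, and the truncation point are arbitrary choices):

```python
import math

# Sketch: approximate the integral of P(X > x) = exp(-r*x) over x > 0
# with a Riemann sum and compare with the exponential mean 1/r.

r = 2.0
dx = 1e-4
upper = 20.0  # truncation point; the tail beyond this is negligible

integral = sum(math.exp(-r * k * dx) * dx for k in range(int(upper / dx)))
print(integral)  # approximately 1/r = 0.5
```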
41. Prove Markov's
inequality (named after Andrei Markov):
If X is a nonnegative random variable, then for t > 0,
P(X ≥ t) ≤ E(X) / t.
Hint: Let It denote the indicator variable of the event {X ≥ t}. Show that tIt ≤ X. Then take expected values through the inequality.
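Markov's inequality is easy to check empirically. A sketch with X uniform on [0, 10] (an arbitrary choice, so E(X) = 5), comparing empirical tail probabilities with the bound E(X)/t:

```python
import random

# Sketch: empirical check that P(X >= t) <= E(X) / t for a nonnegative
# random variable. Here X is uniform on [0, 10]; the seed, sample size,
# and t-values are arbitrary choices.

random.seed(0)
sample = [random.uniform(0, 10) for _ in range(100_000)]
e_x = sum(sample) / len(sample)

results = []
for t in (2.0, 5.0, 8.0):
    tail = sum(x >= t for x in sample) / len(sample)
    results.append((t, tail, e_x / t))

print(results)  # each empirical tail probability is below the Markov bound
```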
42. Use the
result of Exercise 40 to prove the change of variables formula when the random vector X
has a continuous distribution and r is nonnegative.
43.
Use the result of Exercise 40 to show that if X is nonnegative and E(X)
= 0 then P(X = 0) = 1.
The following result is similar to Exercise 40, but is specialized to nonnegative integer valued variables:
44. Suppose
that N is a discrete random variable that takes values in the set of nonnegative
integers. Show that
E(N) = ∑n = 0, 1, ... P(N > n) = ∑n = 1, 2, ... P(N ≥ n).
Hint: In the first representation, express P(N > n) as a sum in terms of the density function of N. Then interchange the two sums. The second representation can be obtained from the first by a change of variables in the summation index.
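For a concrete check, take N geometric with parameter p, so that P(N > n) = (1 - p)^n and E(N) = 1/p. A sketch with a truncated series (p = 0.3 is an arbitrary choice):

```python
# Sketch: E(N) = sum over n = 0, 1, ... of P(N > n) for the geometric
# distribution, where P(N > n) = (1 - p)**n. The series is truncated;
# the remainder after 1000 terms is negligible.

p = 0.3
tail_sum = sum((1 - p) ** n for n in range(1000))
print(tail_sum)  # approximately 1/p
```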
45. Suppose that X
has the density function f(x) = r exp(-rx) for x
> 0, where r > 0 is a parameter. This defines the exponential
distribution with rate parameter r.
46. Suppose
that Y has density function g(n) = (1 - p)^(n - 1) p
for n = 1, 2, ... where 0 < p < 1 is a parameter.
This defines the geometric distribution with parameter p.
The result in Exercise 40 can be used as the basis of a general formulation of expected value that works for discrete, continuous, or even mixed distributions. First, the result in Exercise 40 is taken as the definition of E(X) if X is nonnegative.
Next, for a real number x, we define the positive part x+ = max(x, 0) and the negative part x- = max(-x, 0).
47. Show that
x = x+ - x- and |x| = x+ + x-.
Finally, if X is a random variable, then X+ and X- , the positive and negative parts of X, are nonnegative random variables. Thus, assuming that E(X+) or E(X-) (or both) is finite, we can define
E(X) = E(X+) - E(X-)
Our next sequence of exercises will establish an important inequality known as Jensen's inequality, named for Johan Jensen. First we need a definition. A real-valued function g defined on an interval S of R is said to be convex on S if for each x0 in S, there exist numbers a and b (that may depend on x0) such that
ax0 + b = g(x0), ax + b ≤ g(x) for x in S.
48. Interpret
the definition of convex function geometrically. The line y
= ax + b is called a supporting line at x0.
You may be more familiar with convexity in terms of the following theorem from calculus:
49. Show
that g is convex on S if g
has a continuous, non-negative second derivative on S. Hint: Show
that the tangent line at x0 is a supporting line at x0.
50. Prove
Jensen's inequality: If X takes values in an interval S and g is
convex on S, then
E[g(X)] ≥ g[E(X)].
Hint: In the definition of convexity given above, let x0 = E(X) and replace x with X. Then take expected values through the inequality.
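An empirical version of Jensen's inequality: for any sample, the sample mean of g(X) is at least g of the sample mean when g is convex. A sketch with g(x) = x^2 and X uniform on (0, 1) (both arbitrary choices):

```python
import random

# Sketch: check E[g(X)] >= g[E(X)] for the convex function g(x) = x**2
# on a simulated sample. For this g the inequality holds for every sample,
# since the mean of squares always dominates the square of the mean.

random.seed(7)
sample = [random.random() for _ in range(100_000)]

def g(x):
    return x ** 2

mean = sum(sample) / len(sample)
mean_g = sum(g(x) for x in sample) / len(sample)
print(mean_g, g(mean))  # the first value is at least the second
```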
51. Suppose
that X has density function f(x) = a / x^(a + 1) for x > 1, where a > 1 is a parameter. This defines the Pareto
distribution with shape parameter a.
Jensen's inequality extends easily to higher dimensions. The 2-dimensional version is particularly important, because it will be used to derive several special inequalities in the next section. First, a subset S of R^2 is convex if
u, v in S and p in [0, 1] implies (1 - p)u + pv in S.
Next, a real-valued function g on S is said to be convex if for each (x0, y0) in S, there exist numbers a, b, and c (depending on (x0, y0)) such that
ax0 + by0 + c = g(x0, y0), ax + by + c ≤ g(x, y) for (x, y) in S.
52. Interpret
the definitions of convex set and convex function geometrically. The plane z
= ax + by + c is called a supporting plane at (x0,
y0).
From calculus, g is convex on S if g has continuous second derivatives on S and has a positive semi-definite second derivative matrix:
gxx ≥ 0, gyy ≥ 0, gxx gyy - gxy^2 ≥ 0 on S.
53. Prove
Jensen's inequality: If (X, Y) takes values in a convex set
S and g is convex on S then
E[g(X, Y)] ≥ g[E(X), E(Y)].
Hint: In the definition of convexity, let x0 = E(X), y0 = E(Y), and replace x with X, y with Y. Then take expected values through the inequality.
54.
Suppose
that (X, Y) has probability function f(x, y) =
2(x
+ y) for 0 < x < y < 1.
In both the one and two-dimensional cases, a function g is concave if the inequality in the definition is reversed. Jensen's inequality also reverses.
55.
Suppose that x1, x2, ..., xn
are positive numbers. Show that the arithmetic mean is at least as
large as the geometric mean:
(x1 x2 ··· xn)^(1/n) ≤ (x1 + x2 + ··· + xn) / n.
Hint: Let X be uniformly distributed on {x1, x2, ..., xn} and let g(x) = ln(x).
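The arithmetic-geometric mean inequality is easy to verify numerically; a sketch with arbitrary positive data:

```python
import math

# Sketch: arithmetic mean vs. geometric mean for a list of positive
# numbers (hypothetical data). The geometric mean is computed via logs,
# mirroring the hint with g(x) = ln(x).

xs = [1.0, 4.0, 9.0, 16.0]

arithmetic = sum(xs) / len(xs)
geometric = math.exp(sum(math.log(x) for x in xs) / len(xs))
print(arithmetic, geometric)  # arithmetic mean >= geometric mean
```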
The expected value of a random variable X is based, of course, on the probability measure P for the experiment. This probability measure could be a conditional probability measure, conditioned on a given event B for the experiment (with P(B) > 0). The usual notation is E(X | B), and this expected value is computed by the definitions given at the beginning of this page, except that the conditional density f(x | B) replaces the ordinary density f(x). It is very important to realize that, except for notation, no new concepts are involved. The results we have established for expected value in general have analogues for these conditional expected values.
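In the discrete case, the computation described above amounts to using f(x | B) = f(x) / P(B) for x in B, and summing x f(x | B). A sketch with a hypothetical illustration (a fair die and B the event that the score is even; not one of the exercises below):

```python
# Sketch: conditional expected value E(X | B) computed from the
# conditional density f(x | B) = f(x) / P(B) for x in B. The die and
# the event are hypothetical illustrations.

density = {x: 1 / 6 for x in range(1, 7)}  # fair die
B = {2, 4, 6}                              # score is even

p_B = sum(density[x] for x in B)
e_given_B = sum(x * density[x] / p_B for x in B)
print(e_given_B)  # the average of 2, 4, 6
```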
56. Suppose that X
has the density function f(x) = r exp(-rx) for x
> 0, where r > 0 is a parameter. This defines the exponential
distribution with rate parameter r. For fixed t > 0,
find
E(X | X > t).
57. Suppose
that Y has density function g(n) = (1 - p)^(n - 1) p
for n = 1, 2, ... where 0 < p < 1 is a parameter.
This defines the geometric distribution with parameter p.
Find
E(Y | Y is even).
58. Suppose
that (X, Y) has density function f(x, y) =
x
+ y for 0 < x < 1, 0 < y < 1. Find
E(XY | Y > X).
More generally, the conditional expected value of a random variable, given the value of another random variable, is a very important topic that is treated in a separate section.