Virtual Laboratories > Expected Value
Expected value is one of the most important concepts in probability. The expected value of a real-valued random variable gives the center of the distribution of the variable, in a special sense. Additionally, by computing expected values of various real transformations of a general random variable, we can extract a number of interesting characteristics of the distribution of the variable, including measures of spread, symmetry, and correlation.
As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X is a random variable for the experiment, taking values in a subset S of R.
If X has a discrete distribution with density function f then the expected value of X is defined by
E(X) = ∑x in S x f(x).
If X has a continuous distribution with density function f then the expected value of X is defined by
E(X) = ∫S x f(x) dx.
Finally, suppose that X has a mixed distribution, with partial discrete density g on D and partial continuous density h on C, where D and C are disjoint, D is countable, and S = D ∪ C. The expected value of X is defined by
E(X) = ∑x in D x g(x) + ∫C x h(x) dx.
In any case, the expected value of X may not exist because the sum or the integral may not converge. The expected value of X is also called the mean of the distribution of X and is frequently denoted µ.
The mean is the center of the probability distribution of X in a special way. Indeed, if we think of the distribution as a mass distribution, then the mean is the center of mass as defined in physics. Please recall the other measures of the center of a distribution that we have studied: a mode is any value of x that maximizes f(x). A median is any value of x that satisfies
P(X < x) ≤ 1/2, P(X ≤ x) ≥ 1/2.
To understand expected value in a probabilistic way, suppose that we create a new, compound experiment by repeating the basic experiment over and over again. This gives a sequence of independent random variables,
X1, X2, X3 ...
each with the same distribution as X. In statistical terms, we are sampling from the distribution of X. The average value, or sample mean, after n runs is
Mn = (X1 + X2 + ··· + Xn) / n.
The average value Mn converges to the expected value µ as n → ∞. The precise statement of this is the law of large numbers, one of the fundamental theorems of probability.
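The law of large numbers is easy to see empirically. The following sketch (with an arbitrary choice of die, seed, and run length) tracks the sample mean of repeated rolls of a fair die:

```python
import random

# Sketch: simulate n rolls of a fair die and compute the sample mean M_n.
# By the law of large numbers, M_n should be close to the distribution
# mean for large n. The seed and run length are arbitrary choices.

random.seed(42)

n = 100_000
total = 0
for _ in range(n):
    total += random.randint(1, 6)  # one roll of a fair die
sample_mean = total / n

print(sample_mean)  # close to the mean of a fair die, 3.5
```

Rerunning with a larger n should give values still closer to the distribution mean.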
1.
A constant c can be thought of as a random variable that takes only the
value c with probability 1. The corresponding distribution is sometimes
called point mass at c. Show that
E(c) = c.
2. Let I be
an indicator random variable (that is, a variable that takes only the values 0 and 1).
Show that
E(I) = P(I = 1).
In particular, if IA is the indicator of an event A, then E(IA) = P(A), so in a sense, expected value subsumes probability. For a book that takes expected value, rather than probability, as the fundamental starting concept, see Probability via Expectation, by Peter Whittle.
3. Suppose that X
is uniformly distributed on a finite subset S of
R. Show that E(X)
is the arithmetic average of the numbers in S.
4. The
score on a fair die is uniformly distributed on {1, 2, 3, 4, 5, 6}. Find the expected
score.
5. In the
dice experiment, select one fair die. Run the experiment 1000 times,
updating every 10 runs, and note the apparent convergence of the sample mean to the
distribution mean.
6. Find the
expected score for an ace-six flat die. The density function is
f(1) = 1/4, f(2) = f(3) = f(4) = f(5) = 1/8, f(6) = 1/4
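For a discrete distribution given explicitly like this, the defining sum for the expected value can be evaluated directly. A minimal sketch, with a hypothetical helper expected_value and the two dice densities from the exercises above:

```python
# Sketch: E(X) = sum over x of x * f(x) for a discrete density,
# represented here as a dict {value: probability}. The helper name
# expected_value is ours, for illustration only.

def expected_value(density):
    return sum(x * p for x, p in density.items())

fair = {x: 1 / 6 for x in range(1, 7)}                     # fair die
flat = {1: 1/4, 2: 1/8, 3: 1/8, 4: 1/8, 5: 1/8, 6: 1/4}   # ace-six flat die

print(expected_value(fair))
print(expected_value(flat))
```

Both densities are symmetric about the same point, so the two means coincide.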
7. In the
dice experiment, select one ace-six flat die. Run the experiment 1000
times, updating every 10 runs, and note the apparent convergence of the sample mean to the
distribution mean.
8.
Suppose that Y has density function f(n) = p(1 - p)^(n - 1) for n = 1, 2, ..., where 0 < p < 1 is a parameter.
This defines the geometric distribution
with parameter p. Show that
E(Y) = 1 / p.
9.
Suppose that N has density function f(n) = exp(-t) t^n / n! for n = 0, 1, ..., where t > 0 is a parameter. This
defines the Poisson distribution with parameter t. Show that
E(N) = t.
10. Suppose that X
is uniformly distributed on an interval (a, b) of
R. Show that the
mean is the midpoint of the interval:
E(X) = (a + b) / 2
11.
Suppose that X has density f(x) = 12x^2(1 - x) for 0 < x < 1.
12. Suppose that X has the density function f(x) = a / x^(a + 1) for x > 1, where a > 0 is a parameter. This defines the Pareto distribution with shape parameter a. Show that
E(X) = a / (a - 1) if a > 1, and E(X) = ∞ if 0 < a ≤ 1.
13. In the
random variable experiment, select the Pareto distribution. For the
following values of the shape parameter a, run the experiment 1000 times, updating every 10 runs. Note the behavior of the empirical mean.
14. Suppose
that T has density f(t) = r exp(-rt)
for t > 0
where r > 0 is a parameter. This defines the exponential distribution
with rate parameter r. Show that
E(T) = 1 / r.
15. In the
random variable experiment, select the gamma distribution. Set k
= 1 to get the exponential distribution. Vary r with the scroll bar and note the
position of the mean relative to the graph of the density function. Now with r =
2, run the experiment 1000 times updating every 10 runs. Note the apparent convergence of
the sample mean to the distribution mean.
16. Suppose that X has density f(x) = 1 / [π(1 + x^2)] for x in R. This defines the Cauchy distribution (named after Augustin Cauchy), a member of the family of t-distributions.
17. In the
random variable experiment, select the student t distribution.
Set n = 1 to get the Cauchy distribution. Run the simulation 1000 times, updating
every 10 runs. Note the behavior of the empirical mean.
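The erratic behavior of the empirical mean for the Cauchy distribution can also be seen outside the applet. This sketch uses the standard fact that tan(π(U - 1/2)) has the standard Cauchy distribution when U is uniform on (0, 1); the seed and sample sizes are arbitrary:

```python
import math
import random

# Sketch: running sample means of standard Cauchy variables. Because the
# Cauchy distribution has no mean, the sample means do not settle down
# as the sample size grows.

random.seed(1)

def cauchy():
    # Inverse-CDF method: tan(pi * (U - 1/2)) is standard Cauchy.
    return math.tan(math.pi * (random.random() - 0.5))

means = [sum(cauchy() for _ in range(n)) / n for n in (1000, 10_000, 100_000)]
print(means)  # typically erratic; contrast with the stable means above
```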
18.
Suppose that Z has density f(z) = exp(-z^2 / 2) / (2π)^(1/2) for z in R. This defines the standard normal distribution. Show that
E(Z) = 0.
19. In the
random variable experiment, select the normal distribution
(the default parameter values give the standard normal distribution). Run the simulation 1000 times, updating
every 10 runs, and note the apparent convergence of the empirical mean to the
true mean.
The expected value of a real-valued random variable gives the center of the distribution of the variable. This idea is much more powerful than might first appear. By finding expected values of various functions of a general random variable, we can measure many interesting features of its distribution.
Thus, suppose that X is a random variable taking values in a general set S, and suppose that r is a function from S into R. Then r(X) is a real-valued random variable and we would like to compute E[r(X)]. However, to compute this expected value from the definition would require that we know the density function of the transformed variable r(X) (a difficult problem, in general). Fortunately, there is a much better way, given by the change of variables theorem for expected value.
20. Show that if X has a discrete distribution with density function f then
E[r(X)] = ∑x in S r(x) f(x).
Similarly, if X has a continuous distribution with density function f then
E[r(X)] = ∫S r(x) f(x) dx.
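As a quick numerical illustration of the change of variables theorem, E[r(X)] can be computed directly from the density of X, without ever finding the density of r(X). A sketch with a hypothetical discrete density and r(x) = x^2:

```python
# Sketch: change of variables for expected value in the discrete case,
# E[r(X)] = sum over x of r(x) * f(x). The density below is hypothetical.

density = {0: 0.2, 1: 0.5, 2: 0.3}

def r(x):
    return x ** 2

e_r_x = sum(r(x) * p for x, p in density.items())
print(e_r_x)  # 0*0.2 + 1*0.5 + 4*0.3 = 1.7
```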
21. Prove
the version of the change of variables theorem when X is
continuous and r is discrete (i.e., r has countable range).
22. Suppose
that X is uniformly distributed on (-1, 3).
23. Suppose that X has density function f(x) = x^2 / 60 for x in {-2, -1, 1, 2, 3, 4, 5}.
24. Suppose that X
has density function f(x) = 12x^2(1 - x)
for 0 < x < 1. Find
25. Suppose
that (X, Y) has probability function f(x, y) =
2(x
+ y) for 0 < x < y < 1. Find
26.
Suppose that X is uniformly distributed on the interval [a, b],
and that g is a continuous function from [a, b]
into R. Show that E[g(X)]
is the average value of g on [a, b], as defined
in calculus.
The exercises below give basic properties of expected value. These properties are true in general, but restrict your proofs to the discrete and continuous cases separately; the change of variables theorem is the main tool you will need. In these exercises X and Y are real-valued random variables for an experiment, c is a constant, and we assume that the indicated expected values exist.
27. Show that E(X
+ Y) = E(X) + E(Y)
28. Show that
E(cX) = cE(X).
Thus, as a consequence of the last two exercises,
E(aX + bY) = aE(X) + bE(Y)
for constants a and b; in words, expected value is a linear operation.
29. Show that if X ≥ 0 (with probability 1) then E(X) ≥ 0.
30. Show that if X ≤ Y (with probability 1) then E(X) ≤ E(Y).
31. Show that |E(X)| ≤ E(|X|).
The results of these exercises are so basic that it is important to understand them on an intuitive level. Indeed, these properties are in some sense implied by the interpretation of expected value given in the law of large numbers.
32. Suppose
that X and Y are independent. Show that
E(XY) = E(X)E(Y)
The last exercise shows that independent random variables are uncorrelated.
33.
A pair of fair dice are thrown, and the scores (X1, X2)
recorded. Find
the expected value of
34. Suppose that E(X)
= 5 and E(Y) = -2. Find E(3X + 4Y - 7).
35.
Suppose that X and Y are independent, and that E(X)
= 5, E(Y) = -2. Find
E[(3X - 4)(2Y + 7)]
36. Suppose that there are 5 duck
hunters, each a perfect shot. A flock of 10 ducks fly over, and each hunter
selects one duck at random and shoots. Find the expected number of ducks killed.
Hint: Express the number of ducks killed as a sum of indicator random
variables.
For a more complete analysis of the duck hunter problem, see The Number of Distinct Sample Values in the chapter on Finite Sampling Models.
If X is a random variable, a is a real number, and n > 0, then the nth moment of X about a is defined to be
E[(X - a)n].
The moments about 0 are simply referred to as moments. The moments about µ = E(X) are the central moments. The second central moment is particularly important and is studied in detail in the section on variance. In some cases, if we know all of the moments of X, we can determine the entire distribution of X. This idea is explored in the section on generating functions.
37.
Suppose that X is uniformly distributed on an interval (a, b).
Find a general formula for the moments of X.
38.
Suppose that X has density f(x) = 12x^2(1 - x), 0 < x < 1. Find a general formula for the moments of X.
39.
Suppose that X has a continuous distribution with density f that
is symmetric about a:
f(a + t) = f(a - t) for any t
Show that if E(X) exists, then E(X) = a.
40. Let X
be a nonnegative random variable for an experiment, either discrete or
continuous. Show that
E(X) = ∫{x > 0} P(X > x) dx.
Hint: In the representation above, express P(X > t) in terms of the density of X, as a sum in the discrete case or an integral in the continuous case. Then interchange the integral and the sum (in the discrete case) or the two integrals (in the continuous case).
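The representation can also be checked numerically. A sketch for the exponential distribution with rate r, where P(X > x) = exp(-rx) and the mean is 1/r (the value r = 2, the step size, and the truncation point are arbitrary choices):

```python
import math

# Sketch: approximate the integral of P(X > x) = exp(-r*x) over x > 0
# with a Riemann sum and compare with the exponential mean 1/r.

r = 2.0
dx = 1e-4
upper = 20.0  # truncation point; the tail beyond this is negligible

integral = sum(math.exp(-r * k * dx) * dx for k in range(int(upper / dx)))
print(integral)  # approximately 1/r = 0.5
```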
41. Prove Markov's
inequality (named after Andrei Markov):
If X is a nonnegative random variable, then for t > 0,
P(X ≥ t) ≤ E(X) / t.
Hint: Let It denote the indicator variable of the event {X ≥ t}. Show that tIt ≤ X. Then take expected values through the inequality.
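Markov's inequality is easy to check empirically. A sketch with X uniform on [0, 10] (an arbitrary choice, so E(X) = 5), comparing empirical tail probabilities with the bound E(X)/t:

```python
import random

# Sketch: empirical check that P(X >= t) <= E(X) / t for a nonnegative
# random variable. Here X is uniform on [0, 10]; the seed, sample size,
# and t-values are arbitrary choices.

random.seed(0)
sample = [random.uniform(0, 10) for _ in range(100_000)]
e_x = sum(sample) / len(sample)

results = []
for t in (2.0, 5.0, 8.0):
    tail = sum(x >= t for x in sample) / len(sample)
    results.append((t, tail, e_x / t))

print(results)  # each empirical tail probability is below the Markov bound
```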
42. Use the
result of Exercise 40 to prove the change of variables formula when the random vector X
has a continuous distribution and r is nonnegative.
43.
Use the result of Exercise 40 to show that if X is nonnegative and E(X)
= 0 then P(X = 0) = 1.
The following result is similar to Exercise 40, but is specialized to nonnegative integer valued variables:
44. Suppose
that N is a discrete random variable that takes values in the set of nonnegative
integers. Show that
E(N) = ∑n = 0, 1, ... P(N > n) = ∑n = 1, 2, ... P(N ≥ n).
Hint: In the first representation, express P(N > n) as a sum in terms of the density function of N. Then interchange the two sums. The second representation can be obtained from the first by a change of variables in the summation index.
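For a concrete check, take N geometric with parameter p, so that P(N > n) = (1 - p)^n and E(N) = 1/p. A sketch with a truncated series (p = 0.3 is an arbitrary choice):

```python
# Sketch: E(N) = sum over n = 0, 1, ... of P(N > n) for the geometric
# distribution, where P(N > n) = (1 - p)**n. The series is truncated;
# the remainder after 1000 terms is negligible.

p = 0.3
tail_sum = sum((1 - p) ** n for n in range(1000))
print(tail_sum)  # approximately 1/p
```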
45. Suppose that X
has the density function f(x) = r exp(-rx) for x
> 0, where r > 0 is a parameter. This defines the exponential
distribution with rate parameter r.
46. Suppose
that Y has density function g(n) = (1 - p)^(n - 1) p
for n = 1, 2, ... where 0 < p < 1 is a parameter.
This defines the geometric distribution with parameter p.
The result in Exercise 40 can be used as the basis of a general formulation of expected value that works for discrete, continuous, or even mixed distributions. First, the result in Exercise 40 is taken as the definition of E(X) if X is nonnegative.
Next, for a real number x, we define the positive part x+ = max(x, 0) and the negative part x- = max(-x, 0).
47. Show that
x = x+ - x- and |x| = x+ + x-.
Finally, if X is a random variable, then X+ and X- , the positive and negative parts of X, are nonnegative random variables. Thus, assuming that E(X+) or E(X-) (or both) is finite, we can define
E(X) = E(X+) - E(X-)
Our next sequence of exercises will establish an important inequality known as Jensen's inequality, named for Johan Jensen. First we need a definition. A real-valued function g defined on an interval S of R is said to be convex on S if for each x0 in S, there exist numbers a and b (that may depend on x0) such that
ax0 + b = g(x0), ax + b ≤ g(x) for x in S.
48. Interpret
the definition of convex function geometrically. The line y
= ax + b is called a supporting line at x0.
You may be more familiar with convexity in terms of the following theorem from calculus:
49. Show
that g is convex on S if g
has a continuous, non-negative second derivative on S. Hint: Show
that the tangent line at x0 is a supporting line at x0.
50. Prove
Jensen's inequality: If X takes values in an interval S and g is
convex on S, then
E[g(X)] ≥ g[E(X)].
Hint: In the definition of convexity given above, let x0 = E(X) and replace x with X. Then take expected values through the inequality.
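An empirical version of Jensen's inequality: for any sample, the sample mean of g(X) is at least g of the sample mean when g is convex. A sketch with g(x) = x^2 and X uniform on (0, 1) (both arbitrary choices):

```python
import random

# Sketch: check E[g(X)] >= g[E(X)] for the convex function g(x) = x**2
# on a simulated sample. For this g the inequality holds for every sample,
# since the mean of squares always dominates the square of the mean.

random.seed(7)
sample = [random.random() for _ in range(100_000)]

def g(x):
    return x ** 2

mean = sum(sample) / len(sample)
mean_g = sum(g(x) for x in sample) / len(sample)
print(mean_g, g(mean))  # the first value is at least the second
```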
51. Suppose
that X has density function f(x) = a / x^(a + 1) for x > 1, where a > 1 is a parameter. This defines the Pareto
distribution with shape parameter a.
Jensen's inequality extends easily to higher dimensions. The 2-dimensional version is particularly important, because it will be used to derive several special inequalities in the next section. First, a subset S of R^2 is convex if
u, v in S and p in [0, 1] implies (1 - p)u + pv in S.
Next, a real-valued function g on S is said to be convex if for each (x0, y0) in S, there exist numbers a, b, and c (depending on (x0, y0)) such that
ax0 + by0 + c = g(x0, y0), ax + by + c ≤ g(x, y) for (x, y) in S.
52. Interpret
the definitions of convex set and convex function geometrically. The plane z
= ax + by + c is called a supporting plane at (x0,
y0).
From calculus, g is convex on S if g has continuous second derivatives on S and has a positive semi-definite second derivative matrix:
gxx ≥ 0, gyy ≥ 0, gxx gyy - gxy^2 ≥ 0 on S.
53. Prove
Jensen's inequality: If (X, Y) takes values in a convex set
S and g is convex on S then
E[g(X, Y)] ≥ g[E(X), E(Y)].
Hint: In the definition of convexity, let x0 = E(X), y0 = E(Y), and replace x with X, y with Y. Then take expected values through the inequality.
54.
Suppose
that (X, Y) has probability function f(x, y) =
2(x
+ y) for 0 < x < y < 1.
In both the one and two-dimensional cases, a function g is concave if the inequality in the definition is reversed. Jensen's inequality also reverses.
55.
Suppose that x1, x2, ..., xn
are positive numbers. Show that the arithmetic mean is at least as
large as the geometric mean:
(x1 x2 ··· xn)^(1/n) ≤ (x1 + x2 + ··· + xn) / n.
Hint: Let X be uniformly distributed on {x1, x2, ..., xn} and let g(x) = ln(x).
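The arithmetic-geometric mean inequality is easy to verify numerically; a sketch with arbitrary positive data:

```python
import math

# Sketch: arithmetic mean vs. geometric mean for a list of positive
# numbers (hypothetical data). The geometric mean is computed via logs,
# mirroring the hint with g(x) = ln(x).

xs = [1.0, 4.0, 9.0, 16.0]

arithmetic = sum(xs) / len(xs)
geometric = math.exp(sum(math.log(x) for x in xs) / len(xs))
print(arithmetic, geometric)  # arithmetic mean >= geometric mean
```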
The expected value of a random variable X is based, of course, on the probability measure P for the experiment. This probability measure could be a conditional probability measure, conditioned on a given event B for the experiment (with P(B) > 0). The usual notation is E(X | B), and this expected value is computed by the definitions given at the beginning of this page, except that the conditional density f(x | B) replaces the ordinary density f(x). It is very important to realize that, except for notation, no new concepts are involved. The results we have established for expected value in general have analogues for these conditional expected values.
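In the discrete case, the computation described above amounts to using f(x | B) = f(x) / P(B) for x in B, and summing x f(x | B). A sketch with a hypothetical illustration (a fair die and B the event that the score is even; not one of the exercises below):

```python
# Sketch: conditional expected value E(X | B) computed from the
# conditional density f(x | B) = f(x) / P(B) for x in B. The die and
# the event are hypothetical illustrations.

density = {x: 1 / 6 for x in range(1, 7)}  # fair die
B = {2, 4, 6}                              # score is even

p_B = sum(density[x] for x in B)
e_given_B = sum(x * density[x] / p_B for x in B)
print(e_given_B)  # the average of 2, 4, 6
```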
56. Suppose that X
has the density function f(x) = r exp(-rx) for x
> 0, where r > 0 is a parameter. This defines the exponential
distribution with rate parameter r. For fixed t > 0,
find
E(X | X > t).
57. Suppose
that Y has density function g(n) = (1 - p)^(n - 1) p
for n = 1, 2, ... where 0 < p < 1 is a parameter.
This defines the geometric distribution with parameter p.
Find
E(Y | Y is even).
58. Suppose
that (X, Y) has density function f(x, y) =
x
+ y for 0 < x < 1, 0 < y < 1. Find
E(XY | Y > X).
More generally, the conditional expected value of a random variable, given the value of another random variable, is a very important topic that is treated in a separate section.