
5. Conditional Expected Value


As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X is a random variable taking values in a set S and that Y is a random variable taking values in a subset T of R. In this section we will study the conditional expected value of Y given X, a concept of fundamental importance in both probability and statistics. As we will see, the conditional expected value of Y given X is the function of X that best approximates Y in the mean square sense. Note that X may be vector-valued.

A technical assumption that we will make is that all random variables occurring in expected values have finite second moment.

The Elementary Definition

Note that we can think of (X, Y) as a random variable taking values in S × T. Suppose first that (X, Y) has a continuous distribution with density function f. Recall that the marginal density g of X is given by

g(x) = ∫T f(x, y) dy for x in S,

and that the conditional density of Y given X = x is given by

h(y | x) = f(x, y) / g(x) for x in S, y in T.

Finally, the conditional expected value of Y given X = x is simply the mean computed relative to the conditional distribution:

E(Y | X = x) = ∫T y h(y | x) dy.

Of course, the conditional mean of Y depends on the given value x of X. Temporarily, let u denote the function from S into R defined by

u(x) = E(Y | X = x) for x in S.

The function u is sometimes referred to as the regression function. The random variable u(X) is called the conditional expected value of Y given X and is denoted E(Y | X).
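The elementary definition translates directly into numerical integration. The following is a minimal sketch, assuming SciPy is available, using the density f(x, y) = x + y on the unit square (the density of Exercise 17 below); the function names are ours.

    # Numerical version of the elementary definition, assuming SciPy.
    # The joint density f(x, y) = x + y on the unit square is taken
    # from Exercise 17 below.
    from scipy.integrate import quad

    def f(x, y):
        return x + y  # joint density on 0 < x < 1, 0 < y < 1

    def g(x):
        # marginal density of X: integrate out y over T = (0, 1)
        return quad(lambda y: f(x, y), 0, 1)[0]

    def regression(x):
        # E(Y | X = x): the mean of the conditional density h(y | x)
        return quad(lambda y: y * f(x, y) / g(x), 0, 1)[0]

    print(regression(0.5))  # 0.58333... = 7/12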

The General Definition

The random variable E(Y | X) satisfies a key property that characterizes it among all functions of X.

Mathematical Exercise 1. Suppose that r is a function from S into R. Use the change of variables theorem for expected value to show that

E[r(X)E(Y | X)] = E[r(X)Y].

The result in Exercise 1 also holds in the case that (X, Y) has a joint discrete distribution; the same derivation works, but with sums replacing the integrals.

In fact, the result in Exercise 1 can be used as a definition of conditional expected value, regardless of the joint distribution of (X, Y). Thus, generally we define E(Y | X) to be the random variable that satisfies the condition in Exercise 1 and is of the form E(Y | X) = u(X) for some function u from S into R. Then we define E(Y | X = x) to be u(x).
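The defining property is easy to check by simulation. Here is a Monte Carlo sketch, assuming NumPy, for a hypothetical model in which the regression function is known in advance: X standard normal and, given X = x, Y normal with mean x^2, so that u(x) = x^2.

    # Monte Carlo check of E[r(X)E(Y | X)] = E[r(X)Y], assuming NumPy.
    # Hypothetical model: X standard normal; given X, Y normal with
    # mean X^2, so E(Y | X) = X^2.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000)
    y = rng.normal(x**2, 1.0)

    r = np.cos(x)              # an arbitrary function r(X)
    print(np.mean(r * x**2))   # estimates E[r(X)E(Y | X)]
    print(np.mean(r * y))      # estimates E[r(X)Y]; the two agree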

Properties

Our first consequence of Exercise 1 is a very compact and elegant statement of the law of total probability:

Mathematical Exercise 2. By taking r to be the constant function 1 in Exercise 1, show that

E[E(Y | X)] = E(Y).
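A quick Monte Carlo check of this identity, assuming NumPy and reusing the hypothetical model from the previous sketch (so E(Y | X) = X^2 and E(Y) = E(X^2) = 1):

    # Check of E[E(Y | X)] = E(Y) by simulation, assuming NumPy.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.standard_normal(1_000_000)
    y = rng.normal(x**2, 1.0)    # E(Y | X) = X^2

    print(y.mean())              # approximately 1 = E(Y)
    print((x**2).mean())         # E[E(Y | X)], also approximately 1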

Mathematical Exercise 3. Show that, in light of Exercise 2, the condition in Exercise 1 can be restated as follows: For any function r from S into R, Y - E(Y | X) and r(X) are uncorrelated.

The next exercise shows that the condition in Exercise 1 characterizes E(Y | X).

Mathematical Exercise 4. Suppose that u(X) and v(X) satisfy the condition in Exercise 1 and hence also the results in Exercises 2 and 3. Show that

  1. var[u(X) - v(X)] = 0.
  2. u(X) = v(X) (with probability 1).

Mathematical Exercise 5. Suppose that s is a function from S into R. Use the characterization in Exercise 1 to show that

E[s(X)Y | X] = s(X)E(Y | X).

The following rule generalizes Exercise 5 and is sometimes referred to as the substitution rule for conditional expected value.

Mathematical Exercise 6. Suppose that s is a function from S × T into R. Show that

E[s(X, Y) | X = x] = E[s(x, Y) | X = x].

Mathematical Exercise 7. Suppose that X and Y are independent. Use the characterization in Exercise 1 to show that

E(Y | X) = E(Y).

Use the general definition to establish the properties in the following exercises, where Y and Z are real-valued random variables. Note that these are analogues of the basic properties of ordinary expected value.

Mathematical Exercise 8. Show that E(Y + Z | X) = E(Y | X) + E(Z | X).

Mathematical Exercise 9. Show that E(cY | X) = cE(Y | X).

Mathematical Exercise 10. Show that if Y >= 0 then E(Y | X) >= 0.

Mathematical Exercise 11. Show that if Y <= Z then E(Y | X) <= E(Z | X).

Mathematical Exercise 12. Show that |E(Y | X)| <= E(|Y| | X).

Exercises

Mathematical Exercise 13. Suppose that (X, Y) is uniformly distributed on the square R = {(x, y): -6 < x < 6, -6 < y < 6}. Find E(Y | X).

Simulation Exercise 14. In the bivariate uniform experiment, select the square in the list box. Run the simulation 2000 times, updating every 10 runs. Note the relationship between the cloud of points and the graph of the regression function.
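If the applet is not at hand, the experiment is easy to replicate. Here is a minimal stand-in, assuming NumPy and Matplotlib, for the square case; the horizontal line is the regression function, which Exercise 13 asks you to find (by the symmetry of the square it is constant).

    # Stand-in for the bivariate uniform applet (square case), assuming
    # NumPy and Matplotlib: 2000 uniform points with the regression line.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    x = rng.uniform(-6, 6, 2000)
    y = rng.uniform(-6, 6, 2000)

    plt.scatter(x, y, s=4, alpha=0.4)
    plt.plot([-6, 6], [0, 0], "r")  # E(Y | X = x) = 0 by symmetry
    plt.show()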

Mathematical Exercise 15. Suppose that (X, Y) is uniformly distributed on the triangle R = {(x, y): -6 < y < x < 6}. Find E(Y | X).

Simulation Exercise 16. In the bivariate uniform experiment, select the triangle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the relationship between the cloud of points and the graph of the regression function.

Mathematical Exercise 17. Suppose that (X, Y) has probability density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Find

  1. E(Y | X)
  2. E(X | Y)

Mathematical Exercise 18. Suppose that (X, Y) has probability density function f(x, y) = 2(x + y) for 0 < x < y < 1. Find

  1. E(Y | X)
  2. E(X | Y)

Mathematical Exercise 19. Suppose that (X, Y) has probability density function f(x, y) = 6x^2 y for 0 < x < 1, 0 < y < 1. Find

  1. E(Y | X)
  2. E(X | Y)

Mathematical Exercise 20. Suppose that (X, Y) has probability density function f(x, y) = 15x^2 y for 0 < x < y < 1. Find

  1. E(Y | X)
  2. E(X | Y)

Mathematical Exercise 21. A pair of fair dice are thrown, and the scores (X1, X2) recorded. Let Y = X1 + X2 denote the sum of the scores and U = min{X1, X2} the minimum score. Find each of the following:

  1. E(Y | X1)
  2. E(U | X1)
  3. E(Y | U)
  4. E(X2 | X1)

Mathematical Exercise 22. Suppose that X, Y, and Z are random variables with E(Y | X) = X^3 and E(Z | X) = 1 / (1 + X^2). Find

E[exp(X) Y - sin(X) Z | X].

Conditional Probability

The conditional probability of an event A, given random vector X, is a special case of the conditional expected value. We define

P(A | X) = E(IA | X) where IA is the indicator variable of A.

The properties above for conditional expected value, of course, have special cases for conditional probability. In particular, the following exercise gives a special version of the law of total probability:

Mathematical Exercise 23. Show that P(A) = E[P(A | X)].

Mathematical Exercise 24. A box contains 10 coins, labeled 0 to 9. The probability of heads for coin i is i / 9. A coin is chosen at random from the box and tossed. Find the probability of heads. This problem is an example of Laplace's rule of succession.
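The answer is easy to check by simulation via the identity in Exercise 23. A minimal sketch, assuming NumPy:

    # Simulation of the coin-box experiment, assuming NumPy: choose a
    # coin uniformly from {0, ..., 9}, then toss it once.
    import numpy as np

    rng = np.random.default_rng(3)
    i = rng.integers(0, 10, size=1_000_000)   # the chosen coin
    heads = rng.random(1_000_000) < i / 9     # heads with probability i/9
    print(heads.mean())  # estimate of P(heads) = E[P(heads | coin)]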

The Best Predictor

The next two exercises show that, of all functions of X, E(Y | X) is the best predictor of Y, in the sense of minimizing the mean square error. This is fundamentally important in statistical problems where the predictor vector X can be observed but not the response variable Y.

Mathematical Exercise 25. Let u(X) = E(Y | X) and let v(X) be any other function of X. By adding and subtracting u(X), expanding, and using the result of Exercise 3, show that

E{[Y - v(X)]^2} = E{[Y - u(X)]^2} + E{[u(X) - v(X)]^2}.

Mathematical Exercise 26. Use the result of the last exercise to show that if v is a function from S into R then

E{[E(Y | X) - Y]^2} <= E{[v(X) - Y]^2}

and equality holds if and only if v(X) = E(Y | X) (with probability 1).

Suppose that X is real-valued. In the section on covariance and correlation, we found that the best linear predictor of Y based on X is

Y* = aX + b where a = cov(X, Y) / var(X) and b = E(Y) - a E(X).

On the other hand, E(Y | X) is the best predictor of Y among all functions of X. It follows that if E(Y | X) happens to be a linear function of X then E(Y | X) must agree with Y*.
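This is easy to see by simulation. A sketch, assuming NumPy, for a hypothetical model in which the regression function is linear: X standard normal and, given X, Y normal with mean 2X + 1.

    # When E(Y | X) is linear, the best linear predictor recovers it.
    # Hypothetical model: E(Y | X) = 2X + 1. Assumes NumPy.
    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.standard_normal(500_000)
    y = rng.normal(2 * x + 1, 1.0)

    a = np.cov(x, y)[0, 1] / np.var(x)   # a = cov(X, Y) / var(X)
    b = y.mean() - a * x.mean()          # b = E(Y) - a E(X)
    print(a, b)  # approximately 2 and 1, so Y* = 2X + 1 = E(Y | X)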

Mathematical Exercise 27. Using properties of conditional expected value, show directly that if E(Y | X) = aX + b, then a and b are as given above in the definition of Y*.

Mathematical Exercise 28. Suppose that (X, Y) has density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1.

  1. Find Y*, the best linear predictor of Y based on X.
  2. Find E(Y | X)
  3. Graph Y*(x) and E(Y | X = x), as functions of x, on the same axes.

Mathematical Exercise 29. Suppose that (X, Y) has density function f(x, y) = 2(x + y) for 0 < x < y < 1.

  1. Find Y*, the best linear predictor of Y based on X.
  2. Find E(Y | X)
  3. Graph Y*(x) and E(Y | X = x), as functions of x, on the same axes.

Mathematical Exercise 30. Suppose that (X, Y) has density function f(x, y) = 6x^2 y for 0 < x < 1, 0 < y < 1.

  1. Find Y*, the best linear predictor of Y based on X.
  2. Find E(Y | X)
  3. Graph Y*(x) and E(Y | X = x), as functions of x, on the same axes.

Mathematical Exercise 31. Suppose that (X, Y) has density function f(x, y) = 15x^2 y for 0 < x < y < 1.

  1. Find Y*, the best linear predictor of Y based on X.
  2. Find E(Y | X)
  3. Graph Y*(x) and E(Y | X = x), as functions of x, on the same axes.

The mean square error of the predictor E(Y | X) will be studied next.

Conditional Variance

The conditional variance of Y given X is naturally defined as follows:

var(Y | X) = E{[Y - E(Y | X)]^2 | X}.

Mathematical Exercise 32. Show that var(Y | X) = E(Y^2 | X) - [E(Y | X)]^2.

Mathematical Exercise 33. Show that var(Y) = E[var(Y | X)] + var[E(Y | X)].
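A Monte Carlo check of the identity in Exercise 33, assuming NumPy and reusing the earlier hypothetical model (E(Y | X) = X^2, var(Y | X) = 1):

    # Check of var(Y) = E[var(Y | X)] + var[E(Y | X)], assuming NumPy.
    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.standard_normal(1_000_000)
    y = rng.normal(x**2, 1.0)       # E(Y | X) = X^2, var(Y | X) = 1

    print(np.var(y))                # approximately 3
    print(1 + np.var(x**2))         # E[var(Y | X)] + var[E(Y | X)]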

Let us return to the study of predictors of the real-valued random variable Y, and compare the three predictors we have studied in terms of mean square error. First, the best constant predictor of Y is

µ = E(Y),

with mean square error var(Y) = E[(Y - µ)^2].

Next, if X is another real-valued random variable, then as we showed in the section on covariance and correlation, the best linear predictor of Y based on X is

Y* = E(Y) + [cov(X, Y) / var(X)][X - E(X)],

with mean square error E[(Y - Y*)^2] = var(Y)[1 - cor^2(X, Y)].

Finally, if X is a general random variable, then as we have shown in this section, the best overall predictor of Y based on X is

E(Y | X)

with mean square error E[var(Y | X)] = var(Y) - var[E(Y | X)].

Mathematical Exercise 34. Suppose that (X, Y) has density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Continue Exercise 28 by finding

  1. var(Y)
  2. var(Y)[1 - cor^2(X, Y)]
  3. var(Y) - var[E(Y | X)]

Mathematical Exercise 35. Suppose that (X, Y) has density function f(x, y) = 2(x + y) for 0 < x < y < 1. Continue Exercise 29 by finding

  1. var(Y)
  2. var(Y)[1 - cor^2(X, Y)]
  3. var(Y) - var[E(Y | X)]

Mathematical Exercise 36. Suppose that (X, Y) has density function f(x, y) = 6x^2 y for 0 < x < 1, 0 < y < 1. Continue Exercise 30 by finding

  1. var(Y)
  2. var(Y)[1 - cor^2(X, Y)]
  3. var(Y) - var[E(Y | X)]

Mathematical Exercise 37. Suppose that (X, Y) has density function f(x, y) = 15x^2 y for 0 < x < y < 1. Continue Exercise 31 by finding

  1. var(Y)
  2. var(Y)[1 - cor^2(X, Y)]
  3. var(Y) - var[E(Y | X)]

Mathematical Exercise 38. Suppose that X is uniformly distributed on (0, 1), and that given X, Y is uniformly distributed on (0, X). Find

  1. E(Y | X)
  2. var(Y | X)
  3. var(Y)

Random Sums of Random Variables

Suppose that X1, X2, ... are independent and identically distributed real-valued random variables. Denote the common mean, variance, and moment generating function of these variables as follows:

a = E(Xi), b^2 = var(Xi), M(t) = E[exp(tXi)].

Suppose also that N is a random variable taking values in {0, 1, 2, ...}, independent of X1, X2, ... Denote the mean, variance, and probability generating function of N as follows:

c = E(N), d^2 = var(N), G(t) = E(t^N).

Now define

Y = X1 + X2 + ··· + XN (where Y = 0 if N = 0)

Note that Y is a random sum of random variables: the terms in the sum are random, and the number of terms is random. This type of variable occurs in many different contexts. For example, N might represent the number of customers who enter a store in a given period of time, and Xi the amount spent by customer i.

Mathematical Exercise 39. Show that E(Y | N) = Na.

Mathematical Exercise 40. Show that E(Y) = ca.

Mathematical Exercise 41. Show that var(Y | N) = Nb^2.

Mathematical Exercise 42. Show that var(Y) = cb^2 + a^2 d^2.

Mathematical Exercise 43. Show that E[exp(tY)] = G[M(t)].
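These formulas are easy to check by simulation. A sketch, assuming NumPy, with the hypothetical choices N Poisson with mean 5 (so c = 5, d^2 = 5) and Xi exponential with mean 2 (so a = 2, b^2 = 4), giving E(Y) = ca = 10 and var(Y) = cb^2 + a^2 d^2 = 40:

    # Simulation of a random sum Y = X1 + ... + XN, assuming NumPy.
    import numpy as np

    rng = np.random.default_rng(6)
    n = rng.poisson(5, size=100_000)
    # sum k exponential terms for each run; Y = 0 when N = 0
    y = np.array([rng.exponential(2, size=k).sum() for k in n])

    print(y.mean(), y.var())   # approximately 10 and 40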

Mathematical Exercise 44. In the die-coin experiment, a fair die is rolled and then a fair coin is tossed the number of times showing on the die. Let N denote the die score and X the number of heads.

  1. Find the conditional distribution of X given N.
  2. Find E(X | N).
  3. Find var(X | N).
  4. Find E(X).
  5. Find var(X).

Simulation Exercise 45. Run the die-coin experiment 1000 times, updating every 10 runs. Note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
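If the applet is not available, a minimal stand-in, assuming NumPy:

    # Die-coin experiment, assuming NumPy: roll a fair die, then toss a
    # fair coin that many times and count the heads.
    import numpy as np

    rng = np.random.default_rng(7)
    n = rng.integers(1, 7, size=1000)   # die score N
    x = rng.binomial(n, 0.5)            # number of heads X given N

    print(x.mean(), x.std())  # compare with E(X) and sd(X) from Exercise 44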

Mathematical Exercise 46. The number of customers entering a store in a given hour is a random variable with mean 20 and standard deviation 3. Each customer, independently of the others, spends a random amount of money with mean $50 and standard deviation $5. Find the mean and standard deviation of the amount of money spent during the hour.

Mixtures

Suppose that X1, X2, ... are real-valued random variables and that N is a random variable taking values in {1, 2, ...}, independent of X1, X2, ... Denote the means, variances, and moment generating functions as follows:

µi = E(Xi), di^2 = var(Xi), Mi(t) = E[exp(tXi)] for each i.

Denote the density function of N by

pi = P(N = i) for i = 1, 2, ...

Now define a new random variable X by the condition

X = Xi if and only if N = i.

Recall that the distribution of X is a mixture of the distributions of X1, X2, ...

Mathematical Exercise 47. Show that E(X | N) = µN.

Mathematical Exercise 48. Show that E(X) = ∑i pi µi.

Mathematical Exercise 49. Show that var(X) = ∑i pi (di^2 + µi^2) - (∑i pi µi)^2.

Mathematical Exercise 50. Show that E[exp(tX)] = ∑i pi Mi(t).

Mathematical Exercise 51. In the coin-die experiment, a biased coin is tossed with probability of heads 1/3. If the coin lands tails, a fair die is rolled; if the coin lands heads, an ace-six flat die is rolled (faces 1 and 6 have probability 1/4 each, faces 2, 3, 4, 5 have probability 1/8 each). Find the mean and standard deviation of the die score.

Simulation Exercise 52. Run the coin-die experiment 1000 times, updating every 10 runs. Note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
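Again, a minimal stand-in for the applet, assuming NumPy:

    # Coin-die experiment, assuming NumPy: with probability 1/3 roll the
    # ace-six flat die, otherwise roll a fair die.
    import numpy as np

    rng = np.random.default_rng(8)
    faces = np.arange(1, 7)
    fair = np.full(6, 1 / 6)
    flat = np.array([1 / 4, 1 / 8, 1 / 8, 1 / 8, 1 / 8, 1 / 4])

    heads = rng.random(1000) < 1 / 3
    score = np.where(heads,
                     rng.choice(faces, size=1000, p=flat),
                     rng.choice(faces, size=1000, p=fair))

    print(score.mean(), score.std())  # compare with Exercise 51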

Projection

Recall that the set of real-valued random variables on a given probability space (that is, for a given random experiment), with finite second moment, forms an inner product space, with inner product given by

<U, V> = E(UV).

In this context, suppose that Y is a real-valued random variable and X a general random variable. Then E(Y | X) is simply the projection of Y onto the subspace of real-valued random variables that can be expressed as functions of X.
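The projection property can be seen numerically: by Exercises 2 and 3, the residual Y - E(Y | X) is orthogonal to every function of X in this inner product. A sketch, assuming NumPy and the same hypothetical model as in the earlier sections:

    # Orthogonality of the residual Y - E(Y | X) to functions of X,
    # assuming NumPy. Hypothetical model: E(Y | X) = X^2.
    import numpy as np

    rng = np.random.default_rng(9)
    x = rng.standard_normal(1_000_000)
    y = rng.normal(x**2, 1.0)
    resid = y - x**2                   # Y - E(Y | X)

    print(np.mean(resid * np.sin(x)))  # approximately 0: <resid, sin(X)>
    print(np.mean(resid * x**3))       # approximately 0: <resid, X^3>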