Recall that by taking the expected value of various transformations of a random variable, we can measure many interesting characteristics of the distribution of the variable. In this section, we will study an expected value that measures a special type of relationship between two real-valued variables. This relationship is very important both in probability and statistics.
As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X and Y are real-valued random variables for the experiment with means E(X), E(Y) and variances var(X), var(Y), respectively (assumed finite). The covariance of X and Y is defined by
cov(X, Y) = E{[X - E(X)][Y - E(Y)]}
and (assuming the variances are positive) the correlation of X and Y is defined by
cor(X, Y) = cov(X, Y) / [sd(X) sd(Y)].
Correlation is a scaled version of covariance; note that the two parameters always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the sign is 0, the variables are said to be uncorrelated. As these terms suggest, covariance and correlation measure a certain kind of dependence between the variables.
The following exercises give some basic properties of covariance. The main tool that you will need is the fact that expected value is a linear operation.
1. Show that cov(X, Y) = E(XY) - E(X)E(Y).
2. Show that cov(X, Y) = cov(Y, X).
3. Show that cov(X, X) = var(X).
4. Show that cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z).
By Exercise 1, we see that X and Y are uncorrelated if and only if
E(XY) = E(X)E(Y).
In particular, if X and Y are independent, then they are uncorrelated. However, the converse fails with a passion, as Exercise 11 below shows.
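A minimal Monte Carlo sketch (assuming NumPy is available; the variables and parameters are illustrative, not part of the text) that checks the identity in Exercise 1 and computes the correlation as the covariance scaled by the standard deviations:
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.standard_normal(n)                 # X ~ N(0, 1)
y = 0.5 * x + rng.standard_normal(n)       # Y = 0.5 X + noise, so cov(X, Y) = 0.5 in theory

cov_def = np.mean((x - x.mean()) * (y - y.mean()))   # E{[X - E(X)][Y - E(Y)]}
cov_alt = np.mean(x * y) - x.mean() * y.mean()       # E(XY) - E(X)E(Y)
cor     = cov_def / (x.std() * y.std())              # covariance scaled by the standard deviations

print(cov_def, cov_alt, cor)   # both covariance estimates are near 0.5; cor is near 0.5 / sqrt(1.25)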
5. Suppose that Xj, j in J and Yk, k in K are real-valued random variables for an experiment, and that aj, j in J and bk, k in K are constants (J and K are finite index sets). Prove the following property (known as bi-linearity):
cov(Σ_{j in J} aj Xj, Σ_{k in K} bk Yk) = Σ_{j in J} Σ_{k in K} aj bk cov(Xj, Yk).
6. Show that the correlation between X and Y is simply the covariance of the corresponding standard scores:
cor(X, Y) = cov{[X - E(X)] / sd(X), [Y - E(Y)] / sd(Y)}.
7. Suppose that (X, Y) is uniformly distributed on the square R = {(x, y): -6 < x < 6, -6 < y < 6}. Show that X and Y are independent and hence uncorrelated.
8. In the bivariate uniform experiment, select the square in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
9. Suppose that (X, Y) is uniformly distributed on the triangular region R = {(x, y): -6 < y < x < 6}. Show that cor(X, Y) = 1/2.
10. In the bivariate uniform experiment, select the triangle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
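A small simulation sketch (NumPy assumed) of the same experiment for the triangular region of Exercise 9; the rejection-sampling approach is an implementation choice, not part of the exercise:
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(-6, 6, size=(2_000_000, 2))
x, y = pts[:, 0], pts[:, 1]
keep = y < x                                  # keep points in {(x, y): -6 < y < x < 6}

print(np.corrcoef(x[keep], y[keep])[0, 1])    # close to 1/2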
11. Suppose that (X, Y) is uniformly distributed on the circular region R = {(x, y): x^2 + y^2 < 36}. Show that X and Y are dependent but still uncorrelated.
12. In the bivariate uniform experiment, select the circle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
13. Suppose that X is uniformly distributed on the interval (-1, 1) and Y = X^2. Show that X and Y are uncorrelated even though Y depends functionally on X (the strongest form of dependence).
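A quick sketch (NumPy assumed) of Exercise 13: Y is a function of X, yet the sample correlation is essentially 0.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x ** 2

print(np.corrcoef(x, y)[0, 1])   # near 0: uncorrelated but strongly dependent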
14. A pair of fair dice are thrown and the scores (X1, X2) recorded. Let Y = X1 + X2 denote the sum of the scores, U = min{X1, X2} the minimum score, and V = max{X1, X2} the maximum score. Find the covariance and correlation of each of the following pairs of variables:
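A brute-force sketch for Exercise 14 (the sub-parts are not listed above, so the pairs computed below are illustrative): enumerate the 36 equally likely outcomes of two fair dice and compute covariance and correlation exactly.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))          # (X1, X2), each with probability 1/36
mean = lambda f: sum(f(a, b) for a, b in outcomes) / 36.0

def cov(f, g):
    return mean(lambda a, b: f(a, b) * g(a, b)) - mean(f) * mean(g)

def cor(f, g):
    return cov(f, g) / (cov(f, f) ** 0.5 * cov(g, g) ** 0.5)

X1 = lambda a, b: a              # score of the first die
Y  = lambda a, b: a + b          # sum of the scores
U  = lambda a, b: min(a, b)      # minimum score
V  = lambda a, b: max(a, b)      # maximum score

for name, (f, g) in [("(X1, Y)", (X1, Y)), ("(U, V)", (U, V)), ("(U, Y)", (U, Y))]:
    print(name, cov(f, g), cor(f, g))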
15. Suppose that X and Y are random variables with cov(X, Y) = 3. Find cov(2X - 5, 4Y + 2).
16. Suppose that (X, Y) has probability density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Find cov(X, Y) and cor(X, Y).
17. Suppose that (X, Y) has probability density function f(x, y) = 2(x + y) for 0 < x < y < 1. Find cov(X, Y) and cor(X, Y).
18. Suppose that (X, Y) has probability density function f(x, y) = 6x^2 y for 0 < x < 1, 0 < y < 1. Find cov(X, Y) and cor(X, Y).
19. Suppose that (X, Y) has probability density function f(x, y) = 15x^2 y for 0 < x < y < 1. Find cov(X, Y) and cor(X, Y).
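A symbolic sketch (assuming SymPy) for Exercise 16: with f(x, y) = x + y on the unit square, integrate directly to obtain the covariance and correlation.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = x + y                                                  # joint density on 0 < x < 1, 0 < y < 1

E = lambda g: sp.integrate(g * f, (x, 0, 1), (y, 0, 1))    # expected value of g(X, Y)
EX, EY = E(x), E(y)
var_x, var_y = E(x**2) - EX**2, E(y**2) - EY**2
cov_xy = E(x * y) - EX * EY
cor_xy = sp.simplify(cov_xy / sp.sqrt(var_x * var_y))

print(cov_xy, cor_xy)                                      # -1/144 and -1/11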
You will now show that the variance of a sum of variables is the sum of the pairwise covariances. Suppose that Xj, j in J is a collection of real-valued random variables for an experiment, where J is a finite index set.
20. Use Exercises 3 and 5 to show that
var(Σ_{j in J} Xj) = Σ_{j in J} Σ_{k in J} cov(Xj, Xk).
The result in the previous exercise can be very useful; it is used for example to compute the variance of the hypergeometric distribution and the matching distribution.
21. Suppose that X1, X2, ..., Xn are pairwise uncorrelated (this holds in particular if they are mutually independent). Show that
var(X1 + X2 + ··· + Xn) = var(X1) + var(X2) + ··· + var(Xn).
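A quick numerical sketch (NumPy assumed) of Exercise 21: for independent (hence uncorrelated) variables, the variance of the sum matches the sum of the variances.
import numpy as np

rng = np.random.default_rng(3)
xs = [rng.exponential(scale=s, size=1_000_000) for s in (1.0, 2.0, 3.0)]   # independent, variances 1, 4, 9

print(np.var(sum(xs)))                 # variance of X1 + X2 + X3, near 14
print(sum(np.var(x) for x in xs))      # var(X1) + var(X2) + var(X3), also near 14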
22. Show that var(X + Y) + var(X - Y) = 2 var(X) + 2 var(Y).
23. Suppose that var(X) = var(Y). Show that X + Y and X - Y are uncorrelated.
24. Suppose X and Y are random variables with var(X) = 5, var(Y) = 9, and cov(X, Y) = -3. Find var(2X + 3Y - 7).
25. Suppose that X and Y are independent variables with var(X) = 6 and var(Y) = 8. Find var(3X - 4Y + 5).
26. Suppose that X1, X2, ..., Xn are independent and have a common distribution with mean µ and variance d^2 (thus, the variables form a random sample from the common distribution). Let Yn = X1 + X2 + ··· + Xn. Show that E(Yn) = n µ and var(Yn) = n d^2.
27. In the same setting as the previous exercise, let Mn = Yn / n, so that Mn is the sample mean. Show that E(Mn) = µ, var(Mn) = d^2 / n, and hence that P(|Mn - µ| > ε) → 0 as n → ∞ for every ε > 0.
The last part of the previous exercise means that Mn → µ as n → ∞ in probability. This is the weak law of large numbers, one of the fundamental theorems of probability.
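A simulation sketch (NumPy assumed) of the weak law of large numbers: the sample mean of n fair die scores concentrates around µ = 3.5 as n grows.
import numpy as np

rng = np.random.default_rng(4)
mu = 3.5                                        # mean of a single fair die score

for n in (10, 100, 10_000, 1_000_000):
    m_n = rng.integers(1, 7, size=n).mean()     # sample mean of n fair die scores
    print(n, m_n, abs(m_n - mu))                # the error tends to shrink as n increases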
28. Suppose that n fair dice are thrown. Find the mean and variance of the sum of the scores and of the average score.
29. In the dice experiment, select each of the variables in the previous exercise in turn. In each case, increase the number of dice and observe the size and location of the density function and the mean/standard deviation bar. With n = 20 dice, run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the empirical moments to the distribution moments.
30. Suppose that I1, I2, ..., In are independent indicator variables with P(Ij = 1) = p for each j. The distribution of X = I1 + I2 + ··· + In is the binomial distribution with parameters n and p. Show that E(X) = np and var(X) = np(1 - p).
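A brief numerical check (NumPy assumed) of Exercise 30: a sum of n independent indicator variables with success probability p has mean np and variance np(1 - p).
import numpy as np

rng = np.random.default_rng(5)
n, p, reps = 20, 0.3, 1_000_000

indicators = rng.random((reps, n)) < p      # reps independent rows of n indicator variables
x = indicators.sum(axis=1)                  # X = I1 + I2 + ... + In

print(x.mean(), n * p)                      # both near 6.0
print(x.var(), n * p * (1 - p))             # both near 4.2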
Suppose that A and B are events in a random experiment. The covariance and correlation of A and B are defined to be the covariance and correlation, respectively, of their indicator random variables IA and IB.
31. Show that cov(A, B) = P(A ∩ B) - P(A)P(B).
In particular, note that A and B are positively correlated, negatively correlated, or independent, respectively (as defined in the section on conditional probability) if and only if the indicator variables of A and B are positively correlated, negatively correlated, or uncorrelated, as defined in this section.
32. Show that
33. Suppose that A ⊆ B. Show that cov(A, B) = P(A)[1 - P(B)] ≥ 0.
34. Suppose that A and B are events in an experiment with P(A) = 1/2, P(B) = 1/3, and P(A ∩ B) = 1/8. Find the covariance and correlation between A and B.
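A worked check (plain Python) of Exercise 34, using cov(A, B) = P(A ∩ B) - P(A)P(B) (Exercise 1 applied to the indicator variables) and the fact that an indicator variable has variance P(A)[1 - P(A)].
from fractions import Fraction
import math

pA, pB, pAB = Fraction(1, 2), Fraction(1, 3), Fraction(1, 8)

cov_ab = pAB - pA * pB                                            # -1/24
cor_ab = float(cov_ab) / math.sqrt(pA * (1 - pA) * pB * (1 - pB))

print(cov_ab, round(cor_ab, 4))                                   # -1/24 and about -0.1768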
What linear function of X is closest to Y in the sense of minimizing mean square error? The question is fundamentally important in the case where random variable X (the predictor variable) is observable and random variable Y (the response variable) is not. The linear function can be used to estimate Y from an observed value of X. Moreover, the solution will show that covariance and correlation measure the linear relationship between X and Y. To avoid trivial cases, let us assume that var(X) > 0 and var(Y) > 0.
35. Show that E{[Y - (aX + b)]^2} = var(Y) - 2a cov(X, Y) + a^2 var(X) + [E(Y) - a E(X) - b]^2.
36. Use basic calculus to show that E{[Y - (aX + b)]^2} is minimized when a = cov(X, Y) / var(X) and b = E(Y) - a E(X).
Thus, the best linear predictor of Y based on X is
Y* = E(Y) + [cov(X, Y) / var(X)][X - E(X)].
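A numerical sketch (NumPy assumed; the simulated relationship is illustrative) of the best linear predictor: the slope cov(X, Y) / var(X) and the intercept E(Y) - slope·E(X) agree with an ordinary least squares fit.
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, size=200_000)
y = 2.0 * x + 1.0 + rng.standard_normal(200_000)   # a noisy linear relationship

a = np.cov(x, y, bias=True)[0, 1] / np.var(x)      # cov(X, Y) / var(X)
b = y.mean() - a * x.mean()                        # so the line passes through (E(X), E(Y))

print(a, b)                                        # near 2.0 and 1.0
print(np.polyfit(x, y, 1))                         # least squares [slope, intercept], same values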
37. Show that the minimum mean square error, among all linear functions of X, is
E[(Y - Y*)^2] = var(Y)[1 - cor^2(X, Y)].
38. From the last exercise, show that -1 ≤ cor(X, Y) ≤ 1.
These exercises show clearly that cov(X, Y) and cor(X, Y) measure the linear association between X and Y.
Recall that the best constant predictor of Y, in the sense of minimizing mean square error, is E(Y), and the minimum value of the mean square error for this predictor is var(Y). Thus, the difference between var(Y) and the mean square error in Exercise 37 is the reduction in the variance of Y when the linear term in X is added to the predictor.
39. Show that var(Y) - E[(Y - Y*)^2] = var(Y) cor^2(X, Y).
The fraction of the reduction is cor^2(X, Y), and hence this quantity is called the (distribution) coefficient of determination. The line
y = E(Y) + [cov(X, Y) / var(X)][x - E(X)]
is known as the (distribution) regression line for Y based on X. Note that the regression line passes through (E(X), E(Y)), the center of the joint distribution. However, the choice of predictor variable and response variable is crucial.
40. Show that the regression line for Y based on X and the regression line for X based on Y are not the same line, except in the trivial case where the variables are perfectly correlated.
41. Suppose that (X, Y) has probability density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Find the regression line for Y based on X.
42. Suppose that (X, Y) has probability density function f(x, y) = 2(x + y) for 0 < x < y < 1. Find the regression line for Y based on X.
43. Suppose that (X, Y) has probability density function f(x, y) = 6x^2 y for 0 < x < 1, 0 < y < 1. Find the regression line for Y based on X.
44. Suppose that (X, Y) has probability density function f(x, y) = 15x^2 y for 0 < x < y < 1. Find the regression line for Y based on X.
45. A pair of fair dice are thrown and the scores (X1, X2) recorded. Let Y = X1 + X2 denote the sum of the scores, U = min{X1, X2} the minimum score, and V = max{X1, X2} the maximum score. Find the regression line for Y based on X1.
46. Suppose that A and B are events in a random experiment with 0 < P(A) < 1 and 0 < P(B) < 1. Show that
The corresponding statistical problem of estimating a and b, when the distribution parameters in Exercise 36 are unknown, is considered in the section on Sample Covariance and Correlation. A natural generalization of the problem considered here is to find the function of X (among all reasonable functions, not just linear ones) that is closest to Y in the sense of minimizing mean square error. The solution is obtained in the section on Conditional Expected Value.
Covariance is closely related to key concepts in the theory of vector spaces. This connection can help illustrate many of the properties of covariance from a different point of view. First, if X and Y are real-valued random variables, define the inner product of X and Y by
<X, Y> = E(XY).
The following exercises are analogues of the basic properties of covariance given above, and show that this definition really does give an inner product on the vector space of random variables with finite second moment. (As usual, we identify two random variables that agree with probability 1.)
47. Show that <X, Y> = <Y, X>.
48. Show that <X, X> ≥ 0.
49. Show that <X, X> = 0 if and only if P(X = 0) = 1.
50. Show that <aX, Y> = a <X, Y>.
51. Show that <X + Y, Z> = <X, Z> + <Y, Z>.
Covariance and correlation can easily be expressed in terms of this inner product.
52. Show that cov(X, Y) = <X - E(X), Y - E(Y)>.
53. Show that cor(X, Y) = <[X - E(X)] / sd(X), [Y - E(Y)] / sd(Y)>.
Thus the covariance of X and Y is the inner product of the corresponding centered variables. The correlation of X and Y is the inner product of the corresponding standard scores.
The norm associated with the inner product is the 2-norm studied in the last section. This fact is a fundamental reason why the 2-norm plays such a special, honored role; of all of the k-norms, only the 2-norm corresponds to an inner product.
54. Show that <X, X> = ||X||_2^2 = E(X^2).
Note that the best linear predictor of Y based on X derived above is simply the projection of Y onto the subspace of random variables of the form aX + b, where a and b are real numbers.
The next exercise gives Hölder's inequality, named for Otto Hölder.
55. Suppose that j, k > 1 with 1 / j + 1 / k = 1. Show that <|X|, |Y|> ≤ ||X||_j ||Y||_k.
In the context of the last exercise, j and k are called conjugate exponents. If we let j = k = 2 in Hölder's inequality, then we get the Cauchy-Schwarz inequality, named for Augustin Cauchy and Hermann Schwarz:
E(|XY|) ≤ [E(X^2)]^(1/2) [E(Y^2)]^(1/2).
In turn, this is equivalent to the inequalities in Exercise 38.
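A simulation sketch (NumPy assumed) checking Hölder's inequality E(|XY|) ≤ ||X||_j ||Y||_k for a few conjugate exponent pairs; the variables X and Y here are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(1_000_000)
y = rng.exponential(size=1_000_000)

norm = lambda z, p: np.mean(np.abs(z) ** p) ** (1.0 / p)   # ||Z||_p = [E(|Z|^p)]^(1/p)
lhs = np.mean(np.abs(x * y))                               # E(|XY|)

for j in (2.0, 3.0, 4.0):
    k = j / (j - 1.0)                                      # conjugate exponent: 1/j + 1/k = 1
    rhs = norm(x, j) * norm(y, k)
    print(j, k, lhs, rhs, lhs <= rhs)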
56. Suppose that (X, Y) has density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Verify Hölder's inequality in the following cases:
57. Suppose that j and k are conjugate exponents.
The following exercise is an analogue of the result in Exercise 22.
58. Prove the parallelogram rule:
||X + Y||_2^2 + ||X - Y||_2^2 = 2 ||X||_2^2 + 2 ||Y||_2^2.
The following exercise is an analogue of the result in Exercise 21.
59. Prove the Pythagorean theorem, named for Pythagoras of course: if X1, X2, ..., Xn are random variables with <Xi, Xj> = 0 for distinct i and j, then
||X1 + X2 + ··· + Xn||_2^2 = ||X1||_2^2 + ||X2||_2^2 + ··· + ||Xn||_2^2.