
3. Covariance and Correlation


Recall that by taking the expected value of various transformations of a random variable, we can measure many interesting characteristics of the distribution of the variable. In this section, we will study an expected value that measures a special type of relationship between two real-valued variables. This relationship is very important both in probability and statistics.

Definition

As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X and Y are real-valued random variables for the experiment with means E(X), E(Y) and variances var(X), var(Y), respectively (assumed finite). The covariance of X and Y is defined by

cov(X, Y) = E{[X - E(X)][Y - E(Y)]}

and (assuming the variances are positive) the correlation of X and Y is defined by

cor(X, Y) = cov(X, Y) / [sd(X) sd(Y)].

Correlation is a scaled version of covariance; note that the two parameters always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the sign is 0, the variables are said to be uncorrelated. As these terms suggest, covariance and correlation measure a certain kind of dependence between the variables.
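As a concrete illustration, here is a minimal numerical sketch (in Python with NumPy; the simulated distribution, sample size, and seed are arbitrary choices, not part of the text) of estimating covariance and correlation from simulated data:

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # Simulate a pair of positively correlated variables: Y = X + noise.
    x = rng.normal(size=100_000)
    y = x + rng.normal(scale=0.5, size=100_000)

    # Sample analogue of cov(X, Y) = E{[X - E(X)][Y - E(Y)]}.
    cov_xy = np.mean((x - x.mean()) * (y - y.mean()))

    # Correlation is the covariance scaled by the standard deviations.
    cor_xy = cov_xy / (x.std() * y.std())

    print(cov_xy)  # close to 1, since cov(X, X + N) = var(X) = 1
    print(cor_xy)  # close to 1 / sqrt(1.25), about 0.894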

Properties

The following exercises give some basic properties of covariance. The main tool that you will need is the fact that expected value is a linear operation.

Mathematical Exercise 1. Show that cov(X, Y) = E(XY) - E(X)E(Y).

Mathematical Exercise 2. Show that cov(X, Y) = cov(Y, X).

Mathematical Exercise 3. Show that cov(X, X) = var(X).

Mathematical Exercise 4. Show that cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z).

By Exercise 1, we see that X and Y are uncorrelated if and only if

E(XY) = E(X)E(Y).

In particular, if X and Y are independent, then they are uncorrelated. However, the converse fails with a passion, as Exercise 11 below shows.

Mathematical Exercise 5. Suppose that Xj, j in J and Yk, k in K are real-valued random variables for an experiment, and that aj, j in J and bk, k in K are constants (J and K are finite index sets). Prove the following property (known as bi-linearity).

cov(sum_{j in J} aj Xj, sum_{k in K} bk Yk) = sum_{j in J} sum_{k in K} aj bk cov(Xj, Yk).

Mathematical Exercise 6. Show that the correlation between X and Y is simply the covariance of the corresponding standard scores:

cor(X, Y) = cov([X - E(X)] / sd(X), [Y - E(Y)] / sd(Y)).

Computational Exercises

Mathematical Exercise 7. Suppose that (X, Y) is uniformly distributed on the square R = {(x, y): -6 < x < 6, -6 < y < 6}. Show that X and Y are independent and hence uncorrelated.

Simulation Exercise 8. In the bivariate uniform experiment, select the square in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.

Mathematical Exercise 9. Suppose that (X, Y) is uniformly distributed on the triangular region R = {(x, y): -6 < y < x < 6}. Show that

cor(X, Y) = 1/2.

Simulation Exercise 10. In the bivariate uniform experiment, select the triangle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.

Mathematical Exercise 11. Suppose that (X, Y) is uniformly distributed on the circular region R = {(x, y): x^2 + y^2 < 36}. Show that X and Y are dependent but still uncorrelated.

Simulation Exercise 12. In the bivariate uniform experiment, select the circle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
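Readers without access to the applet can approximate Simulation Exercises 8, 10, and 12 with a short rejection-sampling script such as the following (a sketch in Python with NumPy, not the applet itself; sample sizes and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(seed=2)

    def sample_region(accept, n=100_000):
        """Rejection-sample points uniformly from the subregion of the
        square (-6, 6) x (-6, 6) defined by the predicate `accept`."""
        x = rng.uniform(-6, 6, size=4 * n)
        y = rng.uniform(-6, 6, size=4 * n)
        keep = accept(x, y)
        return x[keep][:n], y[keep][:n]

    regions = {
        "square":   lambda x, y: np.full(x.shape, True),  # Exercise 8
        "triangle": lambda x, y: y < x,                   # Exercise 10
        "circle":   lambda x, y: x**2 + y**2 < 36,        # Exercise 12
    }

    for name, accept in regions.items():
        x, y = sample_region(accept)
        print(name, np.corrcoef(x, y)[0, 1])
    # Expect roughly 0 for the square, 0.5 for the triangle, 0 for the circle.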

Mathematical Exercise 13. Suppose that X is uniformly distributed on the interval (-1, 1) and Y = X2. Show that X and Y are uncorrelated even though Y depends functionally on X (the strongest form of dependence).
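A quick numerical check of Exercise 13 (again just a sketch; the sample size is arbitrary):

    import numpy as np

    rng = np.random.default_rng(seed=3)

    x = rng.uniform(-1, 1, size=1_000_000)
    y = x**2  # Y is a deterministic function of X, yet uncorrelated with it

    # E(XY) = E(X^3) = 0 and E(X) = 0, so cov(X, Y) = E(XY) - E(X)E(Y) = 0.
    print(np.mean(x * y) - np.mean(x) * np.mean(y))  # close to 0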

Mathematical Exercise 14. A pair of fair dice are thrown and the scores (X1, X2) recorded. Let Y = X1 + X2 denote the sum of the scores, U = min{X1, X2} the minimum score, and V = max{X1, X2} the maximum score. Find the covariance and correlation of each of the following pairs of variables:

  1. X1, X2.
  2. X1, Y.
  3. X1, U.
  4. U, V.
  5. U, Y.
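
Since the sample space consists of just 36 equally likely outcomes, all of these covariances and correlations can be computed exactly by enumeration; the following sketch (in Python, illustrative only) does so:

    from itertools import product

    # All 36 equally likely outcomes of a pair of fair dice.
    outcomes = list(product(range(1, 7), repeat=2))

    def E(f):
        """Exact expected value of f(x1, x2) under the uniform distribution."""
        return sum(f(x1, x2) for x1, x2 in outcomes) / 36

    def cov(f, g):
        return E(lambda a, b: f(a, b) * g(a, b)) - E(f) * E(g)

    def cor(f, g):
        return cov(f, g) / (cov(f, f) * cov(g, g)) ** 0.5

    X1 = lambda a, b: a
    X2 = lambda a, b: b
    Y  = lambda a, b: a + b     # sum of scores
    U  = lambda a, b: min(a, b) # minimum score
    V  = lambda a, b: max(a, b) # maximum score

    pairs = {"(X1, X2)": (X1, X2), "(X1, Y)": (X1, Y), "(X1, U)": (X1, U),
             "(U, V)": (U, V), "(U, Y)": (U, Y)}
    for name, (f, g) in pairs.items():
        print(name, cov(f, g), cor(f, g))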

Mathematical Exercise 15. Suppose that X and Y are random variables with cov(X, Y) = 3. Find

cov(2X - 5, 4Y + 2).

Mathematical Exercise 16. Suppose that (X, Y) has probability density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Find

  1. cov(X, Y)
  2. cor(X, Y).

Mathematical Exercise 17. Suppose that (X, Y) has probability density function f(x, y) = 2(x + y) for 0 < x < y < 1. Find

  1. cov(X, Y)
  2. cor(X, Y).

Mathematical Exercise 18. Suppose that (X, Y) has probability density function f(x, y) = 6x^2 y for 0 < x < 1, 0 < y < 1. Find

  1. cov(X, Y)
  2. cor(X, Y).

Mathematical Exercise 19. Suppose that (X, Y) has probability density function f(x, y) = 15x^2 y for 0 < x < y < 1. Find

  1. cov(X, Y)
  2. cor(X, Y).

Variance of a Sum

You will now show that the variance of a sum of variables is the sum of the pairwise covariances. Suppose that Xj, j in J is a collection of real-valued random variables for an experiment, where J is a finite index set.

Mathematical Exercise 20. Use Exercises 3 and 5 to show that

var(sum_{j in J} Xj) = sum_{j in J} sum_{k in J} cov(Xj, Xk).

The result in the previous exercise can be very useful; it is used for example to compute the variance of the hypergeometric distribution and the matching distribution.

Mathematical Exercise 21. Suppose that X1, X2, ..., Xn are pairwise uncorrelated (this holds in particular if they are mutually independent). Show that

var(X1 + X2 + ··· + Xn ) = var(X1) + var(X2) + ··· + var(Xn).
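A quick numerical check of this additivity (a sketch; the distribution and sample size are arbitrary):

    import numpy as np

    rng = np.random.default_rng(seed=5)

    # Three independent (hence pairwise uncorrelated) variables,
    # each exponential with scale 2 and therefore variance 4.
    x = rng.exponential(scale=2.0, size=(3, 1_000_000))

    print(np.var(x.sum(axis=0)))  # close to 4 + 4 + 4 = 12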

Mathematical Exercise 22. Show that var(X + Y) + var(X - Y) = 2 var(X) + 2 var(Y).

Mathematical Exercise 23. Suppose that var(X) = var(Y). Show that X + Y and X - Y are uncorrelated.

Mathematical Exercise 24. Suppose X and Y are random variables with var(X) = 5, var(Y) = 9, cov(X, Y) = -3. Find var(2X + 3Y - 7).

Mathematical Exercise 25. Suppose that X and Y are independent variables with var(X) = 6, var(Y) = 8. Find var(3X - 4Y + 5).

Mathematical Exercise 26. Suppose that X1, X2, ..., Xn are independent and have a common distribution with mean µ and variance d^2. (Thus, the variables form a random sample from the common distribution.) Let Yn = X1 + X2 + ··· + Xn. Show that

  1. E(Yn) = nµ.
  2. var(Yn) = n d^2.
  3. sd(Yn) = n^{1/2} d.

Mathematical Exercise 27. In the same setting as the previous exercise, let Mn = Yn / n. Thus, Mn is the sample mean. Show that 

  1. E(Mn) = µ.
  2. var(Mn) = d^2 / n.
  3. sd(Mn) = d / n^{1/2}.
  4. var(Mn) converges to 0 as n converges to infinity.
  5. P(|Mn - µ| > r) converges to 0 as n converges to infinity for any r > 0 (Hint: Use Chebyshev's inequality).

Part 5 of the last exercise means that Mn converges to µ in probability as n converges to infinity. This is the weak law of large numbers, one of the fundamental theorems of probability.
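
The weak law is easy to see numerically; the sketch below tracks the sample mean of simulated die scores (Python with NumPy; the die example and sample sizes are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(seed=6)

    mu = 3.5  # mean of a fair die score
    for n in (10, 100, 10_000, 1_000_000):
        m_n = rng.integers(1, 7, size=n).mean()  # sample mean of n rolls
        print(n, m_n, abs(m_n - mu))
    # The sample mean M_n settles down near mu = 3.5 as n grows,
    # since var(M_n) = d^2 / n converges to 0.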

Mathematical Exercise 28. Suppose that n fair dice are thrown. 

  1. Find the mean and standard deviation of the sum of the scores.
  2. Find the mean and standard deviation of the average of the scores.

Simulation Exercise 29. In the dice experiment, select the following random variables. In each case, increase the number of dice and observe the size and location of the density function and the mean/standard deviation bar. With n = 20 dice, run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the empirical moments to the distribution moments.

  1. The sum of the scores.
  2. The average of the scores.

Mathematical Exercise 30. Suppose that I1, I2, ..., In are independent indicator variables with P(Ij = 1) = p for each j. The distribution of X = I1 + I2 + ··· + In is the binomial distribution with parameters n and p. Show that

  1. E(X) = np
  2. var(X) = np(1 - p).
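
A quick simulation check of these two formulas (a sketch; the parameter values, sample size, and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(seed=7)

    n, p = 20, 0.3
    # X is the sum of n independent indicator variables with P(I_j = 1) = p.
    x = (rng.random(size=(1_000_000, n)) < p).sum(axis=1)

    print(x.mean())  # close to np = 6.0
    print(x.var())   # close to np(1 - p) = 4.2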

Events

Suppose that A and B are events in a random experiment. The covariance and correlation of A and B are defined to be the covariance and correlation, respectively, of their indicator random variables I_A and I_B.

Mathematical Exercise 31. Show that

  1. cov(A, B) = P(A ∩ B) - P(A)P(B).
  2. cor(A, B) = [P(A ∩ B) - P(A)P(B)] / [P(A)P(B)P(A^c)P(B^c)]^{1/2}.

In particular, note that A and B are positively correlated, negatively correlated, or independent, respectively (as defined in the section on conditional probability) if and only if the indicator variables of A and B are positively correlated, negatively correlated, or uncorrelated, as defined in this section.

Mathematical Exercise 32. Show that

  1. cov(A, B^c) = -cov(A, B).
  2. cov(A^c, B^c) = cov(A, B).

Mathematical Exercise 33. Suppose that A ⊂ B. Show that

  1. cov(A, B) = P(A)P(B^c).
  2. cor(A, B) = [P(A)P(B^c) / (P(B)P(A^c))]^{1/2}.

Mathematical Exercise 34. Suppose that A and B are events in an experiment with P(A) = 1/2, P(B) = 1/3, P(A ∩ B) = 1/8. Find the covariance and correlation between A and B.
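
Since cov(A, B) and cor(A, B) depend only on P(A), P(B), and P(A ∩ B), the formulas of Exercise 31 translate directly into code; the following small helper function (hypothetical, in Python) can be used to check answers such as Exercise 34:

    def event_cov_cor(pa, pb, pab):
        """cov and cor of events A, B from P(A), P(B), P(A ∩ B),
        via Exercise 31: cov(A, B) = P(A ∩ B) - P(A)P(B)."""
        cov = pab - pa * pb
        cor = cov / (pa * pb * (1 - pa) * (1 - pb)) ** 0.5
        return cov, cor

    print(event_cov_cor(1/2, 1/3, 1/8))  # the setting of Exercise 34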

The Best Linear Predictor

What linear function of X is closest to Y in the sense of minimizing mean square error? The question is fundamentally important in the case where random variable X (the predictor variable) is observable and random variable Y (the response variable) is not. The linear function can be used to estimate Y from an observed value of X. Moreover, the solution will show that covariance and correlation measure the linear relationship between X and Y. To avoid trivial cases, let us assume that var(X) > 0 and var(Y) > 0.

Mathematical Exercise 35. Show that

E{[Y - (aX + b)]^2} = E(Y^2) - 2a E(XY) - 2b E(Y) + a^2 E(X^2) + 2ab E(X) + b^2.

Mathematical Exercise 36. Use basic calculus to show that E{[Y - (aX + b)]^2} is minimized when

  1. a = cov(X, Y) / var(X)
  2. b = E(Y) - a E(X)

Thus, the best linear predictor of Y based on X is

Y* = E(Y) + [cov(X, Y) / var(X)][X - E(X)].

Mathematical Exercise 37. Show that the minimum mean square error, among all linear functions of X, is

E[(Y - Y*)^2] = var(Y)[1 - cor^2(X, Y)].
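
These formulas are easy to check by simulation; here is a minimal sketch (Python with NumPy; the simulated linear model, sample size, and seed are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(seed=9)

    # Simulated predictor X and response Y with a noisy linear relationship.
    x = rng.normal(size=200_000)
    y = 2.0 * x + rng.normal(scale=3.0, size=200_000)

    a = np.cov(x, y, ddof=0)[0, 1] / x.var()  # a = cov(X, Y) / var(X)
    b = y.mean() - a * x.mean()               # b = E(Y) - a E(X)
    y_star = a * x + b                        # best linear predictor Y*

    mse = np.mean((y - y_star) ** 2)
    cor = np.corrcoef(x, y)[0, 1]
    print(mse, y.var() * (1 - cor**2))  # the two agree, per Exercise 37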

Mathematical Exercise 38. From the last exercise, show that

  1. -1 <= cor(X, Y) <= 1.
  2. -sd(X) sd(Y) <= cov(X, Y) <= sd(X) sd(Y).
  3. cor(X, Y) = 1 if and only if Y = aX + b with probability 1 for some constants a > 0 and b.
  4. cor(X, Y) = -1 if and only if Y = aX + b with probability 1 for some constants a < 0 and b.

These exercises show clearly that cov(X, Y) and cor(X, Y) measure the linear association between X and Y.

Recall that the best constant predictor of Y, in the sense of minimizing mean square error, is E(Y), and the minimum value of the mean square error for this predictor is var(Y). Thus, the difference between var(Y) and the mean square error in Exercise 37 is the reduction in the variance of Y when the linear term in X is added to the predictor.

Mathematical Exercise 39. Show that var(Y) - E[(Y - Y*)^2] = var(Y) cor^2(X, Y).

The fractional reduction is cor^2(X, Y), and hence this quantity is called the (distribution) coefficient of determination. The line

y = E(Y) + [cov(X, Y) / var(X)][x - E(X)]

is known as the (distribution) regression line for Y based on X. Note that the regression line passes through (E(X), E(Y)), the center of the joint distribution. However, the choice of predictor variable and response variable is crucial.

Mathematical Exercise 40. Show that the regression line for Y based on X and the regression line for X based on Y are not the same line, except in the trivial case where the variables are perfectly correlated.

Mathematical Exercise 41. Suppose that (X, Y) has probability density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1.

  1. Find the best linear predictor of Y based on X.
  2. Find the best linear predictor of X based on Y.
  3. Find the coefficient of determination.

Mathematical Exercise 42. Suppose that (X, Y) has probability density function f(x, y) = 2(x + y) for 0 < x < y < 1.

  1. Find the best linear predictor of Y based on X.
  2. Find the best linear predictor of X based on Y.
  3. Find the coefficient of determination.

Mathematical Exercise 43. Suppose that (X, Y) has probability density function f(x, y) = 6x^2 y for 0 < x < 1, 0 < y < 1.

  1. Find the best linear predictor of Y based on X.
  2. Find the best linear predictor of X based on Y.
  3. Find the coefficient of determination.

Mathematical Exercise 44. Suppose that (X, Y) has probability density function f(x, y) = 15x^2 y for 0 < x < y < 1.

  1. Find the best linear predictor of Y based on X.
  2. Find the best linear predictor of X based on Y.
  3. Find the coefficient of determination.

Mathematical Exercise 45. A pair of fair dice are thrown and the scores (X1, X2) recorded. Let Y = X1 + X2 denote the sum of the scores, U = min{X1, X2} the minimum score, and V = max{X1, X2} the maximum score. Find

  1. The best linear predictor of Y based on X1.
  2. The best linear predictor of U based on X1.
  3. The best linear predictor of V based on X1.

Mathematical Exercise 46. Suppose that A and B are events in a random experiment with 0 < P(A) < 1 and 0 < P(B) < 1. Show that

  1. A and B have correlation 1 if and only if P(A ∩ B^c) = 0 and P(B ∩ A^c) = 0 (that is, A = B with probability 1).
  2. A and B have correlation -1 if and only if P(A ∩ B) = 0 and P(A^c ∩ B^c) = 0 (that is, A = B^c with probability 1).

The corresponding statistical problem of estimating a and b, when the distribution parameters in Exercise 36 are unknown, is considered in the section on Sample Covariance and Correlation. A natural generalization of the problem considered here is to find the function of X (among all reasonable functions, not just linear ones) that is closest to Y in the sense of minimizing mean square error. The solution is obtained in the section on Conditional Expected Value.

Inner Product

Covariance is closely related to key concepts in the theory of vector spaces. This connection can help illustrate many of the properties of covariance from a different point of view. First, if X and Y are real-valued random variables, define the inner product of X and Y by

<X, Y> = E(XY).

The following exercises are analogues of the basic properties of covariance given above, and show that this definition really does give an inner product on the vector space of random variables with finite second moments. (As usual, we identify two random variables that agree with probability 1.)

Mathematical Exercise 47. Show that <X, Y> = <Y, X>.

Mathematical Exercise 48. Show that <X, X> >= 0.

Mathematical Exercise 49. Show that <X, X> = 0 if and only if P(X = 0) = 1.

Mathematical Exercise 50. Show that <aX, Y> = a <X, Y>.

Mathematical Exercise 51. Show that <X + Y, Z> = <X, Z> + <Y, Z>.

Covariance and correlation can easily be expressed in terms of this inner product.

Mathematical Exercise 52. Show that cov(X, Y) = <X - E(X), Y - E(Y)>.

Mathematical Exercise 53. Show that cor(X, Y) = <[X - E(X)] / sd(X), [Y - E(Y)] / sd(Y)>.

Thus the covariance of X and Y is the inner product of the corresponding centered variables. The correlation of X and Y is the inner product of the corresponding standard scores.

The norm associated with the inner product is the 2-norm studied in the last section. This fact is a fundamental reason why the 2-norm plays such a special, honored role; of all of the k-norms, only the 2-norm corresponds to an inner product.

Mathematical Exercise 54. Show that <X, X> = ||X||_2^2 = E(X^2).

Note that the best linear predictor of Y based on X derived above is simply the projection of Y onto the subspace of random variables of the form aX + b, where a and b are real numbers.

The next exercise gives Hölder's inequality, named for Otto Hölder.

Mathematical Exercise 55. Suppose that j, k > 1 with 1 / j + 1 / k = 1. Show that <|X|, |Y|> <= ||X||_j ||Y||_k.

  1. Show that g(x, y) = x^{1/j} y^{1/k} is concave on {(x, y) in R^2: x >= 0, y >= 0}.
  2. Use part 1 and Jensen's inequality to show that if U and V are nonnegative random variables then E(U^{1/j} V^{1/k}) <= [E(U)]^{1/j} [E(V)]^{1/k}.
  3. In part 2, let U = |X|^j, V = |Y|^k.

In the context of the last exercise, j and k are called conjugate exponents. If we let j = k = 2 in Hölder's inequality, then we get the Cauchy-Schwarz inequality, named for Augustin Cauchy and Hermann Schwarz:

E(|XY|) <= [E(X^2)]^{1/2} [E(Y^2)]^{1/2}.

In turn, this is equivalent to the inequalities in Exercise 38.
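
A simulation sketch checking Hölder's inequality for two conjugate pairs (the distributions of X and Y, the sample size, and the seed are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(seed=10)

    x = rng.normal(size=1_000_000)
    y = rng.exponential(size=1_000_000)

    for j in (2.0, 3.0):
        k = j / (j - 1)                # conjugate exponent of j
        lhs = np.mean(np.abs(x * y))   # <|X|, |Y|> = E(|XY|)
        rhs = np.mean(np.abs(x)**j)**(1/j) * np.mean(np.abs(y)**k)**(1/k)
        print(j, k, lhs <= rhs)        # True in both cases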

Mathematical Exercise 56. Suppose that (X, Y) has density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Verify Hölder's inequality in the following cases:

  1. j = k = 2
  2. j = 3, k = 3 / 2.

Mathematical Exercise 57. Suppose that j and k are conjugate exponents.

  1. Show that k = j / (j - 1).
  2. Show that k decreases to 1 as j increases to infinity.

The following exercise is an analogue of the result in Exercise 22.

Mathematical Exercise 58. Prove the parallelogram rule:

||X + Y||_2^2 + ||X - Y||_2^2 = 2 ||X||_2^2 + 2 ||Y||_2^2.

The following exercise is an analogue of the result in Exercise 21.

Mathematical Exercise 59. Prove the Pythagorean theorem, named for Pythagoras of course: if X1, X2, ..., Xn are random variables with <Xi, Xj> = 0 for distinct i and j then

||X1 + X2 + ··· + Xn||_2^2 = ||X1||_2^2 + ||X2||_2^2 + ··· + ||Xn||_2^2.