Recall that by taking the expected value of various transformations of a random variable, we can measure many interesting characteristics of the distribution of the variable. In this section, we will study an expected value that measures a special type of relationship between two real-valued variables. This relationship is very important both in probability and statistics.
As usual, we start with a random experiment that has a sample space and a probability measure P. Suppose that X and Y are real-valued random variables for the experiment with means E(X), E(Y) and variances var(X), var(Y), respectively (assumed finite). The covariance of X and Y is defined by
cov(X, Y) = E{[X - E(X)][Y - E(Y)]}
and (assuming the variances are positive) the correlation of X and Y is defined by
cor(X, Y) = cov(X, Y) / [sd(X) sd(Y)].
Correlation is a scaled version of covariance; note that the two parameters always have the same sign (positive, negative, or 0). When the sign is positive, the variables are said to be positively correlated; when the sign is negative, the variables are said to be negatively correlated; and when the sign is 0, the variables are said to be uncorrelated. As these terms suggest, covariance and correlation measure a certain kind of dependence between the variables.
The following exercises give some basic properties of covariance. The main tool that you will need is the fact that expected value is a linear operation.
1. Show that cov(X, Y) = E(XY) - E(X)E(Y).
2. Show that cov(X, Y) = cov(Y, X).
3. Show that cov(X, X) = var(X).
4. Show that cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z).
By Exercise 1, we see that X and Y are uncorrelated if and only if
E(XY) = E(X)E(Y).
In particular, if X and Y are independent, then they are uncorrelated. However, the converse fails with a passion, as Exercise 11 below shows.
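A minimal Monte Carlo sketch (assuming NumPy is available; the variables and parameters are illustrative, not part of the text) that checks the identity in Exercise 1 and computes the correlation as the covariance scaled by the standard deviations:
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.standard_normal(n)                 # X ~ N(0, 1)
y = 0.5 * x + rng.standard_normal(n)       # Y = 0.5 X + noise, so cov(X, Y) = 0.5 in theory

cov_def = np.mean((x - x.mean()) * (y - y.mean()))   # E{[X - E(X)][Y - E(Y)]}
cov_alt = np.mean(x * y) - x.mean() * y.mean()       # E(XY) - E(X)E(Y)
cor     = cov_def / (x.std() * y.std())              # covariance scaled by the standard deviations

print(cov_def, cov_alt, cor)   # both covariance estimates are near 0.5; cor is near 0.5 / sqrt(1.25)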
5. Suppose that Xj, j in J and Yk, k in K are real-valued random variables for an experiment, and that aj, j in J and bk, k in K are constants (J and K are finite index sets). Prove the following property (known as bi-linearity):
cov(Σ_{j in J} aj Xj, Σ_{k in K} bk Yk) = Σ_{j in J} Σ_{k in K} aj bk cov(Xj, Yk).
6. Show that the correlation between X and Y is simply the covariance of the corresponding standard scores:
cor(X, Y) = cov{[X - E(X)] / sd(X), [Y - E(Y)] / sd(Y)}.
7. Suppose that (X, Y) is uniformly distributed on the square R = {(x, y): -6 < x < 6, -6 < y < 6}. Show that X and Y are independent and hence uncorrelated.
8. In the bivariate uniform experiment, select the square in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
9. Suppose that (X, Y) is uniformly distributed on the triangular region R = {(x, y): -6 < y < x < 6}. Show that cor(X, Y) = 1/2.
10. In the bivariate uniform experiment, select the triangle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
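A small simulation sketch (NumPy assumed) of the same experiment for the triangular region of Exercise 9; the rejection-sampling approach is an implementation choice, not part of the exercise:
import numpy as np

rng = np.random.default_rng(1)
pts = rng.uniform(-6, 6, size=(2_000_000, 2))
x, y = pts[:, 0], pts[:, 1]
keep = y < x                                  # keep points in {(x, y): -6 < y < x < 6}

print(np.corrcoef(x[keep], y[keep])[0, 1])    # close to 1/2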
11. Suppose that (X, Y) is uniformly distributed on the circular region R = {(x, y): x^2 + y^2 < 36}. Show that X and Y are dependent but still uncorrelated.
12. In the bivariate uniform experiment, select the circle in the list box. Run the simulation 2000 times, updating every 10 runs. Note the value of the correlation and the shape of the cloud of points in the scatterplot.
13. Suppose that X is uniformly distributed on the interval (-1, 1) and Y = X^2. Show that X and Y are uncorrelated even though Y depends functionally on X (the strongest form of dependence).
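A quick sketch (NumPy assumed) of Exercise 13: Y is a function of X, yet the sample correlation is essentially 0.
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x ** 2

print(np.corrcoef(x, y)[0, 1])   # near 0: uncorrelated but strongly dependent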
14. A pair of fair dice are thrown and the scores (X1, X2) recorded. Let Y = X1 + X2 denote the sum of the scores, U = min{X1, X2} the minimum score, and V = max{X1, X2} the maximum score. Find the covariance and correlation of each of the following pairs of variables:
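A brute-force sketch for Exercise 14 (the sub-parts are not listed above, so the pairs computed below are illustrative): enumerate the 36 equally likely outcomes of two fair dice and compute covariance and correlation exactly.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))          # (X1, X2), each with probability 1/36
mean = lambda f: sum(f(a, b) for a, b in outcomes) / 36.0

def cov(f, g):
    return mean(lambda a, b: f(a, b) * g(a, b)) - mean(f) * mean(g)

def cor(f, g):
    return cov(f, g) / (cov(f, f) ** 0.5 * cov(g, g) ** 0.5)

X1 = lambda a, b: a              # score of the first die
Y  = lambda a, b: a + b          # sum of the scores
U  = lambda a, b: min(a, b)      # minimum score
V  = lambda a, b: max(a, b)      # maximum score

for name, (f, g) in [("(X1, Y)", (X1, Y)), ("(U, V)", (U, V)), ("(U, Y)", (U, Y))]:
    print(name, cov(f, g), cor(f, g))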
15. Suppose that X and Y are random variables with cov(X, Y) = 3. Find cov(2X - 5, 4Y + 2).
16. Suppose that (X, Y) has probability density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Find cov(X, Y) and cor(X, Y).
17. Suppose that (X, Y) has probability density function f(x, y) = 2(x + y) for 0 < x < y < 1. Find cov(X, Y) and cor(X, Y).
18. Suppose that (X, Y) has probability density function f(x, y) = 6x^2 y for 0 < x < 1, 0 < y < 1. Find cov(X, Y) and cor(X, Y).
19. Suppose that (X, Y) has probability density function f(x, y) = 15x^2 y for 0 < x < y < 1. Find cov(X, Y) and cor(X, Y).
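A symbolic sketch (assuming SymPy) for Exercise 16: with f(x, y) = x + y on the unit square, integrate directly to obtain the covariance and correlation.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = x + y                                                  # joint density on 0 < x < 1, 0 < y < 1

E = lambda g: sp.integrate(g * f, (x, 0, 1), (y, 0, 1))    # expected value of g(X, Y)
EX, EY = E(x), E(y)
var_x, var_y = E(x**2) - EX**2, E(y**2) - EY**2
cov_xy = E(x * y) - EX * EY
cor_xy = sp.simplify(cov_xy / sp.sqrt(var_x * var_y))

print(cov_xy, cor_xy)                                      # -1/144 and -1/11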
You will now show that the variance of a sum of variables is the sum of the pairwise covariances. Suppose that Xj, j in J is a collection of real-valued random variables for an experiment, where J is a finite index set.
20. Use Exercises 3 and 5 to show that
var(Σ_{j in J} Xj) = Σ_{j in J} Σ_{k in J} cov(Xj, Xk).
The result in the previous exercise can be very useful; it is used for example to compute the variance of the hypergeometric distribution and the matching distribution.
21. Suppose that X1, X2, ..., Xn are pairwise uncorrelated (this holds in particular if they are mutually independent). Show that
var(X1 + X2 + ··· + Xn) = var(X1) + var(X2) + ··· + var(Xn).
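A quick numerical sketch (NumPy assumed) of Exercise 21: for independent (hence uncorrelated) variables, the variance of the sum matches the sum of the variances.
import numpy as np

rng = np.random.default_rng(3)
xs = [rng.exponential(scale=s, size=1_000_000) for s in (1.0, 2.0, 3.0)]   # independent, variances 1, 4, 9

print(np.var(sum(xs)))                 # variance of X1 + X2 + X3, near 14
print(sum(np.var(x) for x in xs))      # var(X1) + var(X2) + var(X3), also near 14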
22. Show that var(X + Y) + var(X - Y) = 2 var(X) + 2 var(Y).
23. Suppose that var(X) = var(Y). Show that X + Y and X - Y are uncorrelated.
24. Suppose X and Y are random variables with var(X) = 5, var(Y) = 9, and cov(X, Y) = -3. Find var(2X + 3Y - 7).
25. Suppose that X and Y are independent variables with var(X) = 6 and var(Y) = 8. Find var(3X - 4Y + 5).
26. Suppose that X1, X2, ..., Xn are independent and have a common distribution with mean µ and variance d^2 (thus, the variables form a random sample from the common distribution). Let Yn = X1 + X2 + ··· + Xn. Show that E(Yn) = n µ and var(Yn) = n d^2.
27. In the same setting as the previous exercise, let Mn = Yn / n, so that Mn is the sample mean. Show that E(Mn) = µ, var(Mn) = d^2 / n, and hence that P(|Mn - µ| > ε) → 0 as n → ∞ for every ε > 0.
The last part of the previous exercise means that Mn → µ as n → ∞ in probability. This is the weak law of large numbers, one of the fundamental theorems of probability.
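A simulation sketch (NumPy assumed) of the weak law of large numbers: the sample mean of n fair die scores concentrates around µ = 3.5 as n grows.
import numpy as np

rng = np.random.default_rng(4)
mu = 3.5                                        # mean of a single fair die score

for n in (10, 100, 10_000, 1_000_000):
    m_n = rng.integers(1, 7, size=n).mean()     # sample mean of n fair die scores
    print(n, m_n, abs(m_n - mu))                # the error tends to shrink as n increases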
28. Suppose that n fair dice are thrown. Find the mean and variance of the sum of the scores and of the average score.
29. In the dice experiment, select each of the variables in the previous exercise in turn. In each case, increase the number of dice and observe the size and location of the density function and the mean/standard deviation bar. With n = 20 dice, run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the empirical moments to the distribution moments.
30. Suppose that I1, I2, ..., In are independent indicator variables with P(Ij = 1) = p for each j. The distribution of X = I1 + I2 + ··· + In is the binomial distribution with parameters n and p. Show that E(X) = np and var(X) = np(1 - p).
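A brief numerical check (NumPy assumed) of Exercise 30: a sum of n independent indicator variables with success probability p has mean np and variance np(1 - p).
import numpy as np

rng = np.random.default_rng(5)
n, p, reps = 20, 0.3, 1_000_000

indicators = rng.random((reps, n)) < p      # reps independent rows of n indicator variables
x = indicators.sum(axis=1)                  # X = I1 + I2 + ... + In

print(x.mean(), n * p)                      # both near 6.0
print(x.var(), n * p * (1 - p))             # both near 4.2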
Suppose that A and B are events in a random experiment. The covariance and correlation of A and B are defined to be the covariance and correlation, respectively, of their indicator random variables IA and IB.
31. Show that cov(A, B) = P(A ∩ B) - P(A)P(B).
In particular, note that A and B are positively correlated, negatively correlated, or independent, respectively (as defined in the section on conditional probability) if and only if the indicator variables of A and B are positively correlated, negatively correlated, or uncorrelated, as defined in this section.
32. Show that
33. Suppose that A ⊆ B. Show that cov(A, B) = P(A)[1 - P(B)] ≥ 0.
34. Suppose that A and B are events in an experiment with P(A) = 1/2, P(B) = 1/3, and P(A ∩ B) = 1/8. Find the covariance and correlation between A and B.
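A worked check (plain Python) of Exercise 34, using cov(A, B) = P(A ∩ B) - P(A)P(B) (Exercise 1 applied to the indicator variables) and the fact that an indicator variable has variance P(A)[1 - P(A)].
from fractions import Fraction
import math

pA, pB, pAB = Fraction(1, 2), Fraction(1, 3), Fraction(1, 8)

cov_ab = pAB - pA * pB                                            # -1/24
cor_ab = float(cov_ab) / math.sqrt(pA * (1 - pA) * pB * (1 - pB))

print(cov_ab, round(cor_ab, 4))                                   # -1/24 and about -0.1768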
What linear function of X is closest to Y in the sense of minimizing mean square error? The question is fundamentally important in the case where random variable X (the predictor variable) is observable and random variable Y (the response variable) is not. The linear function can be used to estimate Y from an observed value of X. Moreover, the solution will show that covariance and correlation measure the linear relationship between X and Y. To avoid trivial cases, let us assume that var(X) > 0 and var(Y) > 0.
35. Show that E{[Y - (aX + b)]^2} = var(Y) - 2a cov(X, Y) + a^2 var(X) + [E(Y) - a E(X) - b]^2.
36. Use basic calculus to show that E{[Y - (aX + b)]^2} is minimized when a = cov(X, Y) / var(X) and b = E(Y) - a E(X).
Thus, the best linear predictor of Y based on X is
Y* = E(Y) + [cov(X, Y) / var(X)][X - E(X)].
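A numerical sketch (NumPy assumed; the simulated relationship is illustrative) of the best linear predictor: the slope cov(X, Y) / var(X) and the intercept E(Y) - slope·E(X) agree with an ordinary least squares fit.
import numpy as np

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, size=200_000)
y = 2.0 * x + 1.0 + rng.standard_normal(200_000)   # a noisy linear relationship

a = np.cov(x, y, bias=True)[0, 1] / np.var(x)      # cov(X, Y) / var(X)
b = y.mean() - a * x.mean()                        # so the line passes through (E(X), E(Y))

print(a, b)                                        # near 2.0 and 1.0
print(np.polyfit(x, y, 1))                         # least squares [slope, intercept], same values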
37. Show that the minimum mean square error, among all linear functions of X, is
E[(Y - Y*)^2] = var(Y)[1 - cor^2(X, Y)].
38. From the last exercise, show that -1 ≤ cor(X, Y) ≤ 1.
These exercises show clearly that cov(X, Y) and cor(X, Y) measure the linear association between X and Y.
Recall that the best constant predictor of Y, in the sense of minimizing mean square error, is E(Y), and the minimum value of the mean square error for this predictor is var(Y). Thus, the difference between var(Y) and the mean square error in Exercise 37 is the reduction in the variance of Y when the linear term in X is added to the predictor.
39. Show that var(Y) - E[(Y - Y*)^2] = var(Y) cor^2(X, Y).
The fraction of the reduction is cor^2(X, Y), and hence this quantity is called the (distribution) coefficient of determination. The line
y = E(Y) + [cov(X, Y) / var(X)][x - E(X)]
is known as the (distribution) regression line for Y based on X. Note that the regression line passes through (E(X), E(Y)), the center of the joint distribution. However, the choice of predictor variable and response variable is crucial.
40. Show that the regression line for Y based on X and the regression line for X based on Y are not the same line, except in the trivial case where the variables are perfectly correlated.
41. Suppose that (X, Y) has probability density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Find the regression line for Y based on X.
42. Suppose that (X, Y) has probability density function f(x, y) = 2(x + y) for 0 < x < y < 1. Find the regression line for Y based on X.
43. Suppose that (X, Y) has probability density function f(x, y) = 6x^2 y for 0 < x < 1, 0 < y < 1. Find the regression line for Y based on X.
44. Suppose that (X, Y) has probability density function f(x, y) = 15x^2 y for 0 < x < y < 1. Find the regression line for Y based on X.
45. A pair of fair dice are thrown and the scores (X1, X2) recorded. Let Y = X1 + X2 denote the sum of the scores, U = min{X1, X2} the minimum score, and V = max{X1, X2} the maximum score. Find the regression line for Y based on X1.
46. Suppose that A and B are events in a random experiment with 0 < P(A) < 1 and 0 < P(B) < 1. Show that
The corresponding statistical problem of estimating a and b, when the distribution parameters in Exercise 36 are unknown, is considered in the section on Sample Covariance and Correlation. A natural generalization of the problem considered here is to find the function of X (among all reasonable functions, not just linear ones) that is closest to Y in the sense of minimizing mean square error. The solution is obtained in the section on Conditional Expected Value.
Covariance is closely related to key concepts in the theory of vector spaces. This connection can help illustrate many of the properties of covariance from a different point of view. First, if X and Y are real-valued random variables, define the inner product of X and Y by
<X, Y> = E(XY).
The following exercises are analogues of the basic properties of covariance given above, and show that this definition really does give an inner product on the vector space of random variables with finite second moment. (As usual, we identify two random variables that agree with probability 1.)
47. Show that <X, Y> = <Y, X>.
48. Show that <X, X> ≥ 0.
49. Show that <X, X> = 0 if and only if P(X = 0) = 1.
50. Show that <aX, Y> = a <X, Y>.
51. Show that <X + Y, Z> = <X, Z> + <Y, Z>.
Covariance and correlation can easily be expressed in terms of this inner product.
52. Show that cov(X, Y) = <X - E(X), Y - E(Y)>.
53. Show that cor(X, Y) = <[X - E(X)] / sd(X), [Y - E(Y)] / sd(Y)>.
Thus the covariance of X and Y is the inner product of the corresponding centered variables. The correlation of X and Y is the inner product of the corresponding standard scores.
The norm associated with the inner product is the 2-norm studied in the last section. This fact is a fundamental reason why the 2-norm plays such a special, honored role; of all of the k-norms, only the 2-norm corresponds to an inner product.
54. Show that <X, X> = ||X||_2^2 = E(X^2).
Note that the best linear predictor of Y based on X derived above is simply the projection of Y onto the subspace of random variables of the form aX + b, where a and b are real numbers.
The next exercise gives Hölder's inequality, named for Otto Hölder.
55. Suppose that j, k > 1 with 1 / j + 1 / k = 1. Show that <|X|, |Y|> ≤ ||X||_j ||Y||_k.
In the context of the last exercise, j and k are called conjugate exponents. If we let j = k = 2 in Hölder's inequality, then we get the Cauchy-Schwarz inequality, named for Augustin Cauchy and Hermann Schwarz:
E(|XY|) ≤ [E(X^2)]^(1/2) [E(Y^2)]^(1/2).
In turn, this is equivalent to the inequalities in Exercise 38.
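A simulation sketch (NumPy assumed) checking Hölder's inequality E(|XY|) ≤ ||X||_j ||Y||_k for a few conjugate exponent pairs; the variables X and Y here are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(7)
x = rng.standard_normal(1_000_000)
y = rng.exponential(size=1_000_000)

norm = lambda z, p: np.mean(np.abs(z) ** p) ** (1.0 / p)   # ||Z||_p = [E(|Z|^p)]^(1/p)
lhs = np.mean(np.abs(x * y))                               # E(|XY|)

for j in (2.0, 3.0, 4.0):
    k = j / (j - 1.0)                                      # conjugate exponent: 1/j + 1/k = 1
    rhs = norm(x, j) * norm(y, k)
    print(j, k, lhs, rhs, lhs <= rhs)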
56. Suppose that (X, Y) has density function f(x, y) = x + y for 0 < x < 1, 0 < y < 1. Verify Hölder's inequality in the following cases:
57. Suppose that j and k are conjugate exponents.
The following exercise is an analogue of the result in Exercise 22.
58. Prove the parallelogram rule:
||X + Y||_2^2 + ||X - Y||_2^2 = 2 ||X||_2^2 + 2 ||Y||_2^2.
The following exercise is an analogue of the result in Exercise 21.
59. Prove the Pythagorean theorem, named for Pythagoras of course: if X1, X2, ..., Xn are random variables with <Xi, Xj> = 0 for distinct i and j, then
||X1 + X2 + ··· + Xn||_2^2 = ||X1||_2^2 + ||X2||_2^2 + ··· + ||Xn||_2^2.