8. Probability Plots


Derivation of the Test

Suppose that we observe real-valued data

x1, x2, ..., xn

from a random sample of size n. We are interested in the question of whether the data could reasonably have come from a continuous distribution (taking values in an interval) with distribution function F.

First, we order the data from smallest to largest (these are the observed values of the order statistics):

x(1) < x(2) < ··· < x(n).

Mathematical Exercise 1. Show that x(i) is the sample quantile of order i / (n + 1).

Mathematical Exercise 2. Show that the distribution quantile of order i / (n + 1) is

yi = F^{-1}[i / (n + 1)]

If the data really do come from the distribution, then we would expect the points

(x(i), yi); i = 1, 2, ..., n

to be close to the diagonal line y = x; conversely, strong deviation from this line is evidence that the distribution did not produce the data. The plot of these points is referred to as a probability plot.
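
The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name probability_plot_points is hypothetical, the sample values are made up for the example, and the standard normal quantile function from the standard library's statistics.NormalDist stands in for F^{-1}.

```python
from statistics import NormalDist


def probability_plot_points(data, quantile_function):
    """Return the points (x(i), y_i), where y_i = F^{-1}(i / (n + 1)).

    quantile_function plays the role of F^{-1} for the test distribution.
    """
    n = len(data)
    ordered = sorted(data)  # observed values of the order statistics
    return [
        (x, quantile_function(i / (n + 1)))
        for i, x in enumerate(ordered, start=1)
    ]


# Made-up illustrative values, not data from the text.
sample = [0.31, -1.20, 0.85, -0.42, 1.67, -0.05, 0.49]

# Test distribution: standard normal (an assumption for this example).
points = probability_plot_points(sample, NormalDist().inv_cdf)
for x, y in points:
    print(f"{x:6.2f}  {y:6.2f}")
```

Each printed row is one point of the probability plot; points near the line y = x support the test distribution.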

In the following exercises, we will explore probability plots for the normal, exponential, and uniform distributions.

Simulation Exercise 3. In the probability plot experiment, set the sampling distribution to the standard normal distribution and the sample size to n = 20. For each test distribution given below, run the experiment 50 times and note the geometry of the probability plot.

  1. Standard normal
  2. Uniform (0, 1)
  3. Exponential (1)

Simulation Exercise 4. In the probability plot experiment, set the sampling distribution to the uniform (0, 1) distribution and the sample size to n = 20. For each test distribution given below, run the experiment 50 times and note the geometry of the probability plot.

  1. Standard normal
  2. Uniform (0, 1)
  3. Exponential (1)

Simulation Exercise 5. In the probability plot experiment, set the sampling distribution to the exponential (1) distribution and the sample size to n = 20. For each test distribution given below, run the experiment 50 times and note the geometry of the probability plot.

  1. Standard normal
  2. Uniform (0, 1)
  3. Exponential (1)
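
For readers without access to the applet, one run of Exercises 3 to 5 can be imitated roughly as follows. This is a sketch, not the applet's implementation: the helper names, the seed, and the summary "largest gap from the diagonal" are assumptions introduced here as a crude numerical stand-in for looking at the plot.

```python
import math
import random
from statistics import NormalDist

# Quantile functions F^{-1} of the three test distributions.
TEST_QUANTILES = {
    "standard normal": NormalDist().inv_cdf,
    "uniform (0, 1)": lambda p: p,
    "exponential (1)": lambda p: -math.log(1.0 - p),
}


def plot_points(data, quantile_function):
    """Points (x(i), y_i) with y_i = F^{-1}(i / (n + 1))."""
    n = len(data)
    return [
        (x, quantile_function(i / (n + 1)))
        for i, x in enumerate(sorted(data), start=1)
    ]


def max_diagonal_gap(points):
    """Largest |x(i) - y_i|: a rough measure of distance from y = x."""
    return max(abs(x - y) for x, y in points)


rng = random.Random(2024)  # arbitrary seed, chosen only for repeatability
n = 20

# Sampling distribution: standard normal, as in Exercise 3.
sample = [rng.gauss(0.0, 1.0) for _ in range(n)]

for name, quantile_function in TEST_QUANTILES.items():
    pts = plot_points(sample, quantile_function)
    print(f"{name:16s} max |x(i) - y_i| = {max_diagonal_gap(pts):.3f}")
```

To imitate Exercise 4 or 5, the sample line can be replaced with rng.random() or rng.expovariate(1.0) draws, respectively. A single number cannot replace inspecting the geometry of the plot, so the gap is best read alongside the points themselves.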

Location-Scale Families

Usually, we are not trying to fit the data to a particular distribution, but rather to a parametric family of distributions (such as the normal, uniform, or exponential families). We are usually forced into this situation because we don't know the parameters; indeed, the next step, after assessing goodness of fit, may be to estimate the parameters. Fortunately, the probability plot method has a simple extension for any location-scale family of distributions.

Suppose that G is a given distribution function. Recall that the location-scale family associated with G has distribution function

F(x) = G[(x - a) / b],

where a is the location parameter and b > 0 is the scale parameter.

Mathematical Exercise 6. For p in (0, 1), let zp denote the quantile of order p for G and yp the quantile of order p for F. Show that

yp = a + b zp.

From Exercise 6, it follows that if the probability plot constructed with distribution function F is nearly linear (and in particular, if it is close to the diagonal line), then the probability plot constructed with distribution function G will also be nearly linear, although not necessarily close to the diagonal. Thus, we can use the distribution function G without having to know the location and scale parameters.
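
The relation in Exercise 6 can be checked numerically for one particular family. The sketch below is a sanity check, not a proof: it takes G to be the standard normal distribution and uses the illustrative values a = 5 and b = 2, both of which are assumptions made for the example.

```python
from statistics import NormalDist

a, b = 5.0, 2.0                # illustrative location and scale parameters
G = NormalDist()               # standard normal, playing the role of G
F = NormalDist(mu=a, sigma=b)  # the family member with F(x) = G[(x - a) / b]

for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    z_p = G.inv_cdf(p)  # quantile of order p for G
    y_p = F.inv_cdf(p)  # quantile of order p for F
    print(f"p = {p:4.2f}   y_p = {y_p:8.4f}   a + b * z_p = {a + b * z_p:8.4f}")
```

The two printed columns should agree up to floating-point rounding, which is what the identity yp = a + b zp predicts.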

Simulation Exercise 7. In the probability plot experiment, set the sampling distribution to the normal distribution with mean 5 and standard deviation 2. Set the sample size to n = 20. For each of the following test distributions, run the experiment 50 times and note the geometry of the probability plot:

  1. Standard normal
  2. Uniform (0, 1)
  3. Exponential (1)

Simulation Exercise 8. In the probability plot experiment, set the sampling distribution to the uniform distribution on (4, 10). Set the sample size to n = 20. For each of the following test distributions, run the experiment 50 times and note the geometry of the probability plot:

  1. Standard normal
  2. Uniform (0, 1)
  3. Exponential (1)

Simulation Exercise 9. In the probability plot experiment, set the sampling distribution to the exponential distribution with parameter 3. Set the sample size to n = 20. For each of the following test distributions, run the experiment 50 times and note the geometry of the probability plot:

  1. Standard normal
  2. Uniform (0, 1)
  3. Exponential (1)
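
One run of Exercise 7 with the standard normal test distribution can be imitated as below. The least-squares line is an assumption added here to put a number on "nearly linear"; it is not part of the method described in the text. The helper fit_line and the seed are hypothetical. The sketch regresses the order statistics x(i) on the standard normal quantiles z_i, so by Exercise 6 the intercept and slope should land roughly near a = 5 and b = 2.

```python
import random
from statistics import NormalDist, fmean


def fit_line(zs, xs):
    """Least-squares line x = intercept + slope * z; returns (intercept, slope)."""
    z_bar, x_bar = fmean(zs), fmean(xs)
    slope = (
        sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
        / sum((z - z_bar) ** 2 for z in zs)
    )
    return x_bar - slope * z_bar, slope


rng = random.Random(7)  # arbitrary seed, chosen only for repeatability
n = 20

# Sampling distribution: normal with mean 5 and standard deviation 2.
xs = sorted(rng.gauss(5.0, 2.0) for _ in range(n))

# Test distribution G: standard normal quantiles of order i / (n + 1).
zs = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

intercept, slope = fit_line(zs, xs)
print(f"intercept (rough location) = {intercept:.2f}")
print(f"slope (rough scale)        = {slope:.2f}")
```

With only 20 points the fitted values vary noticeably from run to run, so they are best treated as a rough indication rather than as parameter estimates.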

Data Analysis Exercises

Data Analysis Exercise 10. Draw the normal probability plot with Michelson's velocity of light data. Interpret the results.

Data Analysis Exercise 11. Draw the normal probability plot with Cavendish's density of the earth data. Interpret the results.

Data Analysis Exercise 12. Draw the normal probability plot with Short's parallax of the sun data. Interpret the results.

Data Analysis Exercise 13. Draw the normal probability plot for the petal length variable in Fisher's iris data, using the following cases. Compare the results.

  1. All cases
  2. Setosa only
  3. Virginica only
  4. Versicolor only

Interpreting the Results

From your experiments, we hope that you have reached a few general conclusions. First, the probability plot method is of very little use with small samples. With just five points, for example, it is essentially impossible to judge the linearity of the probability plot. Even with large samples, the results can be rather subtle. For example, a sample from a normal distribution frequently seems to fit the uniform distribution rather well. Experience with a variety of distributions helps in making the fine judgments.