Virtual Laboratories > Hypothesis Testing
As usual, our starting point is a random experiment with a sample space and a probability measure P. In the basic statistical model, we have an observable random variable X taking values in a set S. In general, X can have quite a complicated structure. For example, if the experiment is to sample n objects from a population and record various measurements of interest, then
X = (X1, X2, ..., Xn)
where Xi is the vector of measurements for the i'th object. The most important special case occurs when X1, X2, ..., Xn are independent and identically distributed. In this case, we have a random sample of size n from the common distribution.
A statistical hypothesis is a statement about the distribution of the data variable X; equivalently, a statistical hypothesis specifies a set of possible distributions of X. In hypothesis testing, the goal is to see if there is sufficient statistical evidence to reject a presumed null hypothesis in favor of a conjectured alternative hypothesis. The null hypothesis is usually denoted H0 while the alternative hypothesis is usually denoted H1. A hypothesis that specifies a single distribution for X is called simple; a hypothesis that specifies more than one distribution for X is called composite.
A hypothesis test is a statistical decision; the conclusion will either be to reject the null hypothesis in favor of the alternative, or to fail to reject the null hypothesis. The decision that we make must, of course, be based on the data vector X. Thus, we will find a subset R of the sample space S and reject H0 if and only if X ∈ R. The set R is known as the rejection region or the critical region. Usually, the critical region is defined in terms of a statistic W(X), known as a test statistic.
The ultimate decision may be correct or may be in error. There are two types of errors, depending on which of the hypotheses is actually true:
A type 1 error is rejecting the null hypothesis when it is true.
A type 2 error is failing to reject the null hypothesis when it is false.
Similarly, there are two ways to make a correct decision: we could reject the null hypothesis when it is false or we could fail to reject the null hypothesis when it is true. The possibilities are summarized in the following table:
State \ Decision | Fail to reject H0 | Reject H0
---|---|---
H0 True | Correct | Type 1 error
H0 False | Type 2 error | Correct
If H0 is true (that is, the distribution of X is specified by H0), then P(X ∈ R) is the probability of a type 1 error for this distribution. If H0 is composite, then H0 specifies a variety of different distributions for X and thus there is a set of type 1 error probabilities. The maximum probability of a type 1 error is known as the significance level of the test or the size of the critical region, which we will denote by r. Usually, the rejection region is constructed so that the significance level is a prescribed, small value (typically 0.1, 0.05, or 0.01).
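As a concrete illustration, consider a hypothetical binomial experiment (the numbers here are illustrative, not from the text): X counts the successes in n = 20 Bernoulli trials, H0: the success probability is 0.5 (a simple hypothesis), and the critical region is R = {X >= 15}. A minimal Python sketch computes the size of this region:

```python
from math import comb

# Hypothetical example: X ~ Binomial(n, p) with n = 20, H0: p = 0.5 (simple),
# and critical region R = {X >= 15}.
n, p0, c = 20, 0.5, 15

# Size of R = P(X in R) computed under H0: sum the null pmf over the region.
size = sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(c, n + 1))
print(round(size, 4))  # 0.0207, so this test has significance level about 0.02
```

Because this null hypothesis is simple, the size is a single number; for a composite null it would be the maximum of P(X ∈ R) over the null distributions.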
If H1 is true (that is, the distribution of X is specified by H1), then P(X ∈ Rc) is the probability of a type 2 error for this distribution, where Rc denotes the complement of R. Again, if H1 is composite then H1 specifies a variety of different distributions for X, and thus there will be a set of type 2 error probabilities. Generally, there is a tradeoff between the type 1 and type 2 error probabilities. If we reduce the probability of a type 1 error by making the rejection region R smaller, we necessarily increase the probability of a type 2 error because Rc is larger.
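The tradeoff is easy to see numerically. In a hypothetical binomial sketch (X ~ Binomial(20, p), null value p = 0.5, alternative value p = 0.7; all numbers illustrative), raising the cutoff c shrinks R = {X >= c}, which lowers the type 1 error probability while raising the type 2 error probability:

```python
from math import comb

def tail_prob(n, p, c):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(c, n + 1))

n = 20
cutoffs = range(12, 17)
# Under H0 (p = 0.5): probability of rejecting, i.e. a type 1 error.
type1 = [tail_prob(n, 0.5, c) for c in cutoffs]
# Under the alternative p = 0.7: probability of failing to reject, i.e. a type 2 error.
type2 = [1 - tail_prob(n, 0.7, c) for c in cutoffs]
for c, t1, t2 in zip(cutoffs, type1, type2):
    print(c, round(t1, 4), round(t2, 4))
```

As c increases, the type 1 column decreases while the type 2 column increases, exactly the tradeoff described above.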
If H1 is true (that is, the distribution of X is specified by H1), then P(X ∈ R), the probability of rejecting H0 (and thus making a correct decision), is known as the power of the test for the distribution.
Suppose that we have two tests, corresponding to rejection regions R1 and R2, respectively, each having significance level r. The test with region R1 is uniformly more powerful than the test with region R2 if
P(X ∈ R1) ≥ P(X ∈ R2) for any distribution of X specified by H1.
Naturally, in this case, we would prefer the first test. Finally, if a test has significance level r and is uniformly more powerful than any other test with significance level r, then the test is said to be a uniformly most powerful test at level r. Clearly, such a test is the best we can do.
In most cases, we have a general procedure that allows us to construct a test (that is, a rejection region Rr) for any given significance level r. Typically, Rr decreases (in the subset sense) as r decreases. In this context, the p-value of the data variable X, denoted p(X), is defined to be the smallest r for which X is in Rr; that is, the smallest significance level for which H0 would be rejected, given X. Knowing p(X) allows us to test H0 at any significance level, for the given data: if p(X) ≤ r, then we would reject H0 at significance level r; if p(X) > r, we would fail to reject H0 at significance level r. Note that p(X) is a statistic.
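For a hypothetical family of nested right-tailed regions R_r = {X >= c_r} with X ~ Binomial(20, p) and H0: p = 0.5 (illustrative numbers, not from the text), the p-value of an observed value x reduces to the null tail probability P(X >= x); a sketch:

```python
from math import comb

def p_value(x, n=20, p0=0.5):
    """Right-tailed p-value P(X >= x | H0): the smallest significance level r
    for which the observed x falls in the rejection region R_r = {X >= c_r}."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(x, n + 1))

p = p_value(15)
print(round(p, 4))  # 0.0207: reject H0 at level 0.05, fail to reject at level 0.01
```

One number thus summarizes the test decision at every significance level: reject H0 exactly when p(X) ≤ r.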
Hypothesis testing is a very general concept, but an important special class occurs when the distribution of the data variable X depends on a parameter a, taking values in a parameter space A. Recall that typically, a is a vector of real parameters, so that A ⊆ Rk for some k. The hypotheses generally take the form
H0: a ∈ A0 versus H1: a ∈ A - A0
where A0 is a prescribed subset of A. In this setting, the probabilities of making an error or a correct decision depend on the true value of a. If R is the rejection region, then the power function is given by
Q(a) = P(X ∈ R | a) for a ∈ A.
1. Show that if a ∈ A0, then Q(a) is the probability of a type 1 error for the distribution given by a.
2. Show that if a ∈ A - A0, then 1 - Q(a) is the probability of a type 2 error for the distribution given by a.
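The power function is easy to tabulate in simple cases. A hypothetical sketch (all numbers illustrative): X ~ Binomial(n, a) with n = 20, null set A0 = {0.5}, and rejection region R = {X >= 15}:

```python
from math import comb

def Q(a, n=20, c=15):
    """Power function Q(a) = P(X in R | a) for R = {X >= c}, X ~ Binomial(n, a)."""
    return sum(comb(n, k) * a ** k * (1 - a) ** (n - k) for k in range(c, n + 1))

# Q(a) is the type 1 error probability for a in A0 = {0.5}, and the power
# (1 minus the type 2 error probability) for a in A - A0 = (0.5, 1].
values = {a: Q(a) for a in (0.5, 0.6, 0.7, 0.8, 0.9)}
for a, q in values.items():
    print(a, round(q, 4))
```

For this right-tailed region, Q is increasing in a: the test is small on the null set and grows more powerful as a moves away from it.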
Suppose that we have two tests, corresponding to rejection regions R1 and R2, respectively, each having significance level r. The test with region R1 is uniformly more powerful than the test with region R2 if
QR1(a) ≥ QR2(a) for a ∈ A - A0.
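A hypothetical illustration of the comparison (numbers are illustrative): for X ~ Binomial(20, a) with H0: a = 0.5, the regions R1 = {X >= 15} and R2 = {X <= 5} have the same size by the symmetry of Binomial(20, 0.5), yet R1 dominates at every alternative a > 0.5:

```python
from math import comb

def region_prob(ks, a, n=20):
    """P(X in R | a) for a rejection region R given as a collection of outcomes ks."""
    return sum(comb(n, k) * a ** k * (1 - a) ** (n - k) for k in ks)

R1 = range(15, 21)  # {X >= 15}
R2 = range(0, 6)    # {X <= 5}

size1 = region_prob(R1, 0.5)  # equal sizes under H0, by symmetry
size2 = region_prob(R2, 0.5)
# Power at alternatives a > 0.5: R1 beats R2 everywhere on this set.
powers = [(region_prob(R1, a), region_prob(R2, a)) for a in (0.6, 0.7, 0.8)]
```

Both tests have the same significance level, but against alternatives a > 0.5 the right-tailed region is uniformly more powerful, so it is the one we would prefer.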
Most hypothesis tests of an unknown real parameter a fall into three special cases:
1. H0: a = a0 versus H1: a ≠ a0
2. H0: a ≥ a0 versus H1: a < a0
3. H0: a ≤ a0 versus H1: a > a0
where a0 is a specified value. Case 1 is known as the two-sided test; case 2 is known as the left-tailed test, and case 3 is known as the right-tailed test (named after the conjectured alternative). There may be other unknown parameters besides a (known as nuisance parameters).
There is an equivalence between hypothesis tests and interval estimates for a real parameter a.
3. Suppose that [L(X), U(X)] is a 1 - r level confidence interval for a. Show that the test below has significance level r for the hypothesis H0: a = a0 versus H1: a ≠ a0.
Reject H0 if and only if a0 < L(X) or a0 > U(X).
4. Suppose that U(X) is a 1 - r level confidence upper bound for a. Show that the test below has significance level r for the hypothesis H0: a ≥ a0 versus H1: a < a0.
Reject H0 if and only if a0 > U(X).
5. Suppose that L(X) is a 1 - r level confidence lower bound for a. Show that the test below has significance level r for the hypothesis H0: a ≤ a0 versus H1: a > a0.
Reject H0 if and only if a0 < L(X).
Summarizing, we fail to reject H0 at significance level r if and only if a0 is in the corresponding 1 - r level confidence interval.
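The equivalence can be checked numerically. A sketch under assumed conditions (a normal sample with known standard deviation, so the classical two-sided z-interval applies; the data values are hypothetical), using the two-sided test of exercise 3:

```python
from statistics import NormalDist

def z_interval(xbar, sigma, n, r):
    """1 - r level confidence interval [L(X), U(X)] for a normal mean,
    with known standard deviation sigma (hypothetical example)."""
    z = NormalDist().inv_cdf(1 - r / 2)
    half = z * sigma / n ** 0.5
    return xbar - half, xbar + half

def z_test_p_value(xbar, sigma, n, a0):
    """Two-sided p-value for H0: mean = a0 versus H1: mean != a0."""
    z = (xbar - a0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical data summary: sample mean 103, sigma 15, n = 25, level r = 0.05.
xbar, sigma, n, r = 103.0, 15.0, 25, 0.05
low, high = z_interval(xbar, sigma, n, r)

decisions = []
for a0 in (96.0, 100.0, 104.0):
    reject_by_interval = a0 < low or a0 > high   # exercise 3's test
    reject_by_p_value = z_test_p_value(xbar, sigma, n, a0) <= r
    decisions.append((reject_by_interval, reject_by_p_value))
```

For each candidate a0, the interval-based decision and the p-value decision agree: H0 is rejected at level r exactly when a0 falls outside the 1 - r level confidence interval.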