This week we're looking at hypothesis testing. We started out using the Wilcoxon rank-sum test (also known as the Mann-Whitney U test) to test whether samples were drawn from different populations.

The world is full of statistical (hypothesis) tests. Each one generates a test statistic. The key to understanding a test is understanding what the distribution of the test statistic would be if the null hypothesis were true.

The test statistic of the rank-sum test is U: the sum of the ranks of one sample within the pooled data, minus a sample-size correction factor (n1(n1 + 1)/2 for a sample of size n1).
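As a quick sketch of that calculation (assuming SciPy is available; the two samples here are made up for illustration), we can compute U by hand from the pooled ranks and check it against `scipy.stats.mannwhitneyu`:

```python
import numpy as np
from scipy.stats import mannwhitneyu, rankdata

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=8)   # made-up sample 1
y = rng.normal(0.5, 1.0, size=10)  # made-up sample 2

# Rank the pooled data, then sum the ranks that belong to x.
ranks = rankdata(np.concatenate([x, y]))
r1 = ranks[: len(x)].sum()

# U = R1 - n1*(n1 + 1)/2 -- the "sample size correction factor".
u_by_hand = r1 - len(x) * (len(x) + 1) / 2

u_scipy, p = mannwhitneyu(x, y, alternative="two-sided")
print(u_by_hand, u_scipy, p)  # the two U values agree
```

SciPy reports U for its first argument, which is why the hand computation ranks `x` within the pooled data.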

For the rank-sum test, the null hypothesis is that the two samples are drawn from identically distributed populations (approximately: populations with the same median). The following figures show the distribution of U, assuming the null hypothesis is true. The area of the shaded region sums to alpha. The vertical red lines show our critical values of U. Values of U more extreme than these critical values are unlikely to occur by chance *if the null hypothesis is true*. Thus, if we observe a U value this extreme, we can *reject* the null hypothesis.
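One way to see where those critical values come from is to simulate the null distribution of U directly. Under the null, the ranks of sample 1 are just a uniformly random subset of the pooled ranks. A rough sketch (the sample sizes and alpha here are illustrative assumptions, not values from the figures):

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2, alpha = 10, 10, 0.05  # illustrative choices
sims = 20_000

# Under the null, sample 1's ranks are a random n1-subset of {1, ..., n1+n2},
# so we can simulate U without generating any data at all.
all_ranks = np.arange(1, n1 + n2 + 1)
u = np.array([
    rng.choice(all_ranks, size=n1, replace=False).sum() - n1 * (n1 + 1) / 2
    for _ in range(sims)
])

# Two-sided critical values: each shaded tail holds alpha/2.
lo, hi = np.quantile(u, [alpha / 2, 1 - alpha / 2])
print(lo, hi)  # an observed U outside [lo, hi] -> reject the null
```

Lowering alpha pushes `lo` and `hi` further out into the tails, which matches the shrinking shaded regions in the figures.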

If we lower alpha, we see the area in the tails get smaller.

For larger sample sizes, the values of U get much larger, but the same pattern holds.

For a devil's advocate view of what p-values *mean*, we turn to the internet:

What the p-value
The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is a *non-parametric* test: it makes no assumptions about the distribution of the data. Most common statistical tests are parametric, and usually assume that the data (or something about the data) is normally distributed. The t-test is the parametric sibling of the rank-sum test: it assumes the data is normally distributed.
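A minimal sketch comparing the two siblings on the same made-up data (assuming SciPy; the group sizes and shift are illustrative):

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

rng = np.random.default_rng(2)
a = rng.normal(0.0, 1.0, size=30)  # made-up control group
b = rng.normal(0.8, 1.0, size=30)  # made-up shifted group

t_stat, t_p = ttest_ind(a, b)  # parametric: assumes normal populations
u_stat, u_p = mannwhitneyu(a, b, alternative="two-sided")  # no normality assumption
print(f"t-test p = {t_p:.4f}, rank-sum p = {u_p:.4f}")
```

When the data really is normal, the two tests tend to agree; the rank-sum test gives up a little power in exchange for not depending on that assumption.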

This video describes hypothesis tests in general, and walks through the t-test.

What is a t-test?
By the end of this week, this comic should make sense.

XKCD: Significant