
# The Wilcoxon Two Sample Test

*Example:*

**#A = 15, #B = 15, W = 0, p <= 0**

*Characteristics:*

A *most* useful test for checking whether the values in two samples differ in
size. It resembles the
Median-Test in scope, but it is *much* more sensitive. In fact, for
large numbers it is almost as sensitive as the
Two Sample Student t-test. For small numbers with unknown distributions
this test is even *more* sensitive than the Student t-test.

As it is only on rare occasions that we know that values are Normally
distributed, this test is to be preferred over the Student t-test.

*H0:*

The populations from which the two samples are taken have identical
*median* values. More precisely, the two populations have identical
distributions.

*Assumptions:*

None really.

*Scale:*

Ordinal.

*Procedure:*

Rank order all *N* = *m* + *n* values from both samples
(of sizes *m* and *n*) combined. Sum the ranks of the smaller sample
(Wsmallest). This value is used to determine the level of significance.
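
The ranking step can be sketched in Python as follows. This is only an illustration, not the script used on this page: the function name `rank_sum_smallest` is made up here, and giving tied values the average of their ranks (mid-ranks) is an assumption, since the procedure above does not say how ties are handled.

```python
def rank_sum_smallest(a, b):
    """Return (W, m, n): rank sum of the smaller sample and both sample sizes."""
    # The smaller sample is the one whose ranks are summed
    # (if the sizes are equal, the first sample is used).
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    combined = sorted(list(small) + list(large))

    # Assumption: tied values share the average of the ranks they occupy.
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; their average is (i + 1 + j) / 2.
        rank_of[combined[i]] = (i + 1 + j) / 2
        i = j

    w = sum(rank_of[value] for value in small)
    return w, len(small), len(large)


w, m, n = rank_sum_smallest([1.1, 2.3, 3.0], [2.0, 4.5, 5.1, 6.2])
print(w, m, n)
```

In this made-up data set the smaller sample holds ranks 1, 3 and 4 of the seven combined values.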

*Level of Significance:*

Look up the level of significance in a table using Wsmallest, *m*
and *n*.

Calculating the exact level of significance is based on calculating all
possible permutations of ranks over both samples. This is computationally
demanding if *n* and *m* are larger than 7.
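
As a rough sketch of what such an exact calculation involves, the Python function below enumerates every way of assigning *m* of the ranks 1..*N* to the smaller sample and counts how many rank sums lie at least as far from the expected value as the observed one. The function name `exact_p_two_sided` is hypothetical, a two-sided test is assumed (the page does not state the sidedness), and ties are ignored.

```python
from itertools import combinations


def exact_p_two_sided(w, m, n):
    """Exact two-sided p-value for rank sum w of the sample of size m (no ties)."""
    ranks = range(1, m + n + 1)
    mean = m * (m + n + 1) / 2      # expected rank sum under H0
    observed = abs(w - mean)        # how far the observed sum is from that

    total = 0
    extreme = 0
    # Every subset of m ranks is equally likely under H0.
    for subset in combinations(ranks, m):
        total += 1
        if abs(sum(subset) - mean) >= observed:
            extreme += 1
    return extreme / total


# Smallest possible rank sum for m = 3, n = 4 is 1 + 2 + 3 = 6.
print(exact_p_two_sided(6, 3, 4))
```

The loop visits N!/(m!*n!) subsets, which is why this approach quickly becomes slow for larger samples.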

*Approximation:*

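If *m*>10 and *n*>10,

Z = ( Wsmallest - 0.5 - *m* * ( *m* + *n* + 1 ) / 2 ) /
sqrt( *m* * *n* * ( *m* + *n* + 1 ) / 12 )

is approximately
Normally distributed, where *m* is the size of the sample whose ranks
were summed.

(Use *Wsmallest* **-** 0.5 if *Wsmallest* lies above its expected value
*m* * ( *m* + *n* + 1 ) / 2, else use *Wsmallest* **+** 0.5, so that the
correction always moves *Wsmallest* toward its expected value. For
*m* = *n* the expected value equals *N* * ( *N* + 1 ) / 4.)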
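
A minimal Python sketch of this approximation is given below. It is not the script behind this page: the name `normal_approx` is hypothetical, the 0.5 correction is assumed to always move the rank sum toward its expected value *m* * ( *m* + *n* + 1 ) / 2, and the p-value is assumed to be two-sided.

```python
import math


def normal_approx(w, m, n):
    """Return (z, p): Normal approximation for rank sum w of the sample of size m."""
    mean = m * (m + n + 1) / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)

    # Continuity correction (assumption: always applied toward the mean).
    if w > mean:
        corrected = w - 0.5
    elif w < mean:
        corrected = w + 0.5
    else:
        corrected = w

    z = (corrected - mean) / sd
    # Two-sided tail probability of the standard Normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Two samples of 15 where one sample holds ranks 1..15 (rank sum 120).
z, p = normal_approx(120, 15, 15)
print(z, p)
```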

*Remarks:*

In summer 2006, a user of this web-site discovered
a bug in the calculation of the normal approximation.
If the sum of the ranks of the sample with fewer observations was greater
than the sum of the ranks of the sample with more observations, the
script calculated the p-value using the smaller sum but then used the
size of the smaller sample as *m* in the normal approximation.
This was corrected in November 2006.

In this example, exact probabilities are calculated for
*m* <= 10 or *n* <= 10. If both are larger than 7,
this can take more time than is available within this system (the number
of calculations grows as N!/(m!*n!), with N!=N*(N-1)*(N-2)*...*1).
Therefore, if it is anticipated that the calculations will take too much
time (i.e., there are too many permutations to check), the Normal
approximation is used instead.
However, the resulting values are
unreliable, and this will be indicated with a *. You are advised to check
the level of significance in a table.
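
To give a feel for how fast N!/(m!*n!) grows, the short Python sketch below counts the rank assignments for a few equal sample sizes. The helper name `n_arrangements` is made up for this illustration.

```python
import math


def n_arrangements(m, n):
    """Number of ways to give m of the N = m + n ranks to one sample: N!/(m!*n!)."""
    return math.comb(m + n, m)


for size in (7, 10, 15):
    print(size, size, n_arrangements(size, size))
```

For *m* = *n* = 7 there are 3432 assignments to check; for *m* = *n* = 15 there are already 155117520.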

For *m* > 10 and *n* > 10 the Normal approximation is used.

A Perl script of the test is available here.
A minimalist Windows version (with dosperl interpreter) is available
here (<500 kB).
