The Wilcoxon Two Sample Test
A most useful test to see whether the values in two samples differ in
size. It resembles the
Median-Test in scope, but it is much more sensitive. In fact, for
large numbers it is almost as sensitive as the
Two Sample Student t-test. For small numbers with unknown distributions
this test is even more sensitive than the Student t-test.
As it is only on rare occasions that we know the values to be Normally
distributed, this test is usually to be preferred over the Student t-test.
Under the null hypothesis, the populations from which the two samples are
taken have identical median values. To be complete, the two populations have
identical distributions.
Rank order all N = m + n values from both samples (of sizes m and n)
combined. Sum the ranks of the smaller sample (Wsmallest). This value is
used to determine the level of significance.
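The ranking step can be sketched in Python as follows. This is only an illustration, not the script behind this page: the function name `rank_sum_smallest` is hypothetical, and giving tied values the average of their ranks is an assumption, since the text does not say how ties are treated.

```python
def rank_sum_smallest(a, b):
    """Return (W, m, n): rank sum of the smaller sample and both sample sizes."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    # Pool the values, remembering which sample each one came from
    # (0 = smaller sample, 1 = larger sample).
    pooled = sorted([(v, 0) for v in small] + [(v, 1) for v in large])
    w = 0.0
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Ranks are 1-based; tied values share the average rank of the run
        # (assumption: midranks, not stated in the text).
        mid_rank = (i + 1 + j) / 2
        w += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    return w, len(small), len(large)


w, m, n = rank_sum_smallest([1.1, 2.3, 4.0], [0.5, 3.2, 5.1, 6.7])
print(w, m, n)
```

In this small example the smaller sample holds ranks 2, 3 and 5 of the seven pooled values, so Wsmallest is 10.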
Level of Significance:
Look up the level of significance in a table using Wsmallest, m and n.
Calculating the exact level of significance is based on calculating all
possible permutations of ranks over both samples. This is computationally
demanding if n and m are both larger than 7.
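A minimal sketch of such an exact calculation, assuming no ties and a two-sided test that counts every rank sum at least as far from its expected value m*(m+n+1)/2 as the observed one. The name `exact_p_value` and the two-sided definition are assumptions made for this illustration. They are not taken from the script behind this page.

```python
from itertools import combinations


def exact_p_value(w, m, n):
    """Two-sided exact p-value for rank sum w of the sample of size m (no ties)."""
    ranks = range(1, m + n + 1)
    mean = m * (m + n + 1) / 2  # expected rank sum under the null hypothesis
    total = 0
    extreme = 0
    # Enumerate every way of assigning m of the N ranks to the smaller sample.
    for subset in combinations(ranks, m):
        total += 1
        # Count rank sums at least as far from the expected value as w.
        if abs(sum(subset) - mean) >= abs(w - mean):
            extreme += 1
    return extreme / total


print(exact_p_value(10, 3, 4))
```

For m = 3, n = 4 and Wsmallest = 10 there are 35 possible assignments, 22 of which are at least as extreme, so the sketch returns 22/35. The loop visits N!/(m!*n!) subsets, which is why this approach is only practical for small samples.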
If m > 10 and n > 10:
Z = ( Wsmallest - 0.5 - m * ( m + n + 1 ) / 2 ) / sqrt( m * n * ( m + n + 1 ) / 12 )
(Use Wsmallest - 0.5 if Wsmallest > N*(N+1)/4, else use Wsmallest + 0.5.)
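The approximation can be sketched in Python as below. The function names are hypothetical. One assumption deserves attention: the sketch moves Wsmallest half a unit toward its expected value m*(m+n+1)/2, which coincides with the threshold N*(N+1)/4 given above when m = n. The two-sided p-value helper is likewise an assumption about how Z would be used.

```python
import math


def normal_approx_z(w, m, n):
    """Z score for rank sum w of the sample of size m, with continuity correction."""
    mean = m * (m + n + 1) / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    # Continuity correction: move w half a unit toward the expected value
    # (assumption: threshold is the mean, equal to N*(N+1)/4 when m = n).
    if w > mean:
        w_corr = w - 0.5
    elif w < mean:
        w_corr = w + 0.5
    else:
        w_corr = w
    return (w_corr - mean) / sd


def two_sided_p(z):
    """Two-sided p-value of z under the standard Normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


z = normal_approx_z(120, 12, 12)
print(z, two_sided_p(z))
```

With m = n = 12 and Wsmallest = 120 the expected rank sum is 150 and the standard deviation is sqrt(300), so Z = (120.5 - 150) / sqrt(300).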
In summer 2006, a user of this web-site discovered a bug in the calculation
of the Normal approximation.
If the sum of the ranks of the sample with fewer observations was greater
than the sum of the ranks of the sample with more observations, the
script calculated the p-value using the smaller rank sum but still used the
size of the smaller sample as m in the Normal approximation.
This was corrected in November 2006.
In this example, exact probabilities are calculated for
m <= 10 or n <= 10. If both are larger than 7
this can take more time than is available within this system (the number
of calculations grows as N!/(m!*n!), with N!=N*(N-1)*(N-2)*...*1).
Therefore, if it is anticipated that the calculations will take too much time
(i.e., there are too many permutations to check), the Normal approximation is
used instead. In that case the resulting values are unreliable, which is
indicated with a *. You are advised to check the level of significance in a
table.
For m > 10 and n > 10 the Normal approximation is used.
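To give a feel for how quickly the exact calculation grows, the count N!/(m!*n!) can be evaluated directly. This is a small illustrative sketch: the function name `n_arrangements` is hypothetical, and `math.comb` requires Python 3.8 or later.

```python
import math


def n_arrangements(m, n):
    """Number of ways to choose which m of the N = m + n ranks go to one sample."""
    return math.comb(m + n, m)  # equals N! / (m! * n!)


for size in (5, 7, 10, 15):
    print(size, n_arrangements(size, size))
```

For two samples of 7 values each there are 3432 arrangements to check, for two samples of 10 there are 184756, and for two samples of 15 there are already 155117520.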
A Perl script of the test is available.
A minimalist Windows version (with dosperl interpreter) is available.