
The Wilcoxon Two Sample Test

Example:
[Interactive form: enter the observation sequences A and B; the result reports #A, #B, W and p.]

Characteristics:
A most useful test to see whether the values in two samples differ in size. It resembles the Median-Test in scope, but it is much more sensitive. In fact, for large numbers it is almost as sensitive as the Two Sample Student t-test. For small numbers with unknown distributions this test is even more sensitive than the Student t-test.
As it is only on rare occasions that we know the values to be Normally distributed, this test is to be preferred over the Student t-test.

H0:
The populations from which the two samples are taken have identical median values. More precisely, the null hypothesis is that the two populations have identical distributions.

Assumptions:
None really.

Scale:
Ordinal.

Procedure:
Rank order all N = m + n values from both samples (of sizes m and n) combined. Sum the ranks of the smaller sample (Wsmallest). This value is used to determine the level of significance.
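
The ranking step can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the function name `rank_sum_smallest` is hypothetical, m is taken to be the size of the smaller sample, and tied values receive the average of their ranks, which is an assumption because the text does not say how ties are handled.

```python
def rank_sum_smallest(a, b):
    """Return (W, m, n): the rank sum of the smaller sample and both sizes, m <= n."""
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    pooled = sorted(list(small) + list(large))

    # Assign ranks 1..N; tied values share the mean of their ranks
    # (assumption: the text does not discuss ties).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    w = sum(rank_of[x] for x in small)
    return w, len(small), len(large)


w, m, n = rank_sum_smallest([1.1, 2.3, 0.7], [3.4, 2.9, 1.8, 4.0])
print(w, m, n)
```

For the two made-up samples in the usage line, the smaller sample holds the pooled ranks 1, 2 and 4.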

Level of Significance:
Look up the level of significance in a table using Wsmallest, m and n.
The exact level of significance is calculated by enumerating all possible ways of dividing the N ranks over both samples. This is computationally demanding if n and m are larger than 7.
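
As a rough sketch of what the exact calculation involves, the Python below enumerates every way of picking m of the N ranks and counts how often the rank sum is at least as extreme as the observed one. The function name `exact_p` is hypothetical, ties are assumed to be absent, and the two-sided value is taken as twice the smaller tail, which is one common convention rather than something stated in the text.

```python
from itertools import combinations


def exact_p(w, m, n):
    """Return (p_lower, p_upper, p_two_sided) for an observed rank sum w."""
    ranks = range(1, m + n + 1)
    total = lower = upper = 0
    for subset in combinations(ranks, m):  # every possible rank set of size m
        s = sum(subset)
        total += 1
        lower += s <= w
        upper += s >= w
    p_lower = lower / total
    p_upper = upper / total
    # Two-sided value as twice the smaller tail (an assumed convention).
    return p_lower, p_upper, min(1.0, 2 * min(p_lower, p_upper))


print(exact_p(7, 3, 4))
```

The loop visits N!/(m!*n!) subsets, which is why this approach is only practical for small samples.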

Approximation:
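If m > 10 and n > 10,
Z = ( Wsmallest ± 0.5 - m * ( m + n + 1 ) / 2 ) / sqrt( m * n * ( m + n + 1 ) / 12 )
is approximately standard Normal distributed. Here m is the size of the sample whose ranks are summed, so m * ( m + n + 1 ) / 2 is the expected value of Wsmallest under H0.
(The ± 0.5 is a continuity correction toward the expected value: use Wsmallest - 0.5 if Wsmallest > m*(m+n+1)/2, else use Wsmallest + 0.5)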
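
A minimal Python sketch of this approximation follows. The names `wilcoxon_z` and `two_sided_p` are hypothetical. As an assumption of the sketch, the continuity correction moves W half a unit toward its expected value m*(m+n+1)/2 (which equals N*(N+1)/4 when m = n) and is omitted when W equals that value exactly.

```python
import math


def wilcoxon_z(w, m, n):
    """Normal approximation for the rank sum w of the sample of size m."""
    mean = m * (m + n + 1) / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    # Continuity correction: shift w by 0.5 toward the expected value.
    if w > mean:
        corr = -0.5
    elif w < mean:
        corr = 0.5
    else:
        corr = 0.0
    return (w + corr - mean) / sd


def two_sided_p(z):
    """Two-sided tail probability of a standard Normal value z."""
    return math.erfc(abs(z) / math.sqrt(2))


z = wilcoxon_z(130, 12, 15)
print(z, two_sided_p(z))
```

The sample sizes and rank sum in the usage lines are made up for illustration only.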

Remarks:
In this example, exact probabilities are calculated if m <= 10 or n <= 10. If both are larger than 7, this can take more time than is available within this system: the number of calculations grows as N!/(m!*n!), where N! = N*(N-1)*(N-2)*...*1. Therefore, if the calculations are expected to take too much time, the Normal approximation is used instead. The resulting values are then unreliable, which is indicated with a *. You are advised to check the level of significance in a table.
For m > 10 and n > 10 the Normal approximation is used.
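
To give a feel for how quickly N!/(m!*n!) grows, the short Python sketch below counts the rank assignments for a few equal sample sizes. The helper name `n_rank_assignments` is hypothetical; `math.comb` requires Python 3.8 or later.

```python
import math


def n_rank_assignments(m, n):
    """Number of ways to divide N = m + n ranks over the two samples: N!/(m!*n!)."""
    return math.comb(m + n, m)


for k in (5, 7, 10, 15):
    print(k, k, n_rank_assignments(k, k))
```

Going from m = n = 7 to m = n = 10 already multiplies the work by a factor of more than fifty.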

