
The Rank Correlation coefficient

Example:
The observation pairs (x, y)

Characteristics:
The Rank Correlation test is a distribution-free test that determines whether there is a monotonic relation between two variables (x, y). A monotonic relation exists when an increase in one variable is invariably associated with an increase in the other variable (a monotonic increase), or invariably with a decrease (a monotonic decrease). In equation form, for the pairs (X1, Y1) and (X2, Y2):
If X2 > X1 then Y2 >= Y1 for a monotonic increase
If X2 > X1 then Y2 <= Y1 for a monotonic decrease
The monotonic relation is expressed using rank-order numbers instead of the values. This also makes the Rank Correlation test distribution free. Although the Rank Correlation coefficient can be interpreted as indicating the "strength" of the monotonic association, quantifying this strength is so complex that for all practical purposes this is a non-parametric test.
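The two conditions above can be checked directly on a small data set. The following is a minimal sketch, not part of the original example; the function name `monotonic_direction` is hypothetical, and pairs with equal x values are skipped because the definition only constrains pairs with X2 > X1.

```python
from itertools import combinations

def monotonic_direction(pairs):
    """Return 'increasing', 'decreasing', or None for a list of (x, y) pairs."""
    increasing = decreasing = True
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        if x1 == x2:
            continue  # the definition says nothing about pairs with equal x
        if x2 < x1:
            # reorder so that x2 > x1, as in the definition
            (x1, y1), (x2, y2) = (x2, y2), (x1, y1)
        if y2 < y1:
            increasing = False  # violates Y2 >= Y1
        if y2 > y1:
            decreasing = False  # violates Y2 <= Y1
    if increasing:
        return "increasing"  # also returned when all y values are equal
    if decreasing:
        return "decreasing"
    return None

print(monotonic_direction([(1, 1), (2, 4), (3, 9)]))
print(monotonic_direction([(1, 1), (2, 3), (3, 2)]))
```

The first call reports a monotonic increase; the second finds no monotonic relation because y rises and then falls.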

H0:
There is no monotonic relation between the variables.

Assumptions:
None, really

Scale:
Ordinal

Procedure:
Rank order all x and y values separately. Determine the difference between the ranks of both variables for each pair, V = Rank(x) - Rank(y). Sum the squares of the differences in rank-order numbers (i.e., Sum( V**2 ) ).
The Spearman Rank Correlation Coefficient is:
Rs = 1 - 6 * Sum( V**2 ) / ( N * ( N**2 - 1 ))
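The procedure above can be sketched in a few lines. This is an illustrative sketch only: the helper names `rank_order` and `spearman_rs` are hypothetical, the sample data are made up for the illustration, and the code assumes there are no tied values (the formula above does not say how ties should be ranked).

```python
def rank_order(values):
    """Rank-order numbers 1..N, smallest value first; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def spearman_rs(x, y):
    """Rs = 1 - 6 * Sum( V**2 ) / ( N * ( N**2 - 1 ))."""
    n = len(x)
    rank_x, rank_y = rank_order(x), rank_order(y)
    sum_v2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_v2 / (n * (n ** 2 - 1))

# Made-up data: ranks of y are 1, 3, 2, 5, 4, so Sum( V**2 ) = 4
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
print(spearman_rs(x, y))  # 1 - 6 * 4 / (5 * 24) = 0.8
```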

Level of Significance:
Look up the values of Rs and N in a table. The level of significance is determined by checking all permutations of ranks in the sample and counting the fraction for which the Rs' is more extreme than the Rs found. As the number of permutations grows in proportion to N! (the factorial of N), this is not very practical for large values of N. For N > 10 this example uses only an approximation (i.e., only a random subset of the permutations is actually checked).
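The permutation approach described above might look like the sketch below. It is not the code behind this example; the function names, the two-sided comparison on |Rs|, counting permutations that are at least as extreme as the observed value, the number of random samples, and the fixed seed are all assumptions made for the illustration.

```python
import random
from itertools import permutations

def rs_from_ranks(rank_x, rank_y):
    n = len(rank_x)
    sum_v2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_v2 / (n * (n ** 2 - 1))

def permutation_p(rank_x, rank_y, exact_limit=10, samples=10000, seed=0):
    """Fraction of permutations whose |Rs'| is at least the observed |Rs|."""
    observed = abs(rs_from_ranks(rank_x, rank_y))
    tolerance = 1e-12  # guards against floating-point rounding
    if len(rank_x) <= exact_limit:
        # exact: enumerate all N! orderings of the y ranks
        candidates = permutations(rank_y)
    else:
        # approximate: check only a random subset of the orderings
        rng = random.Random(seed)
        candidates = (rng.sample(rank_y, len(rank_y)) for _ in range(samples))
    hits = total = 0
    for permuted in candidates:
        total += 1
        if abs(rs_from_ranks(rank_x, permuted)) >= observed - tolerance:
            hits += 1
    return hits / total

# N = 4, perfectly ordered: only the identity and the full reversal
# reach |Rs'| = 1, so p = 2 / 4! = 2 / 24
print(permutation_p([1, 2, 3, 4], [1, 2, 3, 4]))
```

Enumerating all 10! (about 3.6 million) permutations in plain Python is slow but feasible, which is consistent with switching to random sampling above N = 10.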

Approximation:
If N > 30, the distribution of Z = Rs * sqrt( N - 1 ) can be approximated by a Standard Normal Distribution.
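As a sketch of how this approximation could be used, the snippet below converts Rs and N into Z and a tail probability. The function name `normal_approx` is hypothetical, and reporting a two-sided probability is an assumption; the text above does not say whether the test is one- or two-sided.

```python
import math

def normal_approx(rs, n):
    """Return (Z, two-sided p) using Z = Rs * sqrt( N - 1 )."""
    z = rs * math.sqrt(n - 1)
    # two-sided tail area of the standard normal: 2 * (1 - Phi(|z|))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

z, p = normal_approx(0.4, 37)  # made-up values: Rs = 0.4, N = 37
print(z, p)
```

With these made-up values Z is about 2.4, which corresponds to a two-sided probability of roughly 0.016.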

Remarks:
This example uses the Standard Normal approximation for N > 30. For N < 11 the exact value is calculated. For all other values, 10 < N < 31, p is calculated from a random subset of the possible permutations. This latter value is not very exact.
As a statistical test to check whether a relation between two variables exists, this test is better than the standard correlation coefficient because the latter will only work when there is a linear relation between the variables. In practical situations, assuming a linear relation will very often be unrealistic.
This test is also useful to check whether matched pairs are really matched. If they are, their rank correlation should be statistically significant.
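The remark about linear relations can be illustrated with a monotonic but strongly non-linear relation. The sketch below uses made-up data (y = exp(x)) and hypothetical helper names, and assumes there are no tied values; it compares the rank correlation with the standard (product-moment) correlation coefficient.

```python
import math

def pearson_r(x, y):
    """Standard product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_order(values):
    """Rank-order numbers 1..N; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def spearman_rs(x, y):
    n = len(x)
    sum_v2 = sum((a - b) ** 2 for a, b in zip(rank_order(x), rank_order(y)))
    return 1 - 6 * sum_v2 / (n * (n ** 2 - 1))

x = list(range(1, 11))
y = [math.exp(v) for v in x]  # monotonic increase, far from linear
print("Rs =", spearman_rs(x, y))
print("r  =", pearson_r(x, y))
```

Because the ranks of x and y agree exactly, Rs is 1, while the standard correlation coefficient stays clearly below 1 for the same data.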

