RANK SUM TEST: THE MANN-WHITNEY U TEST
The Mann-Whitney U test is a nonparametric alternative to the two-sample t test. It is based on the ranks of the observations of the two samples taken together (pooled). It is more powerful than the sign test. Another name for this test is the rank sum test.
The sign test for comparing two population distributions ignores the actual magnitudes of the paired observations and thereby discards information that would be useful in detecting a departure from the null hypothesis.
The rank sum test is really a whole family of tests; here we discuss only one of them, the Mann-Whitney U test. With this test we can test the null hypothesis μ1 = μ2 without assuming that the populations sampled have roughly the shape of a normal distribution.
This test helps us to determine whether two samples have come from identical populations. If the samples have come from the same population, it is reasonable to expect the means of the ranks assigned to the values of the two samples to be more or less the same. The alternative hypothesis is that the population means are not equal; if this is the case, most of the smaller ranks will go to the values of one sample while most of the higher ranks will go to those of the other sample.
The test of the null hypothesis that the two samples come from identical populations may be based either on R1, the sum of the ranks of the values of the first sample, or on R2, the sum of the ranks of the values of the second sample. In practice it does not matter which sample we call sample 1 and which we call sample 2.
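As a minimal sketch in Python, the pooled ranking and the two rank sums R1 and R2 can be computed as follows. The function names and the sample data are hypothetical, and giving tied values the average of the ranks they occupy (midranks) is an assumed convention, since the text does not discuss ties.

```python
def pooled_ranks(sample1, sample2):
    """Rank all n1 + n2 observations together (rank 1 = smallest value).

    Assumption: tied values get the average of the ranks they occupy.
    """
    pooled = sorted(list(sample1) + list(sample2))
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values equal to pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1] == pooled[i]:
            j += 1
        # 0-based positions i..j hold ranks i+1..j+1; store their average.
        rank_of[pooled[i]] = ((i + 1) + (j + 1)) / 2
        i = j + 1
    return rank_of


def rank_sums(sample1, sample2):
    """Return (R1, R2), the rank sums of each sample in the pooled ranking."""
    rank_of = pooled_ranks(sample1, sample2)
    r1 = sum(rank_of[x] for x in sample1)
    r2 = sum(rank_of[x] for x in sample2)
    return r1, r2


# Hypothetical data with no ties.
r1, r2 = rank_sums([14, 9, 21, 17], [11, 25, 19, 30, 23])
print(r1, r2)  # 14.0 31.0
```

Here the first sample holds the ranks 1, 3, 4 and 6, so R1 = 14, and the second sample holds the remaining ranks, so R2 = 31.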
If the sample sizes are n1 and n2, then R1 + R2 is simply the sum of the first n1 + n2 positive integers, so that
$$R_1 + R_2 = \frac{(n_1 + n_2)(n_1 + n_2 + 1)}{2}$$
This formula enables us to find R2 if we know R1, and vice versa.
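A short check of this relation, using hypothetical sizes n1 = 4, n2 = 5 and a hypothetical R1 = 14. The helper names are illustrative. Averaging ranks over ties does not change their total, so the relation also holds when midranks are used.

```python
def total_rank_sum(n1, n2):
    """R1 + R2 = (n1 + n2)(n1 + n2 + 1) / 2, the sum of the integers 1..n1+n2."""
    n = n1 + n2
    return n * (n + 1) // 2  # n(n + 1) is always even, so // is exact


def other_rank_sum(r_known, n1, n2):
    """Recover R2 from R1 (or R1 from R2) using the fixed total."""
    return total_rank_sum(n1, n2) - r_known


print(total_rank_sum(4, 5))      # 45
print(other_rank_sum(14, 4, 5))  # 31
```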
When rank sums were first proposed as a nonparametric alternative to the two-sample t test, the decision was based on R1 or R2; now it is usually based on either of the related statistics:
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$
$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$
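A minimal sketch computing both statistics from the rank sums, with U1 = n1 n2 + n1(n1 + 1)/2 - R1 and U2 = n1 n2 + n2(n2 + 1)/2 - R2. The function name and the example values (n1 = 4, n2 = 5, R1 = 14, R2 = 31) are hypothetical.

```python
def u_statistics(r1, r2, n1, n2):
    """Return (U1, U2) computed from the rank sums R1 and R2."""
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2


u1, u2 = u_statistics(14, 31, 4, 5)
print(u1, u2)            # 16.0 4.0
print(u1 + u2 == 4 * 5)  # True
```

Because R1 + R2 is fixed, U1 + U2 = n1 n2 always holds, so computing one U determines the other.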
Here n1 and n2 are the sizes of the two samples, and R1 and R2 are the rank sums of the corresponding samples. For small samples, when both n1 and n2 are less than 10 (some statisticians say 8), special tables must be used: H0 is rejected if the smaller of U1 and U2 is less than or equal to the tabulated critical value. For larger samples, the sampling distribution of U is approximately normal, and U can be related to the standard normal curve by the statistic
$$Z = \frac{U - \dfrac{n_1 n_2}{2}}{\sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}}$$
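A minimal sketch of this normal approximation, assuming no ties (no tie correction is applied to the variance) and using hypothetical function names. It evaluates Z from both U1 = 16 and U2 = 4 of a hypothetical example with n1 = 4, n2 = 5, and turns Z into a two-sided p-value with `math.erfc` from Python's standard library. Samples this small would normally be handled with the tables, so the example only illustrates the arithmetic.

```python
import math


def z_statistic(u, n1, n2):
    """Z = (U - n1 n2 / 2) / sqrt(n1 n2 (n1 + n2 + 1) / 12), no tie correction."""
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean) / sd


def two_sided_p(z):
    """Two-sided p-value from the standard normal: 2 * P(Z >= |z|) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(z) / math.sqrt(2))


z1 = z_statistic(16, 4, 5)
z2 = z_statistic(4, 4, 5)
print(round(z1, 3), round(z2, 3))  # 1.47 -1.47
print(two_sided_p(z1))
```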
It does not matter whether the larger or the smaller of U1 and U2 is used in this formula: because U1 + U2 = n1 n2, the two values of Z are numerically equal but opposite in sign.
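For small samples, where the text calls for special tables, the exact null distribution of U can instead be counted directly, assuming no ties: under H0 every choice of which n1 of the N = n1 + n2 ranks belong to sample 1 is equally likely. This enumeration is a swapped-in technique, not the table lookup the text describes, and the function names are hypothetical.

```python
from math import comb


def u_null_counts(n1, n2):
    """counts[u] = number of rank assignments giving U1 = u under H0 (no ties)."""
    n = n1 + n2
    max_sum = n * (n + 1) // 2
    # ways[k][s]: number of k-element subsets of the ranks seen so far with sum s.
    ways = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    ways[0][0] = 1
    for rank in range(1, n + 1):
        # Iterate k downward so each rank is used at most once per subset.
        for k in range(min(rank, n1), 0, -1):
            for s in range(max_sum, rank - 1, -1):
                ways[k][s] += ways[k - 1][s - rank]
    # Map each rank sum R1 to U1 = n1*n2 + n1(n1 + 1)/2 - R1.
    counts = [0] * (n1 * n2 + 1)
    for r1, c in enumerate(ways[n1]):
        if c:
            counts[n1 * n2 + n1 * (n1 + 1) // 2 - r1] += c
    return counts


def exact_lower_tail(u, n1, n2):
    """P(U <= u) under H0; U1 and U2 share this same null distribution."""
    counts = u_null_counts(n1, n2)
    return sum(counts[: int(u) + 1]) / comb(n1 + n2, n1)
```

For a hypothetical case with n1 = 4, n2 = 5 and a smaller U of 4, `exact_lower_tail(4, 4, 5)` counts 12 of the 126 equally likely arrangements, about 0.095. H0 would be rejected at level α only if this tail probability (doubled for a two-sided alternative) is at most α.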