Monday, 2 September 2013

Mann-Whitney U test

The Mann-Whitney U (MWU) test is a non-parametric statistical test that is widely used in many fields (e.g. in medical statistics and psychology). It tests the null hypothesis that two groups (a.k.a. datasets or populations) are the same against the alternative hypothesis that they differ.

The following assumptions need to be made:

  • The independent variable is group membership (one of the two groups).
  • The dependent variable is the set of observations, which are either continuous or ordinal (i.e. they can be sorted by some criterion, such as a level).
  • All observations from both groups are independent of one another. For example, if patients' blood pressure is being measured, each observation should come from a distinct patient. 
  • The data need not be normally distributed.

The MWU test is more robust than the unpaired Student's t-test when the data are not normally distributed. We can think of the unpaired Student's t-test as the parametric counterpart of the MWU test.

The basic idea is that the data from both groups are pooled and ranked (1, 2, ...) in ascending order of magnitude, regardless of which group each observation belongs to. If the groups are very different, say Group 2 has values much larger than those of Group 1, then the sum of the ranks for Group 2 will be much larger than that for Group 1. A U statistic is calculated, which can be thought of as the difference between the largest rank sum the higher-ranked group could possibly achieve (if it occupied all the top ranks) and its actual rank sum. The smaller this difference, the greater the disparity in rank sums between the two groups, and the less likely it is that the difference is due to chance.
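An equivalent way to see what U measures (a standard result, not the method this post uses) is to compare every observation in one group with every observation in the other: count the pairs in which the Group 1 value is larger, scoring ties as one half, do the same from Group 2's side, and take the smaller count. A minimal Python sketch; `pairwise_u` is a hypothetical helper name, and the data are the samples from the worked example:

```python
def pairwise_u(group1, group2):
    """Count the (x, y) pairs where x from group1 exceeds y from group2,
    scoring ties as 1/2. This equals U1 = T1 - N1(N1+1)/2."""
    u1 = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u1 += 1.0
            elif x == y:
                u1 += 0.5
    # The same count seen from group2's side; u1 + u2 = N1 * N2
    u2 = len(group1) * len(group2) - u1
    # The test statistic is the smaller of the two counts
    return min(u1, u2)


group1 = [1, 4, 6, 7, 8, 3, 2, 1]
group2 = [3, 3, 3, 8, 10, 16, 18, 70, 30]
print(pairwise_u(group1, group2))  # 14.0
```

This brute-force version needs $N1 \times N2$ comparisons, so the rank-sum route is preferable for large samples, but it makes the "how often does one group beat the other" meaning of U explicit.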

Consider Group 1 with $N1$ samples, and Group 2 with $N2$ samples.

Let $T1$ be the sum of ranks for Group 1, and $T2$ be the sum of ranks for Group 2.

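We denote $TX$ as the larger of $T1$ and $T2$, and $NX$ as the number of samples in the group with the larger sum of ranks. (When $N1 = N2$ this always gives the correct U. With unequal group sizes, the group with the larger rank sum is not always the right one to use; in general $U$ is the smaller of $U1 = T1 - \frac{N1(N1+1)}{2}$ and $U2 = T2 - \frac{N2(N2+1)}{2}$, which always satisfy $U1 + U2 = N1 \times N2$.)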

The U-statistic is given by:-

$\large U=(N1 \times N2) + \frac{NX(NX+1)}{2} - TX$
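The U formula can be sketched in Python from the two rank sums. `mwu_u` follows the formula above, and `mwu_u_min` uses the general definition, the smaller of $U1 = T1 - \frac{N1(N1+1)}{2}$ and $U2 = T2 - \frac{N2(N2+1)}{2}$; both names are hypothetical:

```python
def mwu_u(t1, t2, n1, n2):
    """U from the rank sums, following the formula in the text:
    U = N1*N2 + NX(NX+1)/2 - TX, where TX is the larger rank sum."""
    if t1 >= t2:
        tx, nx = t1, n1
    else:
        tx, nx = t2, n2
    return n1 * n2 + nx * (nx + 1) / 2 - tx


def mwu_u_min(t1, t2, n1, n2):
    """General definition: the smaller of U1 and U2 (U1 + U2 = N1*N2)."""
    u1 = t1 - n1 * (n1 + 1) / 2
    u2 = t2 - n2 * (n2 + 1) / 2
    return min(u1, u2)


# Rank sums from the worked example: T1 = 50 (N1 = 8), T2 = 103 (N2 = 9)
print(mwu_u(50, 103, 8, 9))      # 14.0
print(mwu_u_min(50, 103, 8, 9))  # 14.0
```

The two agree for the worked example, but not always when the group sizes differ a lot: with a single Group 1 observation ranked above ten Group 2 observations ($T1 = 11$, $T2 = 55$), `mwu_u(11, 55, 1, 10)` returns 10 while `mwu_u_min(11, 55, 1, 10)` returns 0, the value that reflects complete separation of the groups.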

The value of $U$ is compared against a critical value $Ucrit$, and if $U$ is less than or equal to the critical value, the result is considered significant (note that many tests have a statistic that has to exceed a critical value for the result to be considered significant - the MWU test is almost the odd one out!).

The value of $Ucrit$ is read from a table of critical values of U, and depends on $N1$, $N2$, the significance level (say 5%), and whether the test is one-tailed or two-tailed.
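For small samples without ties, the critical value can also be computed rather than looked up, by counting how many of the $\binom{N1+N2}{N1}$ equally likely orderings of the pooled data give each value of U. The sketch below is one way to do this (a standard recursion on which group holds the largest observation); the function names are hypothetical, it assumes no ties, and it takes $Ucrit$ as the largest $u$ with $P(U \le u) \le \alpha/2$ for a two-tailed test, a convention a printed table may handle slightly differently at the boundary:

```python
from math import comb


def u_null_counts(n1, n2):
    """Exact null distribution of U for group sizes n1, n2, assuming no ties.

    Entry u of the returned list is the number of the C(n1+n2, n1) equally
    likely orderings of the pooled sample that give U = u.
    """
    # table[m][n] holds the count list for group sizes (m, n)
    table = [[None] * (n2 + 1) for _ in range(n1 + 1)]
    for m in range(n1 + 1):
        for n in range(n2 + 1):
            if m == 0 or n == 0:
                table[m][n] = [1]  # only one ordering, with U = 0
                continue
            counts = [0] * (m * n + 1)
            # Largest observation is from group 1: it exceeds all n group-2 values
            for u, c in enumerate(table[m - 1][n]):
                counts[u + n] += c
            # Largest observation is from group 2: it adds nothing to the count
            for u, c in enumerate(table[m][n - 1]):
                counts[u] += c
            table[m][n] = counts
    return table[n1][n2]


def critical_u(n1, n2, alpha=0.05, two_tailed=True):
    """Largest u with P(U <= u) <= alpha (alpha/2 if two-tailed).

    Returns None if even U = 0 is not extreme enough at this alpha.
    """
    counts = u_null_counts(n1, n2)
    total = comb(n1 + n2, n1)
    tail = alpha / 2 if two_tailed else alpha
    crit, cumulative = None, 0
    for u, c in enumerate(counts):
        cumulative += c
        if cumulative / total > tail:
            break
        crit = u
    return crit


print(critical_u(8, 9))  # 15
```

For $N1=8$, $N2=9$ and a two-tailed 5% level this gives 15, the same critical value used in the worked example.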

If $N1$ and $N2$ are large, we can use a z-test instead, as the distribution of the U statistic is reasonably well approximated by a Normal/Gaussian distribution with mean $\frac{N1N2}{2}$ and variance $\frac{N1N2(N1+N2+1)}{12}$.

If there are tied ranks, the variance needs to be slightly modified to

$\Large \frac{N1N2[(N1+N2+1)-\frac{\sum_{i=1}^{L}(t_i^3-t_i)}{(N1+N2)(N1+N2-1)}]}{12}$,

where $L$ is the number of groups of ties, and $t_i$ is the number of tied observations in the $i$th group.
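The mean and variance above, including the optional tie correction, translate directly into code. A minimal Python sketch; `u_normal_params` and its `tie_sizes` argument are hypothetical names, with `tie_sizes` holding the $t_i$ values:

```python
def u_normal_params(n1, n2, tie_sizes=()):
    """Mean and variance of U under the null hypothesis (normal approximation).

    tie_sizes lists t_i, the number of observations in each group of ties;
    leave it empty to get the uncorrected variance N1*N2*(N1+N2+1)/12.
    """
    n = n1 + n2
    mean = n1 * n2 / 2
    # Tie correction term: sum(t_i^3 - t_i) / ((N1+N2)(N1+N2-1))
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 * ((n + 1) - tie_term) / 12
    return mean, variance


print(u_normal_params(8, 9))             # (36.0, 108.0)
print(u_normal_params(8, 9, [2, 4, 2]))  # variance is about 106.41
```

With no ties the correction term is zero and the variance reduces to the uncorrected formula.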


Worked Example

Suppose we have two groups, with the following samples:

Group 1: 1,4,6,7,8,3,2,1

Group 2: 3,3,3,8,10,16,18,70,30

So $N1=8$, $N2=9$.

We first rank all the data as follows, ignoring ties for the moment:-

Rank    Data    Group
1       1       1
2       1       1
3       2       1
4       3       2
5       3       2
6       3       2
7       3       1
8       4       1
9       6       1
10      7       1
11      8       1
12      8       2
13      10      2
14      16      2
15      18      2
16      30      2
17      70      2

We deal with the ties by calculating the average of the ranks for the ties:-

Rank    Data    Group
1.5     1       1
1.5     1       1
3       2       1
5.5     3       2
5.5     3       2
5.5     3       2
5.5     3       1
8       4       1
9       6       1
10      7       1
11.5    8       1
11.5    8       2
13      10      2
14      16      2
15      18      2
16      30      2
17      70      2
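Averaging the ranks of tied values (often called midranks) can be sketched in a few lines of Python. `midranks` is a hypothetical helper; it sorts the pooled data, walks along each run of equal values, and gives every member of the run the mean of the ranks the run occupies:

```python
def midranks(values):
    """Rank values in ascending order (1-based); tied values share the
    average of the ranks they would otherwise occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end j of the run of values tied with values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


group1 = [1, 4, 6, 7, 8, 3, 2, 1]
group2 = [3, 3, 3, 8, 10, 16, 18, 70, 30]
ranks = midranks(group1 + group2)
t1 = sum(ranks[:len(group1)])  # rank sum for Group 1
t2 = sum(ranks[len(group1):])  # rank sum for Group 2
print(t1, t2)  # 50.0 103.0
```

As a check, $T1 + T2$ equals $\frac{(N1+N2)(N1+N2+1)}{2} = 153$, since averaging within ties does not change the total of the ranks.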

The sum of ranks for Group 1 is:-
1.5 + 1.5 + 3 + 5.5 + 8 + 9 + 10 + 11.5 = 50

We can easily calculate the sum of ranks for Group 2. As we have 17 samples ($N1+N2$), the total of the ranks for the entire data is $\frac{(N1 + N2)(N1+N2+1)}{2}$, which is $17\times 18 \times 0.5$, i.e. 153.

So the Group 2 sum of ranks is $153-50=103$, which is greater than that of Group 1.

Thus $NX=9$ and $TX=103$: the number of samples and the sum of ranks of the group with the larger rank sum (Group 2).

We are now in a position to calculate U as follows:-

$U=(9 \times 8) + \frac{9(9+1)}{2} - 103 = 14$

Suppose we are interested in an alpha of 5% for a two-tailed test. Looking up a table of critical values (say the one at http://www.lesn.appstate.edu/olson/stat_directory/Statistical%20procedures/Mann_Whitney%20U%20Test/Mann-Whitney%20Table.pdf), for $N1=8$ and $N2=9$ we find that $Ucrit=15$.

As $U<Ucrit$ we reject the null hypothesis that the groups are the same - the result is significant, but only just!

Next we come to the z-score calculation, which will only be approximate, as we have just 8 and 9 samples for Groups 1 and 2 respectively.

We need to transform $U$ to a z-score, which approximately follows a standard normal distribution with zero mean and unit variance.

SciStatCalc uses the following technique, which does not apply the tie correction and so gives a slightly more conservative result (its output was checked against GNU Octave). We take the sum of ranks for Group 1 ($50$) and subtract its expected value under the null hypothesis, $N1\times (N1+N2+1)\times 0.5 = 8(18)(0.5) = 72$, giving $-22$ (the same as $U - \frac{N1N2}{2} = 14 - 36$). We then normalise this by the standard deviation, $\sqrt{\frac{N1\times N2\times (N1+N2+1)}{12}}=\sqrt{\frac{9\times 8\times 18}{12}}=10.3923$.
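The uncorrected z-score calculation above can be written as a small function. This is a sketch of the steps as described, not SciStatCalc's actual code; `z_from_rank_sum` is a hypothetical name:

```python
import math


def z_from_rank_sum(t1, n1, n2):
    """Normal-approximation z-score from Group 1's rank sum, without the
    tie correction: (T1 - N1(N1+N2+1)/2) / sqrt(N1*N2*(N1+N2+1)/12)."""
    n = n1 + n2
    expected = n1 * (n + 1) / 2  # expected rank sum for Group 1 under H0
    sd = math.sqrt(n1 * n2 * (n + 1) / 12)
    return (t1 - expected) / sd


print(round(z_from_rank_sum(50, 8, 9), 5))  # -2.11695
```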

So the z-score is $-22/10.3923 = -2.11695$. The two-tailed p-value for this z-score is $0.034264$, which is less than our significance level of $0.05$, confirming that the result is significant, if only marginally.

If the tie correction to the variance were applied, the standard deviation would be $\sqrt{\frac{9\times 8\times (18 - 72/(16 \times 17))}{12}}=10.3156$. This is because there are three groups of ties, of sizes 2, 4 and 2 (the values 1, 3 and 8), giving $(2^3 - 2) + (4^3 - 4) + (2^3 - 2) = 72$, so that the correction term is $72/((9 + 8)\times(9+8-1))$. The z-score would then be $-22/10.3156=-2.1327$, corresponding to a p-value of $0.03295$, which agrees with the result obtained from R. Either way, the p-value is still below our 5% level, so the result remains significant.
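The tie-corrected calculation can be sketched by counting the tie groups directly from the pooled data. A minimal Python sketch; `tie_corrected_z` is a hypothetical name, and the rank sum $T1 = 50$ is taken from the worked example:

```python
import math
from collections import Counter


def tie_corrected_z(t1, group1, group2):
    """z-score from Group 1's rank sum T1, with the variance reduced by the
    tie correction sum(t^3 - t) / (N(N-1)), where N = N1 + N2."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    # Sizes of each group of tied values in the pooled data
    tie_sizes = [t for t in Counter(group1 + group2).values() if t > 1]
    correction = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    sd = math.sqrt(n1 * n2 * ((n + 1) - correction) / 12)
    return (t1 - n1 * (n + 1) / 2) / sd


group1 = [1, 4, 6, 7, 8, 3, 2, 1]
group2 = [3, 3, 3, 8, 10, 16, 18, 70, 30]
print(round(tie_corrected_z(50, group1, group2), 4))  # -2.1327
```

Because the correction can only shrink the variance, the tie-corrected $|z|$ is never smaller than the uncorrected one, which is why leaving it out is the more conservative choice.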

Below is a screenshot of the Mann-Whitney U-test on SciStatCalc:-

[Screenshot: Mann-Whitney U-test in SciStatCalc]
There is also a Mann-Whitney U-test calculator on this blog.

Calculating the p-value for a z-score

To calculate the p-value for a z-score, take the absolute value of the z-score, $|z|$, and calculate the area under the standard Gaussian/Normal distribution (mean 0, variance 1) from $|z|$ to infinity, i.e. the area under the tail. Because the standard Normal distribution is symmetric, the area from -infinity to $-|z|$ is the same. For a two-tailed test, multiply this area by 2.
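In code, the tail area can be obtained from the complementary error function, since the area under the standard normal beyond $|z|$ equals $\frac{1}{2}\operatorname{erfc}(|z|/\sqrt{2})$. A minimal Python sketch using only the standard library's `math.erfc`; `p_value` is a hypothetical name:

```python
import math


def p_value(z, two_tailed=True):
    """Tail area of the standard normal beyond |z|, doubled for a
    two-tailed test. Uses 1 - Phi(|z|) = erfc(|z| / sqrt(2)) / 2."""
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return 2 * one_tail if two_tailed else one_tail


print(p_value(-2.11695))  # compare with 0.034264 in the worked example
print(p_value(-2.1327))   # compare with 0.03295 (tie-corrected)
```

Using `erfc` directly rather than computing `1 - cdf` avoids losing precision for large $|z|$, where the tail area is tiny.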

A final note - the MWU test is also known as the Wilcoxon rank sum test.
