If two independent datasets are drawn from normal distributions with the same variance, we can test whether they differ (in particular, whether their means differ) using the unpaired Student's t-test, which is a parametric test. Note that a more robust non-parametric counterpart to this test, which does not require the assumption of Normality, is the Mann-Whitney U test (see blog posting dated 02/09/2013).

Let $N_1$ be the number of samples in the first dataset (with samples $x_1(1),x_1(2),...,x_1(N_1)$), and $N_2$ be the number of samples in the second dataset (with samples $x_2(1),x_2(2),...,x_2(N_2)$).

We calculate the mean of the first dataset, $\mu_1$ as follows

$\Large \mu_1=\frac{1}{N_1}\sum_{n=1}^{N_1}x_1(n)$

We calculate the mean of the second dataset, $\mu_2$ as follows

$\Large \mu_2=\frac{1}{N_2}\sum_{n=1}^{N_2}x_2(n)$

The variance for the first dataset is calculated as follows

$\Large \sigma_1^2=\frac{1}{N_1-1}\sum_{n=1}^{N_1}(x_1(n)-\mu_1)^2$

The variance for the second dataset is calculated as follows

$\Large \sigma_2^2=\frac{1}{N_2-1}\sum_{n=1}^{N_2}(x_2(n)-\mu_2)^2$
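As a quick check of the mean and variance formulas above, here is a minimal Python sketch using only the standard library. The variable names are made up for illustration, and the numbers are the two datasets from the worked example later in this post. Note that `statistics.variance` divides by $N-1$, which matches the variance formulas above.

```python
from statistics import mean, variance

# Datasets from the worked example later in this post.
dataset_1 = [1, 2, 3, 4]
dataset_2 = [6, 7, 7, 8, 9]

# Sample means (mu_1, mu_2).
mu_1 = mean(dataset_1)
mu_2 = mean(dataset_2)

# Sample variances with the N - 1 denominator (sigma_1^2, sigma_2^2).
var_1 = variance(dataset_1)
var_2 = variance(dataset_2)

print(mu_1, mu_2)
print(round(var_1, 3), round(var_2, 3))
```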

Given the means and variances of the two datasets, the $t$ value is derived as follows

$\Large t=\frac{\mu_1-\mu_2}{\sqrt{\frac{1}{N_1}+\frac{1}{N_2}}\sqrt{\frac{\sigma_1^2(N_1-1)+\sigma_2^2(N_2-1)}{N_1+N_2-2}}}$

The overall (pooled) standard deviation, which is the second square root in the denominator of $t$, is given by

$\Large \sqrt{\frac{\sigma_1^2(N_1-1)+\sigma_2^2(N_2-1)}{N_1+N_2-2}}$
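The $t$ value and the overall standard deviation can be sketched in a few lines of Python. The function names `pooled_std` and `unpaired_t` are hypothetical, chosen for this illustration, and the inputs are the summary statistics of the worked example further down.

```python
import math

def pooled_std(var_1, n_1, var_2, n_2):
    """Overall standard deviation of two samples assumed to share one variance."""
    return math.sqrt(((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2))

def unpaired_t(mu_1, var_1, n_1, mu_2, var_2, n_2):
    """Equal-variance unpaired t value: difference in means over its standard error."""
    std_err = pooled_std(var_1, n_1, var_2, n_2) * math.sqrt(1 / n_1 + 1 / n_2)
    return (mu_1 - mu_2) / std_err

# Means and variances of the datasets 1,2,3,4 and 6,7,7,8,9.
t = unpaired_t(2.5, 5 / 3, 4, 7.4, 1.3, 5)
print(round(t, 4))
```

The standard error is kept as a separate quantity because the same value is reused when the confidence interval is calculated.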

Once the $t$ value is calculated, the $p$ value (for a two-tailed test) is evaluated as follows.

If $t > 0$,

$p= 2 \times (1 - student\_t\_cdf(t,N_1+N_2-2))$

where $student\_t\_cdf(t,N_1+N_2-2)$ is the Cumulative Distribution Function of the Student's t probability distribution with $N_1+N_2-2$ degrees of freedom, integrated from $-\infty$ to $t$. The CDF is the following integral, which is evaluated using numerical methods (usually read off a table - if you have an iPhone, you can always download my free app SciStatCalc to evaluate the CDF!)

$\Large \int_{-\infty}^t\frac{\Gamma(\frac{N_1+N_2-1}{2})}{\sqrt{(N_1+N_2-2)(\pi)}\Gamma(\frac{N_1+N_2-2}{2})}(1+\frac{x^2}{N_1+N_2-2})^{-\frac{N_1+N_2-1}{2}}\,dx$

If $t<0$

$p= 2 \times (student\_t\_cdf(t,N_1+N_2-2))$
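To make the "numerical methods" concrete, here is a rough Python sketch that evaluates the CDF integral with composite Simpson's rule. This is just one possible approach, not necessarily what a table or a statistics package does internally. The function names and the step count of 2000 are assumptions made for this illustration. Because the density is symmetric about zero, the sketch integrates from $0$ to $t$ and adds $0.5$, and the two cases above collapse into the single expression $2 \times (1 - CDF(|t|))$.

```python
import math

def student_t_pdf(x, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def student_t_cdf(t, df, steps=2000):
    """CDF(t) = 0.5 + integral of the density from 0 to t (Simpson's rule).

    steps must be even; 2000 is an arbitrary choice for this sketch.
    """
    h = t / steps  # negative when t < 0, so the integral comes out negative
    total = student_t_pdf(0.0, df) + student_t_pdf(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights: 4 on odd, 2 on even nodes
        total += weight * student_t_pdf(i * h, df)
    return 0.5 + total * h / 3

def two_tailed_p(t, df):
    """Two-tailed p value; covers both the t > 0 and t < 0 cases."""
    return 2 * (1 - student_t_cdf(abs(t), df))

# t value and degrees of freedom from the worked example in this post.
print(two_tailed_p(-6.0512, 7))
```

For very large degrees of freedom `math.gamma` overflows, so a production implementation would work with `math.lgamma` or a library routine instead.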

We can think of the $p$ value as the probability of observing a difference in means at least as large as the one measured, if chance alone were at work and the true means were in fact the same (this is our *Null Hypothesis*). The lower this value, the less plausible it is that the difference is due to chance alone, and the more significant the result. The more significant the result is, the stronger the evidence that the means of the two datasets are different. We arbitrarily apply some threshold value (the significance level), such as 0.05 (5%), and if the calculated value of $p$ is *less* than this value, we consider the result to be significant, and reject the Null Hypothesis.

For the two-tailed test and a significance level of 5%, we can obtain the 95% confidence interval for the difference in means, where the lower limit is

$low\_limit = (\mu_1 - \mu_2) - inv\_t\_cdf(0.975,N_1+N_2-2) \times \sqrt{\frac{1}{N_1}+\frac{1}{N_2}}\sqrt{\frac{(N_1-1)\sigma_1^2 + (N_2-1)\sigma_2^2}{N_1+N_2-2}}$

and the upper limit is

$upp\_limit = (\mu_1 - \mu_2)+inv\_t\_cdf(0.975,N_1+N_2-2) \times \sqrt{\frac{1}{N_1}+\frac{1}{N_2}}\sqrt{\frac{(N_1-1)\sigma_1^2 + (N_2-1)\sigma_2^2}{N_1+N_2-2}}$

The function $inv\_t\_cdf$ is the inverse Student's t CDF, or quantile function. As the test is two-tailed, for a 5% level each tail holds 2.5% (0.025), so the quantile is evaluated at $1-0.025=0.975$.
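The confidence limits can be sketched in Python as well. The quantile function below inverts a numerically integrated CDF by bisection, which is a simple stand-in for a table lookup or a library routine such as Octave's `tinv()`. The function names, the bracket of $\pm 100$ and the tolerance are assumptions made for this illustration.

```python
import math

def student_t_cdf(t, df, steps=2000):
    """Student's t CDF by Simpson's rule on [0, t], using symmetry about 0."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def inv_t_cdf(q, df, lo=-100.0, hi=100.0, tol=1e-9):
    """Quantile by bisection: the CDF is increasing, so keep halving [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if student_t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_interval(mu_1, var_1, n_1, mu_2, var_2, n_2, level=0.95):
    """Two-sided confidence interval for mu_1 - mu_2, assuming equal variances."""
    df = n_1 + n_2 - 2
    pooled = math.sqrt(((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / df)
    std_err = pooled * math.sqrt(1 / n_1 + 1 / n_2)
    margin = inv_t_cdf(1 - (1 - level) / 2, df) * std_err
    diff = mu_1 - mu_2
    return diff - margin, diff + margin

# Summary statistics of the datasets 1,2,3,4 and 6,7,7,8,9.
low_limit, upp_limit = confidence_interval(2.5, 5 / 3, 4, 7.4, 1.3, 5)
print(round(low_limit, 3), round(upp_limit, 3))
```

Bisection is slow compared with dedicated quantile algorithms, but it only relies on the CDF being increasing, which keeps the sketch short.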

__Datasets with unequal variance__

It is worth noting that when the variances of the two datasets are different (a case that is rarely treated in practice, but that nevertheless could apply to certain situations), we use Welch's test, where the $t$ value is given by

$\Large t=\frac{\mu_1-\mu_2}{\sqrt{\frac{\sigma_1^2}{N_1}+\frac{\sigma_2^2}{N_2}}}$

The number of degrees of freedom ($df$) for Welch's test is

$\Large df=\frac{(\frac{\sigma_1^2}{N_1}+\frac{\sigma_2^2}{N_2})^2}{\frac{\sigma_1^4}{N_1^2(N_1-1)}+\frac{\sigma_2^4}{N_2^2(N_2-1)}}$

Substituting the $t$ value and the degrees of freedom $df$ into the Student's t CDF will yield the desired $p$ value.
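A short Python sketch of the two Welch formulas follows. The function name `welch_t_and_df` is hypothetical, and the inputs are the summary statistics of the worked example datasets, used here purely for illustration. Note that $df$ is generally not an integer.

```python
import math

def welch_t_and_df(mu_1, var_1, n_1, mu_2, var_2, n_2):
    """Welch's t value and degrees of freedom for unequal variances."""
    # Squared standard error contributed by each dataset: sigma^2 / N.
    se2_1 = var_1 / n_1
    se2_2 = var_2 / n_2
    t = (mu_1 - mu_2) / math.sqrt(se2_1 + se2_2)
    # sigma^4 / (N^2 (N - 1)) is the same as (sigma^2 / N)^2 / (N - 1).
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n_1 - 1) + se2_2 ** 2 / (n_2 - 1))
    return t, df

t_welch, df_welch = welch_t_and_df(2.5, 5 / 3, 4, 7.4, 1.3, 5)
print(round(t_welch, 3), round(df_welch, 3))
```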

__Worked Example__

Consider two datasets, comprising 4 and 5 samples respectively.

Dataset 1: 1,2,3,4

Dataset 2: 6,7,7,8,9

The number of degrees of freedom is $4+5-2=7$.

The mean and variance of the first dataset are

$\large \mu_1=\frac{1}{4}(1+2+3+4)=2.5$

$\large \sigma_1^2=\frac{1}{3}((1-2.5)^2+(2-2.5)^2+(3-2.5)^2+(4-2.5)^2)=1.667$

The mean and variance of the second dataset are

$\large \mu_2=\frac{1}{5}(6+7+7+8+9)=7.4$

$\large \sigma_2^2=\frac{1}{4}((6-7.4)^2+(7-7.4)^2+(7-7.4)^2+(8-7.4)^2+(9-7.4)^2)=1.3$

We can calculate the $t$ value as follows

$\Large t=\frac{\mu_1-\mu_2}{\sqrt{\frac{1}{N_1}+\frac{1}{N_2}}\sqrt{\frac{\sigma_1^2(N_1-1)+\sigma_2^2(N_2-1)}{N_1+N_2-2}}}=\frac{2.5-7.4}{\sqrt{0.25 + 0.2}\sqrt{\frac{(1.667)(3)+(1.3)(4)}{4+5-2}}}=-6.0512$

The overall standard deviation is $\large \sqrt{\frac{(1.667)(3)+(1.3)(4)}{4+5-2}}$, which is $1.20712$.

The denominator of $t$ is thus $\large 1.20712\times \sqrt{0.45}=0.80976$ - this will be used to evaluate the lower and upper limit of the confidence interval.

Given the $t$ value, the $p$ value (found using an appropriate software package) is

$p= 2 \times (student\_t\_cdf(t,N_1+N_2-2))= 2 \times (student\_t\_cdf(-6.051,7))=0.000515$

Thus, we obtain a very low value of $p$, indicating that the result is very significant. This is not really surprising when you look at the datasets - the values in the second dataset are much larger than those in the first, to the extent that there is no overlap.

To calculate the 95% confidence interval, we first need to find the quantile $inv\_t\_cdf(0.975,7)$ using an appropriate table or statistical software - using Octave's *tinv()* function, we obtain $2.3646$. Thus the lower limit is $2.5-7.4-(2.3646\times 0.80976)=-6.815$, and the upper limit is $2.5-7.4+(2.3646\times 0.80976)=-2.985$.
