## Saturday, 2 November 2013

### Home

This blog is dedicated predominantly to various aspects of probability and statistics, with some aspects of digital signal processing and machine learning. This blog reflects my interests and specialties outside of my work as an Electronic Engineer.

The blog arose from my need to create a support website for an iOS App I developed back in 2013, hence the name "SciStatCalc", which is the name of my App. I opted for the path of least resistance, and judged that using a blogging site would be the easiest route to setting up my support website. Whilst my activity on the App has waned (there is a drop-down menu "SciStatCalc" that gives a timeline for the evolution of the App), this blog has taken on a life of its own and hopefully will remain a continual work in progress.

The blog covers the following topics:

1. Statistical hypothesis testing, with online implementations of many commonly used tests
2. The calculation of the Cumulative Distribution Function (CDF) and Quantiles (the inverse CDF) of various probability distributions.
3. Data visualisation tools, including graph, histogram and CDF plotting, with zoom facility and multiple data series plot capability.
4. Fast Fourier Transform (FFT) calculator.
5. k-means Clustering calculator.

You can navigate through this blog by a variety of means, including using the above drop-down menus.

The FFT is used as the basis of my online Power Spectral Density Estimator, which can be found on another engineering-centric blog I have created: dspfpgatools.blogspot.co.uk.

#### Statistics Theory

There are a few blog posts on statistics theory, including "What is hypothesis testing?", as well as tables of density equations and CDFs for a variety of distributions.

#### Statistical Test Procedures

So far the blog includes descriptions and step-by-step guides to the following statistical tests:-

1. Shapiro-Wilk test for Normality
2. Bartlett's test for equality/homogeneity of variances for three or more groups.
3. Mann-Whitney U test: a non-parametric test for two independent groups of data.
4. Wilcoxon Signed Rank test: a non-parametric test for two matched groups of data.
5. Unpaired Student's t test: a parametric test for two independent groups of data.
6. Paired Student's t test: a parametric test for two matched groups of data.
7. Linear Regression test: includes a matrix-based derivation of the Ordinary Least Squares algorithm - as applied to a single dataset.
8. Pearson (Product moment) Correlation: a parametric correlation test for two matched datasets/groups
9. Spearman Rank Correlation: a non-parametric rank based correlation test for two matched datasets/groups
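
To illustrate the matrix-based Ordinary Least Squares derivation mentioned in item 7, here is a minimal sketch in Python (not the JavaScript used on this blog). It assumes a straight-line model $y = \beta_0 + \beta_1 x$ and solves the normal equations $X^T X \beta = X^T y$ directly for the resulting $2 \times 2$ system. The function name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by solving the 2x2 normal equations.

    With design matrix X = [1, x_i], the normal equations X^T X b = X^T y are
        [[n,  Sx ],   [b0]   [Sy ]
         [Sx, Sxx]] * [b1] = [Sxy]
    and are solved here with the explicit 2x2 matrix inverse.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))

    det = n * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("x values must not all be identical")

    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


if __name__ == "__main__":
    print(ols_fit([0, 1, 2, 3], [1, 3, 5, 7]))
```

For more than one predictor the same normal equations apply, but a general linear solver would replace the hand-written $2 \times 2$ inverse.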

#### Online Calculators

There are JavaScript-based calculators that can be grouped into six categories:-

1. CDF and Quantile Calculators for a variety of Probability Density Functions (PDF) and Probability Mass functions (PMF).
2. Statistical test calculators
3. Critical value calculators
4. Medical Diagnostic calculator
5. Digital Signal Processing (DSP) calculator
6. Machine learning calculators

##### CDF and Quantile Calculators

For the PDFs and PMFs, you need to fill in all the relevant parameter fields. For the PDFs, you must fill in any two of the following three fields: Lower Limit, Upper Limit and Probability - pressing the calculate button will result in the single missing field being filled in. As for the PMFs, you must fill in one of the following two fields: Upper Limit and Probability - the missing field will be updated.
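
The "fill in two fields, get the third" behaviour can be sketched as follows. This is an illustrative Python version for the Gaussian case only, built on the standard library's `statistics.NormalDist`; it is an assumption about how such a calculator could work, not the blog's JavaScript code. The function name `fill_missing` is hypothetical.

```python
from statistics import NormalDist


def fill_missing(dist, lower=None, upper=None, prob=None):
    """Return the one missing value among lower limit, upper limit and
    probability, where prob = P(lower <= X <= upper) = CDF(upper) - CDF(lower).
    """
    if sum(v is None for v in (lower, upper, prob)) != 1:
        raise ValueError("exactly one of lower, upper, prob must be omitted")

    if prob is None:
        # Both limits given: probability is a difference of two CDF values.
        return dist.cdf(upper) - dist.cdf(lower)
    if upper is None:
        # Lower limit and probability given: invert the CDF (quantile).
        return dist.inv_cdf(dist.cdf(lower) + prob)
    # Upper limit and probability given: invert the CDF for the lower limit.
    return dist.inv_cdf(dist.cdf(upper) - prob)


if __name__ == "__main__":
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    print(fill_missing(standard_normal, lower=-1.96, upper=1.96))
    print(fill_missing(standard_normal, lower=0.0, prob=0.475))
```

Note that `inv_cdf` raises an error when the requested cumulative probability is not strictly between 0 and 1, so inconsistent inputs are rejected rather than silently producing a value.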

Online CDF and Quantile Calculators for the following PDFs have been implemented:-

1. Gaussian Distribution: includes error and inverse error function calculator near the top of the blog post.
2. Log-normal Distribution
3. Gamma Distribution: includes evaluation of the Gamma function ($\Gamma(x)$) near the bottom of the blog post.
4. Student's t-Distribution
5. Beta Distribution
6. F Distribution (also known as Snedecor's F)
7. Chi-Squared Distribution
8. Exponential Distribution
9. Logistic Distribution
10. Laplace Distribution
11. Cauchy Distribution (also known as the Cauchy-Lorentz Distribution)
12. Rayleigh Distribution
13. Weibull Distribution

For the Gaussian/Normal, Student-t, F and Chi-squared distributions, (1 - probability) and 2$\times$(1 - probability) are calculated as well - this is useful for calculating the one- and two-tailed probabilities associated with various statistical tests. The Gaussian Distribution is used for calculating the p-value from the z-score, whilst the Student-t distribution is used for the (parametric) Student's t-test. The F distribution is used for many tests, ANOVA being one of the most widely known.
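
As a small worked example of the (1 - probability) and 2$\times$(1 - probability) quantities, the following Python sketch converts a z-score into one- and two-tailed p-values using the standard Gaussian CDF. It is an illustration only, and the function name `tail_probabilities` is hypothetical.

```python
from statistics import NormalDist


def tail_probabilities(z):
    """Return (one_tail, two_tail) p-values for a z-score.

    probability = CDF(|z|) under the standard Gaussian, so
    one_tail = 1 - probability and two_tail = 2 * (1 - probability).
    """
    probability = NormalDist().cdf(abs(z))
    one_tail = 1.0 - probability
    two_tail = 2.0 * one_tail
    return one_tail, two_tail


if __name__ == "__main__":
    print(tail_probabilities(1.96))
```

Taking the absolute value of the z-score is a design choice here, so that the same function serves both positive and negative test statistics.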

Online CDF and Quantile Calculators for the following PMFs have been implemented:-

All the CDF and Quantile Calculators have plots of the PDF/PMF encompassing the limits specified by the user. The upshot of this is that you can investigate how varying the parameters of a particular distribution affects its shape, whilst keeping the limits the same.

##### Statistical Test Calculators

Calculators for the following Statistical Tests have been implemented:-

1. Shapiro-Wilk Test
2. Levene's Test
3. Bartlett's Test
4. Two-Sample Kolmogorov-Smirnov Test
5. Chi-Squared Test for Independence
6. Linear Regression
7. Pearson Correlation
8. Spearman Rank Correlation
9. Mann-Whitney U Test
10. Wilcoxon Signed Rank Test
11. Unpaired Student's t Test: includes option of implementing Welch's test for unequal variances.
12. Paired Student's t Test
13. Fisher's Exact Test (2 $\times$ 2 contingency table)
14. Barnard's Test (2 $\times$ 2 contingency table)
15. McNemar's Test (2 $\times$ 2 contingency table)
16. Cochran's Q Test
17. Kruskal-Wallis Test: applicable to three or more groups
18. One-way ANOVA Test: applicable to three or more groups - also includes post-hoc analysis for a significant result
19. Two-way ANOVA Test with replication: applicable to three or more groups, examining the effect of two independent variables and the interaction between them
20. Two-way ANOVA Test without replication: applicable to three or more groups, examining the effect of two independent variables
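
To make item 11 in the list above concrete, here is a hedged Python sketch of the unpaired Student's t statistic with an optional Welch correction for unequal variances. It returns only the test statistic and the degrees of freedom; obtaining the p-value would additionally require the Student-t CDF. The function name `unpaired_t` and its `welch` flag are hypothetical, and this is not the code used by the calculators.

```python
import math
import statistics


def unpaired_t(a, b, welch=False):
    """Return (t, degrees_of_freedom) for two independent samples."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two samples")

    m1, m2 = statistics.fmean(a), statistics.fmean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances

    if welch:
        # Welch: separate variances, Welch-Satterthwaite degrees of freedom.
        s1, s2 = v1 / n1, v2 / n2
        t = (m1 - m2) / math.sqrt(s1 + s2)
        df = (s1 + s2) ** 2 / (s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))
    else:
        # Classic Student: pooled variance, n1 + n2 - 2 degrees of freedom.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        t = (m1 - m2) / math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
        df = n1 + n2 - 2
    return t, df


if __name__ == "__main__":
    print(unpaired_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
    print(unpaired_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], welch=True))
```

The sketch does not guard against both groups having zero variance, in which case the division would fail.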

The Statistical Calculators have been designed for ease of use, with the aim of yielding useful results with minimal effort on the part of the user. All the calculators take in raw data as inputs, which can be entered directly in the relevant textareas as comma-separated numbers.

Alternatively, you can load a CSV file by pressing the "Choose File" button - the calculators can parse out a comment line (starting with character #, for example), if this occurs as the first line in the CSV file. In addition, there are either histograms or scatter plots for many of the tests. The purpose of these forms of data visualisation is two-fold: (i) to yield useful information not present in, say, the p-value of a test, (ii) to act as a sanity check on the p-value calculated.
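
The input handling described above can be sketched in Python as follows. This is an assumption about the general approach (skip an optional comment on the first line, then split on commas), not the calculators' actual parser; the function name `parse_numbers` and its `comment_char` parameter are hypothetical.

```python
def parse_numbers(text, comment_char="#"):
    """Parse comma-separated numbers, skipping a first-line comment if present."""
    lines = text.splitlines()

    # Only the first line is treated as a possible comment line.
    if lines and lines[0].lstrip().startswith(comment_char):
        lines = lines[1:]

    values = []
    for line in lines:
        for field in line.split(","):
            field = field.strip()
            if field:  # ignore empty fields such as a trailing comma
                values.append(float(field))
    return values


if __name__ == "__main__":
    print(parse_numbers("# heights in metres\n1.5, 2.0\n3"))
```

A non-numeric field raises `ValueError` from `float`, which a calculator front end would presumably report back to the user.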

For tests that require three or more datasets (such as the ANOVA and Kruskal-Wallis tests), the method of dynamic textboxes is implemented, where clicking on a link adds an extra text entry field. This gives a lot of flexibility in terms of the number of datasets the user wishes to process.

##### Critical Value Calculators

So far three critical value calculators have been implemented, whereby the relevant values are calculated based on the user specified significance level.

##### Medical Diagnostic Calculator

So far a single calculator has been implemented.

##### DSP Calculator

So far a single calculator has been implemented.

##### Machine Learning Calculators

So far, two k-means clustering calculators and a cosine similarity calculator have been implemented.
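
For readers unfamiliar with cosine similarity, it is the dot product of two vectors divided by the product of their Euclidean norms. A minimal Python sketch follows; it is illustrative only, and the function name `cosine_similarity` is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_similarity([1, 2, 3], [2, 4, 6]))
    print(cosine_similarity([1, 0], [0, 1]))
```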

#### Data Visualisation

In addition, there are Data Visualisation tools (all with Zoom capabilities, and capable of representing multiple datasets) implementing the following functions:-

1. Online Graph plotter: You can select between line and scatter plots for representing multiple datasets. In addition you can select subsets of the datasets to plot by entering dataset/group indices in two fields of a table labelled X-axis group index and Y-axis group index. Using the X-axis field and Y-axis field, it is possible to plot 2-D data as a scatter plot - this could be useful for a basic visual cluster analysis.
2. Online Histogram plotter
3. Online Empirical Cumulative Distribution Function (CDF) plotter
4. Online Quantile-Quantile (Q-Q) plotter for the Gaussian Distribution

For each dataset/group, the Histogram and CDF plotters display a summary statistics table containing the following information:-

1. Number of samples
2. Minimum
3. Maximum
4. Mean
5. Geometric Mean (if all samples are greater than zero)
6. Harmonic Mean (if all samples are greater than zero)
7. Variance
8. Standard Deviation
9. Median
10. Skewness
11. Excess Kurtosis
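
The quantities in the summary statistics table can be computed along the following lines. This Python sketch makes two assumptions that the list above does not settle: the variance and standard deviation use the sample ($n - 1$) definition, while skewness and excess kurtosis use the simple moment-based definitions $m_3 / m_2^{3/2}$ and $m_4 / m_2^2 - 3$. The plotters may use different conventions. The function name `summary_statistics` is hypothetical.

```python
import statistics


def summary_statistics(samples):
    """Return a dict of summary statistics for a list of numbers."""
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are required")

    mean = statistics.fmean(samples)
    # Central moments about the mean (divided by n, not n - 1).
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n

    stats = {
        "n": n,
        "min": min(samples),
        "max": max(samples),
        "mean": mean,
        "variance": statistics.variance(samples),  # sample variance (n - 1)
        "std_dev": statistics.stdev(samples),
        "median": statistics.median(samples),
    }
    if m2 > 0:
        # Shape measures are undefined when all samples are identical.
        stats["skewness"] = m3 / m2 ** 1.5
        stats["excess_kurtosis"] = m4 / m2 ** 2 - 3.0
    if all(x > 0 for x in samples):
        # Geometric and harmonic means only when every sample is positive.
        stats["geometric_mean"] = statistics.geometric_mean(samples)
        stats["harmonic_mean"] = statistics.harmonic_mean(samples)
    return stats


if __name__ == "__main__":
    for name, value in summary_statistics([1, 2, 3, 4, 5]).items():
        print(name, value)
```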

If you are a researcher in the medical field, a data scientist or statistician, or work in the social sciences, such as Psychology, I hope that you find some of the entries in this blog useful and interesting. If you are simply curious about probability and statistics, you are more than welcome.

All the online calculators are free to use, and the JavaScript source code is clearly accessible for the curious. Some testing has been performed on most of the CDF/Quantile Calculators, benchmarking against results generated by GNU Octave (I have endeavoured to achieve double precision accuracy). As regards the Statistical Test Calculators, I have checked my results against other online calculators available on the web where possible, and against R. However, I will assume no responsibility for the accuracy of the results - use the calculators at your own risk.
