SciStatCalc: December 2013

Monday, 23 December 2013

FFT calculator

This blog post implements a Fast Fourier Transform (FFT) or an Inverse Fast Fourier Transform (IFFT) on a complex input, depending on the checkbox setting below. You can specify the sampling frequency in arbitrary units (e.g. Hz) in the appropriately labelled text area below (the default is 100).

For discrete time-domain input samples $x[n]$, $n \in \{0,1,2,\ldots,N-1\}$, the FFT computes the Discrete Fourier Transform (DFT), which at bin $k$, $k \in \{0,1,2,\ldots,N-1\}$, is defined by the equation

$X[k]=\sum_{n=0}^{N-1}x[n]\exp\left(-j2\pi\frac{nk}{N}\right)$

while for discrete frequency-domain input bins $X[k]$, $k \in \{0,1,2,\ldots,N-1\}$, the IFFT computes the inverse DFT, which at time index $n$, $n \in \{0,1,2,\ldots,N-1\}$, is defined by the equation

$x[n]=\frac{1}{N}\sum_{k=0}^{N-1}X[k]\exp\left(+j2\pi\frac{nk}{N}\right)$

where $N=2^g$ for integer $g$.
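To make the two definitions concrete, here is a minimal Python sketch that evaluates them directly, term by term. The function names `dft` and `idft` are my own and this is not the calculator's code. It costs $O(N^2)$ operations, which is exactly what the FFT avoids, but it produces the same numbers (up to rounding) and is handy for checking results.

```python
import cmath

def dft(x):
    """Direct evaluation of X[k] = sum_n x[n] exp(-j 2 pi n k / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Direct evaluation of x[n] = (1/N) sum_k X[k] exp(+j 2 pi n k / N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

X = dft([1, 2, 3, 4])   # approximately [10, -2+2j, -2, -2-2j]
x = idft(X)             # approximately [1, 2, 3, 4] again
```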

Please enter the numbers in the text areas below - one number per line, for each of the Real and Imaginary input textareas (the textareas have already been filled in with some numbers for illustration purposes). There must be no new line after the last number.

Note that if the input is real only, the imaginary input textarea can be left empty (rather than having to fill it with the same number of zeros as there are real inputs, which can be a bit more cumbersome). Conversely, if the input is imaginary only, the real input textarea can be left empty (rather than having to fill it with the same number of zeros as there are imaginary inputs).

Alternatively, you can load a CSV file, which must contain either a single column of numbers (for a real-only input) or two comma-separated columns of numbers. The first line may be a comment line starting with the character #.
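The CSV layout described above can be parsed in a few lines. The sketch below assumes (this is not stated on the page) that in the two-column case the first column holds the real part and the second the imaginary part. `read_fft_csv` is a hypothetical helper name.

```python
import csv

def read_fft_csv(text):
    """Parse one-column (real only) or two-column (real, imaginary) CSV text.

    An optional first line starting with '#' is treated as a comment.
    Assumption: column 1 is the real part and column 2 the imaginary part.
    """
    lines = text.splitlines()
    if lines and lines[0].lstrip().startswith("#"):
        lines = lines[1:]                      # skip the comment line
    real, imag = [], []
    for row in csv.reader(lines):
        if not row:                            # ignore blank lines
            continue
        real.append(float(row[0]))
        imag.append(float(row[1]) if len(row) > 1 else 0.0)
    return real, imag

real, imag = read_fft_csv("# re,im\n1,0.5\n2,-1\n3,0")
# real == [1.0, 2.0, 3.0], imag == [0.5, -1.0, 0.0]
```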

To perform the FFT/IFFT, please press the button labelled "Perform FFT/IFFT" below - the results will populate the textareas below labelled "Real Output" and "Imaginary Output", as well as a textarea at the bottom that will contain the real and imaginary output joined using a comma - this is suitable for copying and pasting the results to a CSV file.

In addition, graphical outputs of the FFT are displayed below: a graph of the FFT magnitude (its units can be selected from the drop-down menu below) and a graph of the phase response (in radians or degrees, also selectable from a drop-down menu below). After changing a unit, press the calculate button again to perform the FFT so that the change takes effect.
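For readers who want to reproduce the graphs offline, here is a minimal sketch of the magnitude, phase and frequency-axis calculations. Bin $k$ is mapped to frequency $k f_s / N$, using the default $f_s = 100$ mentioned above. The unit choices offered by the calculator's drop-down menus are not listed here, so the `"dB"` option ($20\log_{10}|X[k]|$) and the function name `spectrum_for_plot` are assumptions made for illustration.

```python
import cmath
import math

def spectrum_for_plot(X, fs=100.0, magnitude_units="linear", phase_units="radians"):
    """Return (frequencies, magnitudes, phases) for the FFT bins X[0..N-1]."""
    N = len(X)
    freqs = [k * fs / N for k in range(N)]          # bin k -> k * fs / N
    if magnitude_units == "linear":
        mags = [abs(v) for v in X]
    elif magnitude_units == "dB":                   # assumed option: 20 log10 |X[k]|
        mags = [20 * math.log10(abs(v)) if v != 0 else float("-inf") for v in X]
    else:
        raise ValueError("magnitude_units must be 'linear' or 'dB'")
    phases = [cmath.phase(v) for v in X]            # radians in (-pi, pi]
    if phase_units == "degrees":
        phases = [math.degrees(p) for p in phases]
    elif phase_units != "radians":
        raise ValueError("phase_units must be 'radians' or 'degrees'")
    return freqs, mags, phases

freqs, mags, phases = spectrum_for_plot([4, 2j, -2, -2j], phase_units="degrees")
```

For a complex input, bins with $k > N/2$ can equally be read as negative frequencies $(k - N) f_s / N$.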

At the bottom of this blog post, the Decimation In Time (DIT) twiddle Q factors are displayed, as defined in https://www.dsprelated.com/showarticle/107.php. An $N$-point FFT has $\log_2 N$ stages, and each stage has $N/2$ twiddle factors that do not equal the $-1$ term. These are the ones that are printed.
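The sketch below illustrates where the $N/2$ twiddle factors per stage come from. For each stage of a textbook iterative radix-2 DIT FFT (bit-reversed input order), it lists the exponents $A$ of the twiddle factors $W_N^A = \exp(-j2\pi A/N)$. It is my own reconstruction rather than the calculator's code, so the ordering of factors within a stage may differ from what the page prints. The function names are hypothetical.

```python
import cmath

def dit_twiddle_exponents(N):
    """Exponents A of W_N^A used by each stage of an iterative radix-2 DIT FFT.

    Each stage has N/2 butterflies, hence N/2 twiddle factors.
    """
    if N < 2 or N & (N - 1):
        raise ValueError("N must be a power of two >= 2")
    stages = N.bit_length() - 1            # log2(N)
    result = []
    for s in range(1, stages + 1):
        span = 2 ** s                      # butterfly group size at stage s
        exps = []
        for _ in range(N // span):         # N/span groups per stage
            for k in range(span // 2):     # span/2 butterflies per group
                exps.append(k * (N // span))
        result.append(exps)
    return result

def twiddle(N, A):
    """Twiddle factor W_N^A = exp(-j 2 pi A / N)."""
    return cmath.exp(-2j * cmath.pi * A / N)

for stage, exps in enumerate(dit_twiddle_exponents(8), start=1):
    print(stage, exps)   # 1 [0, 0, 0, 0] / 2 [0, 2, 0, 2] / 3 [0, 1, 2, 3]
```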

If you change inputs to a smaller number of samples, please press the calculate button twice for the results to take effect. Alternatively, you can simply reload the page, then fill in the input textareas.

As the FFT operates on inputs whose length is an integer power of two, inputs of any other length are zero padded: zeros are appended to both the real and the imaginary samples up to the next power of two.
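Below is a minimal Python sketch of this zero padding, followed by a recursive radix-2 DIT FFT and an IFFT built from it. It also treats an empty real or imaginary input as all zeros, as described earlier. The function names are hypothetical, and this is a textbook formulation, not the JavaScript that runs on this page.

```python
import cmath

def next_power_of_two(n):
    """Smallest power of two that is >= n (returns 1 for n <= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p

def zero_pad(real, imag):
    """Combine real and imaginary inputs into complex samples, treating a missing
    (empty) part as zeros and zero padding to a power-of-two length."""
    size = next_power_of_two(max(len(real), len(imag)))
    real = list(real) + [0.0] * (size - len(real))
    imag = list(imag) + [0.0] * (size - len(imag))
    return [complex(r, i) for r, i in zip(real, imag)]

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2])   # FFT of the even-indexed samples
    odd = fft(x[1::2])    # FFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]   # twiddle factor times odd term
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out

def ifft(X):
    """Inverse FFT via conjugation: x = conj(FFT(conj(X))) / N."""
    N = len(X)
    return [v.conjugate() / N for v in fft([v.conjugate() for v in X])]

samples = zero_pad([1, 2, 3], [])   # padded to [1, 2, 3, 0] (as complex numbers)
X = fft(samples)                    # approximately [6, -2-2j, 2, -2+2j]
x = ifft(X)                         # approximately [1, 2, 3, 0]
```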

You can find an FFT based Power Spectral Density (PSD) Estimator here.


[Graph: FFT Magnitude (Complex (Linear)) against Frequency]

[Graph: FFT Phase Response (Radians) against Frequency]


Friday, 20 December 2013

Interpreting the two-way ANOVA test

In this blog post, I will try to explain how to interpret the two-way ANOVA test using a simple example.

Suppose we were testing the yield of a crop plant based on seed type and on the field in which the seeds were planted, so we have two factors: Seed Type and Field Type. The yield could be the number of grains in a plant. For the first factor, let us assume we have three seed types, which we call Seed 1, Seed 2 and Seed 3. For the second factor, let us assume we have two field types, which we denote Field 1 and Field 2. For each combination of field type and seed type, let us assume we have three samples (also known as replicates). We can represent the results in the table below, where entry $a_{ij}(k)$ is the number of grains in the $k$-th replicate plant for Field $i$ and Seed $j$.

	Seed 1	Seed 2	Seed 3
Field 1	$a_{11}(1),a_{11}(2),a_{11}(3)$	$a_{12}(1),a_{12}(2),a_{12}(3)$	$a_{13}(1),a_{13}(2),a_{13}(3)$
Field 2	$a_{21}(1),a_{21}(2),a_{21}(3)$	$a_{22}(1),a_{22}(2),a_{22}(3)$	$a_{23}(1),a_{23}(2),a_{23}(3)$

Now, in a two-way ANOVA test, we calculate the F-statistic for factor 1, for factor 2 and for their interaction. From each F-statistic we then calculate the corresponding p-value. What do these values tell us?
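To show where these F-statistics come from, here is a minimal sketch of a balanced two-way ANOVA with replication. It uses the table layout above: `data[i][j]` holds the replicates for Field $i$ and Seed $j$. Each F-statistic is the mean square for a factor (or for the interaction) divided by the error mean square. The yields in the usage lines are made up purely for illustration, and `two_way_anova` is a hypothetical name.

```python
from statistics import mean

def two_way_anova(data):
    """Balanced two-way ANOVA with replication.

    data[i][j] is the list of replicates for field i and seed type j.
    Returns (F, df_numerator, df_denominator) for seed, field and interaction.
    """
    a = len(data)          # number of field types
    b = len(data[0])       # number of seed types
    n = len(data[0][0])    # replicates per cell (assumed equal for every cell)

    cell = [[mean(data[i][j]) for j in range(b)] for i in range(a)]
    field_means = [mean(cell[i]) for i in range(a)]
    seed_means = [mean(cell[i][j] for i in range(a)) for j in range(b)]
    grand = mean(field_means)

    # Sums of squares for each source of variation
    ss_field = b * n * sum((m - grand) ** 2 for m in field_means)
    ss_seed = a * n * sum((m - grand) ** 2 for m in seed_means)
    ss_inter = n * sum((cell[i][j] - field_means[i] - seed_means[j] + grand) ** 2
                       for i in range(a) for j in range(b))
    ss_error = sum((x - cell[i][j]) ** 2
                   for i in range(a) for j in range(b) for x in data[i][j])

    df_field, df_seed = a - 1, b - 1
    df_inter = df_field * df_seed
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error

    return {
        "seed": (ss_seed / df_seed / ms_error, df_seed, df_error),
        "field": (ss_field / df_field / ms_error, df_field, df_error),
        "interaction": (ss_inter / df_inter / ms_error, df_inter, df_error),
    }

# Made-up yields: cell means 20/25/30 in Field 1 and 15/20/25 in Field 2 (parallel)
data = [[[19.0, 20.0, 21.0], [24.0, 25.0, 26.0], [29.0, 30.0, 31.0]],
        [[14.0, 15.0, 16.0], [19.0, 20.0, 21.0], [24.0, 25.0, 26.0]]]
result = two_way_anova(data)
# F = 150.0 (seed), 112.5 (field), 0.0 (interaction)
```

The p-value for each F-statistic is the upper-tail probability of the F distribution with the returned degrees of freedom. With SciPy available, that is `scipy.stats.f.sf(F, df1, df2)`.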

A very low p-value for factor 1 (Seed Type), i.e. a significant result for the first factor, indicates that the mean yields differ between seed types. Suppose this is indeed the case, with Seed 3 having the highest yield, followed by Seed 2 and then Seed 1. The mean yields for Field 1 could then look as follows.

[Graph: Replicate means for field 1, seed yield means against seed type number]

Now let us look at the second factor, Field Type, and suppose its p-value is very low as well (i.e. the result is significant for the second factor). This tells us that plant yields differ between field types. Suppose that Field 2 has the lower-yielding plants because it has poorer irrigation than Field 1. Plotting the means of the three seed types for both fields might then give the result below.

[Graph: Replicate means for fields 1 and 2, seed yield means against seed type number]

Examining the plot above, we can predict the p-value for the interaction. The mean plots are parallel: the difference in means between Field 1 and Field 2 is the same for all three seed types. The interaction sum of squares is therefore close to zero, so the F-statistic for the interaction is close to zero and its p-value tends to 1. There is no significant interaction.

A worthwhile question to pose is what happens if there were a significant interaction. In that scenario, the difference in yields between Field 1 and Field 2 would not be the same for each seed type. For example, the difference in means for Seed 3 could be much smaller, resulting in the plot below. This would indicate an interaction between seed type and field type: Seed 3 appears to be more resistant to a lower water supply, for example.

[Graph: Replicate means for fields 1 and 2, seed yield means against seed type number]
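The parallel-lines idea can be checked numerically by computing, for each seed type, the difference in mean yield between Field 1 and Field 2. The sketch below uses made-up means and hypothetical helper names. In the first case the differences are all equal (parallel lines, no interaction). In the second case Seed 3 shows a much smaller difference, as in the last plot.

```python
def field_differences(field1_means, field2_means):
    """Field 1 minus Field 2 mean yield, one value per seed type."""
    return [m1 - m2 for m1, m2 in zip(field1_means, field2_means)]

def lines_are_parallel(diffs, tol=1e-9):
    """True when every seed type shows the same field difference."""
    return max(diffs) - min(diffs) <= tol

# Made-up replicate means for Seeds 1, 2 and 3
no_interaction = field_differences([20.0, 25.0, 30.0], [15.0, 20.0, 25.0])
with_interaction = field_differences([20.0, 25.0, 30.0], [15.0, 20.0, 29.0])
print(no_interaction, lines_are_parallel(no_interaction))       # [5.0, 5.0, 5.0] True
print(with_interaction, lines_are_parallel(with_interaction))   # [5.0, 5.0, 1.0] False
```

With real data the sample means are never exactly parallel. The interaction F-statistic and its p-value are what decide whether the departure from parallel lines is larger than replicate-to-replicate variation would explain.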

Wednesday, 18 December 2013

Type I and Type II Errors in Hypothesis Testing

Hypothesis testing distinguishes between two kinds of error: Type I and Type II errors.

A Type I error occurs when a Null Hypothesis is incorrectly rejected, and is also known as a false positive. In such a scenario, the calculated p-value is below the significance level (0.05 being a commonly used value), even though there is in fact no real effect.

A Type II error occurs when a Null Hypothesis is incorrectly accepted (more precisely, not rejected), and is also known as a false negative. In such a scenario, the calculated p-value is above the significance level (0.05 being a commonly used value), even though there is in fact a real effect that the test failed to detect. Such an error can have more serious consequences than a Type I error, for example when screening a patient for potentially malignant cells, where the Null Hypothesis is that no malignant cells are present.

The Power of a test is one minus the probability of a Type II error, and should ideally be as close to one as possible. Suppose two statistical tests can test the same Null Hypothesis. When that Null Hypothesis is false, the more powerful test will tend to yield lower p-values, so it is more likely to reject the Null Hypothesis. In other words, the chance of incorrectly accepting the Null Hypothesis is lower for the more powerful test.
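Both error types, and power, can be seen quickly by simulation. The sketch below repeatedly runs a two-sided one-sample z-test (the population standard deviation is assumed known) on random normal samples. When the true mean equals the hypothesised one, the fraction of rejections estimates the Type I error rate, which should be close to the significance level. When it does not, the fraction estimates the power, and one minus that estimates the Type II error rate. The sample size, effect size and function names are arbitrary choices made for illustration.

```python
import math
import random

def z_test_p_value(sample, mu0, sigma):
    """Two-sided one-sample z-test p-value, population sigma assumed known."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    # Standard normal CDF via erf: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=20, alpha=0.05, trials=10000, seed=1):
    """Fraction of simulated samples for which H0: mean == mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_p_value(sample, mu0, sigma) < alpha:
            rejections += 1
    return rejections / trials

print(rejection_rate(true_mu=0.0))   # close to 0.05: the Type I error rate
print(rejection_rate(true_mu=0.5))   # roughly 0.6: the power (Type II rate about 0.4)
```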

Sunday, 15 December 2013

Chi-squared test for independence Calculator


This blog post implements an online calculator for Pearson's Chi-squared test for independence. For a discussion of this test, you can have a look here.

Simply click on the link near the top to add text boxes. Each text box stores a single row of data and needs to be filled in with comma separated numbers. All rows need to have the same number of samples, which is equal to the number of columns.

Alternatively, you can use either of two file entry methods:

Select multiple single column CSV files to populate the text boxes by repeatedly pressing the Choose File button - there must be one distinct (and differently named) file for each text box i.e. one file per group. Each file can have a different number of samples.
Select a single multi-column CSV file by pressing the Choose File button once, where the number of columns equals the number of groups - all groups need to have the same number of samples.

In addition, a table of standardised residuals is calculated. A negative value for a particular cell means that the observed frequency is lower than the expected frequency, whereas a positive value means that the observed frequency is greater than the expected frequency. This helps to identify the cells where the observed and expected frequencies differ significantly: an absolute value greater than 1.96 can be considered significant at the 0.05 level.
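Below is a minimal sketch of the underlying calculation. Each expected count is the row total times the column total divided by the grand total. The statistic sums $(O-E)^2/E$ over all cells, and the degrees of freedom are $(r-1)(c-1)$. The page does not say which definition of standardised residual it uses. This sketch assumes the Pearson residual $(O-E)/\sqrt{E}$. Some texts instead use the adjusted residual $(O-E)/\sqrt{E(1-R_i/T)(1-C_j/T)}$, where $R_i$, $C_j$ and $T$ are the row, column and grand totals. It is the adjusted version that is approximately standard normal under independence, which is what justifies the 1.96 threshold. The counts in the usage line are made up, and the function name is hypothetical.

```python
import math

def chi_squared_independence(table):
    """Pearson's chi-squared test for independence on a table of observed counts.

    Returns the statistic, the degrees of freedom and a table of
    Pearson residuals (O - E) / sqrt(E).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, residuals

stat, df, residuals = chi_squared_independence([[10, 20], [20, 10]])
print(stat, df)   # about 6.67 with 1 degree of freedom
```

The p-value is the upper-tail probability of the chi-squared distribution with `df` degrees of freedom, for example `scipy.stats.chi2.sf(stat, df)` if SciPy is available.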
