Saturday, 20 December 2014

k-means clustering using algorithm AS 136

This blog post implements k-means clustering using algorithm AS 136, as devised by Hartigan and Wong. The implementation was made considerably easier by the work of J. Burkardt, who has translated the original Fortran to C, C++ and MATLAB, as well as to other flavours of Fortran.
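Hartigan and Wong's method differs from the basic k-means loop in that it moves one sample at a time, and only when the move lowers the total within-cluster sum of squares (SSE). Removing sample x from cluster l (n_l members, centre c_l) lowers the SSE by n_l/(n_l - 1) * ||x - c_l||^2, while adding it to cluster j raises it by n_j/(n_j + 1) * ||x - c_j||^2. The Python sketch below keeps only that single-point transfer test. AS 136 itself also maintains "live sets" and a quick-transfer stage, which are omitted here, so treat this as an illustration of the criterion rather than a port of the Fortran. The function name `hartigan_kmeans` and its arguments are hypothetical, NumPy is assumed, and the initial labelling must give every cluster at least one member. The returned flag is False when `max_iter` passes are used up without convergence, which is the situation the alert described below reports.

```python
import numpy as np

def hartigan_kmeans(X, labels, k, max_iter=100):
    """Hypothetical, simplified Hartigan-style k-means (single-point transfers only).

    Not a port of AS 136: the live sets and quick-transfer stage are omitted.
    Every cluster must be non-empty in the initial labelling.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat scalar data as 1-d points
    labels = np.asarray(labels).copy()
    counts = np.bincount(labels, minlength=k).astype(float)
    centres = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    for _ in range(max_iter):
        moved = False
        for i, x in enumerate(X):
            l = labels[i]
            if counts[l] <= 1:
                continue                    # never empty a cluster
            # SSE decrease from removing x from its current cluster
            remove_gain = counts[l] / (counts[l] - 1) * np.sum((x - centres[l]) ** 2)
            # SSE increase from adding x to each other cluster
            add_cost = counts / (counts + 1) * np.sum((x - centres) ** 2, axis=1)
            add_cost[l] = np.inf
            j = int(np.argmin(add_cost))
            if add_cost[j] < remove_gain:
                # transfer x and update both centroids incrementally
                centres[l] = (centres[l] * counts[l] - x) / (counts[l] - 1)
                centres[j] = (centres[j] * counts[j] + x) / (counts[j] + 1)
                counts[l] -= 1
                counts[j] += 1
                labels[i] = j
                moved = True
        if not moved:
            return labels, centres, True    # converged: no transfer helps
    return labels, centres, False           # hit max_iter without converging
```

For example, `hartigan_kmeans([1, 2, 3, 10, 11, 12], [0, 1, 0, 1, 0, 1], k=2)` transfers the samples 2 and 11 during the first pass and then stops, giving the clusters {1, 2, 3} and {10, 11, 12}.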

Please enter the numbers in the text areas below, either one number per line or two or more comma-separated numbers per line (one number per spatial dimension). There must be no newline after the last number.

Alternatively, you can load a CSV file containing one or more columns of numbers only, with one column per spatial dimension. Before loading the CSV file, you need to fill in the number of spatial dimensions.
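A minimal sketch of how input in this format might be parsed: split the text on newlines, then split each line on commas, and check that every line holds exactly as many values as there are spatial dimensions. The name `parse_samples` is hypothetical and this is not the code behind this page. With a naive split like this one, a trailing newline produces an empty final line that cannot be converted to a number, which illustrates why the input should not end with a newline.

```python
def parse_samples(text, dims):
    """Hypothetical parser: one sample per line, `dims` comma-separated numbers each."""
    samples = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        # float() tolerates surrounding spaces but fails on an empty string
        values = [float(v) for v in line.split(",")]
        if len(values) != dims:
            raise ValueError(f"line {line_no}: expected {dims} values, got {len(values)}")
        samples.append(values)
    return samples
```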

To perform the k-means clustering, enter the number of clusters and the maximum number of iterations in the appropriate fields, then press the button labelled "Perform k-means clustering". The results will populate the textareas labelled "Output" and "Centroid values". The "Output" textarea lists each sample value together with the index of the cluster (centroid) it belongs to, while the "Centroid values" textarea lists each centroid index and the value of that centroid (cluster centre). Note that cluster indices start at 0.

If the number of spatial dimensions is either 1 or 2, then the data points will be plotted below and coloured according to cluster membership.

Should the algorithm not converge within the maximum number of iterations specified, an alert will be generated to this effect.





Figure: Cluster visualisation for spatial dimensions 1 or 2 (Value vs Samples)

Friday, 24 January 2014

k-means clustering calculator

This blog post implements a basic k-means clustering algorithm, which can be applied either to scalar (1-d) data or to 2-d data (x and y components). Graphs of the clustered data and of the algorithm's convergence (measured by the number of samples whose cluster membership changes between consecutive iterations) are displayed below.
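The post does not say which variant of the basic algorithm is used. Assuming the standard Lloyd iteration (assign each sample to its nearest centre, then move each centre to the mean of its members), the convergence measure can be recorded as the number of samples whose cluster index changes in each iteration. A NumPy sketch, with the hypothetical name `lloyd_kmeans`:

```python
import numpy as np

def lloyd_kmeans(X, centres, n_iter):
    """Hypothetical Lloyd-style k-means that records membership changes per iteration."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat scalar data as 1-d points
    centres = np.array(centres, dtype=float).reshape(-1, X.shape[1])
    labels = np.full(len(X), -1)            # no previous assignment yet
    changes = []
    for _ in range(n_iter):
        # assignment step: nearest centre by squared Euclidean distance
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        changes.append(int((new_labels != labels).sum()))
        labels = new_labels
        # update step: move each centre to the mean of its members
        for j in range(len(centres)):
            members = X[labels == j]
            if len(members):                # an empty cluster keeps its old centre
                centres[j] = members.mean(axis=0)
    return labels, centres, changes
```

For the samples 1, 2, 3, 10, 11, 12 with starting centres 1 and 2, four iterations of this sketch give changes of [6, 2, 0, 0]: every sample counts as changed on the first pass because there is no previous assignment, and the zeros show that membership has settled.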

The cluster centres (centroids) are initialised using the k-means++ algorithm, proposed by David Arthur and Sergei Vassilvitskii in 2007.
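k-means++ picks the first centre uniformly at random from the samples, and each subsequent centre with probability proportional to D(x)^2, the squared distance from sample x to the nearest centre already chosen. A NumPy sketch of that seeding step; `kmeanspp_init` is a hypothetical name, and this is not the code used by the calculator.

```python
import numpy as np

def kmeanspp_init(X, k, rng=None):
    """Hypothetical k-means++ seeding (Arthur and Vassilvitskii, 2007)."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat scalar data as 1-d points
    # first centre: a sample chosen uniformly at random
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # D(x)^2: squared distance from each sample to its nearest chosen centre
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        if d2.sum() == 0:
            raise ValueError("fewer than k distinct samples")
        # next centre sampled with probability proportional to D(x)^2
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)
```

Because an already chosen sample has D(x) = 0, it cannot be picked again, and distant samples are favoured, which tends to spread the initial centres across the data.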

Please enter the numbers in the text areas below, either one number per line or two comma-separated numbers per line. There must be no newline after the last number.

Alternatively, you can load a CSV file, which must contain either a single column of numbers (for 1-d input) or two comma-separated columns of numbers. The first line may be a comment line starting with the character #.
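A sketch of reading such a file with Python's standard `csv` module, skipping an optional first line that starts with #. `load_csv` is a hypothetical helper, and ignoring blank lines is an assumption not stated above.

```python
import csv

def load_csv(text):
    """Hypothetical loader: one or two numeric columns, optional leading '#' comment line."""
    lines = text.splitlines()
    if lines and lines[0].lstrip().startswith("#"):
        lines = lines[1:]                   # drop the comment line
    # csv.reader yields [] for a blank line; skip those (an assumption)
    rows = [[float(v) for v in row] for row in csv.reader(lines) if row]
    if not rows:
        raise ValueError("no data rows")
    width = len(rows[0])
    if width not in (1, 2) or any(len(r) != width for r in rows):
        raise ValueError("expected one or two columns on every line")
    return rows
```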

To perform the k-means clustering, enter the number of clusters and the number of iterations in the appropriate fields, then press the button labelled "Perform k-means clustering". The results will populate the textareas labelled "Output" and "Centroid values". The "Output" textarea lists each sample value together with the index of the cluster (centroid) it belongs to, while the "Centroid values" textarea lists each centroid index and the value of that centroid (cluster centre).

Note that the k-means algorithm can converge to a local minimum, and can also exhibit degeneracy, whereby one of the clusters ends up with no members. Should either of these occur, simply re-run the algorithm.
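The advice to re-run can be automated: run k-means from several random starting points, discard any run that ends with an empty cluster, and keep the run with the lowest within-cluster sum of squares (SSE). The sketch below assumes NumPy and, for brevity, seeds each run with k distinct samples chosen uniformly at random rather than with k-means++. `best_of_restarts` and its parameters are hypothetical. It returns None if every run is degenerate.

```python
import numpy as np

def _assign(X, centres):
    """Index of the nearest centre for each sample (squared Euclidean distance)."""
    return ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def best_of_restarts(X, k, n_restarts=10, n_iter=100, seed=0):
    """Hypothetical helper: keep the lowest-SSE, non-degenerate k-means run."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    best = None
    for _ in range(n_restarts):
        # uniform random seeding (not k-means++): k distinct samples
        centres = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = _assign(X, centres)
            if len(np.unique(labels)) < k:
                break                        # degenerate: a cluster is empty
            new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
            if np.allclose(new, centres):
                break                        # centres have stopped moving
            centres = new
        labels = _assign(X, centres)
        if len(np.unique(labels)) < k:
            continue                         # discard this run, i.e. "re-run"
        sse = float(((X - centres[labels]) ** 2).sum())
        if best is None or sse < best[0]:
            best = (sse, labels, centres)
    return best
```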






Figure: Cluster visualisation (Value vs Samples)
Figure: Algorithm convergence (Change vs Iteration number)
