Wednesday, 21 July 2021

Logistic Regression Calculator and ROC Curve Plotter

This blog post implements a Logistic Regression calculator for a binary output.

Consider a binary outcome response variable \(Y\in\{0,1\}\) and let \(p\) be the probability that \(Y\) is \(1\), i.e. \(p=P(Y=1)\).
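The logistic model is formally given by:-

\[\log\left(\frac{p}{1-p}\right)=\beta_0+\beta_1 x_1+\beta_2 x_2+\ldots+\beta_p x_p\]

Note that the subscript \(p\) on \(\beta_p\) and \(x_p\) counts the predictor variables and is distinct from the probability \(p\).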

The calculator estimates the model parameters \(\beta_0, \beta_1,\ldots,\beta_p\) given samples of the response variable \(Y^{(n)}\) and the corresponding predictor variables \(x_1^{(n)}, x_2^{(n)},\ldots,x_p^{(n)}\), where \(n\) is the sample index. The Newton-Raphson algorithm is used to maximise the log-likelihood function with respect to the model parameters. The number of iterations to run can be specified in the text field below (the default is 10).
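The Newton-Raphson fit described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the calculator's actual code: it assumes the usual logistic form \(p = 1/(1+e^{-(\beta_0+\beta_1 x_1+\ldots+\beta_p x_p)})\), starts from all-zero parameters, and the names `fit_logistic`, `solve`, `sigmoid` and `log_likelihood` are hypothetical helpers introduced here.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    # Assumes A is non-singular (no perfectly collinear predictors).
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def log_likelihood(X, y, beta):
    # Sum over samples of y*z - log(1 + exp(z)), with z the linear predictor.
    ll = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, xi))
        ll += yi * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return ll

def fit_logistic(rows, y, iterations=10):
    # Prepend a constant 1 to each row so that beta[0] is the intercept.
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    history = []  # log-likelihood after each iteration
    for _ in range(iterations):
        grad = [0.0] * k                    # gradient of the log-likelihood
        info = [[0.0] * k for _ in range(k)]  # negative Hessian, X^T W X
        for xi, yi in zip(X, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        # Newton-Raphson update: beta <- beta + (X^T W X)^{-1} X^T (y - p)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        history.append(log_likelihood(X, y, beta))
    return beta, history
```

Calling `fit_logistic(rows, y)` with one list of predictor values per sample returns the estimated parameters together with the log-likelihood after each iteration, which is the quantity a convergence plot would show. The sketch takes full Newton steps with no step-size control, so it may fail on perfectly separable data, where the maximum likelihood estimate does not exist.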

Of the two textareas below, the first (narrow) one on the left holds the response variable \(Y\), while the second holds the corresponding comma-separated predictor variable entries \(x_k\). Example values have been entered (2000 samples) and can be altered as appropriate. Alternatively, a CSV file can be loaded by clicking on the "Choose File" button: the first column must be the response variable (taking either 1 or 0 as a value), while columns 2 onwards are the real-valued predictor variables.
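To make the expected CSV layout concrete, the following sketch splits CSV text into responses and predictor rows. It is an illustration only, not the page's own loader: the function name `parse_samples` is hypothetical, and it assumes there is no header row and that every non-blank line starts with 0 or 1.

```python
import csv
import io

def parse_samples(text):
    """Split CSV text into responses y and predictor rows X.

    Column 1 is the response (0 or 1); columns 2 onwards are
    real-valued predictors. Blank lines are skipped.
    """
    y, X = [], []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not row or all(not cell.strip() for cell in row):
            continue  # ignore empty lines
        label = row[0].strip()
        if label not in ("0", "1"):
            raise ValueError(
                f"line {line_no}: response must be 0 or 1, got {label!r}"
            )
        predictors = [float(cell) for cell in row[1:]]
        if X and len(predictors) != len(X[0]):
            raise ValueError(f"line {line_no}: inconsistent number of columns")
        y.append(int(label))
        X.append(predictors)
    return y, X
```

For a file on disk, the same function can be applied to the file's contents, for example `parse_samples(open(path).read())`, where `path` is whatever file was chosen.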

To run the algorithm once the values have been entered in the textareas, simply click on the "Estimate model parameters" button. A plot indicating algorithm convergence will be updated, showing the increase of the log-likelihood function; this can help in choosing the number of iterations needed for the Newton-Raphson algorithm. Finally, the Receiver Operating Characteristic (ROC) plot will be generated, with a black line of gradient 1 drawn as a reference.
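As an illustration of how the points on an ROC plot can be obtained, the sketch below sweeps a decision threshold down through the fitted probabilities and records the false positive rate (1 - specificity) against the true positive rate (sensitivity) at each step. The function name `roc_points` is hypothetical, and this is one common construction rather than necessarily the one used by the calculator.

```python
def roc_points(y, scores):
    """Return (1 - specificity, sensitivity) pairs for an ROC curve.

    y      : list of 0/1 responses
    scores : fitted probabilities (or any score where higher means
             "more likely to be 1"), one per response
    """
    positives = sum(y)
    negatives = len(y) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Rank samples from highest to lowest score.
    ranked = sorted(zip(scores, y), key=lambda t: -t[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted 1
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points
```

The returned list always starts at (0, 0) and ends at (1, 1). A model whose scores carry no information about the response would produce points scattered around the reference line of gradient 1, while a useful model produces a curve bowed above it.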

Enter number of iterations:-


[Plot: Algorithm convergence (x-axis: Iteration number)]

[Plot: Receiver Operating Characteristic (x-axis: 1 - Specificity)]