Binary Image Classifier using Logistic Regression

One of the most common problems in video processing is image recognition.

Simply put, given an image, the task is to say whether it contains a cat, a person, a car, or just some random background.

This problem has been widely studied, and it is at the base of, and progresses in parallel with, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

In this short tutorial we will learn how to build a binary classifier able to say whether a given image contains a person or not.

Ex: Not a Person vs. Person

Logistic regression is probably the simplest binary classifier available.

The logical steps of this application are:

  1. Prepare the input data features: images usually come as arrays of shape (nx, ny, 3), where nx and ny are the horizontal and vertical sizes of the image and 3 is the number of color channels (R, G, B). We flatten each image into a column vector, v = image.reshape(nx*ny*3, 1), and stack these vectors as the columns of the input matrix X.
  2. Prepare the Y truth labels as a (1, no_examples) vector, containing 1 for “person” and 0 for “not a person”.
  3. Our model is then A = sigmoid(np.dot(W, X) + b), where the sigmoid squashes the linear scores W*X + b into probabilities.
  4. The cost function for logistic regression is cost = -(1/m)*np.sum(Y*np.log(A)+(1-Y)*np.log(1-A))
  5. We apply gradient descent to adjust W and b so as to minimize the cost function.
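The five steps above can be sketched as a minimal NumPy training loop (the function and variable names here are illustrative, not taken from the tutorial's linked code):

```python
import numpy as np

def sigmoid(z):
    # Logistic function: squashes scores into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, Y, lr=0.1, iterations=1000):
    # X: (n_features, m) matrix, one example per column
    # Y: (1, m) truth labels in {0, 1}
    n, m = X.shape
    W = np.zeros((1, n))
    b = 0.0
    for _ in range(iterations):
        A = sigmoid(W @ X + b)                        # forward pass (step 3)
        cost = -(1.0 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # step 4
        dW = (1.0 / m) * (A - Y) @ X.T                # gradient of the cost w.r.t. W
        db = (1.0 / m) * np.sum(A - Y)                # gradient w.r.t. b
        W -= lr * dW                                  # gradient descent step (step 5)
        b -= lr * db
    return W, b, cost
```

After training, predictions are obtained by thresholding the probabilities, e.g. `sigmoid(W @ X + b) > 0.5`.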

Finally, the results are:

For 200 training images and 50 test images of size (60, 60), after 2400 gradient descent iterations we obtain a training accuracy of 100% and a test accuracy of 70% – not bad for such a simple classifier.

For 20k training images and 1k test images of size (100, 60), after 30000 gradient descent iterations we obtain a training accuracy and a test accuracy of 80% – similar to a result obtained by a 5-layer deep neural network.

Why are the results so good? First, because we train a binary classifier – and logistic regression excels at this; second, because we have a very large number of features.

Here you can find the Python code for this tutorial.


A second, better option would be a Neural Network

We model and train, using TensorFlow, a simple NN with 3 layers – 25, 12 and 2 neurons.

We have 24000 input features (60 × 100 pixels × 4 channels) and 2 output classes.

The weight matrices will thus have the shapes:
W1: [25, 24000]    b1: [25, 1]
W2: [12, 25]       b2: [12, 1]
W3: [2, 12]        b3: [2, 1]
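As a sketch, a network with this architecture can be declared with TensorFlow's Keras API (the ReLU activations and the softmax output are assumptions – the tutorial does not show its exact model code). Note that Keras stores each Dense kernel transposed relative to the shapes above, i.e. [24000, 25] rather than [25, 24000]:

```python
import tensorflow as tf

def build_model(n_features=24000):
    # Three fully connected layers of 25, 12 and 2 neurons,
    # matching the weight shapes W1:[25, n_features], W2:[12, 25], W3:[2, 12]
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(25, activation="relu"),
        tf.keras.layers.Dense(12, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),  # 2 output classes
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training then reduces to a single call such as `model.fit(X_train, y_train, epochs=150)`, with `X_train` of shape (m, 24000).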

Running the training for this network for 150 epochs with the Adam optimiser and a learning rate of 1e-4 yields promising results:

Train Accuracy: 0.951913
Test Accuracy: 0.945743


