Binary Image Classifier using Logistic Regression

One of the most common problems in video processing is image recognition.

Simply put, given an image, we want to be able to say whether it contains a cat, a person, a car, or just some random background.

This problem has been widely studied; it is foundational to, and progresses in parallel with, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

In this short tutorial we will learn how to build a binary classifier that can tell whether a given image contains a person or not.

Ex: Not a Person vs Person

Logistic regression is probably the simplest binary classifier available.

The logical steps of this application are:

  1. Prepare the input data features: images usually come as arrays of shape (nx, ny, 3), where nx and ny are the horizontal and vertical sizes of the image and 3 is the number of color channels (R, G, B). We flatten each image into a column vector, v = img.reshape(nx * ny * 3, 1), and stack these vectors as the columns of the input matrix X.
  2. Prepare the Y truth labels as a (1, no_examples) vector, containing 1 for “person” and 0 for “not a person”.
  3. Our model is then y_pred = A = sigmoid(W*X + b), where the sigmoid activation squashes the linear output into the (0, 1) range.
  4. The cost function for logistic regression is cost = -(1/m)*np.sum(Y*np.log(A)+(1-Y)*np.log(1-A)), where m is the number of training examples.
  5. We apply gradient descent to adjust W and b so as to minimize the cost function.
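The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the tutorial's actual code: the function name `train_logreg`, the learning rate, and the random toy data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, Y, iterations=2400, lr=0.005):
    """Train logistic regression with batch gradient descent.

    X : (n_features, m) matrix, one flattened image per column.
    Y : (1, m) truth labels (1 = person, 0 = not a person).
    """
    n, m = X.shape
    W = np.zeros((1, n))
    b = 0.0
    for _ in range(iterations):
        A = sigmoid(W @ X + b)                  # predictions, shape (1, m)
        dZ = A - Y                              # gradient of the cost w.r.t. W*X + b
        W -= lr * (1 / m) * (dZ @ X.T)
        b -= lr * (1 / m) * np.sum(dZ)
    return W, b

# Toy usage with random data (real code would load flattened image vectors):
rng = np.random.default_rng(0)
X = rng.random((27, 10))                # e.g. 3x3 RGB images: 3*3*3 = 27 features, 10 examples
Y = (rng.random((1, 10)) > 0.5).astype(float)
W, b = train_logreg(X, Y)
```

The gradient step uses the standard identity that, for the logistic cost above, the gradient with respect to the linear output is simply A − Y.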

Finally, the results are:

For 200 training images and 50 test images of size (60, 60), after 2400 gradient descent iterations we obtain a training accuracy of 100% and a test accuracy of 70% – not bad for such a simple classifier.

For 20k training images and 1k test images of size (100, 60), after 30000 gradient descent iterations we obtain both a training and a test accuracy of 80% – similar to a result obtained with a 5-layer deep neural network.

Why are the results so good? First, because we train a binary classifier – and logistic regression excels at this; second, because we have a very large number of features.

Here you can find the Python code for this tutorial.

 

A second, better option would be a Neural Network

We model and train, using TensorFlow, a simple NN with 3 layers of 25, 12, and 2 neurons.

We have 24000 input features (60 px × 100 px × 4 channels) and 2 output classes.

The weight matrices will thus have the shapes:
W1 : [25, 24000]        b1 : [25, 1]
W2 : [12, 25]           b2 : [12, 1]
W3 : [2, 12]            b3 : [2, 1]
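A quick way to confirm that these shapes compose correctly is to run dummy data through the forward pass. The sketch below is only a shape check with zero-filled weights; the ReLU activations and the batch size of 5 are assumptions for illustration, not taken from the tutorial's TensorFlow code.

```python
import numpy as np

m = 5                                      # assumed batch of 5 examples
X = np.zeros((24000, m))                   # 60 * 100 * 4 = 24000 input features

W1, b1 = np.zeros((25, 24000)), np.zeros((25, 1))
W2, b2 = np.zeros((12, 25)),    np.zeros((12, 1))
W3, b3 = np.zeros((2, 12)),     np.zeros((2, 1))

A1 = np.maximum(0, W1 @ X + b1)            # layer 1 (ReLU assumed) -> (25, m)
A2 = np.maximum(0, W2 @ A1 + b2)           # layer 2 (ReLU assumed) -> (12, m)
Z3 = W3 @ A2 + b3                          # output logits          -> (2, m)

print(Z3.shape)                            # (2, 5)
```

Each weight matrix has shape [neurons_out, neurons_in], so the matrix products chain without any transposes.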

Running the training for this network for 150 epochs with the Adam optimizer and a learning rate of 1e-4 yields promising results:

Train Accuracy: 0.951913
Test Accuracy: 0.945743

