One of the most common problems in video processing is image recognition: simply put, given an image, to be able to say whether it contains a cat, a person, a car, or just some random background.
In this short tutorial we will learn how to build a binary classifier that can tell whether a given image contains a person or not.
Logistic regression is probably the simplest binary classifier available.
The logical steps of this application are:
- Prepare the input data features: images usually come as arrays of shape (nx, ny, 3), where nx and ny are the horizontal and vertical sizes of the image and 3 is the number of color channels (R, G, B). We flatten each image into a column vector v = image.reshape(nx*ny*3, 1) and stack these columns into the input matrix X, of shape (nx*ny*3, m) for m examples.
- Prepare the Y truth labels as a (1, m) vector containing 1 for "person" and 0 for "not a person".
- Our model is then A = sigmoid(np.dot(W, X) + b); the sigmoid squashes the linear score W·X + b into a probability between 0 and 1.
- The cost function for logistic regression is cost = -(1/m)*np.sum(Y*np.log(A) + (1-Y)*np.log(1-A))
- We apply gradient descent to adjust W and b so as to minimize the cost function.
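The steps above can be sketched in a few lines of numpy. This is a minimal illustration, not the tutorial's actual code: the toy one-feature dataset, learning rate, and iteration count are made up for demonstration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, Y, lr=0.1, iters=2000):
    """Batch gradient descent for logistic regression.
    X: (n_features, m) input matrix, Y: (1, m) labels in {0, 1}."""
    n, m = X.shape
    W = np.zeros((1, n))
    b = 0.0
    for _ in range(iters):
        A = sigmoid(W @ X + b)                 # forward pass: predicted probabilities
        dZ = A - Y                             # gradient of the cost w.r.t. the linear score
        dW = (1 / m) * dZ @ X.T
        db = (1 / m) * np.sum(dZ)
        W -= lr * dW                           # gradient descent update
        b -= lr * db
    return W, b

# Toy example: a single feature that separates the two classes
X = np.array([[0., 1., 2., 3., 10., 11., 12., 13.]])
Y = np.array([[0, 0, 0, 0, 1, 1, 1, 1]])
W, b = train_logreg(X, Y)
preds = (sigmoid(W @ X + b) > 0.5).astype(int)   # thresholded predictions
```

On real image data, X would hold the flattened (nx*ny*3, 1) column vectors described above, one per column.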
Finally, the results are:
For 200 train images and 50 test images of size (60, 60) we obtain, after 2400 gradient descent iterations, a training accuracy of 100% and a test accuracy of 70% – not bad for such a simple classifier.
For 20k train images and 1k test images of size (100, 60) we obtain, after 30000 gradient descent iterations, a training and test accuracy of 80% – similar to a result obtained by a 5-layer deep neural network.
Why are the results so good? First, because we train a binary classifier – and logistic regression excels at this; second, because we have a very large number of features.
Here you can find the Python code for this tutorial.
A second, better option would be a neural network.
We model and train, using TensorFlow, a simple NN with 3 layers of 25, 12 and 2 neurons.
We have 24000 input features (60 px * 100 px * 4 channels) and 2 output classes.
The weight matrices will thus have the shapes:
W1 : [25, 24000] , b1 : [25, 1]
W2 : [12, 25] , b2 : [12, 1]
W3 : [2, 12] , b3 : [2, 1]
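These shapes can be sanity-checked with a quick numpy forward pass. This is only a shape check with random weights – the ReLU hidden activations and the tiny batch size are assumptions for illustration, not the tutorial's TensorFlow model:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

rng = np.random.default_rng(0)
m = 5                                     # a small batch of 5 examples (arbitrary)
X = rng.standard_normal((24000, m))       # 60 * 100 * 4 = 24000 input features

# Weight matrices with the shapes listed above
W1, b1 = rng.standard_normal((25, 24000)) * 0.01, np.zeros((25, 1))
W2, b2 = rng.standard_normal((12, 25)) * 0.01, np.zeros((12, 1))
W3, b3 = rng.standard_normal((2, 12)) * 0.01, np.zeros((2, 1))

A1 = relu(W1 @ X + b1)                    # layer 1 output: (25, m)
A2 = relu(W2 @ A1 + b2)                   # layer 2 output: (12, m)
Z3 = W3 @ A2 + b3                         # logits: (2, m), one score per class
```

Each layer maps its input down to the next layer's width, ending with 2 logits per example – one per output class.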
Running the training for this network for 150 epochs with an Adam optimizer and a learning rate of 1e-4 yields promising results:
Train Accuracy: 0.951913
Test Accuracy: 0.945743