In today’s post, you will learn how to build an image classifier using computer vision in less than 10 minutes. This first time around it might take you a little longer, but after that, you will be able to build classifiers in under 10 minutes, I promise!
We will start simple and create an image classifier that recognizes dragons. Why dragons? Because dragons are cool, that’s why.
From TensorFlow’s webpage:
Our brains make vision seem easy. It doesn’t take any effort for humans to tell apart a lion and a jaguar, read a sign, or recognize a human’s face. But these are actually hard problems to solve with a computer: they only seem easy because our brains are incredibly good at understanding images.
It’s been found that convolutional neural networks are great for this job, proving to be as good as or better than humans in some domains. As it turns out, TensorFlow is several steps ahead and provides trained models that can be reused for free. The most advanced of these seems to be Inception-v3.
Inception-v3 is trained for the ImageNet Large Scale Visual Recognition Challenge using the data from 2012. This is a standard task in computer vision, where models try to classify entire images into 1000 classes, like “Zebra”, “Dalmatian”, and “Dishwasher”.
While that’s incredibly good in itself, there are far more than 1000 classes of things in the world, and the more precise you want to be, the more classes you are going to need.
For instance, say you wanted to classify all the insects in the world, or even those of a specific ecosystem; there are millions of insect species. The pre-trained model would certainly be able to classify them as “insects”, or at least as “animals”, but for this specific use case that’s just not good enough: you want to know exactly which insect is in the image.
That’s where the concept of transfer learning comes in handy. You can take Inception-v3 and adapt it to your needs. You no longer need to teach the model that a chair, or a ball, or a rhino isn’t an insect; it already knows that (this is only a half-truth, but let’s stick with it to keep things simple). All you need to do is teach it the specifics of your domain.
As I said before, the model we’re building will be able to classify dragons. Inception-v3 can already identify “Komodo dragon”, “dragon lizard”, and “dragonfly”, but those aren’t quite the dragons we want, so the first thing we need is a large dataset of dragon images. Luckily, Google Images is full of images of dragons.
Now, I said that after this tutorial you would be able to create image classifiers in less than 10 minutes, so to keep my promise, the last thing I want is to have you right-clicking and saving this bunch of images one by one. To speed things up, download Fatkun Batch Download Image, a Chrome extension that downloads all the images on a page.
After installing the extension, search for “dragons” on Google, then click the extension icon in the menu bar. It will automatically detect all the images on the page and queue them for download; all you need to do is deselect the ones you don’t want. In this case there are a few, like the Google logo, that we want to get rid of:
Save these images in a folder; we will use them soon enough.
Let’s step back a little. As I said, Inception-v3 is already trained to classify images into a thousand classes, so before we enhance it, let’s test it to see how good it actually is. This Python script is an example from TensorFlow of how to use Inception-v3. If you copy it and run it on your computer, you should already be able to classify images.
I tested the script above using Python 2.7 and identified these basic requirements:
The first time you run it, it will download the Inception-v3 model; from then on, it won’t need to download it again. Also, if you don’t specify an image to classify, it will default to a panda image, and sure enough, it will classify it as a panda:
Pretty good, huh? So let’s see how it behaves when I ask it to classify the dragon below. You can specify an image by adding the --image_file flag when running the script, e.g.: python imagenet_classify_image.py --image_file=dragon.jpg
And... nope. It has no idea what a dragon is, but it did think, with little confidence, that it could potentially be one of the animals above.
So let’s teach this thing what a dragon is. To do so, we can use another example from TensorFlow, available here. This script loads the pre-trained Inception-v3 model we used above, removes the old top layer, and trains a new one on the photos of dragons you downloaded.
To reiterate, we are dealing with a convolutional neural network, which by nature contains several layers, each responsible for something specific. In the case of Inception-v3, the first few layers learn edge detection, then shape detection, getting increasingly abstract toward the end; the last layers perform high-level detection of whole objects. So basically we just want to retrain the last layers on the features of dragons.
One “limitation” we will face is that to teach the model what a dragon is, we also need to tell it what isn’t a dragon. But instead of classifying things as “dragon” and “not a dragon”, the model we’re training will be able to recognize…
That’s it, right there. The model is going to be able to identify my two favorite things in the world. That means you will have to go through the same process as before and download several pictures of hot dogs.
Assuming you have already downloaded the example, create a folder called “images” in the same folder where you saved the script. Inside the images folder, create two more folders: dragon and hotdog. Then place the images you’ve downloaded in their respective folders.
To start training, or should I say, retraining, use this command:
python retrain.py --image_dir=images --output_graph=graph.pb --output_labels=labels.txt
This could take up to 15 minutes, depending on your computer’s speed. Since you have some time to kill, how about using it to learn something else? Conveniently enough, I’ve written these:
At this point, you should have a file called graph.pb and another called labels.txt. The first is your newly generated model, capable of identifying dragons and hot dogs; the second contains the labels for them.
Now it’s just a matter of using the model for predictions:
The code is simple enough: after importing the dependencies, up until line 6 we’re getting the image and the labels file ready to be used later on.
Then, from lines 8 to 11, we’re importing the model into the TensorFlow instance.
Lines 13 to 15 create a session and run the prediction. At this point, we already know whether we’re dealing with a dragon or a hotdog, and we can take action accordingly, be it calling the Targaryens or eating it, whichever is more appropriate.
Game of Thrones jokes aside, on line 17 we’re just ordering the results by relevance.
And afterward, we present them to the user in a readable manner.
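TensorFlow aside, the “order by relevance” and presentation steps are plain Python; here’s a minimal sketch of that tail end of the script (the `top_predictions` helper and the sample scores are mine, for illustration only):

```python
def top_predictions(scores, labels, k=5):
    """Pair each score with its label and sort descending by score."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# sample scores as the session might return them, in labels.txt order
for label, score in top_predictions([0.03, 0.97], ["hotdog", "dragon"]):
    print("%s (score = %.5f)" % (label, score))
# → dragon (score = 0.97000)
# → hotdog (score = 0.03000)
```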
To run this, download a picture of a dragon or of a hotdog (I used the dragon image from earlier in this post), then run:
python predict.py <jpg_image_here>
And sure enough:
This my friends, is just all kinds of confusing.