Tallan Blog

Tallan’s Experts Share Their Knowledge on Technology, Trends and Solutions to Business Challenges

Classify Your Wardrobe Using Azure Custom Vision

Artificial Intelligence (AI) has become a technology that is used in our daily lives, but understanding how machine learning works is a completely different story. Typically, machine learning models have been developed by experts in the AI field who have access to substantial computing power. Microsoft has lowered these barriers by providing Azure Cognitive Services. These services are available to anyone with an Azure cloud subscription and make it easy for developers to add AI features to their own applications. Today, I will be discussing Microsoft’s easy-to-use image classifier service, Custom Vision.

I first came across Custom Vision while developing an application to generate fashionable outfits. The idea was to upload an image of an outfit (found on Pinterest or a fashion blog) and then have the application find similar items at a handful of websites I typically shop at. Fashion is not an easy subject to teach an AI model, but an image classifier was the perfect solution to pick out key features of an outfit and use these features to find similar products.

So how does Custom Vision work? Well, the best part about Custom Vision is that you don’t need to know. Custom Vision provides a simple UI and API for creating, training, and testing a model and applying it to an application.

Let’s learn just how easy it is to build a model. The process can be broken down into four steps:

  1. Collect training images
  2. Upload and tag images
  3. Train the model
  4. Analyze and test the model

For the purposes of this post, I will create a shoe classifier that classifies heels, sneakers, and boots. First things first, you need to set up your project. You can refer to Microsoft’s documentation to do so. After your project has been set up, you can start collecting training images.

1. Collect Training Images

To train your model, you need to provide it with images. The images you train your model with are essential to its success. Two common issues with image classifiers are underfitting and overfitting. Underfitting is when the model has not learned enough and performs poorly on both the training and testing images. Overfitting is the opposite: the model has learned the training images so well that it “memorizes” aspects of them that are not actually important. For example, if every photo of sneakers has green grass in the background, the model may associate green grass with the tag “sneakers.” When the model is tested with new images, it may tag every image with grass in the background as “sneakers,” so it performs well on the training images but poorly on the testing images. Here are some key points to keep in mind to avoid underfitting and overfitting:

The more the better!

Generally, the more data provided to the model, the better. Microsoft encourages users to upload at least 50 images per tag to ensure the model has a strong understanding of that tag.

Balanced dataset

When training the model on specific tags, it is important to upload a balanced number of images per tag. If I uploaded 800 pictures of heels and 100 pictures of boots, I would not expect the model to have a strong understanding of both tags. A more balanced dataset will ensure the model can accurately predict every tag.

Diverse set of images

It is essential that the model is trained with images that vary in angle, lighting, background, etc. This will make the model much more accurate so it can determine which parts of the image are most important and which parts are just background or noise.
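The first two points are easy to check before uploading anything. Here is a minimal sketch in plain Python, not part of Custom Vision itself. The 50-image minimum comes from the guidance above, while the imbalance ratio of 2.0 and the function name `check_dataset` are my own assumptions for illustration.

```python
from collections import Counter

MIN_IMAGES_PER_TAG = 50     # minimum suggested above
MAX_IMBALANCE_RATIO = 2.0   # hypothetical tolerance, not a Custom Vision rule


def check_dataset(tags_per_image):
    """Return warnings for a list of tag names, one entry per training image."""
    counts = Counter(tags_per_image)
    warnings = []
    for tag, count in sorted(counts.items()):
        if count < MIN_IMAGES_PER_TAG:
            warnings.append(f"{tag}: only {count} images, fewer than {MIN_IMAGES_PER_TAG}")
    if counts:
        largest, smallest = max(counts.values()), min(counts.values())
        if largest / smallest > MAX_IMBALANCE_RATIO:
            warnings.append(f"imbalanced: largest tag has {largest} images, smallest has {smallest}")
    return warnings


balanced = ["heels"] * 50 + ["sneakers"] * 50 + ["boots"] * 50
skewed = ["heels"] * 800 + ["boots"] * 100
print(check_dataset(balanced))
print(check_dataset(skewed))
```

The balanced set produces no warnings, while the 800-versus-100 set from the example above is flagged as imbalanced. Diversity of angle, lighting, and background is harder to check automatically and still needs a human eye.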

2. Upload and Tag Images

Now it is time to upload the training images to the model and tag them. Assuming you have already set up a new project in Custom Vision, your screen will look like this:

Figure 1. Once you create a new project in Custom Vision your screen will look like this.

Tags are what the model will use to identify the image. For each image, the model will determine how likely it is that each tag applies. You can add as many tags as you want, but keep in mind that each tag should be sufficiently represented in your training images.

You have the option to add tags and then upload images, or you can upload images and tag them individually. For this demo I am going to create all the tags first. This will make it easy for me to upload images in bulk for each tag. On the left side of the screen is the option to add tags.

Clicking the “+” button opens a window for adding new tags. I added heels, sneakers, and boots as the three tags for my model.

Figure 2. This window will appear when you click to add a new tag. Add the tag name here and click Save.

Next, upload the training images. By clicking “Add Images” I can upload pictures in bulk and assign them all one or more tags. Below you can see I added 50 pictures of heels and tagged them all as “Heels.” I repeated this process with 50 images of boots and 50 images of sneakers.

Figure 3. Your screen will look like this after you have chosen the images you want to upload. You can add tags to all the images at once and then click “Upload.”

3. Train the Model

After the images are uploaded, it is time to train the model. Just click “Train” at the top of the window and Custom Vision does all the work for you.

Figure 4. This is the menu bar at the top of your screen. When all the images are uploaded and tagged click “Train.”

4. Analyze and Test the Model

When training is complete, Custom Vision provides three metrics regarding the performance of your model: precision, recall, and AP. It is important to understand what each of these numbers means:

Figure 5. When training is complete you will see these three metrics. Precision, recall, and AP all describe how well your model performed during training.

  • Precision is the percent of identified classifications that were correct. If my model identified 100 images to be heels and only 90 of those images were actually heels, it would have a precision of 90%.
  • Recall is the percent of actual classifications that were correctly identified. In other words, if my model correctly identified 80 images as heels but there were actually 100 total images of heels in the training data, it would have a recall of 80%.
  • AP (average precision) is an overall measure of the model’s performance. It summarizes precision and recall across different probability thresholds.
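To make the first two definitions concrete, here is a small sketch in plain Python that reproduces the two examples above. Custom Vision calculates these numbers for you; the function names here are my own and only illustrate the arithmetic.

```python
def precision(true_positives, false_positives):
    """Share of the images the model tagged that really belong to the tag."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives, false_negatives):
    """Share of the images that really belong to the tag and were found by the model."""
    return true_positives / (true_positives + false_negatives)


# Precision example: 100 images tagged "heels", 90 of them really are heels.
print(f"precision = {precision(90, 10):.0%}")

# Recall example: 100 real images of heels, the model found 80 of them.
print(f"recall = {recall(80, 20):.0%}")
```

The two metrics share the same numerator (correctly tagged images) and differ only in what they divide by: everything the model tagged versus everything it should have tagged.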

What is the probability threshold? You can find it on the left side of your screen, and it should automatically be set to 50%.

Figure 6. In the performance tab on the left side of your screen is the probability threshold. By default, it is set to 50%.

This means that when the model is at least 50% sure that a tag applies to an image, it will classify the image with that tag. As you increase the probability threshold, the model only applies a tag when it is more confident, so precision tends to rise. Recall tends to fall, because the model tags only the photos it is most confident in and many images go undetected. Conversely, decreasing the probability threshold tends to lower precision but raise recall, as the model tags more photos.
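The trade-off is easier to see with numbers. The sketch below uses a handful of made-up prediction scores for the “heels” tag (they are hypothetical, not results from my model) and recomputes precision and recall at three thresholds.

```python
# Hypothetical pairs: (probability that the image is "heels", is it really heels?)
PREDICTIONS = [
    (0.99, True), (0.95, True), (0.80, True), (0.65, False),
    (0.55, True), (0.45, True), (0.30, False), (0.10, False),
]


def precision_recall(predictions, threshold):
    """Tag an image as heels when its probability meets the threshold."""
    tagged = [is_heels for prob, is_heels in predictions if prob >= threshold]
    true_positives = sum(tagged)
    actual_positives = sum(is_heels for _, is_heels in predictions)
    # With nothing tagged there are no wrong tags, so treat precision as 1.0.
    precision = true_positives / len(tagged) if tagged else 1.0
    recall = true_positives / actual_positives
    return precision, recall


for threshold in (0.3, 0.5, 0.9):
    p, r = precision_recall(PREDICTIONS, threshold)
    print(f"threshold={threshold:.0%} precision={p:.2f} recall={r:.2f}")
```

With these made-up scores, raising the threshold from 30% to 90% lifts precision from about 0.71 to 1.00 while recall drops from 1.00 to 0.40. Real data will not always move this cleanly, but the direction is usually the same.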

Finally, it is time for the fun part: testing the trained model. At the top of the screen, click “Quick Test.”

Figure 7. This is the menu bar at the top of your screen. After your model is trained you can click “Quick Test” to see how it performs on a new image.

This will direct you to a screen where you can upload images to test your model. To get accurate results, you must use images the model has never seen before, so do not use any image from the training data.
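One simple way to guarantee this is to set some images aside before you upload anything. The sketch below is plain Python and independent of Custom Vision. The file names, the 20% hold-out fraction, and the function name `split_images` are assumptions for illustration.

```python
import random


def split_images(image_paths, test_fraction=0.2, seed=42):
    """Shuffle the images and set aside a fraction that is never used for training."""
    shuffled = list(image_paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    test_count = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_count:], shuffled[:test_count]


# Hypothetical file names for one tag.
heels = [f"heels_{i:03d}.jpg" for i in range(60)]
train_images, test_images = split_images(heels)
print(len(train_images), "for training,", len(test_images), "held out for Quick Test")
```

Upload only the training list to the project and keep the held-out list for testing, so no test image can leak into training.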

Figure 8. Your screen will look like this after you click “Quick Test”. Here you can test your model with an image using a URL or an image saved to your device.

First, I tested one image each of heels, sneakers, and boots.

Heels test:

Figure 9. Test results for heels. The model has determined it is 99.7% sure the picture is heels.

Sneakers test:

Figure 10. Test results for sneakers. The model has determined it is 99.9% sure the picture is sneakers.

Boots test:

Figure 11. Test results for boots. The model has determined it is 99.9% sure the picture is boots.

The Custom Vision model has identified each style of shoe very accurately. It can clearly see the difference between a pair of Louis Vuitton stilettos, New Balance running sneakers, and Steve Madden boots. What would happen if we were to present it with a shoe that looked like a combination of both boots and heels?

Boots with heels test:

Figure 12. Test results for boots with a heel. The model has determined it is 52.1% sure the picture is boots and 47.5% sure the picture is heels.

This test is especially impressive because the model identified characteristics of both heels and boots in the image, even though we never trained it with a pair of shoes like this.
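If you use a result like this in an application, the probability threshold decides which tags you actually act on. Here is a minimal sketch in plain Python using the two probabilities reported for the heeled boot. The function name `tags_above_threshold` and the alternative 40% threshold are my own assumptions.

```python
def tags_above_threshold(probabilities, threshold=0.5):
    """Return the tags whose probability meets the threshold, most likely first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [tag for tag, prob in ranked if prob >= threshold]


# The two probabilities reported for the heeled boot in the test above.
heeled_boot = {"boots": 0.521, "heels": 0.475}

print(tags_above_threshold(heeled_boot))       # ['boots']
print(tags_above_threshold(heeled_boot, 0.4))  # ['boots', 'heels']
```

At the default 50% threshold only “boots” is applied, while a slightly lower threshold keeps both tags, which is arguably the better description of this particular shoe.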

In just four simple steps I was able to create an accurate image classifier for different shoe styles using Azure’s Custom Vision service. A task that would have traditionally taken many hours of programming and a large amount of data was achieved using only 50 images for each style and one iteration of training. So, whether you’re classifying images to help you with your next online purchase or you have a new application in mind that would benefit from an image classifier, Custom Vision can help you reach your goal.
