Transfer Learning Tutorial

Last Updated: April 2019

Code for this project is available here.

What is Transfer Learning?

Transfer learning is the practice of taking a pretrained model and applying it to a new dataset or problem. In this tutorial, we are going to use a model that was trained on the ImageNet dataset, which contains more than 1.2 million images across 1,000 categories. Many models have been built to compete in the ImageNet competition and are now available online, trained extensively and optimized for that task. Our goal is to take one of these pretrained models and use it on a different dataset (specifically the STL-10 dataset).

Sample images from the STL-10 dataset

How to implement Transfer Learning?

One obvious problem with transfer learning is that the model has learned to accept input of a specific shape and produce output of another shape. For example, most ImageNet models expect input images resized and cropped to a fixed size (typically 224x224 pixels) and output a 1,000-way softmax vector containing a prediction for each of the ImageNet classes. The images in the STL-10 dataset, however, are 96x96 pixels, and instead of 1,000 classes we are only using 10. So clearly we can't use this model as-is to make predictions on STL-10. Instead, we remove the dense component of the model, as seen below.

Model with dense layers

Model with dense layers removed

Then we can add a new dense neural network classifier that takes the output of the convolutional component as input. The beauty of convolutional neural networks is that they are not dependent on the input shape: they can adapt to inputs of different sizes, albeit with a different resulting output shape. Therefore, even though the images we are using are much smaller, we can reuse the same trained weights. The dense layers that we add to the convolutional base simply form a final mapping from the outputs of the convolution to the image labels.

Using the STL-10 Dataset

The homepage for the STL-10 dataset can be accessed here. For Python implementations, the site recommends using this pre-made helper class to access the data. One slight change needs to be made to this code, however: add a directory argument to the save_images method, as sketched below.
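A minimal sketch of that change, assuming the helper hard-codes its output folder and delegates to a per-image save_image writer (the exact body may differ in the version you download):

```python
import os

# Modified save_images: the new `directory` argument lets each split
# (train/validation/test) be written to its own folder instead of a
# hard-coded path.
def save_images(images, labels, directory):
    print("Saving images to disk")
    for i, image in enumerate(images):
        # One subfolder per class label, e.g. img/train/1/
        label_dir = os.path.join(directory, str(labels[i]))
        os.makedirs(label_dir, exist_ok=True)
        # save_image is the module's existing per-image writer
        save_image(image, os.path.join(label_dir, str(i)))
```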

This allows us to create separate folders for our train, validation, and test sets. Lastly, make sure your working directory contains an __init__.py file so that the helper methods can be imported from our main script.

Preparing the Data

Now that we have the stl10_input module ready to use, we can download and prepare our data. Once the data is downloaded, we read the images and labels, split them into training, validation, and testing sets, and save each set as PNG files.
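One way this might look; the function and constant names (download_and_extract, read_all_images, DATA_PATH, LABEL_PATH) come from the recommended helper module and may differ in your copy, and the split ratios are assumptions:

```python
from sklearn.model_selection import train_test_split
import stl10_input as stl

# Download the STL-10 binaries if they are not already present
stl.download_and_extract()

# Load the labeled images and their labels
images = stl.read_all_images(stl.DATA_PATH)
labels = stl.read_labels(stl.LABEL_PATH)

# Hold out 20% for testing, then 20% of the remainder for validation
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.2, random_state=42)

# Write each split to its own folder using the modified helper
stl.save_images(X_train, y_train, 'img/train')
stl.save_images(X_val, y_val, 'img/validation')
stl.save_images(X_test, y_test, 'img/test')
```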

Because we have a large amount of image data, it may be unrealistic to hold all of it in RAM. Therefore we will use the Keras ImageDataGenerator to feed data into our model dynamically. To do this we create several generator objects:
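A sketch of the generators, assuming the folder layout created above (the batch size is an assumption; the target size matches STL-10's 96x96 images):

```python
from keras.preprocessing.image import ImageDataGenerator

# Rescale pixel values from [0, 255] to [0, 1]
datagen = ImageDataGenerator(rescale=1. / 255)

# One generator per split, reading images from the folders created earlier
train_generator = datagen.flow_from_directory(
    'img/train', target_size=(96, 96), batch_size=32,
    class_mode='categorical')
val_generator = datagen.flow_from_directory(
    'img/validation', target_size=(96, 96), batch_size=32,
    class_mode='categorical')
# shuffle=False keeps test predictions aligned with their files
test_generator = datagen.flow_from_directory(
    'img/test', target_size=(96, 96), batch_size=32,
    class_mode='categorical', shuffle=False)
```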

Creating the Model

For this project we are going to use the DenseNet121 model with weights pretrained on the ImageNet dataset. If you would like to use a different model, feel free to do so (it should be easy to switch to another model provided by the Keras API). To import this model we use:
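```python
from keras.applications.densenet import DenseNet121

# Load DenseNet121 with ImageNet weights and without the dense
# classification head; input_shape matches STL-10's 96x96 RGB images
base_model = DenseNet121(weights='imagenet', include_top=False,
                         input_shape=(96, 96, 3))
```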

where include_top=False indicates that we want only the convolutional component of the model. Next we need to add our Dense layers to the base model.
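A sketch of one possible head; the pooling choice and the 256-unit hidden layer are assumptions, while the 10-way softmax output is dictated by STL-10:

```python
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D

# Freeze the convolutional base so only the new layers are updated
for layer in base_model.layers:
    layer.trainable = False

# Pool the convolutional feature maps, then map them to the 10 classes
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(256, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)
```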

One notable part of this code is that it sets the `trainable` attribute of the base layers to False (shown in red below). This means that when we train the model later, it will only update the weights of the dense network (shown in green) and exclude the convolutional component from training. This is useful because it makes training run much faster, and the pretrained weights suffice for our purposes. If you were attempting to get very high accuracy with this model, you could unfreeze the convolutional layers after the dense layers have been trained for a while in order to fine-tune the model. However, this is far more computationally expensive and therefore won't be covered in this tutorial.

Only the dense layers will be updated during training

Training

To train this model, the Adam optimizer will be used to minimize the loss function. Keras makes this very easy for us by providing an implementation. In addition, we will use the model.fit_generator() function to train our model:
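A sketch of the training step; the learning rate and epoch count are assumptions:

```python
from keras.optimizers import Adam

# Categorical cross-entropy matches the 10-way softmax output
model.compile(optimizer=Adam(lr=0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the dense head from the generators defined earlier
model.fit_generator(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    validation_data=val_generator,
    validation_steps=val_generator.samples // val_generator.batch_size,
    epochs=10)
```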

Making Predictions

To test our model, we make predictions on the test set. This data has never been seen by our model, so the predictions it makes are representative of its true performance. The class labels can be retrieved from their file and mapped to the prediction outputs. It is important to note that the ImageDataGenerator assigns its own integer indices to the images, which may not correspond to the label names in the directory structure. We must take this into account when creating the mapping.
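One way to build that mapping: flow_from_directory records its folder-name-to-index assignments in class_indices, which we can invert:

```python
# The generator assigns indices alphabetically by folder name, which may
# not match the original STL-10 label order; inverting class_indices maps
# a predicted index back to its folder (label) name
index_to_label = {v: k for k, v in test_generator.class_indices.items()}
```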

Then, to make predictions, we call model.predict() on a batch of test data:
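A minimal sketch for a single batch, reusing the test generator from earlier (loop over all batches to score the full test set):

```python
import numpy as np

# Take one batch from the (unshuffled) test generator and predict on it
x_batch, y_batch = next(test_generator)
predictions = model.predict(x_batch)

# Convert softmax outputs and one-hot labels back to class indices
predicted_classes = np.argmax(predictions, axis=1)
true_classes = np.argmax(y_batch, axis=1)
print('Accuracy:', np.mean(predicted_classes == true_classes))
```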

Output: Accuracy: 0.877

To view some of the results we can use matplotlib and our label mapping to display our predictions:
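A sketch of the display step, reusing x_batch, predicted_classes, and index_to_label from above (the 3x3 grid is an assumption):

```python
import matplotlib.pyplot as plt

# Show the first nine test images with their predicted labels
fig, axes = plt.subplots(3, 3, figsize=(8, 8))
for ax, image, pred in zip(axes.flat, x_batch, predicted_classes):
    ax.imshow(image)
    ax.set_title(index_to_label[pred])
    ax.axis('off')
plt.show()
```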

Predicted Labels

As you can see, the model performs fairly well after a minimal amount of training. This is the magic of transfer learning: the ability to have a model produce reasonable predictions on image data without the expense of training a massive convolutional model from scratch.


To try this model out yourself or to see the full solution, please visit the repository here.