Step-by-Step Approach and Tips for Building and Fine-Tuning Models using PyTorch
In this article, I will share my practical experience and insights gained from working on a computer vision project utilizing deep transfer learning. Through a step-by-step approach and useful tips, I aim to provide a comprehensive guide for building and fine-tuning a transfer learning model. From establishing a strong baseline to implementing advanced techniques such as blending and test-time augmentation, this article will cover all the key aspects of deep transfer learning in computer vision using PyTorch. Let’s dive in!
“Why re-invent the wheel when you can transfer learn from a model that’s already done the hard work?”
Flexing our AI Muscles: KaggleDays Yoga Pose Competition
I will use an example from the Kaggle competition “Don’t stop until you drop,” hosted by KaggleDays as part of the KaggleDays x Z by HP Championship 2022. The competition asked participants to use computer vision techniques to classify images of yoga poses into 6 classes, with the goal of achieving a high mean F1 score through precise classification. The dataset provided contained images of different yoga poses along with the class each belongs to. Correctly identifying yoga poses can be challenging due to the complex nature of some postures, and the competition aimed to find solutions to exactly this problem.
“Transfer learning for Yoga pose: Because who has time to train a model from scratch when you could be perfecting your downward dog?”
Mastering Transfer Learning for Image Classification
Transfer learning is a powerful technique in deep learning that allows a model to utilize knowledge learned from one task and apply it to a different but related task. This can be especially useful in computer vision, where collecting and annotating large amounts of data can be time-consuming and expensive. In this article, we will explore practical tips for using transfer learning in computer vision, specifically focusing on image classification.
One of the first things to consider when using transfer learning is the dataset. It is important to have a large and diverse dataset to train on. One way to achieve this is to use a pre-existing dataset, such as the KaggleDays championship competition dataset discussed earlier, an image classification task evaluated with the F1 metric. The yoga pose dataset had everything I needed to build and test a baseline model.
“Because sometimes the only way to achieve enlightenment in deep learning is to learn from the masters.”
Setting the Foundation: Baseline Model Building for DTL
The first step in deep transfer learning (DTL) is to establish a good baseline model. This is important because it allows us to iterate and experiment quickly. When establishing a baseline, choose an appropriate image size, backbone, batch size, learning rate, and number of epochs. Good starting points are an image size of 224 x 224, an EfficientNetV2-B0 backbone, a batch size of 32, a learning rate of 0.001, and 5 epochs. Why these values? Because they have worked well empirically over years of experiments as an initial baseline, and you can tune them further for the problem at hand.
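To make this concrete, here is a minimal PyTorch sketch of such a baseline, assuming the `timm` library for the pretrained backbone and your own `DataLoader`; the model name and training loop are illustrative, not a prescribed solution:

```python
import timm
import torch
from torch import nn, optim

# Empirical starting points discussed above (assumed values, tune as needed).
IMAGE_SIZE = 224
BATCH_SIZE = 32
LEARNING_RATE = 1e-3
EPOCHS = 5
NUM_CLASSES = 6  # six yoga pose classes in this competition

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load an ImageNet-pretrained EfficientNetV2-B0 and swap in a 6-class head.
model = timm.create_model(
    "tf_efficientnetv2_b0", pretrained=True, num_classes=NUM_CLASSES
).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:  # loader yields (B, 3, 224, 224) batches
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```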
“Transfer learning is like baking a cake, you need a solid foundation (baseline model) before you can add your own unique ingredients (tuning, augmentations, etc.) to make it perfect.”
Tuning Learning Rate and Epochs
Once you have established a good baseline model, the next step is to tune the learning rate and the number of epochs. This is the most important step in deep transfer learning, as it has a significant impact on the performance of the model. The learning rate and the number of epochs should be chosen based on the backbone and data. A good starting range for the learning rate is between 0.0001 and 0.001, and a good starting range for the number of epochs is between 2 and 10.
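A simple way to explore those ranges is a small grid search over validation F1. The sketch below assumes hypothetical `build_model`, `train`, and `evaluate_f1` helpers standing in for your own training code:

```python
from itertools import product

learning_rates = [1e-4, 3e-4, 1e-3]  # within the 0.0001 to 0.001 range above
epoch_options = [2, 5, 10]           # within the 2 to 10 range above

best_f1, best_config = 0.0, None
for lr, n_epochs in product(learning_rates, epoch_options):
    model = build_model()                               # fresh pretrained model
    train(model, train_loader, lr=lr, epochs=n_epochs)  # your training loop
    f1 = evaluate_f1(model, valid_loader)               # mean F1 on validation
    if f1 > best_f1:
        best_f1, best_config = f1, (lr, n_epochs)

print(f"Best F1 {best_f1:.4f} at lr={best_config[0]}, epochs={best_config[1]}")
```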
“In the world of deep learning, transfer learning is like having a cheat code, it helps you achieve the high scores without putting in all the extra work”
Improving Performance with Image Augmentations
After tuning the learning rate and the number of epochs, the next step is to augment the training images. Augmentations randomly alter the training images, which helps to improve the generalization of the model. Common augmentations include horizontal and vertical flipping, resizing, rotating, shifting, Cutout, CutMix, and MixUp. Tip: It is also important to increase the number of epochs when using augmentations to ensure that the model has enough time to learn from the augmented data.
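As an illustration, a torchvision pipeline covering several of these augmentations might look like the following; the exact transforms and probabilities are assumptions to tune per dataset (CutMix and MixUp operate on whole batches inside the training loop and are not shown):

```python
from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),       # resize + crop
    transforms.RandomHorizontalFlip(p=0.5),                    # flipping
    transforms.RandomRotation(degrees=15),                     # rotating
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # shifting
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],           # ImageNet stats
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.25),                          # Cutout-style
])
```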
“You can’t transform your model’s performance without a little bit of image manipulation… just like a good Instagram filter can transform a selfie!”
Adjusting Model and Input Complexity
The next step is to adjust the model and input complexity. This can be done by increasing or decreasing the complexity of the model, or by changing to a different backbone within the same family. For example, you can try a larger EfficientNet variant, such as EfficientNetV2-B2 or B3, or a different architecture altogether, such as ResNeXt or NFNet. This step is important because it allows you to find the optimal model for your specific task and data. It is also worth re-iterating the previous tuning steps, as a more complex model may need more augmentation and regularization to perform well.
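With a library such as `timm`, swapping backbones is a one-line change, which makes this kind of complexity sweep cheap to run. The model names below are examples; `timm.list_models()` shows what your installed version actually provides:

```python
import timm

candidates = [
    "tf_efficientnetv2_b3",  # larger member of the same family
    "resnext50_32x4d",       # ResNeXt: a different architecture family
    "dm_nfnet_f0",           # NFNet family
]
for name in candidates:
    model = timm.create_model(name, pretrained=True, num_classes=6)
    n_params = sum(p.numel() for p in model.parameters()) / 1e6
    print(f"{name}: {n_params:.1f}M parameters")
```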
“Simplifying your model is like taking off your favorite pair of skinny jeans, it’s tight but you know it’s worth it for that performance boost!”
Fine-Tuning the Model for Optimal Performance
After adjusting the model and input complexity, the next step is to further tune the model. This can be done by increasing the image size, trying different backbones, or experimenting with different architectures. For example, you can try a larger image size, such as 512 x 512, or a different architecture family, such as ViT (Vision Transformer). This step is important because it allows you to find the best-performing model for your specific task and data.
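One caveat worth a sketch: many ViT checkpoints are tied to a fixed input resolution, so the image size and the model must be chosen together. The model name below is one example from `timm`, not a prescription:

```python
import timm
from torchvision import transforms

IMAGE_SIZE = 384  # this checkpoint expects 384 x 384 inputs
model = timm.create_model("vit_base_patch16_384", pretrained=True, num_classes=6)

eval_transforms = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.ToTensor(),
    # Original ViT checkpoints in timm use 0.5/0.5 normalization.
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])
```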
“Fine-tuning the model is like giving your car a tune-up, it may not be broken but it will run better”
Retrain and Blend for Optimal Model Performance
The final step is to retrain the best models on the full training data and to blend the models. This is important because the more data the model is trained on, the better it will perform. Blending is a technique that involves combining multiple models, which can help to improve the performance of the model. It is important to use different seeds for the same settings and to use a diversity of models, such as different backbones, augmentations, epochs, and image sizes.
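A minimal sketch of blending by averaging softmax probabilities is shown below; `models` is assumed to hold several trained networks (different seeds, backbones, or image sizes), all retrained on the full training data:

```python
import torch

@torch.no_grad()
def blend_predict(models, images):
    """Average class probabilities across models, then pick the top class."""
    probs = torch.stack(
        [torch.softmax(model(images), dim=1) for model in models]
    )
    return probs.mean(dim=0).argmax(dim=1)
```

Averaging probabilities rather than hard labels tends to be more robust, since it preserves each model’s confidence instead of discarding it at the voting stage.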
“Retraining your model is like hitting the gym, sometimes you need to switch up your routine to see results.”
Inference Tricks for Model Performance Improvement
In addition to these steps, there are several inference tricks that can be used to improve the performance of the model. One such trick is test-time augmentation (TTA), which applies augmentations to the test data and averages the resulting predictions. Another is to increase the image size during inference. Finally, post-processing and second-stage models can be used to squeeze out further gains.
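Here is a minimal TTA sketch that averages predictions over an image and its horizontal flip; more aggressive TTA (crops, scales, rotations) follows the same pattern of augment, predict, and average:

```python
import torch

@torch.no_grad()
def predict_with_tta(model, images):
    model.eval()
    probs = torch.softmax(model(images), dim=1)
    flipped = torch.flip(images, dims=[3])         # flip along the width axis
    probs += torch.softmax(model(flipped), dim=1)
    return probs / 2                               # averaged probabilities
```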
“Blending models is like mixing a smoothie, a little bit of this, a little bit of that, and voila! Delicious performance”
Conclusion: Tips for Implementing Deep Transfer Learning
In conclusion, deep transfer learning is a powerful technique for improving the performance of computer vision models. By following the tips and tricks outlined in this article, you can quickly and effectively implement deep transfer learning in your projects. Remember to establish a good baseline model, tune the learning rate and the number of epochs, use augmentations, and retrain and blend models. Additionally, use inference tricks such as TTA, increasing the image size during inference, and post-processing and second-stage models. With the right approach, deep transfer learning can help you achieve state-of-the-art results in your computer vision projects.
In this series, I will soon be releasing a detailed article with coding tips and tricks to build a strong baseline for deep transfer learning and optimize it for computer vision tasks, along with strategies for applying DTL across different domains where it is most effective.