💻🧠 002. Transfer Learning in Convolutional Models
TL;DR
- Step-by-step guide to fine-tune pre-trained MobileNetV2, EfficientNetB0, and ResNet50 on a mask detection dataset.
- Preparation of data pipelines with Keras ImageDataGenerator for training and validation splits.
- Custom F1-score metric, comparative analysis, and real-time webcam demo.
Have you ever wondered how to train a computer vision model to detect whether someone is wearing a mask? In this article, you’ll discover, step by step and without complication, how to leverage pre-trained architectures — MobileNetV2, EfficientNetB0, and ResNet50 — to tackle this challenge using Keras. By the end, you’ll know not only which decisions to make, but also why to choose one model over another based on your data.
We start by setting up the environment and preparing the data: downloading a dataset of mask and non-mask images, organizing folders, and using Keras’s ImageDataGenerator
to normalize and split your images (80% training, 20% validation). We also resize all images to 128×128 pixels, so the model “sees” everything equally and won’t get confused by different resolutions.
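To make that concrete, here is a minimal sketch of such a pipeline; the `dataset/` folder name and the batch size are placeholders for illustration, not the article's exact setup:

```python
# Minimal data pipeline sketch: normalize pixels, resize to 128x128,
# and split 80/20 into training and validation subsets.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values to [0, 1]
    validation_split=0.2,     # 80% training, 20% validation
)

train_gen = datagen.flow_from_directory(
    "dataset/",               # assumed layout: one subfolder per class (mask / no_mask)
    target_size=(128, 128),   # resize every image to 128x128
    batch_size=32,
    class_mode="binary",
    subset="training",
)

val_gen = datagen.flow_from_directory(
    "dataset/",
    target_size=(128, 128),
    batch_size=32,
    class_mode="binary",
    subset="validation",
)
```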
Next, we introduce three pre-trained Keras models: MobileNetV2 (lightweight and fast), EfficientNetB0 (great balance of accuracy and efficiency), and ResNet50 (very deep with residual connections). With a simple createModel
function, we add global pooling, dropout, and a sigmoid output to convert them into binary classifiers. Initially, we freeze their base layers, then unfreeze them for full fine-tuning to extract more from their capabilities.
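A function along these lines captures the idea; the dropout rate, argument names, and the MobileNetV2 example at the end are assumptions for illustration, not the original createModel code:

```python
# Sketch of a createModel-style helper: take a pre-trained base and add
# global pooling, dropout, and a sigmoid output for binary classification.
import tensorflow as tf
from tensorflow.keras import layers, models

def createModel(base_model, trainable=False):
    base_model.trainable = trainable           # freeze the pre-trained base at first
    model = models.Sequential([
        base_model,
        layers.GlobalAveragePooling2D(),       # pool spatial dimensions into one vector
        layers.Dropout(0.3),                   # assumed dropout rate for regularization
        layers.Dense(1, activation="sigmoid")  # binary output: mask / no mask
    ])
    return model

# Example usage: wrap MobileNetV2 pre-trained on ImageNet, without its top classifier
base = tf.keras.applications.MobileNetV2(
    input_shape=(128, 128, 3), include_top=False, weights="imagenet"
)
model = createModel(base)
```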
To compare, we implement a custom F1-score metric — essential when classes are imbalanced — and train each model for 10 epochs. You’ll see charts of loss, accuracy, and F1-score for both training and validation. While MobileNetV2 and ResNet50 reach around 70% accuracy, their F1-scores are very low. In contrast, EfficientNetB0 shines with nearly 77% accuracy and a much higher F1-score, achieving a better balance between precision and recall.
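For reference, a batch-level F1-score metric can be written with Keras backend ops along these lines, continuing the sketch above; the article's exact implementation may differ:

```python
# Batch-level F1-score built from precision and recall, with epsilon
# added to the denominators to avoid division by zero.
import tensorflow.keras.backend as K

def f1_score(y_true, y_pred):
    y_pred = K.round(y_pred)                              # threshold sigmoid output at 0.5
    tp = K.sum(K.cast(y_true * y_pred, "float32"))        # true positives
    fp = K.sum(K.cast((1 - y_true) * y_pred, "float32"))  # false positives
    fn = K.sum(K.cast(y_true * (1 - y_pred), "float32"))  # false negatives
    precision = tp / (tp + fp + K.epsilon())
    recall = tp / (tp + fn + K.epsilon())
    return 2 * precision * recall / (precision + recall + K.epsilon())

# Track accuracy and F1 side by side during the 10-epoch runs
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy", f1_score])
```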
With EfficientNetB0 as our favorite, we perform fine-tuning by first unfreezing 2 layers and then 7, allowing you to evaluate the impact on generalization. Unfreezing a few layers improves performance without overfitting, but opening too many can lead to memorizing the training set.
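In Keras, that kind of partial unfreezing can look like the sketch below, continuing from the earlier sketches (model, f1_score, train_gen, val_gen) but with EfficientNetB0 as the base; the 1e-5 learning rate is an assumed typical fine-tuning value, not necessarily the one used in the article:

```python
# Unfreeze only the last n layers of the pre-trained base, then recompile
# with a small learning rate before continuing training.
import tensorflow as tf

base = tf.keras.applications.EfficientNetB0(
    input_shape=(128, 128, 3), include_top=False, weights="imagenet"
)

def unfreeze_last_layers(base_model, n_unfreeze):
    base_model.trainable = True
    for layer in base_model.layers[:-n_unfreeze]:
        layer.trainable = False               # everything except the last n stays frozen

unfreeze_last_layers(base, n_unfreeze=2)      # repeat the experiment with n_unfreeze=7
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # assumed fine-tuning LR
    loss="binary_crossentropy",
    metrics=["accuracy", f1_score],
)
history = model.fit(train_gen, validation_data=val_gen, epochs=10)
```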
Finally, we test each model on an external dataset and even run a live webcam demo, so you can see the practical potential and limitations of real-time mask detection.
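As a rough idea of what the live demo involves, here is a bare-bones OpenCV loop that classifies each whole frame using the model from the sketches above; a real demo would usually crop faces first, and the label mapping depends on your generator's class_indices:

```python
# Capture webcam frames, preprocess them like the training data,
# and overlay the predicted label with its probability.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)       # training images were RGB
    img = cv2.resize(rgb, (128, 128)) / 255.0          # same preprocessing as training
    prob = model.predict(np.expand_dims(img, axis=0), verbose=0)[0][0]
    label = "Mask" if prob < 0.5 else "No mask"        # assumes mask=0, no_mask=1
    cv2.putText(frame, f"{label} ({prob:.2f})", (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Mask detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):              # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```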
Ready to try it out in your project? Let’s dive in!