Introduction
Convolutional neural networks (CNNs) are a powerful deep learning architecture widely used for computer vision tasks. We recently tackled a binary classification problem with a custom CNN, but our initial model converged slowly and overfit, largely because of its large parameter count. Here’s how we addressed these issues.
Transforming Dataset Images to Grayscale
Our first optimization was converting RGB images to grayscale, reducing input dimensions from three channels to one. This simple step cut down the number of parameters in the first convolutional layer, without compromising key features necessary for classification. The result? Lower memory usage and faster computations during training and inference.
Implementing Depthwise Separable Convolutions
Next, we replaced traditional convolutional layers with depthwise separable convolutions, inspired by architectures like MobileNet and Xception. These layers significantly reduce parameter counts while retaining the model’s ability to learn expressive features.
Depthwise separable convolutions split the operation into two parts:
Depthwise convolution applies a filter to each input channel individually.
Pointwise convolution combines these outputs across channels using 1×1 convolutions.
Here’s an example implementation using PyTorch, a Python-based deep learning framework:
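A minimal sketch of such a layer (the class name and hyperparameters are illustrative, not from our production code):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution: a per-channel (depthwise)
    convolution followed by a 1x1 (pointwise) convolution."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1):
        super().__init__()
        # groups=in_channels applies one filter to each input channel individually
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   stride=stride, padding=padding,
                                   groups=in_channels)
        # 1x1 convolution combines the per-channel outputs across channels
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Quick shape check: 32 input channels -> 64 output channels
x = torch.randn(1, 32, 64, 64)
layer = DepthwiseSeparableConv(32, 64)
print(layer(x).shape)  # torch.Size([1, 64, 64, 64])
```

For this 32-to-64-channel example, the layer holds 2,432 parameters (288 depthwise weights + 32 biases, plus 2,048 pointwise weights + 64 biases), versus 18,496 for a standard 3×3 convolution with the same channel counts.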
Leveraging Pre-training with Selective Transfer Learning
We leveraged transfer learning by initializing the first two convolutional layers with weights from the VGG16 model pre-trained on ImageNet. These layers excel at capturing general patterns like edges and textures, which transfer well across tasks. This approach saved us from training the early layers from scratch and let the model focus its learning capacity on higher-level, task-specific features.
Conclusion
Through these optimizations—grayscale image conversion, depthwise separable convolutions, and selective transfer learning—we built a CNN that was not only computationally efficient but also highly effective for our binary classification problem.