Image Classification with Deep Neural Networks

Abstract: This tutorial introduces MMPreTrain for multi-class image classification tasks. From environment setup to model configuration and optimization, it guides users through the process with practical examples. By following this tutorial, users can efficiently classify images using MMPreTrain.

Keywords: MMPreTrain, image classification, multi-class classification, deep learning, tutorial, model configuration, optimization

Introduction

In a world inundated with visual information, image classification emerges as a critical technology. From diagnosing medical conditions to enabling self-driving cars, the ability to automate image interpretation has revolutionized industries. This article is a comprehensive guide to image classification, focusing on the transformative role of deep neural networks.

Understanding Image Classification

Image classification entails the automatic assignment of labels to images based on their content. It bridges the gap between human visual perception and computational analysis. Whether it’s distinguishing between cat and dog photos or recognizing handwritten digits, image classification forms the foundation of modern computer vision.

Evolution of Image Classification Techniques

The journey of image classification techniques spans several eras. Early systems relied on hand-crafted features, such as edges and textures, whose design required human expertise. Classical machine learning algorithms such as Support Vector Machines and Random Forests were then trained on these features, so their accuracy still hinged on manual feature engineering. These pipelines were limited in their ability to handle complex, unstructured data like images.

Introduction to Deep Neural Networks (DNNs)

Deep neural networks (DNNs) marked a breakthrough by allowing computers to learn features directly from raw data. A neural network consists of interconnected layers: an input layer, one or more hidden layers (a network is called "deep" when it has many), and an output layer. Weights quantify the strength of the connections between neurons, and each neuron’s bias shifts its weighted sum before the activation is applied. Activation functions such as ReLU introduce non-linearity, enabling DNNs to model complex relationships within data.
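To make weights, biases, and activations concrete, here is a minimal pure-Python sketch of the forward pass of a tiny network with one hidden layer. The layer sizes and parameter values are arbitrary, hand-picked for illustration; real networks learn them during training and run on optimized tensor libraries rather than Python loops.

```python
def relu(z):
    """ReLU activation: passes positive values through, clips negatives to 0."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: out[j] = activation(sum_i weights[j][i] * inputs[i] + biases[j])."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b  # weighted sum plus bias
        outputs.append(activation(z) if activation else z)
    return outputs


def forward(x, params):
    """Input -> hidden layer (ReLU) -> output layer (raw class scores)."""
    hidden = dense(x, params["w1"], params["b1"], relu)
    return dense(hidden, params["w2"], params["b2"])


# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 2 output scores.
params = {
    "w1": [[1.0, -1.0], [0.5, 0.5]],
    "b1": [0.0, -1.0],
    "w2": [[1.0, 2.0], [-1.0, 1.0]],
    "b2": [0.5, 0.0],
}

print(forward([2.0, 1.0], params))  # [2.5, -0.5]
```

Without the ReLU, the two layers would collapse into a single linear map; the activation is what lets stacked layers represent non-linear decision boundaries.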

Convolutional Neural Networks (CNNs)

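Convolutional Neural Networks (CNNs) are designed to process grid-like data, such as images, and are loosely inspired by the hierarchical feature detection of the human visual system. Convolutional layers slide small learnable filters across the image to extract local features such as edges, and pooling layers downsample the resulting feature maps to reduce their spatial dimensions. Stacking these layers builds up features from edges to textures to object parts, yielding a powerful representation of an image’s content.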
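The two core CNN operations can be sketched in pure Python. This sketch assumes a single-channel image, stride 1 and no padding for the convolution, and non-overlapping 2×2 max pooling; the 5×5 image and 2×2 kernel are made-up values. Like deep-learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling (stride == size); leftover rows/cols are dropped."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# Toy image: dark left part (0), bright right part (1) -> one vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical vertical-edge filter: responds where brightness increases left to right.
kernel = [[-1, 1],
          [-1, 1]]

features = conv2d(image, kernel)  # 4x4 feature map that peaks at the edge
pooled = max_pool2d(features)     # 2x2 map: the edge response survives downsampling
print(features)
print(pooled)  # [[2, 0], [2, 0]]
```

In a trained CNN the kernel values are learned rather than hand-written, and each layer applies many such filters across many channels.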

Training DNNs for Image Classification

Training a DNN involves feeding it labeled data and iteratively adjusting its weights to minimize prediction errors. A loss function, such as cross-entropy, quantifies the discrepancy between predicted and actual labels. Backpropagation applies the chain rule backward through the network to compute the gradient of the loss with respect to every weight, and an optimization algorithm such as gradient descent uses these gradients to update the weights in the direction that reduces the loss. Training runs for multiple epochs (full passes over the training set), each divided into mini-batches so that every update is computed from a small subset of the data.
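The training loop described above can be sketched in pure Python. To keep the sketch short, the "network" is a single linear layer followed by softmax (softmax regression), so backpropagation reduces to one step of the chain rule: the gradient of the cross-entropy loss with respect to each class score is the predicted probability minus the one-hot label. The toy 2D dataset, learning rate, batch size, and epoch count are arbitrary illustrative choices.

```python
import math
import random


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(W[k], x)) + b[k] for k in range(len(W))]


def mean_loss(W, b, data):
    """Average cross-entropy: -log(probability assigned to the true label)."""
    return sum(-math.log(softmax(logits(W, b, x))[y]) for x, y in data) / len(data)


def train(data, num_classes, lr=0.5, epochs=50, batch_size=4, seed=0):
    rng = random.Random(seed)
    data = list(data)  # shuffle a copy, not the caller's list
    dim = len(data[0][0])
    W = [[0.0] * dim for _ in range(num_classes)]
    b = [0.0] * num_classes
    history = [mean_loss(W, b, data)]
    for _ in range(epochs):                                # one epoch = one pass over the data
        rng.shuffle(data)
        for start in range(0, len(data), batch_size):      # mini-batches
            batch = data[start:start + batch_size]
            gW = [[0.0] * dim for _ in range(num_classes)]
            gb = [0.0] * num_classes
            for x, y in batch:
                p = softmax(logits(W, b, x))               # forward pass
                for k in range(num_classes):               # backward pass
                    dz = (p[k] - (1.0 if k == y else 0.0)) / len(batch)
                    gb[k] += dz
                    for i in range(dim):
                        gW[k][i] += dz * x[i]
            for k in range(num_classes):                   # gradient-descent update
                b[k] -= lr * gb[k]
                for i in range(dim):
                    W[k][i] -= lr * gW[k][i]
        history.append(mean_loss(W, b, data))
    return W, b, history


def predict(W, b, x):
    scores = logits(W, b, x)
    return scores.index(max(scores))


# Toy dataset: class 0 lies to the right, class 1 to the left, class 2 above.
data = [
    ((1.0, 0.0), 0), ((1.2, 0.1), 0), ((0.9, -0.1), 0), ((1.1, 0.2), 0),
    ((-1.0, 0.0), 1), ((-1.1, 0.1), 1), ((-0.9, -0.2), 1), ((-1.2, 0.0), 1),
    ((0.0, 1.0), 2), ((0.1, 1.2), 2), ((-0.1, 0.9), 2), ((0.2, 1.1), 2),
]
W, b, history = train(data, num_classes=3)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Deep networks follow the same loop; the only difference is that the backward pass chains gradients through many layers, which frameworks such as PyTorch automate.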

Transfer Learning and Pretrained Models

Transfer learning reuses knowledge gained on large datasets to solve new tasks efficiently. Pretrained models, already trained on extensive datasets such as ImageNet, serve as a starting point, and fine-tuning them on domain-specific data typically speeds up training and improves accuracy compared with training from scratch. This approach has made deep learning practical even for domains with small datasets; the tutorial below, for example, fine-tunes an ImageNet-pretrained model to recognize car models.

Case Studies and Success Stories

Real-world examples highlight DNNs’ impact on image classification. In social media, facial recognition algorithms power photo tagging. Medical imaging benefits from CNNs that detect diseases like diabetic retinopathy. Autonomous vehicles use image classification to identify pedestrians, vehicles, and road signs, enhancing safety.

Challenges and Future Directions

While DNNs have achieved remarkable success, challenges remain. Overfitting, where a model memorizes its training data instead of learning general patterns, hurts performance on unseen images. Biases present in training data may lead to unfair predictions. Adversarial attacks exploit model vulnerabilities with small, often imperceptible input perturbations. Future directions include attention mechanisms, which focus on relevant image regions, and generative adversarial networks (GANs), which can generate synthetic training data.

Ethical Considerations

Image classification raises ethical concerns. Privacy issues emerge when personal images are used without consent. Bias in training data can perpetuate societal inequalities. High-stakes uses of AI-driven decisions, such as in criminal sentencing, demand transparency and accountability to avoid unjust consequences.

A Practical Guide: Multi-Class Image Classification using MMPreTrain

In this tutorial, we will fine-tune an ImageNet-pretrained EfficientNetV2-B0 model on the 196-class Stanford Cars dataset using the MMPreTrain (mmpretrain) library.

MMPreTrain

MMPreTrain is an open-source pre-training toolbox based on PyTorch and part of the OpenMMLab project. It provides multiple powerful pre-trained backbones and supports different pre-training strategies. It originated from the open-source projects MMClassification and MMSelfSup and adds many new features.

Pre-training has become an essential stage of modern visual recognition: strong pre-trained models improve performance on a wide range of downstream vision tasks. Key features include:

  • Supports multiple pre-training strategies, including supervised pre-training, self-supervised pre-training, and semi-supervised pre-training.
  • Provides multiple powerful pre-trained backbones, including ResNet, ResNeXt, ViT, and Swin Transformer.
  • Easy to use and extend. MMPreTrain is built on MMEngine and MMCV, OpenMMLab’s foundational libraries for model training and computer vision.
  • Well-documented. The documentation is clear and concise, making it easy for users to get started.

Specifically, for image classification tasks:

  • MMPreTrain supports standard image classification datasets such as ImageNet and CIFAR-10, as well as custom datasets organized as one sub-folder per class (the format used in this tutorial).
  • It can fine-tune pre-trained models on a specific dataset to improve performance on that dataset.
  • It can train new image classification models from scratch.
  • It supports transfer learning, i.e., using a pre-trained model as the starting point for training a new model on a different task.

To use mmpretrain for multi-class image classification, you can follow these steps:

  1. Create a conda environment and install the necessary packages.
  2. Download the dataset.
  3. Inherit from a predefined configuration.
  4. Modify the configuration to specify the number of classes and other parameters.
  5. Configure the model, data loaders, validation evaluator, and optimizer wrapper.
  6. Train/Fine-tune the model on the dataset.
  7. Evaluate the model on the validation set.

Hands-on Implementation

Source code: The original code can be downloaded from GitHub.

Trained Model: The trained model has been uploaded to Hugging Face, where it is available to test and download.

Setup

Create a Conda environment and activate it

Commands
$ conda create --name openmmlab python=3.10 -y
$ conda activate openmmlab

Install the necessary packages

Commands
$ conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia
$ sudo reboot   # only needed if you also installed or updated the NVIDIA driver
$ conda activate openmmlab
$ git clone https://github.com/open-mmlab/mmpretrain.git
$ cd mmpretrain
$ pip install -U openmim && mim install -e .
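After installation, a quick sanity check is to import each package and print its version. The sketch below uses only the Python standard library; the package list is an assumption based on what the commands above install (mim should pull in MMEngine and MMCV as mmpretrain dependencies), so adjust it as needed. Missing packages are reported as None, and CUDA availability is checked only when PyTorch imports successfully.

```python
import importlib


def check_environment(packages=("torch", "torchvision", "mmengine", "mmcv", "mmpretrain")):
    """Map each package name to its version string, or None if it is not installed."""
    report = {}
    torch_module = None
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        except Exception as exc:  # e.g. a version-compatibility check failing at import time
            report[name] = f"import error: {exc}"
            continue
        report[name] = getattr(module, "__version__", "unknown")
        if name == "torch":
            torch_module = module
    if torch_module is not None:
        report["cuda_available"] = torch_module.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated openmmlab environment; if cuda_available is False, PyTorch cannot see the GPU and training would fall back to the CPU.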

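Downloading the Dataset

The Stanford Cars dataset can be downloaded from Kaggle. It contains 16,185 images of 196 car classes (make, model, and year), split into 8,144 training and 8,041 test images. This tutorial uses a version in which each split (train and test) is organized as one sub-folder per class.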

Importing Predefined Configurations

Popular pre-trained architectures include CNNs such as ResNet and ResNeXt and vision transformers such as ViT and Swin Transformer; lightweight CNNs such as MobileNetV2 are another option. In this tutorial, we’ll use EfficientNetV2-B0. MMPreTrain provides predefined configurations for many models. Inherit the EfficientNetV2-B0 configuration through _base_ (the mmpretrain:: prefix tells MMEngine to look up the file inside the installed mmpretrain package; in the file name, 8xb32 means 8 GPUs with 32 images each, and in1k means ImageNet-1k):


_base_ = [
    'mmpretrain::efficientnet_v2/efficientnetv2-b0_8xb32_in1k.py'
]
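Because of _base_, every dict written in this file is merged into the inherited configuration rather than replacing it: only the keys you set change, while sibling keys keep their base values. A dict marked _delete_=True (used for the optimizer wrapper later in this tutorial) discards the inherited dict instead. The pure-Python function below is a simplified sketch of those merge rules, not MMEngine's actual implementation, and the "base" values in the example are illustrative, not the real contents of the EfficientNetV2 config.

```python
import copy


def merge_cfg(base, override):
    """Recursively merge `override` into a copy of `base` (simplified MMEngine-style rules).

    - dict values are merged key by key;
    - a dict containing _delete_=True replaces the inherited dict entirely;
    - any other value (numbers, strings, lists) replaces the inherited one.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict):
            value = dict(value)  # copy so the caller's override keeps its _delete_ flag
            delete = value.pop("_delete_", False)
            inherited = {} if delete else merged.get(key)
            merged[key] = merge_cfg(inherited if isinstance(inherited, dict) else {}, value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative base config (not the real EfficientNetV2-B0 settings).
base = {
    "model": {"type": "ImageClassifier", "head": {"type": "LinearClsHead", "num_classes": 1000}},
    "optim_wrapper": {"optimizer": {"type": "SGD", "lr": 0.1}, "clip_grad": None},
}
override = {
    "model": {"head": {"num_classes": 196}},
    "optim_wrapper": {"_delete_": True, "type": "AmpOptimWrapper",
                      "optimizer": {"type": "AdamW", "lr": 5e-4}},
}

cfg = merge_cfg(base, override)
print(cfg["model"])          # head keeps its type; only num_classes changes
print(cfg["optim_wrapper"])  # inherited settings such as clip_grad are gone
```

This is why the model section below only needs to state num_classes, while the optimizer section uses _delete_=True to avoid inheriting stale settings from the base recipe.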

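Loading a pretrained model (optional)

  • load_from accepts either the path to a local checkpoint file or a URL.
  • It restores the weights of the whole model. The ImageNet-1k head has 1,000 outputs and cannot be reused for a different number of classes, so warnings about size mismatches for the head weights are expected; the new head is trained from scratch.

Here we load the ImageNet-1k pretrained EfficientNetV2-B0 weights from a URL: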


load_from = "https://download.openmmlab.com/mmclassification/v0/efficientnetv2/efficientnetv2-b0_3rdparty_in1k_20221221-9ef6e736.pth"

Update the Model Configuration

  • Number of classes: This depends on your dataset; the Stanford Cars dataset has 196 classes. The value is set on the classification head, which outputs one score per class, and is also passed to the data preprocessor.

num_classes = 196
data_preprocessor = dict(
    num_classes=num_classes)

model = dict(
    head=dict(
        num_classes=num_classes,
    ))
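As an alternative to load_from, MMPreTrain's fine-tuning documentation describes initializing only the backbone from a pretrained checkpoint through the backbone's init_cfg, leaving the new 196-class head randomly initialized. The fragment below sketches that variant for this tutorial, reusing the checkpoint URL from above; prefix='backbone' keeps only the checkpoint entries whose names start with "backbone." and strips that prefix. If you adopt it, use it in place of both load_from and the model dict above, since a second model = dict(...) assignment in the same file would overwrite the first. Treat it as a sketch and check the fine-tuning guide for your installed version.

```python
num_classes = 196  # Stanford Cars

model = dict(
    backbone=dict(
        # initialise the backbone only; the classification head starts from scratch
        init_cfg=dict(
            type='Pretrained',
            checkpoint='https://download.openmmlab.com/mmclassification/v0/efficientnetv2/efficientnetv2-b0_3rdparty_in1k_20221221-9ef6e736.pth',
            # load only 'backbone.*' weights from the checkpoint
            prefix='backbone',
        )),
    head=dict(num_classes=num_classes),
)
```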

Update the Optimizer Configuration

  • Learning rate: Lower the learning rate if the training loss diverges or oscillates, and raise it if the loss decreases very slowly.
  • Optimizer: AdamW is a good default for fine-tuning image classifiers. SGD with momentum also works well but is usually more sensitive to the choice of learning rate and schedule.
  • Scheduler: The configuration below uses a linear warm-up (LinearLR) followed by cosine annealing (CosineAnnealingLR). Step decay (StepLR) and reduce-on-plateau schedules, which lower the learning rate when a validation metric stops improving, are common alternatives.
  • Mixed precision: On GPUs that support it, mixed-precision training (AmpOptimWrapper below) speeds up training and reduces memory use, usually with little or no loss of accuracy.

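warmup_epochs = 10
base_lr = 5e-4

optim_wrapper = dict(
    # discard the optimizer settings inherited from the base config
    _delete_=True,
    # automatic mixed precision (AMP) training
    type='AmpOptimWrapper',
    optimizer=dict(type='AdamW', lr=base_lr, weight_decay=0.001),
    # copied from ViT configs: EfficientNetV2 has no cls_token or pos_embed
    # parameters, so these keys match nothing here and can safely be removed
    paramwise_cfg=dict(custom_keys={
        '.cls_token': dict(decay_mult=0.0),
        '.pos_embed': dict(decay_mult=0.0)
    }),
)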

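param_scheduler = [
    # warm-up: scale the LR linearly from start_factor * base_lr up to base_lr
    dict(
        type='LinearLR',
        start_factor=1e-4,
        by_epoch=True,
        end=warmup_epochs,
        # specified in epochs, but the LR is updated every iteration
        convert_to_iter_based=True),
    # main schedule: cosine decay from base_lr down to eta_min; with no 'end'
    # given, it runs until the end of training
    dict(
        type='CosineAnnealingLR',
        eta_min=1e-5,
        by_epoch=True,
        begin=warmup_epochs)
]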
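To see what this schedule does to the learning rate, the function below evaluates it at the start of each epoch. It is a per-epoch approximation written for illustration: it assumes max_epochs = 30 (set in the training configuration), a cosine phase that ends at max_epochs, and a linear warm-up from start_factor × base_lr to base_lr. MMEngine actually updates the warm-up every iteration (convert_to_iter_based=True), and its exact step indexing may differ slightly.

```python
import math


def lr_at(epoch, base_lr=5e-4, warmup_epochs=10, max_epochs=30,
          start_factor=1e-4, eta_min=1e-5):
    """Approximate learning rate at the start of `epoch` (0-based)."""
    if epoch < warmup_epochs:
        # LinearLR: the scale factor grows linearly from start_factor to 1.0
        factor = start_factor + (1.0 - start_factor) * epoch / warmup_epochs
        return base_lr * factor
    # CosineAnnealingLR: decays from base_lr to eta_min over the remaining epochs
    t = epoch - warmup_epochs
    t_max = max_epochs - warmup_epochs
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t / t_max)) / 2


for epoch in (0, 5, 10, 20, 30):
    print(f"epoch {epoch:2d}: lr = {lr_at(epoch):.2e}")
```

The warm-up keeps early updates small while the randomly initialized head adapts, and the cosine phase then lowers the learning rate smoothly for fine convergence. The lr plot produced by the log-analysis command later in this tutorial should show the same overall shape.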

Update the Data Loaders Configuration

  • Batch size: A good starting point is 32 images per GPU, the value inherited from the base configuration (the b32 in its name). Lower it if you run out of GPU memory.
  • Image size: A good starting point is 224×224. Smaller sizes train faster, while larger ones can improve accuracy at a higher computational cost.
  • Augmentation: Popular techniques include random cropping, random flipping, and color jittering; combining them helps the model generalize.

The configuration below only overrides the dataset location and class names; batch size, image size, and the augmentation pipeline are inherited from the base configuration.

data_root = "/path/to/Datasets/Stanford_Cars_by_class_folder/car_data/car_data/"
train_image_folder = "train"
# the test split is used as the validation set
val_image_folder = "test"
# Stanford Cars class names, in the same (sorted) order as the class sub-folders
CLASS_NAMES = ["AM General Hummer SUV 2000", "Acura RL Sedan 2012", ... ]  # (this list is truncated for brevity)
METAINFO = {'classes': CLASS_NAMES}

train_dataloader = dict(
    dataset=dict(
        metainfo=METAINFO,
        data_root=data_root,
        data_prefix=train_image_folder,
    )
)

val_dataloader = dict(
    dataset=dict(
        metainfo=METAINFO,
        data_root=data_root,
        data_prefix=val_image_folder,
    )
)
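Rather than typing 196 class names by hand, you can generate the classes list from the dataset folders. To our understanding, MMPreTrain's folder-based datasets assign label indices from the class sub-folder names in sorted order, so the names in METAINFO should follow that same order (check the dataset documentation for your version). The helper below is a hypothetical standard-library script (call it make_class_list.py), meant to be run once outside the config file; paste the list it prints into the configuration. It also checks that the train and test splits contain the same class folders.

```python
import sys
from pathlib import Path


def list_classes(split_dir):
    """Sorted names of the sub-folders of `split_dir` (one sub-folder per class)."""
    return sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())


def build_metainfo(data_root, train_folder="train", val_folder="test"):
    """Build {'classes': [...]} and verify that both splits have identical class folders."""
    train_classes = list_classes(Path(data_root) / train_folder)
    val_classes = list_classes(Path(data_root) / val_folder)
    if train_classes != val_classes:
        differing = sorted(set(train_classes) ^ set(val_classes))
        raise ValueError(f"class folders differ between splits: {differing}")
    return {"classes": train_classes}


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: python make_class_list.py /path/to/car_data/car_data
    metainfo = build_metainfo(sys.argv[1])
    print(f"{len(metainfo['classes'])} classes")
    print(metainfo["classes"])
```

For the Stanford Cars layout used here, the script should report 196 classes.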

Update the Validation Evaluator Configuration

  • Evaluator: Accuracy is a good choice for evaluating an image classification model; topk=(1, 5) reports both top-1 and top-5 accuracy. Precision, recall, and F1-score are also available through MMPreTrain’s SingleLabelMetric.
  • Interval: Validating after every epoch (val_interval=1 in train_cfg) is a good starting point; increase the interval to spend less time on evaluation.

val_evaluator = dict(type='Accuracy', topk=(1, 5))
test_evaluator = val_evaluator
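With topk=(1, 5), the evaluator reports top-1 accuracy (the highest-scoring class is correct) and top-5 accuracy (the correct class is among the five highest scores). The pure-Python function below illustrates the computation on made-up scores for three classes, so k=2 stands in for k=5. It returns a percentage, which matches how MMPreTrain logs accuracy, and it ignores tie-breaking subtleties.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    correct = 0
    for row, label in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        correct += label in top_k
    return 100.0 * correct / len(labels)


# Made-up class scores for 4 images and 3 classes, with their true labels.
scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.4, 0.35, 0.25],
]
labels = [1, 1, 0, 2]

print(topk_accuracy(scores, labels, k=1))  # 25.0
print(topk_accuracy(scores, labels, k=2))  # 50.0
```

Top-5 accuracy is always at least as high as top-1; with 196 visually similar car classes, a large gap between the two often means the model confuses closely related models or model years.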

Update the Training Configuration

  • Number of epochs: A good starting point is 30 epochs. Train longer if validation accuracy is still rising at the end, and shorter if it plateaus early.
  • Early stopping: Stopping once the validation metric stops improving limits overfitting and saves compute. It is not enabled in the configuration below.

max_epochs = 30
train_cfg = dict(by_epoch=True, max_epochs=max_epochs, val_interval=1)
val_cfg = dict()
test_cfg = dict()

# local path for saving models and logs
work_dir = "./out"

# configure default hooks
default_hooks = dict(
    # save a checkpoint every epoch and keep only the most recent one
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=1),
)
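Early stopping usually follows a "patience" rule: stop once the monitored metric has not improved for a given number of consecutive validations. The pure-Python class below is an illustrative sketch of that rule for a higher-is-better metric such as accuracy/top1; the patience value and accuracy sequence are hypothetical. Recent MMEngine releases also provide an EarlyStoppingHook that can be registered through custom_hooks; consult its documentation for the exact arguments before relying on it.

```python
class EarlyStopper:
    """Signal a stop when a higher-is-better metric has not improved by more than
    `min_delta` for `patience` consecutive validations."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_rounds = 0

    def step(self, score):
        """Record one validation score; return True if training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience


def stop_epoch(scores, patience):
    """1-based epoch at which training would stop, or None if it never does."""
    stopper = EarlyStopper(patience=patience)
    for epoch, score in enumerate(scores, start=1):
        if stopper.step(score):
            return epoch
    return None


# Hypothetical per-epoch top-1 accuracies.
print(stop_epoch([60.0, 70.0, 75.0, 74.0, 75.0, 73.0], patience=3))  # 6
```

Combined with a checkpoint of the best epoch, this avoids spending the remaining epochs on a model that has already stopped improving.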

Load the configuration and train the model using MMEngine

Create a separate Python script that loads the main configuration and starts the training process.


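import argparse

from mmengine.config import Config
from mmengine.runner import Runner


def main(args):
    # load the training configuration (including its _base_ files)
    config = Config.fromfile(args.config_path)
    # "pytorch" sets up distributed training from the environment variables
    # provided by torchrun; "none" runs a plain single-process job
    config.launcher = args.launcher
    # build the model, data loaders, optimizer, hooks, etc. from the config
    runner = Runner.from_cfg(config)
    runner.train()


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Train a model from an MMEngine config.')
    parser.add_argument('config_path', type=str, help='path to the config file')
    parser.add_argument('--launcher', choices=['none', 'pytorch'], default='pytorch',
                        help='job launcher (default: pytorch, for use with torchrun)')
    args = parser.parse_args()
    main(args)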

Run training from terminal

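Start training by passing the configuration file (efficientnetv2_b0_config.py) to the training script (main_train_mmengine.py) and launching it with torchrun. Here, --nproc_per_node=3 starts one training process per GPU on a single machine with three GPUs; with 32 images per GPU, the effective batch size is 96.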

Command
$ torchrun --nnodes 1 --nproc_per_node=3 main_train_mmengine.py efficientnetv2_b0_config.py

Log Analysis (Visualizing training/validation results)

Command
$ python mmpretrain/tools/analysis_tools/analyze_logs.py plot_curve ./out/path/to/scalars.json --keys accuracy/top1 accuracy/top5 --legend top1 top5 --out accuracy.jpg --title EfficientNetV2_b0
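The same log file can also be inspected programmatically, for instance to find the validation step with the best accuracy. This sketch assumes, as MMEngine's local visualization backend does to our knowledge, that scalars.json holds one JSON object per line, with validation records containing keys such as accuracy/top1 and a step field. The path in the usage comment is the same placeholder used in the command above.

```python
import json


def best_metric(scalars_path, key="accuracy/top1"):
    """Return (best value, its step) for `key` in a JSON-lines log, or (None, None)."""
    best_value, best_step = None, None
    with open(scalars_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if key in record and (best_value is None or record[key] > best_value):
                best_value, best_step = record[key], record.get("step")
    return best_value, best_step


# Example (placeholder path from the command above):
# value, step = best_metric("./out/path/to/scalars.json")
```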

Log Analysis (Plotting the learning-rate schedule)

Command
$ python mmpretrain/tools/analysis_tools/analyze_logs.py plot_curve ./out/path/to/scalars.json --keys lr --legend lr --out lr.jpg --title EfficientNetV2_b0

Model Complexity Analysis (Get the FLOPs and params)

Command
$ python mmpretrain/tools/analysis_tools/get_flops.py /path/to/configuration/efficientnetv2_b0_config.py
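To see where such numbers come from, the parameter count and multiply-accumulate operations (MACs) of one convolutional layer can be worked out by hand. The function below is an illustrative sketch of the standard formulas for a k×k convolution, optionally grouped (e.g. depthwise); the layer sizes in the example are hypothetical. Counting conventions differ between tools (some report MACs as FLOPs, others count a multiply-add as two FLOPs), so compare totals with care.

```python
def conv2d_cost(in_ch, out_ch, kernel, out_h, out_w, bias=True, groups=1):
    """Parameters and multiply-accumulates (MACs) of a k x k Conv2d layer.

    Each output value needs (kernel * kernel * in_ch / groups) multiply-accumulates,
    and there are out_ch * out_h * out_w output values.
    """
    assert in_ch % groups == 0 and out_ch % groups == 0
    weights_per_filter = kernel * kernel * in_ch // groups
    params = weights_per_filter * out_ch + (out_ch if bias else 0)
    macs = weights_per_filter * out_ch * out_h * out_w
    return params, macs


# Hypothetical 3x3 convolution mapping an RGB image to 32 channels at 112x112 output.
print(conv2d_cost(3, 32, 3, 112, 112))                          # (896, 10838016)
# Hypothetical 3x3 depthwise convolution (groups == channels), no bias.
print(conv2d_cost(32, 32, 3, 112, 112, bias=False, groups=32))  # (288, 3612672)
```

The comparison shows why depthwise convolutions are popular in efficient architectures: at the same resolution and channel count, they need far fewer parameters and operations than a standard convolution.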

Conclusion

Image classification with deep neural networks has transformed how we interact with visual data. From its historical roots to the present breakthroughs, DNNs have reshaped industries and opened new possibilities. As technology advances, the responsible development and deployment of image classification systems will play a pivotal role in shaping our AI-driven future.

Disclaimer

This tutorial is intended for educational purposes only. It does not constitute professional advice, including commercial or legal advice. Any application of the techniques discussed in real-world scenarios should be done cautiously and with consultation from relevant experts.
