A Survey and New Approaches to Data Augmentation in Classification and Segmentation

Khaled Alomar

doi:10.37421/2167-0919.2022.11.360

Commentary - (2022) Volume 11, Issue 12

Khaled Alomar^*

^*Correspondence: Khaled Alomar, Department of Computer Science, University of Southampton, Southampton SO17 1BJ, UK, Email:

Received: 05-Dec-2022, Manuscript No. jtsm-23-91453; Editor assigned: 06-Dec-2022, Pre QC No. P-91453; Reviewed: 19-Dec-2022, QC No. Q-91453; Revised: 23-Dec-2022, Manuscript No. R-91453; Published: 29-Dec-2022, DOI: 10.37421/2167-0919.2022.11.360
Citation: Alomar, Khaled. “A Survey and New Approaches to Data Augmentation in Classification and Segmentation.” J Telecommun Syst Manage 11 (2022): 360.
Copyright: © 2022 Alomar K. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Description

Deep neural networks, particularly convolutional neural networks, have revolutionised computer vision over the last decade. However, deep learning models typically need a large amount of data to produce satisfactory results. Sufficient data is not always available for real-world problems, and it is well understood that a lack of data easily leads to overfitting. This problem can be addressed in a variety of ways, one of which is data augmentation. In this paper, we survey existing data augmentation techniques for computer vision tasks such as segmentation and classification, and propose new strategies. In particular, we introduce a way of implementing data augmentation that uses local information in images.

Rotation is a straightforward geometric data augmentation technique. The images are rotated by a specified angle, and the newly created images are used as training samples alongside the originals. Its disadvantage is that it can cause information loss at the image boundary. Several solutions to this boundary problem are possible, e.g. random nearest neighbour rotation (RNR), random reflect rotation (RRR) and random wrap rotation (RWR). To fill in the gaps, RNR repeats the nearest pixel values, RRR mirrors the image content at the boundary, and RWR applies a periodic boundary strategy. These geometric data augmentation techniques have been shown to be highly effective in increasing data quantity and improving diversity.
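The three boundary strategies can be illustrated with a minimal sketch. It is not the author's implementation: the function names `resolve_index` and `rotate`, the nearest-pixel sampling, and the exact mirroring convention (the edge pixel is duplicated) are all assumptions made for illustration, and the sign convention of the angle is arbitrary.

```python
import math

def resolve_index(i, n, mode):
    """Map index i onto [0, n) according to the boundary mode (hypothetical helper)."""
    if 0 <= i < n:
        return i
    if mode == "nearest":   # RNR-style: repeat the closest edge pixel
        return min(max(i, 0), n - 1)
    if mode == "reflect":   # RRR-style: mirror the image about its edge
        period = 2 * n
        i %= period
        return i if i < n else period - 1 - i
    if mode == "wrap":      # RWR-style: periodic boundary
        return i % n
    return None             # "constant": the caller keeps the fill value

def rotate(image, angle_deg, mode="constant", fill=0):
    """Rotate a 2D list about its centre using nearest-pixel inverse mapping."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # For each output pixel, locate the source pixel it comes from.
            dx, dy = x - cx, y - cy
            sx = round(cos_t * dx + sin_t * dy + cx)
            sy = round(-sin_t * dx + cos_t * dy + cy)
            r = resolve_index(sy, h, mode)
            c = resolve_index(sx, w, mode)
            if r is not None and c is not None:
                out[y][x] = image[r][c]
    return out

if __name__ == "__main__":
    ones = [[1] * 5 for _ in range(5)]
    for m in ("constant", "nearest", "reflect", "wrap"):
        rotated = rotate(ones, 45, mode=m)
        empty = sum(row.count(0) for row in rotated)
        print(m, "unfilled pixels:", empty)
```

With `mode="constant"` the corners of the rotated image keep the fill value, which corresponds to the information loss described above; the other three modes fill those positions from the image itself.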

Deep neural networks such as convolutional neural networks (CNNs) have been used in computer vision for a variety of research purposes, including action recognition, object detection and localization, face recognition, and image characterization. They have outperformed traditional approaches in a variety of difficult computer vision tasks. Nonetheless, weaknesses such as large-scale data requirements, long training times, overfitting, and performance drops due to data scarcity may impede their generalization and effectiveness. The successful results of CNN models encourage researchers to pursue ever more accurate models, typically by constructing more complex architectures. Model complexity is often described by the number of trainable parameters: the more trainable parameters, the more complex the model. Because of their complex structure, networks can easily memorise data points, and when data is insufficient, increasing architectural complexity may exacerbate the shortcomings of CNN models. Overfitting, which can be characterised by the performance difference between the training and validation/test stages, is one of the most visible issues when implementing complex CNN models. It occurs when a model is too complex for the data or when the data is insufficient. In a typical case, training and validation accuracy improve together during the early stages of training and then diverge after a certain point, when the model begins to lose its ability to generalize.
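The divergence between training and validation accuracy can be made concrete with a small sketch. The function names, the `patience` parameter, and the accuracy values below are hypothetical and chosen only to illustrate the idea; they are not results from the article.

```python
def generalisation_gap(train_acc, val_acc):
    """Per-epoch difference between training and validation accuracy."""
    return [t - v for t, v in zip(train_acc, val_acc)]

def divergence_epoch(train_acc, val_acc, patience=2):
    """Return the epoch of the best validation accuracy once validation has
    failed to improve for `patience` consecutive epochs while training
    accuracy kept rising; return None if no such divergence is seen."""
    best_val, best_epoch, stale = float("-inf"), None, 0
    for epoch, (t, v) in enumerate(zip(train_acc, val_acc)):
        if v > best_val:
            best_val, best_epoch, stale = v, epoch, 0
        else:
            still_learning = t > train_acc[epoch - 1]
            stale = stale + 1 if still_learning else 0
            if stale >= patience:
                return best_epoch
    return None

if __name__ == "__main__":
    # Made-up accuracy curves for illustration only.
    train = [0.60, 0.72, 0.80, 0.86, 0.91, 0.95]
    val = [0.58, 0.69, 0.74, 0.73, 0.72, 0.70]
    print("gap per epoch:", generalisation_gap(train, val))
    print("curves diverge after epoch:", divergence_epoch(train, val))
```

A widening gap together with a stalled validation curve is the usual signal to stop training or to add more (possibly augmented) data.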

Regularisation techniques such as dropout, ridge (L2) regularisation, and lasso (L1) regularisation are implemented at the level of the model architecture. Their main objective is to reduce the complexity of a neural network model during training, since excess complexity is considered the main cause of overfitting, especially when the model is trained on small datasets. Other techniques, like batch normalisation and transfer learning, may speed up the training process and also help to prevent overfitting. These techniques could be viewed as by-products of the constant competition for higher performance through the development of new, complex deep neural architectures such as VGG-16, ResNet, Inception-V3, and DenseNet. In fact, these models aim to achieve higher accuracy on large datasets such as ImageNet, which contains over 14 million images.
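As a rough illustration of these regularisers, the sketch below computes the L2 (ridge) and L1 (lasso) penalty terms that would be added to a training loss, and applies inverted dropout to a list of activations. The function names and the strength parameter `lam` are assumptions for illustration; real frameworks expose these through their own APIs.

```python
import random

def l2_penalty(weights, lam):
    """Ridge-style penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """Lasso-style penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def dropout(activations, p, rng):
    """Inverted dropout: zero each unit with probability p and rescale the
    survivors by 1 / (1 - p) so the expected activation is unchanged."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

if __name__ == "__main__":
    weights = [1.0, -2.0, 0.5]
    print("L2 penalty:", l2_penalty(weights, lam=0.01))
    print("L1 penalty:", l1_penalty(weights, lam=0.01))
    print("dropout:", dropout([1.0] * 8, p=0.5, rng=random.Random(0)))
```

Both penalties grow with the magnitude of the weights, which discourages overly complex fits, while dropout randomly disables units during training only.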

Data augmentation can be classified according to its intended purpose (e.g., increasing training dataset size and/or diversity) or according to the problem it addresses. Examples of the latter include: the random erasing technique, which was proposed to address the occlusion issue; rotation and flipping, which are intended to partially resolve the viewpoint issue; brightness adjustment, which addresses changes in lighting; and cropping and zooming, which address the scaling and background issues. The most common classification distinguishes deep learning-based data augmentation from traditional data augmentation, with the latter further divided into geometric, photometric, and noise data augmentation. Reviews of deep learning approaches for data augmentation are available elsewhere. This survey is primarily concerned with recent articles that used data augmentation techniques in image classification and segmentation, regardless of the data augmentation category, models, or datasets employed in the studies. To our knowledge, there are few surveys of data augmentation in image classification and segmentation. Another significant contribution of this article is the development of a new geometric data augmentation technique that can supplement existing data augmentation strategies. Traditional rotation is well known as one of the most widely used geometric data augmentation techniques. However, a significant amount of pixel information is lost when rotating: rotating a square-shaped image along a circular trajectory results in black patches at the edges [1-6].
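The problem-oriented augmentations listed above can be sketched on a greyscale image stored as a list of rows. This is a simplified illustration under stated assumptions: the function names are hypothetical, and `random_erase` uses a fixed-size rectangle with a constant fill value, which is a simplification rather than the published random erasing method.

```python
import random

def horizontal_flip(image):
    """Viewpoint: mirror each row left to right."""
    return [row[::-1] for row in image]

def adjust_brightness(image, factor, max_value=255):
    """Lighting: scale pixel intensities and clip to [0, max_value]."""
    return [[min(max_value, max(0, round(p * factor))) for p in row]
            for row in image]

def random_erase(image, erase_h, erase_w, rng, fill=0):
    """Occlusion: blank out a randomly placed erase_h x erase_w rectangle."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - erase_h)
    left = rng.randint(0, w - erase_w)
    out = [row[:] for row in image]  # copy so the input is left untouched
    for y in range(top, top + erase_h):
        for x in range(left, left + erase_w):
            out[y][x] = fill
    return out

def centre_crop(image, crop_h, crop_w):
    """Scale and background: keep only the central crop_h x crop_w region."""
    h, w = len(image), len(image[0])
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(horizontal_flip(img))
    print(adjust_brightness(img, 1.5))
    print(random_erase(img, 2, 2, random.Random(0)))
    print(centre_crop(img, 2, 2))
```

Each function returns a new image, so the original can be kept as a training sample alongside its augmented variants.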