Classification of grapevine leaves images using VGG-16 and VGG-19 deep learning nets

ABSTRACT


INTRODUCTION
The field of image classification has experienced significant growth and has become increasingly popular among technology developers in recent times. This growth can be attributed to the rapid increase in data volume across industries such as electronic commerce, automotive, medical care, and gaming [1], [2]. Classification is a methodical process of organizing entities into distinct groups and categories based on their inherent features. Image classification emerged as an endeavor to bridge the disparity between computer vision and human vision, accomplished by training computers using relevant data. This task involves segregating images into predetermined categories based on their visual content [3], [4]. Machine learning represents a critical facet of artificial intelligence. Despite over five decades of development, certain challenges remain unresolved, including intricate image comprehension and recognition, natural language translation, and recommendation systems [5], [6]. Deep learning constitutes a significant offshoot founded on the principles of machine learning. The approach leverages the hierarchical nature of artificial neural networks and biological neural systems to analyze data: it acquires high-level features through the integration of low-level features using feature combination methodologies. This capability enables the successful accomplishment of tasks such as image categorization or prediction. Unlike conventional machine learning, deep learning leverages multi-layered neural networks to autonomously learn from images and extract the intricate underlying features present within them [7], [8]. Among the prevailing deep neural network architectures, convolutional neural networks (CNNs, or ConvNets) stand as one of the most widely embraced. CNNs perform convolutions of learned features with the input data, and their use of 2D convolutional layers renders them highly adept at processing two-dimensional data, notably images. This architecture obviates the requirement for manual feature extraction, sparing the need to explicitly identify the features employed in image classification; instead, CNNs extract features directly from the images themselves [9], [10]. This research employs two deep learning methodologies, namely VGG-16 and VGG-19, for the classification of grapevine leaf images, subsequently predicting their correct class. A dataset of 500 grapevine leaf images serves as the input data for this study. The investigation analyzes and compares the accuracy achieved during training at various percentage levels.
The remainder of the article is organized as follows: related work is reviewed in section 2. Section 3 presents the materials and methods utilized in the proposed grapevine leaf image classification. Section 4 describes the layout of the proposed system. Section 5 covers the findings, and section 6 presents the conclusions.

RELATED WORK
An extensive review of relevant literature used in this investigation is provided in this section. Diago et al. [11] provided a novel approach for analyzing leaf area and yield in color photos while characterizing the grapevine canopy. The approach is predicated on creating a supervised classifier using the Mahalanobis distance. It automates the processing of image sets and computes the areas (as pixel counts). The initialization of each class relies on user input, wherein the user selects representative pixels to serve as clustering anchors. The segmentation outcomes demonstrate impressive performance, with 92% accuracy for leaf identification and 98% accuracy for cluster detection. The simplicity of the image acquisition setup and the precise definition of pixel classes make this method robust and well-suited for providing valuable information for vineyard management.
Pereira et al. [12] proposed a novel method for evaluating the effectiveness of transfer learning and fine-tuning using AlexNet, with a specific focus on grape variety identification. The study involves two distinct vineyard image datasets collected from different geographic areas. Generating diverse datasets for training and classification involved various image processing techniques, among which a warping method based on four image corners was employed. By applying the AlexNet transfer learning scheme to the pre-processed image dataset, a promising accuracy of 77.30% was achieved. Additionally, when this classifier model was applied to the well-known Flavia leaf dataset, an accuracy of 89.75% was achieved.
Koklu et al. [13] suggested a technique to classify grapevine leaf images using deep learning. Initially, 500 vine leaf images from 5 diverse classes were captured using a specialized self-illuminating system. Data augmentation methods were then employed to expand this dataset to a total of 2500 images. For the classification task, a modern CNN model, MobileNetv2, was fine-tuned. Three distinct approaches were explored: in the first, classification was performed directly with the tuned MobileNetv2 model. In the second, features were extracted from the tuned model and classification was performed using several support vector machine (SVM) classifiers. Finally, 1000 features were extracted from MobileNetv2, selected using the Chi-Squares method, and reduced to 250 through dimensionality reduction; classification was then performed using various SVM kernels on these chosen features. The Chi-Squares approach proved the most effective at extracting and reducing features from MobileNetv2's logits layer. Remarkably, this approach achieved a classification success rate of 97.60%.
Zhang et al. [14] offered a deep learning method, YOLOv5-CA, designed to attain an optimal balance between grape downy mildew (GDM) detection accuracy and processing speed under normal circumstances. The approach incorporates a coordinate attention (CA) mechanism into the YOLOv5 architecture, effectively highlighting visual features relevant to downy mildew disease and thereby enhancing detection performance. To assess the efficacy of the proposed approach, a challenging GDM dataset was acquired from a vineyard under various natural scenes, encompassing diverse lighting, shadows, and backgrounds. According to the findings, YOLOv5-CA achieved a detection precision of 85.59%, a recall of 83.70%, and an mAP@0.5 of 89.55%. These metrics outperform those of well-known techniques such as faster R-CNN, YOLOv3, and YOLOv5. The suggested method also offers excellent inference speed, processing 58.82 frames per second, making it appropriate for real-time disease control requirements.
Ahmed et al. [15] suggested a CNN-based model tailored for grape leaf classification by adapting the DenseNet201 architecture. A primary focus of this investigation is to assess the influence of layer freezing on the performance of DenseNet201 during fine-tuning. To conduct this research, a publicly available dataset comprising 500 images, encompassing five distinct classes with 100 images per class, was utilized. To expand the training set, various data augmentation techniques were employed. The proposed CNN model, named DenseNet-30, demonstrated superior performance compared to existing works on grape leaf classification from which the dataset was originally sourced. DenseNet-30 achieved an impressive overall accuracy of 98%, underscoring its effectiveness in accurately classifying grape leaves.

MATERIALS AND METHODS
This section explains the materials and methods used in the proposed system. Section 3.1 explains the dataset used in the proposed system. Sections 3.2 and 3.3 describe the deep learning techniques used in the grapevine leaf classification process, namely VGG-16 and VGG-19, respectively.

Grapevine leaves image dataset
In this study, we used the freely available dataset acquired from [13]. This collection includes grapevine leaf samples from five different species: "Ak," "AlaIdris," "Buzgulu," "Dimnit," and "Nazli." For each species there are 100 photos, each with 512×512 pixel dimensions; consequently, the aggregate number of images utilized in this study amounts to 500. Notably, the acquisition of these images was facilitated through a specialized automatic illumination system. Figure 1 presents a representative sample from each class.

VGG-16 net
In this investigation, we employed the publicly available VGG-16 architecture, a deep CNN in which the number "16" denotes 16 weight layers, encompassing both convolutional and fully connected layers. The design is characterized by compact 3×3 convolutional filters with a stride of 1 arranged in a deep architecture. The pooling layers adopt a 2×2 configuration with a stride of 2 and same padding. By default, the VGG-16 network processes input images of size 224×224. Preceding the fully connected layers, a 7×7 feature map with 512 channels is produced; this feature map is then flattened into a vector of 25,088 elements (7×7×512) as the resulting feature representation [16], [17]. Figure 2 shows the structure of the VGG-16 net [18].
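The spatial dimensions quoted above can be verified with a short sketch. It traces a 224×224 input through VGG-16's five pooling stages, assuming the standard VGG-16 layout (3×3 convolutions with same padding, which preserve spatial size, and a 2×2/stride-2 max-pool closing each block):

```python
# Trace how a 224x224 input shrinks through VGG-16's five blocks.
# Convolutions (3x3, stride 1, same padding) keep the spatial size;
# only the 2x2/stride-2 max-pooling at the end of each block halves it.
def vgg16_feature_shape(size=224):
    block_channels = [64, 128, 256, 512, 512]  # output channels per block
    h = w = size
    for _ in block_channels:
        h //= 2  # max-pooling halves height
        w //= 2  # and width
    return h, w, block_channels[-1]

h, w, c = vgg16_feature_shape(224)
print(h, w, c, h * w * c)  # 7 7 512 25088
```

The 7×7×512 = 25,088-element vector matches the feature representation fed to the fully connected layers.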

VGG-19 net
VGG-19 is a feed-forward network comprising 19 layers arranged sequentially. The network uses only 3×3 convolutional filters throughout the architecture, which makes it computationally more efficient than architectures that use larger filters. The first 16 layers of VGG-19 are convolutional layers, separated into five blocks, each of which comprises several convolutional layers followed by a max-pooling layer. As the network grows deeper, the number of filters within each block increases. The convolutional layers extract salient features from the input images, whereas the fully connected layers classify images using the retrieved features. Additionally, the max-pooling layers reduce the dimensionality of the features and lessen the possibility of overfitting. A visual representation of the VGG-19 net model is shown in Figure 3 [19], [20].
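The dimensionality reduction performed by the max-pooling layers can be illustrated with a minimal sketch (plain Python on a small 4×4 feature map, assuming even dimensions; real networks apply this per channel):

```python
# Minimal sketch of 2x2 max-pooling with stride 2, the operation the
# VGG blocks use to halve feature-map dimensions: each 2x2 window is
# replaced by its maximum value.
def max_pool_2x2(fmap):
    rows, cols = len(fmap), len(fmap[0])
    return [[max(fmap[i][j], fmap[i][j + 1],
                 fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, cols, 2)]
            for i in range(0, rows, 2)]

fmap = [[1, 3, 2, 0],
        [5, 4, 1, 1],
        [0, 2, 9, 6],
        [1, 0, 3, 8]]
print(max_pool_2x2(fmap))  # [[5, 2], [2, 9]]
```

Halving both dimensions reduces the feature count by a factor of four while keeping the strongest activations, which is what curbs overfitting.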

PROPOSED SYSTEM
In this study, we propose a grapevine leaf image classification system based on the VGG-16 and VGG-19 deep learning nets. It has six main consecutive stages: (i) loading the color grapevine leaf image dataset, (ii) resizing the images, (iii) loading the deep learning nets, (iv) setting the training options for the deep learning nets, (v) testing an image from inside the dataset, and (vi) testing an image from outside the dataset. Figure 4 illustrates the structure of the proposed grapevine leaf image classification system.

Color grapevine leaves images dataset loading
This stage represents the first stage of the proposed system. It involves reading and loading a collection of color images of grapevine leaves into memory, which enables further processing, analysis, or machine learning tasks.
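The loading stage can be sketched as follows (an illustrative Python sketch, not the paper's MATLAB code; the root path and folder-per-class layout are assumptions, with the five class names taken from the dataset description):

```python
import os

# Hypothetical sketch of the loading stage: walk a root folder whose
# subfolders are the five class names and collect (path, label) pairs
# ready for later resizing and training.
CLASSES = ["Ak", "AlaIdris", "Buzgulu", "Dimnit", "Nazli"]

def list_dataset(root):
    samples = []
    for label in CLASSES:
        class_dir = os.path.join(root, label)
        for name in sorted(os.listdir(class_dir)):
            if name.lower().endswith((".png", ".jpg", ".jpeg")):
                samples.append((os.path.join(class_dir, name), label))
    return samples
```

With 100 images per class this yields the 500 (path, label) pairs used by the rest of the pipeline.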

Resize the image
This stage refers to the process of changing an image's dimensions, making it larger or smaller. When an image is resized, the number of pixels it contains changes, either increasing or decreasing its width and height. Here, each image is resized to 224×224 to match the input size expected by the VGG networks.
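A minimal sketch of the idea, using nearest-neighbour resampling on a 2-D grid (the paper resizes 512×512 color images to 224×224; a real pipeline would use an image library's resize routine rather than this hand-rolled version):

```python
# Nearest-neighbour resize: each output pixel copies the input pixel
# whose coordinates it maps onto, changing the pixel count of the image.
def resize_nearest(img, new_h, new_w):
    old_h, old_w = len(img), len(img[0])
    return [[img[i * old_h // new_h][j * old_w // new_w]
             for j in range(new_w)]
            for i in range(new_h)]

img = [[1, 2],
       [3, 4]]
print(resize_nearest(img, 4, 4))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

The same mapping works for downscaling, e.g. from 512×512 to 224×224, where pixels are dropped instead of duplicated.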

Loading the deep learning nets
This stage refers to loading pre-trained neural network models into memory in order to make predictions or to train them further on new data. Deep learning networks are typically trained on large datasets for specific tasks such as image classification. The loading process is carried out in MATLAB, and two pre-trained networks are loaded, namely VGG-16 and VGG-19.

Training options for the deep learning nets
During training, the model's parameters are optimized to reduce a selected loss function, which measures the deviation between predicted and actual outputs. Once trained, the models are saved in a specific format that allows them to be reused later without retraining: the model's architecture and learned parameters (weights and biases) are serialized to disk.
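The save-and-reload cycle described above can be sketched as follows (an illustrative Python/JSON sketch; the paper's workflow uses MATLAB's own model formats, and the file layout here is an assumption):

```python
import json

# Illustrative sketch: serialize learned parameters to disk and reload
# them later, so the model can be reused without retraining.
def save_model(path, weights, biases):
    with open(path, "w") as f:
        json.dump({"weights": weights, "biases": biases}, f)

def load_model(path):
    with open(path) as f:
        state = json.load(f)
    return state["weights"], state["biases"]
```

Whatever the serialization format, the principle is the same: architecture plus learned parameters round-trip through disk unchanged.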

Tested image from inside dataset
A tested image generally refers to an individual image or data point that is employed to assess the effectiveness of a trained deep learning model. During testing, the model generates an output from the tested image (e.g., a predicted class label for image classification tasks). The model's predictions are then compared to the ground-truth labels of the test set to compute metrics such as accuracy, precision, and recall. These metrics make it easier to assess the model's effectiveness on the assigned task by providing useful information about how it performs on previously unseen data.

Tested image from outside dataset
This stage tests an image that is not part of the original dataset used for training and evaluating the deep learning model. Instead, the image comes from a completely different dataset that the model never encountered during its training or testing phases. This simulates how the model would perform on entirely new, unseen data from a different source or distribution.

RESULTS AND DISCUSSION
Four primary criteria are used to assess the efficiency and accuracy of the suggested system, namely accuracy, precision, recall, and specificity. Accuracy is the proportion of correct predictions achieved by the system, as in (1) [21]. Precision measures the ratio of true positive detections to all positive detections, as in (2) [22]. Recall measures the proportion of ground-truth annotations that are correctly found, as in (3) [23]. Lastly, specificity quantifies the proportion of accurately identified negative values, as in (4) [24], [25]. Table 1 presents the outcomes obtained from the assessment of the proposed model using these metrics. The performance results for the VGG-16 net are as follows: accuracy 99.6%, precision 99.00%, recall 00.00%, and specificity 99.75%. For the VGG-19 net, the performance metrics achieved a perfect score of 100.00% across all metrics. According to the test findings, the VGG-19 net outperforms the VGG-16 net and is therefore deemed superior in its ability to accurately classify grapevine leaf images. Figures 5 and 6 present the precision, recall, and specificity results for each class within the dataset for the VGG-16 net and the VGG-19 net, respectively. Figures 7 and 8 show four test images randomly selected from inside and outside the dataset, used to assess the efficacy of the suggested model. The accuracy for the first image in each figure, selected from inside the dataset, is 100%, while the accuracy decreases for the remaining images from outside the dataset, depending on the strength of the network in the classification process. These figures also show the efficiency of the VGG-19 net in classifying images from outside the dataset: the accuracies of the three outside test images are 95.373%, 56.209%, and 50.022% with the VGG-16 net, versus 99.936%, 98.352%, and 71.683% with the VGG-19 net. Figure 9 illustrates the comparative analysis between the VGG-16 net and the VGG-19 net concerning accuracy, precision, recall, and specificity; notably, the results highlight the superior performance of the VGG-19 net in this image classification task. Table 2 presents a comparative analysis of our proposed strategy alongside previous studies, revealing its superior performance over the alternative approaches. Consequently, the efficacy of our proposed system has been empirically validated.
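The four evaluation metrics in (1)-(4) follow directly from the confusion-matrix counts. A short sketch with illustrative counts (not the paper's data):

```python
# Accuracy, precision, recall, and specificity from the four
# confusion-matrix counts: true positives (tp), true negatives (tn),
# false positives (fp), and false negatives (fn).
def metrics(tp, tn, fp, fn):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)      # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    return accuracy, precision, recall, specificity

# Illustrative counts for a 500-sample evaluation:
print(metrics(95, 396, 4, 5))
```

Note that specificity is computed over the negatives only, which is why a model can score high specificity while precision and recall differ.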

CONCLUSION
In conclusion, the use of the deep learning models VGG-16 and VGG-19 for the classification of grapevine leaf images has proven very successful and promising. Both models handled a challenging image recognition task, as demonstrated by their strong performance in correctly classifying grapevine leaves into different groups. Regarding overall accuracy and generalization, the deeper architecture of the VGG-19 model gave it a slight advantage over the VGG-16 model. However, when compared to traditional machine learning methods, both models showed notable gains, underscoring the effectiveness of deep learning for image classification problems. Based on the results, the VGG-16 net obtains an accuracy rate of 99.6%, whereas the VGG-19 net achieves a 100% accuracy rate. Based on this comparison, we can state that the VGG-19 net performs better than the VGG-16 net in the classification of grapevine leaf images.

Figure 1. A sample of each class of grapevine leaf
Figure 2. The structure of the VGG-16 net [18]

Figure 4. The proposed grapevine leaves image classification layout

The indicated grapevine leaves image system utilizing the VGG-16 and VGG-19 deep learning nets is evaluated using the dataset described in section 3.1. The dataset was divided into two sections, training and testing: training data made up 80% of the whole dataset, while testing data made up the remaining 20%. The evaluation of the proposed system's performance entails training two distinct networks, namely the VGG-16 net and the VGG-19 net. The classification process commences by subjecting the dataset images to all stages of our suggested model, facilitating training and accuracy assessment.
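The 80/20 split described above can be sketched as follows (illustrative Python; the shuffling seed and the list-of-samples representation are assumptions, not the paper's MATLAB procedure):

```python
import random

# Shuffle the (path, label) samples reproducibly, then cut the list at
# the 80% mark: the first part trains the network, the rest tests it.
def split_dataset(samples, train_ratio=0.8, seed=42):
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

samples = [(f"img_{i}.png", i % 5) for i in range(500)]
train, test = split_dataset(samples)
print(len(train), len(test))  # 400 100
```

For the 500-image dataset this yields 400 training and 100 testing samples, matching the proportions used in the evaluation.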

Figure 5. The results obtained for each class when utilizing the VGG-16 net
Figure 6. The results obtained for each class when utilizing the VGG-19 net

Figure 7. The accuracy of four tested images to evaluate the model using the VGG-16 net
Figure 8. The accuracy of four tested images to evaluate the model using the VGG-19 net

Table 1. The result of grapevine leaves images classification

Table 2. Comparison of our outcomes with results from earlier experiments