Dataset information
The study uses a Kaggle dataset that was carefully selected to enable testing with images of healthy and diseased chickpea plants. Following Hayit et al. (2023), the dataset categorizes the images into classes based on severity level: 1 (HR) indicates Highly Resistant (0%-10% plant wilt), 3 (R) indicates Resistant (11%-20% plant wilt), 5 (MR) indicates Tolerant/Moderately Resistant (21%-30% plant wilt), 7 (S) indicates Susceptible (31%-50% plant wilt) and 9 (HS) indicates Highly Susceptible (over 51% plant wilt).
The variety of chickpea plant conditions represented in the dataset makes it an invaluable resource for investigating and developing efficient classification algorithms. The dataset is a substantial collection of 4,339 leaf images, each carefully labelled according to the degree of Fusarium wilt disease severity in chickpea plants. Of these, 959 images show highly resistant plants, 1,177 show resistant conditions, 1,133 depict moderately resistant (tolerant) conditions, 558 depict susceptible conditions and 512 depict highly susceptible conditions. Models intended to identify Fusarium wilt disease in chickpea plants can be trained and evaluated on this rich and varied dataset, which captures the full range of severity levels. Fig 1 illustrates the disease severity levels in chickpea plants.
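As an illustrative sketch (the exact folder layout of the Kaggle dataset is an assumption), the five severity classes can be loaded directly from a directory-per-class structure:

```python
import tensorflow as tf

# Assumed layout (hypothetical paths): one sub-folder per severity class, e.g.
#   chickpea_dataset/1_HR, 3_R, 5_MR, 7_S, 9_HS
dataset = tf.keras.utils.image_dataset_from_directory(
    "chickpea_dataset",          # hypothetical root folder of the Kaggle dataset
    image_size=(256, 256),       # images are resized to 256x256 (see preprocessing)
    batch_size=32,               # batch size consistent with the reported shapes
    label_mode="categorical",    # one-hot labels for the 5 severity classes
)
print(dataset.class_names)       # expected: the five severity-level folders
```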
Convolutional neural network (CNN) model process
In this work, a sequential CNN model is used to detect Fusarium wilt disease in chickpea leaves. Before training, the leaf images are preprocessed (resized, normalized and augmented) to increase the accuracy of the CNN.
Image preprocessing (Resizing, normalization and augmentation)
Because the images of infected chickpeas were obtained from many sources, places and conditions, noise is inherent in the dataset, which may affect the resolution of the images and the identification outputs of the model. This noise is reduced by preprocessing each image before it is passed to the model's convolution and max pooling layers.
These procedures, which together aim to reduce the disruptive effect of noise, include image scaling, normalization and noise filtering. The main goal of these preprocessing steps is to improve the model's predictive power. All images in the collection were resized to 256×256 pixels and processed in batches of 32. Normalization is essential for effective machine learning (ML) training because raw pixel values span from 0 to 255; pixel values are therefore normalized to the range 0 to 1 to avoid the learning-process slowdowns caused by large integer values in the input image. Noise filtering is a crucial step that deals with image corruption caused by decoding errors or by positive and negative signals carried over noisy channels; the selected filtering method successfully reduces image noise. Expanding the dataset through augmentation is essential because it introduces small visual distortions that prevent overfitting during training (Fig 2). This adjustment also improves testing performance, particularly when assessing the model's behaviour on rotated images.
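A minimal sketch of this preprocessing pipeline, assuming Keras preprocessing layers, is given below; the specific augmentation operations (flips and small rotations) are assumptions, since the text only states that small visual distortions are introduced.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Resize to 256x256 and scale pixel values from [0, 255] down to [0, 1].
preprocess = tf.keras.Sequential([
    layers.Resizing(256, 256),
    layers.Rescaling(1.0 / 255),
])

# Augmentation: small visual distortions applied only during training (assumed ops).
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),      # rotations of up to ~36 degrees
])

def prepare(ds, training=False):
    """Apply resizing/normalization, and augmentation for the training split."""
    ds = ds.map(lambda x, y: (preprocess(x), y))
    if training:
        ds = ds.map(lambda x, y: (augment(x, training=True), y))
    return ds.prefetch(tf.data.AUTOTUNE)
```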
In addition, the dataset is split into training and test sets in an 80:20 ratio: twenty percent of the dataset is set aside to evaluate the model's performance and the remaining eighty percent is used to train the proposed algorithm. A validation subset is used for continuous evaluation of the model during training, which is crucial for optimizing performance by fine-tuning the hyperparameters.
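The split can be sketched as follows; carving a small validation subset out of the training portion is an assumption, since the text does not state its size.

```python
import tensorflow as tf

# 80:20 train/test split directly from the class folders (hypothetical path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "chickpea_dataset", validation_split=0.2, subset="training",
    seed=42, image_size=(256, 256), batch_size=32, label_mode="categorical")
test_ds = tf.keras.utils.image_dataset_from_directory(
    "chickpea_dataset", validation_split=0.2, subset="validation",
    seed=42, image_size=(256, 256), batch_size=32, label_mode="categorical")

# Assumed: a small validation subset taken from the training data for tuning.
val_batches = int(0.1 * train_ds.cardinality().numpy())
val_ds = train_ds.take(val_batches)
train_ds = train_ds.skip(val_batches)
```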
Feature learning and classification
Feature extraction is performed automatically by the sequential CNN model, which learns the important features from the collected images. This allows images to be classified into the Fusarium wilt-related classes (1 (HR), 3 (R), 5 (MR), 7 (S) and 9 (HS)). Within the model's sequential structure there are discrete feature extraction layers. The first convolution layer, conv2d, applies 64 filters of size 3×3 and produces an output of shape (32, 256, 256, 64). A max pooling layer with a 2×2 pool size then reduces this to an output of shape (32, 128, 128, 64). Successive convolutional and max pooling layers continue to extract features while reducing the spatial dimensions. Finally, the features are passed through Flatten and fully connected layers, ending in a softmax layer for classification.
A representation of the CNN model's convolution and max pooling layers is shown in Fig 3, and Table 1 lists the hyperparameters used for model execution. The input to the fully connected layers is the flattened output of shape (32, 2048) from the earlier layers. The dense layer contains 64 neurons, and the dense_1 layer contains 5 neurons corresponding to the five classes. The Rectified Linear Unit (ReLU) is used to introduce nonlinearity and is chosen for deep convolutional neural network (CNN) training because of its computational efficiency. The ReLU activation function passes non-negative inputs through unchanged and outputs zero for negative inputs, which helps the model train effectively. During model training, the CNN is trained on the categorized training images, and the extracted features are then used in the classification step.
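A minimal Keras sketch consistent with these shapes is given below; the number and width of the intermediate convolution/pooling blocks are assumptions, chosen only so that the flattened output reaches the stated 2,048 features.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Sequential CNN sketch. The first block (64 filters, 3x3, 2x2 pooling) and the
# dense head (64 units, 5-class softmax) follow the description above; the
# intermediate blocks are assumptions sized so that Flatten yields 2,048 features.
model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(64, (3, 3), padding="same", activation="relu"),  # -> (256, 256, 64)
    layers.MaxPooling2D((2, 2)),                                    # -> (128, 128, 64)
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),                                    # -> (64, 64, 32)
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),                                    # -> (32, 32, 32)
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),                                    # -> (16, 16, 32)
    layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
    layers.MaxPooling2D((2, 2)),                                    # -> (8, 8, 32)
    layers.Flatten(),                                               # -> 8*8*32 = 2,048
    layers.Dense(64, activation="relu"),                            # dense: 64 neurons
    layers.Dense(5, activation="softmax"),                          # dense_1: 5 classes
])
model.summary()  # prints layer-by-layer output shapes for verification
```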
In the proposed model, a softmax classification layer follows the CNN layers to estimate the probability associated with the predicted chickpea disease label. It generates output values between 0 and 1 that sum to one, each representing the estimated probability that the input image belongs to a particular class. Softmax has the following benefits: it is suitable for accepting the output from the final fully connected layer, it is fast to train and predict with, and its output is easy to interpret as a probability range. Because there is no standard approach for determining appropriate hyperparameters, finding the best values for the model requires considerable trial and error. The hyperparameter values, which include the learning rate, loss function, number of epochs, batch size and optimization algorithm, are set before training.
Filters, sometimes referred to as kernels, are used in convolution operations to systematically extract information from overlapping regions of an input image or feature map. Mathematically, the filter elements are multiplied by the corresponding input image elements and the results are summed. The dynamics of a two-dimensional convolution operation between an input image (I) and a kernel (K) can be expressed as:

$$S(i, j) = (I * K)(i, j) = \sum_{x}\sum_{y} I(i + x,\, j + y)\, K(x, y)$$
In this formulation, the coordinates of the image (I) are represented by i and j, whereas the coordinates of the kernel (K) are indicated by x and y. To reduce the probability of overfitting and improve computational efficiency, a max pooling layer is a strategic addition to the model. A fully connected layer is then added to the model, which is responsible for classifying images according to the patterns identified in earlier layers. A softmax function is used in this layer so that the incoming data can be classified reliably and interpretably. The softmax function converts the numerical values x1, x2, x3, …, xn of the neurons in the preceding layer into probabilities Q1, Q2, Q3, …, Qn:
$$Q_k = \frac{e^{x_k}}{\sum_{j=1}^{n} e^{x_j}}$$

where x_k is the numerical value of the k-th neuron in the preceding layer, j indexes the neurons in the summation, and Q_k denotes the probability of class k.
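To make these two operations concrete, the sketch below implements the two-dimensional convolution sum and the softmax mapping directly in NumPy; this is illustrative only, since the CNN framework performs these operations internally.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """S(i, j) = sum_x sum_y I(i + x, j + y) * K(x, y)  ('valid' padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the overlapping image patch and sum.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def softmax(x):
    """Q_k = exp(x_k) / sum_j exp(x_j); outputs lie in (0, 1) and sum to 1."""
    e = np.exp(x - np.max(x))        # subtract the max for numerical stability
    return e / e.sum()

image = np.arange(25, dtype=float).reshape(5, 5)       # toy 5x5 "image"
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])           # toy 2x2 filter
print(conv2d_valid(image, kernel))                     # 4x4 feature map
print(softmax(np.array([2.0, 1.0, 0.1, 0.5, -1.0])))   # 5 class probabilities
```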
The Adam optimization approach is used to train the deep learning model, reducing the error rate and minimizing the loss function. It uses an adaptive learning rate, squared-gradient scaling and a moving average of the gradient for parameter optimization. A learning rate of 0.0001 is used to balance training efficiency and model performance, while the loss function evaluates how well the model achieves its objective.
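A hedged compile-and-train sketch under these settings follows, where `model`, `train_ds` and `val_ds` refer to the earlier sketches; the categorical cross-entropy loss and the number of epochs are assumptions, since the text specifies only the optimizer and learning rate here and lists the remaining hyperparameters in Table 1.

```python
from tensorflow.keras.optimizers import Adam

model.compile(
    optimizer=Adam(learning_rate=1e-4),      # learning rate of 0.0001 as stated
    loss="categorical_crossentropy",         # assumed loss for 5 one-hot classes
    metrics=["accuracy"],
)

history = model.fit(
    train_ds,                # preprocessed/augmented training data (80%)
    validation_data=val_ds,  # validation subset used to tune hyperparameters
    epochs=30,               # assumed value; the actual epoch count is in Table 1
)
```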
Parameters of the evaluation metrics
Accuracy, precision and recall are three widely used metrics that were used to evaluate the performance of the proposed approach. Accuracy is the ratio of correctly identified samples to the total number of inputs, and the error rate is the percentage of incorrectly detected values. Accuracy is a dependable measure that reflects performance across every class, allowing a thorough evaluation. Another important metric is precision, computed as the ratio of true positives to the sum of true positives and false positives. Recall represents the ability of the model to locate the relevant instances within a dataset. The F1-score, ranging from 0 to 1, serves as a balance between precision and recall in performance evaluation. The mathematical expressions for accuracy, precision, recall and F1-score are:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Here TP (true positive), FP (false positive), FN (false negative) and TN (true negative) are the entries of the confusion matrix.
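These metrics can be computed from the trained model's predictions, for example with scikit-learn; this is a sketch in which `model` and `test_ds` (the 20% test split) refer to the earlier sketches.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Collect true labels and predicted classes over the test set.
y_true, y_pred = [], []
for images, labels in test_ds:
    probs = model.predict(images, verbose=0)          # softmax probabilities
    y_pred.extend(np.argmax(probs, axis=1))           # predicted class index
    y_true.extend(np.argmax(labels.numpy(), axis=1))  # true class from one-hot label

print(confusion_matrix(y_true, y_pred))               # per-class TP/FP/FN/TN counts
print(classification_report(
    y_true, y_pred,
    target_names=["1(HR)", "3(R)", "5(MR)", "7(S)", "9(HS)"]))  # precision/recall/F1
```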