The method used to build a deep convolutional neural network (CNN) model for faba bean leaf disease identification is described in the following sections. It comprises several major stages, the first of which is the collection of images for deep neural network classification.
Dataset
The CNN model requires a sizable dataset to be processed. In this work, the data was captured with the help of agricultural experts. The dataset contains 8021 images of faba bean leaves captured in the fields, as summarized in Table 1. It is separated into four categories: one healthy category and three disease categories, namely Rust, Faba Bean Gall and Chocolate Spot (Fig 1 and Table 1). In this study, the entire dataset was split into training and held-out (testing and validation) data in an 80:20 ratio.
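An 80:20 split of this kind can be sketched as follows; this is a minimal illustration, and the file names below are hypothetical stand-ins for the 8021 field images:

```python
import random

def split_dataset(paths, train_frac=0.8, seed=42):
    """Shuffle image paths and split them into training and held-out sets."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = paths[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical file list standing in for the field images.
paths = [f"faba_{i:04d}.jpg" for i in range(8021)]
train, held_out = split_dataset(paths)
print(len(train), len(held_out))  # 6416 1605
```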
Image processing and labelling
Image pre-processing was used to enhance or modify the raw images before training the CNN classifier. Images acquired from various sources may have different dimensions, so it is necessary to resize and rescale the pictures to ensure that all images share the same dimensions. Given the computational cost of handling larger images, this procedure is essential for consistency as well as for speeding up the training process. Scaling the data to a standardized size and format is essential before feeding it into the network. Using 224 × 224 input images is standard practice in established models, and aligning our network's input size with these standard dimensions facilitates a meaningful comparison with state-of-the-art models.
The images were therefore first resized to 224 × 224 pixels to normalize their size, and then converted to grayscale. A significant amount of training data is needed at this pre-processing stage for the explicit learning of the training data features. The next stage involved sorting the images of faba bean leaves by type and labeling each image with the appropriate disease acronym. Both the training and the test datasets contained the four classes (Table 1).
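The resize-and-grayscale step can be sketched as follows; the nearest-neighbour resize and channel-mean grayscale conversion are simplifications standing in for whatever library routine was actually used:

```python
import numpy as np

def preprocess(img, size=224):
    """Nearest-neighbour resize to size×size, grayscale, rescale to [0, 1].
    A minimal stand-in for the paper's resize + grayscale pre-processing."""
    h, w, _ = img.shape
    rows = np.arange(size) * h // size          # source row for each output row
    cols = np.arange(size) * w // size          # source column for each output column
    resized = img[rows][:, cols]                # (size, size, 3)
    gray = resized.mean(axis=2).astype(np.float32)  # luminance via channel mean
    return gray / 255.0                         # rescale pixel values to [0, 1]

img = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print(preprocess(img).shape)  # (224, 224)
```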
Training dataset
This step involved applying the Convolutional Neural Network (CNN) process to generate a model for performance evaluation using image data as input. The steps involved in normalizing images of Faba bean leaves are depicted in Fig 2.
Convolutional neural network (CNN) model
In the CNN model, images can be efficiently filtered by the convolution operation because of their matrix structure. The Convolutional Neural Network used for data training comprises an input layer, convolutional layers, pooling layers, a fully connected layer, a dropout layer and a final classification layer. Each layer applies an ordered set of operations to its input. Table 2 provides the critical architectural and training parameters of the Convolutional Neural Network (CNN) employed in this study.
The CNN incorporates eight convolutional layers employing 32 and 64 filters to extract intricate features from the input data. The architecture is further enriched with four max-pooling layers that effectively down-sample the spatial dimensions. Dropout, a regularization technique, is applied with rates of 0.25 and 0.4 in specific layers, enhancing the model's generalization ability. Uniform weight initialization, the Rectified Linear Unit (ReLU) activation function and a low learning rate of 0.00001 contribute to the network's stability. The training regimen spans 75 epochs with a batch size of 32. Fig 3 presents the complete architecture.
Feature extraction process
The present work employs a Convolutional Neural Network (CNN) to classify faba bean leaf diseases. The study is conducted within the Jupyter Notebook environment of the Anaconda platform, and the categorization task covers four classes. The neural architecture is constructed with the Sequential API from the Keras package. A Conv2D layer is initialized with 32 filters of 3 × 3 dimensions, with the input shape parameter set to (224, 224, 3) to reflect the dimensions of the RGB input images. The subsequent Rectified Linear Unit (ReLU) activation function introduces non-linearity to the model, a crucial component of feature extraction. After the convolutional layer, a MaxPooling2D layer with a 2 × 2 pool size is included to improve feature abstraction and downsample the spatial dimensions. This architectural pattern is repeated twice with additional Conv2D layers of 64 and 128 filters, respectively, each combined with ReLU activation and max-pooling. These layers capture the complex structure and spatial information hidden in the input images. A flattening layer then converts the three-dimensional feature maps into a single one-dimensional vector. Next, a dense layer with 128 neurons and ReLU activation acts as a feature aggregator, adding substantial representational capacity to the model. A Dropout layer with a dropout rate of 0.5 is inserted into the architecture to counter overfitting: by deactivating 50% of the neurons during training, it effectively improves the model's capacity for generalization.
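The layer stack described above can be sketched with the Keras Sequential API. This is a minimal reconstruction, not the authors' exact script: the precise layer count per block and the choice of Adam as optimizer are assumptions where the text is silent.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the described stack: 32/64/128-filter Conv2D blocks with 3x3
# kernels, ReLU, 2x2 max-pooling, a 128-unit dense layer, 0.5 dropout and
# a four-class softmax output.
model = keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                     # deactivates 50% of neurons in training
    layers.Dense(4, activation="softmax"),   # four leaf classes
])
# Learning rate 0.00001 as stated in the text; Adam is an assumed choice.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
print(model.output_shape)  # (None, 4)
```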
Convolution layer
Convolution is a type of specialized linear operation that involves multiplying a filter and an input matrix element by element, followed by a summation at each location in a feature map. A 3 × 3 kernel is slid over the entire input matrix. The output value at the corresponding position of the output tensor, also known as a feature map, is obtained by summing the element-wise products of each kernel element and the corresponding input tensor element at each location. Over a two-dimensional input image (I) and a two-dimensional kernel (K), the convolution operation is defined as follows:

S(i, j) = (I ∗ K)(i, j) = Σ_m Σ_n I(i + m, j + n) K(m, n)

Where
m and n = Kernel (K) coordinates.
i and j = Image (I) coordinates.
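The operation can be illustrated with a small NumPy sketch (note that, as in most deep-learning frameworks, the kernel is applied without flipping, i.e. as a cross-correlation):

```python
import numpy as np

def conv2d(I, K):
    """Valid cross-correlation, the operation CNN layers compute:
    S(i, j) = sum over m, n of I(i+m, j+n) * K(m, n)."""
    kh, kw = K.shape
    oh, ow = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    S = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Element-wise product of the kernel and the window, then summed.
            S[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return S

I = np.arange(16.0).reshape(4, 4)
K = np.ones((3, 3))        # a 3x3 kernel, as in the text
print(conv2d(I, K))        # each entry is the sum of a 3x3 window of I
```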
After every convolution operation, the output volume W_output × H_output × D_output of the convolution layer is calculated as follows:

W_output = (W − F + 2P) / S + 1
H_output = (H − F + 2P) / S + 1
D_output = K

Where
W and H = Width and height of the input image matrix.
K = Number of filters applied to the image matrix.
F = Filter size.
P = Number of zero paddings.
S = Stride size.
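The output-size formula translates directly into code; the example values (a 224 × 224 input with a 3 × 3 filter, no padding, stride 1) are illustrative:

```python
def conv_output_size(W, F, P, S):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (W - F + 2 * P) // S + 1

# 224x224 input, 3x3 filter, no padding, stride 1 -> 222x222 output
print(conv_output_size(224, 3, 0, 1))  # 222
# With one pixel of zero padding the spatial size is preserved:
print(conv_output_size(224, 3, 1, 1))  # 224
```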
Pooling layer
Max-pooling is the most popular type of pooling technique. It reduces the number of parameters in the network and eases computational demands by surveying larger image areas, which lowers the resolution of an output from a convolutional layer. The idea behind max-pooling is that the network recognizes unique elements like edges, curves and circles for a given image. The hypothesis states that more activated pixels have higher values. As a result, max-pooling chooses the most active pixels, forward-propagating these high values while eliminating less active pixels with lower values.
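Max-pooling with a 2 × 2 window and stride 2, as used in the network above, can be sketched in NumPy as follows:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pooling with stride 2: keep the most active pixel per window."""
    h, w = x.shape
    # Trim odd edges, reshape into 2x2 blocks, take the maximum of each block.
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 6, 5, 7],
              [8, 2, 1, 0],
              [3, 9, 4, 2]])
print(max_pool_2x2(x))  # [[6 7]
                        #  [9 4]]
```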
Performance evaluation parameters
a) Accuracy
Accuracy, the ratio of the total number of correct predictions (both true positives and true negatives) to the total number of predictions, is one of the most widely used performance evaluation metrics. It usually indicates whether a model is being trained correctly and provides an estimate of its overall performance. Accuracy is calculated as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where
TP = True positive.
TN = True negative.
FP = False positive.
FN = False negative.
b) Precision
This parameter indicates how often the model's positive predictions are correct. It is calculated by dividing the number of correct positive predictions by the total number of predicted positive labels:

Precision = TP / (TP + FP)
c) Recall or sensitivity
Recall measures the classifier's completeness. It is the proportion of the positive observations in the dataset that are correctly predicted as positive. The aim of computing this metric is to discover as many of the positive labels as possible. The following is the recall calculation formula:

Recall = TP / (TP + FN)
d) F1 Score
The F1 score is the harmonic mean of precision and recall. An F1 score of 1 indicates perfect performance, while a score of 0 indicates complete failure. The following formula is used to obtain the F-measure:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
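The four metrics above can be computed directly from the confusion-matrix counts; a minimal sketch with illustrative (hypothetical) counts:

```python
def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration only.
print(metrics(90, 85, 10, 15))
```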
Steps used in algorithm
• Collect color images of faba bean leaves.
• Use convolutional neural network (CNN)-based segmentation to generate a mask from the given color image.
• Overlay the original color image with the generated mask to produce a new masked image.
• Divide the masked image into smaller regions known as tiles (Ktiles).
• Classify each tile (Ktiles) from the masked image into the category of “Faba bean.”
• Identify and analyze the classified tiles (Ktiles) to pinpoint regions indicative of a diseased part of the leaf.
• Conclude the process once the disease-detection steps are complete.
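The steps above can be sketched as a small pipeline; the tile size, the class labels and the `classify` function are hypothetical placeholders, not the authors' implementation:

```python
import numpy as np

def tile(image, k=56):
    """Split a masked image into k x k tiles (the Ktiles of the algorithm)."""
    h, w = image.shape[:2]
    return [image[r:r + k, c:c + k]
            for r in range(0, h - h % k, k)
            for c in range(0, w - w % k, k)]

def detect_disease(image, mask, classify):
    """Sketch of the listed steps: overlay the mask, tile the result,
    classify each tile, and collect the tiles flagged as diseased.
    `classify` is a hypothetical per-tile classifier returning a label."""
    masked = image * mask[..., None]   # overlay the mask on the colour image
    return [t for t in tile(masked) if classify(t) == "diseased"]

img = np.ones((224, 224, 3))
mask = np.ones((224, 224))
hits = detect_disease(img, mask,
                      lambda t: "diseased" if t.mean() > 0.5 else "healthy")
print(len(hits))  # 16 tiles of 56x56 in a 224x224 image
```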