The study focused on utilising transfer learning techniques to improve grape leaf disease classification accuracy more efficiently. Fig 1 shows images of healthy and disease-affected grape leaves. Transfer learning models, which perform well in image recognition with limited datasets, were chosen for the study (Cho O.H., 2024). These techniques are memory efficient and require less computational time. The architectures of the transfer learning models, namely VGG16, VGG19, DenseNet121, InceptionV3 and ResNet50V2, are studied as follows:
VGG16 model
The VGG16 model is known for its low complexity and efficacy. Fig 2 shows the architecture of the VGG16 model. Convolutional blocks are the main component of VGG16, each followed by a max-pooling layer. These blocks stack small (3×3) filters over several levels to generate detailed representations of the input images.
Mathematically, the completely linked layers operate as follows:
Z=f(WX+b) ..........(1)
Where,
X= Feature vector from the preceding layers.
W= Weight matrix.
Z= Resultant classification values.
The values are passed through the non-linear activation function f, allowing the network to make the classification decision.
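A minimal Keras sketch of such a classification head on top of a frozen VGG16 base is given below; the 224×224 input size, the 256-unit hidden layer and the two output classes are illustrative assumptions, not the exact configuration used in the study.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Assumed settings: 224x224 RGB inputs and two output classes (healthy / diseased).
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # keep the pretrained convolutional blocks fixed

model = models.Sequential([
    base,
    layers.Flatten(),                       # X: feature vector from the preceding layers
    layers.Dense(256, activation="relu"),   # Z = f(WX + b) with a non-linear f
    layers.Dense(2, activation="softmax"),  # resultant classification values
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])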
VGG19 model
The framework has 19 layers, including 16 convolutional layers for feature extraction and three fully connected layers that classify images into the different disease types (Assad et al., 2023).
The mathematical illustration of the steps within the convolutional stages is as follows:
Y= W*X+b ..........(2)
Where,
Y= Resultant feature map.
W= Filter weight.
X= Input image.
b= Bias term.
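As a toy numerical illustration of Eq. (2), the following sketch slides an assumed 3×3 filter over a small made-up input and adds a bias; all values are arbitrary and chosen only for demonstration.
import numpy as np

# Toy illustration of Eq. (2): slide a 3x3 filter W over a small input X and add a bias b.
X = np.arange(25, dtype=float).reshape(5, 5)   # made-up 5x5 input "image"
W = np.array([[1., 0., -1.],
              [1., 0., -1.],
              [1., 0., -1.]])                  # filter weights (a simple edge detector)
b = 0.5                                        # bias term

Y = np.zeros((3, 3))                           # valid output size: (5-3+1) x (5-3+1)
for i in range(3):
    for j in range(3):
        Y[i, j] = np.sum(W * X[i:i+3, j:j+3]) + b   # one entry of the resultant feature map
print(Y)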
This method can be illustrated as:
Z=f(WX+b) ..........(3)
Where,
X= Feature vector from the preceding layers.
Z= Resultant category values.
To classify the results, the function f applies a non-linear transformation. Because of its performance in image recognition tasks, the VGG19 model is a preferred option for grape disease prediction, as shown in Fig 3 (Rudenko et al., 2023).
DenseNet121 model
The DenseNet121 architecture, known for its densely connected layer pattern, is used in our approach. For grape disease prediction with DenseNet121, we use a dataset of 2,207 images, each labelled with the health status of the grape leaf.
Mathematically, the computations within the dense blocks can be represented as:
Xi+1= H [Conv1 (BN (Relu (Conv1(Xi))))] ...........(4)
Here,
Xi= Feature maps at the i-th layer.
BN= Batch normalization.
Conv1= 3×3 convolution operation.
Relu= Rectified linear unit activation function.
H= Concatenation operation.
The input image is categorized into one of a thousand classes by the final fully connected layer. For image classification applications, especially when training data is limited, DenseNet121 is a reliable and effective framework, as shown in Fig 4.
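A simplified functional-API sketch of one dense block is shown below; the input feature-map shape and growth rate are illustrative assumptions, and the 1×1 bottleneck convolutions of the full DenseNet121 are omitted for brevity.
import tensorflow as tf
from tensorflow.keras import layers

def dense_layer(x, growth_rate=32):
    # Composite function of Eq. (4): BN -> ReLU -> 3x3 convolution
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth_rate, kernel_size=3, padding="same")(y)
    # H: concatenate the new feature maps with all incoming ones
    return layers.Concatenate()([x, y])

inputs = tf.keras.Input(shape=(56, 56, 64))    # assumed feature-map shape, for illustration only
x = inputs
for _ in range(4):                             # a small dense block with four layers
    x = dense_layer(x)
model = tf.keras.Model(inputs, x)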
InceptionV3 model
We use the InceptionV3 architecture, as indicated in Fig 5, for grape disease identification. The network is built of modules, each including several convolutional layers with different kernel sizes. Rather than depending on a single large convolution, InceptionV3 factorizes it into smaller convolutions; for instance, a single 5×5 convolution is replaced by two stacked 3×3 convolutions.
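A quick back-of-the-envelope parameter count (ignoring biases, and assuming an illustrative channel count of 64 in and 64 out) shows why this factorization is cheaper:
C = 64                                  # assumed number of input and output channels
params_5x5 = 5 * 5 * C * C              # one 5x5 convolution (biases ignored)
params_two_3x3 = 2 * (3 * 3 * C * C)    # two stacked 3x3 convolutions, same receptive field
print(params_5x5, params_two_3x3)       # 102400 vs 73728, roughly 28% fewer parameters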
The softmax layer can be described statistically as follows:
P(class=i)= e^(zi) / Σj e^(zj) ..........(5)
Where,
P(class=i)= Likelihood that the supplied image belongs to class i.
zi= Raw score for class i; the denominator sums the exponentiated scores over all classes.
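A small NumPy sketch of this softmax computation is given below; the raw scores for the two classes are illustrative values only.
import numpy as np

def softmax(z):
    # Eq. (5): exponentiate each raw score and normalise by the sum over all classes
    e = np.exp(z - np.max(z))    # subtracting the maximum improves numerical stability
    return e / e.sum()

z = np.array([2.0, 0.5])         # illustrative raw scores for the two classes
print(softmax(z))                # about [0.82, 0.18]; the probabilities sum to 1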
The versatility and efficacy of the InceptionV3 architecture (Agarwal et al., 2019) make it a suitable choice for the accurate prediction of grape diseases.
ResNet50V2 model
Fig 6 depicts the ResNet50V2 design, which is divided into four phases, each containing residual blocks. The first phase consists of a single 7×7 convolutional layer followed by a 3×3 max-pooling layer. The subsequent phases each consist of several residual blocks built from 3×3 convolution kernels.
The ResNet50V2 residual block architecture will be described below:
X(i+1)= F(Xi , Wi)+ Xi ..........(6)
Where,
Xi and X(i+1)= Feature maps of the input and output, respectively.
Wi= Weights of the convolutional layers within the block.
F= Residual mapping learned by the block.
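A simplified Keras sketch of such a pre-activation residual block is given below; it uses two 3×3 convolutions with an identity shortcut rather than the full bottleneck design of ResNet50V2, and the feature-map shape is an assumption for illustration.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # F(Xi, Wi): two 3x3 convolutions with pre-activation (BN -> ReLU), in the V2 style
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # Eq. (6): X(i+1) = F(Xi, Wi) + Xi (identity shortcut; assumes matching channel counts)
    return layers.Add()([y, x])

inputs = tf.keras.Input(shape=(56, 56, 64))    # assumed feature-map shape, for illustration only
outputs = residual_block(inputs)
model = tf.keras.Model(inputs, outputs)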
Data pre-processing
In the present analysis, a dataset of 2,207 images divided into two categories, "healthy" and "disease", was used, as shown in Fig 7. The research began by importing the necessary software tools and modules, including well-known packages such as TensorFlow, to simplify the procedure.
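A minimal loading sketch with TensorFlow is shown below; the folder name "grape_leaves", the 224×224 image size and the 80/20 split are assumptions for illustration only, not the study's exact pre-processing settings.
import tensorflow as tf

# The folder name, image size and split ratio below are assumptions for illustration only.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "grape_leaves",              # hypothetical folder with "healthy" and "disease" sub-folders
    validation_split=0.2,
    subset="training",
    seed=42,
    image_size=(224, 224),
    batch_size=32,
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "grape_leaves",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=(224, 224),
    batch_size=32,
)
print(train_ds.class_names)      # expected: ['disease', 'healthy']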