Image dataset
For the CNN model, a well-labeled, high-quality dataset of animal images with normal and lumpy skin disease was collected and divided into two groups representing the diseased and normal conditions. The dataset contained 1,023 images spanning different categories such as age, breed and severity of disease (Table 1). Data for this study were collected with the help of specialists in the field of cattle farming. Specifically, the collection included both LSD and non-LSD skin images, with a sufficient number of images kept in each category. Lumps or nodules on the skin of the animal were the main clinical indication of this disease (Fig 1). These could also be seen on the skin, cups and mucous membranes and could vary in size
(Datten et al., 2023). The dataset was split into a training set and a validation set with an 80:20 ratio, where 80% of the data was used for training and 20% was used for validation.
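As an illustration of this split, the sketch below performs an 80:20 stratified partition of the image file paths; the variables image_paths and labels are hypothetical placeholders for the collected dataset and are not defined in the paper, and the use of scikit-learn here is an assumption rather than the authors' stated tooling.

```python
from sklearn.model_selection import train_test_split

# Illustrative 80:20 split of the collected images into training and
# validation subsets. "image_paths" and "labels" are assumed lists of
# file paths and LSD / non-LSD labels built from the dataset.
train_paths, val_paths, train_labels, val_labels = train_test_split(
    image_paths,
    labels,
    test_size=0.2,       # 20% of the data reserved for validation
    stratify=labels,     # preserve the class balance in both subsets
    random_state=42,
)
```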
Data pre-processing
The first step in the image-based ML technique, known as the pre-processing stage, takes the input dataset and prepares the images optimally by removing unwanted noise
(Salvi et al., 2021). To keep the images uniform, they were resized to 256 × 256 pixels. This is important because the model requires uniformly sized input images to analyze the data as efficiently as possible. This was followed by normalization, where each image is transformed into a set of pixel values that align more closely with familiar or standard ranges
(Hosakoti et al., 2021). For this, each pixel value in an image was divided by 255, the maximum value for its bit-depth, to ensure that the pixel values range between 0 and 1
(Johnson et al., 2019). Image normalization is a common procedure in image processing that alters the range of pixel intensity values. This normalization enables improved convergence during the model’s training. Each image was then labelled with the appropriate health category. The training and validation datasets allowed for the identification of two classes: LSD and non-LSD skin images.
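A minimal sketch of these pre-processing steps, assuming the Keras image utilities and a placeholder image path, is shown below.

```python
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# Resize each image to 256 x 256 pixels and scale its pixel values from
# [0, 255] down to [0, 1]; "path/to/image.jpg" is a placeholder path.
img = load_img("path/to/image.jpg", target_size=(256, 256))
arr = img_to_array(img)   # array of shape (256, 256, 3), values 0-255
arr = arr / 255.0         # divide by 255 so values lie between 0 and 1
```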
Data augmentation
The data augmentation technique is used to generate new images from the existing dataset in order to increase its diversity and variability. During this augmentation process, images may be randomly shifted, rotated, zoomed and flipped vertically or horizontally. Data augmentation is essential for increasing the model’s capacity to generalize effectively to unseen data while simultaneously reducing overfitting.
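One possible implementation of this augmentation, using Keras' ImageDataGenerator with illustrative parameter ranges (the exact ranges are not reported in the paper), is sketched below.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation covering the transformations mentioned above: random
# shifts, rotations, zoom and vertical/horizontal flips. The rescale
# factor applies the 1/255 normalization from the pre-processing step.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,        # random rotation (degrees), illustrative value
    width_shift_range=0.1,    # random horizontal shift
    height_shift_range=0.1,   # random vertical shift
    zoom_range=0.2,           # random zoom
    horizontal_flip=True,
    vertical_flip=True,
)
```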
Model selection
A sequential CNN is utilized as the pre-trained model for the classification of images. The CNN architecture has shown excellent performance on large-scale classification tasks and readily identifies complex details in images.
Transfer learning
This method utilizes the trained CNN model as a feature extractor to take advantage of transfer learning. In this approach, the weights of the initial layers are frozen and only the weights of the subsequent layers are adjusted to make them specific to the lumpy skin disease dataset. This methodology takes advantage of the depth of knowledge that the model has gained from a large dataset while conserving training time and processing resources. The procedure followed for the CNN model is presented in Fig 2.
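A minimal sketch of this freezing strategy is shown below; the cut-off of eight frozen layers is purely illustrative, as the paper does not state how many layers were frozen, and `model` stands for the pre-trained CNN being adapted.

```python
# Freeze the weights of the earlier layers and leave only the later
# layers trainable, so they can specialize to the LSD dataset.
# The index 8 is an assumed, illustrative cut-off point.
for layer in model.layers[:8]:
    layer.trainable = False
for layer in model.layers[8:]:
    layer.trainable = True
```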
Model architecture
The defined CNN model is designed for binary classification with two classes. The model architecture included six convolutional layers (Conv2D) with an increasing number of filters (32, 64) and 3×3 kernels, each followed by a max-pooling layer (MaxPooling2D) for spatial downsampling. After flattening the feature maps, the network passes the data through a dense layer (Dense) with 64 units and ReLU activation. A final dense layer with softmax activation then generates the output probabilities for each class. The architecture follows a standard pattern for image classification: features are extracted by the convolutional layers and non-linear relationships are captured by the fully connected layers. Certain hyperparameters, such as layer depths and filter counts, may require fine-tuning depending on the dataset’s properties. The important architecture and training parameters of the CNN used in this study are listed in Table 2.
The sequential CNN architecture is built from six convolutional layers, each using 32 or 64 filters to extract complex characteristics from the input data. Six max-pooling layers are added to the design to efficiently down-sample the spatial dimensions. A regularization strategy called dropout is applied strategically at a rate of 0.5 to improve the model’s generalization. The network is kept stable by its uniform weight initialization, Rectified Linear Unit (ReLU) activation function and modest learning rate of 0.0001. The training protocol consists of 75 epochs with a batch size of 32 examples. Fig 3 shows a sketch of the overall architecture.
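Under these stated settings, the layer stack could be assembled in Keras roughly as follows; the exact ordering of the 32- and 64-filter layers and the 2×2 pooling windows are assumptions based on the description above and Table 2, not a verbatim reproduction of the authors' code.

```python
from tensorflow.keras import layers, models

# Sketch of the described sequential CNN: six Conv2D layers with 32 or
# 64 filters and 3x3 kernels, six max-pooling layers, dropout of 0.5,
# a 64-unit ReLU dense layer and a softmax output for the two classes.
model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),                    # dropout regularization, rate 0.5
    layers.Dense(64, activation="relu"),    # fully connected layer with 64 units
    layers.Dense(2, activation="softmax"),  # class probabilities: LSD vs. non-LSD
])
```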
The process for the CNN model can be briefly described as follows. It involves processing the input image denoted as (x, y), with dimensions N × M, where (x, y) belongs to the set of real numbers. Equation 1 is used to compute the histogram of the image.
$$h_f(k) = O_j \qquad (1)$$
Where
$h_f(k)$ represents the histogram of an image.
$f$ is the frequency of events.
$O_j$ [$j$ = 1, 2, 3, …, (k−1)] = events of grayscales.
Using the above formula, the range of infected images is given by equation 2:
$$\tilde{h}_f(k) = h_f(k)\,[I_j]_{k_1}^{k_n} \qquad (2)$$
Where,
$J$ = pixel values.
$I$ = affected area.
$k_1$ to $k_n$ = range of the infected region.
$\tilde{h}_f(k)$ = complete infected region.
The total image variable negatives are determined using equations 3 and 4.
The weight matrix and bias matrix of the convolutional layer are shown in equation 5 as follows:
Where
$W$ = weight matrix of the $l$-th layer.
$b$ = bias matrix of the $l$-th layer.
$S$ = properties of the first convolutional layer.
$(x, y)$ = an enhanced image.
Next, the ReLU activation layer was added. The subsequent convolutional layer had a filter size of [3, 3], a stride of [1, 1], 64 channels and 64 filters. The ReLU activation function was used to normalize this layer’s properties. A max-pooling layer with a filter size of [2, 2] and a stride of [2, 2] was then applied. The complete model is compiled from six convolutional and max-pooling layers followed by dropout, flatten and dense layers.
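A training-configuration sketch consistent with the parameters listed in Table 2 (learning rate 0.0001, 75 epochs, batch size 32) is given below; it reuses the `model` defined above, and `x_train`, `y_train`, `x_val` and `y_val` are hypothetical placeholders for the pre-processed training and validation arrays.

```python
from tensorflow.keras.optimizers import Adam

# Compile and train the model with the reported hyperparameters:
# Adam optimizer with learning rate 0.0001, 75 epochs, batch size 32.
model.compile(
    optimizer=Adam(learning_rate=1e-4),
    loss="categorical_crossentropy",   # assumes one-hot labels for the two classes
    metrics=["accuracy"],
)
history = model.fit(
    x_train, y_train,                  # placeholder pre-processed training data
    validation_data=(x_val, y_val),    # placeholder validation data
    epochs=75,
    batch_size=32,
)
```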
Evaluation metrics
To evaluate the model’s performance, various metrics, including F1-score, Accuracy, Recall and Precision, are determined. These metrics are calculated from the numbers of true positive (TP), true negative (TN), false positive (FP) and false negative (FN) samples.
Accuracy
The ratio of correctly identified tests (positive or negative) to the total number of samples is known as accuracy.
Precision
Precision is defined as the ratio of correct positive predictions to all positive predictions made; it quantifies how many of the predicted positives are actually correct.
Recall
Recall is a metric that measures the number of correct positive predictions made out of all positive predictions that could have been made, i.e., all actual positive samples.
F1-Score
The F1-score is a single metric used to summarize model performance. When one of the metrics (precision or recall) is high and the other is low, the F1-score balances the model’s performance between the two.
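For reference, the standard formulas for these metrics in terms of TP, TN, FP and FN are:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$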