Chief Editor: J. S. Sandhu
Print ISSN 0250-5371
Online ISSN 0976-0571
NAAS Rating 6.80
Impact Factor 0.8 (2023)
Background: Machine learning has shown remarkable promise in recent years for use in areas such as pattern detection and categorization. The diagnosis of diseases is crucial in agriculture since they are a natural occurrence in plants. The easiest and most effective way to identify crop disease is through the use of image processing, computer vision and machine learning techniques.
Methods: To identify and categorize cotton leaf diseases, the study compares the effectiveness of established techniques such as Support Vector Machine (SVM) and random forest with state-of-the-art convolutional neural network (CNN) methods and architectures such as Inceptionv3, VGG16 and ResNet50, using data augmentation and transfer learning.
Result: The models were trained with four distinct categories of plant photos that were manually gathered from a government agency and a farm. It was also noted that the performance of the resulting models improved as the quantity of training data increased.
Recent advances in machine learning, neural networks and computer vision have led to intriguing new developments in a wide variety of industries, including healthcare, transportation, business analytics and agriculture (Bhosale et al., 2023). Agriculture plays an important part in India’s overall economy. The only way to keep up with the ever-changing climate and surroundings is to automate and improve upon tried-and-true procedures and methods of disease detection. Significant advancements in the agricultural industry are necessary to meet the demand and supply challenges for the targeted crop. Regular crop monitoring and fast disease identification in the event of an infected crop may allow for maximum agricultural production (Jain and Jaidka, 2023). Cotton farmers in India often struggle with issues related to leaf diseases. Fungi are responsible for grey mildew, leaf spot and reddening, whereas viruses are responsible for leaf curl. Production suffers in both quality and quantity as a consequence. Continuous monitoring in agricultural settings helps boost crop yields by facilitating the identification of problems at an early stage and the application of appropriate remedies. There are several approaches to identifying plant diseases. Certain diseases do not exhibit any external symptoms; in these instances, a comprehensive examination is vital (Shreelakshmi and Raju, 2023).
Machine learning is an area of artificial intelligence that enables computers to improve their predictive abilities without being explicitly reprogrammed. Its influence is gradually spreading across all areas of research, and almost everyone uses it regularly without being aware of it. The continued success of machine learning depends on software that can learn from data. To improve its ability to make accurate predictions and judgments in the future, a model first looks for patterns in the datasets or observations that are provided. This suggests that artificial intelligence (AI) systems may someday be able to teach themselves and reach their own conclusions without human assistance (Ingole and Padole, 2023). Machine learning strategies are either supervised or unsupervised. In this paper, we explore the use of supervised machine-learning techniques (Asha, 2023).
Because they reduce agricultural output and quality, leaf diseases are becoming an increasing issue for farmers. To effectively monitor broad agricultural regions, it is very necessary to create systems that are capable of immediately recognizing and classifying crop leaf diseases the moment they become visible on the leaf (Kanaga et al., 2022).
Cotton and types of leaf diseases
Cotton is a soft, fluffy staple fibre that grows in a boll, or protective casing, around the seeds of plants belonging to the mallow family Malvaceae. The fibre has a high cellulose content. Cotton bolls have the potential to aid in the natural distribution of seeds. The yarn or thread spun from the fibre is used to make textiles that are breathable and lightweight. Cotton has a long history of being used as a textile fibre, as shown by the fact that relics of cotton fabric have been found dating back to the Indus Valley Civilization, which thrived in the fifth millennium B.C. The most reliable sign that a crop is diseased is the appearance of spots on the leaf. On the lower surface of every cotton leaf lies a small cup-shaped structure, commonly known as a nectar-holding organ. Several different insects are drawn to the plant because of this nectar, as well as the fact that its stem is moist (Malunao et al., 2022). Viruses, bacteria and fungi are the three most common agents responsible for destroying a plant. A virus may enter the plant through a lesion or through mechanical damage to the plant or leaf, and it then disrupts the plant’s normal development. Viruses cause diseases such as cotton leaf curl. Bacteria cause diseases that may manifest in a variety of plant parts, such as the stems, leaves, roots and flowers (Jha et al., 2022). Bacterial blight is an example of a bacterial disease. Fungal diseases spread quickly over a whole plant, and fungicides are employed to control them. Grey mildew is a fungal disease. Diseases that affect the leaves of the cotton crop cause significant problems.
The following significant diseases were chosen for this research based on expert opinion and the availability of relevant data: Alternaria leaf spot, leaf reddening and grey mildew (Bhartiya et al., 2022).
The great majority of farmers still rely on manual inspection to identify leaf diseases; however, their diagnoses are often incorrect. They make diagnoses based on past experience or on information gathered from a local network of agricultural experts. Insecticides may be applied even before a disease outbreak occurs, and the wrong pesticide may have both immediate and long-term effects on plant development and growth (Kumar et al., 2020). Continuous monitoring increases agricultural productivity because early causes can be discovered and preventive steps taken. Cotton Leaf Disease Classification (CLDC) can be performed accurately using image processing and machine learning techniques, and applying the most effective of these procedures may improve the efficiency and accuracy of disease detection (Annabel et al., 2019).
In the past, researchers have utilized image processing and machine learning to automatically detect leaf diseases in plants. Recently, deep learning has been used for detection and classification in medical imaging and satellite imaging, with promising results; in this study, we adapt this technique for use in agriculture (Korkut et al., 2018).
This article describes in depth the structure of the research approach used to study crop leaf disease detection and classification. The methodology was tailored to the subject matter. The study’s rationale and the chosen methodology are explained, and the instrument used for data collection and the protocols followed are described (Ramesh and Vydeki, 2018). Two methods for leaf disease detection and classification are evaluated here: traditional machine learning and deep learning. In the deep learning technique, transfer learning is also employed. The machine learning strategies for identifying and categorizing crop leaf diseases are shown systematically in Fig 1. The following components are included in the various methods for leaf disease identification and classification (Sarangdhar and Pawar, 2017).
MATERIALS AND METHODS
The absence of easily available data is one of the biggest problems facing researchers today. Most research relies on data that has been collected and kept private by the authors. Data gathering was therefore one of the most demanding parts of the present study. The images were gathered from the relevant government agencies and nearby farms in September and October, when cotton leaf diseases are at their peak. The image data was collected under the oversight of agricultural professionals and in compliance with accepted practices. The images were captured at different times and from different perspectives and were saved in JPEG and PNG formats. Additionally, several pictures came from a reliable government source (Durmus et al., 2017). The datasets are described in detail below. Alternaria leaf, Gray Mildew, Leaf Reddening and Healthy leaf are the four categories of data included in this collection (Fig 2, Fig 3, Fig 4 and Fig 5).
To bring the data into a form suitable for analysis, an original picture must undergo a sequence of adjustments known as preprocessing. Cotton pictures may vary in brightness due to weather conditions, individual cameras, or other causes, which produces an uneven distribution of intensity throughout the picture and shows up as noise. Images of the same leaf taken by the same camera at various times of the year might look quite different from one another. To ensure that every leaf appears to have the same brightness, we used intensity normalization. Images are uniformly scaled to match the input dimensions required by the models being used. Then, the disease categories are encoded using the LabelEncoder utility in Python (Ashourloo et al., 2016).
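As an illustration of the preprocessing described above, the following minimal Python sketch applies intensity normalization to synthetic images and encodes disease labels as integers. The exact normalization used in the study is not stated, so min-max scaling is an assumption; the function names and the synthetic data are hypothetical, and `np.unique` stands in here for a label encoder that assigns integers to class names in sorted order.

```python
import numpy as np

def normalize_intensity(image):
    """Rescale pixel values to the range [0, 1] with min-max scaling."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:  # flat image: avoid division by zero
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)

def encode_labels(names):
    """Map class names to integer codes, assigned in sorted name order."""
    classes, codes = np.unique(np.asarray(names), return_inverse=True)
    return classes, codes

# Two synthetic 4x4 RGB "leaf" images with different brightness ranges.
rng = np.random.default_rng(0)
dark = rng.integers(0, 100, size=(4, 4, 3), dtype=np.uint8)
bright = rng.integers(120, 256, size=(4, 4, 3), dtype=np.uint8)
normalized = [normalize_intensity(img) for img in (dark, bright)]

classes, codes = encode_labels(
    ["Healthy", "Gray Mildew", "Alternaria", "Healthy"]
)
```

After normalization both images span the same range from 0 to 1, regardless of how dark or bright the originals were, and each category name maps to a stable integer code.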
Dataset selection, preprocessing and splitting
• Obtain a large dataset of cotton leaf images labeled by disease category.
• Verify that the dataset includes representations of a range of conditions and variations in cotton leaf diseases.
• Clean and preprocess the dataset, addressing issues such as noise, outliers and inconsistencies.
• Perform image augmentation to increase the diversity of the dataset, ensuring better model generalization.
• Divide the dataset into training, validation and testing sets to train and evaluate the models effectively.
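The final step in the list above can be sketched in plain Python as follows. The 70/15/15 split ratio, the random seed and the function name are assumptions made for illustration; the study does not state the proportions it used. The split is stratified so that every disease category appears in each subset in the same proportion.

```python
import random
from collections import defaultdict

def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=42):
    """Return (train, val, test) index lists that preserve class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test

# Hypothetical dataset: 40 images for each of the four categories.
labels = (["Alternaria"] * 40 + ["Gray Mildew"] * 40
          + ["Leaf Reddening"] * 40 + ["Healthy"] * 40)
train_idx, val_idx, test_idx = stratified_split(labels)
```

Keeping the test indices separate from the start helps ensure that augmented copies of training images never leak into the evaluation set.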
Python is used for programming because of its wide library and prominence in machine learning.
Machine Learning Libraries and Deep Learning Architectures
• Use Scikit-learn for implementing traditional machine learning models such as Support Vector Machines (SVM) and Random Forest.
• Implement Convolutional Neural Networks (CNN) using TensorFlow and Keras for deep learning.
• Implement and compare deep learning architectures like Inceptionv3, VGG16 and ResNet50 to evaluate their performance.
• Employ Scikit-learn metrics for evaluating model performance, including accuracy, precision, recall and F1-score.
Workflow of the algorithm
The procedure followed for evaluating the results is presented in Fig 6.
Before the classification models analyze the photos, the photos are converted into NumPy arrays and their RGB values are normalized. One of the most typical issues with picture data is irregularity within the dataset: some images are the wrong size or shape, some are rectangular instead of square and so on. Overfitting also occurs often when the training set contains too little data. With the ImageDataGenerator pre-processing class in Keras, we can address these issues by augmenting the training set with synthetic pictures to boost the model’s classification accuracy. In addition to augmentation, this study uses batch normalization and dropout layers in the CNN model to improve validation accuracy (Dutta et al., 2014; Kalpana et al., 2023). In this research, we use the following settings of the Keras ImageDataGenerator class to augment our data:
• Rotation range: Rotates the loaded image randomly by up to the specified number of degrees.
• Width shift range: Shifts the image along the horizontal axis by a fraction of its width, with typical values between 0 and 1.
• Height shift range: Shifts the image along the vertical axis by a fraction of its height, with typical values between 0 and 1.
• Shear range: Slants the image by fixing one axis and stretching the image according to the defined shear angle.
• Zoom range: Zooms in on portions of an image at random, allowing algorithms to train better on those highlighted features.
• Horizontal flip: Flips an image horizontally, which helps the model generalize.
• Fill mode: Determines how pixels left empty by a transformation are filled.
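To make two of the settings above concrete, the following NumPy sketch implements a horizontal flip and a width shift with nearest-edge filling by hand. It is an illustrative approximation of what an augmentation library does internally, not the Keras implementation; the function names and the shift value of 0.25 are assumptions, since the study does not report the values it used.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an H x W x C image left to right."""
    return image[:, ::-1, :]

def width_shift(image, fraction):
    """Shift an image along the horizontal axis by a fraction of its width.

    Columns exposed by the shift are filled with the nearest edge column,
    which is one possible fill mode.
    """
    width = image.shape[1]
    offset = int(round(fraction * width))
    cols = np.clip(np.arange(width) - offset, 0, width - 1)
    return image[:, cols, :]

rng = np.random.default_rng(0)
leaf = rng.random((8, 8, 3))       # synthetic stand-in for a leaf photo
flipped = horizontal_flip(leaf)
shifted = width_shift(leaf, 0.25)  # content moves 2 pixels to the right
```

In Keras, the corresponding `ImageDataGenerator` arguments are `rotation_range`, `width_shift_range`, `height_shift_range`, `shear_range`, `zoom_range`, `horizontal_flip` and `fill_mode`, and the transformations are drawn at random for each batch.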
In this research, we apply the aforementioned image augmentation to the training dataset to improve the generalization of the neural network models, namely the transfer learning models and the convolutional neural network. For the convolutional neural network method, the only modification made to the test dataset is batch normalization (Bhosale et al., 2023).
Integration of ML and TL techniques
Several models were developed and compared against one another to see which was the most effective. Two distinct techniques were used to develop the models. The first was to build models from scratch, which entailed carrying out procedures such as segmentation and feature extraction before applying SVM and random forest (RF) and training them using only the research data. The second approach involved deep learning models with numerous layers, such as Inception v3, VGG16 and ResNet, all of which are CNN-based. These leverage weights pre-trained on the ImageNet dataset and use transfer learning methods (Jain and Jaidka, 2023). Only the last few layers were retrained on the study data. In addition, several experiments were done to fine-tune the parameters of the adopted models to find the most effective model for detecting and labeling cotton diseases (Shreelakshmi and Raju, 2023).
Process and evaluation method
Since this study is concerned with classification, accuracy is one of the metrics used to assess the effectiveness of the model. Models developed from scratch and those developed via transfer learning are also compared using precision, recall and F1 score. What each of these indicators entails for the present study is briefly discussed below. A true positive (TP) occurs when both the observed and the predicted values of an event are positive. A false positive (FP) occurs when the predicted value is positive while the observed value is negative. A true negative (TN) occurs when both the observed and the predicted values are negative. A false negative (FN) occurs when the predicted value is negative while the observed value is positive (Ingole and Padole, 2023). In other words, tuples that the classifier correctly labels as positive are true positives and tuples correctly labeled as negative are true negatives, whereas negative tuples incorrectly labeled as positive are false positives and positive tuples mislabeled as negative are false negatives. Accuracy, the basis of the ratings in Table 1, is the proportion of all tuples that are labeled correctly: (TP + TN) / (TP + TN + FP + FN).
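The four counts defined above determine every metric used in this study. The short Python sketch below derives them for one class from a confusion matrix and computes precision, recall, F1 score and accuracy using the standard definitions. The confusion matrix values are made up for illustration and are not results from the study; the function names are hypothetical.

```python
def per_class_counts(confusion, k):
    """Return (TP, FP, FN, TN) for class k; rows are actual, columns predicted."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fp = sum(confusion[i][k] for i in range(n)) - tp
    fn = sum(confusion[k]) - tp
    tn = total - tp - fp - fn
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F1 = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def accuracy(confusion):
    """Fraction of all samples that lie on the diagonal of the matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)

# Made-up counts for two classes, for illustration only.
confusion = [[8, 2],
             [1, 9]]
tp, fp, fn, tn = per_class_counts(confusion, 0)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
overall_accuracy = accuracy(confusion)
```

For a four-class problem such as this one, the per-class values are typically averaged across classes to obtain a single precision, recall and F1 figure.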
Among supervised machine learning methods, random forest stands out. It is popular because it is effective, easy to use and flexible. Its nonlinear character and its ability to perform both classification and regression make it adaptable to many kinds of data and situations. It is called a “forest” because it consists of many decision trees, whose outputs are combined to provide the most accurate predictions. A forest generally gives a more precise answer than a single decision tree because it considers a greater number of groupings and decisions. Furthermore, it introduces randomness into the model by picking the best feature from a subset of features chosen at random. Overall, these properties lead to a model with a great deal of diversity among its trees (Asha, 2023).
Support vector machine
The support vector machine is a supervised learning method that may be used for both regression and classification problems. Its primary goal is to locate a hyperplane that effectively separates the features of the various classes. The method selects the hyperplane that maximizes the margin, that is, the distance between the hyperplane and the nearest data points, known as support vectors. When the classes are not linearly separable, as is the case with cotton leaf disease classification, it employs soft margins and kernel methods, with polynomial and RBF kernels being the most common, to improve outcomes (Kanaga et al., 2022).
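The two kernels named above can be written down in a few lines. The sketch below computes them for pairs of feature vectors in plain Python; the parameter names `gamma`, `degree` and `coef0` follow common SVM library conventions, and their values here are assumptions, since the study does not report the kernel settings it used.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (dot(x, y) + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree

same = rbf_kernel([0.2, 0.7], [0.2, 0.7])  # identical points
far = rbf_kernel([0.0, 0.0], [3.0, 4.0])   # points 5 units apart
poly = polynomial_kernel([1.0, 2.0], [3.0, 4.0])
```

The RBF kernel behaves as a similarity score that equals 1 for identical points and decays toward 0 as points move apart, which is what lets the SVM draw curved decision boundaries in the original feature space.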
Transfer learning is a deep learning approach in which a neural network is first trained on a problem analogous to the one at hand. Some or all of the layers of the trained model are then incorporated into a new model. Transfer learning shortens the time it takes to train a model and decreases the computing power needed because the network is already trained. By adding a few additional dense layers to the previously trained network, it can learn to recognize photos from a different dataset. Using the features and weights already learned by the pre-trained models, the VGG16, ResNet50 and Inception-v3 transfer learning models are applied to a cotton plant leaf image dataset (Malunao et al., 2022).
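The core idea, keeping the pre-trained layers fixed and training only a new dense layer on top, can be illustrated without a deep learning framework. In the NumPy sketch below, a fixed random projection stands in for the pre-trained base network and a softmax layer is the only part that is trained. Everything here (the stand-in base, the synthetic data, the learning rate and the number of steps) is an assumption made for illustration and does not reproduce the models used in the study.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained base network (hypothetical): a fixed random
# projection followed by ReLU. It is never updated during training.
base_weights = rng.normal(size=(10, 16))
base_before = base_weights.copy()

def frozen_features(x):
    return np.maximum(x @ base_weights, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    return float(-np.log(probs[np.arange(len(y)), y] + 1e-12).mean())

# Synthetic 4-class data standing in for leaf images.
class_means = 2.0 * rng.normal(size=(4, 10))
y = rng.integers(0, 4, size=200)
x = class_means[y] + rng.normal(size=(200, 10))

features = frozen_features(x)
features = features / features.max()  # scale to [0, 1] for stable steps
one_hot = np.eye(4)[y]

head = np.zeros((16, 4))              # new dense layer: the only trained part
loss_start = cross_entropy(softmax(features @ head), y)
for _ in range(300):
    probs = softmax(features @ head)
    head -= 0.1 * features.T @ (probs - one_hot) / len(y)
loss_end = cross_entropy(softmax(features @ head), y)
```

Because only the 16 × 4 head is updated, training is cheap, which mirrors the reduction in training time and computing power described above.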
VGG16 is a widely used and relatively simple convolutional neural network (CNN) (Fig 7). The network has a total of 16 weighted layers. Only 3×3 convolutional layers are used in this architecture, with the number of filters increasing with layer depth. Five max-pooling layers reduce the volume size, and two fully connected layers of 4096 neurons each are followed by a softmax classifier. In this research, we use the VGG architecture for image identification in the diagnosis of leaf diseases in plants (Wang, 2022).
ResNet was described as a game-changing strategy for creating deep neural network models by stacking a large number of residual blocks without increasing the number of parameters or the computational complexity. ResNet-50 (residual neural network), a variant of ResNet that is 50 layers deep, was trained on more than one million images from the ImageNet collection. Most ResNet models use skip connections that bypass two or three layers containing nonlinearities (ReLU) and batch normalization. Models that use an additional weight matrix to learn the skip weights are known as HighwayNets. The ResNet-50 architecture is a sequence of convolutional blocks followed by average pooling, with softmax as the last stage of classification. ResNet-50 has five convolutional stages, labeled conv1, conv2_x, conv3_x, conv4_x and conv5_x. In the first stage (conv1), the input picture is passed through a convolutional layer with 64 filters and a 7 × 7 kernel size, followed by a max-pooling layer with a stride of 2. Because of the way residual networks are connected, the layers in conv2_x are grouped into blocks: each block contains a layer with a 1 × 1 kernel and 64 filters, a layer with a 3 × 3 kernel and 64 filters and a layer with a 1 × 1 kernel and 256 filters, and the block is repeated three times. These layers lie between the initial pooling layer and the conv3_x stage (Kanaga et al., 2022).
The Inception-v3 model is a later generation of the Inception architecture, whose first version is known as GoogLeNet and was introduced in “Going Deeper with Convolutions” (Fig 8). The Inception module’s purpose is to lower the model’s computational cost by factorizing large filters into smaller convolutions and applying aggressive regularization via label smoothing. Fig 8 shows that a 5×5 convolution filter costs 25/9 = 2.78 times as much as a 3×3 convolution; replacing it with two stacked 3×3 layers (3×3 + 3×3 = 18 weights instead of 25) therefore reduces the number of parameters by 28% (Malunao et al., 2022).
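The arithmetic behind these figures can be checked directly. The sketch below counts the weights of a single 5×5 filter and of two stacked 3×3 filters, assuming one input and one output channel and ignoring biases; the function name is hypothetical.

```python
def conv_weights(kernel, channels_in=1, channels_out=1):
    """Weights in one kernel x kernel convolution layer (biases ignored)."""
    return kernel * kernel * channels_in * channels_out

single_5x5 = conv_weights(5)         # 5 * 5 = 25 weights
stacked_3x3 = 2 * conv_weights(3)    # 3 * 3 + 3 * 3 = 18 weights
reduction = 1 - stacked_3x3 / single_5x5
cost_ratio = conv_weights(5) / conv_weights(3)
```

Two stacked 3×3 layers cover the same 5×5 receptive field as the single larger filter, which is why the substitution saves parameters without shrinking the region each output can see.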
RESULTS AND DISCUSSION
Support vector machine (SVM) results
A Support Vector Machine (SVM) classification model is developed and then tested using stratified k-fold cross-validation for the four types of cotton leaves in the dataset. Stratified k-fold cross-validation divides the data into training and testing folds while preserving the class proportions in each fold. RGB images, the Lab colour space and the HSV colour space are each used independently in this study (Annabel et al., 2019). Four performance metrics are used to evaluate the results: Table 2 reports accuracy, precision, recall and F1 score. HSV testing yields higher overall categorization accuracy than either Lab or RGB images. The Lab colour representation stands out with the highest values in accuracy (90.2%), precision (90.6%), recall (91.5%) and F1 score (85.3%). Classification accuracy and F1 score are both higher for Lab and HSV-converted photos than they are for RGB images, as shown in Table 2.
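As a small illustration of why a colour-space conversion can help, the following sketch converts RGB pixels to HSV with Python's standard `colorsys` module. The toy pixel values and the function name are assumptions; the study's own conversion code is not given, and Lab conversion is omitted because it requires an external image library.

```python
import colorsys

def rgb_image_to_hsv(image):
    """Convert nested lists of (R, G, B) pixels in 0-255 to (H, S, V) in 0-1."""
    return [[colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
             for (r, g, b) in row]
            for row in image]

# A 1 x 3 toy image: bright green, dark green and a brownish lesion colour.
image = [[(0, 255, 0), (0, 128, 0), (139, 69, 19)]]
hsv = rgb_image_to_hsv(image)

bright_green, dark_green, lesion = hsv[0]
```

The bright and dark green pixels share the same hue and differ only in the value channel, so features built on hue are less sensitive to the brightness variations described in the preprocessing section, while the lesion pixel has a clearly different hue.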
The validation results in Table 3 show that the SVM’s prediction accuracy (90%) exceeds that of Random Forest (86%).
Convolutional neural network (CNN) architecture
To begin, we used the convolution, max-pooling and other pre-existing layers in the deep learning package Keras to construct a basic CNN. The accuracy of this custom-made, multi-layer CNN was evaluated. The model was built from scratch with 11 convolution layers using 32, 64, 128 and 256 filters and was trained for 25 epochs (Korkut et al., 2018). The optimal learning rate and batch size were found through a series of experiments that fine-tuned the hyperparameters. Over 25 epochs, the model attained an accuracy of 85.31 percent using a learning rate of 0.0001 and a batch size of 32. The calculated F1-score is 83.75. The model’s confusion matrix, which compares the observed and predicted class counts, is illustrated in Fig 9. The CNN model’s training and validation accuracy is shown in Fig 10, while its training and validation loss is depicted in Fig 11.
Table 4 presents the metric related to the model performance. The Convolutional Neural Network (CNN) model demonstrated strong performance on the task, with an accuracy of 85.31%, precision of 83.5%, recall of 85.25% and an F1 Score of 83.75%. The precision indicates that 83.5% of the positive predictions made by the CNN were accurate, while the recall value of 85.25% signifies the model’s ability to capture the majority of positive instances. The F1 Score, a harmonic mean of precision and recall, reflects a balanced trade-off between the two. Overall, the high accuracy and well-balanced precision, recall and F1 Score values suggest that CNN is effective in accurately classifying instances.
Comparatively, the SVM model, particularly using Lab and HSV color representations, demonstrated slightly higher accuracy than the CNN model (90.2% vs. 85.31%). On the other hand, the CNN model demonstrated a similar compromise between recall and precision, indicating successful instance classification. Both models have their advantages; the CNN performs well in identifying intricate patterns from the images, while the SVM is particularly effective in some color spaces. Depending on the particulars and demands of the task at hand, one may choose between the two.
CONSENT FOR PUBLICATION
- Annabel, L.S.P., Annapoorani, T., Deepalakshmi, P. (2019). Machine learning for plant leaf disease detection and classification- a review. International Conference on Communication and Signal Processing (ICCSP), Chennai, India. 0538-0542.
- Asha, V. (2023). An enhanced deep learning algorithms for image recognition and plant leaf disease detection. Second International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), Trichy, India. 611-615.
- Ashourloo, D., Aghighi, H., Matkan, A.A., Mobasheri, M.R., Rad, A.M. (2016). An investigation into machine learning regression techniques for the leaf rust disease detection using hyperspectral measurement. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing. 9: 4344-4351.
- Bhartiya, V.P., Janghel, R.R., Rathore, Y.K. (2022). Rice leaf disease prediction using machine learning. Second International Conference on Power, Control and Computing Technologies (ICPC2T), Raipur, India. 1-5.
- Bhosale, Y.H., Zanwar, S.R., Ali, S.S., Vaidya, N.S., Auti, R.A., Patil, D.H. (2023). Multi-plant and multi-crop leaf disease detection and classification using deep neural networks, machine learning, image processing with precision agriculture- A review. International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India. 1-7.
- Durmuş, H., Güneş, E.O., Kırcı, M. (2017). Disease detection on the leaves of the tomato plants by using deep learning. 6th International Conference on Agro-Geoinformatics, Fairfax, VA, USA. 1-5.
- Dutta, R., Smith, D., Shu, Y., Liu, Q., Doust, P., Heidrich, S. (2014). Salad leaf disease detection using machine learning based hyper spectral sensing. SENSORS, IEEE, Valencia, Spain. 511-514.
- Ingole, K., Padole, D. (2023). Design approaches for internet of things based system model for agricultural applications. 11th International Conference on Emerging Trends in Engineering and Technology-Signal and Information Processing (ICETET - SIP), Nagpur, India. 1-5.
- Jain, S., Jaidka, P. (2023). Mango Leaf disease Classification using deep learning hybrid model. International Conference on Power, Instrumentation, Energy and Control (PIECON), Aligarh, India. 1-6.
- Jha, A., Purohit, M., Maurya, V., Tripathy, A.K. (2022). Plant leaf disease detection and classification based on machine learning model. IEEE Bombay Section Signature Conference (IBSSC), Mumbai, India. 1-5.
- Kalpana, M., Karthiba, L., Senguttuvan, K., Parimalarangan, R. (2023). Diagnosis of major foliar diseases in black gram (Vigna mungo L.) using convolution neural network (CNN). Legume Research. 46(7): 940-945. https://doi.org/10.18805/LR-5083.
- Kanaga, P.P., Ashwini, J., Anushalini, R., Divya, G. (2022). Image segmentation enhanced with machine learning techniques for automatic detection of plant leaf diseases. 8th International Conference on Smart Structures and Systems (ICSSS), Chennai, India. 1-6.
- Korkut, U.B., Göktürk, Ö.B., Yildiz, O. (2018). Detection of Plant Diseases by Machine Learning. 26th Signal Processing and Communications Applications Conference (SIU), Izmir, Turkey. 1-4.
- Kumar, S., Prasad, K., Srilekha, A., Suman, T., Rao, B.P., Krishna, J.N.V. (2020). Leaf disease detection and classification based on machine learning. International Conference on Smart Technologies in Computing, Electrical and Electronics (ICSTCEE), Bengaluru, India. 361-365.
- Malunao, D.C., Tamargo, R.S., Sandil, R.C., Cunanan, C.F., Merin, J.V., Jallorina, R.D. (2022). Deep convolutional neural networks-based machine vision system for detecting tomato leaf disease. IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT), Bangalore, India. 1-5.
- Prabavathy, K., Bharath, M., Sanjayratnam, K., Reddy, N.S.S.R., Reddy, M.S. (2023). Plant leaf disease detection using machine learning. 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), Salem, India. 378-382.
- Ramesh, S., Vydeki, D. (2018). Rice blast disease detection and classification using machine learning algorithm. 2nd International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE), Ghaziabad, India. 255-259.
- Sarangdhar, A.A., Pawar, V.R. (2017). Machine learning regression technique for cotton leaf disease detection and controlling using IoT. International conference of Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India. 449-454.
- Shreelakshmi, C.M., Raju, C. (2023). Revolutionizing crop management: An emphasis on ginger leaf disease detection techniques using machine learning and iot. International Conference on Data Science and Network Security (ICDSNS), Tiptur, India. 1-5.
- Wang, H. (2022). Intelligent identification of logging cuttings based on deep learning. Energy Reports. 8: 1-7.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.