Agriculture, as the backbone of global food production, faces serious challenges in ensuring the health and productivity of crops. One persistent threat is the occurrence of plant diseases, which can lead to substantial yield losses, economic repercussions and implications for food security. In recent years, the intersection of agriculture and cutting-edge technology has given rise to transformative solutions; notably, the integration of machine learning techniques has emerged as a powerful tool in the domain of plant disease detection.
The current approach to plant disease detection relies on professionals identifying diseases by simple visual observation. This requires constant plant monitoring and a sizable staff of specialists, both of which are expensive on large farms. Moreover, in certain nations farmers lack the necessary resources or are unaware that they can consult professionals, so seeking the advice of specialists may be expensive and time-consuming. Under these circumstances, automatic detection is well suited to monitoring vast agricultural fields. Automatic disease diagnosis from the symptoms visible on plant leaves is both simpler and less expensive. It also allows machine vision to offer image-based robot guidance, inspection and automatic process control (Arivazhagan et al., 2013; Kulkarni and Patil, 2012; Porwal et al., 2024).
Machine learning, a subset of artificial intelligence, empowers systems to learn patterns and make predictions from data without explicit programming. In the context of agriculture and specifically plant disease detection, machine learning offers a paradigm shift from traditional methods, enabling the development of accurate, efficient and early detection systems (
Cho, 2024). This technological leap is especially critical in the face of evolving plant pathogens and the need for timely intervention to mitigate the impact of diseases on crop health. Visual detection of plant diseases is time-consuming, less reliable and applicable only in certain regions. Conversely, an automatic detection approach needs less time and effort and offers greater accuracy. Common plant diseases include brown and yellow spots, early and late scorch, as well as bacterial, viral and fungal infections. Image processing is used to quantify the diseased region and to measure the variation in color of the affected area (
Dhaygude and Kumbhar, 2013;
Bashir and Sharma, 2012). A deep convolutional encoder-decoder model created for image segmentation tasks, the SegNet architecture has shown impressive performance in a variety of computer vision applications (
Badrinarayanan et al., 2017). Deep learning (DL) techniques such as convolutional neural networks (CNNs) and deep belief networks (DBNs) have been proposed for the identification of pests and anomalies in plants. These methods have shown encouraging results in detecting and identifying lesions in digital images (
Kaur and Sharma, 2021;
Siddiqua et al., 2022; Wang, 2022;
Min et al., 2024).
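As a rough illustration of how image processing can quantify the diseased region and the color variation of the affected area, the following Python sketch assumes that a binary lesion mask and a leaf mask are already available (producing them is the segmentation problem discussed later in this review). The function names and the tiny 2 x 2 example image are hypothetical and are not taken from the cited studies.

```python
def diseased_fraction(lesion_mask, leaf_mask):
    """Fraction of leaf pixels that are also marked as lesion (masks hold 0/1)."""
    leaf_pixels = sum(v for row in leaf_mask for v in row)
    if leaf_pixels == 0:
        raise ValueError("leaf mask contains no leaf pixels")
    lesion_pixels = sum(
        les * leaf
        for lesion_row, leaf_row in zip(lesion_mask, leaf_mask)
        for les, leaf in zip(lesion_row, leaf_row)
    )
    return lesion_pixels / leaf_pixels


def mean_color(pixels, mask, keep=1):
    """Mean (R, G, B) over the pixels whose mask value equals `keep`."""
    chosen = [
        p
        for pixel_row, mask_row in zip(pixels, mask)
        for p, m in zip(pixel_row, mask_row)
        if m == keep
    ]
    if not chosen:
        raise ValueError("no pixels selected")
    return tuple(sum(p[c] for p in chosen) / len(chosen) for c in range(3))


# Hypothetical 2 x 2 RGB image: one brownish lesion pixel, three green pixels.
pixels = [[(120, 80, 40), (40, 160, 50)],
          [(50, 170, 40), (30, 150, 60)]]
leaf_mask = [[1, 1], [1, 1]]      # every pixel belongs to the leaf
lesion_mask = [[1, 0], [0, 0]]    # only the top-left pixel is diseased

area = diseased_fraction(lesion_mask, leaf_mask)
lesion_rgb = mean_color(pixels, lesion_mask, keep=1)
healthy_rgb = mean_color(pixels, lesion_mask, keep=0)
# Per-channel difference between diseased and healthy tissue.
color_shift = tuple(a - b for a, b in zip(lesion_rgb, healthy_rgb))
```

In this toy case a quarter of the leaf is affected, and the lesion is redder and less green than the healthy tissue. Real pipelines would work in a color space and at a resolution chosen for the crop in question.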
A comprehensive understanding of various aspects, such as plant biology, environmental circumstances and the dynamic interactions between crops and pathogens, is typically necessary due to the complex nature of plant diseases. Machine learning algorithms, spanning from traditional methods to advanced deep learning models, have proven effective at detecting these complex patterns. Making use of data from various sources such as imaging devices, sensors and genomic information, machine learning enables the development of models that can identify subtle disease indicators that are frequently invisible to the naked eye. This introduction sets the stage for a comprehensive exploration of the role of machine learning in plant disease detection. By examining various machine learning techniques, the incorporation of sophisticated sensors and future prospects in the domain, this review seeks to illuminate the potential of machine learning technologies for maintaining worldwide crop health. The combination of plant disease identification and machine learning represents a breakthrough that might substantially change sustainable and resilient agriculture as it seeks to meet the needs of an expanding global population and a changing environment.
Review of literature
An evaluation of several papers on machine learning (ML) and deep learning (DL) algorithms used in agriculture indicates a lack of thorough research on image-based plant disease diagnosis. The most recent assessment found that Plant Disease Detection (PDD) strategies, including segmentation, classification, localization and disease procedures, remain necessary (Gao et al., 2020). The study conducted by
Noon et al., (2020) focused on assessing the efficacy of the Convolutional Neural Network (CNN) technique in the context of Plant Disease Detection (PDD) for various plant species, including fruits and vegetables.
Ronneberger et al., (2015) presented the U-net architecture, a novel convolutional neural network (CNN) framework designed especially for biomedical image segmentation applications for early disease detection.
Early detection of plant diseases can reduce the need for defensive pesticides, which are otherwise essential for plant growth and defence. Contemporary artificial intelligence (AI) methods such as deep learning and machine learning, together with image processing (IP) based disease detection, have been proposed to assess plant health and diagnose disease
(Mohanty et al., 2016; Shah et al., 2022; Picon et al., 2019; Huang et al., 2020). Studies on the CNN technique as it is applied in the PDD by
Abade et al., (2021) and
Hasan et al., (2020) discovered that there is still room for improvement when taking visually similar-sized surroundings into account.
Kalpana et al., (2023) conducted a study to diagnose major foliar diseases in black gram (
Vigna mungo L.) using convolutional neural networks (CNNs). According to
Nagaraju and Chawla (2020), evaluating the
in-situ potential of several approaches remains vital, even where prior research has focused on a few strategies and raised substantial issues. For the categorization of plant leaf diseases,
Ghaiwat and Arora (2014) gave a summary of the many available classification methods. For the test case considered, the k-nearest-neighbor approach appears to be the most appropriate and the simplest of the methods used for class prediction. One drawback of SVM is that choosing the best parameters can be challenging when the training data are not linearly separable.
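To make the simplicity of the k-nearest-neighbor approach concrete, here is a minimal Python sketch. The two-dimensional feature vectors (imagined here as a color value and a texture value), the labels and the function name are hypothetical; they are not the features or data used in the cited survey.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training samples nearest to `query`.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical leaf features: (normalized green hue, texture contrast).
train = [
    ((0.30, 0.10), "healthy"),
    ((0.32, 0.12), "healthy"),
    ((0.28, 0.15), "healthy"),
    ((0.10, 0.60), "diseased"),
    ((0.12, 0.55), "diseased"),
    ((0.15, 0.65), "diseased"),
]

label_a = knn_predict(train, (0.11, 0.58))  # lies among the "diseased" samples
label_b = knn_predict(train, (0.31, 0.11))  # lies among the "healthy" samples
```

The method has no training phase beyond storing the samples, which is part of its appeal; the choice of k and of the distance measure still has to be tuned for real data.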
Kulkarni et al., (2012) provide a method for the precise and early identification of plant diseases through the use of artificial neural networks (ANN) and additional image processing techniques. The proposed technique, which uses a Gabor filter for feature extraction and an ANN classifier for classification, yields better results, with a recognition rate of up to 91%.
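For readers unfamiliar with Gabor filters, the sketch below builds the real part of a standard 2D Gabor kernel (a Gaussian envelope multiplied by a cosine carrier). The parameter names and the example values are assumptions chosen for illustration and are not those reported by the cited authors; in practice a bank of such kernels at several orientations would be convolved with the leaf image and the responses summarized into a feature vector for the classifier.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a 2D Gabor kernel as a size x size list of floats.

    size       : odd kernel width and height
    sigma      : standard deviation of the Gaussian envelope
    theta      : orientation of the filter in radians
    wavelength : wavelength of the cosine carrier in pixels
    gamma      : spatial aspect ratio of the envelope
    psi        : phase offset of the carrier
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Illustrative parameters only.
kernel = gabor_kernel(size=5, sigma=2.0, theta=0.0, wavelength=4.0)
```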
Kulkarni and Patil, (2012) reported that a classifier based on an artificial neural network (ANN) can identify various plant diseases by combining features, colors and textures. An efficient technique has also been described for detecting disease in
Malus domestica using texture, color and K-mean clustering (
Bashir and Sharma, 2012). It makes use of color and texture characteristics that are often present in both normal and afflicted regions to categorize and identify various types of diseases.
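The clustering step can be sketched as follows: a plain k-means loop that groups pixel colors so that diseased and healthy regions fall into different clusters. This is a minimal standard-library illustration, not the cited authors' implementation; the evenly spaced seeding, the fixed iteration count and the sample colors are all assumptions.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster `points` (tuples of numbers) into k groups; returns (centers, labels)."""
    # Deterministic seeding with evenly spaced samples (an assumption for this sketch).
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(channel) / len(cluster) for channel in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
    return centers, labels


# Hypothetical RGB samples: three greenish (healthy) and three brownish (lesion) pixels.
colors = [(40, 160, 50), (50, 170, 40), (30, 150, 60),
          (120, 80, 40), (130, 70, 50), (110, 90, 30)]
centers, labels = kmeans(colors, k=2)
```

With these samples the two clusters separate the green pixels from the brown ones. Texture features would be computed per cluster afterwards, which is beyond this sketch.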
According to
Naikwadi and Amoda, (2013), plant disease is identified by histogram matching. Since plant diseases often manifest on the leaves, edge detection and color features are used to match histograms. Support vector machines (SVMs) are highly promising AI techniques that have a wide range of applications in classification problems (Vijayaraghavan et al., 2014; Maltare et al., 2023). Support vector regression (SVR) is the term for an SVM that is used to solve regression problems. SVR is popular among researchers because it allows the solution model to generalize more broadly. One of the main reasons for crop losses in many areas is the appearance of diseases on plantations. A technique for automatically classifying cotton diseases by extracting features of foliar symptoms from digital photos is provided by
Bernardes et al., (2013). This technique employs wavelet transform energy for feature extraction and a support vector machine (SVM) for the classification itself.
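The idea of wavelet energy features can be illustrated with a one-level 2D Haar transform, in which the energy of each sub-band is the sum of its squared coefficients. The source does not state which wavelet or how many decomposition levels were used, so the Haar wavelet, the single level and the sub-band names below are assumptions. The resulting four numbers would form (part of) the feature vector passed to a classifier such as an SVM.

```python
def haar_subband_energies(image):
    """Energies of the four sub-bands of a one-level orthonormal 2D Haar transform.

    `image` is a 2D list of numbers with even height and width.
    """
    if len(image) % 2 or len(image[0]) % 2:
        raise ValueError("image height and width must be even")
    energies = {"approx": 0.0, "row_diff": 0.0, "col_diff": 0.0, "diag": 0.0}
    for r in range(0, len(image), 2):
        for c in range(0, len(image[0]), 2):
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            energies["approx"] += ((a + b + d + e) / 2) ** 2    # local average
            energies["row_diff"] += ((a + b - d - e) / 2) ** 2  # top row vs bottom row
            energies["col_diff"] += ((a - b + d - e) / 2) ** 2  # left column vs right column
            energies["diag"] += ((a - b - d + e) / 2) ** 2      # diagonal detail
    return energies


# Tiny grayscale patch used only to exercise the function.
patch = [[1, 2],
         [3, 4]]
features = haar_subband_energies(patch)
total_energy = sum(features.values())
```

Because the transform is orthonormal, the sub-band energies add up to the sum of squared pixel values of the patch, which is a convenient sanity check.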
In recent times, several methods have been introduced that reflect the growing application of AI in agriculture. Among these is the creation of advanced AI models for plant disease diagnosis through the application of deep learning (DL) methods (
Barbedo, 2019;
Chen, 2020). Notwithstanding improvements in AI-based methods, identifying plant diseases in their natural environments remains a challenge. Drones are used to monitor plant conditions and track infections. Unlike digital cameras, these semi-autonomous aircraft provide robust and dependable vision systems appropriate for a variety of crops (Liu et al., 2021).
This work offers a thorough analysis of current approaches for the identification of plant diseases, with an emphasis on popular AI and image processing techniques, including machine learning (ML) and deep learning (DL). Additionally, the study carefully assesses the salient characteristics, constraints and advantages of various approaches in practical settings.
The relevant literature was searched up until December 2023 in the PubMed database using the keywords “Machine learning” and “Plant disease detection.” There were no limitations on language. The papers were evaluated for information on the machine learning and artificial intelligence methods tested and their potential modes of action. Over the past two years, there has been a rise in scientific interest in the application of machine learning to the detection of plant diseases. The papers featured in the review were published in the period from 2014 to 2023. A total of 225 papers were published over this period (in the PubMed database).
The following criteria were used to weed out papers:
(i) They were review articles.
(ii) They were brief conference or congress abstracts.
(iii) They were published before 2023.
(iv) Their full text was not available.
Moreover, the review excludes studies that do not meet the aforementioned criteria.
Sources of information and search methodology
To locate and compile publications relevant to the areas of emphasis of our systematic review, a literature search was carried out using the PubMed database. We selected this online database because we considered it relevant to the subject and scope of the study. Search queries covered the abstract, title and keyword fields. The Boolean operators AND, OR and NOT, together with keywords drawn from the topics named in the eligibility criteria, were used to query the electronic archives of scientific papers.
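The exact query string is not given in the text, so the following Python sketch only illustrates how the two stated keywords could be combined with Boolean operators. The helper name `build_query` is hypothetical, and the use of PubMed's `[Title/Abstract]` field tag to cover the title, abstract and keyword fields is an assumption.

```python
def build_query(all_of, any_of=(), none_of=(), field="Title/Abstract"):
    """Compose a Boolean search string from quoted phrases.

    all_of  : phrases joined with AND
    any_of  : optional phrases joined with OR inside one parenthesized group
    none_of : optional phrases appended with NOT
    field   : field tag applied to every phrase (assumed, not from the source)
    """
    def tag(phrase):
        return f'"{phrase}"[{field}]'

    parts = [tag(p) for p in all_of]
    if any_of:
        parts.append("(" + " OR ".join(tag(p) for p in any_of) + ")")
    query = " AND ".join(parts)
    for phrase in none_of:
        query += f" NOT {tag(phrase)}"
    return query


# The two keywords named in the review, combined with AND.
query = build_query(["Machine learning", "Plant disease detection"])
```

Here `query` evaluates to `"Machine learning"[Title/Abstract] AND "Plant disease detection"[Title/Abstract]`. Date limits and publication-type filters would be added separately.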
Deep learning-based technology for image recognition
In contrast to alternative image recognition techniques, deep learning-based image recognition technology only requires iterative learning to identify relevant features. This allows it to acquire contextual and global image features, as well as greater recognition accuracy and robustness.
Deep learning (DL)
The idea behind Deep Learning (DL) was first presented in a research article by
Hinton et al., (2006). Deep learning is based on neural networks, which are used for feature learning as well as data analysis. Several hidden layers are used to extract low-level attributes from the data; each hidden layer may be thought of as a perceptron. These low-level attributes are combined to form abstract high-level attributes, which can greatly reduce the local minimum issue. Deep learning, which is gaining increasing attention from academia, mitigates the shortcoming of conventional algorithms that depend on hand-crafted features. Nowadays, it has been successfully applied in computer vision, natural language processing, speech recognition, recommendation systems and pattern recognition
(Liu et al., 2017).
Convolutional neural network (CNN)
CNNs, or convolutional neural networks, are networks with a complex topology that perform convolution operations. The CNN is a widely used model in deep learning. Its large model capacity and the rich information captured by its structure give it a significant advantage in image recognition. At the same time, the achievements of CNNs in computer vision tasks have contributed to the increasing acceptance of deep learning.
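The convolution operation at the heart of a CNN can be written in a few lines. The sketch below slides a small kernel over a 2D input without padding and with stride 1; as in most deep learning frameworks, the kernel is not flipped, so strictly speaking this is cross-correlation. The example image and kernel are arbitrary illustrations.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (2D lists), no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Weighted sum of the window whose top-left corner is (r, c).
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]   # responds to change along the main diagonal
feature_map = conv2d_valid(image, kernel)
```

In a trained CNN the kernel weights are learned from data rather than fixed by hand, and many kernels are applied in parallel at each layer.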
Fully convolutional network (FCN)
Semantic image segmentation is built on the fully convolutional network (FCN). At present, FCN serves as the foundation for nearly all semantic segmentation models. First, FCN uses convolution to extract and encode the features of the input image. Deconvolution or upsampling is then used to gradually restore the feature map to the size of the input image. Depending on variations in FCN network design, plant disease segmentation techniques fall into three categories: conventional FCN, U-net and SegNet.
· Conventional FCN
To address the sensitivity of conventional computer vision to varying lighting and varied backgrounds, Wang, (2022) introduced a novel approach to maize leaf disease segmentation based on fully convolutional networks, with segmentation accuracy reaching 96.26%.
· U-Net
U-net is both a classic encoder-decoder structure and a classic FCN structure. Segmentation information is recovered more easily by adding skip connections that combine the feature map from the encoding stage with the one from the decoding stage. Lin et al., (2019) used a convolutional neural network based on U-net to segment 50 cucumber powdery mildew leaves that were collected in their natural habitat.
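The connection between encoder and decoder feature maps described above (commonly called a skip connection) can be sketched as an upsampling step followed by channel-wise concatenation. In this toy Python version a feature map is a list of channels, each a 2D list; nearest-neighbour upsampling stands in for the learned up-convolution of the original U-net, which is a simplifying assumption, and the function names are hypothetical.

```python
def upsample2x(channel):
    """Nearest-neighbour upsampling of one 2D channel by a factor of two."""
    out = []
    for row in channel:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def skip_connect(encoder_maps, decoder_maps):
    """Upsample the decoder channels and concatenate them with the encoder channels."""
    upsampled = [upsample2x(ch) for ch in decoder_maps]
    enc_shape = (len(encoder_maps[0]), len(encoder_maps[0][0]))
    dec_shape = (len(upsampled[0]), len(upsampled[0][0]))
    if enc_shape != dec_shape:
        raise ValueError("spatial sizes must match after upsampling")
    return encoder_maps + upsampled  # concatenation along the channel axis


encoder_maps = [[[1, 2], [3, 4]]]   # one 2 x 2 channel kept from the encoder
decoder_maps = [[[9]]]              # one 1 x 1 channel coming up the decoder
merged = skip_connect(encoder_maps, decoder_maps)
```

The merged tensor carries both the fine spatial detail from the encoder and the coarse semantic signal from the decoder, which is why the segmentation boundaries are easier to recover.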
· SegNet
SegNet is also a classic encoder-decoder structure. What makes it unique is that the decoder's up-sampling step reuses the max-pooling indices recorded by the encoder. Kerkech et al., (2020) presented an approach for segmenting images acquired by unmanned aerial vehicles. The following four categories were identified using SegNet: shadows, the ground, healthy and symptomatic grape vines. SegNet was used to segment 480 samples from both visible and infrared images. The proposed approach had detection rates of 92% on grape vines and 87% on leaves.
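SegNet's characteristic step, in which the decoder places values back at the positions where the encoder found its pooling maxima, can be shown with a small Python sketch. It uses 2 x 2 pooling with stride 2 on a single channel; the function names and the example feature map are illustrative and not taken from the cited work.

```python
def max_pool_with_indices(fmap):
    """2 x 2 max pooling with stride 2; also returns the (row, col) of each maximum."""
    pooled, indices = [], []
    for r in range(0, len(fmap), 2):
        pooled_row, index_row = [], []
        for c in range(0, len(fmap[0]), 2):
            window = [(fmap[r + i][c + j], (r + i, c + j))
                      for i in range(2) for j in range(2)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, height, width):
    """Write each pooled value back to its recorded position; all other cells are zero."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 7, 2],
        [2, 6, 3, 1]]
pooled, indices = max_pool_with_indices(fmap)   # encoder side
restored = max_unpool(pooled, indices, 4, 4)    # decoder side
```

The restored map is sparse; in SegNet it is followed by trainable convolutions that densify it. Storing indices instead of full encoder feature maps keeps memory use low.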
Mask R-CNN
Mask R-CNN is currently one of the most widely used image instance segmentation techniques. It may be viewed as a multitask learning technique that combines a detection network with a segmentation network. When several lesions of the same type adhere to or overlap one another, instance segmentation can separate the individual lesions and count them. Semantic segmentation, on the other hand, typically treats several lesions of the same class as a single region. A Mask R-CNN model was developed by
Stewart et al., (2019) to segment maize northern leaf blight (NLB) lesions from unmanned aerial vehicle images.
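The practical difference between semantic and instance segmentation for lesion counting can be seen in a toy example. In the Python sketch below, a semantic mask only marks lesion versus background, so lesions that touch merge into one connected region, whereas an instance map (the kind of output an instance segmentation model such as Mask R-CNN is meant to provide) gives each lesion its own identifier. The maps and function names are hypothetical.

```python
def count_components(mask):
    """Count 4-connected regions of non-zero pixels in a binary mask."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:  # flood fill the region that contains (r, c)
                    y, x = stack.pop()
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
    return count


def count_instances(instance_map):
    """Count distinct non-zero instance identifiers."""
    return len({v for row in instance_map for v in row if v != 0})


# Hypothetical instance map: lesions 1 and 2 touch, lesion 3 is isolated.
instance_map = [[1, 1, 2, 0],
                [1, 2, 2, 0],
                [0, 0, 0, 3]]
semantic_mask = [[1 if v else 0 for v in row] for row in instance_map]

regions_from_semantic = count_components(semantic_mask)
lesions_from_instances = count_instances(instance_map)
```

The semantic mask yields two regions because the touching lesions merge, while the instance map correctly reports three lesions.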
The number of papers on machine learning for plant disease identification found in PubMed between 2013 and 2023 is shown in Table 1. The number of publications increased significantly over the decade, from only 4 in 2015 to 66 in 2023. This implies that the use of machine learning as a technique for plant disease diagnosis is growing in popularity. The highest counts were recorded in 2023 (66) and 2022 (62); 2020 had 30 publications. Taken as a whole, the table shows that interest in applying machine learning to plant disease detection is expanding.
Table 2 compiles the statistical information on all articles published annually between 2013 and 2023 about the application of machine learning to plant disease diagnosis. The mean is 25 publications per year with a standard deviation of 23; the most common value, or the mode, is 66 and the median is 18. Because the mean exceeds the median, the distribution of publications is skewed to the right, with some years having many more publications than others. The range, or the difference between the highest and lowest values, is 62, which shows that the number of publications has varied significantly over the years. At the 95% confidence level the margin of error is 18.24, so the true mean number of publications per year lies between 6.76 and 43.24.
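The relationships among the reported summary statistics can be checked with simple arithmetic, as in the Python sketch below. It uses only the figures quoted in the text (the per-year counts of Table 1 are not reproduced here); the minimum of 4 is taken from the 2015 count mentioned earlier, which is an assumption consistent with the stated range, and the variable and function names are illustrative.

```python
# Summary figures as quoted in the text.
reported = {"mean": 25, "median": 18, "maximum": 66, "minimum": 4, "margin_95": 18.24}


def confidence_interval(mean, margin):
    """Interval mean +/- margin, rounded to two decimals."""
    return round(mean - margin, 2), round(mean + margin, 2)


def skew_direction(mean, median):
    """A mean above the median indicates a right (positive) skew, and vice versa."""
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "none"


interval = confidence_interval(reported["mean"], reported["margin_95"])
value_range = reported["maximum"] - reported["minimum"]
skew = skew_direction(reported["mean"], reported["median"])
```

The computed interval, range and skew direction agree with the values stated above: an interval of 6.76 to 43.24, a range of 62 and a right skew.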
The synthesis of existing literature on machine learning (ML) applications in plant disease detection reveals a landscape marked by substantial progress, challenges and future potential. This discussion delves into key themes that emerge from the reviewed studies, highlighting their implications for the field and identifying avenues for further exploration.
Challenges and considerations
· Addressing data quality and diversity challenges is paramount to ensure robustness and generalization of ML models across different crops, regions and environmental conditions.
· The selection of appropriate ML algorithms requires a nuanced understanding of trade-offs between complexity, interpretability and performance, emphasizing the need for tailored approaches based on specific use cases.
· Ensuring the interpretability and explainability of ML models is crucial for building trust among end-users, researchers and policymakers, particularly in decision-critical agricultural scenarios.
· Ethical considerations, including issues of bias, fairness and the equitable distribution of benefits, necessitate careful attention to foster socially responsible applications of ML in plant disease detection.
Future directions
Future directions include advancements in sensor technologies, enhanced interdisciplinary collaboration and the exploration of explainable AI techniques to build trust in ML models. Additionally, the deployment of ML in low-resource settings, integration with precision agriculture and a focus on robustness to climate change are pivotal for ensuring widespread accessibility and applicability. Continued research into edge computing, real-time decision support and disease forecasting holds promise for empowering farmers with instantaneous insights. Furthermore, the incorporation of genomic data, global collaboration and the development of policy frameworks are critical steps towards ensuring the responsible and ethical deployment of ML technologies in agriculture. As we navigate these future directions, a collective commitment to innovation, sustainability and equitable access will propel ML toward realizing its full potential in safeguarding global crop health and ensuring food security for a growing population.