India is the seventh-largest country by area, occupying 328 million hectares (Mha), or 2.4% of the world's land area (Bhattacharyya et al., 2015). To feed its ever-increasing population, India has raised food grain production 6.2-fold, from a meagre 51 million tons (Mt) in 1950-51 to 314 Mt. Moreover, the production of fruits and vegetables increased 13.3 times, i.e. from 25 to 333 Mt, during the same period (Pathak et al., 2022). Providing safe and healthy food to the 600 million Indians expected to reside in urban areas by 2030 is the biggest challenge, as land, water and air, the important natural resources, are continuously deteriorating (Hoegh-Guldberg et al., 2018). The onset of disease in plants may seriously affect the growth of the crop. Various infectious agents, i.e. pathogens like fungi, viruses, bacteria, nematodes, protozoa, prokaryotes and eukaryotes, are causative agents of disease in plants (Abdulkhair and Alghuthaymi, 2016). Pathogen-generated diseases can spread from one plant to another and destroy plant cell tissues, roots, leaves, stems and fruit (Savary et al., 2012). These diseases can cause a significant loss of total agricultural production (Goel and Nagpal, 2023).
Farmers rely on traditional ways of identifying crop diseases, chiefly visual inspection of the crops (Pham et al., 2020). Sometimes even agricultural specialists are unable to recognize the specific disease affecting a plant. The reasons are the high degree of complexity, the variety of cultivated plants and the conventional techniques involved in symptomatic identification of plant leaf ailments, which lead to the selection of wrong protection treatments for plants (Ferentinos, 2018). Conventional scientific methods consist of gathering samples from the field and analysing them chemically; these methods are labour-intensive and time-consuming and have a limited scope for disease detection in plants (Kang et al., 2023). Early detection of plant illnesses is the way forward to prevent crop damage and increase the quality and productivity of crops. For effective control of an ailment, it is crucial to use accurate and reliable ways to assess the severity of diseases in plants. Artificial intelligence (AI) and computer vision have been extensively employed to recognize plant leaf diseases, classify plant species and estimate the severity of plant diseases. Continuous development in electronic hardware performance and computer imaging technology in recent years has enabled the scientific community to develop modern tools for disease diagnosis in agriculture (Liang et al., 2019). Machine learning (ML) and deep learning (DL) approaches are increasingly employed to recognize crop ailments from imaging data, combining feature extraction with classification. The illness classification and recognition strategies aim to assist non-expert users, i.e. those who are neither pathologists nor botanists. These techniques have been extensively utilized to detect diseases like early and late blight, scab, black rot, powdery mildew, bacterial blight, brown leaf spot and leaf rust in fruit, vegetable and cereal crops using ML and DL models (Joseph et al., 2024; Patil and More, 2025; Shafik et al., 2024; Mehta et al., 2025).
These technological developments in computer vision have the potential to decrease labour costs, save time and improve overall crop quality and yield. Thus, this systematic review aims to analyze the existing methods for detecting and classifying plant ailments with numerous ML and DL architectures. It also examines different optimization and classification techniques with respect to datasets, accuracy, difficulties and outcomes. It can help farmers plan appropriate treatments and help researchers create more dependable prediction systems. Additionally, this review will assist academics in determining how learning algorithms might be tailored to fit more complex models, making disease prediction more accurate for use in agriculture. In light of this, recommendations and suggestions for the future are also offered, which may help researchers working in this context to make progress in forecasting techniques.
Contribution and organization of the paper
This study provides an in-depth comparison of current research works applying AI-based methodologies. The majority of the methods in the various papers report significant results and are based on ML and DL frameworks. The review includes an analysis of various crop illnesses, the distinct augmentation techniques used to enlarge datasets, the features extracted and the feature extraction techniques, and crop disease recognition and classification using DL and ML architectures. It also covers the challenges of recognizing and categorising plant illnesses with AI-based methods. Following the introduction, the paper is arranged as follows. The research methodology and approaches used to select the literature are discussed in Section 2. Section 3 discusses the challenges faced by researchers in conducting research. The conclusion of this paper and potential future directions are covered in Section 4.
The PRISMA guidelines are followed in this systematic literature review, using keywords such as plant disease detection, crop disease recognition and image processing using ML and DL. Fig 1 shows the screening phases of the PRISMA flowchart, which guides the selection of papers from those collected. This review evaluates the relevance of 45 selected papers on crop disease recognition and categorization using deep learning and machine learning architectures. The papers are empirically investigated, with results tabulated and graphed to answer the research questions.
Literature review
A comprehensive evaluation of current models was carried out to address the following research questions, which were formulated for this systematic review.
RQ1: What are the most prevalent diseases which impact various plants?
RQ2: Which methods are employed for data augmentation?
RQ3: Which feature extraction techniques are employed in plant disease detection?
RQ4: Which ML and DL techniques are utilized to identify crop diseases?
RQ5: Which datasets are frequently employed to identify and categorize plant ailments?
RQ6: Which challenges are faced by the researchers in the classification process?
Analysis of various diseases detected in plants
Timely, fast, accurate and automatic recognition of plant diseases is critically required to reduce the horizontal spread of crop infections and the associated yield and economic losses. Utilizing modern image analysis tools to identify plant diseases automatically will also lower the labour costs of closely monitoring crops for potential infections. Using leaf images, three different kinds of diseases were identified in lady finger, i.e. yellow vein mosaic, powdery mildew and leaf spot (Sahithya et al., 2019). Different leaf and fruit diseases of guava, such as whitefly, rust, algal leaf spot, fruit canker, fruit rot and anthracnose, were identified (Howlader et al., 2019; Farhan Al Haque et al., 2019). Images of mango leaves were utilized to recognize three distinct diseases, i.e. anthracnose, gall midge and powdery mildew (Pham et al., 2020). (Abbas et al., 2021) utilized DenseNet121, a DL model, to diagnose nine kinds of diseases from leaf photos of tomato: yellow leaf curl virus, mosaic virus, two-spotted spider mite, septoria leaf spot, target spot, late blight, mosaic virus, leaf mould and early blight. (Kaur et al., 2022) utilized a CNN with transfer learning and EfficientNet B7 to recognise black measles, mosaic virus and leaf blight in grape plants. A model was proposed by (Datta and Gupta, 2023) to recognize gray blight, brown blight, algal spot, red spot and helopeltis in tea leaves. Using images, (Singh et al., 2024) performed a comparative analysis of various DL approaches to detect late blight and early blight in potato. (Joseph et al., 2024) provided a methodology for diagnosing four fungal illnesses of wheat, four fungal illnesses of maize and two fungal and bacterial illnesses of rice. Research on diseases in crops like wheat, grapes, potato, tomato, guava, mango, maize, rice and tea highlights advancements in agricultural disease detection.
Analysis of various features and feature extraction techniques
Features are essential for pattern recognition because they help describe the objects in an image. To create feature vectors, features recommended by specialists are extracted from the picture. (Es-saady et al., 2016) identified three feature categories: texture, shape and colour. The colour histogram, colour moments (mean, skewness and standard deviation) and colour structure descriptor were used to extract colour features, whereas texture features were obtained by applying the Grey Level Co-occurrence Matrix (GLCM). Shape features were complexity, circularity, area and perimeter. (Kumari et al., 2019) utilized K-means clustering to extract different features, whereas (Pham et al., 2020) used CLAHE and wrapper techniques to find area, eccentricity, convex area, mean, standard deviation, kurtosis, skewness, contrast and homogeneity. Scale-invariant feature transform (SIFT) was used by (Chouhan et al., 2021) to extract features. Texture and colour features are the two primary categories of information retrieved from the pictures (Jain and Dharavath, 2023).
The colour moment equations (mean, skewness, standard deviation and kurtosis) were applied to determine colour characteristics, and texture features were retrieved using GLCM. More recently, (Aboelenin et al., 2025) applied the DL models Inception-V3, DenseNet20 and VGG16 for feature extraction in apple and corn crops. (Chavan et al., 2025) extracted important colour, shape and texture features using the LGXP (Local Gabor XOR pattern) technique for detailed classification of crops and identification of crop diseases. The summary of various feature extraction techniques is given in Table 1.
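As an illustration of the colour moments named above, the following minimal sketch computes the mean, standard deviation, skewness and kurtosis of each colour channel. It assumes the standardized-moment definitions (third and fourth central moments divided by powers of the standard deviation); the reviewed papers may use slightly different normalizations. The function names and the representation of an image as a list of (R, G, B) tuples are hypothetical choices made so that the sketch runs without external libraries.

```python
import math


def colour_moments(channel):
    """Return (mean, std, skewness, kurtosis) for a flat list of pixel values."""
    n = len(channel)
    mean = sum(channel) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((v - mean) ** 2 for v in channel) / n
    m3 = sum((v - mean) ** 3 for v in channel) / n
    m4 = sum((v - mean) ** 4 for v in channel) / n
    std = math.sqrt(m2)
    if std == 0:
        # A perfectly uniform channel has no spread, so the higher
        # moments are reported as zero instead of dividing by zero.
        return mean, 0.0, 0.0, 0.0
    skewness = m3 / std ** 3   # assumption: standardized third moment
    kurtosis = m4 / std ** 4   # assumption: standardized fourth moment
    return mean, std, skewness, kurtosis


def colour_feature_vector(pixels):
    """Concatenate the four moments of each RGB channel (12 values)."""
    features = []
    for c in range(3):
        channel = [p[c] for p in pixels]
        features.extend(colour_moments(channel))
    return features
```

Calling colour_feature_vector on the pixels of a segmented leaf region would give a 12-value vector that can be passed to a classifier alongside texture and shape features.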
Accurate feature extraction is crucial for early illness recognition, reducing manual inspections and promoting sustainable farming practices. Techniques like colour histograms, colour moments, GLCM, LBP, HOG and SIFT help identify patterns of discolouration, texture features and disease-related structural abnormalities in plant images. Deep learning techniques have revolutionized feature extraction by automatically learning and retrieving features from unprocessed picture data, which helps reduce crop loss and optimize disease management.
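The GLCM texture features discussed in this section can be sketched as follows. This is a minimal, dependency-free illustration, not the implementation used in any reviewed study: it assumes a grey-level image stored as a list of rows with integer values below a chosen number of levels, builds a non-symmetric co-occurrence matrix for a single pixel offset, and derives the contrast and homogeneity descriptors. All function and parameter names are hypothetical.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Count the pair only if the neighbour lies inside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[v / total for v in row] for row in counts]


def contrast(p):
    """Weighted sum of squared grey-level differences."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


def homogeneity(p):
    """High when co-occurring grey levels are close to each other."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


# Toy 2 x 3 image with three grey levels (0, 1, 2).
toy_image = [[0, 0, 1],
             [1, 2, 2]]
toy_glcm = glcm(toy_image, levels=3)
```

In practice, several offsets and angles are usually combined, and real images are first quantized to a small number of grey levels to keep the matrix compact.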
Analysis of various augmentation techniques
Data augmentation enlarges a dataset by applying transformations such as rotation, scaling, shearing, flipping, cropping and zooming, with the aim of reducing overfitting and improving effectiveness. Data augmentation also helps to expand the quantity of positive and negative instances for contrastive learning, which improves the effect of contrastive learning. In addition, it lets a limited amount of data deliver value comparable to a larger dataset without collecting additional data (Shorten and Khoshgoftaar, 2019). In order to reduce the problem of overfitting, (Farhan Al Haque et al., 2019) used several augmentation methods, including horizontal flipping, nearest fill, width and height shifting, zooming, shearing and rotation. To increase the number of photos, (Zhang et al., 2019) employed geometric and intensity adjustments. Five methods were employed for the intensity changes: brightness enhancement, PCA jittering, colour jittering, blur (radial) and contrast enhancement. Pictures underwent horizontal and vertical geometric modifications, including enlargement, cropping, rotation and flipping. To increase the number of leaf images of the lady finger crop, (Selvam and Kavitha, 2020) carried out distinct augmentation operations: zooming, rotation, shearing, horizontal flipping and shifting of width and height. (Akshai and Anitha, 2021) applied various augmentation methods, including zooming, rotation and shifting, to enlarge the dataset and minimize the issue of overfitting.
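A few of the geometric augmentations listed above (flipping, rotation and width shifting) can be sketched on a tiny image as follows. The sketch is illustrative only and assumes an image stored as a list of pixel rows; the function names are hypothetical, and the shift pads with a constant value for simplicity, whereas some of the cited studies use nearest-pixel filling.

```python
def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def vertical_flip(image):
    """Mirror the image top to bottom."""
    return [row[:] for row in image[::-1]]


def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]


def shift_width(image, pixels, fill=0):
    """Shift right by `pixels` (left if negative), padding with `fill`."""
    width = len(image[0])
    k = max(-width, min(width, pixels))
    if k >= 0:
        return [[fill] * k + row[:width - k] for row in image]
    return [row[-k:] + [fill] * (-k) for row in image]


def augment(image):
    """Return the original image plus three transformed copies."""
    return [
        image,
        horizontal_flip(image),
        vertical_flip(image),
        rotate_90_clockwise(image),
    ]
```

Applying augment to every training image would quadruple the dataset; intensity changes such as brightness or contrast adjustment could be added to the same list in the same way.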
SMOTE (Synthetic Minority Over-sampling Technique), a statistical way to enlarge a dataset in a balanced manner, was employed by (Divakar et al., 2021) to increase the number of images. To expand the dataset size and decrease overfitting, (Vallabhajosyula et al., 2022) used four distinct augmentation techniques, i.e. scaling (resizing), rotation, translation and picture enhancement. (Pandian et al., 2022) increased the dataset size from 55,448 to 234,008 pictures by applying position and colour augmentation, principal component analysis (PCA), deep convolutional generative adversarial network (DCGAN) and neural style transfer. (Joseph et al., 2024) applied different augmentation approaches, namely vertical flipping, horizontal flipping, rotation of images by 90 degrees, shearing and zooming by 0.2, width and height shifting within a range of 0.2 and brightness enhancement (0.2-0.8), to expand the dataset from 1500 images to 25000 images. (Ashurov et al., 2025) employed luminance adjustment, flipping and rotation techniques to increase the dataset size. The summary of different augmentation techniques is given in Table 2. Augmentation techniques such as random changes in object orientation, aspect ratio adjustments, luminance variations, scaling, rotation, cropping, zooming and flipping increase dataset diversity and improve the robustness and representativeness of crop disease data. These techniques mitigate overfitting, improving accuracy and reliability in identifying plant illnesses. With a wider range of training data, models can perform better on real-world problems and generalize more effectively.
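The idea behind SMOTE, mentioned above, is to create synthetic minority-class samples by interpolating between a sample and one of its nearest minority-class neighbours. The sketch below is a simplified illustration under stated assumptions: samples are numeric feature vectors (for images, these would typically be extracted features, not raw pixels), distances are Euclidean, and the function names, the default k and the fixed seed are hypothetical choices.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic samples from the minority-class samples."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base sample (itself excluded).
        others = [s for j, s in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda s: squared_distance(base, s))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new sample at a random point on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D feature vectors of an under-represented disease class.
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority_features, n_new=10)
```

Because every synthetic point lies between two real minority samples, the class is enlarged without simply duplicating existing records.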
Analysis of various classification techniques used in plant disease detection
In image processing and computer vision, disease classification is the most prominent phase in detecting plant diseases. The effectiveness of this phase, which is crucial for disease identification, also depends on preliminary steps such as image acquisition, preprocessing and feature extraction. Numerous classification approaches have been investigated and applied to detect and categorize plant diseases from leaf images. Support Vector Machine (SVM) was utilized to recognise various diseases by (Hossain et al., 2018; Hou et al., 2021; Mukhopadhyay et al., 2021; Singh and Kaur, 2021), achieving an accuracy of 93%, 97.40%, 83% and 95.99%, respectively. (Kumar and Vani, 2019) applied VGG16 to identify multiple diseases in tomato with an accuracy of 99.11%. (Tiwari et al., 2020) utilized the VGG19 architecture to recognize late blight and early blight in leaf images of potato and attained 97.80% accuracy. (Jadhav et al., 2021) utilized GoogleNet and AlexNet to detect brown spot, bacterial blight and frogeye spot in soyabean leaves and attained 96.25% and 98.75% accuracy, respectively. Moreover, (Ajra et al., 2020) employed AlexNet and ResNet50 to classify unhealthy and healthy leaves with an accuracy of 96.5% and 97% in potato and 96% and 95.3% in tomato. (Iqbal and Talukder, 2020) applied random forest to classify healthy, late blight and early blight potato leaves with 97% accuracy.
(Hong et al., 2020) used DenseNet and (Kibriya et al., 2021) employed VGG16 and GoogleNet to classify multiple diseases on tomato, achieving an accuracy of 97.10%, 98% and 99.23%, respectively. (Pham et al., 2020) employed various deep learning models to find diseases such as powdery mildew, anthracnose and gall midge on mango with an accuracy of 89.41%, 78.64%, 79.92% and 84.88%, respectively. (Sambasivam and Opiyo, 2021; Khalifa et al., 2021; Akbar et al., 2022; Datta and Gupta, 2023) applied CNN to detect different diseases on potato, peach and tea, attaining an accuracy of 98%, 99% and 96.56%, respectively.
(Kaur et al., 2022) developed a model applying EfficientNetB7 for feature reduction and logistic regression for categorization of black rot, leaf blight and black measles on grape leaves and achieved 98.7% accuracy. (Attallah, 2023) proposed a novel technique that uses transfer learning to retrieve features and KNN and SVM to classify distinct diseases on tomato, attaining 99.92% and 99.90% accuracy, respectively. (Singh and Yogi, 2023) compared ResNet with other deep learning models for classifying healthy, late blight and early blight potato leaves and achieved 99.62% accuracy.
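Several of the studies above first extract features with a pretrained network and then hand them to a classical classifier such as KNN. The second stage can be sketched as below. This is a minimal illustration and not the pipeline of any cited paper: the two-dimensional feature vectors stand in for deep features, and the function names, class labels and value of k are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_features, train_labels, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest neighbours."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: squared_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors for two leaf classes.
features = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["healthy"] * 3 + ["late_blight"] * 3

prediction = knn_predict(features, labels, [0.2, 0.1])
```

An odd k is used here so that a two-class vote cannot tie; with more classes, ties would need an explicit rule.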
(Joseph et al., 2024) utilized eight fine-tuned deep learning architectures and demonstrated that MobileNet, Xception and CNN performed best in identifying illnesses of maize leaves, with testing accuracy of 94.64%, 95.80% and 97.04%, respectively. Similarly, MobileNetV2, MobileNet and CNN performed best in identifying the leaf illnesses of wheat, with testing accuracy of 96.32%, 96.28% and 98.08%, respectively. For the leaf illnesses of rice, the Inception V3, Xception and CNN models performed best, with testing accuracy of 96.20%, 97.28% and 97.06%, respectively.
(Sutiaji et al., 2024) suggested a novel weighted deep learning ensemble technique to enhance the performance of plant disease detection. The ensembles combined two or three pretrained CNN models, and transfer learning was applied to the individual CNN architectures by updating the weights of the final few layers to prioritise high-dimensional features. Grid search was used to find each model’s optimal weight in the ensemble. Evaluation metrics showed that the three-model ensemble outperformed the two-model ensemble. The MobileNetV2-Xception-DenseNet121 and MobileNetV2-DenseNet121 ensemble models have the best accuracy values, which are 99.49% and 99.56%, respectively. For effective plant disease diagnosis and classification, (Shafik et al., 2024) presented two Plant Disease Detection (PDDNet) architectures, the lead voting ensemble (LVE) and early fusion (AE), which are combined with several pre-trained convolutional neural networks and fine-tuned by applying deep feature extraction. Prominent pre-trained networks such as ResNet50, ResNet101, DenseNet201, AlexNet, EfficientNetB7, GoogleNet, ResNet18, ConvNeXt Small and NASNetMobile were used for hyperparameter fine-tuning. Finally, logistic regression is applied to assess the effectiveness of the distinct CNN architectures. Additionally, the suggested model, deep learning classifiers and related recent studies were compared. The evaluations showed that PDDNet-LVE and PDDNet-AE outperformed existing CNNs in terms of robustness and generalisation, achieving 97.79% and 96.74%, respectively, on tests of multiple plant diseases.
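The weighted ensembling with grid search described above for (Sutiaji et al., 2024) can be illustrated with the following sketch. It is a simplified stand-in and not the authors' code: it assumes each model outputs a class-probability vector per sample, combines them by a weighted average, and searches a small hypothetical grid of candidate weights for the combination with the highest accuracy on held-out data. All names and the toy probabilities are illustrative.

```python
from itertools import product


def weighted_average(prob_sets, weights):
    """Combine the per-model class probabilities of one sample."""
    total = sum(weights)
    n_classes = len(prob_sets[0])
    return [
        sum(w * p[c] for w, p in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def ensemble_accuracy(model_probs, labels, weights):
    """model_probs[m][i] is model m's probability vector for sample i."""
    correct = 0
    for i, label in enumerate(labels):
        combined = weighted_average([probs[i] for probs in model_probs], weights)
        if combined.index(max(combined)) == label:
            correct += 1
    return correct / len(labels)


def grid_search_weights(model_probs, labels,
                        candidates=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every weight combination and keep the most accurate one."""
    best_weights, best_acc = None, -1.0
    for weights in product(candidates, repeat=len(model_probs)):
        if sum(weights) == 0:
            continue  # an all-zero weighting is not a valid ensemble
        acc = ensemble_accuracy(model_probs, labels, weights)
        if acc > best_acc:
            best_weights, best_acc = weights, acc
    return best_weights, best_acc


# Toy validation set: 3 samples, 2 classes, 2 hypothetical models.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
model_b = [[0.3, 0.7], [0.6, 0.4], [0.3, 0.7]]
true_labels = [0, 1, 1]

best_weights, best_acc = grid_search_weights([model_a, model_b], true_labels)
```

The weights should be selected on validation data that is separate from the test set; otherwise the reported ensemble accuracy would be optimistically biased.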
(Kanade et al., 2025) utilized 4,480 real-time leaf photographs and found that the CNN-based YOLOv8 model exhibited superior performance, with precision, recall and F1-score of 97.2%, 78.6% and 86.9%, respectively. Table 3 shows classification approaches used in plant disease detection.
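Since the F1-score is the harmonic mean of precision and recall, the three values reported above for the YOLOv8 model can be cross-checked with a short calculation. The sketch below uses hypothetical function names; the counts passed to precision_recall_f1 in the second call are illustrative and are not taken from any reviewed study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Compute the three metrics from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Cross-check of the reported figures: precision 97.2%, recall 78.6%.
reported_f1 = round(100 * f1_score(0.972, 0.786), 1)  # 86.9

# Illustrative counts only: 8 true positives, 2 false positives, 2 false negatives.
example_metrics = precision_recall_f1(8, 2, 2)
```

The result agrees with the reported F1-score of 86.9%. The gap between precision and recall indicates that the model rarely raises false alarms but misses a noticeable share of diseased samples.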
Discussion on classification techniques
Numerous classification approaches have demonstrated high accuracy rates in various studies. ML algorithms, such as Decision Trees, Support Vector Machines (SVM) and Random Forests, have been widely applied because of their potential to manage huge and complicated datasets and provide interpretable results. Depending on the feature selection and preprocessing methods used and the size of the dataset, these algorithms often attain high accuracy levels, ranging from 80% to 99%. They are particularly effective when trained on well-curated datasets with distinct features. In contrast, DL models, particularly Convolutional Neural Networks (CNNs), have revolutionized the recognition and categorization of crop ailments by automatically learning hierarchical features from raw image data. CNNs excel in capturing complex patterns and peculiarities in plant diseases, achieving accuracies upwards of 95% in numerous studies. Their ability to adapt and generalize across diverse datasets and environmental conditions underscores their superiority in complex image classification tasks. These developments show significant promise for real-world agricultural applications by highlighting the ability of complex classification techniques to improve the precision and robustness of crop disease diagnosis.
The accuracy rates of various ML and DL architectures are shown in Fig 2. With 99.90% accuracy, SVM delivers the best classification performance. Convolutional Neural Networks (CNN), with 99% accuracy, demonstrate their efficacy on intricate data patterns. The resilience of deep learning architectures such as ResNet50, GoogleNet and VGG16 in image processing tasks is demonstrated by their accuracy of 99.62%, 99.23% and 99.11%, respectively. The remarkable accuracy of 99.56% achieved by ensemble methods such as the MobileNetV2-Xception-DenseNet121 combination suggests that incorporating different models can improve performance. Overall, the high accuracies of deep learning models highlight their superiority in managing challenging tasks. At the same time, the effectiveness of conventional approaches highlights the necessity of choosing the right model for the particular application requirements.
Datasets used in plant disease detection
Gathering plant leaf image data is the first step in identifying and classifying leaf diseases. Image data can be obtained from open-source repositories, or researchers can photograph plant leaves themselves with a camera device. This section discusses the resources that many researchers have used to gather image data for their studies. Table 4 gives a detailed description of various datasets and their corresponding links. In conclusion, collecting image data is essential for determining the kind and severity of plant leaf diseases. Researchers have employed open-source repositories and customized datasets like Plant Pathology Dataset, Plant Village, New Plant Diseases Dataset and PlantDoc to collect image data for analysing and detecting plant leaf diseases, enhancing efficiency.
Critical issues and challenges
i. The shortage of sufficient data to train the models is a common problem for researchers. Smaller datasets are more difficult to work with because they can lead to over-fitting, which poses new challenges.
ii. Neural networks have high computation time and high complexity because they have many layers. Another frequent problem faced by researchers is the high dimensionality of data, which arises when the data have a large number of features.
iii. Despite the numerous studies applying machine learning approaches in farming, only a few mobile-based applications are available to farmers. The models created by different researchers must be integrated with mobile platforms so that farmers can use and apply them more easily.
iv. In certain studies, CNN performs better when certain hyperparameters are adjusted, including weight, max epoch, bias learning rate and minibatch size. Tuning specific parameters in established techniques, such as voting classifier, AdaBoost, XGBoost, Random Forest and Decision Tree, and combining them with nature-inspired optimization techniques can also improve performance.
v. Researchers have mainly used lab-created datasets with high illumination, contrast, regions and other characteristics. Some of these conditions may not be met when a dataset is prepared in real-life settings. Therefore, models trained on lab-created datasets may not perform optimally in real-world scenarios.
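The hyperparameter adjustment mentioned in point iv can be organized as an exhaustive grid search, sketched below under stated assumptions. The evaluate callback is a hypothetical placeholder for "train a model with this configuration and return its validation accuracy"; here it is replaced by a toy scoring function so that the sketch runs without any training. The parameter names and candidate values are illustrative only.

```python
from itertools import product


def grid_search(search_space, evaluate):
    """Evaluate every combination in search_space and return the best one."""
    names = list(search_space)
    best_config, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_validation_score(config):
    """Hypothetical stand-in for training a CNN and measuring validation accuracy."""
    return (
        -abs(config["learning_rate"] - 0.01)
        - abs(config["minibatch_size"] - 32) / 1000
        - abs(config["max_epochs"] - 20) / 1000
    )


search_space = {
    "learning_rate": [0.1, 0.01, 0.001],
    "minibatch_size": [16, 32, 64],
    "max_epochs": [10, 20],
}

best_config, best_score = grid_search(search_space, toy_validation_score)
```

Because the number of combinations grows multiplicatively with each added hyperparameter, exhaustive search is practical only for small grids, which is one motivation for the nature-inspired optimization techniques mentioned in the same point.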