The fruit industry presents a lucrative opportunity for entrepreneurs and businesses. The fresh fruit trade in particular is growing, and it requires processing, packaging, cold chain logistics and retail analytics. Climatic changes and pest infestations, combined with improper treatment, can negatively impact fruit cultivation. A wide range of diseases affect both fruits and their outer skin, and these diseases cause commercial losses for farmers. Studies report that fruit diseases contribute to nearly 10% of postharvest losses (Patel and Patil, 2024). To prevent this loss, early detection of fruit diseases is necessary. Manual diagnosis can be inconsistent because it depends on individual perspective and varying levels of expertise. Advances in technology, such as machine learning techniques, can improve disease detection (Mehta et al., 2025). Technologies such as precision farming and the Internet of Things (IoT) can help farmers increase yield, monitor environmental conditions and improve fruit quality (Abbasi et al., 2022).
Common fruit diseases such as Anthracnose, Powdery Mildew, Mango Malformation, Mango Scab, Black Sigatoka, Colletotrichum musae, Black Rot and Apple Scab affect crops such as mango, apple, banana and pineapple (Meena et al., 2024). This research focuses on detecting the diseases found in mango, using machine intelligence to improve detection.
In this research, two mango disease datasets are used to assess how efficiently machine learning models predict mango diseases. Recent work indicates that convolutional neural networks (CNNs) achieve remarkable accuracy in image-based prediction (Li et al., 2022; Patel et al., 2023). Several algorithms have been combined with CNNs to improve prediction and classification accuracy, including LSTM, BiLSTM, the Honey Badger Optimisation Algorithm, SVM and Random Forest (Yuan et al., 2024; Patil and Deshpande, 2024). Transfer learning is another technique that has improved performance through fine-tuning (Reddy et al., 2020; Joseph et al., 2021).
Transfer learning typically involves using a pre-trained base model, performing feature extraction, adding optional layers such as dropout or batch normalization and applying a suitable training strategy. Pre-trained models commonly used for transfer learning include ResNet-50, EfficientNet, Vision Transformer and Swin Transformer (Zhuang et al., 2021). Fig. 1 illustrates the transfer learning architecture.
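The four ingredients listed above (pre-trained base, feature extraction, optional dropout, training strategy) can be sketched in a few lines of NumPy. This is only an illustrative toy under stated assumptions, not the setup used in this work: `BASE_W` is a hypothetical stand-in for a pre-trained backbone (a fixed random projection rather than real ResNet-50 weights), the data are synthetic, and the training strategy shown is the simplest one, freezing the base and training only a new softmax head.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained base model: a fixed projection + ReLU.
# In practice these weights would be loaded from a network such as ResNet-50.
BASE_W = rng.normal(size=(64, 16)) / 8.0  # frozen: never updated below


def extract_features(x):
    """Feature extraction with the frozen base."""
    return np.maximum(x @ BASE_W, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_head(x, y, n_classes, epochs=200, lr=0.1, drop=0.2):
    """Training strategy: keep the base frozen and fit only a new softmax head."""
    feats = extract_features(x)
    w = np.zeros((feats.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        # Optional layer: inverted dropout on the features (training time only).
        mask = (rng.random(feats.shape) >= drop) / (1.0 - drop)
        h = feats * mask
        probs = softmax(h @ w + b)
        grad = (probs - onehot) / len(x)  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (h.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    return softmax(extract_features(x) @ w + b).argmax(axis=1)


# Synthetic two-class data standing in for image inputs (assumption).
x = np.concatenate([rng.normal(-1.0, 1.0, size=(50, 64)),
                    rng.normal(1.0, 1.0, size=(50, 64))])
y = np.array([0] * 50 + [1] * 50)
w, b = train_head(x, y, n_classes=2)
train_accuracy = (predict(x, w, b) == y).mean()
```

Fine-tuning, as opposed to pure feature extraction, would additionally update some or all of the base weights, usually with a smaller learning rate than the head.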
Researchers must also tackle real-world challenges such as lighting variations, occlusion and noisy images. Preprocessing and augmentation help reduce these difficulties (Bhat et al., 2023). Preprocessing techniques such as resizing, shifting and zooming are applied to enhance image quality (Rahman et al., 2023).
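The resizing, shifting and zooming operations mentioned above can be sketched with plain NumPy as follows. The function names, the 224 x 224 target size, the shift offsets and the zoom factor are illustrative assumptions, not values taken from this work, and a random array stands in for a mango photograph.

```python
import numpy as np


def resize_nearest(img, out_h, out_w):
    """Resize an H x W (x C) image with nearest-neighbour sampling."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols[None, :]]


def shift(img, dy, dx, fill=0):
    """Translate by (dy, dx) pixels, padding the exposed border with `fill`."""
    h, w = img.shape[:2]
    out = np.full_like(img, fill)
    if abs(dy) >= h or abs(dx) >= w:
        return out  # shifted completely out of the frame
    # Overlapping source and destination windows after the shift.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def zoom_in(img, factor):
    """Zoom in by `factor` >= 1: crop the centre, then resize back to the input size."""
    h, w = img.shape[:2]
    crop_h, crop_w = max(1, int(h / factor)), max(1, int(w / factor))
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return resize_nearest(img[top:top + crop_h, left:left + crop_w], h, w)


# Usage on a random stand-in image (assumption: 8-bit RGB input).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(300, 400, 3), dtype=np.uint8)
resized = resize_nearest(image, 224, 224)
augmented = zoom_in(shift(resized, dy=10, dx=-15), factor=1.2)
scaled = augmented.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
```

In a training pipeline the shift and zoom parameters would normally be drawn at random for each image, so that the model sees a slightly different view in every epoch.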
Despite these advances, a need remains for lightweight, computationally efficient models that deliver robust accuracy across multiple datasets and disease classes. The current research focuses on detecting diseases on mangoes and mango leaves. It combines pretrained lightweight CNN variants such as MobileNet, ShuffleNet and ResNet, used for feature extraction from RGB images, with a Vision Transformer (ViT) for disease detection, and fine-tunes the pre-trained layers. Performance metrics are compared to identify the most efficient predictor for each dataset. The proposed framework is designed to balance accuracy with efficiency, enabling scalable deployment in precision agriculture. The main contributions of this work are summarized as follows:
1. A ViT-CNN fusion that combines feature extraction and contextual reasoning is used for mango disease detection.
2. The performance of three lightweight pretrained CNNs (MobileNet, ShuffleNet, ResNet) is compared across datasets related to mango diseases.
3. The efficiency results of the experiments inform deployment of the framework in precision agriculture.
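To make the idea of a ViT-CNN fusion more concrete, the following NumPy sketch shows one common way such a pipeline can be wired: the CNN feature map is flattened into a sequence of tokens, a class token and positional embeddings are added, and self-attention supplies the contextual reasoning before classification. The text above does not specify how the two components are connected, so this arrangement is an assumption. The weights are random and untrained, a single attention head replaces a full transformer encoder, and the sizes (32 channels, a 7 x 7 map, 16-dimensional tokens, 5 classes) are hypothetical; the block only illustrates the data flow and tensor shapes.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def feature_map_to_tokens(fmap):
    """Flatten a CNN feature map (C, H, W) into H*W tokens of dimension C."""
    c, h, w = fmap.shape
    return fmap.reshape(c, h * w).T


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the token sequence."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    return softmax(scores, axis=-1) @ v


def fuse_and_classify(fmap, p):
    tokens = feature_map_to_tokens(fmap) @ p["proj"]   # (H*W, d)
    tokens = np.vstack([p["cls"], tokens]) + p["pos"]  # prepend class token, add positions
    attended = tokens + self_attention(tokens, p["wq"], p["wk"], p["wv"])  # residual
    return softmax(attended[0] @ p["head"])            # probabilities from the class token


# Hypothetical sizes: 32-channel 7x7 feature map, 16-d tokens, 5 disease classes.
C, H, W, D, N_CLASSES = 32, 7, 7, 16, 5
params = {
    "proj": rng.normal(size=(C, D)) * 0.1,
    "cls": rng.normal(size=(1, D)) * 0.1,
    "pos": rng.normal(size=(H * W + 1, D)) * 0.1,
    "wq": rng.normal(size=(D, D)) * 0.1,
    "wk": rng.normal(size=(D, D)) * 0.1,
    "wv": rng.normal(size=(D, D)) * 0.1,
    "head": rng.normal(size=(D, N_CLASSES)) * 0.1,
}
fmap = rng.normal(size=(C, H, W))  # stand-in for the output of a CNN backbone
probs = fuse_and_classify(fmap, params)
```

The motivation for this kind of design is that the convolutional backbone captures local texture cheaply, while attention over the resulting tokens relates distant regions of the fruit or leaf to one another.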
Related work
Machine learning and computer vision play a significant role in fruit classification and disease detection, enabling the automation of systems that rely on visual features. For classification and disease detection, research has focused on various image processing techniques and feature extraction methods, tested on both publicly available and self-collected repositories. The publicly available datasets are analysed in this research with respect to resolution, lighting, variety and background complexity, which highlights the strengths and limitations of different approaches.
To detect defects in apples, the YOLOv4 algorithm was applied to images obtained using an NIR camera. YOLOv4 achieved over 92% accuracy in detecting defects across apple varieties, demonstrating its robustness. Variants of the YOLOv4 algorithm were also applied to the dataset and gave an average overall accuracy of 93.9% (Fan et al., 2022).
Surveys indicate that VGGNet generally outperforms AlexNet in fruit disease detection, and that hybrid approaches combining both models can achieve an accuracy of nearly 99% (Goel and Pandey, 2022). Tomato leaf diseases have been classified using DenseNet121 with high accuracy across multiple class configurations. The publicly available dataset was organised into 5-class, 7-class and 10-class problems, for which the accuracy obtained on the original dataset was 98.16%, 95.08% and 94.34%, respectively (Abbas et al., 2021). Similarly, grape leaf diseases were detected using the DenseNet121 model with an accuracy of 99.86% (Patil and More, 2025).
Transfer learning with a convolutional neural network gave 94.8% accuracy on the public dataset FIDS 30 for fruit detection and classification (Geerthik et al., 2024). On the same dataset, the AlexNet algorithm gives 75% accuracy (Geerthik et al., 2024). An RNN gave 98.47% accuracy on FIDS 30 (Dhiman et al., 2021).
A public dataset, Fruit-360, with more than 40,000 images and available on the Kaggle website, is widely used in CNN-based classification problems (Oltean, 2025). One such study, which classified images of apple, lemon and mango, reported 95% accuracy (Bobde et al., 2021). Banana ripeness classification using YOLOv8 variants on a dataset of 18,000 images achieved accuracy between 94% and 96% (Aishwarya and Vinesh, 2023).
Hybrid approaches that combine CNNs with optimization algorithms such as Honey Badger have achieved near-perfect accuracy in pomegranate disease classification (Patil and Deshpande, 2024). Feature extraction is performed with ResNet-50 and Detectron2, and the multiclass classification achieves 99.58% accuracy in disease prediction (Patil and Deshpande, 2024).
In another study, a self-collected repository, FruitQ, containing images of 11 fruits was created and tested with deep learning algorithms. Among them, ResNet18 gave a 99.80% result in the classification problem (Abayomi-Alli et al., 2024).
Object detection frameworks such as YOLOv8 and Faster R-CNN have also been used to localise and quantify mango fruit and leaf diseases (Srinivasan et al., 2025). For a dataset of Alphonso mangoes from Mysore, Karnataka, machine learning classifiers gave 83% and 82% accuracy in hierarchical classification and single-shot multiclass classification, respectively (Raghavendra et al., 2020).
Recent research reported accuracy above 98% on the MangoFruitDDS and MangoLeafBD datasets using ConvNeXt and Vision Transformers (ViTs), underscoring the potential of transformer-based architectures for mango disease detection (Alamri et al., 2025).
While prior studies demonstrate high accuracy in fruit disease detection, most focus on single datasets or computationally heavy models. There remains a need for lightweight, efficient architectures that generalize across multiple mango disease datasets. This study addresses this gap by proposing a ViT-CNN fusion framework.