Comparative Evaluation of DenseNet121 and DenseNet201 for Automated Detection of Major Fava Bean Leaf Diseases using Field-collected Images

Fengren Lin1,*
Weijie Jiang1
1Department of Information Engineering, Fuzhou Polytechnic, Fuzhou, Fujian, 350108, China.
  • Submitted: 23-12-2025

  • Accepted: 03-04-2026

  • First Online: 23-04-2026

  • doi: 10.18805/LRF-924

Background: Foliar diseases like Chocolate spot, Rust and Gall threaten fava bean yield and quality. Traditional visual inspection is slow and error-prone, so automated deep learning approaches offer faster and more accurate disease detection.

Methods: This study evaluates the performance of the deep learning architectures DenseNet121 and DenseNet201 for automated fava bean leaf disease classification using a field-acquired dataset of 8,021 RGB images across four classes: Healthy, Rust, Gall and Chocolate spot. Both models were trained, validated and tested under identical preprocessing, augmentation and hyperparameter settings. Evaluation metrics included accuracy, precision, recall, F1-score, MCC, ROC-AUC, Precision-Recall curves and confusion matrices, supported by qualitative confidence-score analysis.

Result: Both models demonstrated high and stable performance, with DenseNet121 achieving 96.45% accuracy and DenseNet201 achieving 96.01% accuracy. DenseNet201 converged faster and showed slightly stronger recall for Rust, whereas DenseNet121 achieved marginally higher overall accuracy and MCC. High confidence scores and low misclassification rates across all classes highlight the effectiveness of DenseNet architectures for reliable, real-time disease detection in precision agriculture.

Precision agriculture seeks to maximize crop productivity while minimizing resource use by combining data, sensors and advanced computational tools to monitor plant health and optimize interventions. Among these tools, deep learning (DL) has emerged as a game changer: DL-based image analysis enables automated plant disease detection, allowing fast, accurate and scalable assessment of crop health under realistic farming conditions (Dolatabadian et al., 2024; Majdalawieh et al., 2025; Mansoor et al., 2025). Fava bean (Vicia faba) is a widely cultivated legume that plays a key role in food security and local economies in many regions (Karkanis et al., 2018; Sharun et al., 2024; Desfita et al., 2025; Hu et al., 2024). However, the crop is highly vulnerable to foliar diseases, which spread quickly under favorable environmental conditions and severely reduce plant vigor, yield and seed quality. Early detection of these diseases is thus critical for effective crop management (Gasim et al., 2015; Paul and Gupta, 2021).
       
Conventional leaf inspection is slow, labor-intensive and often inconsistent, limiting timely and accurate disease detection (Das et al., 2025; Gupta et al., 2025). Early studies demonstrated high accuracy when classifying healthy versus diseased leaves on standard datasets. For example, review work shows that many CNN-based techniques yield classification accuracy above 92% for various crops and diseases (Tugrul et al., 2022; Ngugi et al., 2024). Several works using the DenseNet121 architecture have reported very high accuracies, e.g. around 99.5% on controlled datasets for crop leaf disease identification (Bakr et al., 2022; Mazumder et al., 2024; Krishna et al., 2025). Deep learning methods have been widely applied to crops such as tomatoes, potatoes, apples, maize and grapes (Pacal et al., 2024; Mehta et al., 2025). In contrast, foliar diseases of fava bean remain largely underexplored and insufficiently studied.
       
Salau et al. (2023) developed an end-to-end CNN model and reported 92.1% accuracy on raw images and 98.14% on preprocessed datasets. Although preprocessing improved performance, the study relied primarily on controlled image conditions, which may not reflect field variability. Similarly, Jeong and Na (2024) proposed an 8-layer deep CNN for classifying Chocolate spot, Rust, Gall and Healthy categories. Their model achieved high training accuracy (99.37%) but lower validation accuracy (89.69%) after 75 epochs. Mostafa et al. (2025) also reported 98.92% accuracy using a sequential CNN; however, details regarding dataset diversity and field-based validation remain limited. These observations suggest that while prior studies demonstrate strong laboratory performance, challenges persist in ensuring robust generalization and deployability under real agricultural conditions. Additionally, most existing work evaluates single architectures without systematic comparison under identical experimental settings, restricting understanding of architectural advantages and limiting evidence-based model selection.
       
To address these limitations, the current research employs field-collected datasets representing real-world imaging variability and systematically compares DenseNet121 and DenseNet201 under identical preprocessing, augmentation and evaluation protocols. Unlike prior studies that primarily report accuracy metrics, this work integrates statistical significance testing, multi-metric evaluation and confidence-score analysis to provide a comprehensive assessment of model performance and reliability. By bridging the gap between laboratory-based high accuracy and field applicability, the study contributes a scalable and scientifically validated solution for automated faba bean disease detection in precision agriculture.
This study compares two deep learning architectures, DenseNet121 and DenseNet201, for the detection of fava bean leaf diseases. The workflow consists of dataset preparation, preprocessing, augmentation, model construction, training and evaluation. The complete procedure is presented in Fig 1. All steps were kept identical for both models to allow a fair comparison. All experiments were executed on a Windows 10 machine. The system was equipped with an Intel® Core™ i5-11320H CPU running at 3.20 GHz and supported by 16 GB of RAM. Storage was provided by a 512 GB SSD (S930P PRO 2.5"). The machine included an NVIDIA GeForce GT 740 graphics card with 4 GB of VRAM. Python 3.11 served as the primary programming language. TensorFlow 2.x and the Keras high-level API were used as the core deep-learning frameworks. All scripts were developed, tested and executed in a Jupyter Notebook environment.

Fig 1: Workflow of leaf disease detection.


       
Although a dedicated GPU was available, the NVIDIA GeForce GT 740 is an entry-level GPU with limited deep-learning acceleration capability. Consequently, training efficiency remained constrained compared with modern GPU architectures. While the current hardware was sufficient to train the models on the present dataset of 8,021 fava bean leaf images, training time remained relatively high. This configuration also limits scalability for larger datasets and reduces practicality for real-time deployment. Therefore, the experimental setup represents a moderate-performance environment rather than a high-performance computing platform and more advanced GPUs could substantially improve computational efficiency and deployment feasibility.
 
Dataset description
 
The dataset consisted of 8,021 RGB images of Vicia faba (faba bean) leaves. All images were collected under natural field conditions. Agricultural experts supervised the collection process to ensure accurate labeling of disease categories. The images were grouped into four classes based on visible symptoms: Healthy (2,019 images); Rust-infected (2,000 images); Faba bean gall-infected (2,000 images) and Chocolate spot-infected (2,002 images). Images were captured using a portable Open Data Kit (ODK) device. The collection covered multiple farms during the active crop season. Each image was stored in RGB format with a resolution of 512×512 pixels. The distribution was nearly balanced, minimizing class-imbalance bias. Fig 2 shows example images from each disease class.

Fig 2: Sample images of healthy, rust, gall and chocolate spot-infected leaves.


 
Data preprocessing
 
All images were subjected to uniform preprocessing steps. Each RGB image was resized to 224×224 pixels using bilinear interpolation. Pixel values were normalized to the [0, 1] range using x′ = x/255, where x is the original 8-bit pixel intensity. Data augmentation was applied to improve the model’s generalization ability. The augmented images were generated through controlled transformations that slightly altered the original samples while preserving their essential disease features. Each image was randomly rotated within a ±20° range to simulate natural variations in leaf orientation. Horizontal and vertical flipping were applied to mimic different camera angles encountered during field photography. The images were also zoomed within a 0.8-1.2 range to account for variations in distance between the camera and the leaf surface. In addition, width and height shifts of up to ±0.2 were introduced to simulate positional differences within the image frame. Together, these transformations expanded the effective dataset and reduced overfitting by exposing the model to a diverse set of realistic image variations.
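The augmentation settings described above can be expressed as a Keras `ImageDataGenerator` configuration. This is a sketch of one plausible setup consistent with the stated parameters; the exact code used in this study is not published:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Configuration sketch matching the augmentations described above.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalize pixel values to [0, 1]
    rotation_range=20,        # random rotation within +/-20 degrees
    horizontal_flip=True,     # mimic different camera angles
    vertical_flip=True,
    zoom_range=0.2,           # zoom factor sampled from [0.8, 1.2]
    width_shift_range=0.2,    # positional shifts up to +/-20% of image size
    height_shift_range=0.2,
    fill_mode="nearest",      # fill pixels exposed by shifts and rotations
)
```

Passing a directory of class-labelled images to `train_datagen.flow_from_directory(...)` would then yield augmented batches at training time; validation and test data would use only the `rescale` step so that evaluation images remain unaltered.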
 
Model architecture
 
Two pretrained convolutional neural network architectures, DenseNet121 and DenseNet201, were used to classify fava bean leaf diseases, both belonging to the DenseNet family in which each layer receives feature maps from all preceding layers through dense connectivity. This dense flow strengthens feature reuse and reduces redundant computations. DenseNet121 is shallower with fewer parameters, while DenseNet201 has a deeper structure and greater representational capacity. The two models differ mainly in the number of dense blocks and composite layers, which results in variation in depth and computational complexity. Both architectures were initialized with ImageNet weights. The pretrained convolutional layers were kept frozen during training. Freezing stabilizes feature extraction and prevents large gradient updates that may damage pretrained filters. It also reduces training time and computational cost.
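A minimal sketch of this setup in Keras (assuming TensorFlow 2.x, as stated earlier; the helper name `build_model` is illustrative and not taken from the paper):

```python
import tensorflow as tf

def build_model(backbone=tf.keras.applications.DenseNet121,
                weights="imagenet", num_classes=4):
    """Frozen DenseNet backbone + global average pooling + softmax head."""
    base = backbone(include_top=False, weights=weights,
                    input_shape=(224, 224, 3))
    base.trainable = False  # freeze pretrained convolutional layers

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)  # keep batch-norm layers in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

# Swapping in DenseNet201 gives the second model under identical settings:
# model201 = build_model(backbone=tf.keras.applications.DenseNet201)
```

Because the backbone is frozen, only the final dense layer is updated during training, which matches the stabilization and cost arguments above.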
       
After the DenseNet backbone, the model applied global average pooling to compress the spatial feature maps into a single feature vector. This reduces overfitting by minimizing the number of trainable parameters. A fully connected output layer with softmax activation was added for classification into four disease categories: healthy, rust, gall and chocolate spot. The classification mapping can be expressed as:
 
ŷ = Softmax [WF(X′) + b]
 
Where,
F(X′)= The feature vector obtained after global average pooling.
W and b= Trainable classification parameters.
ŷ= The probability distribution across the four classes.
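As a numeric illustration of this mapping, the following sketch applies the softmax classifier to a pooled feature vector using made-up weights (not the trained parameters; DenseNet121's global-pooled output has 1,024 features):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
F_x = rng.normal(size=1024)              # pooled feature vector F(X')
W = rng.normal(size=(4, 1024)) * 0.01    # trainable weights, one row per class
b = np.zeros(4)                          # trainable biases

y_hat = softmax(W @ F_x + b)             # probability distribution over 4 classes
predicted_class = int(np.argmax(y_hat))  # index of the most probable class
```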
       
The general model structure is summarized in Fig 3, which illustrates the preprocessing stages, DenseNet backbone, pooling layer and output classifier.

Fig 3: General model structure, showing the preprocessing stages, DenseNet backbone, pooling layer and output classifier.


 
DenseNet backbone structure
 
Each DenseNet block consists of multiple composite layers. Each composite layer includes batch normalization, ReLU activation and a 3×3 convolution. The output of each composite layer is concatenated with all previous feature maps. For a dense block with L layers, the output after the final layer is:
 
XL = HL([X0, X1, X2, ..., XL-1])

Where,
XL= The output of the final composite layer.
[X0, X1, ..., XL-1]= The concatenation of the feature maps produced by all preceding layers.
HL= The nonlinear transformation inside the layer.
       
Transition layers are placed between dense blocks. A transition layer includes 1×1 convolution followed by average pooling to reduce feature map dimensions. These layers prevent feature explosion and stabilize memory usage. DenseNet121 contains four dense blocks with a total of 121 layers. DenseNet201 contains the same number of blocks but extends depth to 201 layers. The increased depth enables DenseNet201 to capture more complex patterns but requires higher computational resources.
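The block structure described above can be sketched in Keras as follows. This is a simplified illustration built on a toy input; the full DenseNet composite layer also includes a 1×1 bottleneck convolution before the 3×3 convolution:

```python
import tensorflow as tf
from tensorflow.keras import layers

def composite_layer(x, growth_rate=32):
    """BN -> ReLU -> 3x3 conv, producing `growth_rate` new feature maps."""
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth_rate, 3, padding="same", use_bias=False)(y)
    return layers.Concatenate()([x, y])  # concatenate with all previous maps

def dense_block(x, num_layers, growth_rate=32):
    for _ in range(num_layers):
        x = composite_layer(x, growth_rate)
    return x

def transition_layer(x, compression=0.5):
    """1x1 conv (channel reduction) followed by 2x2 average pooling."""
    channels = int(x.shape[-1] * compression)
    y = layers.Conv2D(channels, 1, use_bias=False)(x)
    return layers.AveragePooling2D(2)(y)

inputs = tf.keras.Input(shape=(56, 56, 64))
x = dense_block(inputs, num_layers=6)   # 64 + 6*32 = 256 channels
x = transition_layer(x)                 # compressed to 128 channels, 28x28
toy = tf.keras.Model(inputs, x)
```

The concatenation step is what makes channel counts grow linearly within a block, and the transition layer's compression is what keeps memory usage bounded between blocks.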
 
Classifier head
 
Both DenseNet121 and DenseNet201 were extended with the same classifier head to maintain fairness in comparison. The classifier receives the pooled feature vector and maps it to four classes. The linear transformation is computed using:
 
zi = WiF (X′) + bi
 
Followed by a softmax activation:

ŷi = exp(zi) / Σj exp(zj)
The class with the highest probability ŷi represents the predicted label. The training process was conducted for a maximum of 50 epochs, utilizing early stopping and model checkpointing to prevent overfitting and to save the best-performing model based on validation accuracy. After training, both DenseNet121 and DenseNet201 were evaluated on the test dataset using accuracy, precision, recall and F1-score, ensuring a comprehensive performance comparison. Additional assessment was performed using the confusion matrix, which allowed visualization of classification errors. The identical training configurations for both models ensured a fair and controlled comparison of their effectiveness in fava bean leaf disease detection.
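The early-stopping and checkpointing setup described here corresponds to a Keras callback configuration of roughly this form (a sketch; the monitored metric and patience value are assumptions, as the paper does not list them):

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop training once validation accuracy stops improving.
    EarlyStopping(monitor="val_accuracy", patience=10,
                  restore_best_weights=True),
    # Keep only the weights of the best epoch on disk.
    ModelCheckpoint("best_model.keras", monitor="val_accuracy",
                    save_best_only=True),
]

# model.fit(train_data, validation_data=val_data,
#           epochs=50, callbacks=callbacks)
```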
Statistical comparison of the predictive performance of DenseNet121 and DenseNet201 (McNemar’s test)
 
To statistically compare the predictive performance of DenseNet121 and DenseNet201, McNemar’s test was performed on the independent test dataset. The test dataset consisted of 1,605 images belonging to four classes and was loaded using TensorFlow’s image_dataset_from_directory function with a fixed image resolution of 224×224 pixels and a batch size of 32. Data shuffling was disabled to ensure identical sample ordering for both models. All images were preprocessed using the DenseNet-specific preprocess_input function. The best-performing pretrained models, DenseNet121 and DenseNet201, were loaded from saved checkpoint files. Predictions were generated for the entire test set and class labels were obtained using the argmax operation on the softmax outputs. Correct and incorrect predictions were determined for each model by comparing predicted labels with true labels. A 2×2 contingency table was constructed representing: (i) instances correctly classified by both models; (ii) instances correctly classified only by DenseNet121; (iii) instances correctly classified only by DenseNet201; and (iv) instances misclassified by both models. McNemar’s exact test was then applied using the statsmodels library to evaluate whether the performance difference between the two architectures was statistically significant at a significance level of 0.05.
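The exact test on the discordant pairs can be reproduced with the Python standard library alone (equivalent in spirit to statsmodels’ `mcnemar(..., exact=True)`; the counts below are the ones reported in the Results section):

```python
from math import comb

def mcnemar_exact(b, c):
    """Two-sided exact McNemar test on discordant pair counts b and c.

    Under H0 the b discordant "wins" of one model follow Binomial(b + c, 0.5),
    so the p-value is twice the smaller binomial tail, capped at 1.
    """
    n = b + c
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Discordant pairs on the test set: 43 images correct only for DenseNet121,
# 36 correct only for DenseNet201.
p_value = mcnemar_exact(43, 36)   # ~0.4999, not significant at alpha = 0.05
```

Because the 1,505 jointly correct and 21 jointly incorrect samples cancel out of the statistic, only the 79 discordant pairs drive the p-value, which is why such a small imbalance (43 vs. 36) is far from significance.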
In this work, both DenseNet121 and DenseNet201 were trained for up to 50 epochs and showed smooth and stable convergence. DenseNet121 showed a gradual increase in accuracy, rising from low initial values in the first epoch (≈25% training and ≈39% validation accuracy) and surpassing 90% validation accuracy by the 12th-14th epoch. Its performance continued improving until it reached a peak of ≈96% validation accuracy, after which the curve stabilized. DenseNet201, being deeper and more expressive, converged faster: it achieved over 90% validation accuracy within the first 8-10 epochs and reached its maximum of ≈97% validation accuracy slightly earlier than DenseNet121. In both models, validation loss decreased consistently throughout training, indicating stable optimization without signs of overfitting. Early stopping and checkpointing ensured that the best epoch was preserved for final testing, making the comparison fair and reproducible. Overall, DenseNet201 required fewer epochs to reach its optimal performance, whereas DenseNet121 showed a slower but steady learning trajectory.
       
Fig 4 presents the combined confusion matrix results for both DenseNet121 and DenseNet201 models. DenseNet121 achieved strong classification performance, correctly identifying 380 Chocolate spot, 379 Gall, 400 Healthy and 389 Rust samples. Misclassifications were low: Chocolate spot was occasionally predicted as Gall (8), Healthy (2) or Rust (11) and Gall was sometimes confused with Chocolate spot (16), Healthy (1) or Rust (4). Healthy samples showed minimal errors (1 to Chocolate spot and 3 to Rust), while Rust displayed small confusion with Chocolate spot (10) and Gall (1). DenseNet201 correctly predicted 367 Chocolate spot, 379 Gall, 398 Healthy and 397 Rust samples. Errors remained minimal: Chocolate spot was misclassified as Gall (13), Healthy (2) or Rust (19), while Gall had limited confusion with Chocolate spot (17), Healthy (2) and Rust (2). Healthy samples recorded only four errors in total and Rust achieved very high recall with just three misclassifications. Overall, DenseNet201 showed stronger Rust detection and clear class separability, whereas DenseNet121 made fewer errors on Chocolate spot and slightly fewer misclassifications overall, consistent with its marginally higher accuracy.

Fig 4: Confusion matrices for DenseNet121 and DenseNet201 models.
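The per-class figures above can be cross-checked directly from the confusion matrix. The sketch below assembles the DenseNet121 matrix from the counts reported in the text (rows = true class, columns = predicted class, in the order Chocolate spot, Gall, Healthy, Rust) and recovers the headline metrics:

```python
import numpy as np

# DenseNet121 confusion matrix assembled from the reported counts.
cm = np.array([
    [380,   8,   2,  11],   # Chocolate spot
    [ 16, 379,   1,   4],   # Gall
    [  1,   0, 400,   3],   # Healthy
    [ 10,   1,   0, 389],   # Rust
])

accuracy = np.trace(cm) / cm.sum()        # 1548 / 1605, i.e. ~0.9645
recall = np.diag(cm) / cm.sum(axis=1)     # per-class recall (row-wise)
precision = np.diag(cm) / cm.sum(axis=0)  # per-class precision (column-wise)
```

These values reproduce the reported 96.45% accuracy and the Chocolate spot precision (0.9337) and recall (0.9476) listed in Table 1, confirming internal consistency of the reported results.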


       
Table 1 summarises the class-wise Precision, Recall and F1-scores for both DenseNet121 and DenseNet201. DenseNet121 shows strong and balanced performance across all four classes. Chocolate spot achieves a Precision of 0.9337 and Recall of 0.9476, indicating reliable detection with few missed cases. Gall performs even better, with Precision 0.9768 and F1-score 0.9619, reflecting high consistency in prediction. Healthy shows the highest scores (Precision 0.9926, Recall 0.9901) and demonstrates excellent discrimination. Rust also performs strongly with an F1-score of 0.9641. DenseNet201 shows similar trends. Chocolate spot remains accurate (F1-score 0.9315), while Gall maintains strong stability with an F1-score of 0.9523. Healthy remains highly separable with an F1-score of 0.9876 and Rust performs best with a Recall of 0.9925 and an F1-score of 0.9683. Accuracy for both models remains close, with DenseNet121 at 0.9645 and DenseNet201 at 0.9601, indicating very similar overall reliability. MCC values also support this consistency, with DenseNet121 scoring 0.9527 and DenseNet201 0.9470. Overall, both models show excellent stability and class-wise precision, with DenseNet121 slightly ahead in overall accuracy and MCC, while DenseNet201 achieves stronger Recall for Rust and maintains competitive performance across all metrics.

Table 1: Classification performance metrics for DenseNet121 and DenseNet201 models.


       
Fig 5 illustrates the ROC and Precision–Recall curves for both DenseNet121 and DenseNet201 in a one-vs-rest (OvR) setting. DenseNet121 shows excellent class-wise discrimination, with AUC values of 0.9937 for Chocolate spot, 0.9975 for Gall, 0.9998 for Healthy and 0.9972 for Rust. All curves stay very close to the upper-left boundary, indicating extremely low false-positive rates and strong separability across categories. The corresponding Precision–Recall curves also show high stability, with average precision (AP) values above 0.98 for all classes. Healthy and Rust achieve the highest AP values (0.9993 and 0.9945, respectively), showing near-perfect precision at all recall levels.

Fig 5: ROC and P-R curves (One-vs-Rest) for DenseNet121 and DenseNet201 models.


       
DenseNet201 demonstrates similarly strong performance. AUC values remain high, with 0.9936 for Chocolate spot, 0.9972 for Gall, 0.9997 for Healthy and 0.9971 for Rust. The ROC curves closely overlap with DenseNet121, confirming consistent model behavior. The Precision-Recall curves also remain sharply peaked near the top-right corner, with AP values above 0.98 for every class. Healthy again shows the strongest curve (AP = 0.9993), while Rust and Gall maintain highly stable precision even at high recall ranges. Overall, both models demonstrate exceptional discrimination ability, with DenseNet121 showing marginally higher AP for Rust, while DenseNet201 remains equally competitive across all classes.
       
Fig 6 presents qualitative prediction examples for both DenseNet121 and DenseNet201, showing correctly classified leaf images across all four categories. DenseNet121 achieves high confidence values for every sample, with confidence scores ranging from 0.9992 to 1.0000, indicating strong certainty in its predictions. The model accurately identifies the visual patterns associated with each disease: dark necrotic patches for Chocolate spot, swollen tissue formations for Gall, circular pustules for Rust and smooth, unaffected foliage for Healthy plants.

Fig 6: Performance of deep learning models based on confidence score distributions.


       
DenseNet201 displays similar reliability, producing confidence scores of 0.9998 to 1.0000 for all tested images. Its predictions remain precise across diverse lighting conditions and leaf textures, reflecting strong feature extraction and generalization. The clear visual differentiation in lesions, along with consistently high prediction certainty, shows that both models successfully learn distinctive disease characteristics. Overall, the qualitative results demonstrate excellent real-world applicability and confirm the strong classification performance observed in the quantitative evaluations.
       
The test dataset contained 1,605 images distributed across four classes and both DenseNet121 and DenseNet201 were evaluated on the same independent test set under identical conditions.

McNemar p-value = 0.4999; Result = No statistically significant difference between the models.
       
McNemar’s contingency analysis showed that 1,505 samples were correctly classified by both models, while 43 samples were correctly classified only by DenseNet121 and 36 only by DenseNet201; 21 samples were misclassified by both architectures. The resulting McNemar’s exact test yielded a p-value of 0.4999, indicating no statistically significant difference in classification performance between the two models (p>0.05). Although DenseNet121 exhibited marginally higher accuracy (96.45%) compared to DenseNet201 (96.01%), the dominance of jointly correct predictions and the minimal imbalance between discordant pairs (43 vs. 36) suggest highly similar decision boundaries and comparable generalization capability. These findings confirm that the observed 0.44% accuracy difference does not represent meaningful model superiority under the current experimental configuration. The application of statistical validation strengthens the scientific rigour of the comparison and prevents overinterpretation of minor numerical variations. Future investigations may incorporate full backbone fine-tuning, cross-validation and calibration analysis to further explore subtle architectural advantages and enhance robustness of performance assessment.
       
Previous studies have demonstrated the effectiveness of DenseNet architectures for plant disease detection across various crops and datasets. Wei et al. (2022) reported 96.4% accuracy for DenseNet121 on PlantVillage, showing its consistent performance across multiple crops and devices. Table 2 presents a comparative evaluation of DenseNet-based models for plant leaf disease classification.

Table 2: Comparative performance of DenseNet-based models for plant leaf disease.


       
Bansal et al. (2021) obtained 96.25% accuracy using a DenseNet121-based ensemble for apple leaves, while Odounfa et al. (2025) reported 96.25% classification accuracy for DenseNet121 on chili leaves. Bajpai et al. (2023) achieved 97.78% accuracy by combining DenseNet201 with SVM, outperforming DenseNet121 alone (94%), highlighting the benefits of hybrid approaches.
               
Enhanced architectures, such as DenseNet201Plus for banana and black gram (Mazumder et al., 2024) and hybrid CNN-ViT with DenseNet201 for apple and corn (Aboelenin et al., 2025), achieved higher accuracies, while fine-tuned DenseNet121 on PlantVillage reached 99.81% (Andrew et al., 2022). In comparison, this study applied DenseNet121 and DenseNet201 to a field-collected fava bean dataset, achieving 96.45% and 96.01% accuracy with MCC values of 0.9527 and 0.9470. DenseNet201 converged faster and showed slightly better recall for Rust, whereas DenseNet121 maintained higher overall stability. These results demonstrate that the proposed models provide a practical, field-ready solution for fava bean disease detection, enabling early intervention, minimizing yield losses and supporting precision agriculture practices.
This study demonstrates the effectiveness of deep learning, especially DenseNet architectures, for detecting fava bean leaf diseases from field images. Both DenseNet121 and DenseNet201 performed exceptionally across all metrics. DenseNet121 achieved slightly higher overall accuracy (96.45%) and MCC, while DenseNet201 converged faster and showed stronger recall for Rust. Confusion matrices, ROC-AUC and Precision–Recall curves confirmed clear separability of disease classes. High confidence scores indicate reliable performance in real-world agricultural conditions. Despite these strengths, the study has limitations. The dataset, although extensive, is restricted to specific field conditions and may not cover all environmental variations. Model deployment on resource-limited devices remains challenging due to computational demands. Future work will focus on lightweight or mobile-optimized DenseNet models. Integration with smart farming systems and multi-sensor data could enhance real-time disease monitoring and intervention. These developments would further support precision agriculture and sustainable crop management.
Disclaimers
 
The views and conclusions expressed in this article are solely those of the authors and do not necessarily represent the views of their affiliated institutions. The authors are responsible for the accuracy and completeness of the information provided, but do not accept any liability for any direct or indirect losses resulting from the use of this content.
 
Funding details
 
This research received no external funding.
 
Authors’ contributions
 
All authors contributed toward data analysis, drafting and revising the paper and agreed to be responsible for all the aspects of this work.
 
Data availability
 
The data analysed/generated in the present study will be made available by the corresponding author upon reasonable request.
 
Availability of data and materials
 
Not applicable.
 
Use of artificial intelligence
 
Not applicable.
 
Declarations
 
Authors declare that all works are original and this manuscript has not been published in any other journal.
The authors declare that they have no conflict of interest.
 

  1. Aboelenin, S., Elbasheer, F.A., Eltoukhy, M.M., El-Hady, W.M. and Hosny, K.M. (2025). A hybrid framework for plant leaf disease detection and classification using convolutional neural networks and vision transformer. Complex and Intelligent Systems. 11(2). https://doi.org/10.1007/s40747-024-01764-x.

  2. Andrew, J., Eunice, J., Popescu, D.E., Chowdary, M.K. and Hemanth, J. (2022). Deep learning-based leaf disease detection in crops using images for agricultural applications. Agronomy. 12(10): 2395. https://doi.org/10.3390/agronomy12102395.

  3. Bajpai, C., Sahu, R. and Naik, K.J. (2023). Deep learning model for plant-leaf disease detection in precision agriculture. International Journal of Intelligent Systems Technologies and Applications. 21(1): 72. https://doi.org/10.1504/ijista.2023.130562.

  4. Bakr, M., Abdel-Gaber, S., Nasr, M. and Hazman, M. (2022). DenseNet based model for plant diseases diagnosis. European Journal of Electrical Engineering and Computer Science. 6(5): 1-9. https://doi.org/10.24018/ejece.2022.6.5.458.

  5. Bansal, P., Kumar, R. and Kumar, S. (2021). Disease detection in apple leaves using deep convolutional neural network. Agriculture. 11(7): 617. https://doi.org/10.3390/agriculture11070617.

  6. Das, A., Pathan, F., Jim, J.R., Kabir, M.M. and Mridha, M. (2025). Deep learning-based classification, detection and segmentation of tomato leaf diseases: A state-of-the-art review. Artificial Intelligence in Agriculture. 15(2): 192-220. https://doi.org/10.1016/j.aiia.2025.02.006.

  7. Desfita, S., Sari, W., Wahyuni, D., Putri, F., Pramesyanti, P.A., Pato, U., Pratiwi, D., Grzelczyk, J. and Budryn, G. (2025). Synergistic effects of multi-strain probiotic and prebiotic combinations on immune recovery in aging populations. International Journal of Probiotics and Prebiotics. 20: 10-18. https://doi.org/10.37290/ijpp2641-7197.20:10-18.

  8. Dolatabadian, A., Neik, T.X., Danilevicz, M.F., Upadhyaya, S.R., Batley, J. and Edwards, D. (2024). Image based crop disease detection using machine learning. Plant Pathology. 74(1): 18-38. https://doi.org/10.1111/ppa.14006.

  9. Gasim, S., Hamad, S.A., Abdelmula, A. and Ahmed, I.A.M. (2015). Yield and quality attributes of faba bean inbred lines grown under marginal environmental conditions of Sudan. Food Science and Nutrition. 3(6): 539-547. https://doi.org/10.1002/fsn3.245.

  10. Gupta, C., Gill, N.S., Gulia, P., Duhan, S., Karamti, H., Kumar, A., Alamneh, D.A. and Safra, I. (2025). An enhanced deep learning-based framework for diagnosing apple leaf diseases. Scientific Reports. 15(1): 39699. https://doi.org/10.1038/s41598-025-23272-9.

  11. Hu, Y., Yang, L., Tong, J., Li, H., Wei, Q. and Chen, H. (2024). Current status and perspectives on the use of traditional Chinese medicine in the treatment of gastric cancer. Current Topics in Nutraceutical Research. 22(4): 1187-1192. https://doi.org/10.37290/ctnr2641-452X.

  12. Jeong, H.Y. and Na, I.S. (2024). Efficient faba bean leaf disease identification through smart detection using deep convolutional neural networks. Legume Research-An International Journal. 47(8): 1404-1411. doi: 10.18805/LRF-798.

  13. Karkanis, A., Ntatsi, G., Lepse, L., Fernández, J.A., Vågen, I.M., Rewald, B., Alsiòa, I., Kronberga, A., Balliu, A., Olle, M., Bodner, G., Dubova, L., Rosa, E. and Savvas, D. (2018). Faba bean cultivation-revealing novel managing practices for more sustainable and competitive European cropping systems. Frontiers in Plant Science. 9: 1115. https://doi.org/10.3389/fpls.2018.01115.

  14. Krishna, M.S., Machado, P., Otuka, R.I., Yahaya, S.W., Santos, F.N.D. and Ihianle, I.K. (2025). Plant leaf disease detection using deep learning: A multi-dataset approach. J-Multidisciplinary Scientific Journal. 8(1): 4. https://doi.org/10.3390/j8010004.

  15. Majdalawieh, M., Martins, C., Radi, M., Alaraj, M. and Khan, S. (2025). Precision agriculture in the age of AI: A systematic review of machine learning methods for crop disease detection. Smart Agricultural Technology. 12: 101491. https://doi.org/10.1016/j.atech.2025.101491.

  16. Mansoor, S., Iqbal, S., Popescu, S.M., Kim, S.L., Chung, Y.S. and Baek, J. (2025). Integration of smart sensors and IOT in precision agriculture: Trends, challenges and future prospectives. Frontiers in Plant Science. 16: 1587869. https://doi.org/10.3389/fpls.2025.1587869.

  17. Mazumder, M.K.A., Kabir, M.M., Rahman, A., Abdullah-Al-Jubair, M. and Mridha, M. (2024). DenseNet201Plus: Cost-effective transfer-learning architecture for rapid leaf disease identification with attention mechanisms. Heliyon. 10(15): e35625. https://doi.org/10.1016/j.heliyon.2024.e35625.

  18. Mehta, A.R., Kumar, P., Prem, G., Aggarwal, S. and Kumar, R. (2025). AI-powered innovations in agriculture: A systematic review on plant disease detection and classification. Indian Journal of Agricultural Research. 59(9): 1321-1330. doi: 10.18805/IJARe.A-6371.

  19. Mostafa, A., Alnuaim, A. and AlZubi, A.A. (2025). Utilizing convolutional neural networks for accurate detection of leaf diseases in fava beans. Legume Research-An International Journal. 48(3): 494-502. doi: 10.18805/LRF-823.

  20. Ngugi, H.N., Akinyelu, A.A. and Ezugwu, A.E. (2024). Machine learning and deep learning for crop disease diagnosis: Performance analysis and review. Agronomy. 14(12): 3001. https://doi.org/10.3390/agronomy14123001.

  21. Odounfa, M.G.F., Hounmenou, C.G., Salako, V.K., Affokpon, A. and Kakaï, R.L.G. (2025). Deep learning enables precision agriculture for sustainable chili pepper disease detection in Benin. Discover Artificial Intelligence. 5(1). https://doi.org/10.1007/s44163-025-00583-4.

  22. Pacal, I., Kunduracioglu, I., Alma, M.H., Deveci, M., Kadry, S., Nedoma, J., Slany, V. and Martinek, R. (2024). A systematic review of deep learning techniques for plant diseases. Artificial Intelligence Review. 57(11). https://doi.org/10.1007/ s10462-024-10944-7.

  23. Paul, S.K. and Gupta, D.R. (2021). Faba bean (Vicia faba L.), a promising grain legume crop of Bangladesh: A review. Agricultural Reviews. 42(3): 292-299. doi: 10.18805/ag.R-203.

  24. Salau, A.O., Abeje, B.T., Faisal, A.N. and Asfaw, T.T. (2023). Faba Bean Disease Detection using Deep Learning Techniques. In 2023 International Conference on Cyber Management and Engineering (CyMaEn). IEEE. (pp. 344-349). https:// doi.org/10.1109/CyMaEn57228.2023.10051088.

  25. Sharun, K., Banu, S.A., Mamachan, M., Abualigah, L., Pawde, A.M. and Dhama, K. (2024). Unleashing the future: Exploring the transformative prospects of artificial intelligence in veterinary science. Journal of Experimental Biology and Agricultural Sciences. 12(3): 297-317. https://doi.org/10.18006/2024.12(3).297.317.

  26. Tugrul, B., Elfatimi, E. and Eryigit, R. (2022). Convolutional neural networks in detection of plant leaf Diseases: A review. Agriculture. 12(8): 1192. https://doi.org/10.3390/agriculture 12081192.

  27. Wei, S.J., Riza, D.F.A. and Nugroho, H. (2022). Comparative study on the performance of deep learning implementation in the edge computing: Case study on the plant leaf disease identification. Journal of Agriculture and Food Research. 10: 100389. https://doi.org/10.1016/j.jafr.2022.100389.

       
Conventional leaf inspection is slow, labor-intensive and often inconsistent, limiting timely and accurate disease detection (Das et al., 2025; Gupta et al., 2025). Early studies demonstrated high accuracy when classifying healthy versus diseased leaves on standard datasets. For example, review work shows that many CNN-based techniques yield classification accuracy above 92% across various crops and diseases (Tugrul et al., 2022; Ngugi et al., 2024). Several works using the DenseNet121 architecture have reported very high accuracies, e.g., around 99.5% on controlled datasets for crop leaf disease identification (Bakr et al., 2022; Mazumder et al., 2024; Krishna et al., 2025). Deep learning (DL) methods have been widely applied to crops such as tomatoes, potatoes, apples, maize and grapes (Pacal et al., 2024; Mehta et al., 2025). In contrast, foliar diseases of fava bean remain comparatively underexplored.
       
Salau et al., (2023) developed an end-to-end CNN model and reported 92.1% accuracy on raw images and 98.14% on preprocessed datasets. Although preprocessing improved performance, the study relied primarily on controlled image conditions, which may not reflect field variability. Similarly, Jeong and Na (2024) proposed an 8-layer deep CNN for classifying Chocolate Spot, Rust, Gall and Healthy categories. Their model achieved high training accuracy (99.37%) but lower validation accuracy (89.69%) after 75 epochs. Mostafa et al., (2025) also reported 98.92% accuracy using a sequential CNN; however, details regarding dataset diversity and field-based validation remain limited. These observations suggest that while prior studies demonstrate strong laboratory performance, challenges persist in ensuring robust generalization and deployability under real agricultural conditions. Additionally, most existing work evaluates single architectures without systematic comparison under identical experimental settings, restricting understanding of architectural advantages and limiting evidence-based model selection.
       
To address these limitations, the current research employs field-collected datasets representing real-world imaging variability and systematically compares DenseNet121 and DenseNet201 under identical preprocessing, augmentation and evaluation protocols. Unlike prior studies that primarily report accuracy metrics, this work integrates statistical significance testing, multi-metric evaluation and confidence-score analysis to provide a comprehensive assessment of model performance and reliability. By bridging the gap between laboratory-based high accuracy and field applicability, the study contributes a scalable and scientifically validated solution for automated faba bean disease detection in precision agriculture.
This study compares two deep learning architectures, DenseNet121 and DenseNet201, for the detection of fava bean leaf diseases. The workflow consists of dataset preparation, preprocessing, augmentation, model construction, training and evaluation. The complete procedure is presented in Fig 1. All steps were kept identical for both models to allow a fair comparison.

All experiments were executed on a Windows 10 machine. The system was equipped with an Intel® Core™ i5-11320H CPU running at 3.20 GHz and supported by 16 GB of RAM. Storage was provided by a 512 GB SSD (S930P PRO 2.5"). The machine included an NVIDIA GeForce GT 740 graphics card with 4 GB of VRAM. Python 3.11 served as the primary programming language. TensorFlow 2.x and the Keras high-level API were used as the core deep-learning frameworks. All scripts were developed, tested and executed in a Jupyter Notebook environment.

Fig 1: Workflow of leaf disease detection.


       
Although a dedicated GPU was available, the NVIDIA GeForce GT 740 is an entry-level GPU with limited deep-learning acceleration capability. Consequently, training efficiency remained constrained compared with modern GPU architectures. While the current hardware was sufficient to train the models on the present dataset of 8,021 fava bean leaf images, training time remained relatively high. This configuration also limits scalability for larger datasets and reduces practicality for real-time deployment. Therefore, the experimental setup represents a moderate-performance environment rather than a high-performance computing platform and more advanced GPUs could substantially improve computational efficiency and deployment feasibility.
 
Dataset description
 
The dataset consisted of 8,021 RGB images of Vicia faba (faba bean) leaves. All images were collected under natural field conditions. Agricultural experts supervised the collection process to ensure accurate labeling of disease categories. The images were grouped into four classes based on visible symptoms: Healthy (2,019 images); Rust-infected (2,000 images); Faba bean gall-infected (2,000 images) and Chocolate spot-infected (2,002 images). Images were captured using a portable Open Data Kit (ODK) device. The collection covered multiple farms during the active crop season. Each image was stored in RGB format with a resolution of 512×512 pixels. The distribution was nearly balanced, minimizing class-imbalance bias. Fig 2 shows example images from each disease class.
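The near-balance claimed above can be checked in a couple of lines from the reported per-class counts:

```python
# Per-class image counts as reported for the field-collected dataset.
counts = {"Healthy": 2019, "Rust": 2000, "Gall": 2000, "Chocolate spot": 2002}

total = sum(counts.values())                              # 8,021 images in all
imbalance = max(counts.values()) / min(counts.values())   # largest / smallest class
shares = {cls: n / total for cls, n in counts.items()}    # class proportions
```

With the largest class only about 1% bigger than the smallest, no class-weighting or resampling is needed.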

Fig 2: Sample images of healthy, rust, gall and chocolate spot-infected leaves.


 
Data preprocessing
 
All images were subjected to uniform preprocessing steps. Each RGB image was resized to 224×224 pixels using bilinear interpolation. Pixel values were normalized to the [0, 1] range using X′ = X/255. Data augmentation was applied to improve the model’s generalization ability. The augmented images were generated through controlled transformations that slightly altered the original samples while preserving their essential disease features. Each image was randomly rotated within a ±20° range to simulate natural variations in leaf orientation. Horizontal and vertical flipping were applied to mimic different camera angles encountered during field photography. The images were also zoomed within a 0.8-1.2 range to account for variations in distance between the camera and the leaf surface. In addition, width and height shifts of up to ±0.2 were introduced to simulate positional differences within the image frame. Together, these transformations expanded the effective dataset and reduced overfitting by exposing the model to a diverse set of realistic image variations.
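A few of these steps can be illustrated without any framework. The sketch below implements the normalization, flips and ±20% shifts in plain NumPy; rotation and zoom are omitted here, since in practice they are usually delegated to the framework's image generator, and wrap-around fill for the shifts is a simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(img):
    # Scale 8-bit pixel values to the [0, 1] range (X' = X / 255).
    return img.astype(np.float32) / 255.0

def augment(img):
    # Random horizontal / vertical flips, mimicking camera-angle variation.
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    if rng.random() < 0.5:
        img = img[::-1, :, :]
    # Width/height shifts of up to ±0.2 of the frame; wrap-around fill
    # is used here for simplicity (frameworks offer several fill modes).
    h, w = img.shape[:2]
    dy = int(rng.uniform(-0.2, 0.2) * h)
    dx = int(rng.uniform(-0.2, 0.2) * w)
    return np.roll(img, (dy, dx), axis=(0, 1))

sample = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
out = augment(normalize(sample))
```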
 
Model architecture
 
Two pretrained convolutional neural network architectures, DenseNet121 and DenseNet201, were used to classify fava bean leaf diseases, both belonging to the DenseNet family in which each layer receives feature maps from all preceding layers through dense connectivity. This dense flow strengthens feature reuse and reduces redundant computations. DenseNet121 is shallower with fewer parameters, while DenseNet201 has a deeper structure and greater representational capacity. The two models differ mainly in the number of dense blocks and composite layers, which results in variation in depth and computational complexity. Both architectures were initialized with ImageNet weights. The pretrained convolutional layers were kept frozen during training. Freezing stabilizes feature extraction and prevents large gradient updates that may damage pretrained filters. It also reduces training time and computational cost.
       
After the DenseNet backbone, the model applied global average pooling to compress the spatial feature maps into a single feature vector. This reduces overfitting by minimizing the number of trainable parameters. A fully connected output layer with softmax activation was added for classification into four disease categories: healthy, rust, gall and chocolate spot. The classification mapping can be expressed as:
 
ŷ = Softmax [WF(X′) + b]
 
Where,
F(X′)= The feature vector obtained after global average pooling.
W and b= Trainable classification parameters.
ŷ= The probability distribution across the four classes.
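The backbone-plus-head construction described above can be sketched with the Keras API. This is a minimal illustration rather than the authors' exact script: the layer choices follow the text, but the optimizer and loss names are assumptions and weights=None replaces the ImageNet initialization used in the study so the snippet runs without downloading weights.

```python
import tensorflow as tf

# DenseNet121 backbone without its ImageNet classifier head.
# (The study used weights="imagenet"; None here avoids the download.)
base = tf.keras.applications.DenseNet121(
    include_top=False, weights=None, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),        # F(X'): pooled feature vector
    tf.keras.layers.Dense(4, activation="softmax"),  # probabilities over 4 classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

probs = model(tf.zeros((1, 224, 224, 3))).numpy()    # one dummy forward pass
```

Freezing the backbone means only the final Dense layer's weights are updated during training, which matches the transfer-learning setup described above.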
       
The general model structure is summarized in Fig 3, which illustrates the preprocessing stages, DenseNet backbone, pooling layer and output classifier.

Fig 3: Training and validation accuracy and loss curves for both models.


 
DenseNet backbone structure
 
Each DenseNet block consists of multiple composite layers. Each composite layer includes batch normalization, ReLU activation and a 3×3 convolution. The output of each composite layer is concatenated with all previous feature maps. For a dense block with L layers, the output after the final layer is:
 
XL = HL([X0, X1, X2, ..., XL-1])
 
Where,
[X0, X1, ..., XL-1]= The concatenation of the feature maps produced by layers 0 through L-1.
HL= The nonlinear transformation inside the layer.
XL= The output of the final composite layer.
       
Transition layers are placed between dense blocks. A transition layer includes 1×1 convolution followed by average pooling to reduce feature map dimensions. These layers prevent feature explosion and stabilize memory usage. DenseNet121 contains four dense blocks with a total of 121 layers. DenseNet201 contains the same number of blocks but extends depth to 201 layers. The increased depth enables DenseNet201 to capture more complex patterns but requires higher computational resources.
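As a toy illustration of this connectivity (hypothetical sizes, not the real DenseNet dimensions), concatenation makes the channel count grow linearly with depth by a fixed growth rate per layer:

```python
import numpy as np

def dense_block(x, num_layers=4, growth_rate=12, seed=0):
    """Toy dense block: every layer sees the concatenation of all preceding
    feature maps and contributes growth_rate new channels."""
    rng = np.random.default_rng(seed)
    features = [x]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=-1)   # [X0, X1, ..., X_{l-1}]
        # Stand-in for H_l (BN + ReLU + 3x3 conv): random mixing + ReLU.
        w = rng.normal(size=(inp.shape[-1], growth_rate))
        features.append(np.maximum(inp @ w, 0.0))
    return np.concatenate(features, axis=-1)

x0 = np.ones((8, 8, 16))       # 16 input channels on a toy 8x8 feature grid
out = dense_block(x0)          # channels grow to 16 + 4 x 12 = 64
```

This unchecked growth is exactly why the transition layers' 1×1 convolutions are needed between blocks: they compress the accumulated channels before the next block starts concatenating again.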
 
Classifier head
 
Both DenseNet121 and DenseNet201 were extended with the same classifier head to maintain fairness in comparison. The classifier receives the pooled feature vector and maps it to four classes. The linear transformation is computed using:
 
zi = WiF (X′) + bi
 
Followed by a softmax activation:
 
ŷi = exp(zi) / Σj exp(zj)
 
Where,
ŷi= The predicted probability of class i, with the sum taken over all four classes.
 
The class with the highest ŷi represents the predicted label. The training process was conducted for a maximum of 50 epochs, utilizing early stopping and model checkpointing to prevent overfitting and save the highest-performing model based on validation accuracy. After training, both DenseNet121 and DenseNet201 were evaluated on the test dataset using accuracy, precision, recall and F1-score, ensuring a comprehensive performance comparison. Additional assessment was performed using the confusion matrix, which allowed visualization of classification errors. The identical training configurations for both models ensured a fair and controlled comparison of their effectiveness in fava bean leaf disease detection.
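The two classifier-head equations amount to a single matrix-vector product followed by exponential normalization. A small NumPy sketch with made-up parameters (1024 is DenseNet121's pooled feature width; W, b and the feature vector are random placeholders):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
f = rng.normal(size=1024)              # F(X'): 1024-d pooled feature vector
W = 0.01 * rng.normal(size=(4, 1024))  # hypothetical classifier weights
b = np.zeros(4)

z = W @ f + b                 # z_i = W_i F(X') + b_i
y_hat = softmax(z)            # probability distribution over the four classes
pred = int(np.argmax(y_hat))  # class with the highest probability
```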

Statistical comparison of the predictive performance of DenseNet121 and DenseNet201 (McNemar’s test)
 
To statistically compare the predictive performance of DenseNet121 and DenseNet201, McNemar’s test was performed on the independent test dataset. The test dataset consisted of 1,605 images belonging to four classes and was loaded using TensorFlow’s image_dataset_from_directory function with a fixed image resolution of 224×224 pixels and a batch size of 32. Data shuffling was disabled to ensure identical sample ordering for both models. All images were preprocessed using the DenseNet-specific preprocess_input function. The best-performing pretrained models, DenseNet121 and DenseNet201, were loaded from saved checkpoint files. Predictions were generated for the entire test set and class labels were obtained using the argmax operation on softmax outputs. Correct and incorrect predictions were determined for each model by comparing predicted labels with true labels. A 2×2 contingency table was constructed representing: (i) instances correctly classified by both models, (ii) instances correctly classified only by DenseNet121, (iii) instances correctly classified only by DenseNet201 and (iv) instances misclassified by both models. McNemar’s exact test was then applied using the statsmodels library to evaluate whether the performance difference between the two architectures was statistically significant at a significance level of 0.05.
In this work, both DenseNet121 and DenseNet201 were trained for 50 epochs and showed smooth and stable convergence. DenseNet121 showed a gradual increase in accuracy, rising from low initial values in the first epoch (≈25% training and ≈39% validation accuracy) and surpassing 90% validation accuracy by the 12th-14th epoch. Its performance continued improving until it reached a peak of ≈96% validation accuracy, after which the curve stabilized. DenseNet201, being deeper and more expressive, converged faster. It achieved over 90% validation accuracy within the first 8-10 epochs and reached its maximum of ≈97% validation accuracy slightly earlier than DenseNet121. In both models, validation loss consistently decreased throughout training, indicating stable optimization without signs of overfitting. Early stopping and checkpointing ensured that the best epoch was preserved for final testing, making the comparison fair and reproducible. Overall, DenseNet201 required fewer epochs to reach its optimal performance, whereas DenseNet121 showed a slower but steady learning trajectory.
       
Fig 4 presents the confusion matrices for the DenseNet121 and DenseNet201 models. DenseNet121 achieved strong classification performance, correctly identifying 380 Chocolate spot, 379 Gall, 400 Healthy and 389 Rust samples. Misclassifications were low, with Chocolate spot occasionally predicted as Gall (8), Healthy (2), or Rust (11) and Gall sometimes confused with Chocolate spot (16), Healthy (1), or Rust (4). Healthy samples showed minimal errors (1 to Chocolate spot and 3 to Rust), while Rust displayed small confusion with Chocolate spot (10) and Gall (1). DenseNet201 correctly predicted 367 Chocolate spot, 379 Gall, 398 Healthy and 397 Rust samples. Errors remained minimal, with Chocolate spot misclassified as Gall (13), Healthy (2), or Rust (19), while Gall had limited confusion with Chocolate spot (17), Healthy (2) and Rust (2). Healthy samples recorded only four errors in total and Rust achieved very high recall with just three misclassifications. Overall, DenseNet201 demonstrated stronger separability for the Rust class, whereas DenseNet121 produced fewer misclassifications in total and therefore marginally higher overall accuracy (96.45% vs. 96.01%).

Fig 4: Confusion matrices for DenseNet121 and DenseNet201 models.


       
Table 1 summarizes the class-wise Precision, Recall and F1-scores for both DenseNet121 and DenseNet201. DenseNet121 shows strong and balanced performance across all four classes. Chocolate spot achieves a Precision of 0.9337 and Recall of 0.9476, indicating reliable detection with few missed cases. Gall performs even better, with Precision 0.9768 and F1-score 0.9619, reflecting high consistency in prediction. Healthy shows the highest scores (Precision 0.9926, Recall 0.9901), demonstrating excellent discrimination. Rust also performs strongly with an F1-score of 0.9641. DenseNet201 shows similar trends. Chocolate spot remains accurate (F1-score 0.9315), while Gall maintains strong stability with an F1-score of 0.9523. Healthy remains highly separable with an F1-score of 0.9876 and Rust performs best with a Recall of 0.9925 and an F1-score of 0.9683. Accuracy for the two models remains close, with DenseNet121 at 0.9645 and DenseNet201 at 0.9601, indicating very similar overall reliability. MCC values also support this consistency, with DenseNet121 scoring 0.9527 and DenseNet201 0.9470. Overall, both models show excellent stability and class-wise precision, with DenseNet121 slightly ahead in overall accuracy and MCC, while DenseNet201 achieves stronger Recall for Rust and maintains competitive performance across all metrics.
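The class-wise figures in Table 1 follow directly from the DenseNet121 confusion-matrix counts reported for Fig 4; a short NumPy check (rows are true classes, columns are predictions, ordered Chocolate spot, Gall, Healthy, Rust):

```python
import numpy as np

# DenseNet121 test-set confusion matrix assembled from the reported counts.
cm = np.array([[380,   8,   2,  11],   # Chocolate spot
               [ 16, 379,   1,   4],   # Gall
               [  1,   0, 400,   3],   # Healthy
               [ 10,   1,   0, 389]])  # Rust

accuracy  = np.trace(cm) / cm.sum()        # 1548 / 1605 ≈ 0.9645
precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class
recall    = np.diag(cm) / cm.sum(axis=1)   # per true class
f1        = 2 * precision * recall / (precision + recall)
```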

Table 1: Classification performance metrics for DenseNet121 and DenseNet201 models.


       
Fig 5 illustrates the ROC and Precision–Recall curves for both DenseNet121 and DenseNet201 in a one-vs-rest (OvR) setting. DenseNet121 shows excellent class-wise discrimination, with AUC values of 0.9937 for Chocolate spot, 0.9975 for Gall, 0.9998 for Healthy and 0.9972 for Rust. All curves stay very close to the upper-left boundary, indicating extremely low false-positive rates and strong separability across categories. The corresponding Precision–Recall curves also show high stability, with average precision (AP) values above 0.98 for all classes. Healthy and Rust achieve the highest AP values (0.9993 and 0.9945, respectively), showing near-perfect precision at all recall levels.

Fig 5: ROC and P-R curves (One-vs-Rest) for DenseNet121 and DenseNet201 models.


       
DenseNet201 demonstrates similarly strong performance. AUC values remain high, with 0.9936 for Chocolate spot, 0.9972 for Gall, 0.9997 for Healthy and 0.9971 for Rust. The ROC curves closely overlap with DenseNet121, confirming consistent model behavior. The Precision-Recall curves also remain sharply peaked near the top-right corner, with AP values above 0.98 for every class. Healthy again shows the strongest curve (AP = 0.9993), while Rust and Gall maintain highly stable precision even at high recall ranges. Overall, both models demonstrate exceptional discrimination ability, with DenseNet121 showing marginally higher AP for Rust, while DenseNet201 remains equally competitive across all classes.
       
Fig 6 presents qualitative prediction examples for both DenseNet121 and DenseNet201, showing correctly classified leaf images across all four categories. DenseNet121 achieves high confidence values for every sample, with confidence scores ranging from 0.9992 to 1.0000, indicating strong certainty in its predictions. The model accurately identifies the visual patterns associated with each disease: dark necrotic patches for Chocolate spot, swollen tissue formations for Gall, circular pustules for Rust and smooth, unaffected foliage for Healthy plants.

Fig 6: Performance of deep learning models based on confidence score distributions.


       
DenseNet201 displays similar reliability, producing confidence scores of 0.9998 to 1.0000 for all tested images. Its predictions remain precise across diverse lighting conditions and leaf textures, reflecting strong feature extraction and generalization. The clear visual differentiation in lesions, along with consistently high prediction certainty, shows that both models successfully learn distinctive disease characteristics. Overall, the qualitative results demonstrate excellent real-world applicability and confirm the strong classification performance observed in the quantitative evaluations.
       
The test dataset contained 1,605 images distributed across four classes and both DenseNet121 and DenseNet201 models were evaluated on the same independent test set under identical conditions:

 
McNemar p-value = 0.499896876860327; Result = No statistically significant difference between models.
       
McNemar’s contingency analysis showed that 1,505 samples were correctly classified by both models, while 43 samples were correctly classified only by DenseNet121 and 36 only by DenseNet201; 21 samples were misclassified by both architectures. The resulting McNemar’s exact test yielded a p-value of 0.4999, indicating no statistically significant difference in classification performance between the two models (p>0.05). Although DenseNet121 exhibited marginally higher accuracy (96.45%) compared to DenseNet201 (96.01%), the dominance of jointly correct predictions and the minimal imbalance between discordant pairs (43 vs. 36) suggest highly similar decision boundaries and comparable generalization capability. These findings confirm that the observed 0.44% accuracy difference does not represent meaningful model superiority under the current experimental configuration. The application of statistical validation strengthens the scientific rigour of the comparison and prevents overinterpretation of minor numerical variations. Future investigations may incorporate full backbone fine-tuning, cross-validation and calibration analysis to further explore subtle architectural advantages and enhance robustness of performance assessment.
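The exact p-value can be reproduced from the discordant counts above with the standard binomial formulation, independently of statsmodels (a pure-Python check of the same computation):

```python
from math import comb

# Contingency counts reported for the 1,605-image test set:
both_correct, only_121, only_201, both_wrong = 1505, 43, 36, 21
assert both_correct + only_121 + only_201 + both_wrong == 1605

# McNemar's exact test uses only the discordant pairs (b, c).
b, c = only_121, only_201
n = b + c
# Two-sided exact p-value under the null binomial(n, 0.5) model, the same
# computation statsmodels performs for mcnemar(..., exact=True).
p_value = min(1.0, 2 * sum(comb(n, k) for k in range(min(b, c) + 1)) * 0.5**n)
```

With 43 vs. 36 discordant pairs out of 79, the observed split is close to the 50/50 expected under the null, which is why the test is far from significance.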
       
Previous studies have demonstrated the effectiveness of DenseNet architectures for plant disease detection across various crops and datasets. Wei et al., (2022) reported 96.4% accuracy for DenseNet121 on PlantVillage, showing its consistent performance across multiple crops and devices. Table 2 presents a comparative evaluation of DenseNet-based models for plant leaf disease classification.

Table 2: Comparative performance of DenseNet-based models for plant leaf disease.


       
Bansal et al., (2021) obtained 96.25% accuracy using a DenseNet121-based ensemble for apple leaves, while Odounfa et al., (2025) reported 96.25% classification accuracy for DenseNet121 on chili leaves. Bajpai et al., (2023) achieved 97.78% accuracy by combining DenseNet201 with SVM, outperforming DenseNet121 alone (94%), highlighting the benefits of hybrid approaches.
               
Enhanced architectures, such as DenseNet201Plus for banana and black gram (Mazumder et al., 2024) and hybrid CNN-ViT with DenseNet201 for apple and corn (Aboelenin et al., 2025), achieved higher accuracies, while fine-tuned DenseNet121 on PlantVillage reached 99.81% (Andrew et al., 2022). In comparison, this study applied DenseNet121 and DenseNet201 to a field-collected fava bean dataset, achieving 96.45% and 96.01% accuracy with MCC values of 0.9527 and 0.9470. DenseNet201 converged faster and showed slightly better recall for Rust, whereas DenseNet121 maintained higher overall stability. These results demonstrate that the proposed models provide a practical, field-ready solution for fava bean disease detection, enabling early intervention, minimizing yield losses and supporting precision agriculture practices.
This study demonstrates the effectiveness of deep learning, especially DenseNet architectures, for detecting fava bean leaf diseases from field images. Both DenseNet121 and DenseNet201 performed exceptionally well across all metrics. DenseNet121 achieved slightly higher overall accuracy (96.45%) and MCC, while DenseNet201 converged faster and showed stronger recall for Rust. Confusion matrices, ROC-AUC and Precision-Recall curves confirmed clear separability of the disease classes. High confidence scores indicate reliable performance in real-world agricultural conditions.

Despite these strengths, the study has limitations. The dataset, although extensive, is restricted to specific field conditions and may not cover all environmental variations. Model deployment on resource-limited devices remains challenging due to computational demands. Future work will focus on lightweight or mobile-optimized DenseNet models. Integration with smart farming systems and multi-sensor data could enhance real-time disease monitoring and intervention. These developments would further support precision agriculture and sustainable crop management.
Disclaimers
 
The views and conclusions expressed in this article are solely those of the authors and do not necessarily represent the views of their affiliated institutions. The authors are responsible for the accuracy and completeness of the information provided, but do not accept any liability for any direct or indirect losses resulting from the use of this content.
 
Funding details
 
This research received no external funding.
 
Authors’ contributions
 
All authors contributed toward data analysis, drafting and revising the paper and agreed to be responsible for all the aspects of this work.
 
Data availability
 
The data analysed/generated in the present study will be made available from corresponding authors upon reasonable request.
 
Availability of data and materials
 
Not applicable.
 
Use of artificial intelligence
 
Not applicable.
 
Declarations
 
Authors declare that all works are original and this manuscript has not been published in any other journal.
The authors declare that they have no conflict of interest.
 

  1. Aboelenin, S., Elbasheer, F.A., Eltoukhy, M.M., El-Hady, W.M. and Hosny, K.M. (2025). A hybrid framework for plant leaf disease detection and classification using convolutional neural networks and vision transformer. Complex and Intelligent Systems. 11(2). https://doi.org/10.1007/s40747-024-01764-x.

  2. Andrew, J., Eunice, J., Popescu, D.E., Chowdary, M.K. and Hemanth, J. (2022). Deep learning-based leaf disease detection in crops using images for agricultural applications. Agronomy. 12(10): 2395. https://doi.org/10.3390/agronomy12102395.

  3. Bajpai, C., Sahu, R. and Naik, K.J. (2023). Deep learning model for plant-leaf disease detection in precision agriculture. International Journal of Intelligent Systems Technologies and Applications. 21(1): 72. https://doi.org/10.1504/ijista.2023.130562.

  4. Bakr, M., Abdel-Gaber, S., Nasr, M. and Hazman, M. (2022). DenseNet based model for plant diseases diagnosis. European Journal of Electrical Engineering and Computer Science. 6(5): 1-9. https://doi.org/10.24018/ejece.2022.6.5.458.

  5. Bansal, P., Kumar, R. and Kumar, S. (2021). Disease detection in apple leaves using deep convolutional neural network. Agriculture. 11(7): 617. https://doi.org/10.3390/agriculture11070617.

  6. Das, A., Pathan, F., Jim, J.R., Kabir, M.M. and Mridha, M. (2025). Deep learning-based classification, detection and segmentation of tomato leaf diseases: A state-of-the-art review. Artificial Intelligence in Agriculture. 15(2): 192-220. https://doi.org/10.1016/j.aiia.2025.02.006.

  7. Desfita, S., Sari, W., Wahyuni, D., Putri, F., Pramesyanti, P.A., Pato, U., Pratiwi, D., Grzelczyk, J. and Budryn, G. (2025). Synergistic effects of multi-strain probiotic and prebiotic combinations on immune recovery in aging populations. International Journal of Probiotics and Prebiotics. 20: 10-18. https://doi.org/10.37290/ijpp2641-7197.20:10-18.

  8. Dolatabadian, A., Neik, T.X., Danilevicz, M.F., Upadhyaya, S.R., Batley, J. and Edwards, D. (2024). Image based crop disease detection using machine learning. Plant Pathology. 74(1): 18-38. https://doi.org/10.1111/ppa.14006.

  9. Gasim, S., Hamad, S.A., Abdelmula, A. and Ahmed, I.A.M. (2015). Yield and quality attributes of faba bean inbred lines grown under marginal environmental conditions of Sudan. Food Science and Nutrition. 3(6): 539-547. https://doi.org/10.1002/fsn3.245.

  10. Gupta, C., Gill, N.S., Gulia, P., Duhan, S., Karamti, H., Kumar, A., Alamneh, D.A. and Safra, I. (2025). An enhanced deep learning-based framework for diagnosing apple leaf diseases. Scientific Reports. 15(1): 39699. https://doi.org/10.1038/s41598-025-23272-9.

  11. Hu, Y., Yang, L., Tong, J., Li, H., Wei, Q. and Chen, H. (2024). Current status and perspectives on the use of traditional Chinese medicine in the treatment of gastric cancer. Current Topics in Nutraceutical Research. 22(4): 1187-1192. https://doi.org/10.37290/ctnr2641-452X.

  12. Jeong, H.Y. and Na, I.S. (2024). Efficient faba bean leaf disease identification through smart detection using deep convolutional neural networks. Legume Research-An International Journal. 47(8): 1404-1411. doi: 10.18805/LRF-798.

  13. Karkanis, A., Ntatsi, G., Lepse, L., Fernández, J.A., Vågen, I.M., Rewald, B., Alsiņa, I., Kronberga, A., Balliu, A., Olle, M., Bodner, G., Dubova, L., Rosa, E. and Savvas, D. (2018). Faba bean cultivation-revealing novel managing practices for more sustainable and competitive European cropping systems. Frontiers in Plant Science. 9: 1115. https://doi.org/10.3389/fpls.2018.01115.

  14. Krishna, M.S., Machado, P., Otuka, R.I., Yahaya, S.W., Santos, F.N.D. and Ihianle, I.K. (2025). Plant leaf disease detection using deep learning: A multi-dataset approach. J-Multidisciplinary Scientific Journal. 8(1): 4. https://doi.org/10.3390/j8010004.

  15. Majdalawieh, M., Martins, C., Radi, M., Alaraj, M. and Khan, S. (2025). Precision agriculture in the age of AI: A systematic review of machine learning methods for crop disease detection. Smart Agricultural Technology. 12: 101491. https://doi.org/10.1016/j.atech.2025.101491.

  17. Mansoor, S., Iqbal, S., Popescu, S.M., Kim, S.L., Chung, Y.S. and Baek, J. (2025). Integration of smart sensors and IoT in precision agriculture: Trends, challenges and future prospectives. Frontiers in Plant Science. 16: 1587869. https://doi.org/10.3389/fpls.2025.1587869.

  18. Mazumder, M.K.A., Kabir, M.M., Rahman, A., Abdullah-Al-Jubair, M. and Mridha, M. (2024). DenseNet201plus: Cost-effective transfer-learning architecture for rapid leaf disease identification with attention mechanisms. Heliyon. 10(15): e35625. https://doi.org/10.1016/j.heliyon.2024.e35625.

  19. Mehta, A.R., Kumar, P., Prem, G., Aggarwal, S. and Kumar, R. (2025). AI-powered innovations in agriculture: A systematic review on plant disease detection and classification. Indian Journal of Agricultural Research. 59(9): 1321-1330. doi: 10.18805/IJARe.A-6371.

  19. Mostafa, A., Alnuaim, A. and AlZubi, A.A. (2025). Utilizing convolutional neural networks for accurate detection of leaf diseases in fava beans. Legume Research-An International Journal. 48(3): 494-502. doi: 10.18805/LRF-823.

  20. Ngugi, H.N., Akinyelu, A.A. and Ezugwu, A.E. (2024). Machine learning and deep learning for crop disease diagnosis: Performance analysis and review. Agronomy. 14(12): 3001. https://doi.org/10.3390/agronomy14123001.

  22. Odounfa, M.G.F., Hounmenou, C.G., Salako, V.K., Affokpon, A. and Kakaï, R.L.G. (2025). Deep learning enables precision agriculture for sustainable chili pepper disease detection in Benin. Discover Artificial Intelligence. 5(1). https://doi.org/10.1007/s44163-025-00583-4.

  23. Pacal, I., Kunduracioglu, I., Alma, M.H., Deveci, M., Kadry, S., Nedoma, J., Slany, V. and Martinek, R. (2024). A systematic review of deep learning techniques for plant diseases. Artificial Intelligence Review. 57(11). https://doi.org/10.1007/s10462-024-10944-7.

  23. Paul, S.K. and Gupta, D.R. (2021). Faba bean (Vicia faba L.), a promising grain legume crop of Bangladesh: A review. Agricultural Reviews. 42(3): 292-299. doi: 10.18805/ag.R-203.

  25. Salau, A.O., Abeje, B.T., Faisal, A.N. and Asfaw, T.T. (2023). Faba bean disease detection using deep learning techniques. In: 2023 International Conference on Cyber Management and Engineering (CyMaEn). IEEE. pp. 344-349. https://doi.org/10.1109/CyMaEn57228.2023.10051088.

  25. Sharun, K., Banu, S.A., Mamachan, M., Abualigah, L., Pawde, A.M. and Dhama, K. (2024). Unleashing the future: Exploring the transformative prospects of artificial intelligence in veterinary science. Journal of Experimental Biology and Agricultural Sciences. 12(3): 297-317. https://doi.org/10.18006/2024.12(3).297.317.

  27. Tugrul, B., Elfatimi, E. and Eryigit, R. (2022). Convolutional neural networks in detection of plant leaf diseases: A review. Agriculture. 12(8): 1192. https://doi.org/10.3390/agriculture12081192.

  27. Wei, S.J., Riza, D.F.A. and Nugroho, H. (2022). Comparative study on the performance of deep learning implementation in the edge computing: Case study on the plant leaf disease identification. Journal of Agriculture and Food Research. 10: 100389. https://doi.org/10.1016/j.jafr.2022.100389.
Published in: Legume Research