Experimental design and fair evaluation protocol
All models were evaluated under identical preprocessing, dataset splits, and training conditions. To ensure a fair comparison, no hyperparameter tuning was performed on the test set.
Quantitative performance evaluation
The comparative performance of all models on the test set is presented in Table 1.
The proposed DPAF-Net achieves the highest overall accuracy of all compared models (98.98%). More importantly, it also achieves the highest recall (99.27%), which corresponds to the lowest false negative rate.
Recall is especially important in livestock disease detection: missing a single infected animal can allow the disease to spread, whereas a false positive only requires a follow-up test. The improvement in recall achieved by the proposed model is therefore not only a statistical gain but also epidemiologically meaningful.
Although the Xception-based model of Shakeel et al. (2024) achieves competitive results and the EfficientNet-B7 model of Girmaw (2025) achieves good recall, the proposed model improves recall while maintaining high precision, giving a favorable overall trade-off.
The confusion matrices (Fig 3) give a class-level view of prediction behavior. Compared with the baseline models, DPAF-Net produces fewer false negatives while keeping the false positive rate low. This indicates that the improvement is not limited to overall accuracy but also reflects better class-level discrimination.
ROC curve analysis
ROC curves were generated to evaluate model discrimination ability across varying thresholds.
As shown in Fig 4, all tested models achieved AUC values exceeding 0.99, indicating that they distinguish well between the healthy and infected classes. The proposed DPAF-Net achieved the highest AUC, 0.9992.
The ROC curve of DPAF-Net remains close to the top-left corner of the ROC space throughout.
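To make the threshold sweep behind an ROC curve concrete, the following is a minimal pure-Python sketch. It is not the evaluation code used in this study; the function names `roc_points` and `auc` and the toy labels and scores are illustrative assumptions.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold.

    labels: 1 = infected, 0 = healthy.
    scores: predicted probability of the infected class.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score so that each step lowers the threshold.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        thr = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == thr:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (hypothetical): one healthy sample outranks one infected sample.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.95, 0.90, 0.80, 0.70, 0.30, 0.10]
pts = roc_points(labels, scores)
print(auc(pts))
```

A curve that hugs the top-left corner reaches a true positive rate near 1 while the false positive rate is still near 0, which is what drives the AUC toward 1.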
Training dynamics and generalization behavior
To evaluate convergence behavior and optimization stability, validation accuracy and loss trends were analyzed.
The validation accuracy curve (Fig 5) shows smooth and stable convergence. Validation accuracy rises quickly in the initial epochs and stabilizes after around epoch 8, which indicates successful feature learning without prolonged instability.
The training and validation loss curves (Fig 6) support this:
• The training loss decreases continuously.
• The training and validation losses do not diverge.
• There is no sign of overfitting.
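The visual checks listed above can also be expressed as a simple numerical heuristic. The sketch below is an assumption for illustration only: the function `shows_divergence`, the window size, and the loss values are hypothetical and are not taken from the experiments reported here.

```python
def shows_divergence(train_loss, val_loss, window=3):
    """Heuristic overfitting flag.

    Returns True when, over the last `window` epochs, the training loss
    keeps falling while the validation loss rises.
    """
    if len(train_loss) != len(val_loss) or len(train_loss) <= window:
        raise ValueError("need equal-length histories longer than `window`")
    train_delta = train_loss[-1] - train_loss[-1 - window]
    val_delta = val_loss[-1] - val_loss[-1 - window]
    return train_delta < 0 and val_delta > 0


# Hypothetical histories: both losses fall together (no divergence).
stable_train = [0.60, 0.35, 0.22, 0.15, 0.11, 0.09]
stable_val = [0.55, 0.34, 0.24, 0.18, 0.15, 0.14]

# Hypothetical histories: validation loss turns upward (overfitting pattern).
overfit_train = [0.60, 0.35, 0.22, 0.12, 0.06, 0.03]
overfit_val = [0.55, 0.34, 0.26, 0.28, 0.33, 0.40]

print(shows_divergence(stable_train, stable_val))
print(shows_divergence(overfit_train, overfit_val))
```

Such a flag is only a coarse aid; inspecting the full curves, as done in Fig 6, remains the more informative check.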
Comparative confusion matrix analysis
Fig 3 illustrates the confusion matrices of all compared models under the same testing environment.
Based on the analysis of the confusion matrices, the following points can be noted:
• Raj et al. (2023) has a relatively higher number of false positives and false negatives.
• Shakeel et al. (2024) has better class-wise performance but still has a slightly higher misclassification rate than the previous models.
• Girmaw (2025) has a better true positive rate but has a moderate number of false positives.
• The proposed DPAF-Net has the fewest false negatives and fewer false positives than any of the other models.
In the proposed model:
• True Negatives (Healthy correctly classified): 1062.
• False Positives: 13.
• False Negatives: 5.
• True Positives: 677.
The very low number of false negatives directly yields the high recall: 677 / (677 + 5) ≈ 99.27%.
The comparison of confusion matrices confirms that the advantage of DPAF-Net lies not only in overall accuracy but also in a better class-wise error distribution, with fewer errors of both types.
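As a worked check, the headline metrics can be recomputed from the four counts reported for DPAF-Net. The sketch below is illustrative: the helper name `binary_metrics` is an assumption, and precision, specificity, and F1 are derived here from the counts rather than quoted from the text.

```python
def binary_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / total,
        "recall": recall,
        "precision": precision,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Counts reported for DPAF-Net (infected = positive class).
m = binary_metrics(tp=677, fp=13, fn=5, tn=1062)
print({k: round(100 * v, 2) for k, v in m.items()})
```

The recomputed accuracy and recall round to 98.98% and 99.27%, matching the values reported for the proposed model.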
Failure case analysis
To further understand model limitations, misclassified samples were examined.
Representative misclassification examples are shown in Fig 7.
Observations
False positives are mainly caused by:
• Irregular patches of pigmentation.
• Strong shadowing.
• Skin discoloration outside the lesion area.
False negatives occur mainly in:
• Early lesions with subtle texture changes.
• Distant or low-resolution images.
• Mildly visible swelling.
These observations suggest that the model's errors stem mainly from ambiguity in the images rather than from instability in the model.
Architectural impact on performance
The reasons for the enhanced performance of DPAF-Net can be summarized as follows:
Feature extraction through dual paths:
• Fine texture details are extracted by EfficientNet-B4.
• Structural and context-based features are extracted by ConvNeXt-Tiny.
Channel attention:
• Boosts the channels of relevant features.
Gated feature fusion:
• Complementary feature maps are dynamically fused.
• Redundancy is minimized, unlike simple concatenation.
This structured fusion strategy allows for more comprehensive multi-scale representation learning, which is presumably responsible for the improved recall and AUC values.
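To illustrate how gated fusion differs from simple concatenation, the following minimal sketch fuses two pooled feature vectors with a per-channel sigmoid gate. This is one common formulation and an assumption on our part, not the exact DPAF-Net layer: the function `gated_fusion`, the gate parameters `w_a`, `w_b`, and `bias`, and the toy feature values are hypothetical, and both branches are assumed to have been projected to the same number of channels.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(feat_a, feat_b, w_a, w_b, bias):
    """Fuse two equal-length feature vectors with a per-channel gate.

    For each channel: g = sigmoid(w_a * a + w_b * b + bias),
    fused = g * a + (1 - g) * b.
    The gate parameters stand in for weights that would be learned.
    """
    if not (len(feat_a) == len(feat_b) == len(w_a) == len(w_b) == len(bias)):
        raise ValueError("all inputs must have the same number of channels")
    fused = []
    for a, b, wa, wb, c in zip(feat_a, feat_b, w_a, w_b, bias):
        g = sigmoid(wa * a + wb * b + c)
        fused.append(g * a + (1.0 - g) * b)
    return fused


# Toy pooled features (hypothetical) from the two branches.
texture_feat = [0.8, 0.1, 0.5]   # e.g. from the fine-texture path
context_feat = [0.2, 0.9, 0.5]   # e.g. from the structural/context path

# With zero gate parameters the gate is 0.5, so the output is a plain average.
zeros = [0.0, 0.0, 0.0]
print(gated_fusion(texture_feat, context_feat, zeros, zeros, zeros))
```

Unlike concatenation, which doubles the channel count and passes both inputs through unchanged, this form keeps the channel count fixed and lets the gate decide, per channel, how much each branch contributes.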
Practical and clinical implications
The high recall (99.27%) indicates improved sensitivity for detecting infected cattle, while maintaining high precision. This balance is important for practical livestock disease screening systems.
Summary of findings
The experimental outcomes show that:
• Deep CNN models perform better than traditional feature-based approaches.
• Very deep models enhance recall but can cause increased computational complexity.
• The proposed DPAF-Net achieves the highest accuracy (98.98%), recall (99.27%), and AUC (0.9992) among the compared models.
• Attention-guided dual-path fusion improves discriminative power and suppresses false negatives.
In conclusion, the results support the effectiveness of adaptive dual-path feature fusion for LSD detection.