The model’s performance depends on several factors, including the characteristics of the dataset, the selected hyperparameters and the specifics of the classification task. Achieving the best results may therefore require experimentation and fine-tuning across different architectures and settings.
The model’s performance metrics after the 25th training epoch were as follows: the loss, i.e. the average loss computed over the entire set of training samples, was 0.1582 (Fig 5), where a lower value indicates better performance. The training accuracy of 0.9419 showed that the model correctly classified roughly 94.19% of the training samples. On the validation set, the accuracy was 0.9405 and the loss was 0.1797. These metrics indicated good performance, with reasonably low loss values and high accuracy on both the training data and the unseen validation data.
Fig 6 displayed a set of images comparing predicted and actual disease labels, providing insight into the trained model’s efficacy and degree of confidence. Each prediction was shown together with its confidence value, making it easier to assess the model’s effectiveness in identifying diseases.
The model’s performance in identifying various categories was shown visually in the confusion matrix (Fig 7). With only one incorrect classification out of 220 cases, the model demonstrated excellent accuracy in detecting coccidiosis. However, it misclassified 15 Healthy samples, indicating lower accuracy for this category. Newcastle disease showed moderate performance, with multiple misclassifications across different categories. In contrast, the model exhibited very low misclassification rates and performed exceptionally well in recognizing Salmonella. Overall, while the model accurately classified coccidiosis and Salmonella, it struggled more with distinguishing Newcastle disease and Healthy samples.
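The per-class error counts discussed above are read directly from the confusion matrix, which tallies how often each actual class receives each predicted label. A minimal sketch of constructing such a matrix in Python (the class names follow the study, but the sample labels below are illustrative, not the actual dataset):

```python
# Build a confusion matrix as counts[actual][predicted].
# Diagonal entries are correct classifications; off-diagonal
# entries are misclassifications, as in Fig 7.
CLASSES = ["Coccidiosis", "Healthy", "Newcastle disease", "Salmonella"]

def confusion_matrix(y_true, y_pred, classes):
    counts = {a: {p: 0 for p in classes} for a in classes}
    for actual, predicted in zip(y_true, y_pred):
        counts[actual][predicted] += 1
    return counts

# Illustrative labels: one Healthy sample misread as Newcastle disease.
y_true = ["Coccidiosis", "Healthy", "Healthy", "Salmonella"]
y_pred = ["Coccidiosis", "Healthy", "Newcastle disease", "Salmonella"]
cm = confusion_matrix(y_true, y_pred, CLASSES)
print(cm["Healthy"])  # diagonal = correct, off-diagonal = errors
```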
The model evaluation metrics are summarized in Table 2. The precision for coccidiosis was high at 97.77%, indicating a low rate of false positives. Recall, which measures the model’s ability to detect true positives, was also strong at 99.55%, meaning that most cases of coccidiosis were correctly identified. The F1-score was similarly high at 98.65%, reflecting well-balanced performance. The support, i.e. the total number of instances in the class, was 220, showing that the dataset was reasonably balanced.
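The F1-score is the harmonic mean of precision and recall, so the reported coccidiosis value can be recomputed directly from the Table 2 figures:

```python
# F1 as the harmonic mean of precision and recall, using the
# coccidiosis values from Table 2 (in percent).
precision = 97.77  # TP / (TP + FP)
recall = 99.55     # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # 98.65, matching the reported F1-score
```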
Conversely, for healthy samples, the recall was slightly lower at 92.79%, indicating some false negatives, but the precision was high at 94.50%, showing accurate positive predictions. As a result, the F1-score was 93.64%. The support for healthy samples was 222, suggesting that the dataset for this class was well-balanced. The precision for Newcastle disease was much lower at 81.82%, indicating a higher rate of false positives. Moreover, the recall was poor at 32.14%, reflecting a high proportion of false negatives. The disparity between precision and recall was evident in the comparatively low F1-score of 46.15%. The support for Newcastle disease was only 28, showing far fewer cases in this class.
Salmonella had a high recall of 98.29%, indicating that true positives were well captured, and a precision of 91.63%, suggesting relatively few false positives. Consequently, the F1-score stood at 94.85%, demonstrating strong overall performance. The support for Salmonella was 234, suggesting a well-balanced dataset for this class. The model’s overall accuracy across all classes was 94.32%, demonstrating its ability to categorize a large portion of the dataset correctly. The macro average, which weights all classes equally, showed a precision of 91.43%, recall of 80.69% and an F1-score of 83.32%, with the lower macro recall reflecting the weakness on Newcastle disease. The weighted average had a precision of 94.06%, recall of 94.32% and an F1-score of 93.72%, illustrating the model’s performance when class imbalance is taken into account.
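The macro and weighted averages follow directly from the per-class values in Table 2: the macro average is the unweighted mean across the four classes, while the weighted average scales each class by its support. A short recomputation in Python:

```python
# Per-class metrics from Table 2 (percent): precision, recall, F1, support.
classes = {
    "Coccidiosis": (97.77, 99.55, 98.65, 220),
    "Healthy":     (94.50, 92.79, 93.64, 222),
    "Newcastle":   (81.82, 32.14, 46.15, 28),
    "Salmonella":  (91.63, 98.29, 94.85, 234),
}

n = len(classes)
total = sum(s for *_, s in classes.values())  # 704 samples overall

# Macro: simple mean over classes; weighted: mean scaled by support.
macro = [round(sum(v[i] for v in classes.values()) / n, 2) for i in range(3)]
weighted = [round(sum(v[i] * v[3] for v in classes.values()) / total, 2)
            for i in range(3)]

print(macro)     # [91.43, 80.69, 83.32]  -> precision, recall, F1
print(weighted)  # [94.06, 94.32, 93.72]
```

Note that the Newcastle disease class (support 28) drags the macro recall down to 80.69% while barely affecting the weighted figures, which is exactly the gap visible in Table 2.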
A comparative table (Table 3) presents the findings of this study alongside previous research, highlighting the accuracy levels of different models used for poultry disease detection.
Wang et al. (2019) employed a DCNN-based approach to identify digestive disorders in broiler chickens using Faster R-CNN and YOLO-V3. Their study achieved a recall of 99.1% and a mean average precision (mAP) of 93.3% with Faster R-CNN, while YOLO-V3 attained a recall of 88.7% and an mAP of 84.3%. Similarly, Mbelwa et al. (2021) developed a CNN-based model to classify chicken feces into three disease categories, with the XceptionNet model achieving 94% accuracy, slightly outperforming a fully trained CNN model at 93.67% accuracy.
Okinda et al. (2019) used a machine vision system integrating video surveillance and depth cameras to track movement and posture-related features for disease prediction. Their SVM-based models achieved accuracies of 97.5% and 97.8% when incorporating all feature variables.
In comparison, the Sequential CNN model in this study attained an accuracy of 94.32%, which is competitive with the reported CNN-based approaches. However, unlike studies that incorporated additional features such as movement patterns (Okinda et al., 2019) or optimized anchor boxes (Wang et al., 2019), our approach relies solely on fecal images. While this simplifies implementation and reduces the need for complex hardware setups, it also introduces limitations, as fecal color alone is not always a reliable disease indicator. The results indicate that model selection and feature engineering significantly impact classification accuracy; approaches integrating multiple features, such as movement analysis or anchor box optimization, have demonstrated superior performance.
However, there are limitations to the approach. Fecal color as a diagnostic indicator is influenced by various factors, such as diet, stress, lighting conditions and the presence of multiple diseases that cause similar color changes. Moreover, mixed droppings from group-reared poultry make it difficult to attribute specific colors to individual birds. As such, fecal color alone is not a definitive diagnostic tool and should be viewed as a complementary method. Future work will focus on integrating additional parameters, such as clinical symptoms, behavioral observations and microbiological testing, to improve the system’s robustness and reliability. Additionally, expanding the dataset and testing the model in more diverse poultry environments will help enhance its generalization capabilities.