The results of the training process are presented in Fig 4, which provides insight into the model's performance across epochs. The box, class and distribution focal loss (DFL) components of both the training and validation losses show a downward trend, indicating that the model is learning effectively. For instance, the training box loss decreased from 1.5152 in the first epoch to 1.2224 by the twentieth, while the class loss fell from 1.0523 to 0.61601. This decline signifies improved accuracy in localizing and classifying objects.
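These curves come from a standard Ultralytics training run; the following is a minimal sketch of how such a run could be reproduced, assuming the ultralytics Python package, a hypothetical dataset configuration file tiger_data.yaml and an arbitrary choice of model scale:

```python
from ultralytics import YOLO

# Load a pretrained YOLOv8 checkpoint; the medium scale used here is an
# assumption, since this passage does not specify one.
model = YOLO("yolov8m.pt")

# Train for 20 epochs to match the epoch count reported above.
# "tiger_data.yaml" is a hypothetical dataset configuration pointing at
# the annotated tiger images.
model.train(data="tiger_data.yaml", epochs=20, imgsz=640)

# Per-epoch box, class and DFL losses for training and validation are
# logged to results.csv in the run directory; loss curves such as those
# in Fig 4 are plotted from this file.
```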
Precision, recall and mean Average Precision (mAP) also reveal significant progress [Fig 5 (a, b)]. Starting from a low precision of 0.07882 and recall of 0.13051 in the first epoch, these values improved substantially, reaching 0.9438 and 0.90257, respectively, by the twentieth epoch. The mAP metrics, mAP@50 and mAP@50-95, further highlight the model's capability to detect objects across IoU thresholds; mAP@50 increased from 0.03891 to 0.95817, illustrating a robust enhancement in detection performance.
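These metrics are exposed directly by the Ultralytics validation API; a brief sketch under the same assumptions as above (hypothetical checkpoint path and dataset configuration):

```python
from ultralytics import YOLO

# "best.pt" is the checkpoint saved by the training run sketched earlier;
# the path below is the Ultralytics default and is an assumption here.
model = YOLO("runs/detect/train/weights/best.pt")

metrics = model.val(data="tiger_data.yaml")
print(f"precision:  {metrics.box.mp:.4f}")    # mean precision over classes
print(f"recall:     {metrics.box.mr:.4f}")    # mean recall over classes
print(f"mAP@50:     {metrics.box.map50:.4f}")
print(f"mAP@50-95:  {metrics.box.map:.4f}")
```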
The confusion matrix provides further insight into the model's performance [Fig 5(c)]. For the tiger class, the model identified 544 true positives, indicating a strong ability to detect tigers in the dataset, but also produced 38 false negatives, meaning 38 tiger instances were missed. In addition, 18 false positives were recorded against the background category, i.e., 18 instances were confused with background. These results delineate the model's strengths and weaknesses: the high true-positive count reflects its effectiveness in detecting tigers, while the false negatives suggest that sensitivity could be improved and the background false positives indicate a need for further refinement. Overall, the confusion matrix complements the earlier metrics.
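As a quick sanity check, the tiger-class counts above can be turned into threshold-specific precision and recall; note that a confusion matrix is computed at a single confidence threshold, so these values need not coincide with the curve-based metrics reported earlier:

```python
# Confusion-matrix counts for the tiger class, taken from the text.
tp, fn, fp = 544, 38, 18

precision = tp / (tp + fp)  # 544 / 562 ~ 0.968
recall = tp / (tp + fn)     # 544 / 582 ~ 0.935
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.3f}, recall = {recall:.3f}, F1 = {f1:.3f}")
```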
The dataset included a diverse range of conditions, such as varying lighting and backgrounds. Fig 6 showcases a selection of these detections, highlighting the model's effectiveness in identifying tigers across different environments.
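Qualitative outputs like those in Fig 6 can be generated with the prediction API; a minimal sketch, again using the hypothetical paths introduced above:

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")

# "test_images/" is a hypothetical folder of held-out tiger photographs
# spanning varied lighting and backgrounds; save=True writes annotated
# images to the run directory.
results = model.predict(source="test_images/", conf=0.25, save=True)
for r in results:
    print(r.path, len(r.boxes), "detections")  # boxes carry class ids and confidences
```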
The present work compares favorably with results reported in the existing literature, as outlined below.
Dave et al. (2023) presented a deep-learning model designed to track wild animals in real time using camera footage. The study focused on detecting four animal categories: lions, tigers, leopards and bears. The authors created a dataset of 1,619 annotated images sourced from documentaries and YouTube videos and trained three YOLOv8 models: medium, large and extra-large. The extra-large model achieved an mAP of 94.3% and could detect animals in real time at 20 frames per second (FPS).
Wu et al. (2024) proposed a method for identifying individual Amur tigers using an improved InceptionResNetV2 model. Initially, YOLOv5 detected and segmented facial and stripe areas from 107 tiger images, achieving 97.3% accuracy. Enhancements such as a dropout layer and a dual-attention mechanism improved feature capture and reduced overfitting. The model reached an average recognition accuracy of 95.36%, with left stripes at 99.37%. This research provided a practical solution for identifying rare animals and supported conservation efforts.
Rančić et al. (2023) aimed to detect and count deer populations in northwestern Serbia using UAV images and deep neural networks. They compared several architectures, including three YOLO versions and a Single Shot Multibox Detector (SSD), trained on a manually annotated dataset. The best results showed an mAP of up to 70.45% and confidence scores of 99%. YOLOv4 achieved the highest precision (86%) and recall (75%), while its compressed version, YOLOv4-tiny, had a counting error of 7.1%.
Prabhu et al. (2022) introduced RescueNet, a YOLO-based deep learning model designed for detecting and counting flood survivors in disaster-stricken areas. The model demonstrated high effectiveness, achieving a precision of 0.98, recall of 0.97, F1-score of 0.94 and an mAP of 98%.
Senbagam and Bharathi (2024) aimed to develop a highly accurate object detection system using the YOLO algorithm for wildlife conservation. The study achieved an mAP of 93.8% in detecting and identifying animal species under varying weather conditions.
Naresh et al. (2023) developed a machine learning model using the YOLO algorithm to identify harmful snakes, aiming to help farmers recognize and avoid them. The algorithm achieved a precision of 87% in detecting snakes, facilitating real-time identification and improving safety in agricultural environments. Table 2 compares the performance of various object detection models reported in the literature with the proposed YOLOv8-based approach.
The presented work successfully used the YOLOv8 detection method to achieve an mAP of 94.4% for Amur tiger identification. Its focus on this critically endangered species supports targeted conservation efforts and highlights the potential of advanced object detection technologies in wildlife monitoring.
Despite the strong performance demonstrated by the YOLOv8 model, certain limitations remain. The dataset for Amur tiger detection is relatively small, which may restrict the model's generalizability to broader geographic regions and more complex ecological scenarios. Extreme lighting, heavy occlusion, dense vegetation and motion blur may also affect detection accuracy, as reflected in the observed false negatives. Future improvements will include expanding the dataset with more diverse real-world images, incorporating data augmentation techniques tailored to low-light and occluded conditions, and integrating temporal information from video sequences to enhance detection consistency; a sketch of such an augmentation pipeline follows. Additionally, exploring ensemble models or hybrid approaches that combine YOLOv8 with attention mechanisms or transformer-based networks may further improve robustness and sensitivity, particularly for rare and cryptic wildlife species.
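As an illustration of the augmentation direction outlined above, the following is a minimal sketch using torchvision transforms (the library choice is an assumption; any augmentation framework would serve) that simulates low-light exposure and partial occlusion:

```python
import torchvision.transforms as T

# Hypothetical augmentation pipeline for the future work described above.
# ColorJitter samples a darkening factor to mimic extreme lighting, and
# RandomErasing blanks a rectangular patch to mimic partial occlusion;
# bounding-box labels are left untouched, so the object remains annotated
# but partially hidden, as it would be in the field.
augment = T.Compose([
    T.ColorJitter(brightness=(0.3, 1.0), contrast=0.4),  # low-light simulation
    T.ToTensor(),
    T.RandomErasing(p=0.5, scale=(0.02, 0.2)),           # occlusion simulation
])
```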