Acquisition of datasets
Six cows of the Keteku and Muturu breeds were acquired from a ranch for this study in September 2020. These are trypanotolerant breeds that are common among the Fulani in Nigeria and are reared mostly for meat and occasionally as draught animals. Each cow had a body length of 86.6 cm and a body height of 95.0 cm. The laboratory experiments on the acquired data were carried out in the Laboratory of the School of Computer Sciences, Universiti Sains Malaysia, in 2021. Fig 1 shows the system for acquiring datasets in the cattle ranch, while Fig 2 shows video images of the individual cows feeding and drinking.
Process-flow of cattle behavior recognition
Fig 3 shows the four steps involved in this study. In the first step, video sequences of group-ranched cattle were extracted from the camera mounted on the pole shown in Fig 1. The second step comprised data labeling and augmentation. In the third step, following the principle of transfer learning, several pre-trained models were fine-tuned and their detection accuracies compared, and the most suitable model was chosen for individual-cow detection. The final step was the behavior analysis of individual cows, in which the investigated behaviors were summarized into statistical results.
Labeling and augmentation of data
One thousand (1000) keyframes were selected and labeled using LabelMe (Russell et al., 2008), from which 800 frames were used as training data and 200 frames as testing data. Data augmentation was applied to this small annotated dataset to provide the larger volume of annotated data required for training deep learning models. The augmentation generated multiple folds of both the training and testing sets, yielding 4000 training frames and 1000 testing frames.
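The augmentation pipeline described above can be sketched as follows. The specific transforms (horizontal flip and brightness shift) and the function name are illustrative assumptions, not the study's actual pipeline; the key point is that geometric transforms must remap the box coordinates while photometric ones leave them unchanged:

```python
import numpy as np

def augment_frame(frame, boxes):
    """Generate simple augmented copies of one labeled frame.

    frame: H x W x 3 uint8 array; boxes: list of (x1, y1, x2, y2) in pixels.
    Horizontal flip and brightness shifts are stand-ins for whatever
    augmentation the study actually applied.
    """
    h, w = frame.shape[:2]
    augmented = []

    # Horizontal flip: mirror the image and remap box x-coordinates.
    flipped = frame[:, ::-1].copy()
    flipped_boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in boxes]
    augmented.append((flipped, flipped_boxes))

    # Brightness shift: boxes are unchanged by photometric transforms.
    for delta in (-30, 30):
        shifted = np.clip(frame.astype(np.int16) + delta, 0, 255).astype(np.uint8)
        augmented.append((shifted, boxes))

    return augmented
```

Applying several such transforms to each of the 1000 labeled keyframes yields the multiple-fold expansion to 5000 frames described above.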
Detection of individual cows
Four pre-trained object detection models, namely Mask R-CNN, Faster R-CNN, YOLOv3, and YOLOv4, were evaluated as candidate detection models. Mask R-CNN (He et al., 2020; He et al., 2017) extends Faster R-CNN with a mask-generation branch for better object detection. When Mask R-CNN was used as the cow detection model, the generated outputs included the bounding box, object class, confidence score, and mask; the other models produced the same outputs except the mask.
Eq. (1) defines the intersection over union (IoU) used to determine the accuracy of the bounding box; the remaining evaluation metrics are given in Eqs. (2) through (4).
\mathrm{IoU} = \dfrac{\mathrm{Area}(B_{p} \cap B_{gt})}{\mathrm{Area}(B_{p} \cup B_{gt})} ..........(1)

where B_{p} is the predicted bounding box and B_{gt} is the ground-truth bounding box.
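Eq. (1) can be computed directly from two axis-aligned boxes; the following sketch uses an illustrative `(x1, y1, x2, y2)` box convention:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```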
IoU threshold values from 0.5 to 0.95 are considered in this study, with the notation mAP@X, where X is the threshold value used to compute the metric. The precision-recall can be computed only after all the matches for the image are established. Precision is the proportion of the objects predicted by the model that are correct, and it is computed as follows:
\mathrm{Precision} = \dfrac{TP}{TP + FP} ..........(2)
Recall is the proportion of the ground-truth positive objects that the model detects, and it is computed as follows:
\mathrm{Recall} = \dfrac{TP}{TP + FN} ..........(3)
where a true positive (TP) is a prediction of an object that is correct, a false positive (FP) is a prediction of an object that is incorrect, and a false negative (FN) is a failure to predict an object that was there. The average precision (AP) is the area under the precision-recall (PR) curve, approximated by segmenting the recall axis evenly into parts. AP is calculated as follows:
\mathrm{AP} = \dfrac{1}{N} \sum_{i=1}^{N} P(r_{i}) ..........(4)
where N is the number of sampled points on the PR curve and P(r_{i}) is the precision at recall r_{i}.
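Eqs. (2) through (4) can be sketched as follows; the function names are illustrative, and `pr_points` is assumed to hold the N evenly segmented (precision, recall) pairs:

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 2) and recall (Eq. 3) from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(pr_points):
    """Mean precision over N evenly segmented PR points (Eq. 4).

    pr_points: list of (precision, recall) pairs sampled along the PR curve.
    """
    return sum(p for p, _ in pr_points) / len(pr_points)
```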
Cow behavior recognition
The following equations calculate the behavior recognition accuracy and the misidentification ratio:
A_{b} = \dfrac{C_{b}}{G_{b}} ..........(5)

M_{b} = \dfrac{T_{b} - C_{b}}{G_{b}} ..........(6)
where b denotes one type of behavior, A_{b} is the behavior recognition accuracy, M_{b} is the ratio of the number of misidentified behaviors to the number of real behaviors, G_{b} is the ground-truth observation count for a cow, C_{b} is the number of correctly identified behaviors, and T_{b} is the total number of one type of behavior, comprising the misidentified behaviors in addition to the correctly identified ones.
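Eqs. (5) and (6) amount to two ratios over the per-behavior counts; a minimal sketch, with illustrative argument names mapping onto C_b, T_b, and G_b:

```python
def behavior_metrics(correct, total_identified, ground_truth):
    """Accuracy A_b (Eq. 5) and misidentification ratio M_b (Eq. 6).

    correct: C_b, correctly identified instances of the behavior.
    total_identified: T_b, correct plus misidentified instances.
    ground_truth: G_b, ground-truth observations of the behavior.
    """
    accuracy = correct / ground_truth
    misidentification = (total_identified - correct) / ground_truth
    return accuracy, misidentification
```

For example, 90 correct identifications out of 100 ground-truth observations, with 10 further misidentifications, give A_b = 0.9 and M_b = 0.1.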
Analysis of cow behavior recognition
Fig 4 shows the framework for recognizing cattle behaviors. The following steps describe the recognition process of a cow's behavior.
Step 1: Individual cows in the current frame were detected using the preferred cow detection model. If both the previous and current frames were valid, Step 2 was performed for cow behavior recognition; otherwise, processing moved to the next frame and cow detection was repeated from Step 1.
Step 2: The spatial relationship between the detected bounding boxes and the ground truth was analyzed, and using Eqs. (1) through (4), the IoU was calculated and compared with threshold values from 0.5 to 0.80. In Step 2.1, based on the partial bounding-box area ratio, the cow eating and drinking behaviors were established. If not, the emphasis was on differentiating between the active and inactive behaviors of the cattle, as described in Step 2.2. After recognition of the cow in the current frame was completed, processing moved to the next frame and cow detection was repeated from Step 1.
Step 2.1: Eating and drinking behaviors recognition.
(1) Eating behavior recognition
If the bounding box's horizontal length was greater than its vertical length, the IoU of the bounding box was compared with a threshold value of 0.55; otherwise, it was compared with a threshold value of 0.60. If the IoU exceeded the applicable threshold, the current behavior was recognized as eating. If not, the emphasis was on differentiating between the active and inactive behaviors of the cattle, as described in Step 2.2.
(2) Drinking behavior recognition
If the bounding box's horizontal length was greater than its vertical length, the IoU of the bounding box was compared with a threshold value of 0.65; otherwise, it was compared with a threshold value of 0.70. If the IoU exceeded the applicable threshold, the current behavior was recognized as drinking. If not, the emphasis was on differentiating between the active and inactive behaviors of the cattle, as described in Step 2.2.
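The orientation-dependent thresholds of Step 2.1 can be sketched as a single decision function. The box convention and the assumption that the cow-trough IoU has already been computed are illustrative; only the threshold values come from the text:

```python
def recognize_trough_behavior(box, trough_iou, behavior):
    """Apply the Step 2.1 thresholds for eating/drinking recognition.

    box: cow bounding box (x1, y1, x2, y2).
    trough_iou: IoU between the cow's bounding box and the trough region
                (assumed precomputed).
    behavior: 'eating' or 'drinking'.
    Thresholds follow the text: 0.55/0.60 for eating and 0.65/0.70 for
    drinking, selected by whether the box is wider than it is tall.
    """
    width, height = box[2] - box[0], box[3] - box[1]
    thresholds = {'eating': (0.55, 0.60), 'drinking': (0.65, 0.70)}
    horizontal_thr, vertical_thr = thresholds[behavior]
    threshold = horizontal_thr if width > height else vertical_thr
    return trough_iou > threshold
```

A box of width 10 and height 5 with a trough IoU of 0.58 would thus be recognized as eating (0.58 > 0.55) but not as drinking (0.58 < 0.65).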
Step 2.2: Activeness and inactiveness of cow behaviors recognition.
Active and inactive cow behaviors were recognized using Eq. (7). This was necessary when no intersection between the bounding box and the troughs was established, or when the conditions of Step 2.1 were not satisfied. The amount of cow movement, d, was compared with a threshold value of 0.80: activeness was established if d was greater than 0.80; otherwise, inactiveness was established. The aforementioned thresholds, from 0.5 to 0.80, were essential to the cow behavior recognition output. In general, the thresholds are determined by the features of the bounding boxes and the cow behaviors; they took different values because of the different cow body sizes and the manner in which the cow images were captured. Invalid frames were excluded from the experiment and replaced with valid frames.
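Since Eq. (7) is not reproduced here, the following sketch uses an assumed definition of d, namely the displacement of the bounding-box centre between consecutive frames; the paper's actual definition and units for d may differ, and only the 0.80 threshold comes from the text:

```python
import math

ACTIVITY_THRESHOLD = 0.80  # threshold value stated in the text

def centre(box):
    """Centre point of a bounding box (x1, y1, x2, y2)."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def is_active(prev_box, curr_box, threshold=ACTIVITY_THRESHOLD):
    """Step 2.2: classify a cow as active if its movement d exceeds the
    threshold.

    d is taken here, as an assumption, to be the Euclidean displacement of
    the bounding-box centre between consecutive frames.
    """
    (px, py), (cx, cy) = centre(prev_box), centre(curr_box)
    d = math.hypot(cx - px, cy - py)
    return d > threshold
```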
Intersection over union
Fig 5(a) shows the mask-based position distribution. To assess detection accuracy, the IoU was established as shown in Fig 5(b), where confidence scores were assigned to the individual cows in the frame, and the precision-recall was computed only after all the matches for the image were established.