Acquisition of datasets
Six cows of the Keteku and Muturu breeds were acquired from a ranch for this study in September 2020. These are trypanotolerant breeds that are common among the Fulani in Nigeria and are reared mostly for meat and occasionally as draught animals. Each cow had a body length of 86.6 cm and a body height of 95.0 cm. The laboratory experiments on the acquired data were carried out in the Laboratory of the School of Computer Sciences, Universiti Sains Malaysia, in 2021. Fig 1 shows the system for acquiring datasets in the cattle ranch, while Fig 2 shows video images of the individual cows feeding and drinking.
Process-flow of cattle behavior recognition
Fig 3 shows the four steps involved in this study. In the first step, video sequences of group-ranched cattle were extracted from the camera mounted on the pole shown in Fig 1. The second step comprised data labeling and augmentation. In the third step, following the principle of transfer learning, several pre-trained models were fine-tuned and their detection accuracies compared, and the most suitable model was chosen for individual-cow detection. The final step was the behavior analysis of individual cows, in which the investigated behaviors were summarized into statistical results.
Labeling and augmentation of data
One thousand (1000) keyframes were selected and labeled using LabelMe (Russell et al., 2008), from which 800 frames were used as training data and 200 frames as testing data. Data augmentation was applied to this small annotated dataset to provide the larger volume of annotated data required for training deep learning models. The augmentation generated multiple folds of both the training and testing sets, yielding 4000 training frames and 1000 testing frames.
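The augmentation pipeline described above can be sketched as follows. The specific transforms (horizontal flip and brightness shift) and the function name are illustrative assumptions, not the study's actual pipeline; the key point is that geometric transforms must remap the box coordinates while photometric ones leave them unchanged:

```python
import numpy as np

def augment_frame(frame, boxes):
    """Generate simple augmented copies of one labeled frame.

    frame: H x W x 3 uint8 array; boxes: list of (x1, y1, x2, y2) in pixels.
    Horizontal flip and brightness shifts are stand-ins for whatever
    augmentation the study actually applied.
    """
    h, w = frame.shape[:2]
    augmented = []

    # Horizontal flip: mirror the image and remap box x-coordinates.
    flipped = frame[:, ::-1].copy()
    flipped_boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in boxes]
    augmented.append((flipped, flipped_boxes))

    # Brightness shift: boxes are unchanged by photometric transforms.
    for delta in (-30, 30):
        shifted = np.clip(frame.astype(np.int16) + delta, 0, 255).astype(np.uint8)
        augmented.append((shifted, boxes))

    return augmented
```

Applying several such transforms to each of the 1000 labeled keyframes yields the multiple-fold expansion to 5000 frames described above.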
Detection of individual cows
Four pre-trained object detection models, namely Mask R-CNN, Faster R-CNN, YOLOv3, and YOLOv4, were evaluated as candidate detection models. Mask R-CNN (He et al., 2020; He et al., 2017) extends Faster R-CNN with a mask-generation branch for better object detection. When Mask R-CNN was used as the cow detection model, the generated outputs included the bounding box, object class, confidence score, and mask; the other models produced the same outputs except the mask.
Eq. (1) defines the intersection over union (IoU) used to determine the accuracy of the bounding box; the remaining evaluation metrics are given in Eqs. (2) through (4).
\mathrm{IoU} = \dfrac{\mathrm{Area}(B_{p} \cap B_{gt})}{\mathrm{Area}(B_{p} \cup B_{gt})} ..........(1)

where B_{p} is the predicted bounding box and B_{gt} is the ground-truth bounding box.
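Eq. (1) can be computed directly from two axis-aligned boxes; the following sketch uses an illustrative `(x1, y1, x2, y2)` box convention:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```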
IoU threshold values from 0.5 to 0.95 are considered in this study, with the notation mAP@X, where X is the threshold value used to compute the metric. The precision-recall can be computed only after all the matches for the image are established. Precision is the proportion of the objects predicted by the model that are correct, and it is computed as follows:
\mathrm{Precision} = \dfrac{TP}{TP + FP} ..........(2)
Recall is the proportion of the ground-truth positive objects that the model detects, and it is computed as follows:
\mathrm{Recall} = \dfrac{TP}{TP + FN} ..........(3)
where a true positive (TP) is a prediction of an object that is correct, a false positive (FP) is a prediction of an object that is incorrect, and a false negative (FN) is a failure to predict an object that was there. The average precision (AP) is the area under the precision-recall (PR) curve, approximated by segmenting the recall axis evenly into parts. AP is calculated as follows:
\mathrm{AP} = \dfrac{1}{N} \sum_{i=1}^{N} P(r_{i}) ..........(4)
where N is the number of sampled points on the PR curve and P(r_{i}) is the precision at recall r_{i}.
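Eqs. (2) through (4) can be sketched as follows; the function names are illustrative, and `pr_points` is assumed to hold the N evenly segmented (precision, recall) pairs:

```python
def precision_recall(tp, fp, fn):
    """Precision (Eq. 2) and recall (Eq. 3) from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def average_precision(pr_points):
    """Mean precision over N evenly segmented PR points (Eq. 4).

    pr_points: list of (precision, recall) pairs sampled along the PR curve.
    """
    return sum(p for p, _ in pr_points) / len(pr_points)
```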
Cow behavior recognition
The following equations calculate the behavior recognition accuracy and the misidentification ratio:
A_{b} = \dfrac{C_{b}}{G_{b}} ..........(5)

M_{b} = \dfrac{T_{b} - C_{b}}{G_{b}} ..........(6)
where b denotes one type of behavior, A_{b} is the behavior recognition accuracy, M_{b} is the ratio of the number of misidentified behaviors to the number of real behaviors, G_{b} is the ground-truth observation count for a cow, C_{b} is the number of correctly identified behaviors, and T_{b} is the total number of one type of behavior, comprising the misidentified behaviors in addition to the correctly identified ones.
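Eqs. (5) and (6) amount to two ratios over the per-behavior counts; a minimal sketch, with illustrative argument names mapping onto C_b, T_b, and G_b:

```python
def behavior_metrics(correct, total_identified, ground_truth):
    """Accuracy A_b (Eq. 5) and misidentification ratio M_b (Eq. 6).

    correct: C_b, correctly identified instances of the behavior.
    total_identified: T_b, correct plus misidentified instances.
    ground_truth: G_b, ground-truth observations of the behavior.
    """
    accuracy = correct / ground_truth
    misidentification = (total_identified - correct) / ground_truth
    return accuracy, misidentification
```

For example, 90 correct identifications out of 100 ground-truth observations, with 10 further misidentifications, give A_b = 0.9 and M_b = 0.1.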
Analysis of cow behavior recognition
Fig 4 shows the framework for recognizing cattle behaviors. The following steps describe the recognition process of a cow's behavior.
Step 1: Individual cows in the current frame were detected using the preferred cow detection model. If both the previous and current frames were valid, Step 2 was performed for cow behavior recognition; otherwise, processing moved to the next frame and cow detection was repeated from Step 1.
Step 2: The spatial relationship between the detected bounding boxes and the ground truth was analyzed, and using Eqs. (1) through (4), the IoU was calculated and compared with threshold values from 0.5 to 0.80. In Step 2.1, based on the partial bounding-box area ratio, the cow eating and drinking behaviors were established. If not, the emphasis was on differentiating between the active and inactive behaviors of the cattle, as described in Step 2.2. After recognition of the cow in the current frame was completed, processing moved to the next frame and cow detection was repeated from Step 1.
Step 2.1: Eating and drinking behaviors recognition.
(1) Eating behavior recognition
If the bounding box's horizontal length was greater than its vertical length, the IoU of the bounding box was compared with a threshold value of 0.55; otherwise, it was compared with a threshold value of 0.60. If the IoU exceeded the applicable threshold, the current behavior was recognized as eating. If not, the emphasis was on differentiating between the active and inactive behaviors of the cattle, as described in Step 2.2.
(2) Drinking behavior recognition
If the bounding box's horizontal length was greater than its vertical length, the IoU of the bounding box was compared with a threshold value of 0.65; otherwise, it was compared with a threshold value of 0.70. If the IoU exceeded the applicable threshold, the current behavior was recognized as drinking. If not, the emphasis was on differentiating between the active and inactive behaviors of the cattle, as described in Step 2.2.
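The orientation-dependent thresholds of Step 2.1 can be sketched as a single decision function. The box convention and the assumption that the cow-trough IoU has already been computed are illustrative; only the threshold values come from the text:

```python
def recognize_trough_behavior(box, trough_iou, behavior):
    """Apply the Step 2.1 thresholds for eating/drinking recognition.

    box: cow bounding box (x1, y1, x2, y2).
    trough_iou: IoU between the cow's bounding box and the trough region
                (assumed precomputed).
    behavior: 'eating' or 'drinking'.
    Thresholds follow the text: 0.55/0.60 for eating and 0.65/0.70 for
    drinking, selected by whether the box is wider than it is tall.
    """
    width, height = box[2] - box[0], box[3] - box[1]
    thresholds = {'eating': (0.55, 0.60), 'drinking': (0.65, 0.70)}
    horizontal_thr, vertical_thr = thresholds[behavior]
    threshold = horizontal_thr if width > height else vertical_thr
    return trough_iou > threshold
```

A box of width 10 and height 5 with a trough IoU of 0.58 would thus be recognized as eating (0.58 > 0.55) but not as drinking (0.58 < 0.65).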
Step 2.2: Activeness and inactiveness of cow behaviors recognition.
Active and inactive cow behaviors were recognized using Eq. (7). This was necessary when no intersection between the bounding box and the troughs was established, or when the conditions of Step 2.1 were not satisfied. The amount of cow movement, d, was compared with a threshold value of 0.80: activeness was established if d was greater than 0.80; otherwise, inactiveness was established. The aforementioned thresholds, from 0.5 to 0.80, were essential to the cow behavior recognition output. In general, the thresholds are determined by the features of the bounding boxes and the cow behaviors; they took different values because of the different cow body sizes and the manner in which the cow images were captured. Invalid frames were excluded from the experiment and replaced with valid frames.
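Since Eq. (7) is not reproduced here, the following sketch uses an assumed definition of d, namely the displacement of the bounding-box centre between consecutive frames; the paper's actual definition and units for d may differ, and only the 0.80 threshold comes from the text:

```python
import math

ACTIVITY_THRESHOLD = 0.80  # threshold value stated in the text

def centre(box):
    """Centre point of a bounding box (x1, y1, x2, y2)."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)

def is_active(prev_box, curr_box, threshold=ACTIVITY_THRESHOLD):
    """Step 2.2: classify a cow as active if its movement d exceeds the
    threshold.

    d is taken here, as an assumption, to be the Euclidean displacement of
    the bounding-box centre between consecutive frames.
    """
    (px, py), (cx, cy) = centre(prev_box), centre(curr_box)
    d = math.hypot(cx - px, cy - py)
    return d > threshold
```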
Intersection over union
Fig 5(a) shows the mask-based position distribution. To assess detection accuracy, the IoU was established as shown in Fig 5(b), where confidence scores were assigned to the individual cows in the frame, and the precision-recall was computed only after all the matches for the image were established.