Indian Journal of Animal Research

  • Chief Editor: K.M.L. Pathak

  • Print ISSN 0367-6722

  • Online ISSN 0976-0555

  • NAAS Rating 6.50

  • SJR 0.263

  • Impact Factor 0.4 (2024)

Frequency :
Monthly (January, February, March, April, May, June, July, August, September, October, November and December)
Indexing Services :
Science Citation Index Expanded, BIOSIS Preview, ISI Citation Index, Biological Abstracts, Scopus, AGRICOLA, Google Scholar, CrossRef, CAB Abstracting Journals, Chemical Abstracts, Indian Science Abstracts, EBSCO Indexing Services, Index Copernicus

Comparative Efficiency of Conventional and Connectionist Models in Prediction of First Lactation 305-day Milk Yield in Murrah Buffaloes

Ekta Rana1,*, Ashok Kumar Gupta1, Anand Prakash Ruhil2, Aneet Kour1, Shweta Mall1, Gedam Ete1
1Division of Animal Genetics and Breeding, ICAR-National Dairy Research Institute, Karnal-132 001, Haryana, India.
2Dairy Economics, Division of Statistics and Management, ICAR-National Dairy Research Institute, Karnal-132 001, Haryana, India.

Background: Breeding programmes are mainly structured and evaluated on the basis of the first lactation 305-day milk yield (FL305DMY) of dairy animals. Early prediction of 305-day milk yield using test-day records is crucial for early evaluation of elite animals and for reducing the cost of data recording and animal rearing.

Methods: In the study, the prediction efficiency of conventional models viz. centering date method (CDM), test interval method (TIM), ratio method (RM) and multiple linear regression (MLR) was compared with the newly evolved machine learning connectionist model named Artificial Neural Network (ANN). Data on 3,850 monthly test-day milk yield (MTDY) records of 809 Murrah buffaloes were utilized for the prediction of FL305DMY. The prediction efficiency of the models was compared based on absolute error, average error, root mean square error (RMSE) and their respective percentages. An attempt was thereafter made for the early-stage prediction of FL305DMY. 

Result: MLR was identified as the most accurate model, with the least error in prediction (4.19% RMSE), followed by the ANN model (4.28% RMSE). The prediction accuracy of the regression equation incorporating all the 11 MTDY records was found to be 95.68 per cent. The optimal regression equation for early-stage prediction of FL305DMY consisted of four variables, viz. MTDY-3 (65th day), MTDY-4 (95th day), MTDY-5 (125th day) and MTDY-7 (185th day), and showed an R2 value of 87.02 per cent. It was inferred from the study that the most effective early-stage prediction of FL305DMY could be achieved by the 185th day of lactation, offering a valuable tool for early and efficient selection of elite animals using monthly test-day milk yield records.

In breed improvement programmes, genetically elite sires are predominantly selected and evaluated on the basis of the 305-day milk yield records of their daughters. However, working with complete 305-day milk yield records of the progeny increases the generation interval and reduces the genetic response per unit time. It also necessitates recording the milk yield of the progeny on a daily basis (Dongre et al., 2012). Daily recording, however, is laborious, costly, time-consuming and difficult to carry out under real-world field conditions. Previous studies revealed that test-day records could act as a potential substitute for daily data recording because of the high genetic association between test-day records and the complete 305-day milk yield record of dairy animals (Torshizi and Mashhadi, 2015; Rana et al., 2021a). A statistical model formulated using test-day records has significant advantages: (i) it considers the genetic and associated environmental factors, (ii) it can evaluate animals at an early stage, resulting in a decreased generation interval and increased genetic response per unit time and (iii) it reduces the time and expenditure required for the study (Kaygisiz, 2013).
       
Prediction of 305-day milk yield records of dairy animals based on test-day milk yield records could enrich the selection and evaluation of elite sires at an early stage (Lidauer et al., 2003). Over the years, several conventional methods have been utilized by researchers to predict the first lactation 305-day milk yield (FL305DMY) of dairy animals based on test-day milk yield records. Conventional methods work on statistical models which require a predefined algorithm for computational transformation (Sharma et al., 2006). Recently, in this era of artificial intelligence, a machine learning connectionist model inspired by the working principles of biological neural networks (the human brain and its nerve cells), called the Artificial Neural Network (ANN), has been introduced in the field of dairying to facilitate prediction studies (Sharma et al., 2013). An ANN learns by itself from the given data set and can be applied even to complex, non-linear, ambiguous and noisy data (Guevara et al., 2023). Milk yield is typically represented in the form of a non-linear lactation curve; therefore, ANN can be effectively utilized for the prediction of 305-day milk yield of dairy animals.
       
Information on the prediction of FL305DMY of buffaloes based on conventional and connectionist models is scanty to date. Therefore, the present study was conducted to predict the FL305DMY of Murrah buffaloes based on conventional models, viz. Centering Date Method (CDM), Test Interval Method (TIM), Ratio Method (RM) and Multiple Linear Regression (MLR), and a connectionist model, viz. Artificial Neural Network (ANN), using monthly test-day milk yield (MTDY) records. The prediction accuracy obtained by the models was then compared. Also, an attempt was made to evaluate the animals at an early stage by predicting the FL305DMY based on the optimal combination of early test-day milk yield records.
Data considered for the study comprised 3,850 first lactation monthly test-day milk yield (MTDY) records of 809 Murrah buffaloes that calved at ICAR-National Dairy Research Institute (NDRI), Karnal, India. Processing of the raw data involved standardization followed by normalisation. Records with a lactation length of less than 100 days or a lactation yield under 900 kg, and lactations interrupted midway by culling or death of the animal, were excluded from the study. Cases of abortion, still-birth or any other pathological condition were deemed abnormal; hence, such records were also discarded from the data set used for the prediction analysis. The standardized data set was then normalised by excluding outliers beyond three standard deviations at both tail ends of the normally distributed data. A total of 11 MTDY records were taken from each animal at an interval of 30 days, viz. MTDY-1 on the 6th day, MTDY-2 on the 35th day, MTDY-3 on the 65th day, MTDY-4 on the 95th day, MTDY-5 on the 125th day, MTDY-6 on the 155th day, MTDY-7 on the 185th day, MTDY-8 on the 215th day, MTDY-9 on the 245th day, MTDY-10 on the 275th day and MTDY-11 on the 305th day.
       
The prediction of first lactation 305-day milk yield (FL305DMY) was performed utilizing four conventional models, namely, centering date method, test interval method, ratio method and multiple linear regression.
 
Centering date method (CDM)
 
Production credits (PN) were calculated for each individual interval and were then added up to estimate the overall 305-day milk yield of the animal (O’Connor and Lipton, 1960; Likhi et al., 1995).
For the first and last test-day interval,
 
PN = (DIM + ½ LI) Pn
 
For the intervening test-day intervals,
 
PN = (LI) (Pn)
 
Where,
DIM = Days from the first day of lactation to the first test-day in the case of the first test-day interval, and days between the last test-day and the terminal day of lactation in the case of the last test-day interval.
LI = Sampling interval.
Pn = Milk yield on the nth test-day.
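
The two CDM formulae above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name, the argument names and the example figures (a constant 10 kg test-day yield, a 30-day sampling interval, 5 days before the first test-day and 0 days after the last) are hypothetical.

```python
def cdm_305_day_yield(test_day_yields, sampling_interval, dim_first, dim_last):
    """Sum the production credits (PN) of all test-day intervals.

    First and last interval:  PN = (DIM + 1/2 * LI) * Pn
    Intervening intervals:    PN = LI * Pn
    """
    if len(test_day_yields) < 2:
        raise ValueError("need at least two test-day records")

    last = len(test_day_yields) - 1
    total = 0.0
    for n, p_n in enumerate(test_day_yields):
        if n == 0:
            # DIM = days from start of lactation to the first test-day
            credit = (dim_first + 0.5 * sampling_interval) * p_n
        elif n == last:
            # DIM = days from the last test-day to the end of lactation
            credit = (dim_last + 0.5 * sampling_interval) * p_n
        else:
            credit = sampling_interval * p_n
        total += credit
    return total


# Hypothetical example: three test-days, each with a 10 kg yield.
example_total = cdm_305_day_yield([10.0, 10.0, 10.0], sampling_interval=30,
                                  dim_first=5, dim_last=0)
print(example_total)
```

With these made-up inputs the credits are (5 + 15) × 10, 30 × 10 and (0 + 15) × 10, which sum to 650 kg.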
 
Test interval method (TIM)
 
Like CDM, production credits were calculated for each test-day interval, followed by summing up of all the production credits to predict the overall lactation yield (Sargent et al., 1968; Likhi et al., 1995). The formulae used were basically the same as employed in CDM model, with the exception that the calculation of sampling intervals differed. For the first and last test-day intervals, sampling interval was calculated as the length of the first and last test-day period, respectively. For nth intervening test-day intervals, the sampling intervals were calculated as:
 
$LI = \tfrac{1}{2}\,(DIM_{n+1} - DIM_{n-1})$
 
Where,
$DIM_{n+1}$ and $DIM_{n-1}$ = The days in milk up to and including the succeeding (n+1)th and preceding (n-1)th test-day, respectively.
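
As a small illustration of the TIM sampling interval for the intervening test-days, the sketch below applies the half-difference rule to a list of days in milk. The function name is hypothetical, and the first and last intervals are deliberately left out because their exact computation is only described verbally above.

```python
def intervening_sampling_intervals(days_in_milk):
    """Return LI for every intervening test-day n (neither first nor last).

    LI_n = 1/2 * (DIM_{n+1} - DIM_{n-1})
    """
    return [
        0.5 * (days_in_milk[n + 1] - days_in_milk[n - 1])
        for n in range(1, len(days_in_milk) - 1)
    ]


# First four test-days of the recording schedule used in the study.
intervals = intervening_sampling_intervals([6, 35, 65, 95])
print(intervals)  # intervals for the 35th-day and 65th-day test-days
```

For the 35th-day record the interval is ½ (65 − 6) = 29.5 days and for the 65th-day record it is ½ (95 − 35) = 30 days.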
 
Ratio method (RM)
 
The complete 305-day milk yield was estimated by summing up the product of each test-day milk yield with its respective ratio factor (Dass and Sadana, 2003). Ratio factor (R) was calculated as the ratio of average 305-day milk yield to average test-day milk yield.
 
$\hat{Y}_i = R \times X_i$
 
Where,
$\hat{Y}_i$ = Estimated 305-day milk yield of the ith animal.
$X_i$ = Test-day milk yield of the ith animal.
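
A minimal sketch of the ratio factor follows. It assumes that the prediction for an animal is its test-day yield multiplied by the ratio factor of that test-day; the function names and all figures are hypothetical and are not taken from the study data.

```python
from statistics import mean


def ratio_factor(yields_305_day, test_day_yields):
    """R = average 305-day milk yield / average test-day milk yield."""
    return mean(yields_305_day) / mean(test_day_yields)


def predict_305_day_yield(r, test_day_yield):
    """Assumed prediction rule: estimated yield = R * X_i."""
    return r * test_day_yield


# Hypothetical reference animals with known 305-day yields (kg)
# and their yields on one particular test-day (kg).
r = ratio_factor([2000.0, 2200.0], [8.0, 12.0])
estimate = predict_305_day_yield(r, 9.0)
print(r, estimate)
```

Here the average 305-day yield is 2100 kg and the average test-day yield is 10 kg, so R is 210 and an animal yielding 9 kg on that test-day would be assigned 1890 kg.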
 
Multiple linear regression (MLR)
 
Prediction equations based on MLR analysis were formulated by estimating the regression coefficients of the respective test-day milk yield records in different combinations. The MLR analysis was conducted using statistical analysis system (SAS) enterprise guide version 4.3, 2003 software. Stepwise backward multiple linear regression analysis was performed to estimate the 305-day milk yield of the animal (Dongre et al., 2012; Rana et al., 2021b). The equations formulated were based on the following formula:
 
$\hat{Y} = a + \sum_{i} b_i x_i$
 
Where,
$\hat{Y}$ = Estimated first lactation 305-day milk yield of the animal.
$x_i$ = ith test-day milk yield of the animal.
a = Intercept.
$b_i$ = Regression coefficient of first lactation 305-day milk yield on the ith test-day record.
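
To make the MLR equation concrete, the sketch below fits an intercept and regression coefficients by ordinary least squares using only the Python standard library. It is not the SAS procedure used in the study: it solves the normal equations directly, performs no backward elimination, and the data are made up. The R2 helper uses the usual definition, one minus the ratio of the residual to the total sum of squares, which is assumed here.

```python
def fit_mlr(rows, targets):
    """Return [a, b1, ..., bk] minimising squared error (normal equations)."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    # Augmented matrix [X'X | X'y]
    aug = [
        [sum(d[i] * d[j] for d in design) for j in range(k)]
        + [sum(d[i] * y for d, y in zip(design, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    # (assumes X'X is non-singular, i.e. predictors are not collinear).
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def predict(coefs, row):
    """Y-hat = a + sum(b_i * x_i)."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


def r_squared(actual, predicted):
    """Assumed definition: 1 - SSE / SST."""
    mean_y = sum(actual) / len(actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    sst = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - sse / sst


# Hypothetical data generated from Y = 100 + 30*x1 + 20*x2.
rows = [(8, 6), (10, 7), (9, 9), (12, 8), (11, 5)]
targets = [460.0, 540.0, 550.0, 620.0, 530.0]
coefs = fit_mlr(rows, targets)
fitted = [predict(coefs, r) for r in rows]
print(coefs, r_squared(targets, fitted))
```

Because the made-up targets follow the linear rule exactly, the fit recovers the intercept and both coefficients and R2 is essentially one; real test-day data would leave residual error.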
       
The coefficient of determination (R2), also denoted as the accuracy of fitting the regression models, was calculated by using the following formula: 
 
  
       
The above-mentioned conventional models were subjected to a comparative evaluation against the newly evolved machine learning connectionist model known as artificial neural network.
 
Artificial neural network (ANN)
 
An ANN is an intelligent data-processing system that learns by itself from the data set presented to it. The main constituents of the constructed ANN model were an input layer, hidden layer(s) and an output layer. Each layer served a distinct role in the implementation of the neural network. In the back-propagation technique, the signals were sent forward while the errors were propagated backward. The network was trained using input variables and the corresponding target variable until it could effectively approximate a prediction function (Ruhil et al., 2013; Akilli and Hülya, 2020). An extensive study was carried out to predict the first lactation 305-day milk yield using the 11 monthly test-day milk yield records as input variables.
       
Weka software version 3.8.0 was utilized to develop a multilayer feed-forward neural network with a back-propagation of error learning mechanism (Frank et al., 2016). The network was trained and simulated through a 10-fold cross-validation process, and training ran for up to 2,500 epochs or until algorithmic convergence was achieved. The default settings of the algorithm, viz. a learning rate of 0.3, a momentum of 0.5 and a validation set size of zero, were used as the network parameters. It was observed that the algorithm converged most of the time, which signified that the performance/error goal was achieved. The schematic representation of the workflow of the prediction analysis is depicted in Fig 1.
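
The idea behind 10-fold cross-validation can be illustrated with a short index-splitting sketch. This is a generic illustration and not Weka's own implementation; the function name, the shuffling seed and the record count in the example are assumptions.

```python
import random


def k_fold_splits(n_records, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)      # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical example with 25 records: each record is held out exactly once.
for train, test in k_fold_splits(25, k=10):
    print(len(train), len(test))
```

Each record serves as test data in exactly one fold and as training data in the other nine, so every prediction is made by a network that did not see that record during training.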
 

Fig 1: Schematic workflow for prediction analysis.


 
Criteria for judging the models
 
The error in the prediction of FL305DMY was estimated as the deviation of predicted value from actual value of 305-day milk yield. The absolute error is the error without considering the positive or negative signs. The different criteria of error for judging the efficiency of prediction models are presented in Table 1.
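
The error criteria can be sketched as follows. The error is taken as predicted minus actual and RMSE follows its usual definition. Expressing each criterion as a percentage of the mean actual yield is an assumption made for this sketch, and the function name and figures are hypothetical.

```python
from math import sqrt


def error_summary(actual, predicted):
    """Average error, average absolute error and RMSE, with percentages."""
    errors = [p - a for a, p in zip(actual, predicted)]  # + over, - under
    n = len(errors)
    mean_actual = sum(actual) / n

    summary = {
        "average_error": sum(errors) / n,
        "average_absolute_error": sum(abs(e) for e in errors) / n,
        "rmse": sqrt(sum(e * e for e in errors) / n),
    }
    # Assumed convention: percentage relative to the mean actual yield.
    for key in list(summary):
        summary[key + "_percent"] = 100.0 * summary[key] / mean_actual
    return summary


# Hypothetical actual and predicted 305-day yields (kg).
result = error_summary([2000.0, 2200.0, 1800.0], [2100.0, 2150.0, 1750.0])
print(result)
```

In this made-up example the signed errors (+100, −50, −50) cancel to an average error of zero, while the absolute error and the RMSE remain positive, which is why the signed and unsigned criteria are reported side by side.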
 

Table 1: Estimation of error in prediction.

Prediction efficiency of different models
 
The prediction efficiency of conventional and connectionist models was evaluated based on monthly test-day milk yield records in Murrah buffaloes. The error in prediction of FL305DMY was estimated by subtracting the actual value from the predicted value; therefore, a negative error denoted an underestimation of lactation yield by the prediction model, whereas a positive error denoted an overestimation. The efficiency of the different prediction models was also evaluated by absolute error, average error, root mean square error (RMSE) and their respective percentages, as shown in Table 2. A perusal of the results revealed that all the models exhibited an RMSE lower than five per cent; however, the magnitudes of errors were smallest for MLR, followed by the ANN, CDM, TIM and RM models. The scatter plots showing the prediction efficiency of the different models by comparing the actual and predicted values of FL305DMY are presented in Fig 2. Clustering of the points closely around the diagonal line (y = x) in the scatter plots represents the accuracy of the prediction models.
       

Table 2: Error criteria to evaluate the prediction efficiency of different models.


 

Fig 2: Actual versus predicted first lactation 305-day milk yield (FL305DMY) by different models.


 
Tailor and Singh (2014) reported an average error of 34.34 kg in the prediction of lactation yield by the TIM model based on a systematic sampling scheme in Surti buffaloes. Atil (1999), in a study of 3,780 records of Holstein Friesian cows, reported that the regression model was better than the ratio model, which was in agreement with the present study. Murphy et al. (2014) studied 140 Holstein Friesian cows and reported that MLR, with 10.62% RMSE, was a better prediction model than ANN, with 12.03% RMSE. Hemant and Hooda (2014) predicted lifetime milk yield based on production and reproduction traits of 158 crossbred cows and reported that the MLR model (R2 = 90.93%) was better than the ANN model (R2 = 88.96%). Gandhi et al. (2012) showed 99.77% prediction accuracy by MLR and 99.18% by the ANN model and hence suggested that MLR could be preferred over ANN in Sahiwal cattle because of its better prediction accuracy and lower complexity. Sanzogni and Kerr (2001) and Rana et al. (2021b) also showed that the MLR model was better than the ANN model for the prediction of lactation yield in dairy animals, which was in agreement with the findings of the present study. On the contrary, Sharma et al. (2006) in Karan Fries cows, Dongre et al. (2012) in Sahiwal cows and Nosrati et al. (2021) in Holstein cows documented that the ANN model performed better than the MLR model in the prediction of lactation yield of dairy animals.
 
Prediction of first lactation 305-day milk yield
 
The results of the present study revealed that the MLR model exhibited the highest prediction efficiency amongst all the tested models; therefore, it was further utilized to achieve the pivotal objective, i.e. to predict the FL305DMY in Murrah buffaloes. The MLR prediction models, along with their respective estimated intercept values, regression coefficients, coefficients of determination (R2), Akaike information criterion (AIC), Bayesian information criterion (BIC) and root mean square error (RMSE) values, are presented in Table 3. The MLR model utilizing all the 11 MTDY records was found to be the best, with an RMSE of 90.1681 and 95.68% accuracy. However, a model that requires all the 11 test-day records cannot predict the FL305DMY at an early stage of lactation. To predict the FL305DMY at an early stage, monthly test-day milk yields up to mid-lactation (MTDY-7) were investigated by the stepwise backward elimination regression model in SAS enterprise guide 4.3, 2003 software. For early prediction, the optimal regression model with two variables consisted of MTDY-3 and MTDY-7 and showed 81.58% accuracy. The prediction accuracy with a single variable (MTDY-7) was about 62%; the introduction of one additional variable thus raised the prediction accuracy markedly, by around 19 percentage points. Adding one more variable (MTDY-3, MTDY-5 and MTDY-7) gave a further increment of 3.84 percentage points, resulting in an R2 of 85.42%. The regression model with four variables (MTDY-3, MTDY-4, MTDY-5 and MTDY-7) increased the prediction accuracy to 87.02%. Further addition of monthly test-day records did not show a significant increase in the accuracy of prediction. Therefore, it could be interpreted that the optimal model for early-stage prediction of FL305DMY was the MLR model consisting of four variables (MTDY-3, MTDY-4, MTDY-5 and MTDY-7), with an R2 of 87.02% and an RMSE of 154.7171.
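
The diminishing return from each added variable can be checked with simple arithmetic on the R2 values quoted above. The snippet is only a worked illustration; the variable names are hypothetical.

```python
# R2 values (per cent) reported in the text for successive MLR models.
r2_percent = [
    ("MTDY-3, MTDY-7", 81.58),
    ("MTDY-3, MTDY-5, MTDY-7", 85.42),
    ("MTDY-3, MTDY-4, MTDY-5, MTDY-7", 87.02),
    ("all 11 MTDY records", 95.68),
]

# Gain in R2 (percentage points) from each model to the next.
gains = [
    round(curr[1] - prev[1], 2)
    for prev, curr in zip(r2_percent, r2_percent[1:])
]

for (name, _), gain in zip(r2_percent[1:], gains):
    print(f"{name}: +{gain} percentage points")
```

The third variable adds 3.84 percentage points and the fourth adds 1.60, while moving from four early records to all eleven adds a further 8.66 at the cost of waiting until the 305th day.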
 

Table 3: Prediction equations along with their accuracy as estimated by stepwise backward regression model.


       
Singh et al. (2013), in a study of 453 Surti buffaloes, reported that the regression equation with all the test-day records showed 81.60% prediction accuracy, which was lower than the estimate obtained in the present study (R2 = 95.68%). They also reported that the optimal equation for early prediction had three variables (MTDY-3, MTDY-6 and MTDY-7) and exhibited an R2 of 76.90%; in contrast, a higher estimate was obtained in the present study with three variables up to MTDY-7 (R2 = 85.42%). Murphy et al. (2014) reported that the regression equation consisting of all the monthly test-day records exhibited an R2 of 91.70% in Holstein Friesian cows. Similarly, in agreement with the present study, Joshi et al. (1996) reported 93.07% prediction accuracy in Hariana cows when all the monthly test-day records were incorporated for the prediction of lactation yield. Elmaghraby (2009) studied 175 Egyptian buffaloes and reported that the optimal regression equation for the early prediction of lactation yield consisted of five variables (MTDY-1, MTDY-2, MTDY-3, MTDY-4 and MTDY-6) with 78% accuracy. Dass and Sadana (2003), in a study of 415 Murrah buffaloes, considered test-day milk yield records up to the 8th month of lactation for early prediction of 305-day milk yield and reported that the optimal prediction equation incorporating four variables (MTDY-2, MTDY-4, MTDY-6 and MTDY-8) showed 89% accuracy. Gandhi et al. (2012) reported that the optimal equation for early prediction of lactation yield in Sahiwal cows consisted of five variables (MTDY-2, MTDY-3, MTDY-5, MTDY-7 and MTDY-8) and showed an R2 of 93.77% and an RMSE of 126.98 kg. Saini et al. (2005), in a study of 267 Rathi cows, reported that prediction accuracy increased by 12.20% on addition of a second variable to the single-variable regression equation. In their study, the regression equation incorporating all the test-day records showed 81.31% prediction accuracy and the optimal regression equation for early prediction of lactation yield had three variables (MTDY-1, MTDY-2 and MTDY-7) and showed an R2 of 78.42%, which was lower than the estimate obtained in the present study.
The study revealed that the most accurate prediction of the first lactation 305-day milk yield was achieved using the MLR model, followed by the ANN model. As the difference observed in prediction accuracy was small, it is inferred that the ANN model has great potential to serve as an alternative for the prediction of FL305DMY if a larger data set is provided for better training of the network. The results showed that the optimum early prediction of FL305DMY could be achieved by the 185th day of lactation by utilizing MTDY-3 (65th day), MTDY-4 (95th day), MTDY-5 (125th day) and MTDY-7 (185th day), for which the prediction accuracy (87.02%) was close to that obtained when all the monthly test-day records were utilized. Evaluation of sires and dams based on early monthly test-day records would reduce the cost incurred on milk recording and animal rearing, reduce the generation interval and increase the response to selection. This study could serve as a foundation for further prediction studies in dairy farming.
The authors express gratitude to the Director of ICAR-National Dairy Research Institute (NDRI), Karnal, as well as the Head of Animal Genetics and Breeding division at ICAR-NDRI, Karnal, for providing the essential resources to conduct the study. The authors are also thankful to the Livestock Record Unit of ICAR-NDRI, Karnal, for their invaluable assistance in providing the data required for the analysis.
The authors declare that they have no conflict of interest.

  1. Akilli, A. and Hülya, A. (2020). Evaluation of normalization techniques on neural networks for the prediction of 305-day milk yield. Turkish Journal of Agricultural Engineering Research. 1(2): 354-367.

  2. Atil, H. (1999). Ratio and regression factors for predicting 305 day productions from part lactation milk records in a herd of Holstein Friesian Cattle. Pakistan Journal of Biological Science. 2(1): 31-37.

  3. Dass, G. and Sadana, D.K. (2003). Predictability of lactation milk yield based on test day values in Murrah buffaloes. Indian Journal of Animal Research. 37(2): 136-138.

  4. Dongre, V.B., Gandhi, R.S., Singh, A. and Ruhil, A.P. (2012). Comparative efficiency of artificial neural networks and multiple linear regression analysis for prediction of first lactation 305-day milk yield in Sahiwal cattle. Livestock Science. 147(1-3): 192-197.

  5. Elmaghraby, M.M.A. (2009). Lactation persistency and prediction of total milk yield from monthly yields in Egyptian buffaloes. Lucrări Ştiinţifice. 53(15): 242-249.

  6. Frank, E., Hall, M.A. and Witten, I.H. (2016). The Weka Workbench. Online Appendix for Data Mining: Practical Machine Learning Tools and Techniques (4th edn). (Morgan Kaufmann: Massachusetts).

  7. Gandhi, R.S., Monalisa, D., Dongre, V.B., Ruhil, A.P., Singh, A. and Sachdeva, G.K. (2012). Prediction of first lactation 305-day milk yield based on monthly test day records using artificial neural networks in Sahiwal cattle. Indian Journal of Dairy Science. 65(3): 229-233.

  8. Guevara, L., Castro-Espinoza, F., Fernandes, A., Benaouda, M., Muñoz-Benítez, A.L., Del Razo-Rodríguez, O.E., Peláez-Acero, A. and Angeles-Hernandez, J.C. (2023). Application of machine learning algorithms to describe the characteristics of dairy sheep lactation curves. Animals. 13(17): 2772. https://doi.org/10.3390/ani13172772.

  9. Hemant, K. and Hooda, B.K. (2014). Prediction of milk production using artificial neural network. Current Advances in Agricultural Sciences. 6(2): 173-175.

  10. Joshi, B.K., Tantia, M.S., Vij, P.K., Kumar, P., Gupta, N., Nivsarkar, A.E. and Sahai, R. (1996). Performance of Hariana cows under farmers’ herd condition. Indian Journal of Animal Sciences. 66(4): 393-397.

  11. Kaygisiz, A. (2013). Estimation of genetic parameters and breeding values for dairy cattle using test-day milk yield records. The Journal of Animal and Plant Sciences. 23(2): 345-349.

  12. Lidauer, M., Mäntysaari, E.A. and Strandén, I. (2003). Comparison of test-day models for genetic evaluation of production traits in dairy cattle. Livestock Production Science. 79(1): 73-86.

  13. Likhi, A.K., Khanna, A.S. and Jaiswal, U.C. (1995). Evaluation of centring date, test-interval and sample-day production methods of computing lactation yield from AM and PM sample-day records. Indian Journal of Animal Sciences. 65(11): 1241-1246.

  14. Murphy, M.D., O’Mahony, M.J., Shalloo, L., French, P. and Upton, J. (2014). Comparison of modelling techniques for milk-production forecasting. Journal of Dairy Science. 97(6): 3352-3363.

  15. Nosrati, M., Hafezian, S.H. and Gholizadeh, M. (2021). Estimating heritabilities and breeding values for real and predicted milk production in Holstein dairy cows with artificial neural network and multiple linear regression models. Iranian Journal of Applied Animal Science. 11(1): 67-78.

  16. O’Connor, L.K. and Lipton, S. (1960). The effect of various sampling intervals on the estimation of lactation milk yield and composition. Journal of Dairy Research. 27(3): 389-398. doi: https://doi.org/10.1017/S0022029900010463.

  17. Rana, E., Gupta, A.K., Singh, A., Chakravarty, A.K., Yousuf, S. and Karuthadurai, T. (2021a). Genetic analysis of first lactation monthly test day milk yields, peak yield and 305 days milk yield in Murrah buffaloes. Indian Journal of Animal Research. 55(2): 134-138. doi: 10.18805/IJAR.B-3679.

  18. Rana, E., Gupta, A.K., Singh, A., Ruhil, A.P., Malhotra, R., Yousuf, S. and Ete, G. (2021b). Prediction of first lactation 305-day milk yield based on bimonthly test day milk yield records in Murrah buffaloes. Indian Journal of Animal Research. 55(4): 486-490. doi: 10.18805/ijar.B-3963.

  19. Ruhil, A.P., Raja, T.V. and Gandhi, R.S. (2013). Preliminary study on prediction of body weight from morphometric measurements of goats through ANN models. Journal of the Indian Society of Agricultural Statistics. 67(1): 51-58.

  20. Saini, T., Gahlot, G.C. and Kachwaha, R.N. (2005). Prediction of 300 days lactation yield on the basis of test day milk yield in Rathi cows. Indian Journal of Animal Sciences. 75(9): 1087-1089.

  21. Sanzogni, L. and Kerr, D. (2001). Milk production estimates using feed forward artificial neural networks. Computers and Electronics in Agriculture. 32(1): 21-30.

  22. Sargent, F.D., Lytton, V.H. and Wall, J.O.G. (1968). Test interval method of calculating dairy herd improvement association records. Journal of Dairy Science. 51(1): 170-179.

  23. SAS, Statistical Analysis System (2003). ‘User’s Guide Statistics.’ (SAS Institute Inc.: Cary, NC).

  24. Sharma, A.K., Jain, D.K., Chakravarty, A.K., Malhotra, R. and Ruhil, A.P. (2013). Predicting economic traits in Murrah buffaloes with connectionist models. Journal of the Indian Society of Agricultural Statistics. 67(1): 1-11.

  25. Sharma, A.K., Sharma, R.K. and Kasana, H.S. (2006). Empirical comparisons of feed-forward connectionist and conventional regression models for prediction of first lactation 305-day milk yield in Karan Fries dairy cows. Neural Computing and Applications. 15: 359-365.

  26. Singh, S., Tailor, S.P., Mishra, S., Kothari, M.S. and Garg, M.K. (2013). Prediction of first lactation 305-day milk yield using monthly part and test day yields in Surti buffaloes. Indian Journal of Animal Sciences. 83(11): 1219-1220.

  27. Tailor, S.P. and Singh, S. (2014). Observed and predicted first lactation milk yield in Surti buffaloes. Indian Journal of Animal Sciences. 84(7): 775-778.

  28. Torshizi, M.E. and Mashhadi, M.H. (2015). Evaluation of various approaches in prediction of daily and lactation yields of milk and fat using statistical models in Iranian primiparous Holstein dairy cows. Iranian Journal of Applied Animal Science. 5(1): 81-87.
