Indian Journal of Animal Research

  • Chief Editor: K.M.L. Pathak

  • Print ISSN 0367-6722

  • Online ISSN 0976-0555

  • NAAS Rating 6.50

  • SJR 0.263

  • Impact Factor 0.4 (2024)

Frequency :
Monthly (January, February, March, April, May, June, July, August, September, October, November and December)
Indexing Services :
Science Citation Index Expanded, BIOSIS Preview, ISI Citation Index, Biological Abstracts, Scopus, AGRICOLA, Google Scholar, CrossRef, CAB Abstracting Journals, Chemical Abstracts, Indian Science Abstracts, EBSCO Indexing Services, Index Copernicus
Indian Journal of Animal Research, Volume 54, Issue 4 (April 2020): 488-493

Use of ordered logit model with time series data for determining the factors affecting the milk yield of Holstein Friesians

Özge Akkuş1,*, Volkan Sevinç1
1Department of Statistics, Muğla Sıtkı Koçman University, 48000, Muğla, Turkey.
Cite article: Akkuş Özge, Sevinç Volkan (2019). Use of ordered logit model with time series data for determining the factors affecting the milk yield of Holstein Friesians. Indian Journal of Animal Research. 54(4): 488-493. doi: 10.18805/ijar.B-1066.
This article aims to introduce the use of the ordered logit model with time series data in milk productivity studies and to determine the important factor levels affecting the milk yield of Holstein Friesians. The data consist of 2,002 records collected for the years 2009-2015 from the reports of the Cattle Breeders' Association of Turkey (CBAT) in Muğla province, Turkey. The direct and marginal effects of parity, lactation length and year of calving on milk yield are investigated, and the probabilities of each milk yield class for a given parity, lactation length and calving year are calculated. The results show that milk yield increases slightly at the 4th parity. As far as the years are concerned, although milk production was mostly steady between 2009 and 2015, there was a significant decrease in 2011 and an increase in 2014.
When the dependent variable is categorical with more than two ordered levels, the Ordered Logit Model (OLM) is an appropriate choice. In this paper, we mainly deal with the introduction and interpretation of the OLM with time series data in milk yield estimation of Holstein-Friesian cows. The most important advantage of the ordered model is that it provides the probabilities of each category of the dependent variable. Hence, the predicted probabilities of milk yield given the year of calving, parity and lactation length are obtained from the estimated model, together with significance tests for the levels of those variables. These analyses can be used to identify the improvements needed in the conditions affecting milk yield at those levels. Logistic regression models also yield odds ratios, which previously adopted approaches such as path analysis do not provide; this makes the OLM preferable to them for this purpose. İşçi et al., (2015) and Aytekin et al., (2016) used path analysis to investigate the factors affecting the milk yield of Holstein Friesians. In this study, we adopted the variables used in İşçi et al., (2015).
        
There are various studies on milk productivity in the literature. Among earlier studies in the area, Banik and Tomar (2003) used path analysis to determine the factors affecting the lifetime milk yield of Murrah buffaloes. Koçak and Ekiz (2006) studied the factors affecting the milk yield and lactation curve of Holstein cows, using the Wood equation to analyze the lactation curve. Tahtalı et al., (2011) investigated the factors affecting milk yield using path analysis. Solano et al., (2016) examined the association between lying behaviour and lameness in Canadian Holstein-Friesians. Mote et al., (2016) evaluated the effect of different climatic variables on the milk yield and lactation length of Holstein Friesian cows using basic statistics. Unalan (2016) discussed the seasonal effects on behavioral estrus signs and estrus detection efficiency in Holstein heifers using one-way ANOVA and chi-square tests. Ristevski et al., (2017) used correlation analysis to examine the relationships between the ultrasound-measured thickness of fat over the tuber ischiadicum, body condition scoring and the risk of developing lameness in Holstein-Friesians. Öner et al., (2017) investigated polymorphisms in seven genes related to reproductive traits in dairy heifers using mixed-effects logistic regression. However, no previous work has used the Ordered Logit Model with time series data for milk productivity.
In this study, data records belonging to 2,002 Holstein Friesians in Muğla province are analyzed to determine the important factors and factor levels affecting the milk yield of Holstein Friesians. Milk yield for each cow is classified as "2501-4000", "4001-5500", "5501-7000" and "7000+" kg, based on herdmate deviations and expert opinion (Prof. Çiğdem Takma, personal communication). The dependent variable (Y) in our study is milk yield with these four categories, which are used in the model after testing whether Y is really ordered. The independent variables are lactation length, parity and year of calving. The appropriate OLM is estimated using NLOGIT 4.0 software, together with significance tests for the levels of the variables. The direct and marginal effects of lactation length, parity and year of calving on milk yield are also determined, together with the related probability estimates.
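As a small illustration of how the four milk yield classes form an ordinal response, the sketch below maps a lactation yield in kg to the class codes 1 to 4. It assumes the class limits are inclusive at the upper end exactly as written ("2501-4000", ..., "7000+") and treats yields of 2500 kg or less as outside the study's range; the function name `yield_class` is hypothetical.

```python
def yield_class(milk_kg: float) -> int:
    """Map a lactation milk yield (kg) to an ordinal class code.

    Hypothetical helper: 1 = 2501-4000, 2 = 4001-5500, 3 = 5501-7000,
    4 = 7000+ (above 7000 kg). Upper limits are treated as inclusive.
    """
    if milk_kg <= 2500:
        raise ValueError("yield is below the lowest class (2501-4000 kg)")
    upper_limits = (4000, 5500, 7000)  # inclusive upper limits of classes 1-3
    for code, upper in enumerate(upper_limits, start=1):
        if milk_kg <= upper:
            return code
    return 4  # anything above 7000 kg


print([yield_class(kg) for kg in (3200, 5500, 6100, 8400)])  # → [1, 2, 3, 4]
```

Only the order of the codes matters to the OLM; the values 1-4 carry no interval meaning, so any increasing coding would serve equally well.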
        
The OLM is applied to the data set, taking into account the ordinal structure of the variable "milk yield". To help choose the correct model for a categorical dependent variable with more than two levels, Table 1 summarizes the candidate models, the type of dependent variable each requires and their assumptions. Choosing the correct model for the data provides more accurate estimates with smaller errors (Table 1).
        
The OLM is built on the latent variable (Y*) approach. Latent variables represent qualities that are not directly measured but only inferred from the observed covariations among a set of variables (Tabachnick and Fidell, 2001). The latent variable is expressed as a linear combination of the explanatory variables (X) (Liao, 1994; Greene, 2000).

$$y_i^{*} = \sum_{k=1}^{K} \beta_k X_{ik} + \varepsilon_i \tag{1}$$

In Eq. (1), $\beta_k$ denote the coefficients of the explanatory variables $X_k$ ($k = 1, \dots, K$) and $\varepsilon_i$ is the error term of the model. In the OLM, the dependent variable has J ordered categories. The following equation relates the observed categories of the dependent variable to the latent variable and the threshold parameters ($\mu$), where i denotes the ith observation.
 
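$$y_i = \begin{cases} 1 & \text{if } y_i^{*} \le \mu_1 \\ j & \text{if } \mu_{j-1} < y_i^{*} \le \mu_j, \quad j = 2, \dots, J-1 \\ J & \text{if } y_i^{*} > \mu_{J-1} \end{cases} \tag{2}$$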
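Eq. (2) is a bucketing rule: the observed category is fixed by the pair of consecutive thresholds that brackets the latent value. A minimal sketch using Python's standard `bisect` module; the threshold values in the example are illustrative assumptions, not the estimates reported in Table 3, and `observed_category` is a hypothetical name.

```python
import bisect


def observed_category(y_star: float, thresholds: list[float]) -> int:
    """Return the observed category j (1..J) for a latent value y*, as in Eq. (2).

    `thresholds` holds mu_1 < mu_2 < ... < mu_{J-1}; category j is chosen when
    mu_{j-1} < y* <= mu_j, with mu_0 = -inf and mu_J = +inf implied.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # bisect_left counts the thresholds strictly below y*, so a value equal
    # to mu_j stays in category j (the upper bound is inclusive).
    return bisect.bisect_left(thresholds, y_star) + 1


mus = [0.0, 1.5, 3.0]  # illustrative thresholds for J = 4 categories (mu_1 = 0)
print([observed_category(v, mus) for v in (-0.7, 0.0, 1.2, 1.5, 4.1)])
# → [1, 1, 2, 2, 4]
```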
 
The threshold parameters determine the boundaries between the ordered categories. Their statistical significance is taken as evidence that the dependent variable is ordered as assumed at the beginning of the study, which supports the suitability of the model for the data. If the OLM is the appropriate model, it also allows the calculation of odds ratios, an interpretation available only for logistic (logit) models. The probability that the dependent variable belongs to category j, conditional on the explanatory variables, is expressed as follows.
 
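$$P(y_i = j \mid X_i) = F\Big(\mu_j - \sum_{k=1}^{K} \beta_k X_{ik}\Big) - F\Big(\mu_{j-1} - \sum_{k=1}^{K} \beta_k X_{ik}\Big), \qquad \mu_0 = -\infty,\ \mu_J = +\infty \tag{3}$$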
 
In Eq. (3), F denotes the assumed distribution function of the error term. The constraint $\mu_1 < \mu_2 < \dots < \mu_{J-1}$ is needed to ensure that all the predicted probabilities are positive. Greene (2000) suggested that the first threshold parameter ($\mu_1$) be set to 0. At the beginning of the analysis, it must first be checked whether the threshold parameters are statistically significant. If they are, the dependent variable can be taken to have the ordered structure assumed. The probability that the dependent variable y belongs to category j or a lower category can be calculated using the following equation.

$$P(y_i \le j \mid X_i) = F\Big(\mu_j - \sum_{k=1}^{K} \beta_k X_{ik}\Big) \tag{4}$$
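Eqs. (3) and (4) translate directly into code: the cumulative probabilities are computed at each threshold and then differenced. The sketch below takes the linear predictor $\sum_k \beta_k X_{ik}$ as a single number `xb`; the thresholds are illustrative assumptions, and the logistic choice of F used by default anticipates the OLM. The function names are hypothetical.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic distribution function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def category_probabilities(xb, thresholds, cdf=logistic_cdf):
    """P(y = j | X) for j = 1..J from Eq. (3), given xb = sum_k beta_k * X_k."""
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        # the ordering constraint on the thresholds keeps every probability positive
        raise ValueError("thresholds must be strictly increasing")
    # cumulative probabilities P(y <= j | X) from Eq. (4), padded with
    # P(y <= 0) = 0 and P(y <= J) = 1
    cumulative = [0.0] + [cdf(mu - xb) for mu in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


probs = category_probabilities(xb=0.8, thresholds=[0.0, 1.5, 3.0])
print([round(p, 3) for p in probs], round(sum(probs), 10))
```

Passing a standard normal distribution function instead, for example `cdf=lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))`, would give the ordered probit rather than the ordered logit; the differencing logic is the same.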
 
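If the logistic distribution function, denoted $\Lambda$, is chosen for F in Eq. (3), the OLM is obtained. The probabilities that the dependent variable belongs to the relevant categories are then given by Eq. (5), Eq. (6) and Eq. (7) (Borooah, 2002; Powers and Xie, 2000).

$$P(y_i = 1 \mid X_i) = \Lambda\Big(\mu_1 - \sum_{k=1}^{K} \beta_k X_{ik}\Big) \tag{5}$$

$$P(y_i = j \mid X_i) = \Lambda\Big(\mu_j - \sum_{k=1}^{K} \beta_k X_{ik}\Big) - \Lambda\Big(\mu_{j-1} - \sum_{k=1}^{K} \beta_k X_{ik}\Big), \qquad j = 2, \dots, J-1 \tag{6}$$

$$P(y_i = J \mid X_i) = 1 - \Lambda\Big(\mu_{J-1} - \sum_{k=1}^{K} \beta_k X_{ik}\Big) \tag{7}$$

where $\Lambda(z) = e^{z}/(1 + e^{z})$.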
 
 
 
NLOGIT 4.0 software is used to obtain the OLM results. Odds ratio values are calculated by taking the exponentials of the estimated model parameters.
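The conversion from coefficients to odds ratios is a one-line exponential. As a check, the sketch below applies it to the three coefficients quoted later in the text for the significant levels (0.406 for P4, -2.912 for 2011 and 1.316 for 2014). Because these coefficients are quoted rounded to three decimals, the printed odds ratios may differ from Table 3 in the last digit.

```python
import math

# Coefficients quoted in the text for the significant factor levels (Table 3)
coefficients = {"P4": 0.406, "2011": -2.912, "2014": 1.316}

# Odds ratio = exp(estimated coefficient)
odds_ratios = {level: math.exp(b) for level, b in coefficients.items()}

for level, ratio in odds_ratios.items():
    print(f"{level}: odds ratio = {ratio:.3f}")

# An odds ratio below 1 is easier to read inverted (base year 2015 vs. 2011)
print(f"1 / odds ratio(2011) = {1 / odds_ratios['2011']:.2f}")
```

Inverting the unrounded 2011 value gives about 18.4 rather than the 18.5 obtained by inverting the rounded 0.054; both lead to the same interpretation.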
Table 2 introduces all the variables, their levels and the base categories used in the study. The year of calving is found to be the factor most strongly related to milk yield, with a Pearson correlation coefficient of 0.402, which is statistically significant (p = 0.00 < 0.05). There is also a significant relationship between the 4th parity (P4) and milk yield, although only at a borderline level. The remaining variable levels do not appear to have significant effects on milk yield. Before interpreting the results further, it is first necessary to test whether the model is statistically significant.
 
        
Table 3 shows that the model is significant at the 5% level (p = 0.0000208 < 0.05). All the threshold parameters are also statistically significant at the 5% level (p = 0.00 < 0.05). The significance of the threshold parameters indicates that the dependent variable, milk yield, is ordered as assumed at the beginning of the study, and that the OLM is a suitable model for this data structure. The first column of Table 3 contains the estimated coefficients of the explanatory variables, and the fourth column gives the p-values used to test the significance of the estimated model parameters.
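The p-values in Table 3 are, as is usual for maximum likelihood estimates in NLOGIT, taken here to come from Wald-type z tests, z = coefficient / standard error, compared with the standard normal distribution; since Table 3 is not reproduced, treat that as an assumption. Standard errors are not quoted in the text, so the sketch below only shows the mechanics and, as a worked example, backs out the standard error implied by the P4 coefficient (0.406) and its p-value (0.045). It uses `statistics.NormalDist` from the Python standard library; the helper names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def wald_p_value(coef: float, std_err: float) -> float:
    """Two-sided p-value of the z test H0: coefficient = 0."""
    z = coef / std_err
    return 2.0 * (1.0 - STANDARD_NORMAL.cdf(abs(z)))


def implied_std_err(coef: float, p_value: float) -> float:
    """Standard error consistent with a reported coefficient and two-sided p-value."""
    z = STANDARD_NORMAL.inv_cdf(1.0 - p_value / 2.0)
    return abs(coef) / z


se_p4 = implied_std_err(0.406, 0.045)  # P4 figures quoted in the text
print(f"implied SE = {se_p4:.4f}, z = {0.406 / se_p4:.3f}")
print(f"round-trip p-value = {wald_p_value(0.406, se_p4):.3f}")
```

The implied z of roughly 2.0 shows why a p-value of 0.045 sits only just below the 5% cut-off.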
        
Accordingly, the most significant factors affecting milk yield are the year of calving and parity. The calving years 2011 and 2014 differ from the other years in that milk production was unusually different in those years. Since the estimated coefficient for 2011 (-2.912) is negative, milk yield decreased significantly in that year, whereas it increased markedly in 2014, since the coefficient (1.316) is positive. Some factor must have caused milk production to increase in 2014; one possible explanation is the age of the cows, that is, younger cows may tend to produce more milk. For 2011, the decrease may have been caused by a change in the conditions of the cows, such as illness, a change in diet or climatic conditions. In short, some unusual cause appears to have reduced milk yield in Muğla province in 2011. One of the most important causes may have been an increase in animal feed prices due to the higher inflation rate in the Turkish economy during 2011. Moreover, a serious flood occurred in Muğla province in 2010, which may also have affected animal feed stocks.
        
As far as parity is concerned, only the 4th parity is significant, with a p-value of 0.045 (Table 3). Because the estimated coefficient for this level (0.406) is positive and statistically significant (p = 0.045 < 0.05), it can be concluded that milk yield increases at this parity. That is, among the parity levels, only the fourth parity gives a significantly higher milk yield than the first parity (base).
 
The Model
 
The linear combinations of the explanatory variables can be obtained by substituting the characteristics of each cow in the model equation given below.
 
 
 
The following probability equations are obtained from the expression in Eq. (8).
        
As an example, the probability that the milk yield of a cow will be between 2501 and 4000 kg, for a lactation length of 201-250 days, the 2nd parity and the calving year 2014, can be estimated from the fitted OLM as
 
 
 
 
Similarly, using the model above, the probability of milk yield being between 4001 and 5500 kg is 0.124, the probability of it being between 5501 and 7000 kg is 0.676 and the probability of it being above 7000 kg is 0.308.
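Substituting a cow's characteristics into the model amounts to dummy coding: each factor level contributes its coefficient and base levels contribute zero. The sketch below illustrates only that bookkeeping. It uses the three coefficients quoted in the text and, purely as an assumption for illustration, lets every other level contribute 0; the constant, the lactation-length terms and the non-significant levels of Table 3 are not quoted in this section, so the result is a partial linear predictor and does not reproduce the probabilities above. The names `REPORTED` and `partial_linear_predictor` are hypothetical.

```python
# Coefficients quoted in the text (Table 3). Base levels (P1, 2015) are 0 by
# construction; every other level is ALSO set to 0 here only because its
# estimate is not quoted, so the sum below is a partial linear predictor.
REPORTED = {
    ("parity", "P4"): 0.406,
    ("year", "2011"): -2.912,
    ("year", "2014"): 1.316,
}


def partial_linear_predictor(cow: dict[str, str], coefs=REPORTED) -> float:
    """Add up the dummy-coded coefficients matching a cow's factor levels."""
    return sum(coefs.get((factor, level), 0.0) for factor, level in cow.items())


cow = {"lactation": "201-250", "parity": "P4", "year": "2014"}
print(round(partial_linear_predictor(cow), 3))  # → 1.722
```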

 




Interpretation of Odds Ratios
 
The odds ratios given in the last column of Table 3 are interpreted for the calving years 2014 and 2011 and for the 4th parity, which are the only statistically significant factor levels in the analysis.
                                                                         
                                               
                                               
P4 vs. P1 (base)
The odds ratio for P4 is 1.501. That is, the odds that a cow on its 4th parity produces more milk than the 2501-4000 kg class (i.e., falls in a higher yield class) are 1.5 times the corresponding odds for a cow on its 1st parity.

 
 
2011 vs. 2015 (base)
 
The odds ratio for the calving year 2011 is 0.054. Inverting this value (1/0.054) to make the interpretation easier gives 18.5185. This means that, compared with the year 2011, the odds that a cow produces more milk than the 2501-4000 kg class are 18.5 times higher for the calving year 2015.

 
2014 vs. 2015 (base)
The odds ratio for the calving year 2014 is 3.727. This means that, compared with the calving year 2015, the odds that a cow produces more milk than the 2501-4000 kg class are 3.73 times higher for the calving year 2014.
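These odds ratios can be read at any cut of the milk yield scale, not only at "more than 2501-4000 kg". Under the logistic F, the odds of falling above category j, $P(y > j)/P(y \le j)$, equal $\exp(\sum_k \beta_k X_{ik} - \mu_j)$, so switching one dummy on multiplies these odds by $\exp(\beta)$ at every threshold (the proportional odds property of the ordered logit). The sketch checks this numerically for the P4 coefficient; the thresholds and the baseline linear predictor are illustrative assumptions, and the helper names are hypothetical.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic distribution function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def odds_above(xb: float, mu_j: float) -> float:
    """Odds of y falling above category j: P(y > j) / P(y <= j)."""
    p_at_or_below = logistic_cdf(mu_j - xb)
    return (1.0 - p_at_or_below) / p_at_or_below


thresholds = [0.0, 1.5, 3.0]  # illustrative, not the Table 3 estimates
xb_p1 = 1.0                   # illustrative linear predictor, cow on 1st parity (base)
xb_p4 = xb_p1 + 0.406         # same cow on 4th parity: add the P4 coefficient

ratios = [odds_above(xb_p4, mu) / odds_above(xb_p1, mu) for mu in thresholds]
print([round(r, 3) for r in ratios])  # the same value, exp(0.406), at every cut
```

Note that the ratio concerns odds, not probabilities: the change in the probability of each class depends on where the cow's linear predictor lies relative to the thresholds.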
 
The marginal effects of the explanatory variables on the probabilities of milk yield
 
Marginal effects measure how much the probability of each milk yield category increases or decreases when an explanatory variable changes. Table 4 gives the marginal effects of the significant levels of the explanatory variables on the probabilities of the milk yield categories.
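For the ordered logit, the marginal effect of a continuous regressor on $P(y_i = j)$ follows from differentiating Eq. (3): $\partial P(y_i = j)/\partial x = [f(\mu_{j-1} - X_i\beta) - f(\mu_j - X_i\beta)]\,\beta$, with $X_i\beta = \sum_k \beta_k X_{ik}$ and f the logistic density. For a dummy level such as P4 or a calving year, a common alternative is the discrete change in each category probability when the dummy switches from 0 to 1. Either way the effects sum to zero across categories, because the probabilities always sum to one. The sketch below computes both; which variant underlies Table 4 is not stated in the text, and the thresholds and baseline linear predictor are illustrative assumptions.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic distribution function, written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_pdf(z: float) -> float:
    """Logistic density f(z) = F(z) * (1 - F(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


def category_probabilities(xb, thresholds):
    """P(y = j | X), j = 1..J, for the ordered logit (Eq. 3 with logistic F)."""
    cumulative = [0.0] + [logistic_cdf(mu - xb) for mu in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


def derivative_effects(xb, thresholds, beta):
    """dP(y = j)/dx for a continuous regressor with coefficient beta."""
    # the density at mu_0 = -inf and mu_J = +inf is zero
    dens = [0.0] + [logistic_pdf(mu - xb) for mu in thresholds] + [0.0]
    return [(dens[j - 1] - dens[j]) * beta for j in range(1, len(dens))]


def discrete_change_effects(xb, thresholds, beta):
    """Change in P(y = j) when a dummy with coefficient beta switches 0 -> 1."""
    before = category_probabilities(xb, thresholds)
    after = category_probabilities(xb + beta, thresholds)
    return [a - b for a, b in zip(after, before)]


mus, xb = [0.0, 1.5, 3.0], 1.0  # illustrative values only
print("derivative", [round(e, 4) for e in derivative_effects(xb, mus, 0.406)])
print("discrete change", [round(e, 4) for e in discrete_change_effects(xb, mus, 0.406)])
```

With a positive coefficient, both variants move probability out of the lowest class and into the highest one, and each list sums to zero.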
        
Among the marginal effects in Table 4, only those for the years 2011 and 2014 and for the 4th parity are significant. Interpreting these marginal effects, we again observe a decrease in milk yield for 2011 (-0.2075) and an increase (0.2359) for 2014. This result also supports our earlier suggestion that older cows tend to produce less milk than younger cows. The slight decrease in 2015 relative to 2014 does not contradict this suggestion, because that difference does not appear to be significant. For the 4th parity, we observe a very small increase (0.0690) in the probability of milk yield. The probabilities given by Eqs. (5), (6) and (7), together with the marginal effects and odds ratios, are used to interpret the OLM.
In conclusion, classical linear methods are not appropriate, and can give statistically misleading results, when the dependent variable is categorical with more than two levels. This study determined the important factors affecting the milk yield of Holstein-Friesian cows using the OLM. The estimated OLM includes significance tests for the levels of the variables. The most important advantage of the OLM is that it provides the probabilities of the individual categories. It also provides odds ratios, which are not available in classical linear regression models, and the marginal effects of the variables on the probability of each category can be calculated. The most notable feature of the OLM as presented in this study is its use with time series data, here applied to milk productivity. The model can be used to examine the factors affecting milk production, to identify unusual years and to see in which years milk production decreased or increased, in order to make the necessary improvements. It also allows institutions to set milk production standards and to assess the productivity of past years. Finally, to introduce the use of the OLM in milk productivity improvement studies, we adopted three variables in this manuscript; the existence and significance of other variables can be examined in further studies.
The authors acknowledge the contributions of Prof. Çiğdem TAKMA (Ege University, Faculty of Agriculture, Department of Animal Science, Biometry and Genetics, Izmir/Turkey) for preparing the milk yield data and for reviewing the study design.

  1. Akkuş, Ö. and Özkoç, H. (2012). A Comparison of the models over the data on the interest level in politics in Turkey and countries that are members of the European Union: Multinomial or ordered logit model? Res J Appl Sci Eng Tech. 4 (19): 3646-3657.

  2. Aytekin, I., Mammadova, N.M., Altay, Y., Topuz, D., Keskin, İ. (2016). Determination of factors effecting lactation milk yield of Holstein Friesian cows by path analysis. Selcuk J Agr and Food Sci. 30 (1):44-48.

  3. Banik, S. and Tomar, S.S. (2003). Path analysis of lifetime milk yield in Murrah Buffaloes. Asian Journal of Dairy and Food Research. 22 (3&4): 176-180.

  4. Borooah, V.K. (2002). Logit and probit (ordered and multinomial models). Sage University Papers, 07-138, London. 

  5. Greene, W.H. (2000). Econometric analysis. Prentice Hall, Upper Saddle River, New Jersey 07458, ISBN: 0-13-013297-7.

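  6. İşçi Güneri, Ö., Takma, Ç. and Akbaş, Y. (2015). Siyah alaca sığırlarda 305 günlük süt verimini etkileyen faktörlerin Path (İz) analizi ile belirlenmesi [Determination of the factors affecting 305-day milk yield in Holstein (Black Pied) cattle by path analysis]. Kafkas Univ Vet Fak Derg. 21 (2):219-224.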

  7. Koçak, Ö. and Ekiz, B. (2006). Studies on factors affecting the milk yield and lactation curve of Holstein cows in intensive conditions. J. Fac. Vet. Med. Istanbul Univ. 32 (2):61-69.

  8. Liao, T.F. (1994). Interpreting probability models (Logit, probit and other generalized linear models). Sage Publications, Thousand Oaks, London. 

  9. Mote, S.S., Chauhan, D.S., Ghost, N. (2016). Effect of environment factors on milk production and lactation length under different seasons in crossbred cattle. Indian J. Anim. Res. 50 (2): 175-180.

  10. Öner, Y., Yilmaz, O., Okut, H., Ata, N., Mecitoglu, G., Keskin, A. (2017). Association between GH, PRL, STAT5A, OPN, PIT-1, LEP and FGF2 polymorphisms and fertility in Holstein-Friesians Heifers. Kafkas Univ Vet Fak Derg, 23 (4): 527-534.

  11. Powers, D.A. and Xie, Y. (2000). Statistical methods for categorical data analysis. Academic Press, London.

  12. Ristevski, M., Toholj, B., Cincović, M., Bobos, S., Trojacanec, P., Stevancevic, M. and Ozren, S. (2017). Influence of body condition score and ultrasound-determined thickness of body fat deposit in Holstein-Friesians cows on the risk of lameness developing. Kafkas Univ Vet Fak Derg, 23 (1): 69-75.

  13. Solano, L., Barkema, H.W., Pajor, E.A., Mason, S., LeBlanc, S.J., Nash, C.G.R., Haley, D.B., Pellerin, D., Rushen, J., de Passille, A.M., Vasseur, E., Orsel, K. (2016). Association between lying behavior and lameness in Canadian Holstein-Friesian cows housed in freestall barns. J Dairy Sci. 99 (3): 2086-2101.

  14. Tabachnick, B.G. and Fidell, L.S. (2001). Using multivariate statistics. Allyn and Bacon, Boston.

  15. Tahtalı, Y., Şahin, A., Ulutaş, Z., Şirin, E., Abaci, S.H. (2011). Esmer ırkı sığırlarda süt verimi üzerine etkili faktörlerin path analizi ile belirlenmesi [Determination of the factors affecting milk yield in Brown Swiss cattle by path analysis]. Kafkas Univ Vet Fak Derg, 17 (5): 859-864.

  16. Unalan, A. (2016). Seasonal effects on behavioral estrus signs and estrus detection efficiency in Holstein heifers. Indian J. Anim. Res., 50 (2): 185-189. 
