The final boosted trees model comprised 28 single-split trees. The classification matrix for the T set is presented in Table 3. On the independent T set (a randomly selected part of the original dataset used for objective verification of the model's predictive performance), Se (the percentage of correctly detected cows from the poor category), Sp (the percentage of correctly classified cows from the good category) and Acc (the percentage of correctly classified cows from both categories) were 82.93%, 84.46% and 83.91%, respectively. These values were relatively high compared with other reports. They indicate that the developed model was quite effective in detecting cows with conception difficulties while generating relatively few false positives. In the study by Grzesiak et al. (2010) on the classification of AI outcomes in dairy cows using different types of models [discriminant function analysis, logistic regression, artificial neural networks and multivariate adaptive regression splines (MARS)], Se, Sp and Acc on the T set amounted to 77.78–87.30%, 79.31–85.06% and 78.67–86.00%, respectively, and were thus similar to those obtained in the present study. In a subsequent study on the same subject using the naïve Bayes classifier and classification and regression trees (CART), Grzesiak et al. (2011) observed Se, Sp and Acc of 72.0–83.0%, 86.0–90.0% and 85.0–90.0%, respectively. In this case, Se determined in the present study was within the range reported by Grzesiak et al. (2011), whereas Sp and Acc were somewhat lower.
In contrast, Fenlon et al. (2017), who used a logistic regression model encompassing a relatively large number of predictor variables, obtained Sp on the T set of 48.12% and 48.87% for the base model (including the most significant variables from a univariate analysis) and the final model (including additional combinations of predictor variables), respectively. They also noted the narrow central range of conception rates in dairy herds. Likewise, the percentage of correctly classified cases reported for Holsteins by Shahinfar et al. (2014), who applied different machine learning algorithms (naïve Bayes, Bayesian networks, decision trees and random forest), was 60.7–72.3% for primiparous and 63.5–73.6% for multiparous cows, and thus generally lower than the accuracy observed in the present study.
The values of PSTP and PSTN (indicating the reliability of predictions made by the model) in the present study were 74.73% and 89.93%, respectively. This showed that the detection of cows with potential conception problems after AI was quite reliable and that the boosted trees classifier generated relatively few false alarms (cows that did not experience any difficulties but were indicated as problematic by the model). The PSTP and PSTN values reported by Grzesiak et al. (2010) amounted to 73.13–80.88% and 83.13–90.24%, respectively, whereas those observed by Grzesiak et al. (2011) in their subsequent study ranged from 72.46% to 76.79% and from 87.68% to 92.00%, respectively. Thus, PSTP and PSTN determined in the present study were within the ranges presented by the cited authors. However, the PSTN values observed by Fenlon et al. (2017) were 56.56% and 56.84% for the base and final logistic regression models, respectively, which means that only about every second service classified by their predictive model as successful was actually effective. These results were clearly inferior to those of the present study, for the reasons mentioned above.
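To make the relationship between the five measures explicit, the following minimal sketch computes Se, Sp, Acc, PSTP and PSTN from the four cells of a two-class classification matrix. The cell counts are an assumption: they were chosen because they reproduce the percentages reported above and are not copied from Table 3. PSTP and PSTN are assumed here to correspond to the positive and negative predictive values, and the function name `classification_measures` is hypothetical.

```python
def classification_measures(tp, fn, tn, fp):
    """Return Se, Sp, Acc, PSTP and PSTN (in %) for a 2x2 classification matrix.

    The 'poor' (conception difficulty) category is treated as the positive class.
    PSTP and PSTN are assumed to be the positive and negative predictive values.
    """
    total = tp + fn + tn + fp
    return {
        "Se": 100.0 * tp / (tp + fn),     # correctly detected 'poor' cows
        "Sp": 100.0 * tn / (tn + fp),     # correctly classified 'good' cows
        "Acc": 100.0 * (tp + tn) / total, # correctly classified cows overall
        "PSTP": 100.0 * tp / (tp + fp),   # share of 'poor' predictions that were correct
        "PSTN": 100.0 * tn / (tn + fn),   # share of 'good' predictions that were correct
    }


# Illustrative counts (assumed, not taken from Table 3)
measures = classification_measures(tp=68, fn=14, tn=125, fp=23)
for name, value in measures.items():
    print(f"{name}: {value:.2f}%")
```

With these assumed counts, 23 of the 91 cows flagged as problematic would be false alarms, which is what a PSTP of about 75% expresses.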
The ROC curve (showing the relationship between Se and Sp for different cut-off values), with an AUC of 0.89, is shown in Fig 1. This AUC confirmed the good discriminative ability of the boosted trees model. The best cut-off point, ensuring the highest sensitivity at the lowest level of false alarms, was 0.45. The AUC values reported by Grzesiak et al. (2010) in their study on conception failure prediction in dairy cows ranged from 0.87 to 0.91 and are in line with the results of the present study, whereas the AUC values observed by Fenlon et al. (2017) were 0.61 and 0.62 for the base and final logistic regression models, respectively, which reflected their markedly lower predictive performance. Hempstalk et al. (2015) made earlier attempts to predict conception success for a given insemination in an Irish population of Holstein-Friesian, Jersey, Norwegian Red and crossbred cows, using six machine learning methods (C4.5 decision trees, naïve Bayes classifier, Bayesian networks, support vector machines, random forest and rotation forest) and a more traditional logistic regression model. In general, the cited authors found the greatest AUC values (ranging from 0.50 to 0.67 depending on the set of predictors and testing variant) for logistic regression, whose predictive performance was superior to that of all the remaining models. In particular, random forest, which is similar to the boosted trees method, had a lower AUC (0.49–0.68) than both logistic regression in the cited study and the boosted trees model developed in the present study (AUC = 0.89). However, Shahinfar et al. (2014) recorded a higher AUC (0.75, averaged over five folds of cross-validation) for their random forest model, which was clearly superior to all the other machine learning methods they tested.
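As an illustration of how an ROC curve, its AUC and a cut-off point can be derived from predicted class probabilities, the sketch below uses a small invented data set. The selection rule is an assumption: the text does not name the criterion used to choose the cut-off, so Youden's J (Se + Sp - 1) is shown as one common choice. All function names and the toy labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Return (cutoff, Se, 1 - Sp) for every distinct score used as a cut-off.

    A case is predicted positive when its score is >= the cut-off.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
        points.append((cutoff, tp / positives, fp / negatives))
    return points


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule, starting at (0, 0)."""
    area, prev_fpr, prev_se = 0.0, 0.0, 0.0
    for _, se, fpr in points:
        area += (fpr - prev_fpr) * (se + prev_se) / 2.0
        prev_fpr, prev_se = fpr, se
    return area


def best_cutoff_youden(points):
    """Cut-off maximizing Youden's J = Se + Sp - 1 = Se - (1 - Sp)."""
    return max(points, key=lambda p: p[1] - p[2])[0]


# Toy data (assumed): 1 = poor category, score = predicted probability of 'poor'
labels = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
scores = [0.90, 0.80, 0.70, 0.55, 0.50, 0.42, 0.40, 0.30, 0.20, 0.10]

points = roc_points(labels, scores)
auc = auc_trapezoid(points)
cutoff = best_cutoff_youden(points)
print(f"AUC = {auc:.2f}, best cut-off = {cutoff}")
```

Other criteria, such as the point closest to the top-left corner of the ROC plot or a cost-weighted rule, can give a different cut-off on the same curve.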
The most important predictor variables identified in the present study are shown in Fig 2. Average calving interval (CLVI) exerted the greatest influence on the conception difficulty class, followed by GL, BCSI, FAT_PROT and AGE, whereas the effect of the remaining predictor variables was much smaller. A significant influence of CLVI and GL on conception rate has already been described (Grzesiak et al., 2010). Grzesiak et al. (2010) reported that calving interval was the most important predictor of AI outcome for all the classifiers investigated in their study (discriminant function analysis, logistic regression, neural networks and MARS). However, GL was usually ranked lower by those models (third or fourth position, depending on the classifier). The BCSI variable was also relatively important in the cited study (second position in the majority of cases), while FAT_PROT and AGE were indicated only by MARS (sixth and fourth position, respectively). In the study on the application of the naïve Bayes classifier and CART to the detection of cows with conception difficulties (Grzesiak et al., 2011), calving-to-conception interval was the most influential factor affecting the conception class and determined the first split in the decision tree. The next two splits were based on calving interval (the most important variable in the present study) and BCSI (the third most important predictor in the present study).
Fenlon et al. (2017) used a logistic regression model to predict successful AI outcome and considered a much larger set of potential predictors at the first stage of research (a univariate analysis). The base model included six predictor variables (lactation number, days in milk, interservice interval, calving difficulty score and predicted transmitting abilities for calving interval and milk production), whereas the final one (showing the best predictive performance) additionally included BCS at service. Among these factors, the greatest effect on the probability of conception was exerted by days in milk (after logarithmic transformation; odds ratio of 2.81) and BCS at service (odds ratio of 28.35). Similar results were obtained by Hempstalk et al. (2015), who found that more days in milk and higher BCS were associated with a greater chance of conception, whereas increasing parity (corresponding to LACT in the present study) and the number of AI during the previous and current lactation resulted in a lower probability of successful AI outcome. The later months of the mating season (corresponding to SEASON) were associated with the lowest likelihood of conception. Shahinfar et al. (2014) also identified herd average conception rate, the incidence of ketosis, the number of previous unsuccessful inseminations, days in milk at breeding and the occurrence of mastitis as the most influential predictors of AI outcome.
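Because odds ratios act on the odds rather than directly on the probability, a short worked example may help in reading the values cited above. The sketch below converts a baseline conception probability to odds, multiplies by the odds ratio for a one-unit increase in the predictor, and converts back. The baseline probability of 0.40 is purely an assumption for illustration, and the function name is hypothetical.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability after a one-unit predictor increase with the given odds ratio."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


baseline = 0.40  # assumed baseline probability of conception (illustrative only)
# Odds ratio cited above for log-transformed days in milk
after_dim = apply_odds_ratio(baseline, 2.81)
print(f"{baseline:.2f} -> {after_dim:.3f}")
```

The change in probability depends on the baseline: the same odds ratio produces a smaller absolute shift when the baseline probability is already close to 0 or 1.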
The HF, FCM, LACT and SEASON variables had the smallest influence on the probability of conception. The first of these was indicated as the fifth most important variable by the artificial neural network in the study by Grzesiak et al. (2010). Fat-corrected milk, on the other hand, was identified as the fifth most influential predictor variable by MARS in the same study. Lactation number was ranked second and seventh by the neural networks and MARS, respectively. Finally, SEASON was not indicated as an influential predictor by any of the models (Grzesiak et al., 2010). Also, in the study by Fenlon et al. (2017), parity (corresponding to LACT in the present study) and predicted transmitting ability for milk yield were included in both the base and final prediction models.