A correlation matrix was created to describe the relationships between the traits. In the correlation matrix of the training dataset (Fig 1), the marketing live weight of lambs (D120) had its highest correlation (r=+0.79, P<0.05) with the 90th-day weaning weight, whereas its correlations with dam age and birth weight were very low and non-significant. D120 was also significantly correlated with D30 (r=+0.72, P<0.05) and D60 (r=+0.78, P<0.05) (Fig 1a). However, the D30 and D60 predictors were excluded from the data set because they were highly correlated variables (Fig 1b).
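The correlation-based filtering described above can be illustrated with a minimal sketch. The cutoff of 0.9, the function names `pearson` and `drop_correlated`, and the toy values are assumptions made for illustration; the study does not report the threshold or the software routine it used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(predictors, cutoff=0.9):
    """Keep a predictor only if |r| with every already-kept predictor is <= cutoff.

    The cutoff is a hypothetical value, not one taken from the study.
    """
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) <= cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy records (kg); NOT data from the study.
toy = {
    "D90": [22.0, 25.5, 19.8, 28.1, 24.0, 21.3],
    "D60": [16.1, 18.9, 14.2, 21.0, 17.6, 15.5],
    "birth_weight": [4.1, 3.6, 4.4, 3.9, 3.5, 4.3],
}
print(drop_correlated(toy, cutoff=0.9))
```

Because predictors are visited in insertion order, the variable listed first is the one retained from a highly correlated pair; here D90 is placed first so that the earlier weight is the one dropped.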
Correlation matrices and the Boruta algorithm, which is based on random forests, were used to identify the most significant variables in the study. This was done to avoid multicollinearity and to reduce the number of variables needed for optimal forecasting performance. Variable selection of this kind saves time and effort by identifying the important variables and removing unnecessary measurements (Çakmakçı, 2022). The Boruta method has been shown to be effective in determining the optimal subset of traits for building a high-accuracy prediction model (Cao, 2019). The high correlations between live weights at 30, 60, 90 and 120 days of age determined in this study were consistent with the findings of Wolc et al. (2011), who found that the correlation between live weights of sheep of various ages increased with increasing days of age. On the other hand, in the study conducted by Tırınk et al. (2023b) on Romane lambs, only a relatively high correlation between weaning weight and suckling weight (0.79) and low-to-moderate correlations between final weight and birth weight, suckling weight and weaning weight (0.29, 0.42 and 0.52, respectively) were determined.
The importance scores of the variables are given in Table 2. Initially, highly correlated predictors were removed from the data set. The Boruta algorithm was then applied to the remaining predictors for variable selection, and the ML models were trained on the predictors selected as important. The Boruta analysis confirmed dam age, birth weight, 90-day weight and birth type as significant final predictors, whereas the predictor sex was excluded from the dataset because it was not significant. In a study on hair goat kids, herd, sex, birth type, dam age, birth weight, live weight at 60 days and weaning weight (90 days) were considered for predicting marketing weight (120 days) with ANN, but only herd, sex, birth type and dam age, which had significant effects (P<0.05), were included in the model (Erdoğan Ataç et al., 2022).
Cross-validation is one of the most popular data resampling techniques for estimating the generalizability of a predictive model and avoiding overfitting (Berrar, 2019). Repeated (5 times) 5-fold cross-validation was used to evaluate the performance of each model in the study, and the resampling results across models are given in Table 3. In various studies in which the live weight of sheep was estimated with different machine learning algorithms, the best n value was obtained by the cross-validation method and was selected as 5 (Sant’Ana et al., 2021; Coşkun et al., 2023) or 10 (Huma and Iqbal, 2019; Camacho-Perez et al., 2022; Hamadani and Ganai, 2023).
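The resampling scheme can be sketched as follows. This is an illustrative outline of repeated 5-fold cross-validation, not the routine used in the study; the function name `repeated_kfold`, the seed and the sample size of 23 are hypothetical.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=5, seed=42):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
        for f, test_idx in enumerate(folds):
            # Training set: every fold except the one held out for testing.
            train_idx = [i for j, fold in enumerate(folds) if j != f for i in fold]
            yield r, f, train_idx, test_idx

# 5 repeats x 5 folds give 25 train/test splits per model.
splits = list(repeated_kfold(n_samples=23))
print(len(splits))
```

Each model would be fitted on `train_idx` and scored on `test_idx` for every split, and the metrics would then be averaged over all splits.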
The CART model was the fastest, taking only 4.52064 seconds. The ANN algorithm processed the same dataset in 15.05579 minutes, the RF algorithm in 8.421965 minutes and the SVMR algorithm in 5.769735 minutes. The ANN model was the most time-consuming but had the highest prediction accuracy on the test dataset, outperforming all other models with the lowest RMSE and MAPE values. The MAE values for the ANN, CART, SVMR and RF models were 2.504, 2.567, 2.498 and 2.589, respectively. Training a machine learning model usually requires significant time and space (Yang and Shami, 2020). The runtime performance of the models in this study was similar to that reported in a study that applied the same models to a smaller dataset (Çakmakçı, 2022). Reporting that the models’ processing time was a factor to consider, Sant’Ana et al. (2021) found that the extreme gradient boost regressor (XGBR) was the fastest at 0.435 seconds, while the random forest regressor (RFR) was the most time-consuming at 5.950 seconds.
The cross-validation results were analyzed and, after all the metrics were extracted, the mean value of each metric (MAE, RMSE and MAPE) was calculated. Accuracy metrics are commonly used to assess machine learning predictions. The sensitivity of seven accuracy measures was ranked as MSE > SMAPE = MAPE > MAE > RMSE > R² > R (Jierula et al., 2021). On the other hand, Camacho-Perez et al. (2022) reported that the expected errors in predictions and experimental measurements were of low magnitude and that RMSE was a suitable indicator for evaluating the algorithm’s performance. In this study, the ANN model, with the lowest RMSE (3.181) and MAPE (0.076) values, had the best predictive performance on the test dataset (Table 3). The order of the algorithms in prediction accuracy was ANN > CART > SVMR > RF. Similarly, it was determined that the marketing live weight of hair goat kids (Erdoğan Ataç et al., 2022) and the growth of Baluchi lambs (Behzadi and Aslaminejad, 2010) could be predicted successfully by ANN. However, according to the goodness-of-fit criteria, CART was the best model for estimating the ideal final weight at 4 months of age in Romane male and female breeding lambs (Tırınk et al., 2023b). In contrast to the findings of this study, RF performed better in predicting the live weight of different sheep breeds, as it had the lowest values of the accuracy metrics (Huma and Iqbal, 2019; Sant’Ana et al., 2021; Çakmakçı, 2022).
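For reference, the three accuracy metrics can be computed as in the sketch below. MAPE is expressed as a fraction rather than a percentage, on the assumption that this matches the scale of the reported value (0.076); the example weights are hypothetical and not taken from the study.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (assumes no actual value is 0)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical marketing weights (kg) and predictions.
actual = [30.0, 40.0, 50.0]
predicted = [33.0, 38.0, 50.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

RMSE squares the errors before averaging, so it penalizes large individual errors more heavily than MAE, while MAPE scales each error by the actual weight.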
Fig 2 shows the permutation-based variable importance scores. Including the minimum possible number of predictors that gives acceptable results can reduce the data acquisition cost or improve the software’s efficiency (Çakmakçı, 2022). According to the variable importance scores used to analyze the relative importance of the predictors, 90th-day live weight was the most important predictor of marketing live weight in all models, whereas birth weight, birth type (twin) and dam age had low relative importance. The sensitivity analysis results for the support vector regression algorithm in the study by Tırınk et al. (2023b) showed that the most effective variable on final weight was the age at final weight, followed by weaning weight. Sex, suckling weight, weaning age and birth weight were also found to be important, while the age at suckling weight, birth type (2, 3 and 4) and the number of co-suckled lambs were the least effective variables. On the other hand, in studies in which live weight was estimated from morphological measurements, the most important variables in predicting live weight were reported to be chest width, chest depth (Çakmakçı, 2022) and chest circumference (Tırınk et al., 2023a).
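The idea behind permutation-based importance can be sketched as follows: one predictor column is shuffled at a time, and the resulting increase in prediction error is taken as that predictor's importance. This is a simplified illustration and not the procedure used in the study; `stand_in_model`, the toy rows and the choice of RMSE as the error measure are all assumptions.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def permutation_importance(predict, rows, target, n_features, repeats=10, seed=0):
    """Mean increase in RMSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = rmse(target, [predict(r) for r in rows])
    scores = []
    for j in range(n_features):
        increases = []
        for _ in range(repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(rmse(target, [predict(r) for r in permuted]) - baseline)
        scores.append(sum(increases) / repeats)
    return scores

# Hypothetical stand-in for a fitted model: it uses only feature 0 and ignores feature 1.
def stand_in_model(row):
    return 1.2 * row[0] + 2.0

rows = [[15.0 + i, float(i % 2)] for i in range(20)]  # toy [weight, binary trait] rows
target = [stand_in_model(r) for r in rows]
scores = permutation_importance(stand_in_model, rows, target, n_features=2)
print(scores)
```

A feature the model relies on shows a clear error increase when shuffled, whereas a feature the model ignores scores zero.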
According to the correlation matrix (Fig 3a), the correlation coefficients between the actual marketing live weight values and the values estimated by the SVMR, CART, RF and ANN models were 0.82 (P<0.05), 0.82 (P<0.05), 0.82 (P<0.05) and 0.84 (P<0.05), respectively. Based on the ANOVA results, there was a statistically significant difference between the marketing live weight values estimated by the RF and SVMR models (P<0.05) (Fig 3b). When the performance of each model as a weight predictor was analyzed, all models used in this study showed similar prediction trends. Contrary to the findings of this study, there was no difference between the actual live weight values and the values predicted by machine learning models in Norduz sheep (Çakmakçı, 2022).
In a study in which fat tail weight was estimated in sheep using ANN and MLR (multiple linear regression) models, the mean relative error between actual and model-predicted values was significantly (P<0.01) lower for the ANN model than for the MLR model, and the ANN model gave a better estimation (Norouzian and Alavijeh, 2016). In a study where the live weight of sheep was estimated from biometric data, the mean error of the measured values compared to the actual value was reported to be less than 10% (Camacho-Perez et al., 2022).
The optimal model identified in this study was an artificial neural network (ANN) with a 4-5-1 architecture, consisting of four input nodes corresponding to the predictor variables, a single hidden layer with five neurons and one output node. This configuration resulted in a total of 31 trainable parameters. The model employed a weight decay coefficient of 0.01 to mitigate overfitting.
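The parameter count can be checked with a short calculation, assuming a fully connected network with one bias term per hidden and output neuron: (4 + 1) × 5 + (5 + 1) × 1 = 31. The sketch below also shows one common form of weight decay, an L2 penalty added to the sum of squared errors; the exact penalty form used in the study is not stated, so `penalized_loss` is an assumption.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected feed-forward network."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def penalized_loss(sse, weights, decay=0.01):
    """Sum of squared errors plus an L2 weight-decay penalty (assumed form)."""
    return sse + decay * sum(w * w for w in weights)

print(mlp_param_count([4, 5, 1]))  # 31
```

Larger decay values shrink the weights more strongly toward zero, which reduces the risk of overfitting at the cost of a less flexible fit.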