Descriptive of soil parameters
The soils contained very high sand (mean = 63.56%) and low silt (23.93%) and clay (12.58%) contents (Table 1). Clay and silt exhibited high spatial variability with coefficients of variation (CV) of 78.49% and 52.91%, respectively. The predominance of coarse textures can be partly attributed to the region’s steep, rugged topography and sparse vegetation, which promote downslope transport of runoff and fine soil particles.
Mean electrical conductivity (EC) and pH were 0.12 dS m
-1 and 7.41, respectively. These indicate non saline and mildly alkaline soil conditions, which are conducive for most plants and crops.
The soils exhibited mean SOC of 0.71%, ranging from 0.04% to 1.53% (SD = 0.37, CV = 52.48%). SOC showed strong positive correlation with altitude (r = 0.580) and negative with temperature (r = -0.582), which highlights the role of cooler, higher elevation environments in promoting soil carbon sequestration. This aligns with previous findings from the region
(Nuguse et al., 2019) and with global studies emphasizing the influence of temperature and elevation on SOC
(Malla et al., 2022; Chen et al., 2023; Pellikka et al., 2023). Similar poor SOC levels have been reported in the region
(Nuguse et al., 2019; Tesfay et al., 2024) and in the Gash Barka region of the country
(Tesfay et al., 2025a,b). These low levels likely result from a combination of factors including widespread unsustainable management practices such as crop residue removal, overgrazing, conventional tillage and continuous monocropping coupled with high erosion rates and an arid climate that limits soil development
(Nuguse et al., 2019; Tesfay et al., 2024, 2025a,b).
Soil quality can be enhanced through crop residue retention, manure application, crop rotation, agroforestry, cover cropping, minimal tillage and biochar. For grazing lands, beneficial strategies include adequate resting, rotational grazing, enclosures, cut and carry systems and SOC dynamics models
(Tesfay et al., 2024). In arid regions, external organic inputs via composting or biochar production are essential to improve soil health and productivity.
Effect of land use types on SOC
Different land use types exhibited significantly varied mean SOC levels. Soils from irrigated farming (IF) had the highest average SOC, 1.03%, followed by enclosures (EC), 0.84%, rainfed farming (RF), 0.59% and communal grazing (CG) lands, 0.57% (Fig 3).
Tesfay et al., (2024) also found that SOC was notably higher in irrigated soils at Keren subzone, which is part of the study area. This indicates the necessity of irrigation and soil moisture for soil development and carbon sequestration. Irrigated farming systems benefit from superior agronomic management compared to rainfed farming, which leads to higher crop yields, enhanced carbon sequestration, reduced CO
2 emissions and decreased sensible heat
(Weldewahid et al., 2023; Abdellatif et al., 2023). However,
(Tesfay et al., 2025a) reported that irrigated farms had statistically as low SOC as that of RF and CG lands in the Western Lowlands of the country. Moreover,
Haile et al., (2025) reported that even though irrigation expanded but yield couldn’t at Dge subzone in the Western Lowlands but increased at Ghala Nefhi in the Highlands. Enclosures also demonstrated significant potential for carbon sequestration.
Nuguse et al., (2019) and
Tesfay et al. (2025a, 2026) also reported considerably higher SOC levels in natural forestlands and enclosures. However,
Hoang (2024) reported that SOC levels were higher in cropping fields than in natural forestlands. These highlight the importance of management in all types of land use systems.
RF and CG soils displayed significantly lower SOC levels, 0.59% and 0.57%, respectively. This suggests their potential for carbon release, contributing to increased atmospheric CO
2 concentrations and global warming, which support the reports of various research
(Nuguse et al., 2019; Weldewahid et al., 2023; Abdellatif et al., 2023; Prabakaran et al., 2023; Urgessa et al., 2023; Tesfay et al., 2024, 2025a,b, 2026). Therefore, converting rainfed to irrigated farming and grazing lands to enclosures, would provide environmental and socio-economic benefits. Additionally, endorsing enhanced management practices for IF and EC land uses is essential to maximize their capacity for carbon storage.
Performance of models
Cubist model with 46 variables (Cubist-46) attained the uppermost accuracy (R
2 = 0.7465, RMSE = 0.1790, RPD = 2.0895 and MD = 0.21%). In contrast, PLS-11 model produced the lowest prediction accuracy (R
2 = 0.5930, RMSE of 0.2368, RPD = 1.6491 and MD = 11.53%) (Table 2, Fig 4).
Generally, the RF model showed the least model discrepancy (MD) changes when the number of input variables changed. This indicates its stability. The Cubist model exhibited a decreasing trend in MD when number of covariates decreased (Fig 4b) with MD values of 5.42, 4.02, 2.51% for the Cubist-46, Cubist-28 and Cubist-11 models, respectively. However, the lowest and highest MD were recorded by the PLS-46 and PLS-11 models with MD values of 0.21 and 11.53%, respectively. This indicates the model’s sensitivity with changing number of covariates. The PLS model demonstrated a consistent decline in performance as the number of input variables decreased (Fig 4b). Model Discrepancy (MD) represents the disparity in performance of a machine learning model when evaluated on testing data compared to training data. This metric is essential for determining how effectively a model captures unobserved data. The tolerance level for model discrepancy varies with the level of precision needed by the specific task
(Tesfay et al., 2025a).
Based on the RPD scheme classification
(Viscarra et al., 2007), all the RF and Cubist models were categorized as good, whereas PLS models as fair. Consequently, the Cubist and RF models are suitable for SOC prediction, quantification and monitoring applications in the study region. The PLS models, while less accurate, may still be useful for broader, general purpose SOC assessments.
Variables’ importance
Variable importance for SOC prediction, based on Random Forest permutation, is shown in Fig 5. When using 46 input variables, the ten most important predictors were (in order) temperature, land use, altitude, SOCI, B10, rainfall, BR2, positive openness (PvOp), SAVI and pH. Together, these accounted for 73.58% of the variance in SOC, with the top five explaining 57.41% and temperature alone contributing 19.73% (Fig 5a).
With 28 variables, the top ten predictors were temperature, land use, SOCI, B10, rainfall, BR2, PvOp, sand, MSAVI2 and NDVI, explaining 79.95% of SOC variance. Here, temperature contributed 26.67% and the top five variables explained 62.15% (Fig 5b).
Using the 11 variable set, the most important predictors were temperature, land use, SOCI, altitude, B10, BR2, rainfall, sand, negative openness (NvOp) and CI. These collectively explained 97.01% of the variance, with the top five accounting for 71.96% and temperature alone representing 21.34% (Fig 5c).
Across all variable sets, temperature emerged as the most important predictor of SOC in the study area, which underscores the strong influence of climate on soil carbon dynamics in arid regions. This finding aligns with other studies worldwide that have identified temperature as a critical factor in SOC modeling (
e.g.
Galluzzi et al., 2024; von Fromm et al., 2024).
Fig 5 (d and e) illustrates variable importance by variable type category. Climate variables (temperature and rainfall) were the most influential, accounting for 25.08% of the explained variance. This was followed by topography (19.66%), bare-soil related indices (19.32%), land use (12.80%) and Landsat 8 bands (9.42%), among other categories. Temperature alone contributed 19.73%, while its surrogate variables altitude (11.10%), B10 (5.62%) and positive openness (3.27%) together raised the total contribution of temperature related predictors to 39.27%.
The same predictor variables were used in a previous study conducted in the Western Lowlands of Eritrea
(Tesfay et al., 2025a). In that setting, rainfall emerged as the most important predictor of SOC, whereas temperature dominated in the current highland study area. These findings underscore the need for climate smart agricultural planning that accounts for regional climatic differences to enhance ecosystem and societal resilience.
In the Western Lowlands, the key SOC predictors were (in order) rainfall, temperature, altitude, soil taxonomy, SWIR2, sand, clay and NIR
(Tesfay et al., 2025a). In the Highlands, the main predictors were temperature, land use, altitude, SOCI, B10, rainfall, BR2 and positive openness. Climate related variables (temperature, rainfall, altitude) were important in both regions, but land use ranked highly only in the Highlands. This likely reflects the longer history of established irrigated farming in the highlands, whereas irrigation in the lowlands is more recent and often poorly managed
(Tesfay et al., 2025a).
Predicted SOC and mapping
The average predicted SOC by the Cubist-46 model was recorded at 0.70%, while the other two models had a mean of 0.71%. Related to the measured soil C, the skewness, CV and SD values for the predicted soil C decreased across all three models. Among them, the RF-28 model demonstrated the lowest CV (39.23%), followed by the PLS-28 model (42.13%) in contrast to the observed SOC with CV of 52.48%. Regression models tend to compress the data closer to the average.
Following SOC prediction using the Cubist-46 model, we applied Regression Kriging (RK) to generate spatial maps across the study area (Fig 6). SOC increased with elevation (Fig 6a), a pattern inversely related to the temperature gradient across the study area. The lowest SOC levels were found in the lower elevation sub zones of Keren and Hamelmalo, whereas the highest concentrations occurred in the upper parts of the watershed. The highlands receive 450-500 mm annual rainfall with cooler temperatures that favour vegetation growth and reduce carbon decomposition, whereas the midlands receive 350-400 mm, with high temperature and evapotranspiration where aridity limits plant cover and accelerates SOC loss.
The RK approach effectively combined Cubist predictions with residual (observed SOC-predicted SOC) spatial interpolation. Residuals narrowed substantially from -0.41 to 0.44% before kriging to -0.08 to 0.07% afterward, with most values falling between -0.05 and 0.05% (Fig 6b). This reduction confirms that kriging successfully removed spatially structured errors, enhancing prediction accuracy.