In this section, we present the experimental results for three machine learning models: crop yield prediction using random forest, soil classification using K-means, and disease and nutrient deficiency detection using a CNN. We then discuss the quantitative, qualitative and graphical results, compare them with existing models and the literature, and interpret the findings.
Quantitative results
CNN model-plant disease prediction
The CNN was trained for 10 epochs on a dataset of 10,849 images belonging to 38 classes. The epoch-wise progression of the model’s training and validation metrics is quantitatively summarized in Table 1.
Interpretation
• The CNN model achieved an accuracy of 87.69% on the multi-class classification task, which spans 38 classes of crop diseases and health statuses.
• The model handles both the majority and minority classes quite well, with a Macro Average F1-Score of 0.8363.
• The precision and recall metrics were also high, confirming the model’s ability to identify healthy and diseased crops without significant bias toward one class.
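To make the distinction between accuracy and the macro-averaged F1-score concrete, the following minimal sketch computes both on a small set of labels. The function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the study's code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    f1_scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        f1_scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(f1_scores) / len(classes)


# Hypothetical toy labels: 8 "healthy" samples and 2 "blight" samples.
y_true = ["healthy"] * 8 + ["blight"] * 2
y_pred = ["healthy"] * 8 + ["healthy", "blight"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the macro average, one missed sample in the small class lowers the macro F1-score far more than it lowers accuracy. This is why the two metrics are reported side by side.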
Random forest-crop yield prediction
The model used for crop yield prediction is the random forest regressor, which was evaluated using 5-fold cross validation (CV) and then fine-tuned with GridSearchCV. Results for this optimized model are presented in Table 2.
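The tuning procedure described above can be sketched with scikit-learn as follows. This is only an illustration under stated assumptions: the data are synthetic, and the parameter grid is hypothetical because the values actually searched are not given in the text.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the crop yield dataset is not reproduced here.
X, y = make_regression(n_samples=120, n_features=6, noise=5.0, random_state=0)

# Hypothetical grid; the values searched in the study are not stated.
param_grid = {"n_estimators": [20, 50], "max_depth": [4, None]}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,          # 5-fold cross-validation for every grid point
    scoring="r2",  # select the combination with the best mean R^2
)
search.fit(X, y)

best_params = search.best_params_  # best combination found on the grid
best_cv_r2 = search.best_score_    # mean R^2 over the 5 validation folds
```

After fitting, `search.best_estimator_` holds the model refitted on all the data with the selected parameters.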
Interpretation
• The random forest model explained 97.97% of the variance in crop yield, indicating excellent predictive accuracy.
• The MAE of 9.4754 kg/ha means that, on average, the model's predictions deviate from the actual crop yield by about 9.5 kg/ha, which is a strong result for large-scale agricultural predictions.
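For reference, the two metrics interpreted above can be computed as in the following minimal sketch. The yield values are hypothetical and serve only to show the arithmetic; they are not results from the study.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical yields in kg/ha.
actual = [2000.0, 2500.0, 3000.0, 3500.0]
predicted = [2010.0, 2490.0, 3005.0, 3495.0]

mae = mean_absolute_error(actual, predicted)
r2 = r2_score(actual, predicted)
```

The MAE is expressed in the units of the target (here kg/ha), whereas R² is unitless, so the two values answer different questions about the same predictions.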
K-means clustering-soil classification
The K-means clustering algorithm was applied to the Soil Measures dataset; the clustering quality results are detailed in Table 3.
Interpretation
• A silhouette score of 0.3647 indicates moderate clustering quality: the clusters are not highly dense but are fairly distinct. The score leaves scope for improving the separation of the soil types, possibly by scaling the data, using a more advanced clustering algorithm, or refining the feature selection.
K-means clustering
The model achieved a silhouette score of 0.3647. In purely mathematical terms this is a relatively modest score for cluster separation; however, it is acceptable for the high-variance data typical of real-world soil analysis datasets (nitrogen, phosphorus, potassium, pH).
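To show what the silhouette score measures, the following minimal sketch computes the mean silhouette coefficient with Euclidean distance. The function and the toy points are illustrative assumptions; they are not the study's implementation or data, and the sketch assumes at least two clusters.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a coefficient of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(points)


labels = [0, 0, 1, 1]
# Hypothetical well-separated clusters versus interleaved clusters.
tight_score = silhouette_score([(0, 0), (0, 1), (10, 0), (10, 1)], labels)
overlap_score = silhouette_score([(0, 0), (2, 0), (1, 0), (3, 0)], labels)
```

The coefficient ranges from -1 to 1. Compact, well-separated clusters score close to 1, while interleaved clusters score near or below 0, which places a value of 0.3647 in the moderate range.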
Qualitative results
CNN model in real-world scenarios
In a real-world agricultural scenario, the CNN model could be used for smartphone-based disease diagnosis. Given that most farmers possess mobile devices, the CNN model can enable rapid detection of diseases from images of crop leaves taken in the field. The 87.69% accuracy suggests that the model could be effective in field settings, where data vary due to factors such as lighting, image quality and background noise.
Example
• The results obtained on tomato (Healthy) and tomato (Bacterial spot) indicate the highest precision and recall, reflecting that even from noisy images, the model was able to distinguish between healthy and diseased plants.
Random forest for crop yield prediction
The random forest model performs well in predicting crop yield for different states of India. With an R² score of 0.9797, the model should be very useful for predicting future crop yields. This will help farmers and policymakers make informed decisions on resource allocation and crop management. The model can also be integrated with decision support systems to provide predictions that support precision farming, such as the quantity of water and fertilizer required.
Example
• In forecasting rice yield in Tamil Nadu, the model predicted 2.6596 metric tons/ha; together with the R² score of 0.9797 (97.97%), this shows the capability of the system for large-scale agricultural applications.
K-means clustering for soil classification
The classification of the soil types with the K-means clustering algorithm was moderately successful. In real-world applications, such a classification could inform crop suitability by matching soils to appropriate crop types. The silhouette score suggests that further work is needed to better separate the clusters, possibly with more sophisticated clustering techniques.
Example
• The model classified the soil type as Acidic phos-rich (Type A). This type is most suitable for crops that do well in acidic, nutrient-rich soils. This information would be useful in site-specific crop management.
Graphical results
To illustrate the stability of the models, Fig 4A shows the CNN training and validation accuracy curves, while Fig 4B plots the regression validation residuals.
CNN training and validation curves
The learning curves illustrate steady improvement across epochs, confirming effective feature extraction without severe overfitting.
Fig 4B shows the residuals and the error pattern of the regression model; it indicates that the model's predictions are very close to the actual values, without major bias or inconsistency.
F1-score bar chart
Fig 5 illustrates the F1-scores of each of the 38 classes of crop diseases. The chart reflects very strong performance, especially in classes like tomato (Healthy) and corn (Common rust).
Elbow curve (K-means clustering)
The elbow curve in Fig 6 indicates an elbow at K=5 for the K-means algorithm: beyond this value, adding further clusters yields only marginal reductions in intra-cluster variance.
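An elbow is usually read from the plot by eye, but it can also be located numerically. The following minimal sketch uses one common heuristic, the largest second difference of the within-cluster variance (inertia), and assumes consecutive values of K. The inertia values are hypothetical and are not the values plotted in Fig 6.

```python
def elbow_k(inertia_by_k):
    """Return the K with the largest second difference of inertia.

    Heuristic only; assumes inertia is given for consecutive values of K.
    """
    ks = sorted(inertia_by_k)
    best_k, best_curvature = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        curvature = (
            inertia_by_k[prev_k] - 2 * inertia_by_k[k] + inertia_by_k[next_k]
        )
        if curvature > best_curvature:
            best_k, best_curvature = k, curvature
    return best_k


# Hypothetical inertia values for K = 1..8.
inertia = {1: 1000, 2: 700, 3: 480, 4: 320, 5: 200, 6: 180, 7: 165, 8: 155}

chosen_k = elbow_k(inertia)
lowest_inertia_k = min(inertia, key=inertia.get)
```

In this toy series the inertia keeps falling up to the largest K tested, so the K with the lowest inertia is not the elbow; the elbow is the point where the curve flattens.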
Confusion matrix for CNN model
The normalized confusion matrix (Fig 7) shows the classification results for every plant disease class, with high performance for classes such as tomato (Healthy) and corn (Common rust).
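As an illustration of how such a matrix is built, the sketch below normalizes each row by the number of true samples in that class, so that the diagonal holds the per-class recall. Row normalization is an assumption here, since the text does not state which normalization Fig 7 uses; the labels are hypothetical.

```python
def normalized_confusion_matrix(y_true, y_pred, labels):
    """Row-normalized confusion matrix: rows are true classes."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    matrix = []
    for row in counts:
        row_total = sum(row)
        matrix.append([c / row_total if row_total else 0.0 for c in row])
    return matrix


# Hypothetical two-class example.
labels = ["healthy", "rust"]
y_true = ["healthy"] * 4 + ["rust"] * 2
y_pred = ["healthy", "healthy", "healthy", "rust", "rust", "healthy"]

matrix = normalized_confusion_matrix(y_true, y_pred, labels)
```

Each row sums to 1, which makes classes with very different sample counts directly comparable in a single heatmap.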
Silhouette score visualization for K-means
As shown in Fig 8, the mean silhouette score of 0.3647 still indicates moderate clustering quality, hence there is a need to improve cluster separation.
Residual plot for random forest model
The residual plot (Fig 9) shows that the random forest model does not have significant bias with respect to its predictions, further confirming its effectiveness.
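A residual plot of this kind can be complemented by a few summary numbers. The following minimal sketch computes them for hypothetical yields in kg/ha; the function name and the data are assumptions for illustration only.

```python
from statistics import mean


def residual_summary(actual, predicted):
    """Summarize residuals (actual minus predicted) to check for bias."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    return {
        "mean_residual": mean(residuals),
        "max_abs_residual": max(abs(r) for r in residuals),
        "share_positive": sum(r > 0 for r in residuals) / len(residuals),
    }


# Hypothetical yields in kg/ha.
actual = [2000.0, 2500.0, 3000.0, 3500.0]
predicted = [2010.0, 2490.0, 3005.0, 3495.0]

summary = residual_summary(actual, predicted)
```

A mean residual near zero and a roughly even split between positive and negative residuals are consistent with a model that neither systematically overestimates nor underestimates.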
2D PCA scatter plot (Cluster separation)
Fig 10 depicts the separation of the K-Means clusters in the reduced 2D feature space using PCA.
Predicted vs. actual crop yield (Random forest)
The scatter plot in Fig 11 compares the predicted crop yields against actual crop yields, showing the high degree of accuracy the Random Forest model provides in predictions of agricultural outcomes.
K-means cluster association heatmap
Fig 12 shows the relationship between soil properties and cluster assignments, which provides insight into the chemical composition of the different soil types.
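One plausible way to obtain the values behind such a heatmap is to average each soil property within each cluster. The sketch below does this under that assumption; the text does not specify how Fig 12 was computed, and the feature names and measurements are hypothetical.

```python
def cluster_feature_means(rows, labels, feature_names):
    """Mean of every feature within each cluster."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        totals = sums.setdefault(label, [0.0] * len(feature_names))
        for i, value in enumerate(row):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: {
            name: total / counts[label]
            for name, total in zip(feature_names, totals)
        }
        for label, totals in sums.items()
    }


# Hypothetical soil measurements: nitrogen, phosphorus, potassium, pH.
feature_names = ["N", "P", "K", "pH"]
rows = [
    [40.0, 60.0, 20.0, 5.5],
    [60.0, 80.0, 40.0, 5.0],
    [90.0, 30.0, 50.0, 7.0],
    [110.0, 20.0, 30.0, 7.4],
]
labels = [0, 0, 1, 1]

means = cluster_feature_means(rows, labels, feature_names)
```

In this toy example, cluster 0 combines a low pH with high phosphorus, which is the kind of profile that a label such as "Acidic phos-rich" would summarize.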
Result interpretation
The experimental outcomes validate the efficacy of the proposed multi-modal framework. The Random Forest model provides highly reliable yield forecasts (R² = 0.9797), while the CNN demonstrates robust, generalized field diagnostics across 38 classes. Although K-Means achieved a moderate Silhouette Score, it establishes a highly acceptable baseline for processing high-variance soil chemistry. Together, these integrated models form a highly capable foundation for precision agriculture.
Future work
Future enhancements could include incorporating live weather, satellite and IoT sensor data to enable real-time monitoring and more accurate yield prediction. Further development of easy-to-use mobile or web tools, along with model customization for specific crops and regions, will enable farmers to make quicker, environment-friendly, data-driven decisions for sustainable and smart farming.