Indian agriculture has a long history, beginning with the Indus Valley Civilization (2600-1600 BCE), which cultivated rice, wheat, barley and cotton using irrigation systems
(Bheemabai, 2017). During the Mughal Empire (1526-1857), new crops like tobacco and potatoes were introduced, alongside advancements in farming techniques such as crop rotation (selfstudyhistory)
(Batra et al., 2021). British colonial policies focused on exporting raw materials, leading to decreased food production and famine in the 19th century
(Shobanadevi et al., 2023). The Green Revolution of the 1960s and 1970s introduced high-yield crops and modern farming methods, significantly boosting food production
(Howlett, 2008).
Despite its historical importance, Indian agriculture faces challenges today, particularly from climate change. Erratic rainfall, droughts and floods are reducing crop yields and productivity
(Shook et al., 2021; Dwivedi et al., 2022). Dependence on monsoon rains makes agriculture vulnerable to weather variations, leading to crop failures and reduced farmer income
(Mahdi et al., 2020). Rising temperatures are degrading soil quality and affecting crop growth
(Elavarasan et al., 2020). While other countries adopt irrigation and modern technologies to mitigate climate impacts, many Indian farmers still rely on traditional methods and struggle to adapt
(Majeed et al., 2021; Durai and Shamili, 2022).
Machine learning, a subset of artificial intelligence, enables computers to identify patterns and insights from data without explicit programming, improving over time with experience
(Keerthana et al., 2021). Deep learning, a subset of machine learning, trains neural networks to learn complex data representations and is used for tasks such as image recognition, recommender systems, natural language processing and predictive analytics
(Shen et al., 2017; Nevavuori et al., 2019; Baek et al., 2020). Recent advances include feature engineering-based LSTM models that enhance crop harvest forecasting by creating new features from existing data
(Iniyan et al., 2023).
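To make the idea of creating new features from existing data concrete, the following minimal sketch derives lagged and rolling features from a yield series before it would be passed to a sequence model such as an LSTM. The function name `build_features`, the choice of features and the sample values are illustrative assumptions, not the features used by Iniyan et al. (2023).

```python
def build_features(series, lag=1, window=3):
    """Derive simple engineered features from a univariate yield series.

    Hypothetical helper: the feature set (lagged value, rolling mean,
    last change) is an assumption chosen for illustration only.
    """
    start = max(lag, window, 2)  # need enough history for every feature
    rows = []
    for t in range(start, len(series)):
        past = series[t - window:t]  # the `window` values before time t
        rows.append({
            "lag_value": series[t - lag],
            "rolling_mean": sum(past) / window,
            "last_change": series[t - 1] - series[t - 2],
            "target": series[t],  # value the model should predict
        })
    return rows


# Made-up yearly yields (t/ha), purely for demonstration.
yields = [2.0, 2.2, 2.1, 2.5, 2.7]
rows = build_features(yields)
for row in rows:
    print(row)
```

Each row pairs the engineered inputs with the value to be predicted, so the rows can be stacked into the input tensor of whichever forecasting model is used.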
Related works
Jhajharia and Mathur (2022) conducted a study in Rajasthan, India, applying various machine learning techniques to estimate the yield of five selected crops. The study found that the Random Forest, SVM and Lasso Regression models performed better in predicting agricultural yield than deep learning models such as Gradient Descent and LSTM. However, the study suggested that a larger dataset and further investigation into soil and rainfall are required for practical applications of prediction models in crop production.
Panigrahi et al. (2023) formulated a forecasting model for Bengal gram, groundnut and maize in the Indian state of Telangana, covering 2016 to 2018. It utilized six supervised regression models: gradient boosting regression, random forest regression, linear regression, decision tree regression, XGBoost regression and voting regression. The research found that the XGBoost Regression and Random Forest Regression models were the most precise.
Khosla et al. (2020) proposed a two-step approach: first forecasting seasonal rainfall using modular artificial neural networks (MANNs), then using the rainfall data and crop-specific land area to forecast the yield of major kharif crops with support vector regression (SVR).
In a study by Gopal and Bhargavi (2019), a hybrid MLR-ANN model was developed that utilized the coefficients and bias from a multiple linear regression (MLR) model to initialize the weights and bias in the input layer of the artificial neural network (ANN) model, in place of random weights and bias. This approach improved the accuracy of the model over traditional methods.
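The initialization idea can be sketched as follows. This is one plausible reading under stated assumptions, not the authors' implementation: the MLR is fitted by ordinary least squares, its coefficients and intercept are copied into the first hidden unit, the remaining units start near zero, and the hidden layer is kept linear so that the untrained network reproduces the MLR prediction exactly. All function names and the toy data are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(X, y):
    """Ordinary least squares via the normal equations; returns (coefs, bias)."""
    rows = [[1.0] + list(r) for r in X]  # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    return beta[1:], beta[0]


def init_from_mlr(coefs, bias, n_hidden, seed=0):
    """Assumed scheme: unit 0 copies the MLR fit, other units start small."""
    rng = random.Random(seed)
    weights = [list(coefs)]
    biases = [bias]
    for _ in range(n_hidden - 1):
        weights.append([rng.uniform(-0.01, 0.01) for _ in coefs])
        biases.append(0.0)
    out_weights = [1.0] + [0.0] * (n_hidden - 1)  # output passes unit 0 through
    return weights, biases, out_weights


def forward(x, weights, biases, out_weights):
    """Forward pass with a linear hidden layer (an assumption of this sketch)."""
    hidden = [sum(w * v for w, v in zip(ws, x)) + b
              for ws, b in zip(weights, biases)]
    return sum(o * h for o, h in zip(out_weights, hidden))


# Toy data generated from y = 1 + 2*x1 + 3*x2 (illustrative only).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
y = [9.0, 8.0, 19.0, 18.0, 26.0]

coefs, bias = fit_mlr(X, y)
weights, biases, out_weights = init_from_mlr(coefs, bias, n_hidden=3)
print(forward([6.0, 2.0], weights, biases, out_weights))
```

Training then starts from the linear solution rather than from random weights, so any improvement found by gradient descent is an improvement over the MLR baseline on the training data.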
Nigam et al. (2019) investigated various machine learning algorithms for forecasting crop yield based on variables such as temperature, rainfall, season and area. Simple RNN and LSTM models were first used to predict temperature and precipitation.
Keerthana et al. (2021) discussed various machine learning algorithms, including AdaBoost regressor, Random Forest, Gradient Boosting, Decision Trees and KNN classifiers. The study found that an ensemble model consisting of Decision Tree Regressor and AdaBoost Regressor produced the most precise outcomes.
Khaki et al. (2020) proposed an approach that combines CNNs and RNNs. The CNN captures both the spatial relationships among soil data gathered at various depths and the temporal dependencies in meteorological data, while the RNN represents the rising trend in crop production over time due to ongoing advancements in plant breeding and management techniques.
Sivanantham et al. (2022) developed a new method called QRECF-DFFMPC to improve prediction accuracy while minimizing time consumption. This approach comprises an input layer, hidden layers and an output layer. The empirical orthogonal function in hidden layer 1 is used to select appropriate features. Quantile regression is then applied in hidden layer 2 to evaluate the features and produce the regression result for each data point.
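For readers unfamiliar with quantile regression, the sketch below shows only the standard quantile (pinball) loss that such a regression minimizes. It does not reproduce QRECF-DFFMPC; the helper names and sample values are illustrative assumptions.

```python
def pinball_loss(y_true, y_pred, tau):
    """Mean quantile (pinball) loss for quantile level tau in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant(y_true, tau):
    """Pick the observed value that minimizes the pinball loss as a constant."""
    return min(y_true,
               key=lambda c: pinball_loss(y_true, [c] * len(y_true), tau))


# Made-up yields with one outlier, purely for demonstration.
yields = [1.0, 2.0, 3.0, 4.0, 100.0]
median_like = best_constant(yields, 0.5)
upper = best_constant(yields, 0.9)
print(median_like, upper)
```

With tau = 0.5 the loss is minimized at the median, which is why the outlier does not pull the estimate upward the way it would pull a mean.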
Satpathi et al. (2023) conducted a comparative analysis for Chhattisgarh using ANN, LASSO, ELNET and ridge regression with 21 years of historical rice data from three districts: Raipur, Surguja and Bastar. The study found that ANN performed better on the Raipur and Surguja data, while ELNET performed better on the Bastar data. Different ensemble models were also evaluated: their performance was comparable for Raipur and Surguja, while Random Forest performed better for Bastar.
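LASSO, ridge and elastic net (ELNET) differ only in the penalty added to the squared-error loss. The sketch below uses the common glmnet-style parameterization, in which a mixing weight selects between the L1 and squared L2 terms; the function names, and the assumption that the cited study used this exact form, are illustrative.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).

    alpha = 1 gives the LASSO penalty, alpha = 0 the ridge penalty,
    and values in between the elastic net.
    """
    l1 = sum(abs(c) for c in coefs)
    l2_squared = sum(c * c for c in coefs)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)


def penalized_loss(X, y, coefs, bias, lam, alpha):
    """Squared-error loss scaled by 1 / (2n), plus the elastic net penalty."""
    n = len(y)
    sse = 0.0
    for row, target in zip(X, y):
        pred = bias + sum(c * v for c, v in zip(coefs, row))
        sse += (target - pred) ** 2
    return sse / (2 * n) + elastic_net_penalty(coefs, lam, alpha)


coefs = [3.0, -4.0]  # hypothetical coefficients
lasso_term = elastic_net_penalty(coefs, lam=1.0, alpha=1.0)
ridge_term = elastic_net_penalty(coefs, lam=1.0, alpha=0.0)
elnet_term = elastic_net_penalty(coefs, lam=1.0, alpha=0.5)
print(lasso_term, ridge_term, elnet_term)
```

The L1 term drives some coefficients exactly to zero, while the squared L2 term shrinks all of them smoothly, which may partly explain why the best-performing penalty can differ between districts with different predictor structure.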