The agrometeorological variables and spectral indices for the years 2006 to 2016 are shown in Table 4. Ground-truth wheat yield data were not available for 2016-17.
Land use and land cover classification
Fig 3 shows the classified image of Saharanpur district. Wheat covers most of the district, with some urban area in the central part and forests in the northern part.
Accuracy assessment was performed after the supervised classification, and a classified image was accepted only when its accuracy exceeded 85%. The accuracy assessment report is shown in Table 5. The assessment compares two maps: the classified map, obtained by analysing the remote sensing data, and the reference map, derived from actual ground information. The reference total is the number of samples of each true class in the reference map, and the classified total is the number assigned to each class in the classified map. On this basis, the accuracy for the wheat crop was 85.71%, with an overall average accuracy of 81.22%.
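Accuracy figures of this kind are usually derived from a confusion matrix of classified versus reference labels. The sketch below assumes rows are classes in the classified map and columns are classes in the reference map (Table 5 may use the transposed layout), and the counts are made up for illustration. Producer's accuracy divides each diagonal entry by its reference (column) total, user's accuracy by its classified (row) total; the text does not say which of these the 85.71% wheat figure is, nor whether "overall average accuracy" is the overall accuracy or the mean of per-class accuracies, so both are computed.

```python
import numpy as np

def accuracy_report(confusion, labels):
    """Accuracy statistics from a confusion matrix.

    Assumed layout: confusion[i, j] = number of samples classified as
    class i (classified map) whose true class is j (reference map).
    """
    cm = np.asarray(confusion, dtype=float)
    correct = np.diag(cm)
    classified_totals = cm.sum(axis=1)  # row totals: classified map
    reference_totals = cm.sum(axis=0)   # column totals: reference map
    producers = correct / reference_totals  # share of reference samples recovered
    users = correct / classified_totals     # share of classified samples that are right
    return {
        "overall": float(correct.sum() / cm.sum()),
        "mean_producers": float(producers.mean()),
        "producers": dict(zip(labels, producers.tolist())),
        "users": dict(zip(labels, users.tolist())),
    }

# Illustrative counts only (not the paper's data).
cm = [[40, 3, 2],   # classified as wheat
      [4, 20, 1],   # classified as urban
      [1, 2, 27]]   # classified as forest
report = accuracy_report(cm, ["wheat", "urban", "forest"])
print(f"overall accuracy: {report['overall']:.2f}")  # prints: overall accuracy: 0.87
print(f"wheat producer's accuracy: {report['producers']['wheat']:.4f}")
```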
Acreage estimation and relative deviation
For each vegetation index (VI), the respective range for the wheat crop was used to prepare a two-layer thematic map, as shown in Fig 4. Wheat pixels were counted using the attribute table, and the wheat acreage was calculated by multiplying the number of wheat pixels by the pixel area (30 m × 30 m) of the input image. The acreage estimated using the various VIs is shown in Table 6. NDVI shows the smallest relative deviation (5.97), whereas TNDVI shows the largest (9.79).
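The acreage calculation reduces to a pixel count times the pixel area. A minimal sketch, assuming 30 m square pixels and the common signed definition of relative deviation, 100 × (estimated - reference) / reference; the text does not state the reference acreage or sign convention used, and the pixel count below is hypothetical.

```python
PIXEL_SIZE_M = 30.0         # assumed 30 m x 30 m pixels
SQ_M_PER_HECTARE = 10_000.0

def wheat_acreage_ha(n_wheat_pixels, pixel_size_m=PIXEL_SIZE_M):
    """Area of all pixels labelled wheat, in hectares."""
    return n_wheat_pixels * pixel_size_m ** 2 / SQ_M_PER_HECTARE

def relative_deviation_pct(estimated, reference):
    """Signed deviation of an estimate from a reference value, in percent."""
    return 100.0 * (estimated - reference) / reference

# Hypothetical numbers for illustration only.
area = wheat_acreage_ha(1_000_000)            # 1e6 pixels x 900 m^2 = 90,000 ha
print(area)                                   # 90000.0
print(relative_deviation_pct(area, 100_000))  # -10.0
```

Taking the absolute value of the result gives the unsigned form, if that is what the reported deviations use.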
Correlation analysis
Initially, correlation coefficients between wheat yield and the spectral indices and agrometeorological variables were computed, as shown in Fig 5. Crop yield is strongly correlated with NDVI and minimum temperature, with correlation coefficients of 0.89 and -0.71, respectively. Yield depends only weakly on sunshine hours and temperature difference. Although rainfall plays an important role in crop production, its influence is limited in Saharanpur district: yield shows only a moderate positive correlation of 0.54 with rainfall, probably because other water sources are available in the district.
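The coefficients in Fig 5 are presumably Pearson correlation coefficients (the text does not name the measure). A minimal NumPy sketch of computing one coefficient per variable against yield; the series below are synthetic, not the district data.

```python
import numpy as np

def correlations_with_yield(yield_values, predictors):
    """Pearson correlation of each predictor series with yield."""
    y = np.asarray(yield_values, dtype=float)
    result = {}
    for name, values in predictors.items():
        x = np.asarray(values, dtype=float)
        # np.corrcoef returns a 2x2 matrix; the off-diagonal entry is r(x, y)
        result[name] = float(np.corrcoef(x, y)[0, 1])
    return result

# Synthetic series for illustration only.
yields = [3100, 3350, 3200, 3500, 3650]
predictors = {
    "NDVI": [0.52, 0.58, 0.55, 0.62, 0.66],
    "Tmin": [9.8, 9.1, 9.5, 8.7, 8.2],
}
for name, r in correlations_with_yield(yields, predictors).items():
    print(f"{name}: r = {r:+.2f}")
```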
Stepwise linear regression
Step-1 of regression
In the first step, all eight variables were included and a multiple linear regression was performed. The general form of the regression equation is:
Yield = b1 + b2 × Tmin + b3 × Tmax + b4 × Rain + b5 × SH + b6 × GDD + b7 × TD + b8 × HTU + b9 × NDVI
The best-fitted regression equation is:
Yield = 401.4 - 50.32 × Tmin + 12.99 × Tmax + 7.69 × Rain + 10.45 × SH + 23.96 × GDD + 21.93 × TD + 1.09 × HTU + 2348.7 × NDVI
The observed and fitted wheat yields are plotted in Fig 6, and the coefficients with their uncertainty statistics are given in Table 7. The t-stat is a coefficient divided by its standard error, and the p-value indicates the significance of the coefficient. A t-stat greater than 1 in magnitude and a p-value below 0.05 are taken to indicate that the coefficient, and hence the corresponding variable, is significant.
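The t-stat and p-value described above come from an ordinary least squares fit. The sketch below shows one way to compute them with NumPy and SciPy (standard errors from the residual variance, two-sided p-values from Student's t distribution with n - k degrees of freedom); it is not necessarily the software the authors used. The guard on n - k matters for this data set: the full Step-1 model estimates nine coefficients, so t-stats exist only if there are more than nine seasons of yield data.

```python
import numpy as np
from scipy import stats

def ols_summary(X, y, add_intercept=True):
    """OLS coefficients with standard errors, t-statistics and p-values.

    X: (n_samples, n_predictors) predictor matrix; y: (n_samples,) yields.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if add_intercept:
        X = np.column_stack([np.ones(len(y)), X])  # column of ones for b1
    n, k = X.shape
    dof = n - k  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need more observations than coefficients")
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / dof                     # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # coefficient covariance
    se = np.sqrt(np.diag(cov))                       # standard errors
    t_stat = coef / se
    p_value = 2.0 * stats.t.sf(np.abs(t_stat), dof)  # two-sided test
    return coef, se, t_stat, p_value

# Synthetic check: y = 2 + 3x plus a small alternating disturbance.
x = np.arange(10.0)
y = 2.0 + 3.0 * x + np.tile([0.1, -0.1], 5)
coef, se, t_stat, p_value = ols_summary(x.reshape(-1, 1), y)
print(coef, t_stat, p_value)
```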
With all the variables included in the regression, the model gives an RMSE of about 28.6 kg/ha and an R² of 0.993. However, Table 7 shows that the coefficients b5 and b6 (for SH and GDD) have very small t-stats (less than 0.1) and stand out from the other variables; therefore, SH and GDD were removed from the next step of the regression.
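RMSE and R² can be computed from the observed and fitted yields as below. This is a sketch: the text does not say whether its RMSE divides by n or by the residual degrees of freedom, so the plain n form is assumed. Adjusted R² is not reported in the paper; it is included as an extra check because, with eight predictors and only about a decade of annual yields, the unadjusted R² tends to be inflated.

```python
import numpy as np

def rmse(observed, fitted):
    """Root-mean-square error, in the units of yield (e.g. kg/ha)."""
    o = np.asarray(observed, dtype=float)
    f = np.asarray(fitted, dtype=float)
    return float(np.sqrt(np.mean((o - f) ** 2)))

def r_squared(observed, fitted):
    """Coefficient of determination relative to the mean (centered R^2)."""
    o = np.asarray(observed, dtype=float)
    f = np.asarray(fitted, dtype=float)
    ss_res = np.sum((o - f) ** 2)
    ss_tot = np.sum((o - o.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def adjusted_r_squared(r2, n_obs, n_predictors):
    """R^2 penalised for the number of predictors (intercept model assumed)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.0, 2.0, 3.0, 5.0]
print(rmse(observed, fitted))  # 0.5
print(r_squared(observed, fitted))
print(adjusted_r_squared(r_squared(observed, fitted), 4, 1))
```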
Step-2 of regression
In the second step, SH and GDD were discarded and the regression was performed with the remaining six variables. The general form of the regression equation is:
Yield = b1 + b2 × Tmin + b3 × Tmax + b4 × Rain + b5 × TD + b6 × HTU + b7 × NDVI
The best-fitted equation is:
Yield = 379.13 - 35.90 × Tmin + 18.21 × Tmax + 7.69 × Rain + 26.49 × TD + 1.71 × HTU + 2348.7 × NDVI
The plot of observed and fitted crop yield is shown in Fig 7 and the coefficients with their uncertainty statistics are shown in Table 8.
With SH and GDD excluded from the regression, the model shows an improved RMSE of about 24.85 kg/ha and an R² of 0.993. However, Table 8 shows that the coefficients b1, b4 and b6 (the constant, Rain and HTU) have small t-stats and p-values greater than 0.05. They are therefore not significant at the 95% confidence level, and these terms were removed from the next step of the regression.
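The stepwise procedure described in these steps is a form of backward elimination. A generic sketch is below, assuming one term is dropped at a time (always the one with the largest p-value) until every remaining p-value is at most 0.05. This differs from the paper, which removed several terms per step and used a t-stat threshold in Step 1, so it may not reproduce the same final model. The constant is treated as an ordinary column so that it can be eliminated, as in Step 3. The data are synthetic.

```python
import numpy as np
from scipy import stats

def ols_p_values(X, y):
    """Least-squares coefficients and two-sided p-values (no intercept added)."""
    n, k = X.shape
    dof = n - k
    if dof <= 0:
        raise ValueError("need more observations than coefficients")
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    se = np.sqrt(np.diag(resid @ resid / dof * np.linalg.inv(X.T @ X)))
    return coef, 2.0 * stats.t.sf(np.abs(coef / se), dof)

def backward_eliminate(X, y, names, alpha=0.05):
    """Drop the least significant column until all p-values <= alpha."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    names = list(names)
    coef, p = ols_p_values(X, y)
    while len(names) > 1 and p.max() > alpha:
        worst = int(np.argmax(p))
        X = np.delete(X, worst, axis=1)  # remove the weakest term
        del names[worst]
        coef, p = ols_p_values(X, y)
    return names, coef, p

# Synthetic example: the response depends on x1 only; x2 is pure noise.
rng = np.random.default_rng(0)
x1 = np.arange(20.0)
x2 = rng.normal(size=20)
y = 5.0 * x1 + rng.normal(scale=0.5, size=20)
X = np.column_stack([np.ones(20), x1, x2])  # first column is the constant term
kept, coef, p = backward_eliminate(X, y, ["const", "x1", "x2"])
print(kept, coef.round(2))
```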
Step-3 of regression
In the third step, the constant term, Rain and HTU were discarded, and the regression was performed with only four variables and no intercept. The general form of the regression equation is:
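Dropping the constant term means fitting the model through the origin, which in matrix form is least squares without a column of ones. One point worth checking when comparing steps: for no-intercept models, R² is sometimes reported relative to zero (uncentered) instead of relative to the mean; statsmodels, for example, does this when no constant is included. The uncentered value is never smaller than the centered one, so the two are not directly comparable. The text does not say which convention was used in Step 3; the sketch computes both on made-up data.

```python
import numpy as np

def fit_through_origin(X, y):
    """Least-squares coefficients for y ~ X @ b with no intercept column."""
    b, *_ = np.linalg.lstsq(np.asarray(X, float), np.asarray(y, float), rcond=None)
    return b

def r2_centered(y, fitted):
    """1 - SSR / sum of squares about the mean."""
    resid = y - fitted
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

def r2_uncentered(y, fitted):
    """1 - SSR / sum of squares about zero (common for no-intercept fits)."""
    resid = y - fitted
    return float(1.0 - resid @ resid / (y @ y))

# Made-up data with one predictor, for illustration only.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])
b = fit_through_origin(X, y)
fitted = X @ b
print(b, r2_centered(y, fitted), r2_uncentered(y, fitted))
```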
Yield = b1 × Tmin + b2 × Tmax + b3 × TD + b4 × NDVI
The best-fitted equation is given below.
Yield = 37.66 × Tmin + 25.51 × Tmax + 34.21 × TD + 2728.5 × NDVI
The plot of observed and fitted crop yield is shown in Fig 8 and the coefficients with their uncertainty statistics are shown in Table 9.
Even with only four variables, the model gives an RMSE of about 49 kg/ha and an R² of 0.95, which underlines the importance of NDVI and temperature for estimating wheat yield in Saharanpur district. NDVI has the smallest p-value and hence the most significant coefficient in the regression. To check how well NDVI alone explains wheat yield, a spectral yield model using only the NDVI index was also evaluated.
Spectral yield model
In the spectral yield model, a linear relationship between crop yield and NDVI was assumed.
Yield = b1 + b2 × NDVI
The best-fitted regression equation is given below:
Yield = 2009 + 2399.56 × NDVI
The plot of observed and fitted crop yield is shown in Fig 9 and the coefficients with their uncertainty statistics are shown in Table 10.
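With a single predictor, the least-squares coefficients have a closed form: b2 is the covariance of NDVI and yield divided by the variance of NDVI, and b1 = mean(Yield) - b2 × mean(NDVI). For such a model R² also equals the square of the Pearson correlation, so, if the same seasons were used, R² = 0.808 corresponds to r ≈ 0.899, close to the NDVI correlation of 0.89 reported in the correlation analysis. The sketch below uses synthetic data built from the fitted equation plus made-up offsets.

```python
import numpy as np

def simple_linear_fit(x, y):
    """Closed-form least squares for y = b1 + b2 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    b2 = np.sum(dx * dy) / np.sum(dx ** 2)  # slope = cov(x, y) / var(x)
    b1 = y.mean() - b2 * x.mean()            # line passes through the means
    return float(b1), float(b2)

# Synthetic NDVI/yield pairs: the fitted equation plus small made-up offsets.
ndvi = np.array([0.50, 0.55, 0.60, 0.65, 0.70])
yields = 2009 + 2399.56 * ndvi + np.array([30.0, -20.0, 10.0, -40.0, 20.0])
b1, b2 = simple_linear_fit(ndvi, yields)
r = np.corrcoef(ndvi, yields)[0, 1]
resid = yields - (b1 + b2 * ndvi)
r2 = 1 - np.sum(resid ** 2) / np.sum((yields - yields.mean()) ** 2)
print(b1, b2, r2, r ** 2)  # the last two values agree
```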
Using NDVI alone, the model gives an RMSE of about 88 kg/ha and an R² of 0.808. Hence, although NDVI is the most important index for crop yield estimation, including temperature and its derived indices improves the model's performance.