Primary statistical analysis
The weekly series of jute prices for the Cooch Behar market of West Bengal is shown in Fig 1. The Cooch Behar market series depicts an up-and-down pattern, with two sharp rises between 2013 and 2016 and another between 2020 and 2022. The descriptive statistics summarizing the weekly jute price data are listed in Table 1. As Table 1 shows, the series of jute prices in the Cooch Behar market has a mean of 4188, a standard deviation of 1518.44 and a coefficient of variation of 36.26%, suggesting that prices have been volatile. In addition, the skewness and kurtosis statistics show that the price series is not normally distributed.
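The coefficient of variation quoted above is the standard deviation expressed as a percentage of the mean. The following is a minimal sketch of how such summary statistics could be computed. The function name `describe` and the choice of moment-based skewness and excess kurtosis are assumptions made for illustration, not the authors' procedure.

```python
import statistics


def describe(prices):
    """Return mean, sample SD, CV (%), skewness and excess kurtosis."""
    n = len(prices)
    mean = statistics.fmean(prices)
    sd = statistics.stdev(prices)  # sample standard deviation (n - 1 divisor)
    # Central moments with an n divisor, used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in prices) / n
    m3 = sum((x - mean) ** 3 for x in prices) / n
    m4 = sum((x - mean) ** 4 for x in prices) / n
    return {
        "mean": mean,
        "sd": sd,
        "cv_percent": 100.0 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Consistency check on the figures reported in Table 1.
reported_cv = 100.0 * 1518.44 / 4188.0
print(round(reported_cv, 2))  # 36.26
```

For a normal distribution both the skewness and the excess kurtosis are zero, so values far from zero are informal evidence against normality.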
To implement the ARIMA and ARFIMA models, the data series is divided into two sets: the training set and the testing set. First, the models are fitted using the training data with 538 observations (i.e., from the 1st week of 2009 to the 10th week of 2020), and then forecasts are produced over the validation period using a testing set with the last 134 observations (i.e., from the 11th week of 2020 to the 48th week of 2022).
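A time series must be split chronologically rather than at random, so that the model is never fitted on observations that come after the ones it is asked to forecast. A minimal sketch of such a split follows. The function name `chronological_split` is hypothetical, and the `prices` list is a placeholder of the right length (538 + 134 = 672), not the actual jute prices.

```python
def chronological_split(series, n_test):
    """Split a series into (train, test), keeping the last n_test points for testing."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


# Placeholder values standing in for the 672 weekly prices (assumption).
prices = list(range(672))
train, test = chronological_split(prices, n_test=134)
print(len(train), len(test))  # 538 134
```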
Test for stationarity
The first step in applying ARIMA and ARFIMA models is to check whether the time series is stationary. To test for stationarity, we first conducted Augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) unit root tests on the training dataset. According to the results, the p-values (test statistics) of the ADF and PP tests are 0.494 (-2.20) and 0.251 (-15.39), respectively. Since both p-values exceed 0.05, the null hypothesis of a unit root cannot be rejected, indicating that the time series under consideration is non-stationary. The study therefore proceeded to transform the series into a stationary one.
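The logic of a unit root test can be illustrated with the basic Dickey-Fuller regression, in which the change in the series is regressed on a constant and the lagged level, and the t-statistic on the lagged level is compared with a non-standard critical value. The sketch below is deliberately simplified: it omits the augmentation lags of the ADF test and the variance correction of the PP test, so it is not the procedure used in this study. The function name, the simulated data and the use of the asymptotic 5% critical value of -2.86 (constant, no trend) are assumptions made for illustration.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic on rho in: diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(dy)
    mean_x = sum(y_lag) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in y_lag)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(y_lag, dy))
    rho = sxy / sxx
    alpha = mean_y - rho * mean_x
    ssr = sum((y - alpha - rho * x) ** 2 for x, y in zip(y_lag, dy))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)
    return rho / se_rho


# Asymptotic 5% critical value for the regression with a constant and no trend.
CRITICAL_5PCT = -2.86

# Synthetic data (assumption): white noise and the random walk built from it.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(538)]
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

t_noise = dickey_fuller_t(noise)  # stationary series: strongly negative
t_walk = dickey_fuller_t(walk)    # unit root series: much closer to zero
print(t_noise < CRITICAL_5PCT, t_walk)
```

A statistic below the critical value rejects the unit root null. A statistic above it, like the ADF value of -2.20 reported above, does not.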
Test for long memory and estimation
The presence of long memory in the time series (training set) was first examined through the autocorrelation function (ACF) plot, which shows that the correlations decay very slowly towards zero up to 250 lags (Fig 2), indicating a long memory process. Accordingly, the presence of long memory was tested as discussed in the methodology, and the R/S Hurst value (H = 0.873) was found to be higher than 0.5, which supports the existence of long memory in jute prices. Models that account for the long memory property are very sensitive to the estimate of the long-memory parameter (i.e., the fractional differencing parameter d). For this reason, in this study it has been estimated using the wavelet-based ordinary least squares estimator (d_wavelet) and is found to be 0.327.
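The rescaled range (R/S) idea behind the Hurst exponent can be sketched as follows: split the series into windows of length n, compute the range of the cumulative mean-adjusted sums divided by the standard deviation in each window, average across windows, and take the slope of log(R/S) against log(n) as the estimate of H. The code below is an illustrative sketch under these assumptions. The window sizes, function names and synthetic data are not taken from the study, and the wavelet-based estimator of d is not reproduced here.

```python
import math
import random


def rescaled_range(chunk):
    """R/S of one window: range of cumulative deviations over the standard deviation."""
    mean = sum(chunk) / len(chunk)
    cum = lo = hi = 0.0
    for x in chunk:
        cum += x - mean
        lo = min(lo, cum)
        hi = max(hi, cum)
    sd = math.sqrt(sum((x - mean) ** 2 for x in chunk) / len(chunk))
    return (hi - lo) / sd if sd > 0 else None


def hurst_rs(series, window_sizes):
    """Estimate H as the OLS slope of log(mean R/S) on log(window size)."""
    log_n, log_rs = [], []
    for n in window_sizes:
        values = []
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs is not None and rs > 0:
                values.append(rs)
        if values:
            log_n.append(math.log(n))
            log_rs.append(math.log(sum(values) / len(values)))
    mx = sum(log_n) / len(log_n)
    my = sum(log_rs) / len(log_rs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_n, log_rs))
    sxx = sum((x - mx) ** 2 for x in log_n)
    return sxy / sxx


# Synthetic data (assumption): white noise versus a highly persistent random walk.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)

sizes = [16, 32, 64, 128, 256]
h_noise = hurst_rs(noise, sizes)
h_walk = hurst_rs(walk, sizes)
print(h_noise, h_walk)
```

Uncorrelated noise should give an estimate in the neighbourhood of 0.5, while a persistent series gives a clearly larger value, which is the sense in which H = 0.873 points to long memory.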
After determining the fractional differencing parameter, we obtained the fractionally differenced and first-order differenced time series shown in Fig 3. The stationarity test results for these series are shown in Table 2. The p-values of the ADF and PP tests are less than 5%, which reveals that the series have become stationary; this is also confirmed by Fig 3.
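Fractional differencing applies the binomial expansion of (1 - B)^d, where B is the backshift operator. The weights follow the recursion w_0 = 1 and w_k = w_{k-1} (k - 1 - d) / k, and setting d = 1 recovers ordinary first differencing. The sketch below is a minimal illustration with the expansion truncated at the start of the sample. The function names are hypothetical, and this is not necessarily the implementation used in the study.

```python
def frac_diff_weights(d, n_weights):
    """First n_weights coefficients of the expansion of (1 - B)^d."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Fractionally difference a series, truncating the filter at the sample start."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


# Weights for the estimated parameter d = 0.327.
weights = frac_diff_weights(0.327, 4)
print(weights)

# With d = 1 the filter reduces to first differences (after the first point).
print(frac_diff([1.0, 3.0, 6.0, 10.0], 1.0))  # [1.0, 2.0, 3.0, 4.0]
```

For 0 < d < 1 the weights decay slowly rather than cutting off after one lag, which is how the filter retains part of the long-range dependence that first differencing removes.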
Model identification
To establish ARIMA and ARFIMA models, the values of p, d and q must be determined. In the above section, we identified the value of d. In this section, we find the optimal values of p and q, which are the orders of the autoregressive and moving average terms. We used the training set as in-sample data to determine the parameters p and q of the ARIMA and ARFIMA models. First, we computed the autocorrelation and partial autocorrelation values for the fractionally differenced series and the first-order differenced series, as illustrated in Figs 4 and 5. For each differencing parameter, the ACF decays faster than the ACF of the original training set (Fig 4). The orders of the non-seasonal parameters p and q are obtained by looking for significant spikes in the autocorrelation and partial autocorrelation functions.
In the identification stage, we estimated different ARIMA and ARFIMA specifications with different combinations of p (AR terms) and q (MA terms), which are listed in Table 3, and selected the appropriate models from each method as those having the minimum AIC and BIC values. Thus, the models selected for the training period are ARFIMA(1,0.327,1), ARFIMA(3,0.327,1) and ARIMA(3,1,0).
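Selection by information criteria can be sketched with the standard definitions AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of estimated parameters, n the number of observations and ln L the maximized log-likelihood. In the sketch below, the log-likelihood values are made-up placeholders chosen only to show the mechanics. They are not values from Table 3, and counting k = p + q + 1 (one extra for the innovation variance) is an assumption.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian (Schwarz) information criterion."""
    return k * math.log(n) - 2 * log_lik


n_obs = 538  # size of the training set

# Hypothetical log-likelihoods keyed by (p, q); NOT results from the paper.
toy_log_liks = {(1, 0): -100.0, (1, 1): -99.5, (2, 1): -95.0}

scores = {}
for (p, q), log_lik in toy_log_liks.items():
    k = p + q + 1  # AR terms + MA terms + innovation variance (assumption)
    scores[(p, q)] = (aic(log_lik, k), bic(log_lik, k, n_obs))

best_by_aic = min(scores, key=lambda order: scores[order][0])
best_by_bic = min(scores, key=lambda order: scores[order][1])
print(best_by_aic, best_by_bic)  # (2, 1) (1, 0)
```

With these toy inputs the two criteria choose different orders, because BIC penalizes each additional parameter by ln(n) rather than by 2.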
Validation and diagnostic checking
After appropriate ARFIMA and ARIMA models have been obtained, the next step is to assess their ability to forecast the data. The model verification process is concerned with examining the residuals obtained from the fitted models to see whether they contain any systematic pattern that could still be removed to improve the chosen models. This has been done through the Ljung-Box diagnostic test, and the p-value of the Ljung-Box test is found to be more than 5% (Table 4), which means that the model residuals meet the white noise assumption. Forecasting performance has been evaluated on the test set as an out-of-sample period of 134 observations. Table 4 presents the results of the models based on three accuracy measures: RMSE, MAE and MAPE.
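The two checks described above can be sketched in a few lines. The Ljung-Box statistic is Q = n(n + 2) times the sum over lags k = 1..h of r_k^2 / (n - k), where r_k is the lag-k residual autocorrelation, and a large Q indicates leftover autocorrelation. RMSE, MAE and MAPE are the usual error summaries. The function names, the choice of h = 10 lags and the toy numbers are assumptions made for illustration. The value 18.307 is the 5% chi-square critical value for 10 degrees of freedom, which in practice would be reduced by the number of fitted ARMA parameters.

```python
import math


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


def forecast_accuracy(actual, predicted):
    """Return (RMSE, MAE, MAPE in percent) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return rmse, mae, mape


CHI2_5PCT_DF10 = 18.307  # 5% critical value, chi-square with 10 degrees of freedom

# A trending "residual" series is far from white noise, so Q is very large.
q_trend = ljung_box_q([float(i) for i in range(100)], max_lag=10)
print(q_trend > CHI2_5PCT_DF10)  # True

# Toy forecasts (assumption), not values from Table 4.
rmse, mae, mape = forecast_accuracy([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(round(rmse, 3), round(mae, 3), round(mape, 3))  # 8.165 6.667 5.0
```

Note that MAPE divides by the actual values, so it is only defined when none of them is zero, which holds for a price series.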
As shown in Table 4, the validation results indicate that all three models are likely to perform well in the forecasting phase, and that the ARFIMA(3,0.327,1) model produces the lowest RMSE, MAE and MAPE, which are 164.42, 105.66 and 1.70, respectively. It can be concluded that the wavelet-based ARFIMA(3,0.327,1) model is the most accurate of the models compared, with only narrow differences between the actual and predicted values of jute prices (Fig 6). The ARIMA model is a capable forecaster of jute prices in the Cooch Behar market, but Table 4 shows that it does not perform as well as the selected ARFIMA model. Therefore, the model concluded to be the most accurate for forecasting the weekly jute prices in the Cooch Behar market of West Bengal is the ARFIMA(3,0.327,1) model, which is given as: