Selection of the prediction variables and model fitting
The results of the VIF test showed no multicollinearity amongst the predictive variables. Therefore, we considered season, SBS, SBT, WD, chlorophyll concentration and pH as candidate predictive variables for modelling.
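As an illustration of the screening step above, the variance inflation factor of predictor j can be computed as 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all remaining predictors. The sketch below is a minimal NumPy version, not the procedure used in this study: the function name `vif` is hypothetical, the matrix is assumed to hold numerically coded predictors (a categorical variable such as season would need to be encoded first), and no cut-off value is implied because none is stated in this section.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (rows = stations).

    Assumes every column is numeric and non-constant.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all other columns.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic example (hypothetical data, not the survey data):
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))   # three unrelated predictors
print(vif(X_demo))                   # values close to 1 suggest little collinearity
```

Values near 1 indicate that a predictor is nearly uncorrelated with the others, whereas large values flag predictors that are close to a linear combination of the rest.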
The predictive variables retained and their relative importance differed amongst the three models (Table 1). ANNs retained more predictive variables than the other two models and had the highest degree of fit. The fitting results of all three models showed that season and SBT were the two most important factors for the relative abundance of P. trituberculatus (Table 1). GAMs showed relatively simple relationships between P. trituberculatus and environmental factors, whereas RFs and ANNs showed more complex relationships (Fig 2). In cross-validation, RFs had the largest median R² and the smallest median RRE, which indicated the best predictive performance amongst the three models (Fig 3a). By contrast, ANNs had the smallest median R² and the largest RRE, which indicated poor predictive performance (Fig 3b). The ΔR² values of GAMs, RFs and ANNs were 0.21, 0.15 and 0.46, respectively; RFs therefore had the lowest degree of overfitting and ANNs the highest. Considering both predictive performance and degree of overfitting, RFs were the best of the three models.
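The metrics compared above can be sketched in a few lines. This section does not restate the formulas, so the definitions below are assumptions made only for illustration: R² is taken as the coefficient of determination between observed and predicted values, RRE as the root mean squared error divided by the mean observed value, and ΔR² as the R² of the fitted model minus the median cross-validation R². The function names are hypothetical and the numbers in the usage lines are invented, not results from this study.

```python
import numpy as np

def r_squared(obs, pred):
    """Coefficient of determination (assumed definition of R²)."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rre(obs, pred):
    """Assumed RRE: RMSE scaled by the mean observed value."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    return rmse / obs.mean()

def delta_r2(r2_fit, r2_cv):
    """Assumed overfitting degree: fitted R² minus median cross-validation R²."""
    return float(r2_fit - np.median(np.asarray(r2_cv, dtype=float)))

# Invented numbers for illustration only:
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 1.5, 3.5, 3.5]
print(r_squared(obs, pred), rre(obs, pred))
print(delta_r2(0.80, [0.55, 0.60, 0.65]))
```

Under these assumed definitions, a model that predicts well has a high R² and a low RRE on the held-out data, and a small ΔR² indicates that performance on the training data carries over to new data.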
Mapping of P. trituberculatus distributions
The distribution of P. trituberculatus differed amongst seasons: the relative abundance in summer was significantly higher than that in the other seasons (Fig 4). Spatially, the relative abundance in the northern sea area was higher than that in the southern sea area, a difference that was most apparent in spring and winter (Fig 4).
Comparison between RFs and ANNs in different seasons and number of stations
Based on the variation in RRE and R², the predictive performance of RFs and ANNs was significantly affected by the number of stations and differed amongst seasons (Fig 5 and Fig 6). As the number of stations increased, the R² of RFs and ANNs gradually increased (Fig 5) and the corresponding RRE gradually decreased (Fig 6). Thus, the predictive performance of both models improved as stations were added, and both models performed better in spring and winter than in summer and autumn (Fig 5 and Fig 6). The predictive performance of RFs was better than that of ANNs.
Comparison of models
Amongst the three models used in this study, ANNs had the best fitting performance and RFs had the best predictive performance. The results show that a good fit to the training data set does not guarantee equally good predictions for the test data set. All three models performed better on the training data set than on the test data set, which indicates overfitting: the models fitted the sample noise rather than the underlying signal, so the fit to the training data was good but predictions outside the training data set were poorer (Luan et al., 2018). Based on the value of ΔR², RFs showed only slight overfitting, which can be attributed to ensemble learning: aggregating the results of multiple regression trees improves predictive accuracy (Cai 2012).
Distribution of P. trituberculatus and influencing factors
The distribution of P. trituberculatus in the northern East China Sea shows clear seasonal variation because of the combined effects of environmental physicochemical factors, ocean currents and water masses in different seasons, together with seasonal changes in temperature (Yuan et al., 2016).
Spring is the peak spawning season of P. trituberculatus. In spring, P. trituberculatus are sparsely distributed along the Yangtze River Estuary, possibly because runoff from the Yangtze River leaves no suitable spawning environment there. Summer is the foraging season, and the young P. trituberculatus hatched in that year fatten in the coastal shallow waters (Yuan et al., 2016). The relative abundance of P. trituberculatus is significantly higher in summer than in other seasons, which is closely related to the high SBT in summer. In autumn, the juveniles gradually grow and move to deep water. As cold air moves south, the coastal water temperature gradually drops, and P. trituberculatus migrate from north to south and from shallow water to deep water. In winter, the density of P. trituberculatus is evidently higher in the north than in the south and a banded area with fewer P. trituberculatus is observed in the south. This observation is consistent with the location of the Taiwan warm current entering the northern East China Sea from south to north in winter, which indicates that the Taiwan warm current may affect the distribution of P. trituberculatus.
In this study, SBT is identified as an important environmental factor affecting the distribution of P. trituberculatus. Most P. trituberculatus inhabit the sea floor and are therefore strongly affected by bottom environmental factors, which explains why SBT affects their distribution.
In all four seasons, the predictive performance of RFs and ANNs gradually improved as the number of stations increased. The predictive performance of both models was significantly higher in spring and winter than in summer and autumn, which may be related to the distribution of P. trituberculatus. In summer and autumn, the resource is evenly distributed, whereas in spring and winter, the resource density is high in the north and low in the south. In general, fishery data with high contrast allow stock assessment models to obtain accurate results. The spatial contrast in the distribution of P. trituberculatus is greater in spring and winter than in summer and autumn. Therefore, RF and ANN models are more reliable in spring and winter and may need relatively few survey stations in those seasons. This result also indicates that the number of survey stations should be set according to the resource distribution of P. trituberculatus to save costs.
The three models established in this study do not directly represent ecological processes, and their interpretation depends on the existing understanding of the life history characteristics of P. trituberculatus. The environmental requirements of P. trituberculatus also differ amongst growth stages. Therefore, in future research, we will combine the life history of P. trituberculatus with environmental factors to explore their combined effects on P. trituberculatus at different growth stages.