Trait Based Modelling Approach for Selection of Elite Germplasm Accessions in Soybean [Glycine max (L.) Merrill]

DOI: 10.18805/LR-4567 | Article Id: LR-4567 | Page : 822-827
Citation : Trait Based Modelling Approach for Selection of Elite Germplasm Accessions in Soybean [Glycine max (L.) Merrill]. Legume Research. 2022. (45): 822-827.
K. Shruthi, R. Siddaraju, K. Naveena, T.M. Ramanappa, C. Gireesh, K. Vishwanath, K.S. Nagaraju
Email : shruthikns3@gmail.com
Address : University of Agricultural Sciences, National Seed Project, Bengaluru-560 065, Karnataka, India.
Submitted Date : 9-12-2020
Accepted Date : 14-04-2021


Background: Identifying the factors that significantly influence the response is crucial in a trait-based breeding program for making better decisions about improving productivity. Multiple linear regression (MLR) is the benchmark method commonly used to identify such factors for crop improvement. However, it does not always work, because of the stringent assumptions underlying the MLR model (linearity and absence of multicollinearity). Here we aimed to develop an efficient model for selecting the major traits that contribute to seed yield in soybean by comparing different models.
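Multicollinearity, one of the assumptions mentioned above, is commonly screened with the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing trait j on all the other traits. The following is a minimal Python/NumPy sketch of that diagnostic, not the authors' procedure; the function name `vif` and the synthetic traits in the example are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of an n x p trait matrix X.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is the coefficient of determination
    from regressing column j on every other column plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus all remaining predictors
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        # A perfectly predictable column has infinite inflation
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out

# Hypothetical example: trait_b is almost a copy of trait_a, trait_c is unrelated
rng = np.random.default_rng(0)
trait_a = rng.normal(size=200)
trait_b = trait_a + 0.05 * rng.normal(size=200)
trait_c = rng.normal(size=200)
print(vif(np.column_stack([trait_a, trait_b, trait_c])))
```

The two nearly duplicated traits both receive very large VIFs, while the unrelated trait stays close to 1. A VIF above 10 is a widely used rule of thumb for serious collinearity; that threshold is a convention, not a value taken from this study.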
Methods: A field experiment was conducted on a core population of 98 soybean accessions using an augmented design. Eighteen morphometric traits recorded on the core population were considered as regressors. Multiple linear regression (MLR), principal component regression (PCR), regression tree and random forest models were compared for trait selection on the basis of prediction accuracy.
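Principal component regression avoids the collinearity problem by regressing yield on a few uncorrelated principal components of the standardized traits instead of on the raw traits. A minimal sketch, assuming Python with NumPy; `pcr_fit` is a hypothetical helper, and the number of retained components `k` must be chosen by the analyst (for example by cross-validation), since the abstract does not state how many were used.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal component regression on the first k components.

    Traits are standardized, projected onto their leading k principal
    components, and y is regressed on those component scores.
    Returns a function that predicts y for new trait rows.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Z = (X - mu) / sd
    # Rows of Vt are the principal component loading vectors of Z
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    W = Vt[:k].T                 # p x k loadings
    T = Z @ W                    # n x k uncorrelated component scores
    A = np.column_stack([np.ones(len(y)), T])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)

    def predict(X_new):
        Z_new = (np.asarray(X_new, dtype=float) - mu) / sd
        return coef[0] + (Z_new @ W) @ coef[1:]

    return predict
```

When `k` equals the number of traits, PCR reproduces the ordinary least-squares fit exactly; discarding the trailing, low-variance components is what stabilizes the estimates under multicollinearity, at the cost of some bias.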
Result: All the models identified the number of pods per plant (NPP) as the most influential variable for soybean yield. However, random forest had much higher prediction power (RMSE = 4.59, MAPE = 0.18) than the other models under study. The random forest results identified the number of pods per plant, the number of branches per plant and associated characters such as plant height at harvest as the traits most strongly influencing seed yield in soybean. Finally, cluster analysis was used to identify genotypes that are superior for the morphological characters most influencing seed yield.
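The two accuracy measures quoted above are usually defined as RMSE = sqrt(mean((y_i - yhat_i)^2)) and MAPE = mean(|y_i - yhat_i| / |y_i|). A minimal sketch in plain Python; the function names and toy numbers are hypothetical, MAPE is returned as a fraction (multiply by 100 for a percentage), and the abstract does not state whether the reported values were computed on training or held-out data.

```python
import math

def _check(observed, predicted):
    # Each observation must be paired with exactly one prediction
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")

def rmse(observed, predicted):
    """Root mean squared error, in the units of the response (e.g. seed yield)."""
    _check(observed, predicted)
    sq = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))

def mape(observed, predicted):
    """Mean absolute percentage error as a fraction; undefined if any observed value is 0."""
    _check(observed, predicted)
    ratios = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return sum(ratios) / len(ratios)

# Toy numbers for illustration only, not data from the study
obs = [10.0, 20.0, 40.0]
pred = [12.0, 18.0, 36.0]
print(round(rmse(obs, pred), 3), round(mape(obs, pred), 3))  # prints 2.828 0.133
```

Lower is better for both measures. RMSE penalizes large errors more heavily, while MAPE is scale-free and therefore comparable across traits or trials.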


Keywords: Multiple linear regression; Principal component analysis; Random forest; Regression tree; Seed yield


