Agricultural Science Digest

  • Chief Editor: Arvind Kumar

  • Print ISSN 0253-150X

  • Online ISSN 0976-0547

  • NAAS Rating 5.52

  • SJR 0.156

Frequency: Bi-monthly (February, April, June, August, October and December)
Indexing Services: BIOSIS Preview, Biological Abstracts, Elsevier (Scopus and Embase), AGRICOLA, Google Scholar, CrossRef, CAB Abstracting Journals, Chemical Abstracts, Indian Science Abstracts, EBSCO Indexing Services, Index Copernicus
Agricultural Science Digest, Volume 44, Issue 3 (June 2024): 445-452

Assessing Genetic Relationships, Trait Associations and Diversity Patterns in Sorghum Germplasm Through Correlation, Cluster and Principal Component Analysis

Neladri Sekhar Sarkar1,*, T. Kalaimagal1, D. Kavithamani1, R. Chandirakala1, S. Manonmani1, M. Raveendran2, A. Senthil3
1Centre for Plant Breeding and Genetics, Tamil Nadu Agricultural University, Coimbatore-641 003, Tamil Nadu, India.
2Directorate of Research, Tamil Nadu Agricultural University, Coimbatore-641 003, Tamil Nadu, India.
3Department of Crop Physiology, Tamil Nadu Agricultural University, Coimbatore-641 003, Tamil Nadu, India.
Cite article: Sarkar Sekhar Neladri, Kalaimagal T., Kavithamani D., Chandirakala R., Manonmani S., Raveendran M., Senthil A. (2024). Assessing Genetic Relationships, Trait Associations and Diversity Patterns in Sorghum Germplasm Through Correlation, Cluster and Principal Component Analysis. Agricultural Science Digest. 44(3): 445-452. doi: 10.18805/ag.D-5896.

Background: The loss of biodiversity has a significant impact on the fundamental services that ecosystems provide to humanity, including plant development and genetic improvement. Germplasm is the foundational material for identifying genetic variation. In this context, this study examined the diversity of sorghum germplasm.

Methods: Eighty-six sorghum germplasm accessions were evaluated together with three checks, each replicated three times, during the Rabi season of 2021 in an augmented block design I (ABD I) at the Department of Millets, Tamil Nadu Agricultural University, Coimbatore.

Result: Pearson correlation analysis revealed significant relationships among the traits. Notably, plant height, number of leaves, panicle length, panicle width, panicle weight, hundred seed weight and dry fodder yield were positively and significantly correlated with grain yield per plant. Cluster analysis grouped the 86 accessions into four distinct clusters, with clusters 1 and 4 being the most divergent. In the principal component analysis, PC1 accounted for the largest share of the variability, and the PCA identified genotypes with greater variation and better performance. These findings suggest that the identified sorghum genotypes could serve as valuable genetic resources for enhancing sorghum productivity in dry and semi-arid regions, particularly in the face of unpredictable climate change.

Sorghum bicolor, commonly referred to as sorghum, is cultivated for its grain. This versatile crop is used as human food, as animal feed and for ethanol production, and it is also known as great millet, broomcorn, guinea corn, durra and jowar. The semi-arid regions of Africa and Asia rely heavily on sorghum, a staple food crop that sustains millions of people. As the fifth most important cereal crop globally, it is cultivated in approximately 100 countries, which underscores its importance for communities throughout the world (Hariprasanna and Patil, 2015). One reason for its importance is its suitability for low-input cultivation and its adaptability to diverse environmental conditions. Being a diploid (2n=20), short-day C4 plant, sorghum exhibits a high photosynthetic rate and robust resistance to abiotic stresses (Nagy et al., 1995). It serves as a source of fiber, food, fodder and fuel: the grain is a staple dietary component and is also used for animal and poultry feed, while rural communities rely on the plant as fodder for livestock. Its wide adaptability and high dry fodder yield make sorghum a promising resource for supplementing fodder availability. Cultivated sorghum varieties range in height from 0.5 m to 4.6 m. They have been classified into five races, namely Caudatum, Durra, Kafir, Bicolor and Guinea, primarily on the basis of glume, grain and panicle morphology (Harlan and de Wet, 1972). India and Africa together account for approximately 70% of global sorghum cultivation. Notable sorghum-producing countries include the USA, Mexico, Sudan, Ethiopia and Nigeria (Kumar et al., 2011).
 
Germplasm is the base material for identifying genetic variability. It is defined as the gene pool of a species, including landraces, improved cultivars, advanced breeding lines and weedy relatives (Upadhyaya et al., 2010). Sorghum has been cultivated in India for many generations, and as a result India is home to a diverse array of sorghum varieties. This diversity gives plant breeders a valuable resource for developing new and improved sorghum varieties for use all over the world. India is considered a secondary centre of origin of sorghum because of this vast diversity (Vavilov, 1951), which also makes it a valuable source of resistance genes and other desired traits. This diversity could be used to develop varieties with high yield, good grain quality, disease resistance and adaptation to environmental stresses. Variability studies are therefore crucial for understanding the phenotypic and genotypic diversity in a population of sorghum accessions. Correlation, PCA biplot and cluster analysis are among the most efficient multivariate analyses for assessing trait interactions and genotype performance, and they are widely used to analyse the relationships between traits in many crop plants. Association studies, in turn, make it possible to identify relationships between traits and genetic markers in a population.
The experimental material consisted of 86 sorghum germplasm accessions (Table 1). The accessions were evaluated along with three checks, each replicated three times, during Rabi-Summer 2021-2022 in an augmented block design I (ABD I) at the Department of Millets, Tamil Nadu Agricultural University, Coimbatore. Each line was sown in two rows of 4 m length at a spacing of 45 cm × 15 cm. All observations were recorded following the descriptor guidelines. Quantitative traits were recorded on five randomly selected plants of each genotype in both seasons, and the means were then adjusted.
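The text states only that the means "were adjusted". In augmented designs a common approach is to estimate each block's effect from the replicated checks (mean of the checks in that block minus the grand mean of all check observations) and subtract it from every unreplicated test entry in the same block. The sketch below assumes that procedure; the function names `block_effects` and `adjusted_means` and all numbers are hypothetical and are not the authors' code or data.

```python
from statistics import mean

def block_effects(check_values):
    """Block effect = mean of the checks in a block minus the grand check mean.

    check_values maps a block id to the list of check observations in that block.
    """
    all_checks = [v for values in check_values.values() for v in values]
    grand_mean = mean(all_checks)
    return {block: mean(values) - grand_mean for block, values in check_values.items()}

def adjusted_means(test_entries, check_values):
    """Subtract the block effect from each unreplicated test entry.

    test_entries maps a genotype to (block id, observed value).
    """
    effects = block_effects(check_values)
    return {g: value - effects[block] for g, (block, value) in test_entries.items()}

# Hypothetical observations: three checks in each of three blocks (not study data).
checks = {
    "B1": [30.0, 34.0, 32.0],
    "B2": [28.0, 31.0, 30.0],
    "B3": [33.0, 36.0, 34.0],
}
tests = {"G1": ("B1", 35.0), "G2": ("B2", 35.0), "G3": ("B3", 35.0)}

adjusted = adjusted_means(tests, checks)
for genotype, value in adjusted.items():
    print(f"{genotype}: adjusted value {value:.2f}")
```

In this toy example block B2 scored below the overall check mean, so the entry grown there is adjusted upwards, and the reverse applies to block B3.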

Table 1: List of the 86 sorghum germplasm accessions.


 
The degree of association among the studied traits was determined from their correlation coefficients. The correlation coefficient matrix was visualized using the packages “factoextra”, “PerformanceAnalytics” and “psych”. Hierarchical clustering was used to group related genotypes: genotypes within a cluster were broadly similar to one another, whereas the clusters were distinct from one another. Hierarchical clustering was performed in RStudio (v 4.1.1) using the packages “dendextend”, “circlize”, “factoextra” and “cluster”. Principal component analysis (PCA), as recommended by Sneath and Sokal (1973), was applied to the standardised morphological traits to reduce the dimensionality of the dataset while retaining most of its information. Latent vectors, eigenvalues and the PCA biplot were extracted using the packages “ggplot2”, “factoextra” and “FactoMineR” in RStudio v 4.1.1 (Wickham, 2016; Le et al., 2008).
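Standardising puts traits recorded in different units (days, cm, g) on a common scale, so that no trait dominates the distances or the PCA merely because of its units; PCA on standardised traits is equivalent to PCA on their correlation matrix. A minimal sketch of z-score standardisation follows, assuming the sample standard deviation is used (the paper does not say which); the `standardise` helper and the values are hypothetical.

```python
from statistics import mean, stdev

def standardise(column):
    """Convert raw trait values to z-scores: (x - mean) / sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

# Hypothetical values for five genotypes, in different units (not study data).
days_to_flowering = [68.0, 72.0, 75.0, 70.0, 80.0]   # days
plant_height = [180.0, 210.0, 195.0, 240.0, 225.0]   # cm

for name, column in [("DFF", days_to_flowering), ("PH", plant_height)]:
    z = standardise(column)
    print(name, [round(v, 2) for v in z])
```

After standardisation every trait has mean 0 and standard deviation 1, whatever its original unit.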
Correlation analysis
 
To gain an initial view of the dataset, the variables were examined with scatter plots and histograms. This revealed two features: first, there was considerable overlap among the data groups; second, the variables showed broadly similar patterns overall. To explore the relationships among the traits further, Pearson correlation analysis was used, which quantifies the degree of association between each pair of traits. Fig 1 shows the results of the Pearson correlation analysis, which indicate substantial relationships among the observed traits. Specifically, plant height (PH), number of leaves (NOL), panicle length (PL), panicle width (PW), panicle weight (PWt), hundred seed weight (HSW) and dry fodder yield (DFY) were positively and significantly correlated with grain yield per plant (GYPP), with correlation coefficients of r = 0.23 for plant height, 0.31 for number of leaves, 0.57 for panicle length, 0.45 for panicle width, 0.30 for panicle weight, 0.38 for hundred seed weight and 0.85 for dry fodder yield. These results show that grain yield is associated with each of these traits, so selection for these yield-contributing traits is expected to bring about a correlated increase in grain yield. Mathivathana et al. (2015), Chalachew and Rebuma (2018), Ashik et al. (2023) and Mulualem et al. (2018) reported similar associations with grain yield per plant.
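Whether a correlation coefficient is "significant" is usually judged with the t-test t = r·√(n − 2)/√(1 − r²) on n − 2 degrees of freedom. The sketch below implements Pearson's r and this test and applies it to two coefficients reported above, assuming the correlations were computed across the 86 genotype means (df = 84). The critical value 1.989 is an approximate two-tailed 5% table value supplied as an assumption, not a figure from the paper, and the function names are illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r**2), tested on n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# Assumed approximate two-tailed 5% critical value of t for df = 84.
T_CRIT_DF84 = 1.989
N_GENOTYPES = 86

# Two correlations with grain yield per plant reported in the text.
for trait, r in [("plant height", 0.23), ("dry fodder yield", 0.85)]:
    t = t_statistic(r, N_GENOTYPES)
    print(f"{trait}: r = {r:.2f}, t = {t:.2f}, significant at 5%: {abs(t) > T_CRIT_DF84}")
```

Under these assumptions the smallest reported coefficient (r = 0.23) lies just above the 5% threshold, which corresponds to |r| ≈ 0.21 for 86 observations.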

Fig 1: Scatter plot, frequency distribution and correlation analysis of the 86 genotypes based on agro-morphological traits.


 
Cluster analysis
 
Hierarchical clustering is an unsupervised learning algorithm that groups similar objects into clusters, and its result is displayed as a dendrogram. The dendrogram is a useful tool for visualizing the structure of the data and identifying natural groupings among the observations. In this study, the 86 germplasm accessions formed four clusters (Fig 2). Clusters I and IV contained the largest numbers of accessions (63 and 9 genotypes, respectively), whereas clusters II and III contained the fewest (6 and 8 genotypes, respectively) (Table 2).
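The paper does not state which distance measure or linkage rule was used. As an illustration only, the sketch below performs agglomerative clustering with Euclidean distance and average linkage (UPGMA), merging the two closest clusters until a chosen number remains; this corresponds to cutting the dendrogram at that level. The helper names and data are hypothetical; in R the same idea is available through `hclust()` (for example with `method = "average"`) followed by `cutree()`.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs of members drawn from clusters a and b."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # combinations gives i < j, so index i is unaffected
    return clusters

# Hypothetical standardised scores of six genotypes on two traits (not study data).
genotypes = [(0.1, 0.2), (0.0, 0.1), (2.0, 2.1), (2.2, 1.9), (-1.5, 2.0), (-1.4, 2.2)]
print(agglomerate(genotypes, 3))
```

Other linkage rules (complete, Ward) only change the `average_linkage` function; the merging loop stays the same.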

Fig 2: Clustering depicting genetic relationships among sorghum germplasm.



Table 2: Clustering of germplasm lines.


 
The trait means of the four clusters formed by the 86 genotypes were calculated and are presented in Table 3. Cluster 2 had the highest means for days to maturity (DM), days to fifty percent flowering (DFF) and flag leaf length (FLL), while cluster 3 had the highest means for number of leaves (NOL), plant height (PH) and panicle length (PL). Cluster 4 had the highest means for stem girth (SG), panicle weight (PWt), dry fodder yield (DFY), grain yield per plant (GYPP), flag leaf width (FLW), panicle width (PW) and hundred seed weight (HSW). Hence, the genotypes in these clusters can be used as parents in breeding programs (Navya et al., 2021).
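Cluster means such as those in Table 3 are per-cluster averages of each trait. A minimal sketch for a single trait, with hypothetical grain yield values and cluster labels (the `cluster_means` helper is illustrative):

```python
from statistics import mean

def cluster_means(values, labels):
    """Mean trait value per cluster; values and labels are parallel lists."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in sorted(groups.items())}

# Hypothetical grain yield per plant (g) and cluster labels (not study data).
gypp = [40.0, 44.0, 52.0, 61.0, 65.0, 47.0]
labels = [1, 1, 2, 4, 4, 3]

means = cluster_means(gypp, labels)
best = max(means, key=means.get)
print(means, "highest mean in cluster", best)
```

Repeating this for every trait fills a table like Table 3, and the cluster with the highest mean for a trait points to candidate parents for that trait.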

Table 3: Cluster means for the 86 germplasm accessions.


 
Principal component analysis (PCA)
 
PCA is a multivariate statistical approach used to examine and simplify large, complex datasets. Here, PCA was used to assess the genetic diversity of the sorghum genotypes and their relationship to the observed traits, based on the correlations among the traits and the pattern of variation among the genotypes.
 
A scree plot is a simple line plot that shows the proportion of the total variance in the data explained by each principal component (Fig 3); the eigenvalues of the correlation matrix are plotted in descending order of magnitude. To work with fewer components, those with eigenvalues less than 1 were discarded, as recommended by Brejda et al. (2000), and only components with eigenvalues greater than 1 were retained. PCA highlights the main source of variation along each axis of differentiation, condensing a large set of variables into a smaller set of components that summarise their correlations. The first five eigenvalues in the scree plot account for the largest share of the variance in the dataset.
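The retention rule above can be sketched as follows: compute the eigenvalues of the trait correlation matrix, express each as a share of their sum, and keep those greater than 1. To stay within the Python standard library, the sketch uses the cyclic Jacobi method for symmetric matrices; this is a choice made for illustration, not necessarily the algorithm FactoMineR uses internally. The 4 × 4 correlation matrix and the function names are hypothetical.

```python
from math import atan2, cos, sin

def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = 0.5 * atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = cos(theta), sin(theta)
                for i in range(n):  # A <- A J (columns p and q)
                    aip, aiq = a[i][p], a[i][q]
                    a[i][p], a[i][q] = c * aip - s * aiq, s * aip + c * aiq
                for j in range(n):  # A <- J^T A (rows p and q)
                    apj, aqj = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * apj - s * aqj, s * apj + c * aqj
    return sorted((a[i][i] for i in range(n)), reverse=True)

def kaiser_retain(eigenvalues):
    """Keep components whose eigenvalue exceeds 1 (Kaiser criterion)."""
    return [ev for ev in eigenvalues if ev > 1.0]

# Hypothetical 4-trait correlation matrix (not the study's 13-trait matrix).
R = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
]
eig = jacobi_eigenvalues(R)
total = sum(eig)  # equals the number of traits for a correlation matrix
for k, ev in enumerate(eig, start=1):
    print(f"PC{k}: eigenvalue {ev:.3f}, {100 * ev / total:.1f}% of variance")
print("retained:", len(kaiser_retain(eig)))
```

For this matrix the eigenvalues are 1.8, 1.6, 0.4 and 0.2, so PC1 and PC2 explain 45% and 40% of the variance and are the only components retained.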
 

Fig 3: Scree plot for the principal components.



In total, thirteen principal components (PCs) were extracted, but only five of them were considered significant because their eigenvalues exceeded 1 (Table 4). The remaining PCs (eigenvalue < 1) were not analysed further. These five PCs accounted for 68.743% of the variation in the morphological traits measured in the sorghum germplasm, consistent with the results of Nachimuthu et al. (2014). The remaining eight components accounted for only 31.256% of the overall morpho-physiological diversity in this collection of sorghum germplasm. The principal component analysis also showed that a small number of traits cannot fully account for the heterogeneity among the germplasm accessions.

Table 4: Eigenvalues and latent vectors of the plant characteristics for the first five principal components.


 
According to Ayana and Bekele (1999) and Fathima et al. (2023), the first five PCs accounted for a significant share of the overall variability in many agro-morphological parameters in sorghum. Ali et al. (2011) identified head width, head weight, grain production per plant, fresh shoot weight and dry shoot weight as the most important traits for drought tolerance in grain sorghum. Chikuta et al. (2015) analyzed 25 forage and 45 grain sorghum genotypes and found that the first four principal components, with eigenvalues greater than one, accounted for the greatest share of the variation in the traits studied. Similarly, Abraha et al. (2015) found that the first four principal components accounted for over 75% of the variation in grain yield, biomass, stay-green, leaf area, peduncle exertion, days to flowering and days to maturity. These findings suggest that PCA can be an effective tool for identifying genotypes with desirable traits for breeding programs.
 
Table 4 shows that PC I explained the largest share of the variability (25.56%), followed by PC II (13.602%), PC III (11.889%), PC IV (9.321%) and PC V (8.361%). A PCA biplot can be used to divide traits into primary groups and subgroups on the basis of their similarity and dissimilarity. PCA resolves the total variation of a set of traits into linear, mutually independent composite traits, each capturing as much of the remaining variance as possible (Johnson, 2012). When PC1 and PC2 were considered together in the PCA biplot, five groups of attributes were found in our dataset. PC1 explained about 25% of the total variability and was mainly associated with panicle length, panicle width, panicle weight, dry fodder yield and single plant yield. PC2 accounted for about 13% and was mainly associated with days to fifty percent flowering, number of leaves and days to maturity. PC3 explained about 11% of the total variability, with contributions from hundred seed weight, plant height, stem girth and flag leaf length. PC4 accounted for 9% and loaded partially on stem girth and flag leaf width, and PC5 accounted for around 8%, mainly through plant height and flag leaf width.
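Because the PCA was run on the correlation matrix of the 13 standardised traits, the eigenvalues sum to 13, and the percentages in Table 4 can be converted back into approximate eigenvalues. A worked check using only the reported percentages:

```latex
\begin{aligned}
\text{Share}_k &= \frac{\lambda_k}{\sum_{j=1}^{13}\lambda_j} = \frac{\lambda_k}{13},
  \qquad \lambda_k = 13 \times \text{Share}_k \\
\lambda_1 &\approx 13 \times 0.2556 \approx 3.32, \qquad
\lambda_5 \approx 13 \times 0.08361 \approx 1.09 \\
\textstyle\sum_{k=1}^{5} \text{Share}_k &= 25.56 + 13.602 + 11.889 + 9.321 + 8.361 = 68.733\%
\end{aligned}
```

The implied eigenvalue of PC5, just above 1, is consistent with retaining it under the eigenvalue > 1 rule, and the cumulative total agrees with the reported 68.743% to within 0.01 percentage points, a difference that presumably reflects rounding of the individual percentages.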
 
The accessions are widely dispersed across all four quadrants of the biplot (Fig 4). Considering both PC1 and PC2, the PCA biplot identified three groups of traits (Fig 5). Days to fifty percent flowering, flag leaf length, panicle weight, flag leaf width, stem girth and days to maturity clustered in group I; panicle length, hundred seed weight, dry fodder yield, single plant yield and panicle width formed group II; and plant height and number of leaves fell in group III. The biplot revealed that the group I traits, which contribute to PC2, were closely associated with the genotypes of clusters I, II and IV, whereas the genotypes of clusters I, III and IV were linked to the group II traits, which contribute to PC1. The genotypes of cluster IV were most strongly associated with the group III traits, which also belong to PC1. The genotypes ICSP 28 MFR, MR22/1, ICSV 209, YT 82164, M 26405, MR 119C, A 3822, CSV 4, ICSV 61, ICSV 202, MR 87, AS 512 and B 35 were selected from the PCA as good performers in the diversity analysis. PCA is a statistical technique for identifying the most important variables or features in a dataset. The insights gained from such studies (Malik et al., 2011) could prove beneficial when choosing parental genotypes for breeding methods aimed at producing superior genotypes with desirable traits.
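In a PCA biplot, each genotype is drawn at its principal-component scores and each trait as an arrow whose coordinates are its loadings; genotypes lying in the direction of a trait arrow tend to have high values for that trait. The two-trait sketch below, with hypothetical data, shows how these coordinates are computed. For two standardised traits the eigenvectors of the correlation matrix are known in closed form, which avoids a numerical eigen-solver. The biplots in Fig 4 and Fig 5 were produced with factoextra/FactoMineR, whose scaling of scores and arrows may differ from this plain version.

```python
from math import sqrt
from statistics import mean, stdev

def standardise(column):
    """z-scores using the sample standard deviation."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

# Hypothetical values for five genotypes (not study data).
panicle_length = [22.0, 25.0, 28.0, 31.0, 30.0]   # cm
grain_yield = [40.0, 46.0, 50.0, 58.0, 52.0]      # g per plant

z1, z2 = standardise(panicle_length), standardise(grain_yield)
# Pearson r equals the scaled cross-product of the z-scores.
r = sum(a * b for a, b in zip(z1, z2)) / (len(z1) - 1)

# Eigenvectors of [[1, r], [r, 1]] in closed form (PC1 is the (1, 1) direction when r > 0).
v1 = (1 / sqrt(2), 1 / sqrt(2))    # eigenvalue 1 + r
v2 = (1 / sqrt(2), -1 / sqrt(2))   # eigenvalue 1 - r

# Genotype points in the biplot: scores are projections onto the eigenvectors.
scores = [(a * v1[0] + b * v1[1], a * v2[0] + b * v2[1]) for a, b in zip(z1, z2)]

# Trait arrows: loadings are eigenvector entries scaled by sqrt(eigenvalue),
# which equals the correlation of each trait with the PC.
loadings = {
    "panicle_length": (v1[0] * sqrt(1 + r), v2[0] * sqrt(1 - r)),
    "grain_yield": (v1[1] * sqrt(1 + r), v2[1] * sqrt(1 - r)),
}

for i, (pc1, pc2) in enumerate(scores, start=1):
    print(f"genotype {i}: PC1 = {pc1:+.2f}, PC2 = {pc2:+.2f}")
print("loadings:", loadings)
```

The PC1 scores have variance 1 + r and are uncorrelated with the PC2 scores, which is what allows the two axes of the biplot to be read independently.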

Fig 4: Biplot showing distribution of 86 sorghum germplasm.



Fig 5: Biplot of sorghum genotypes based on quantitative traits.

The results showed that grain yield is associated with the traits mentioned above. Clusters 1 and 4 were highly distinct from each other, so crosses between genotypes from these two clusters are expected to produce a wide range of variability in the offspring. The results of the hierarchical cluster analysis closely matched those of the PCA. The PCA biplot also showed a substantial association between the genotypes of clusters I, II, III and IV and the group I traits that contribute to PC2. The genotypes of clusters I and II and some genotypes of cluster IV were most closely associated with the contribution of group I to PC1, whereas the genotypes of clusters III and IV were linked to the traits of groups II and III, which also contribute to PC1 and PC2. Based on the PCA, the genotypes ICSP 28 MFR, MR22/1, ICSV 209, YT 82164, M 26405, MR 119C, A 3822, CSV 4, ICSV 61, ICSV 202, MR 87, AS 512 and B 35 were identified as strong performers in the diversity assessment. Overall, this study offers valuable insights into the genetic diversity of sorghum germplasm and identifies genotypes that show promise for improving grain yield and important nutritional traits. These findings have significant implications for enhancing crop productivity and supporting biofortification initiatives in dry and semi-arid regions. This research highlights the importance of sorghum in sustainable agriculture and underscores its potential as a key crop for addressing food security and nutrition challenges in harsh environments. 
Neladri Sekhar Sarkar, T. Kalaimagal, D. Kavithamani, M. Raveendran, S. Manonmani, A. Senthil and R. Chandirakala contributed to this research in various ways. T. Kalaimagal, S. Manonmani and D. Kavithamani conceptualized and designed the experiments; Neladri Sekhar Sarkar, T. Kalaimagal, D. Kavithamani and R. Chandirakala executed them and collected the data; Neladri Sekhar Sarkar and T. Kalaimagal analyzed the data and interpreted the results; and all authors prepared the manuscript for publication.
The data supporting the findings of this study are available from the corresponding author upon reasonable request.
 
The authors declare that there is no conflict of interest.

  1. Abraha, T., Githiri, S.M., Kasili, R., Araia, W. and Nyende, A.B. (2015). Genetic variation among sorghum [Sorghum bicolor (L.) Moench] landraces from Eritrea under post flowering drought stress conditions. American Journal of Plant Sciences. 6(9): 1410. doi: 10.4236/ajps.2015.6914.

  2. Ali, M.A., Jabran, K., Awan, S.I., Abbas, A., Zulkiffal, M., Acet, T. and Rehman, A. (2011). Morpho-physiological diversity and its implications for improving drought tolerance in grain sorghum at different growth stages. Australian Journal of Crop Science. 5(3): 311-320.

  3. Ashik, T., Islam, M., Rana, S., Jahan, K., Urmi, T.A., Jahan, N.A. and Rahman, M. (2023). Evaluation of salinity tolerant wheat (Triticum aestivum L.) genotypes through multivariate analysis of agronomic traits. Agricultural Science Digest. 43(4): 417-423. doi: 10.18805/ag.D-365.

  4. Ayana, A. and Bekele, E. (1999). Multivariate analysis of morphological variation in sorghum [Sorghum bicolor (L.) Moench] germplasm from Ethiopia and Eritrea. Genetic Resources and Crop Evolution. 46(3): 273-284. doi: 10.1023/A:1008657120946.

  5. Brejda, J.J., Moorman, T.B., Karlen, D.L. and Dao, T.H. (2000). Identification of regional soil quality factors and indicators. I. Central and Southern high-plains. Soil Science Society of America Journal. 64: 2115-2124.

  6. Chikuta, S., Odong, T., Kabi, F. and Rubaihayo, P. (2015). Phenotypic diversity of selected dual purpose forage and grain sorghum genotypes. American Journal of Experimental Agriculture. 9: 6. doi: 10.9734/AJEA/2015/20577.

  7. Chalachew, E. and Rebuma, M. (2018). Productivity of sweet sorghum genotypes under contrasting fertility management for food and ethanol production. Advances in Crop Science and Technology. 6(2): 1-7. 

  8. Fathima, A.F., Pugalendhi, L., Saraswathi, T., Manivannan, N. and Raveendran, M. (2023). Unraveling the relationship between fruit yield and yield related components in snake gourd genotypes using multivariate analysis. Agricultural Science Digest. 43(5): 661-667. doi: 10.18805/ag.D-5753.

  9. Harlan, J.R. and de Wet, J.M.J. (1972). A simple classification of cultivated sorghum. Crop Science. 12: 172-176.

  10. Hariprasanna, K. and Patil, J.V. (2015). Sorghum: Origin, classification, biology and improvement. Sorghum Molecular Breeding. 3-20. 

  11. Johnson, D.E. (2012). Applied Multivariate Methods for Data Analysis. New York: Duxbury Press.

  12. Kumar, C.V.S., Sreelakshmi, C. and Shivani, D. (2011). Assessment of variability and cause and effect relationship in interspecific crosses of sorghum. Journal of Research ANGRAU. 39: 48-52. 

  13. Le, S., Josse, J. and Husson, F. (2008). FactoMineR: An R package for multivariate analysis. Journal of Statistical Software. 25: 1-18.

  14. Mathivathana, M.K., Shunmugavalli, N., Muthuswamy, A. and Harris, C.V. (2015). Correlation and path analysis in black gram. Agricultural Science Digest-A Research Journal. 35(2): 158-160. doi: 10.5958/0976-0547.2015.00030.0.

  15. Malik, W., Iqbal, M.Z., Khan, A.A., Noor, E., Qayyum, A. and Hanif, M. (2011). Genetic basis of variation for seedling traits in Gossypium hirsutum L. African Journal of Biotechnology. 10: 1099-1105.

  16. Mulualem, T., Alamrew, S., Tadesse, T. and Wegary, D. (2018). Correlation and path coefficient analysis for agronomical traits of lowland adapted Ethiopian sorghum genotypes [Sorghum bicolor (L.) Moench] genotypes. Greener Journal of Agricultural Sciences. 8: 155-159. 

  17. Nachimuthu, V.V., Robin, S., Sudhakar, D., Raveendran, M., Rajeswari, S. and Manonmani, S. (2014). Evaluation of rice genetic diversity and variability in a population panel by principal component analysis. Indian Journal of Science and Technology. 10: 1555-1562. 

  18. Nagy, Z.Z., Tuba, F., Soldos, Z. and Erdei, L. (1995). CO2 exchange and water relation responses of sorghum and maize during water and salt stress. Journal of Plant Physiology. 145: 539-544. 


  19. Sneath, P.H.A. and Sokal, R.R. (1973). Numerical Taxonomy: The Principles and Practice of Numerical Classification. W.H. Freeman and Co., San Francisco. 573 p.

  20. Upadhyaya, H.D., Yadav, D., Dronavalli, N., Gowda, C.L.L. and Singh, S. (2010). Mini core germplasm collections for infusing genetic diversity in plant breeding programs. Electronic Journal of Plant Breeding. 1(4): 1294-130.

  21. Vavilov, N.I. (1951). The origin, variation, immunity and breeding of cultivated plants (Translated by Chester, K.S.). Chronica Botanica. 13: 1-366. 

  22. Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA.
