Correlation analysis
To gain fresh insights into our dataset, we examined the variables using scatter plots and histograms. This led to two important findings: first, the data groups overlapped considerably; second, the variables displayed broadly similar growth patterns. To explore the relationships among the traits further, we used Pearson correlation analysis, with the aim of quantifying the degree of interdependence among the traits and potentially uncovering previously unnoticed connections. The resulting correlation coefficients measure the strength and direction of the linear association between pairs of traits.
Fig 1 shows the results of the Pearson correlation analysis, which indicate a substantial relationship among the observed traits. Specifically, plant height (PH), number of leaves (NOL), panicle length (PL), panicle width (PW), panicle weight (PWt), hundred seed weight (HSW) and dry fodder yield (DFY) exhibited positive and significant correlations with grain yield per plant (GYPP). The correlation coefficients with GYPP were r = 0.23 for plant height, r = 0.31 for number of leaves, r = 0.57 for panicle length, r = 0.45 for panicle width, r = 0.30 for panicle weight, r = 0.38 for hundred seed weight and r = 0.85 for dry fodder yield. These results show that grain yield is correlated with all of the aforementioned traits. Selection for any of these yield-contributing traits is therefore likely to be accompanied by an increase in the other traits. Mathivathana et al. (2015), Chalachew and Rebuma (2018), Ashik et al. (2023) and Mulualem et al. (2018) reported similar results for grain yield per plant.
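As a rough illustration of how such coefficients and their significance can be checked, the sketch below computes Pearson's r for two trait vectors and the t statistic that corresponds to a given r. It is a minimal example and not the analysis pipeline used in this study: the function names are hypothetical, the sample size n = 86 is assumed to equal the number of germplasm accessions, and the critical value of about 1.99 (two-sided, 5% level, 84 degrees of freedom) is taken from standard t tables.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumptions: n = 86 accessions; 1.99 approximates the two-sided
# 5% critical value of t with 84 degrees of freedom.
N_ACCESSIONS = 86
T_CRITICAL = 1.99

# Correlations with grain yield per plant (GYPP) as reported in the text
reported_r = {"PH": 0.23, "NOL": 0.31, "PL": 0.57, "PW": 0.45,
              "PWt": 0.30, "HSW": 0.38, "DFY": 0.85}

for trait, r in reported_r.items():
    t = t_statistic(r, N_ACCESSIONS)
    print(f"{trait}: r = {r:.2f}, t = {t:.2f}, significant = {abs(t) > T_CRITICAL}")
```

Under these assumptions, even the weakest reported correlation (plant height, r = 0.23) gives a t statistic of about 2.17, which exceeds the assumed critical value.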
Cluster analysis
Hierarchical clustering is an unsupervised learning method that groups similar objects into clusters, and its dendrogram is a useful visualization for understanding the structure of the data and identifying natural groupings among the observations. In this study, the 86 germplasm accessions formed four clusters (Fig 2). Cluster I was the largest with 63 genotypes, followed by Cluster IV with 9 genotypes, whereas Cluster III and Cluster II were the smallest with 8 and 6 genotypes, respectively (Table 2).
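The grouping step can be sketched as follows. The text does not state the distance measure or linkage rule, so this example assumes Euclidean distances on standardized traits with average linkage; the function name and the toy data are hypothetical and are not taken from this study.

```python
import numpy as np

def agglomerative_clusters(X, n_clusters):
    """Average-linkage agglomerative clustering (assumed method) on
    Euclidean distances between standardized observations."""
    X = np.asarray(X, dtype=float)
    # Standardize each trait so that no single trait dominates the distance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Pairwise Euclidean distance matrix between observations
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    # Start with every observation in its own cluster
    clusters = [[i] for i in range(len(Z))]
    while len(clusters) > n_clusters:
        best = (np.inf, None, None)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the closest pair of clusters
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(len(Z), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

# Toy data: four well-separated groups of unequal size (illustration only)
toy = []
for (cx, cy), size in [((0, 0), 5), ((10, 0), 3), ((0, 10), 2), ((10, 10), 4)]:
    for j in range(size):
        toy.append((cx + 0.1 * j, cy + 0.05 * j))

labels = agglomerative_clusters(toy, n_clusters=4)
print("cluster sizes:", np.bincount(labels))
```

For real data, `scipy.cluster.hierarchy` offers `linkage` and `dendrogram` functions that implement the same idea more efficiently and also draw the tree.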
The cluster means of the four groups of 86 genotypes were calculated (Table 3). Cluster II exhibited the highest mean for days to maturity (DM), days to fifty percent flowering (DFF) and flag leaf length (FLL), while Cluster III showed the highest mean for number of leaves (NOL), plant height (PH) and panicle length (PL). Cluster IV had the highest mean for stem girth (SG), panicle weight (PWt), dry fodder yield (DFY), grain yield per plant (GYPP), flag leaf width (FLW), panicle width (PW) and hundred seed weight (HSW). Hence, the genotypes in these clusters can be used as parents in a breeding program (Navya et al., 2021).
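The comparison of cluster means can be illustrated with a small sketch that averages each trait within a cluster and then reports which cluster has the highest mean for each trait. The function names and all numeric values below are hypothetical and serve only to show the calculation; they are not values from Table 3.

```python
from collections import defaultdict

def cluster_means(rows, labels):
    """Mean of every trait within each cluster.
    rows: list of {trait: value} dicts; labels: cluster label per row."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        counts[label] += 1
        for trait, value in row.items():
            sums[label][trait] += value
    return {label: {trait: total / counts[label] for trait, total in traits.items()}
            for label, traits in sums.items()}

def top_cluster_per_trait(means):
    """For each trait, the cluster label with the highest mean."""
    traits = next(iter(means.values()))
    return {trait: max(means, key=lambda label: means[label][trait])
            for trait in traits}

# Hypothetical measurements for four genotypes (illustration only)
rows = [
    {"PH": 150.0, "GYPP": 30.0},
    {"PH": 170.0, "GYPP": 34.0},
    {"PH": 210.0, "GYPP": 28.0},
    {"PH": 120.0, "GYPP": 55.0},
]
labels = ["I", "I", "III", "IV"]

means = cluster_means(rows, labels)
print(top_cluster_per_trait(means))
```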
Principal component analysis (PCA)
PCA is a multivariate statistical approach used to examine and simplify large, complex datasets. Here, PCA was used to assess the genetic diversity of the sorghum genotypes and their relationship to the observed traits, based on the correlations among the traits and the pattern of variation among the genotypes.
A scree plot is a simple line plot that displays the proportion of the total variance in the data explained by each component (Fig 3), with the eigenvalues of the correlation matrix plotted in descending order of magnitude. To work with fewer components, components with an eigenvalue < 1 were removed, as recommended by Brejda et al. (2000), and only components with eigenvalues greater than 1 were taken into account. PCA highlights the significance of the main source of variation along each axis of differentiation and condenses a large set of variables into a smaller set of components that summarise the correlations. The first five eigenvalues in the PCA scree plot account for the largest share of the variance in the dataset.
In all, thirteen principal components (PCs) were found, although only five of these were deemed significant by having eigenvalues greater than 1 (Table 4). The remaining non-significant PCs (eigenvalue < 1) were not analysed further. These five PCs accounted for 68.743% of the variation in the sorghum germplasm measured for various morphological features. These results are consistent with those of Nachimuthu et al. (2014). The remaining 8 components accounted for only 31.256% of the overall morphophysiological diversity in this collection of sorghum germplasm. The principal component analysis also demonstrated that a limited number of characteristics cannot fully account for the heterogeneity among germplasm accessions.
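The eigenvalue-greater-than-1 rule described above can be sketched in a few lines. The example assumes that PCA is run on the trait correlation matrix, as the scree plot description indicates; the function names are hypothetical, and the data are random numbers with the same shape as the study (86 accessions by 13 traits), not the study's measurements.

```python
import numpy as np

def pca_eigen(X):
    """Eigenvalues of the trait correlation matrix (descending) and the
    percentage of total variance attributed to each component."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)        # trait-by-trait correlation matrix
    eigvals = np.linalg.eigvalsh(R)[::-1]   # eigvalsh returns ascending order
    explained = 100.0 * eigvals / eigvals.sum()
    return eigvals, explained

def kaiser_count(eigvals):
    """Number of components retained under the eigenvalue > 1 rule."""
    return int(np.sum(eigvals > 1.0))

# Hypothetical data: 86 accessions x 13 traits, where the first five
# traits share a common signal so that some structure exists.
rng = np.random.default_rng(0)
shared = rng.normal(size=(86, 1))
X = rng.normal(size=(86, 13)) + 0.8 * shared * (np.arange(13) < 5)

eigvals, explained = pca_eigen(X)
k = kaiser_count(eigvals)
print("components retained:", k)
print("cumulative % variance of retained components:", explained[:k].sum())
```

Because the eigenvalues of a correlation matrix sum to the number of traits, an eigenvalue above 1 marks a component that explains more variance than a single standardized trait would on its own, which is the rationale for the cutoff.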
According to Ayana and Bekele (1999) and Fathima et al. (2023), the first five PCs were significant in the overall variability of many agromorphological parameters in sorghum. Ali et al. (2011), however, found head width, head weight, grain yield per plant, fresh shoot weight and dry shoot weight to be the most crucial characteristics for drought tolerance in grain sorghum.
The use of principal component analysis (PCA) to analyze data from multiple genotypes has proven useful in identifying those with the most desirable traits for breeding programs. Chikuta et al. (2015) analyzed 25 forage and 45 grain sorghum genotypes and found that the first four principal components with eigenvalues greater than one accounted for the highest variation in the traits studied. Similarly, Abraha et al. (2015) found that the first four principal components accounted for over 75% of the variation in grain yield, biomass, stay-green, leaf area, peduncle exertion, days to flowering and maturity. These findings suggest that PCA can be an effective tool for identifying genotypes with desirable traits for breeding programs.
Table 4 shows that PC I had the greatest impact on variability (25.56%), followed by PC II (13.602%), PC III (11.889%), PC IV (9.321%) and PC V (8.361%). A PCA biplot analysis can be used to select traits that divide into primary groups and subgroups based on homogeneity and dissimilarity. Principal component analysis resolves the total variation of a set of traits into linear, independent composite traits that successively maximise the variation in the data (Johnson, 2012).
In the PCA biplot, which considers PC1 and PC2 simultaneously, five groups of attributes were found in our data set. PC1 accounted for about 25% of the total variability and was explained principally by panicle length, panicle width, panicle weight, dry fodder yield and grain yield per plant. PC2 accounted for about 13% and was explained principally by days to fifty percent flowering, number of leaves and days to maturity. PC3 explained about 11% of the total variability, contributed by hundred seed weight, plant height, stem girth and flag leaf length. PC4 accounted for 9% and loaded partially on stem girth and flag leaf width, and PC5 accounted for around 8%, mainly through plant height and flag leaf width.
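Statements such as "PC1 is explained principally by panicle length" are usually read from the component loadings. The sketch below shows one way to obtain them, assuming PCA on the correlation matrix and defining a loading as the eigenvector entry scaled by the square root of its eigenvalue. The function names are hypothetical, and the data are simulated so that three traits follow one latent factor and two traits follow another; they are not the study's data.

```python
import numpy as np

def pca_loadings(X, trait_names, n_components):
    """Loadings (trait-to-component correlations) for the leading components."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)
    # Indices of the n_components largest eigenvalues, in descending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Scale each eigenvector by sqrt(eigenvalue) to obtain loadings
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    return {name: loadings[i] for i, name in enumerate(trait_names)}

def dominant_pc(loadings):
    """For each trait, the 1-based index of the PC with the largest |loading|."""
    return {name: int(np.argmax(np.abs(vec))) + 1 for name, vec in loadings.items()}

# Simulated data (illustration only): two independent latent factors
rng = np.random.default_rng(0)
f1 = rng.normal(size=86)
f2 = rng.normal(size=86)
names = ["PL", "PW", "GYPP", "DFF", "DM"]
X = np.column_stack(
    [f1 + 0.3 * rng.normal(size=86) for _ in range(3)]
    + [f2 + 0.3 * rng.normal(size=86) for _ in range(2)]
)

loadings = pca_loadings(X, names, n_components=2)
dominant = dominant_pc(loadings)
print(dominant)
```

The sign of a loading is arbitrary (an eigenvector and its negative are equally valid), which is why the sketch compares absolute values.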
Accessions were dispersed widely across all quadrants (Fig 4). Considering both PC1 and PC2, three groups of traits were identified in the PCA biplot (Fig 5). Days to fifty percent flowering, flag leaf length, panicle weight, flag leaf width, stem girth and days to maturity clustered in group I. Panicle length, hundred seed weight, dry fodder yield, grain yield per plant and panicle width formed group II, and plant height and number of leaves fell in group III. Interestingly, the PCA biplot revealed that the group I traits, which contributed to PC2, were significantly correlated with the genotypes of clusters I, II and IV. The genotypes of clusters I, III and IV were linked to the traits of group II, which contributed to PC1, and the genotypes of cluster IV were most strongly connected with group III, which also contributed to PC1.
The genotypes ICSP 28 MFR, MR22/1, ICSV 209, YT 82164, M 26405, MR 119C, A 3822, CSV 4, ICSV 61, ICSV 202, MR 87, AS 512 and B 35 were selected from the PCA as good performers in the diversity analysis. PCA is a statistical technique used to identify the most important variables or features in a given dataset. The insights gained from the studies conducted by Malik et al. (2011) could prove beneficial when choosing parental genotypes for breeding methods aimed at producing superior genotypes with desirable traits.
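The positions of accessions in a PC1 by PC2 biplot are their component scores. A minimal sketch of how such scores and the quadrant of each accession could be computed is given below, assuming standardized traits and a singular value decomposition; the function names, the quadrant labels and the random data are hypothetical and do not reproduce Fig 4.

```python
import numpy as np

def pc_scores(X, n_components=2):
    """Scores of each observation on the leading principal components,
    computed from the SVD of the standardized data matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Z @ Vt.T equals U * s, so the scores can be read off directly
    return U[:, :n_components] * s[:n_components]

def quadrant(pc1, pc2):
    """Quadrant of a point in the PC1-PC2 plane (labels are a convention)."""
    if pc1 >= 0 and pc2 >= 0:
        return "Q1"
    if pc1 < 0 and pc2 >= 0:
        return "Q2"
    if pc1 < 0 and pc2 < 0:
        return "Q3"
    return "Q4"

# Random stand-in for 86 accessions x 13 traits (illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(86, 13))

scores = pc_scores(X)
counts = {}
for pc1, pc2 in scores:
    q = quadrant(pc1, pc2)
    counts[q] = counts.get(q, 0) + 1
print(counts)
```

Since the sign of each component is arbitrary, the quadrant labels can flip between software packages; only the relative positions of accessions and trait vectors are meaningful.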