Principal component analysis (PCA)
PCA was first introduced by Karl Pearson (1901) to reduce data dimensionality and identify patterns in multivariate data. In the current study, the variation in the dataset was partitioned into 13 principal components (Table 2 and Fig 2a), one for each of the 13 traits measured. Of these, the first three PCs (PC1, PC2 and PC3) together explained most of the variability (75.5%). According to Kaiser’s rule, PCs with eigenvalues ≥ 1 are considered significant and account for most of the variability. Similar results were reported by Chakraborty et al. (2021), where the first three PCs accounted for the maximum variability, indicating genetic consistency. PC1 had the highest eigenvalue (6.946) and contributed 53.4% of the total variance, with major contributions from main shoot length, plant height, number of siliquae per plant, siliqua length, test weight, number of seeds per siliqua, number of secondary branches, biological yield and number of primary branches. PC2 had an eigenvalue of 1.867 and contributed 14.4% of the variance; harvest index, economic yield per plant, germination percentage and days to 50% flowering were its major contributors. Similar traits were reported to contribute most to PC2 by Chakraborty et al. (2021) and Saikrishna et al. (2021). PC3 had an eigenvalue of 1.001 and accounted for 7.7% of the variance; it was primarily represented by germination percentage, days to 50% flowering, biological yield, economic yield per plant and number of primary branches. PC4 to PC13 had eigenvalues less than 1 and together accounted for the remaining 24.5% of the total variability. The scree plot displays the percentage of total variance explained by each principal component (Fig 2c). Among the thirteen PCs, PC1 made the highest contribution (53.4%) to the total variance (Fig 2e), followed by PC2 (14.4%) and PC3 (7.7%), which brought the cumulative variance to 75.5% (Fig 2c).
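The percentages quoted above can be reproduced from the reported eigenvalues. The short Python sketch below is only an illustration and assumes that the PCA was run on the correlation matrix of the 13 standardized traits, so that the eigenvalues sum to 13; the text does not state this explicitly, although the reported figures are consistent with it. The variable and function names are illustrative.

```python
# Eigenvalues of the first three PCs as reported in the text.
reported_eigenvalues = {"PC1": 6.946, "PC2": 1.867, "PC3": 1.001}

# Assumption: correlation-matrix PCA on 13 standardized traits,
# so the total variance (sum of all eigenvalues) equals 13.
n_traits = 13


def percent_variance(eigenvalue, total_variance):
    """Share of the total variance explained by one component, in percent."""
    return 100.0 * eigenvalue / total_variance


# Percentage of variance explained by each of the first three PCs.
percents = {
    pc: round(percent_variance(ev, n_traits), 1)
    for pc, ev in reported_eigenvalues.items()
}

# Cumulative variance of the first three PCs.
cumulative = round(
    sum(percent_variance(ev, n_traits) for ev in reported_eigenvalues.values()), 1
)

# Kaiser's rule: keep components whose eigenvalue is at least 1.
retained = [pc for pc, ev in reported_eigenvalues.items() if ev >= 1.0]

print(percents, cumulative, retained)
```

Under this assumption the three retained components reproduce the 53.4%, 14.4% and 7.7% shares and the 75.5% cumulative variance given in the text.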
PCA-biplot analysis
The PCA biplot (Fig 1a and b) displayed the relationships among the 13 traits and the distribution of the 53 genotypes in the plane defined by PC1 and PC2 (Fig 2b, d, e and f). Days to 50% flowering strongly influenced PC1 in the negative direction, while economic yield and biological yield contributed positively along both PC1 and PC2 and were strongly correlated with each other. Harvest index and germination percentage showed a strong negative correlation. The genotypes Pusa Bold, RH 119 and 53-14-24D were located close together on the biplot, indicating similar trait profiles. The genotypes 53-14-24D, Durga Mani and 55-14-101 were located close to the economic yield per plant vector, indicating high yield.
D² analysis
D² analysis was carried out according to the method of Mahalanobis (1936), as later revised by Rao (1952). The 53 genotypes were grouped into seven clusters using Tocher’s method (Table 3 and Fig 3). Cluster-I contained the largest number of genotypes (17), followed by Cluster-III (16), Cluster-IV (12) and Cluster-II (5). Cluster-V, Cluster-VI and Cluster-VII each contained a single genotype. The clustering pattern revealed sufficient divergence among the genotypes to allow the formation of distinct clusters.
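For readers unfamiliar with the statistic, the Mahalanobis D² between two genotypes is the quadratic form of the difference between their trait mean vectors and the inverse of the pooled covariance matrix. The sketch below illustrates this for two traits only, so that the matrix inverse can be written out by hand; the function name and the numbers are hypothetical and are not taken from the study, which used 13 traits.

```python
def mahalanobis_d2(mean_a, mean_b, cov):
    """D^2 between two 2-trait mean vectors, given a 2x2 covariance matrix.

    Illustrative only: restricted to two traits so the inverse is explicit.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # Difference between the two mean vectors.
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    # Quadratic form: diff^T * inv * diff.
    return sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))


# Hypothetical trait means for two genotypes and a hypothetical covariance matrix.
genotype_1 = (4.0, 6.0)
genotype_2 = (1.0, 2.0)
identity_cov = [[1.0, 0.0], [0.0, 1.0]]
scaled_cov = [[4.0, 0.0], [0.0, 1.0]]

d2_identity = mahalanobis_d2(genotype_1, genotype_2, identity_cov)
d2_scaled = mahalanobis_d2(genotype_1, genotype_2, scaled_cov)
print(d2_identity, d2_scaled)
```

With an identity covariance matrix D² reduces to the squared Euclidean distance; a larger variance for a trait shrinks that trait's share of the distance.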
Contribution of various traits towards total genetic divergence
The contribution of individual traits towards total genetic divergence (Table 4a) showed that germination percentage (85.85%) contributed the most to divergence, followed by biological yield (2.47%), harvest index (1.96%), number of siliquae per plant (1.89%), economic yield (1.67%), siliqua length (1.52%), number of seeds per siliqua (1.09%), thousand seed weight (1.02%), main shoot length (0.94%), number of primary branches (0.65%), plant height (0.51%), days to 50% flowering (0.29%) and number of secondary branches (0.15%).
Kumari and Kumari (2018) reported similar values for number of primary branches.
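As a quick consistency check on the figures above, the reported contributions can be ranked and summed. The sketch below uses only the percentages quoted in the text; the dictionary and variable names are illustrative.

```python
# Percentage contribution of each trait to total divergence, as reported in the text.
contributions = {
    "germination percentage": 85.85,
    "biological yield": 2.47,
    "harvest index": 1.96,
    "economic yield": 1.67,
    "number of siliquae per plant": 1.89,
    "siliqua length": 1.52,
    "number of seeds per siliqua": 1.09,
    "thousand seed weight": 1.02,
    "main shoot length": 0.94,
    "number of primary branches": 0.65,
    "plant height": 0.51,
    "days to 50% flowering": 0.29,
    "number of secondary branches": 0.15,
}

# Rank traits from the largest to the smallest contribution.
ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)

# The contributions should add up to roughly 100%, allowing for rounding.
total = round(sum(contributions.values()), 2)

for trait, share in ranked:
    print(f"{trait}: {share:.2f}%")
print("total:", total)
```

The thirteen values sum to approximately 100%, with germination percentage alone accounting for most of the divergence.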
Cluster means
The cluster mean values for the different characters are presented in Table 5. Cluster-I had the highest mean value for days to 50% flowering (68.16). Cluster-II had the highest mean values for main shoot length (179.98), thousand seed weight (3.84) and biological yield (272.81). Cluster-III exhibited the lowest mean value for economic yield (45.65). Cluster-IV showed the lowest mean for number of siliquae per plant (8.82).
Cluster-V had the highest mean values for plant height (189.04), number of primary branches (6.53) and economic yield per plant (50.89). Cluster-VI had the highest mean value for siliqua length (4.67). Cluster-VII exhibited the highest mean values for germination percentage (34.61), number of secondary branches (7.07), number of seeds per siliqua (256.6) and harvest index (19.45). Similarly, Reddy et al. (2025) reported the highest cluster mean for plant height in Cluster VI, while Cluster I exhibited the highest mean value for siliqua length. In contrast, Cluster II recorded the maximum mean for the number of seeds per siliqua. For 1000-seed weight, the highest mean was noted in Cluster VI, indicating that the genetic potential for this trait is higher in the genotypes of that cluster than in the others.
Inter and intra-cluster distances
The inter- and intra-cluster distances revealed the degree of genetic diversity among clusters (Table 6 and Fig 4). The maximum inter-cluster distance was found between clusters VI and VII (36.09) (Priyanka et al. 2021), followed by that between II and VI (31.93), III and IV (29.44), II and III (25.29) and IV and VII (23.17) (Rout et al. 2019), whereas the minimum inter-cluster distance was found between clusters II and VII (4.98), followed by I and V (5.22), I and IV (6.66), III and IV (7.33) and III and VI (7.47). The intra-cluster distances observed were 3.28 (cluster I), 2.81 (cluster II), 4.14 (cluster III) and 3.5 (cluster IV). Clusters V, VI and VII each contained a single genotype and therefore had intra-cluster distances of zero. Overall, the distances revealed considerable genetic diversity among the genotypes; the genotypes of Clusters VI and VII, which were the most distant from each other (36.09), are suitable as potential parents for hybridization to exploit heterosis. The highest intra-cluster distance was recorded in Cluster III (4.14), suggesting greater genetic variability within this cluster. The current findings are in line with Reddy et al. (2025).
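The way intra- and inter-cluster distances follow from pairwise D² values can be sketched as below. This is a minimal illustration under one common convention, in which the distance is the average of D (the square root of D²) over all relevant genotype pairs; the study may have used a different averaging rule, and the genotype labels and D² values here are hypothetical.

```python
from itertools import combinations, product
from math import sqrt


def intra_cluster_distance(members, d2):
    """Average D over all pairs of genotypes within one cluster.

    A cluster with a single genotype has no pairs, so its distance is zero.
    """
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    return sum(sqrt(d2[frozenset(pair)]) for pair in pairs) / len(pairs)


def inter_cluster_distance(cluster_a, cluster_b, d2):
    """Average D over all pairs formed by one genotype from each cluster."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(sqrt(d2[frozenset(pair)]) for pair in pairs) / len(pairs)


# Hypothetical pairwise D^2 values for three genotypes.
d2_values = {
    frozenset(("g1", "g2")): 4.0,
    frozenset(("g1", "g3")): 16.0,
    frozenset(("g2", "g3")): 36.0,
}

# Hypothetical clustering: one two-genotype cluster and one singleton.
cluster_a = ["g1", "g2"]
cluster_b = ["g3"]

intra_a = intra_cluster_distance(cluster_a, d2_values)
intra_b = intra_cluster_distance(cluster_b, d2_values)
inter_ab = inter_cluster_distance(cluster_a, cluster_b, d2_values)
print(intra_a, intra_b, inter_ab)
```

The singleton case shows why clusters V, VI and VII have intra-cluster distances of zero: with only one genotype there is no within-cluster pair to average over.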